Getting Started

Install Kinetic, point it at a cluster, and run your first remote function. If your team has already provisioned a Kinetic cluster, skip ahead to Run your first job.

Prerequisites

Python 3.11+.
uv, used for the install command below.
Google Cloud SDK (gcloud): install guide.
kubectl: install guide. Kinetic auto-installs the gke-gcloud-auth-plugin for you on first use, but kubectl itself must already be on your PATH.
A Google Cloud project with billing enabled.
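
Before running the install step, it can help to confirm the command-line tools from the list above are actually on your PATH. The following is a minimal sketch using only the Python standard library; the function name check_prerequisites is hypothetical and not part of Kinetic, and the check does not verify billing or authentication.

```python
import shutil
import sys

# Tools named in the prerequisites list above.
REQUIRED_TOOLS = ("uv", "gcloud", "kubectl")


def check_prerequisites(tools=REQUIRED_TOOLS, min_python=(3, 11)):
  """Return a dict mapping each requirement to True (found) or False."""
  results = {"python": sys.version_info[:2] >= min_python}
  for tool in tools:
    # shutil.which returns None when the executable is not on PATH.
    results[tool] = shutil.which(tool) is not None
  return results


if __name__ == "__main__":
  for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Anything reported as MISSING needs to be installed before kinetic init can complete its own checks.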

Authenticate with Google Cloud once:

gcloud auth login
gcloud auth application-default login

Install

uv pip install keras-kinetic

This installs the @kinetic.run() decorator and the kinetic CLI. The CLI provisions and manages infrastructure through Pulumi and the GCP plugins.

Note: The Pulumi CLI (used for infrastructure provisioning) is bundled and managed automatically. It will be installed to ~/.kinetic/pulumi on first use if not already present.

Set up your environment

kinetic init

kinetic init checks your local tools, auth, and project, then routes you down one of two paths:

Join — if any Kinetic clusters already exist in this GCP project (provisioned by you or a teammate), init lists them, lets you pick one, and configures kubectl for it. Cluster discovery reads the project’s shared state bucket (gs://{project}-kinetic-state), so collaborators with access to the bucket all see the same set.
Create — if no clusters exist yet, init calls kinetic up to enable APIs, provision a GKE cluster with an accelerator node pool, and wire up Docker / kubectl access.

Either way, init ends by saving a profile and making it active. A profile is your saved infrastructure context (project, zone, cluster, and namespace), persisted at ~/.kinetic/profiles.json. The active profile is what every kinetic command and every @kinetic.run() invocation targets, so you don’t need to export env vars or pass --project / --zone / --cluster on the command line. Switch contexts with kinetic profile use <profile-name>, and list the saved profiles with kinetic profile ls.
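
If you want to inspect the saved profiles from a script rather than through kinetic profile ls, a minimal sketch is shown below. It assumes only that ~/.kinetic/profiles.json is a regular JSON file; the schema is not documented here, so the helper (load_profiles, a hypothetical name) returns the parsed content without interpreting it.

```python
import json
from pathlib import Path

# Location stated in the text above.
PROFILES_PATH = Path.home() / ".kinetic" / "profiles.json"


def load_profiles(path=PROFILES_PATH):
  """Return the parsed profiles file, or None if it does not exist yet."""
  path = Path(path)
  if not path.exists():
    return None
  # The schema is an unknown here, so the content is returned as-is.
  with path.open(encoding="utf-8") as f:
    return json.load(f)


if __name__ == "__main__":
  profiles = load_profiles()
  if profiles is None:
    print("No profiles saved yet; run `kinetic init` first.")
  else:
    print(json.dumps(profiles, indent=2))
```

Treat the file as read-only from your own code and use the kinetic profile commands to change it.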

Cleanup reminder: when you’re done, run kinetic down to tear down all resources and stop incurring costs. See the CLI Reference for the full set of commands.

Sharing infrastructure with teammates? Kinetic stores Pulumi state in a per-project GCS bucket (gs://{project}-kinetic-state), so any teammate with roles/storage.objectAdmin on the bucket sees the same stack. The first kinetic up creates the bucket; the first admin needs roles/storage.admin on the project. See Pulumi state for the full IAM story.
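
The bucket name follows directly from the project ID, which makes it easy to compute when granting IAM access to teammates. A small sketch of that naming pattern, where state_bucket_uri is a hypothetical helper and my-project is a placeholder project ID:

```python
def state_bucket_uri(project_id: str) -> str:
  """Build the shared state bucket URI from a GCP project ID."""
  if not project_id:
    raise ValueError("project_id must be a non-empty string")
  # Pattern taken from the text above: gs://{project}-kinetic-state
  return f"gs://{project_id}-kinetic-state"


print(state_bucket_uri("my-project"))  # gs://my-project-kinetic-state
```

Passing the resulting URI to gcloud storage ls is one way to confirm that your account can read the bucket.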

Run your first job

import kinetic


@kinetic.run(accelerator="tpu-v5litepod-1")
def train_fashion_mnist():
  import keras
  import numpy as np

  # Load and preprocess the Fashion MNIST dataset
  (x_train, y_train), (x_test, y_test) = (
    keras.datasets.fashion_mnist.load_data()
  )
  x_train = x_train.astype("float32") / 255.0
  x_test = x_test.astype("float32") / 255.0
  x_train = np.expand_dims(x_train, -1)
  x_test = np.expand_dims(x_test, -1)

  # Build a simple convolutional model
  model = keras.Sequential(
    [
      keras.layers.Input(shape=(28, 28, 1)),
      keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
      keras.layers.MaxPooling2D(pool_size=(2, 2)),
      keras.layers.Flatten(),
      keras.layers.Dense(128, activation="relu"),
      keras.layers.Dense(10, activation="softmax"),
    ]
  )

  model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"],
  )

  # Train for a few epochs on the remote TPU
  model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.1)

  # Evaluate and return results
  score = model.evaluate(x_test, y_test, verbose=0)
  return f"Test loss: {score[0]:.4f}, Test accuracy: {score[1]:.4f}"


if __name__ == "__main__":
  result = train_fashion_mnist()
  print(result)

Run it:

python fashion_mnist.py

Note

Expected timing:

First run: ~5 minutes. The slow part is the first container build via Cloud Build, which freezes your dependencies into an image tagged by their hash.
Subsequent runs (same dependencies): under a minute. The cached image is reused; only your code changes get re-uploaded.
Subsequent runs (changed dependencies): ~5 minutes again, since a new hash forces a fresh build.
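
The timing differences above follow from tagging the image by a hash of its dependencies: identical dependencies produce the same tag, so the cached image is found, while any change produces a new tag and a new build. The sketch below illustrates that idea only. It is not Kinetic's actual hashing scheme; the function name image_tag, the SHA-256 choice, and the 12-character truncation are all assumptions made for illustration.

```python
import hashlib


def image_tag(dependencies):
  """Derive a short, deterministic tag from a list of requirement strings."""
  # Normalize so ordering and surrounding whitespace do not change the tag.
  normalized = sorted(dep.strip() for dep in dependencies if dep.strip())
  digest = hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()
  return digest[:12]


base = image_tag(["keras", "numpy"])
reordered = image_tag(["numpy", "keras"])
changed = image_tag(["keras", "numpy", "pandas"])

print(base == reordered)  # True: same dependencies, cached image can be reused
print(base == changed)    # False: a new hash means a fresh build
```

In practice this means that pinning dependency versions keeps the hash stable and your runs on the fast path.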

Tip

Recommended defaults:

Stay in bundled mode (the default — you don’t need to pass container_image=). It’s the only mode that works without publishing your own base image.
Call @kinetic.run()-decorated functions directly while you’re iterating; switch to run_async() once your jobs run for more than a few minutes and you’d rather not block your local shell.
Write any artifacts you want to keep under KINETIC_OUTPUT_DIR, not under /tmp.
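
As a sketch of the last tip, the helper below writes a metrics file under KINETIC_OUTPUT_DIR. It assumes KINETIC_OUTPUT_DIR is exposed to the job as an environment variable, which the name suggests but this page does not state. The helper names and the local fallback directory (outputs) are hypothetical and exist only so the snippet also runs outside a job.

```python
import json
import os
from pathlib import Path


def output_path(filename, default_dir="outputs"):
  """Return a path for `filename` under KINETIC_OUTPUT_DIR."""
  # Assumption: KINETIC_OUTPUT_DIR is an environment variable set inside the
  # remote job. The local fallback directory is hypothetical.
  out_dir = Path(os.environ.get("KINETIC_OUTPUT_DIR", default_dir))
  out_dir.mkdir(parents=True, exist_ok=True)
  return out_dir / filename


def save_metrics(metrics, filename="metrics.json"):
  """Write a JSON-serializable dict of metrics and return the file path."""
  path = output_path(filename)
  path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")
  return path


if __name__ == "__main__":
  print(save_metrics({"test_accuracy": 0.0}))  # placeholder value
```

Inside a decorated function you would call save_metrics(...) before returning, so the file survives even if the return value is lost.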

Next steps

After your first run works, the most useful follow-ups are:

Examples: a catalog of runnable scripts that cover async jobs, data, checkpoints, parallel sweeps, and LLM fine-tuning. The fastest way to see real patterns end to end.
Execution Modes: bundled vs prebuilt vs custom image, and when to switch.
Detached Jobs: run_async(), reattach, and the job lifecycle for long-running work.
Data and Checkpointing: kinetic.Data(...) for inputs and KINETIC_OUTPUT_DIR for durable outputs and resumable checkpoints.