Getting Started
Install Kinetic, point it at a cluster, and run your first remote function. If your team has already provisioned a Kinetic cluster, skip ahead to Run your first job.
Prerequisites
Python 3.11+.
uv, used for the install command below.
Google Cloud SDK (gcloud): install guide.
kubectl: install guide. Kinetic auto-installs the gke-gcloud-auth-plugin for you on first use, but kubectl itself must already be on your PATH.
A Google Cloud project with billing enabled.
Authenticate with Google Cloud once:
gcloud auth login
gcloud auth application-default login
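Before moving on, it can help to confirm the prerequisites above are actually in place. The following is a minimal sketch using only the Python standard library; the helper names find_missing_tools and python_version_ok are hypothetical and are not part of Kinetic.

```python
import shutil
import sys

# Tools the prerequisites list says must be available locally.
REQUIRED_TOOLS = ("uv", "gcloud", "kubectl")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def python_version_ok(minimum=(3, 11)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
    if not python_version_ok():
        print("Python 3.11+ is required; found", sys.version.split()[0])
```

This only checks that the executables exist; it does not verify that you are authenticated or that billing is enabled on the project.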
Install
uv pip install "keras-kinetic[cli]"
The base keras-kinetic package installs the @kinetic.run()
decorator. The [cli] extra adds the dependencies the kinetic CLI
needs to provision and manage infrastructure (Pulumi and the GCP
plugins). We recommend installing the [cli] extra even if you’re only
submitting jobs against an already-provisioned cluster — the CLI makes
troubleshooting and job management much easier. Drop it only if you’re
sure you won’t need CLI access.
Note: The Pulumi CLI (used for infrastructure provisioning) is bundled and managed automatically. It will be installed to ~/.kinetic/pulumi on first use if not already present.
Set up your environment
kinetic init
kinetic init checks your local tools, auth, and project, then routes
you down one of two paths:
Join: if any Kinetic clusters already exist in this GCP project (provisioned by you or a teammate), init lists them, lets you pick one, and configures kubectl for it. Cluster discovery reads the project’s shared state bucket (gs://{project}-kinetic-state), so collaborators with access to the bucket all see the same set.
Create: if no clusters exist yet, init calls kinetic up to enable APIs, provision a GKE cluster with an accelerator node pool, and wire up Docker / kubectl access.
Either way, init ends by saving a profile and making it active.
A profile is your saved infrastructure context (project, zone,
cluster, and namespace), persisted at ~/.kinetic/profiles.json. The
active profile is what every kinetic command and every
@kinetic.run() invocation targets, so you don’t need to export env vars or pass --project / --zone / --cluster on
the command line. Switch contexts with kinetic profile use <profile-name>, and
see what’s saved with kinetic profile ls.
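If you want to look at the saved profiles from Python rather than through kinetic profile ls, a minimal sketch is shown below. The internal layout of profiles.json is not described on this page, so the sketch makes no assumptions about its keys and simply returns whatever JSON it finds; load_profiles is a hypothetical helper, not a Kinetic API.

```python
import json
from pathlib import Path

# Location given in the text above.
PROFILES_PATH = Path.home() / ".kinetic" / "profiles.json"


def load_profiles(path: Path = PROFILES_PATH):
    """Return the parsed contents of the profiles file, or None if it is absent."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    profiles = load_profiles()
    if profiles is None:
        print("No profiles file found; run kinetic init first.")
    else:
        # Pretty-print the raw structure without interpreting it.
        print(json.dumps(profiles, indent=2))
```

Treat the file as read-only here and use the kinetic profile commands to change the active profile.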
Cleanup reminder: when you’re done, run kinetic down to tear down all resources and stop incurring costs. See the CLI Reference for the full set of commands.
Sharing infrastructure with teammates? Kinetic stores Pulumi
state in a per-project GCS bucket (gs://{project}-kinetic-state),
so any teammate with roles/storage.objectAdmin on the bucket sees
the same stack. The first kinetic up creates the bucket; the first
admin needs roles/storage.admin on the project. See
Pulumi state for the full IAM story.
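As an illustration of the naming convention and role described above, the sketch below derives the state bucket URI from a project ID and builds (without running) a gcloud command that would grant a teammate roles/storage.objectAdmin on that bucket. The helper names, the project ID, and the email address are hypothetical; check the resulting command against the Pulumi state page before using it.

```python
import shlex


def state_bucket_uri(project_id: str) -> str:
    """Build the shared state bucket URI using the gs://{project}-kinetic-state pattern."""
    return f"gs://{project_id}-kinetic-state"


def grant_object_admin_command(project_id: str, member: str) -> list[str]:
    """Return the argv for a bucket-level IAM binding; nothing is executed here."""
    return [
        "gcloud", "storage", "buckets", "add-iam-policy-binding",
        state_bucket_uri(project_id),
        f"--member={member}",
        "--role=roles/storage.objectAdmin",
    ]


if __name__ == "__main__":
    # Placeholder project and member, for illustration only.
    cmd = grant_object_admin_command("my-project", "user:teammate@example.com")
    print(shlex.join(cmd))
```

Printing the command instead of executing it keeps the sketch safe to run and lets an admin review the binding first.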
Run your first job
import kinetic


@kinetic.run(accelerator="tpu-v5litepod-1")
def train_fashion_mnist():
    import keras
    import numpy as np

    # Load and preprocess the Fashion MNIST dataset
    (x_train, y_train), (x_test, y_test) = (
        keras.datasets.fashion_mnist.load_data()
    )
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0
    x_train = np.expand_dims(x_train, -1)
    x_test = np.expand_dims(x_test, -1)

    # Build a simple convolutional model
    model = keras.Sequential(
        [
            keras.layers.Input(shape=(28, 28, 1)),
            keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
            keras.layers.MaxPooling2D(pool_size=(2, 2)),
            keras.layers.Flatten(),
            keras.layers.Dense(128, activation="relu"),
            keras.layers.Dense(10, activation="softmax"),
        ]
    )
    model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer="adam",
        metrics=["accuracy"],
    )

    # Train for a few epochs on the remote TPU
    model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.1)

    # Evaluate and return results
    score = model.evaluate(x_test, y_test, verbose=0)
    return f"Test loss: {score[0]:.4f}, Test accuracy: {score[1]:.4f}"


if __name__ == "__main__":
    result = train_fashion_mnist()
    print(result)
Run it:
python fashion_mnist.py
Note
Expected timing:
First run: ~5 minutes. The slow part is the first container build via Cloud Build, which freezes your dependencies into an image tagged by their hash.
Subsequent runs (same dependencies): under a minute. The cached image is reused; only your code changes get re-uploaded.
Subsequent runs (changed dependencies): ~5 minutes again, since a new hash forces a fresh build.
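The caching behavior above follows from tagging the image by a hash of its dependencies. The sketch below illustrates the general idea of content-hash tagging only; it is not Kinetic's actual hashing scheme, and dependency_tag is a hypothetical helper.

```python
import hashlib


def dependency_tag(requirements: list[str], length: int = 12) -> str:
    """Derive a short, deterministic tag from a list of requirement strings."""
    # Strip whitespace, drop empty lines, and sort so that ordering does not matter.
    normalized = sorted(line.strip() for line in requirements if line.strip())
    digest = hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    same_a = dependency_tag(["keras==3.0", "numpy>=1.26"])
    same_b = dependency_tag(["numpy>=1.26", "keras==3.0"])
    changed = dependency_tag(["keras==3.1", "numpy>=1.26"])
    print("reordered deps reuse the tag:", same_a == same_b)
    print("changed deps get a new tag:", same_a != changed)
```

Under a scheme like this, editing only your function body leaves the tag unchanged, while adding or bumping a dependency produces a new tag and therefore a new build.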
Tip
Recommended defaults:
Stay in bundled mode (the default; you don’t need to pass container_image=). It’s the only mode that works without publishing your own base image.
Use direct calls to @kinetic.run()-decorated functions while you’re iterating; switch to calling run_async() once your jobs run for more than a few minutes and you’d rather not block your local shell.
Write any artifacts you want to keep under KINETIC_OUTPUT_DIR, not under /tmp.
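As a sketch of the last tip, the helper below writes a small metrics file under KINETIC_OUTPUT_DIR. It assumes KINETIC_OUTPUT_DIR is exposed to the job as an environment variable, which this page does not state explicitly; the save_metrics name and the local "outputs" fallback directory are hypothetical and exist only so the code also runs outside a job.

```python
import json
import os
from pathlib import Path


def save_metrics(metrics: dict, filename: str = "metrics.json") -> Path:
    """Write metrics as JSON under the durable output directory and return the path."""
    # Assumption: KINETIC_OUTPUT_DIR is an environment variable set inside the job.
    # The "outputs" fallback is illustrative, for local runs only.
    out_dir = Path(os.environ.get("KINETIC_OUTPUT_DIR", "outputs"))
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")
    return path
```

Inside a function decorated with @kinetic.run(), you would call something like save_metrics({"accuracy": 0.91}) after evaluation instead of writing to /tmp.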
Next steps
After your first run works, the most useful follow-ups are:
Examples: a catalog of runnable scripts that cover async jobs, data, checkpoints, parallel sweeps, and LLM fine-tuning. The fastest way to see real patterns end to end.
Execution Modes: bundled vs prebuilt vs custom image, and when to switch.
Detached Jobs: run_async(), reattach, and the job lifecycle for long-running work.
Data and Checkpointing: kinetic.Data(...) for inputs and KINETIC_OUTPUT_DIR for durable outputs and resumable checkpoints.