Examples#
A catalog of runnable example scripts using Kinetic. Every example below is rendered directly on this site and is also available as a raw Python script in the GitHub repository.
Tier badges:
Quickstart: your first run. Minimal setup, sensible defaults.
Core: the everyday product surface, covering async jobs, data, checkpoints, and parallel sweeps.
Advanced: multi-host Pathways jobs, LLM fine-tuning, anything that needs special quota or external credentials.
To run any example, clone the repo, install Kinetic, set KINETIC_PROJECT,
and run python examples/<file>.py:
git clone https://github.com/keras-team/kinetic.git
cd kinetic
uv pip install -e .
export KINETIC_PROJECT="your-project-id"
python examples/fashion_mnist.py
Quickstart#
The first thing to run after kinetic up. A small Keras classifier on
Fashion-MNIST that confirms your cluster can schedule a TPU pod and
stream a real result back to your shell.
The cheapest sanity check there is. Keras-on-JAX on a CPU node — no accelerator quota needed, useful for verifying your install before you ask for hardware.
Core#
Walks through every part of the detached-job API end-to-end: run_async(),
status()/tail()/result(), reattach from another shell with
kinetic.attach(), and enumerate jobs with list_jobs().
Wrap a local directory in kinetic.Data(...) and let it land as a
plain filesystem path on the remote — your training code doesn’t have
to know whether the bytes started on your laptop or in GCS.
JAX training that picks up where it left off. Writes Orbax checkpoints
to KINETIC_OUTPUT_DIR and proves the resume path by relaunching the
same function and seeing it skip already-completed steps.
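The resume pattern described above can be sketched without any accelerator or cluster. This is a minimal illustration only, not the example script itself: it swaps Orbax for a single JSON file, writes to a temporary directory rather than KINETIC_OUTPUT_DIR, and the names train and checkpoint.json are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def train(total_steps: int, output_dir: Path) -> list[int]:
    """Run up to total_steps, resuming from the last saved step if present.

    Returns the list of steps actually executed in this call.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    ckpt_path = output_dir / "checkpoint.json"  # hypothetical file name

    # Restore state if an earlier run left a checkpoint behind.
    if ckpt_path.exists():
        state = json.loads(ckpt_path.read_text())
    else:
        state = {"step": 0, "loss": None}

    executed = []
    for step in range(state["step"], total_steps):
        # Placeholder for a real training step.
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}
        # Write to a temp file, then rename, so an interrupted write
        # never leaves a half-written checkpoint behind.
        tmp_path = ckpt_path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps(state))
        os.replace(tmp_path, ckpt_path)
        executed.append(step)
    return executed


# A temporary directory stands in for the job's output directory here.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    first = train(3, out)   # fresh start: runs steps 0, 1, 2
    second = train(5, out)  # "relaunch": skips 0-2, runs 3 and 4
    third = train(5, out)   # nothing left to do
    print(first, second, third)  # [0, 1, 2] [3, 4] []
```

Calling the same function again with the same directory is what proves the resume path: the second call starts at step 3 rather than step 0.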
Auto-resumable Keras training. Round-trips model.get_weights() through
Orbax so a restarted job picks up at the right step without any custom
save/load code.
Fan out a grid of jobs with run_async_map(), batch submissions to keep
the cluster happy, and gather results — including how to handle the
job that inevitably fails halfway through.
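The batch-and-gather shape of such a sweep can be sketched with the standard library. This is an illustration under stated assumptions, not Kinetic code: concurrent.futures stands in for run_async_map(), whose signature is not shown here, and run_trial and sweep are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_trial(lr: float) -> float:
    """Hypothetical trial; fails for one config to mimic a bad job."""
    if lr == 0.1:
        raise RuntimeError(f"diverged at lr={lr}")
    return round(1.0 / (1.0 + lr), 4)


def sweep(configs, batch_size=2):
    """Submit configs in batches; collect results and errors separately."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(0, len(configs), batch_size):
            batch = configs[start:start + batch_size]
            futures = {cfg: pool.submit(run_trial, cfg) for cfg in batch}
            # Drain the whole batch before submitting the next one.
            for cfg, fut in futures.items():
                try:
                    results[cfg] = fut.result()
                except Exception as exc:
                    # One failed job should not sink the rest of the sweep.
                    failures[cfg] = str(exc)
    return results, failures


results, failures = sweep([0.001, 0.01, 0.1, 1.0])
print(results)   # {0.001: 0.999, 0.01: 0.9901, 1.0: 0.5}
print(failures)  # {0.1: 'diverged at lr=0.1'}
```

Keeping failures in their own mapping lets the driver report or retry the broken configuration while still using every result that did come back.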
One driver script that successively schedules work on CPU, TPU, and GPU pools — handy for verifying which hardware your cluster will actually serve.
Advanced#
The reference for scaling beyond a single TPU host. A short JAX program that verifies cross-host collectives are actually wired up before you trust them with a real workload.
End-to-end SFT of Gemma 2B with LoRA across multiple TPU hosts. A realistic LLM workload to use as a model for your own fine-tuning runs; it pulls weights from Kaggle and runs on Pathways.
Compact Gemma 3 1B SFT on a single TPU. A good baseline for getting an LLM workload running before scaling out to Pathways, and a worked example of forwarding Kaggle credentials into the remote pod.
SFT of Gemma 3 with LoRA/QLoRA on TPU v5litepod. Demonstrates how to run the Tunix SFT script on a remote cluster with environment variable capture for credentials.