Examples#

A catalog of runnable example scripts using Kinetic. Every example below is rendered directly on this site and is also available as a raw Python script in the GitHub repository.

Tier badges:

Quickstart: your first run. Minimal setup, sensible defaults.
Core: the everyday product surface, covering async jobs, data, checkpoints, and parallel sweeps.
Advanced: multi-host Pathways jobs, LLM fine-tuning, and anything else that needs special quota or external credentials.

To run any example: clone the repo, install Kinetic, set KINETIC_PROJECT, and run python examples/<file>.py.

git clone https://github.com/keras-team/kinetic.git
cd kinetic
uv pip install -e .
export KINETIC_PROJECT="your-project-id"
python examples/fashion_mnist.py

Quickstart#

Fashion-MNIST on a TPU

The first thing to run after kinetic up. A small Keras classifier on Fashion-MNIST that confirms your cluster can schedule a TPU pod and stream a real result back to your shell.

Keras TPU

examples/fashion_mnist.md

Keras + JAX smoke test

The cheapest sanity check there is. Keras-on-JAX on a CPU node — no accelerator quota needed, useful for verifying your install before you ask for hardware.

Keras JAX CPU

examples/simple_demo.md

Core#

Submit, monitor, and reattach

Walks through every part of the detached-job API end-to-end: run_async(), status()/tail()/result(), reattach from another shell with kinetic.attach(), and enumerate jobs with list_jobs().

Async Reattach

examples/example_async_jobs.md

Ship local files into the job

Wrap a local directory in kinetic.Data(...) and let it land as a plain filesystem path on the remote — your training code doesn’t have to know whether the bytes started on your laptop or in GCS.

Data GCS

examples/example_data_api.md

Resumable JAX training with Orbax

JAX training that picks up where it left off. Writes Orbax checkpoints to KINETIC_OUTPUT_DIR and proves the resume path by relaunching the same function and seeing it skip already-completed steps.

JAX Checkpointing Orbax

examples/example_checkpoint.md
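
The resume pattern described in this entry can be sketched without any accelerator. The snippet below is a minimal illustration, not the example script itself: it uses a JSON file as a stand-in for an Orbax checkpoint, treats KINETIC_OUTPUT_DIR as a local filesystem path (an assumption; falling back to a temporary directory when the variable is unset is also an assumption), and the `train` helper is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def train(total_steps: int, output_dir: Path) -> list[int]:
    """Run up to total_steps, resuming from the last saved step if present.

    Returns the steps actually executed during this call.
    """
    ckpt_path = output_dir / "checkpoint.json"

    # Restore state if an earlier run left a checkpoint behind.
    if ckpt_path.exists():
        state = json.loads(ckpt_path.read_text())
    else:
        state = {"step": 0, "loss": None}

    executed = []
    for step in range(state["step"], total_steps):
        # Stand-in for a real training step.
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}
        executed.append(step)

        # Write to a temp file, then rename, so an interrupted run never
        # leaves a half-written checkpoint in place.
        tmp_path = ckpt_path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps(state))
        os.replace(tmp_path, ckpt_path)
    return executed


if __name__ == "__main__":
    # Assumption: the variable holds a local path; fall back to a temp dir.
    out = Path(os.environ.get("KINETIC_OUTPUT_DIR") or tempfile.mkdtemp())
    out.mkdir(parents=True, exist_ok=True)
    print("first launch ran steps:", train(3, out))
    print("relaunch ran steps:", train(5, out))
```

Calling `train` a second time with the same directory runs only the steps that are not yet recorded in the checkpoint, which is the behavior the example verifies by relaunching the same function.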

Resumable Keras training

Auto-resumable Keras training. Round-trips model.get_weights() through Orbax so a restarted job picks up at the right step without any custom save/load code.

Keras Checkpointing Orbax

examples/example_keras_checkpoint.md

Parallel hyperparameter sweep

Fan out a grid of jobs with run_async_map(), batch submissions to keep the cluster happy, and gather results — including how to handle the job that inevitably fails halfway through.

Sweep Parallel

examples/example_collections.md
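
The batching and failure-handling idea in this entry can be illustrated locally. The sketch below does not call run_async_map(); it uses the standard library's ThreadPoolExecutor as a stand-in for remote submission, and `train_one` and `run_sweep` are hypothetical names introduced only for illustration.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def train_one(config: dict) -> float:
    """Hypothetical trial: returns a fake score and fails for one config."""
    if config["lr"] == 0.1 and config["batch_size"] == 64:
        raise RuntimeError("simulated mid-run failure")
    return round(config["lr"] * config["batch_size"], 4)


def run_sweep(configs: list[dict], batch_size: int = 2) -> list[dict]:
    """Submit configs in batches and record a result or an error for each."""
    outcomes = []
    for start in range(0, len(configs), batch_size):
        batch = configs[start:start + batch_size]
        # Only one batch is in flight at a time, which caps concurrency.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            futures = [pool.submit(train_one, cfg) for cfg in batch]
            for cfg, future in zip(batch, futures):
                try:
                    outcomes.append(
                        {"config": cfg, "result": future.result(), "error": None}
                    )
                except Exception as exc:
                    # Keep the failure next to its config instead of aborting
                    # the whole sweep.
                    outcomes.append(
                        {"config": cfg, "result": None, "error": str(exc)}
                    )
    return outcomes


# Build the grid as the Cartesian product of the hyperparameter values.
grid = [
    {"lr": lr, "batch_size": bs}
    for lr, bs in itertools.product([0.01, 0.1], [32, 64])
]
outcomes = run_sweep(grid, batch_size=2)
failed = [o for o in outcomes if o["error"] is not None]
succeeded = [o for o in outcomes if o["error"] is None]
```

Recording each failure alongside its config lets the driver report or resubmit the failed trials while still using the results that did complete.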

Mix accelerators in one driver

One driver script that successively schedules work on CPU, TPU, and GPU pools — handy for verifying which hardware your cluster will actually serve.

Multi-accelerator Cluster

examples/example_gke.md

Advanced#

Multi-host JAX on Pathways

The reference for scaling beyond a single TPU host. A short JAX program that verifies cross-host collectives are actually wired up before you trust them with a real workload.

JAX Pathways Distributed

examples/pathways_example.md

Distributed Gemma 2B fine-tune

End-to-end SFT of Gemma 2B with LoRA across multiple TPU hosts. A realistic LLM workload to use as a model for your own fine-tuning runs. It pulls weights from Kaggle and runs on Pathways.

LLM Pathways Distributed

examples/gemma_sft_pathways_distributed.md

Single-TPU Gemma 3 fine-tune

Compact Gemma 3 1B SFT on a single TPU. A good baseline for getting an LLM workload running before scaling out to Pathways, and a worked example of forwarding Kaggle credentials into the remote pod.

LLM TPU

examples/gemma3_sft_demo.md
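
Before forwarding credentials to a remote pod, it can help to confirm they are set locally. The sketch below is a hedged illustration of that capture step only: KAGGLE_USERNAME and KAGGLE_KEY are the variables the Kaggle client reads, `capture_env` is a hypothetical helper, and how the resulting dictionary is handed to Kinetic is left to the linked example.

```python
import os

# Environment variables the Kaggle client reads for authentication.
REQUIRED_VARS = ("KAGGLE_USERNAME", "KAGGLE_KEY")


def capture_env(names, environ=None) -> dict:
    """Return {name: value} for each name; raise if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise KeyError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in names}
```

Failing on the local machine with a clear message is cheaper than discovering a missing key after the job has been scheduled on a TPU.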

Tunix SFT Example

SFT of Gemma 3 with LoRA/QLoRA on TPU v5litepod. Demonstrates how to run the Tunix SFT script on a remote cluster with environment variable capture for credentials.

LLM TPU LoRA

examples/tunix_sft.md
