PyTorch Training
Kinetic can run PyTorch workloads on cloud GPUs. Since Kinetic executes arbitrary Python functions remotely, any PyTorch code that runs locally will run the same way on a provisioned GPU node.
Setup
Add torch, plus torchvision if your code uses it, to your project’s requirements.txt:
torch
torchvision
Kinetic will install these in the remote container automatically. See Managing Dependencies for details on how dependency detection works.
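Before submitting a job, it can help to confirm locally that requirements.txt names every package the remote function imports. The following is a minimal sketch of such a check, not a description of how Kinetic detects dependencies. The helper names listed_packages and missing_packages are hypothetical, and the parsing covers only plain "name" and "name plus version specifier" lines.

```python
import re
from pathlib import Path


def listed_packages(requirements_path):
    """Return the lowercase package names declared in a requirements file."""
    names = set()
    for raw in Path(requirements_path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        # The name is everything before a version specifier, extra, or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


def missing_packages(requirements_path, needed):
    """Return the entries of `needed` that the requirements file does not list."""
    present = listed_packages(requirements_path)
    return sorted(name for name in needed if name.lower() not in present)
```

Calling missing_packages("requirements.txt", ["torch", "torchvision"]) returns an empty list when both packages are declared, and the names of any that are absent otherwise.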
Basic Usage
import kinetic

@kinetic.run(accelerator="gpu-l4")
def train():
    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Training on: {device}")

    # Simple feedforward network
    model = nn.Sequential(
        nn.Linear(10, 64),
        nn.ReLU(),
        nn.Linear(64, 1),
    ).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    # Dummy data
    x = torch.randn(512, 10, device=device)
    y = torch.randn(512, 1, device=device)

    for epoch in range(20):
        pred = model(x)
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if epoch % 5 == 0:
            print(f"epoch {epoch}: loss={loss.item():.4f}")

    return loss.item()

final_loss = train()
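Multi-GPU Training
For nodes with multiple GPUs, wrap the model in torch.nn.DataParallel. It replicates the model on each visible GPU and splits every input batch along its first dimension, so each device processes a slice of the batch. The PyTorch documentation recommends DistributedDataParallel for larger jobs; DataParallel is the simplest single-process option.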
import kinetic

@kinetic.run(accelerator="gpu-a100x4")
def train_multi_gpu():
    import torch
    import torch.nn as nn

    device = torch.device("cuda")
    print(f"GPUs available: {torch.cuda.device_count()}")

    model = nn.Sequential(
        nn.Linear(10, 128),
        nn.ReLU(),
        nn.Linear(128, 1),
    )
    model = nn.DataParallel(model).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    x = torch.randn(2048, 10, device=device)
    y = torch.randn(2048, 1, device=device)

    for epoch in range(20):
        pred = model(x)
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return loss.item()
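To see how a batch is divided in the example above, the arithmetic can be reproduced without a GPU. This sketch assumes DataParallel splits the batch the same way torch.chunk does: every chunk has ceil(batch_size / num_gpus) samples except possibly the last. The function chunk_sizes is a hypothetical helper, not part of PyTorch or Kinetic.

```python
import math


def chunk_sizes(batch_size, num_gpus):
    """Per-GPU sample counts when a batch is chunked along dimension 0."""
    if batch_size <= 0 or num_gpus <= 0:
        raise ValueError("batch_size and num_gpus must be positive")
    per_gpu = math.ceil(batch_size / num_gpus)  # size of every full chunk
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_gpu, remaining)  # the last chunk may be smaller
        sizes.append(take)
        remaining -= take
    return sizes


print(chunk_sizes(2048, 4))  # [512, 512, 512, 512]
print(chunk_sizes(10, 4))    # [3, 3, 3, 1]
```

With the 2048-sample batch on four GPUs, each device receives 512 samples. Batch sizes that are not a multiple of the GPU count leave the last device with less work, so a divisible batch size keeps the devices evenly loaded.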
GPU Selection
See Accelerator Support for the full list of GPUs, multi-GPU counts, and TPU configurations.
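Pass spot=True to run on cheaper spot capacity. Spot nodes can be reclaimed by the cloud provider, so use this option only for fault-tolerant workloads:
import kinetic

@kinetic.run(accelerator="gpu-a100", spot=True)
def train():
    ...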
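One common way to make a training job fault-tolerant is to checkpoint after every epoch and resume from the last checkpoint on restart. The sketch below shows the pattern with the standard library only. It makes several assumptions: the source does not say whether Kinetic reruns a function after an interruption or where persistent storage is mounted, so checkpoint_path is a hypothetical location, and stop_after exists only to simulate an interruption. In a real PyTorch job the saved state would typically hold model.state_dict() and optimizer.state_dict(), written with torch.save.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write `state` atomically so an interruption never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path):
    """Return the saved state, or None when no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train_resumable(checkpoint_path, total_epochs, stop_after=None):
    """Run epochs from the last checkpoint; `stop_after` simulates an interruption."""
    state = load_checkpoint(checkpoint_path) or {"epoch": 0, "loss": None}
    for epoch in range(state["epoch"], total_epochs):
        if stop_after is not None and epoch == stop_after:
            return state  # simulated interruption before this epoch runs
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        state = {"epoch": epoch + 1, "loss": loss}
        save_checkpoint(checkpoint_path, state)
    return state
```

A first call that is interrupted after three epochs leaves a checkpoint recording epoch 3. A second call with the same path continues from there instead of starting over.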