Gradient

Git commits for ML training

Stop restarting from scratch. Snapshot, rewind, and fork your training runs so iteration becomes commit + branch + compare.

The problem with ML training today

Training a model is a long, expensive, stochastic process. When your team discovers a late-stage issue such as bad data, data leakage, or a buggy transform, iteration breaks down.

01

Days of compute wasted

Start from scratch every time you need to test a change.

02

Restart and pray

No way to recover or fork from a specific training state.

03

Blocked iteration

Cannot quickly experiment with different approaches.

The missing version-control layer

With Gradient, you can take a state-complete snapshot at step t, then rewind and fork from that exact moment, turning iteration into a git-like workflow.

Before Gradient

Discover issue at step 50,000
Delete everything
Restart training for 3 days
Hope it works this time

With Gradient

Snapshot at step 50,000
Rewind to any previous step
Fork and test multiple changes
Compare results side by side
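
To make the fork-and-compare loop concrete, here is a minimal, framework-level sketch using plain PyTorch checkpoints. Everything in it (the toy model, the file name, the single hyperparameter being varied) is an illustrative assumption, not Gradient's API.

```python
# Minimal sketch of the snapshot -> fork -> compare loop using plain PyTorch
# checkpoints. The toy model, file name, and varied hyperparameter are
# illustrative assumptions, not Gradient's API.
import torch
import torch.nn as nn

model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def train_steps(model, optimizer, n_steps):
    losses = []
    for _ in range(n_steps):
        x = torch.randn(32, 16)
        loss = model(x).pow(2).mean()          # dummy objective for illustration
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses

# "Snapshot at step 50,000": persist model and optimizer state at a checkpoint.
train_steps(model, optimizer, 100)              # stand-in for the first 50k steps
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict()}, "step_50000.pt")

# "Fork and test multiple changes": restore the same state twice and change
# exactly one variable (here, the learning rate) in one branch.
def fork(lr):
    m = nn.Linear(16, 1)
    opt = torch.optim.AdamW(m.parameters(), lr=lr)
    snap = torch.load("step_50000.pt")
    m.load_state_dict(snap["model"])
    opt.load_state_dict(snap["optimizer"])      # restores moments AND the old lr,
    for group in opt.param_groups:              # so re-apply the forked lr
        group["lr"] = lr
    return m, opt

baseline = fork(lr=3e-4)
lower_lr = fork(lr=1e-4)

# "Compare results side by side": continue both forks and compare their losses.
loss_a = train_steps(*baseline, 100)
loss_b = train_steps(*lower_lr, 100)
print(loss_a[-1], loss_b[-1])
```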

State-complete snapshots

The technical core of Gradient is what we mean by "snapshot": it is not just the weights.

A Gradient snapshot includes the full training state that actually determines the trajectory:

Model weights

Current parameters of your neural network.

Optimizer state

Momentum, velocity, and adaptive learning rates.

Scheduler position

Current learning rate schedule state.

Dataloader and sampler state

Exact position in your training data.

RNG state

Random number generator state for reproducible forks.

This is what makes forks meaningfully comparable. You are changing one variable at a time from the same exact starting point, turning training into a controlled experiment rather than a prayer.
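
For a concrete picture of that state, here is a minimal sketch of a state-complete checkpoint in plain PyTorch. It illustrates the idea only; the helper names are placeholders, and Gradient's own snapshot format may differ.

```python
# Minimal sketch of a "state-complete" checkpoint in plain PyTorch.
# Helper names are placeholders; Gradient's actual snapshot format may differ.
import random
import numpy as np
import torch

def save_snapshot(path, step, model, optimizer, scheduler, sampler_state):
    torch.save({
        "step": step,
        "model": model.state_dict(),          # current parameters
        "optimizer": optimizer.state_dict(),  # momentum / adaptive-moment buffers
        "scheduler": scheduler.state_dict(),  # position in the LR schedule
        "sampler": sampler_state,             # exact position in the data stream
        "rng": {                              # RNG state, so forks replay identically
            "python": random.getstate(),
            "numpy": np.random.get_state(),
            "torch": torch.get_rng_state(),
            "cuda": torch.cuda.get_rng_state_all() if torch.cuda.is_available() else [],
        },
    }, path)

def load_snapshot(path, model, optimizer, scheduler):
    snap = torch.load(path, weights_only=False)  # snapshot holds non-tensor state too
    model.load_state_dict(snap["model"])
    optimizer.load_state_dict(snap["optimizer"])
    scheduler.load_state_dict(snap["scheduler"])
    random.setstate(snap["rng"]["python"])
    np.random.set_state(snap["rng"]["numpy"])
    torch.set_rng_state(snap["rng"]["torch"])
    if torch.cuda.is_available() and snap["rng"]["cuda"]:
        torch.cuda.set_rng_state_all(snap["rng"]["cuda"])
    return snap["step"], snap["sampler"]
```

Restoring all of these pieces together is what lets two forks from step 50,000 see the same batches in the same order, so any difference in their loss curves comes from the change you made rather than from noise.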

Stop burning compute on restarts

Join ML teams who are iterating faster with version-controlled training runs.