One policy model for cost, uptime, and execution control

Autopilot is the Phase 3 control plane: it coordinates migration triggers, spot recovery behavior, and schedule guardrails automatically.

How It Works

Step 1

Set budget cap, migration threshold, spot preference, and reliability rules for each workload class.

Step 2

Autopilot selects matching capacity and launches according to your policy constraints.

Step 3

Policy remains active after launch: optimize spend, recover preemptions, and respect schedule/cost limits.

Threshold-based (default 15%)

Migration trigger

Detection ~30s + automated restore

Spot recovery

Cron launch + max time/cost

Schedule controls

A team spending $2,000/month on GPU training can often recover 20–35% by applying thresholded migration and spot-first policy where safe.

With checkpoint + recovery policy enabled, spot preemptions become recoverable events instead of full incident escalations.

Long-running training with migration + spot recovery to reduce spend while protecting progress.

Reliability-first policy for serving paths, with optional cost optimization in background.

Schedule-bound pipelines with strict run and cost ceilings for predictable monthly spend.

Create a free account and launch your first GPU instance in minutes.