One policy model for cost, uptime, and execution control

Autopilot is the Phase 3 control plane: it coordinates migration triggers, spot recovery behavior, and schedule guardrails automatically.

How It Works

Step 1

Define Policy

Set budget cap, migration threshold, spot preference, and reliability rules for each workload class.

Step 2

Launch + Route

Autopilot selects matching capacity and launches according to your policy constraints.

Step 3

Enforce Continuously

Policy remains active after launch: optimize spend, recover preemptions, and respect schedule/cost limits.

Policy Inputs & Outcomes

Threshold-based (default 15%)

Migration trigger

Detection ~30s + automated restore

Spot recovery

Cron launch + max time/cost

Schedule controls

Example outcome: cost savings

A team spending $2,000/month on GPU training can often recover 20–35% by applying thresholded migration and spot-first policy where safe.

Example outcome: reliability

With checkpoint + recovery policy enabled, spot preemptions become recoverable events instead of full incident escalations.

Use Cases

ML Training

Long-running training with migration + spot recovery to reduce spend while protecting progress.

Inference Serving

Reliability-first policy for serving paths, with optional cost optimization in background.

Batch Processing

Schedule-bound pipelines with strict run and cost ceilings for predictable monthly spend.

Ready to get started?

Create a free account and launch your first GPU instance in minutes.