Preempted spot instance? Recovery starts in seconds.
Use lower-cost spot capacity without accepting brittle operations. Recovery behavior is explicit and policy-driven.
Recovery Timeline
Detect (~0-30s)
Pulserun polls and receives preemption signals, typically detecting interruption within 30 seconds.
Select replacement (~30-60s)
Matching capacity is selected using your policy: budget, region, spot/on-demand preference, and GPU type.
Provision + restore (~1-3 min)
New instance boots, configuration applies, and latest checkpoint is restored automatically.
Resume workload
Training/inference resumes with minimal manual intervention and no full-run reset.
Example outcome: reliability
A preempted 12-hour training run resumes from checkpoint instead of restarting from zero, protecting model progress and team time.
Example outcome: economics
Teams can hold spot-first posture for steady savings while retaining near on-demand continuity for critical pipelines.
Need recurring runs with the same protection?
See Scheduled Jobs behavior →