Minimizing Downtime During Cloud Run Revisions
Avoid downtime when deploying new revisions of your AI services on Cloud Run by configuring startup probes and Startup CPU Boost.
The Cold Start Problem
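When you deploy a new revision of a heavy AI application, each new instance can take a long time to become ready, because the server must import large ML libraries and load model weights before it can answer requests. If Cloud Run starts routing traffic to an instance before that work finishes, clients can receive 503 errors. Cloud Run's default startup check is a TCP probe, which only confirms that the container is listening on its port, so a server that opens its port before the model is loaded can look ready too early. Cloud Run offers two settings that address this.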
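Startup Probes & CPU Boost
With a custom HTTP startup probe, Cloud Run holds traffic until the container returns a successful response from its health-check endpoint (here, /health on port 8080). Enabling Startup CPU Boost additionally gives the instance extra CPU while it starts, which can substantially shorten initialization for heavy Python services (for example, FastAPI served by Gunicorn) that spend most of their startup time importing libraries and loading models. The relevant parts of the service configuration look like this: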
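```yaml
# Excerpt from a Cloud Run service's spec.template (other fields omitted)
metadata:
  annotations:
    run.googleapis.com/startup-cpu-boost: "true"  # extra CPU during instance startup
spec:
  containers:
    - image: IMAGE_URL  # placeholder; other container fields omitted
      startupProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 5  # wait 5 s before the first check
        periodSeconds: 5        # then check every 5 s
        timeoutSeconds: 5       # each check may take up to 5 s
        failureThreshold: 24    # shut the instance down after 24 consecutive failures
```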
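The startup probe only helps if /health reports readiness honestly. Below is a minimal sketch, assuming a Python service built on the standard library's http.server rather than FastAPI/Gunicorn, in which /health returns 503 until a background loader finishes and 200 afterwards. The names model_ready, load_model, and health_response are hypothetical, and the 30-second sleep stands in for real model loading. Note that the port opens immediately here, which is exactly the situation where a TCP-only check would pass too early.

```python
import os
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical readiness flag, set once the model has finished loading.
model_ready = threading.Event()


def load_model() -> None:
    """Hypothetical stand-in for slow initialization (imports, weight loading)."""
    time.sleep(30)  # placeholder for the real loading work
    model_ready.set()


def health_response(ready: bool) -> tuple[int, bytes]:
    """503 keeps the startup probe retrying; 200 lets Cloud Run send traffic."""
    return (200, b"ok") if ready else (503, b"loading")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/health":
            status, body = health_response(model_ready.is_set())
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt: str, *args) -> None:
        pass  # silence per-request access logs (probes poll frequently)


def main() -> None:
    # Load in the background: the port opens at once, but /health says 503 until ready.
    threading.Thread(target=load_model, daemon=True).start()
    port = int(os.environ.get("PORT", "8080"))  # Cloud Run injects PORT
    ThreadingHTTPServer(("", port), Handler).serve_forever()
```

Calling main() as the container entrypoint serves on the PORT that Cloud Run provides (8080 by default); once the probe receives a 200, Cloud Run can start sending requests to the instance. In a FastAPI app, the same pattern would be a /health route that checks the same kind of readiness flag.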
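A well-tuned startup probe is what makes traffic migration seamless. Size its window so that initialDelaySeconds + failureThreshold × periodSeconds comfortably exceeds the slowest startup you have observed: with the values above, Cloud Run waits roughly 125 seconds (5 + 24 × 5) before treating a new instance as failed.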
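Before deploying, it can help to check the probe settings against measured startup times. The sketch below is a hypothetical helper (StartupProbe and check_probe are not part of any Google Cloud library) that approximates the probe window as initialDelaySeconds + failureThreshold × periodSeconds, ignoring per-check timeouts, and flags settings that leave too little headroom. The 1.5× headroom factor is an arbitrary assumption, and the rule that timeoutSeconds should not exceed periodSeconds is worth confirming against the current Cloud Run documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartupProbe:
    """Hypothetical mirror of the startupProbe fields in the service YAML."""
    initial_delay_seconds: int
    period_seconds: int
    timeout_seconds: int
    failure_threshold: int

    def max_startup_seconds(self) -> int:
        # Approximate window before the probe gives up (ignores per-check timeouts).
        return self.initial_delay_seconds + self.failure_threshold * self.period_seconds


def check_probe(probe: StartupProbe, measured_startup_seconds: float,
                headroom: float = 1.5) -> list[str]:
    """Return a list of problems; an empty list means the settings look adequate."""
    problems = []
    if probe.timeout_seconds > probe.period_seconds:
        problems.append("timeoutSeconds should not exceed periodSeconds")
    needed = measured_startup_seconds * headroom  # headroom factor is an assumption
    window = probe.max_startup_seconds()
    if window < needed:
        problems.append(
            f"probe window is ~{window}s but startup needs ~{needed:.0f}s with headroom"
        )
    return problems


# Values from the YAML configuration in this post
probe = StartupProbe(initial_delay_seconds=5, period_seconds=5,
                     timeout_seconds=5, failure_threshold=24)
print(probe.max_startup_seconds())  # 125
print(check_probe(probe, 60))       # []
print(check_probe(probe, 100))      # one problem: ~150 s needed > 125 s window
```

Feeding it a startup time measured from Cloud Run's logs shows whether failureThreshold or periodSeconds needs to grow for a slower model.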