Cloud MLOps

Minimizing Downtime During Cloud Run Revisions

Ensure zero-downtime deployments for your AI services on Cloud Run by configuring startup probes and CPU boosts.

6 min read
The Cold Start Problem

When deploying new revisions of heavy AI applications, cold starts can lead to 503 errors if the load balancer routes traffic before the application is fully ready. To mitigate this, Cloud Run provides essential configuration options.

Startup Probes & CPU Boost

By configuring a custom HTTP startup probe, Cloud Run will hold traffic until the container successfully responds to the health check. Additionally, enabling Startup CPU Boost allocates more CPU during initialization, drastically reducing the time it takes for heavy frameworks (like FastAPI or Gunicorn) to spin up.

yaml
annotations:
  run.googleapis.com/startup-cpu-boost: "true"
# ...
startupProbe:
  timeoutSeconds: 5
  periodSeconds: 5
  failureThreshold: 24
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
Properly tuned startup probes are the secret to seamless traffic migration in serverless environments.

More Recent Posts