Zero-Downtime Deployments on Cloud Run: Startup Probes and CPU Boost
How proper startup probe configuration and CPU boost annotations eliminate 503 errors during Cloud Run revision rollouts for heavy AI services.
AI microservices built with FastAPI + Gunicorn typically need 15-30 seconds to initialize: loading ML model artifacts, establishing database connection pools, warming Elasticsearch clients, and initializing OpenTelemetry instrumentation. Without proper configuration, Cloud Run routes traffic to the new revision before it's ready, causing a cascade of 503 errors during every deployment, frustrating users and triggering false-positive alerts.
The Fix: Startup Probes + CPU BoostTwo Cloud Run annotations solve this entirely. run.googleapis.com/startup-cpu-boost: "true" temporarily allocates additional CPU during container initialization, cutting startup time in half. The startupProbe defines an HTTP health check that must succeed before traffic is routed — Cloud Run will attempt the probe every 5 seconds for up to 24 failures (2 minutes) before giving up, ensuring only fully-initialized containers receive traffic.
# Production service.yaml — startup configuration
spec:
template:
metadata:
annotations:
run.googleapis.com/startup-cpu-boost: "true" # Extra CPU during init
run.googleapis.com/cpu-throttling: "true" # Throttle between requests
spec:
containerConcurrency: 60
timeoutSeconds: 300
containers:
- name: my-ai-service
resources:
limits:
cpu: "1"
memory: 1Gi
startupProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 5
failureThreshold: 24 # Total window: 5 + (24 × 5) = 125s
For latency-sensitive services like the Knowledge Base API, we also set run.googleapis.com/minScale: "1" to keep at least one instance always warm. Combined with maxScale: "5", this ensures the service handles baseline traffic without cold starts while still scaling for bursts. The trade-off is cost — a minimum instance is always billed — but for services that back real-time AI agents, the reliability is worth every penny.
Properly tuned startup probes are the difference between a seamless deployment and a page from your on-call rotation.Lessons Learned: The Startup Timeout Trap
A subtle gotcha when configuring startup probes is neglecting the database migration step. If your build pipeline runs database migrations as part of the container startup command (e.g. alembic upgrade head && gunicorn...), a slow migration or database lock will cause the startup probe to time out repeatedly, forcing Cloud Run to mark the container as unhealthy and restart it. This creates a crash loop that prevents the new revision from ever deploying. The lesson: always run database migrations as a separate step (like a Cloud Run Job or pre-deploy hook) rather than inside the application container startup process.