
Targeted Log Routing with GCP Log Sinks and Terraform

How to use Terraform-managed GCP Log Sinks to surgically route specific AI agent events — chat completions, LLM invocations, errors — to BigQuery for analytics and audit, without the cost of shipping everything.

The Problem: AI Agents Generate Too Many Logs

A single AI agent chat session generates hundreds of log entries: HTTP requests, LLM API calls, embedding generations, retrieval queries, token counts, and response formatting. Sending everything to BigQuery would be prohibitively expensive, and most of it would be noise. What we actually need is surgical extraction of specific events: successful chat completions (for analytics), LLM invocation metrics (for cost tracking), and error events (for debugging). Routing only these events keeps storage costs low and queries fast.

Terraform-Driven Log Sink Configuration

Our Terraform module uses for_each over a map of log sinks, each with its own filter expression. The filters use the Logging query language's SEARCH() function to match specific log messages, combined with severity comparisons and resource label selectors that target an individual Cloud Run service. Each sink sets unique_writer_identity = true, so it gets its own service account, and that identity is automatically granted roles/bigquery.dataEditor so it can write to the destination dataset.

hcl
# envs/prod.tfvars — Surgical log extraction
log_sinks = {
  "AGENT_CHAT_RAW_LOG" = {
    description            = "Chat completion events for analytics"
    destination            = "bigquery.googleapis.com/projects/my-project/datasets/agent_raw_log"
    filter                 = <<EOT
(
  SEARCH("agent chat completed") OR
  (SEARCH("unexpected agent error") AND severity = "ERROR")
)
AND trace:*
AND resource.labels.configuration_name="my-ai-agent-service"
EOT
    unique_writer_identity = true
    use_partitioned_tables = true
  }
  "AGENT_LLM_RAW_LOG" = {
    description            = "LLM invocation logs for cost tracking"
    destination            = "bigquery.googleapis.com/projects/my-project/datasets/agent_raw_log"
    filter                 = <<EOT
SEARCH("llm invocation completed")
AND resource.labels.configuration_name="my-ai-agent-service"
EOT
    unique_writer_identity = true
    use_partitioned_tables = true
  }
}
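The module side that consumes this map is not shown in the post. A minimal sketch of what it might look like follows, assuming the variable shape implied by the tfvars above and the resource name log_sinks that the IAM snippet later refers to. The variable type and the dynamic block are assumptions, not the author's actual module.

```hcl
# Hypothetical variable shape, inferred from the tfvars entries above.
variable "log_sinks" {
  type = map(object({
    description            = string
    destination            = string
    filter                 = string
    unique_writer_identity = bool
    use_partitioned_tables = bool
  }))
  default = {}
}

# One sink per map entry; the map key doubles as the sink name.
resource "google_logging_project_sink" "log_sinks" {
  for_each = var.log_sinks

  name                   = each.key
  project                = var.project_id
  description            = each.value.description
  destination            = each.value.destination
  filter                 = each.value.filter
  unique_writer_identity = each.value.unique_writer_identity

  # bigquery_options only applies to BigQuery destinations, so the block
  # is emitted only when the destination matches (assumed convention).
  dynamic "bigquery_options" {
    for_each = length(regexall("^bigquery\\.googleapis\\.com", each.value.destination)) > 0 ? [1] : []
    content {
      use_partitioned_tables = each.value.use_partitioned_tables
    }
  }
}
```

Keeping the sink name tied to the map key means a new tfvars entry produces exactly one new sink, and renaming a key replaces the sink rather than silently editing it.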
Auto-Provisioned IAM for Sink Writers

The Terraform module detects which sinks target BigQuery (by matching a regex against the destination string) and creates the IAM binding each sink's writer identity needs. Adding a new entry to the tfvars file is therefore all it takes: no manual IAM configuration, no forgotten permissions, no broken pipelines. The permissions live in the same reviewed, version-controlled code as the sinks themselves.

hcl
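# Grant BigQuery write access to the writer identity of every BigQuery-bound sink.
resource "google_project_iam_member" "sink_bigquery_data_editor" {
  # Keep only sinks whose destination starts with bigquery.googleapis.com.
  for_each = {
    for k, v in var.log_sinks : k => v
    if length(regexall("^bigquery\\.googleapis\\.com", v.destination)) > 0
  }
  # Note: this binding is project-wide; member is the sink's own service account.
  project = var.project_id
  role    = "roles/bigquery.dataEditor"
  member  = google_logging_project_sink.log_sinks[each.key].writer_identity
}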
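The binding above grants the role across the whole project. If a narrower grant is preferred, one possible variant scopes it to the destination dataset by parsing the project and dataset IDs out of the destination string. This is a sketch under assumptions: the local name bigquery_sinks and the resource name sink_dataset_editor are hypothetical, and it expects destinations in the projects/<project>/datasets/<dataset> form used in the tfvars example.

```hcl
locals {
  # For each BigQuery-bound sink, capture the project and dataset IDs from
  # "bigquery.googleapis.com/projects/<project>/datasets/<dataset>".
  # regex() with named capture groups returns a map keyed by group name.
  bigquery_sinks = {
    for k, v in var.log_sinks : k => regex(
      "^bigquery\\.googleapis\\.com/projects/(?P<project>[^/]+)/datasets/(?P<dataset>[^/]+)$",
      v.destination
    )
    if length(regexall("^bigquery\\.googleapis\\.com", v.destination)) > 0
  }
}

# Dataset-scoped alternative to the project-level binding.
resource "google_bigquery_dataset_iam_member" "sink_dataset_editor" {
  for_each = local.bigquery_sinks

  project    = each.value.project
  dataset_id = each.value.dataset
  role       = "roles/bigquery.dataEditor"
  member     = google_logging_project_sink.log_sinks[each.key].writer_identity
}
```

The trade-off is that a destination that does not match the expected pattern fails at plan time, since regex() raises an error on no match, instead of silently receiving no grant.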
Good observability isn't about collecting every log — it's about knowing exactly which events matter and getting them to the right place.
Lessons Learned: Log Format Serialization Gotchas

One major gotcha when using log sinks with BigQuery is schema mismatch. If your Python application logs structured JSON (e.g. logging.info(json.dumps(my_dict))), the JSON fields are mapped to individual BigQuery columns, and the column types are fixed by the first entries that create them. If a developer later changes a field from an integer to a string in application code, BigQuery rejects the new entries because their type conflicts with the existing table schema. Those entries go missing from the destination table, and nothing fails in the application, so the gap is easy to overlook. The lesson: always log critical audit events under a standardized, flat schema, or log the dynamic body as a single serialized string column so that schema drift cannot break ingestion.
