Introduction: Production-Ready Logging and Debugging

In the previous lesson, you successfully deployed your containerized application as a Cloud Run service. Your service is now running, handling requests, and automatically recovering from failures. However, having a running service is just the beginning of your production journey. When issues arise — and they will — you need robust logging and debugging capabilities to quickly identify and resolve problems.

Cloud Logging integration with Cloud Run provides a powerful foundation for monitoring your containerized applications. Cloud Run automatically captures all output your application writes to stdout and stderr, routing it to Cloud Logging without requiring any explicit configuration. This creates a centralized location for all your application logs, making it easy to track what is happening across all your service instances.
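As a small illustration of the stdout capture described above, the sketch below prints one JSON object per line. Cloud Run parses JSON lines written to stdout or stderr into structured entries and treats the `severity` and `message` fields specially. The helper name `log_json` is hypothetical, and the sketch assumes messages contain no double quotes or backslashes, since it does no escaping.

```shell
# Hypothetical helper: print one JSON log line to stdout.
# Assumes the message has no double quotes or backslashes (no escaping here).
log_json() {
  log_severity="$1"
  log_message="$2"
  printf '{"severity":"%s","message":"%s"}\n' "$log_severity" "$log_message"
}

log_json INFO "service started"
log_json ERROR "database connection failed"
```

Plain text lines are captured too; the JSON form simply makes severity filtering possible later.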

In this lesson, you will master the essential skills for production logging and debugging. You will learn how to manage log retention to control costs, access real-time logs as they stream from your containers, write sophisticated queries to find specific events or errors, and systematically debug failed deployments. By the end of this lesson, you will have a complete toolkit for maintaining and troubleshooting your Cloud Run workloads in production environments.

Cloud Logging Configuration and Retention

Cloud Run automatically sends all container logs to Cloud Logging without requiring any configuration. When your service runs, each request and instance generates logs that include the service name, revision, and instance identifier. These logs are stored in Cloud Logging's default log bucket, which retains logs for 30 days by default.

However, logs can accumulate quickly and become expensive to store long term. Setting appropriate retention policies helps you balance debugging needs with cost control. Cloud Logging uses log buckets to organize and manage log retention. You can configure buckets to automatically delete older logs after a specified period, ranging from 1 day to 3,650 days (10 years).

To view your current log buckets and their retention settings, use the gcloud logging buckets list command:
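A minimal sketch of that command, wrapped in a hypothetical helper function (`list_log_buckets`) so that nothing runs until you call it. It assumes the gcloud CLI is installed and authenticated against your project.

```shell
# Hypothetical helper: list every log bucket in the current project.
list_log_buckets() {
  gcloud logging buckets list
}
```

Call `list_log_buckets` from your terminal to run it.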

This command shows all log buckets in your project:

The _Default bucket stores most Cloud Run logs and has a 30-day retention period. To adjust the retention period for development and testing environments, you can update the bucket configuration. This example sets logs to expire after 7 days:
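One way to express the 7-day retention change is sketched here. The function name `set_default_retention` is hypothetical, and `--location=global` is an assumption that matches the usual location of the `_Default` bucket; adjust it if your bucket lives elsewhere.

```shell
# Hypothetical helper: set the retention period (in days) on the _Default bucket.
# Assumes the bucket is in the "global" location.
set_default_retention() {
  retention_days="$1"
  gcloud logging buckets update _Default \
    --location=global \
    --retention-days="$retention_days"
}
```

For the example in the text, you would call `set_default_retention 7`.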

You can verify the retention policy was applied by describing the bucket:
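The verification step might look like the following sketch, again assuming the `_Default` bucket is in the `global` location. The helper name `describe_default_bucket` is hypothetical.

```shell
# Hypothetical helper: show the configuration of the _Default bucket.
# Look for the retentionDays field in the output.
describe_default_bucket() {
  gcloud logging buckets describe _Default --location=global
}
```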

Real-Time Log Access and Tailing

When debugging active issues or monitoring application behavior, you often need to see logs as they happen in real time. The gcloud CLI provides powerful commands that stream live logs directly to your terminal, similar to the Unix tail -f command.

To follow live logs from your Cloud Run service, use the gcloud run services logs tail command. This command will show recent logs and continue streaming new entries as they arrive:
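A hedged sketch of the tail command follows. The helper name `tail_service_logs` and the region `us-central1` in the usage note are placeholders. In some gcloud releases this subcommand is only available as `gcloud beta run services logs tail` and may prompt you to install a log-streaming component, so treat the exact command group as an assumption to check against your installed version.

```shell
# Hypothetical helper: stream live logs for one Cloud Run service.
# $1 = service name, $2 = region
tail_service_logs() {
  tail_service="$1"
  tail_region="$2"
  gcloud run services logs tail "$tail_service" --region="$tail_region"
}
```

For example, `tail_service_logs my-web-service us-central1` streams until you press Ctrl+C.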

When you run this command, you'll see output similar to this:

Each log entry includes a timestamp and the actual log message from your application. Cloud Run automatically adds metadata to each log entry, including the service name, revision, and instance ID, which you can view using the more detailed gcloud logging tail command.

For more control over log filtering and formatting, use the gcloud logging tail command with Cloud Run-specific filters:

This command filters logs to show only entries from your specific Cloud Run service and displays them in a table format with timestamps and log messages.

You can also filter logs by time range and specific patterns. For example, to see only error messages from the past 30 minutes:
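Because a past-30-minutes window is a query over stored entries, this sketch uses `gcloud logging read` with `--freshness` rather than a live tail; that choice is an assumption about the intended command. It also assumes your application emits entries with a severity of ERROR or higher, and the helper name `recent_errors` is hypothetical.

```shell
# Hypothetical helper: read ERROR-or-worse entries from the last 30 minutes.
# $1 = service name
recent_errors() {
  err_service="$1"
  err_filter="resource.type=\"cloud_run_revision\" AND resource.labels.service_name=\"${err_service}\" AND severity>=ERROR"
  gcloud logging read "$err_filter" --freshness=30m
}
```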

Structured Log Queries with Cloud Logging

While real-time log tailing is excellent for immediate debugging, you often need to analyze historical logs or perform complex searches across large volumes of log data. Cloud Logging provides a powerful query language that lets you search, filter, and analyze your logs using structured filters.

The gcloud logging read command allows you to query historical logs with sophisticated filters. Here's a fundamental query that retrieves recent log entries from your Cloud Run service:

This query filters logs to show only entries from your Cloud Run service within the past hour, limits the results to 50 entries, and displays them in a readable table format with timestamps, revision names, and log messages.
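The query described above could be written roughly as follows. The helper name `read_service_logs` is hypothetical, and the `textPayload` column assumes your application writes plain text lines rather than JSON.

```shell
# Hypothetical helper: read the last hour of logs for one service as a table.
# $1 = service name
read_service_logs() {
  read_service="$1"
  read_filter="resource.type=\"cloud_run_revision\" AND resource.labels.service_name=\"${read_service}\""
  gcloud logging read "$read_filter" \
    --freshness=1h \
    --limit=50 \
    --format="table(timestamp, resource.labels.revision_name, textPayload)"
}
```

Calling `read_service_logs my-web-service` returns at most 50 entries.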

You can enhance queries with more specific filters to find particular events. For example, to find all HTTP requests that took 100 milliseconds or longer:

This query uses a regular expression (indicated by =~) to match log entries containing response times of 100 ms or more. The filter capability makes it easy to isolate specific types of events from your application logs.
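To make the regular-expression idea concrete, here is a sketch that assumes the application logs durations as an integer followed by `ms`, for example `GET /api/users 200 142ms`. That log format, the variable `SLOW_PATTERN`, and the helper `find_slow_requests` are all hypothetical. Three or more digits before `ms` corresponds to 100 ms or more, provided durations are not zero-padded.

```shell
# Matches a run of three or more digits followed by "ms" (so 100 ms and up,
# assuming durations are logged without leading zeros).
SLOW_PATTERN='[0-9]{3,}ms'

# Hypothetical helper: find slow requests for one service in the last hour.
# $1 = service name
find_slow_requests() {
  slow_service="$1"
  slow_filter="resource.type=\"cloud_run_revision\" AND resource.labels.service_name=\"${slow_service}\" AND textPayload=~\"${SLOW_PATTERN}\""
  gcloud logging read "$slow_filter" --freshness=1h --limit=50
}
```

You can try the pattern locally with `grep -E "$SLOW_PATTERN"` against sample log lines before using it in a query.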

For error analysis, you might search for specific error patterns:
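An error-pattern search might be sketched like this, using the `:` operator for a substring match on the text payload. The helper name `find_error_pattern` and the one-day window are illustrative choices, not requirements.

```shell
# Hypothetical helper: find entries whose text contains a given phrase.
# $1 = service name, $2 = phrase to search for
find_error_pattern() {
  pat_service="$1"
  pat_text="$2"
  pat_filter="resource.type=\"cloud_run_revision\" AND resource.labels.service_name=\"${pat_service}\" AND textPayload:\"${pat_text}\""
  gcloud logging read "$pat_filter" --freshness=1d --limit=20
}
```

For instance, `find_error_pattern my-web-service "connection refused"` looks for that phrase in the last day of logs.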

Cloud Logging Query Cheat Sheet

Here are common filter patterns you can use with gcloud logging read or gcloud logging tail for quick troubleshooting:

Basic Service Filters:

Severity-Based Filters:

Text Pattern Matching:

Time-Range Filters:

Structured JSON Logs:

Combined Filters:
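The categories above can be sketched as filter strings stored in shell variables. The service and revision names come from this lesson's examples; the timestamp, the `user_id` field, and the `HTTP 5xx` log format are placeholders you would replace with your own values.

```shell
# Basic service filters
BY_SERVICE='resource.type="cloud_run_revision" AND resource.labels.service_name="my-web-service"'
BY_REVISION='resource.labels.revision_name="my-web-service-00003-xyz"'

# Severity-based filters
ERRORS_AND_ABOVE='severity>=ERROR'
WARNINGS_ONLY='severity=WARNING'

# Text pattern matching (":" is a substring match, "=~" is a regular expression)
CONTAINS_TIMEOUT='textPayload:"timeout"'
MATCHES_5XX='textPayload=~"HTTP 5[0-9]{2}"'   # hypothetical log format

# Time-range filters (RFC 3339 timestamp; the date is a placeholder)
SINCE='timestamp>="2024-01-01T00:00:00Z"'

# Structured JSON logs (user_id is a hypothetical field)
BY_JSON_FIELD='jsonPayload.user_id="12345"'

# Combined filters
COMBINED="${BY_SERVICE} AND ${ERRORS_AND_ABOVE} AND ${SINCE}"
```

Any of these can be passed as the first argument to `gcloud logging read`, for example `gcloud logging read "$COMBINED" --limit=20`.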

Use these patterns as starting points and combine them to create precise queries that match your specific troubleshooting needs.

Debugging Failed Revisions and Instances

When Cloud Run services fail to deploy or instances crash unexpectedly, understanding how to extract and interpret diagnostic information is crucial for effective debugging. Cloud Run uses a revision-based deployment model, where each deployment creates a new revision. Failed revisions and crashed instances leave diagnostic information that helps you determine what went wrong.

Start by checking the status of your service and its revisions:

This command shows the service's current state and identifies which revision is serving traffic versus which was most recently deployed:

When a revision fails to deploy, the latestCreatedRevisionName will differ from latestReadyRevisionName, indicating that the newest revision never became ready to serve traffic.
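The comparison of the two revision names can be sketched with a format projection on `gcloud run services describe`. The helper name `revision_status` is hypothetical.

```shell
# Hypothetical helper: print the latest ready and latest created revision names.
# $1 = service name, $2 = region
revision_status() {
  gcloud run services describe "$1" --region="$2" \
    --format="value(status.latestReadyRevisionName, status.latestCreatedRevisionName)"
}
```

If the two names printed differ, the newest revision has not become ready.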

To investigate a failed revision, list all revisions and their status:
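Listing revisions for a single service could look like this sketch; the helper name `list_revisions` and the region you pass in are assumptions.

```shell
# Hypothetical helper: list all revisions of one service.
# $1 = service name, $2 = region
list_revisions() {
  gcloud run revisions list --service="$1" --region="$2"
}
```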

This command shows all revisions with their readiness status and any failure reasons:

The output reveals that revision my-web-service-00003-xyz failed because the container did not start properly and listen on the expected port. This is a common Cloud Run failure mode.

To get more detailed information about a specific failed revision:
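For a single revision, a sketch along these lines prints its full description, including status conditions. The helper name `describe_revision` is hypothetical.

```shell
# Hypothetical helper: show details for one revision.
# $1 = revision name, $2 = region
describe_revision() {
  gcloud run revisions describe "$1" --region="$2"
}
```

For the failed revision in this lesson's example, you would pass `my-web-service-00003-xyz` as the first argument.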

Service Health and Deployment Issues

Cloud Run services generate events and maintain status information that provide insight into deployment progress, scaling activities, and operational issues. These details are invaluable for understanding service behavior and diagnosing problems that affect the service as a whole.

To view your service's current health and configuration, use the gcloud run services describe command:

This command returns comprehensive information about your service. Focus on the status section to understand the current state:

The conditions array shows three key health indicators: Ready indicates the service is operational, ConfigurationsReady shows the latest revision deployed successfully, and RoutesReady confirms traffic routing is configured correctly. When any of these conditions shows status: 'False', it indicates a problem.
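To look only at those conditions rather than the whole description, one option is a YAML projection, sketched here with a hypothetical helper named `service_conditions`.

```shell
# Hypothetical helper: print only the status conditions of a service.
# $1 = service name, $2 = region
service_conditions() {
  gcloud run services describe "$1" --region="$2" \
    --format="yaml(status.conditions)"
}
```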

To check how your service is configured to scale:

This command extracts the concurrency setting and min/max instance configuration:
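A possible form of that extraction is sketched below. It prints the whole annotations map rather than individual keys; the instance limits, when set, appear there as `autoscaling.knative.dev/minScale` and `autoscaling.knative.dev/maxScale`. The helper name `scaling_config` is hypothetical.

```shell
# Hypothetical helper: print concurrency and the revision template annotations,
# which hold the min/max instance settings when they are configured.
# $1 = service name, $2 = region
scaling_config() {
  gcloud run services describe "$1" --region="$2" \
    --format="yaml(spec.template.spec.containerConcurrency, spec.template.metadata.annotations)"
}
```

This shows configured limits, not the number of instances running at the moment.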

Summary: Building Your Debugging Toolkit

You now have a comprehensive toolkit for logging and debugging Cloud Run applications in production. You learned how to manage Cloud Logging retention using log buckets to balance debugging capabilities with cost control, access real-time logs for immediate troubleshooting, and write sophisticated queries to analyze historical log data using Cloud Logging's query language.

Your debugging skills now include systematically investigating failed revisions by examining status conditions and failure reasons, correlating container failures with their specific logs, and understanding the difference between cold start failures and runtime failures. You also learned how to interpret service-level health indicators and use Cloud Run's revision-based deployment model to roll back to working versions or force new deployments when needed.

These logging and debugging techniques form the foundation of effective Cloud Run operations. Proactive log monitoring helps you identify issues before they impact users, while systematic debugging approaches help you quickly resolve problems when they occur. As you continue working with Cloud Run, these skills will become second nature, enabling you to maintain reliable containerized applications at scale.

In the next lesson, you will explore advanced Cloud Run features, including traffic splitting for gradual rollouts, custom domains for production URLs, and integration with Cloud Load Balancing for more sophisticated routing scenarios.
