Distributed Tracing Fundamentals

Introduction: What Is Distributed Tracing and Why Use It?

Welcome to the lesson on distributed tracing with OpenTelemetry. So far in this course, you have learned how to keep your GCP credentials secure, protect your APIs, and monitor your applications with Cloud Monitoring. Now, we will focus on understanding how requests move through your application, especially when it uses multiple GCP services.

Distributed tracing helps you see the path a request takes as it travels through different parts of your system. This is important because modern cloud applications often use many services, and it can be hard to know where problems or slowdowns happen. OpenTelemetry is an open-source standard that helps you trace these requests, find bottlenecks, and understand how your services work together.

Here's how the tracing flow works:

In this lesson, we'll use ConsoleSpanExporter to print traces directly to your terminal. This makes it easy to see what's happening as you learn. By the end of this lesson, you will know how to add distributed tracing to a Python application that uses GCP services, and you'll understand how to export traces to different destinations for development and production use.

Recall: Using GCP Client Libraries in Python

Before we dive into tracing, let's quickly remind ourselves how we use GCP client libraries in Python. In previous lessons, you used libraries like google-cloud-firestore to interact with GCP services like Cloud Firestore and Cloud Monitoring.

For example, to connect to a Firestore database, you might write:
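A minimal sketch of such a connection, assuming the google-cloud-firestore package is installed and that a project and Application Default Credentials are available in the environment. The helper name get_orders_collection is hypothetical.

```python
def get_orders_collection():
    """Create a Firestore client and return a reference to the orders collection."""
    # Imported inside the function so this module still loads on machines
    # without the Firestore client library; a top-level import works as well.
    from google.cloud import firestore

    # Client() reads the project and credentials from the environment.
    db = firestore.Client()
    return db.collection("orders")
```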

This code creates a Firestore client and gets a reference to the orders collection. You can then use this collection object to read or write data.

In this lesson, we will build on this by adding tracing to these operations.

Setting Up OpenTelemetry Tracing in Your Python Application

To use distributed tracing in your Python application, you need to install and configure the OpenTelemetry SDK. On CodeSignal, the SDK is already installed for you, but it's good to know how to do this on your own machine.

You would normally install the required packages with:
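As a hedged sketch, the check below reports which of the modules used in this lesson are missing and builds a matching pip command. The distribution names in REQUIRED are assumptions based on the PyPI packages that provide these modules, and the helper names are hypothetical.

```python
import importlib.util

# Module used in this lesson -> PyPI distribution assumed to provide it.
REQUIRED = {
    "opentelemetry.trace": "opentelemetry-api",
    "opentelemetry.sdk.trace": "opentelemetry-sdk",
    "opentelemetry.instrumentation.grpc": "opentelemetry-instrumentation-grpc",
    "google.cloud.firestore": "google-cloud-firestore",
}


def missing_packages(required=REQUIRED):
    """Return the distributions whose modules cannot be found."""
    missing = []
    for module, distribution in required.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            found = False  # a parent package is absent or has no spec
        if not found:
            missing.append(distribution)
    return missing


def install_command(packages):
    """Build the pip command for the given distributions ('' if none)."""
    return "pip install " + " ".join(packages) if packages else ""


if __name__ == "__main__":
    print(install_command(missing_packages()) or "All packages are available.")
```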

Next, you need to import and configure the OpenTelemetry tracer in your Python code. For learning purposes, we'll use ConsoleSpanExporter, which prints trace data directly to your terminal as JSON. This makes it easy to see what's being traced without needing to set up external tools.

Here's how you start:

TracerProvider is the main object that manages tracing.
Resource.create({"service.name": "orders-worker"}) adds metadata about your service to all traces.
ConsoleSpanExporter prints trace data to stdout in JSON format, making it perfect for learning and debugging.
SimpleSpanProcessor processes each span immediately and sends it to the exporter (good for development).
tracer is the object you use to record traces for your service.

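To make tracing work automatically with GCP services like Firestore, you can use OpenTelemetry's auto-instrumentation for gRPC, the protocol the Firestore client library uses to talk to the service. Once gRPC is instrumented, each Firestore call produces its own span without extra code.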

Understanding Span Processors

Before we explore different exporters, let's understand span processors. A span processor is responsible for collecting completed spans and sending them to an exporter. OpenTelemetry provides two main types of span processors, and choosing the right one is important for your application's performance.

SimpleSpanProcessor (For Learning and Debugging)

The SimpleSpanProcessor processes and exports each span immediately as soon as it completes:

This processor is synchronous, meaning your application waits for each span to be exported before continuing. This is useful when:

Learning: You see trace data instantly in your console, making it easy to understand what's being traced.
Debugging: Each span is exported as soon as it ends, so spans that have already finished are not lost if your application crashes.
Console output: When printing to stdout, immediate output is helpful for following along.

However, SimpleSpanProcessor should not be used in production because it can slow down your application. Every traced operation must wait for the export to complete before continuing.

BatchSpanProcessor (For Production)

The BatchSpanProcessor collects spans in memory and exports them in batches:

This processor is asynchronous and batches multiple spans together before exporting them. This approach:

Reduces overhead: Your application doesn't wait for each span to be exported.
Improves performance: Batching reduces the number of network calls to the trace backend.
Handles failures better: Exporting happens on a background thread, so a slow or temporarily failing export does not block your application code.

The BatchSpanProcessor is recommended for production because it minimizes the performance impact of tracing on your application.

When to Use Each Processor

Here's a quick guide:

Processor Type	Use Case	Benefit
SimpleSpanProcessor	Learning, debugging, console output	Immediate trace visibility, simple setup
BatchSpanProcessor	Production, OTLP, Cloud Trace	Better performance, lower overhead

In this lesson, we use SimpleSpanProcessor with ConsoleSpanExporter to help you learn. When you move to production with OTLP or Cloud Trace exporters, you should switch to BatchSpanProcessor.

Understanding Span Exporters

OpenTelemetry uses span exporters to send trace data to different destinations. Each exporter serves a different purpose:

ConsoleSpanExporter (For Learning)

The ConsoleSpanExporter is the simplest exporter. It prints trace data to your terminal:

This is perfect for:

Learning how tracing works
Debugging your instrumentation
Quick local testing

When you run your code, you'll see JSON output in your terminal showing the trace details.

OTLPSpanExporter (For Local Development)

The OTLPSpanExporter sends traces over the OpenTelemetry Protocol (OTLP) to an OpenTelemetry Collector running on your machine.

The OpenTelemetry Collector is a service that:

Receives traces from your application
Can batch, filter, and transform traces
Can send traces to multiple backends (like Cloud Trace, Jaeger, Zipkin)

This is useful for:

Local development with visualization tools
Testing before deploying to production
Running multiple services that all send traces to one place

CloudTraceSpanExporter (For Production)

The CloudTraceSpanExporter sends traces directly to Google Cloud Trace.

This is ideal for production because:

Cloud Trace stores and visualizes your traces
It integrates with other GCP monitoring tools
No need to run an OpenTelemetry Collector

Adding Tracing to Your Code

Now, let's see how to add tracing to your code step by step. We will use a simple example in which we write a document to a Firestore collection and trace this operation.

Step 1: Import and Configure OpenTelemetry

First, import the necessary modules and configure OpenTelemetry as described in the setup section: a TracerProvider with the orders-worker resource, a SimpleSpanProcessor, and a ConsoleSpanExporter. This lets you see trace data in your terminal.

Step 2: Instrument gRPC and Connect to Firestore

Before creating the Firestore client, instrument gRPC so that all Firestore operations are automatically traced:
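A hedged sketch of this step. It assumes the opentelemetry-instrumentation-grpc and google-cloud-firestore packages are installed and that credentials are available in the environment. GrpcInstrumentorClient is the client-side instrumentor from that package; whether the lesson uses exactly this class is an assumption, and the helper name connect_firestore is hypothetical.

```python
def connect_firestore():
    """Turn on gRPC client tracing, then create the Firestore client."""
    # Imported inside the function so this module still loads on machines
    # without these optional packages; top-level imports work as well.
    from google.cloud import firestore
    from opentelemetry.instrumentation.grpc import GrpcInstrumentorClient

    # Instrument first, so the gRPC channel the client opens is traced.
    GrpcInstrumentorClient().instrument()

    # Client() reads the project and credentials from the environment.
    return firestore.Client()
```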

Important: Instrument gRPC before creating the Firestore client. This ensures that all gRPC calls are traced automatically.

Step 3: Add a Traced Operation

Now, let's write a function that adds a document to the orders collection. We want to trace this specific operation using a span. A span is a part of a trace that shows what happens during a specific step.

with tracer.start_as_current_span("firestore-write"): creates a span named firestore-write. This helps you see exactly how long the Firestore write takes and whether there are any issues.
db.collection("orders").document("o-200").set(...) writes a new order to the collection.

Step 4: Run the Function and Cleanup

Finally, call the function, shut down the tracer provider to flush any remaining spans, and print a message when done:

provider.shutdown() ensures all spans are exported before the program exits. This is important to avoid losing trace data.

When you run this code, you'll see JSON output in your terminal showing the trace data, followed by the completion message.

The trace data shows you details about the firestore-write span and any automatically traced gRPC calls to Firestore.

Moving from Console to Production Exporters

While ConsoleSpanExporter is great for learning, you'll want to use different exporters as you move to development and production:

For Local Development with Visualization

Install the OTLP exporter package with pip before changing the configuration.

Then modify your configuration to use the OTLP exporter:

You'll need to run an OpenTelemetry Collector on your machine (on port 4317) to receive and process these traces. The collector can then forward them to visualization tools like Jaeger or Zipkin.

For Production with Google Cloud Trace

Install the Cloud Trace exporter package with pip before changing the configuration.

Then modify your configuration to use the Cloud Trace exporter:

The rest of your code remains the same. The key differences are:

Learning/Debugging: Uses ConsoleSpanExporter with SimpleSpanProcessor to print traces immediately
Local development: Uses OTLPSpanExporter with BatchSpanProcessor to send traces to a local collector
Production: Uses CloudTraceSpanExporter with BatchSpanProcessor to send traces directly to Google Cloud Trace

This approach gives you flexibility: start learning with console output, develop locally with the OpenTelemetry Collector and visualization tools, then deploy to production with Cloud Trace without changing your instrumentation code.

Review and What's Next

In this lesson, you learned how to:

Set up distributed tracing in a Python application using OpenTelemetry.
Use ConsoleSpanExporter to see trace data in your terminal for learning and debugging.
Instrument gRPC calls to automatically trace GCP service calls.
Use spans to trace specific operations, such as writing to Firestore.
Understand the different exporters: Console (learning), OTLP (local development), and Cloud Trace (production).

You saw how to build up the code step by step, and how each part helps you trace what your application is doing. In the practice exercises, you will get hands-on experience adding tracing to your own code and exploring the trace data printed to your console.

Congratulations on reaching the end of this course! You now have a strong foundation in developer security and observability on GCP. Keep practicing and applying these skills to build secure, reliable, and observable cloud applications.
