Welcome to the final unit of our Agent SDK course! Over the past four units, we have built a comprehensive understanding of the Agent SDK ecosystem. We started with the foundations of programmatic agent control, explored the Python SDK's built-in tools and configuration options, extended our agents with custom tools and hooks, and ventured into TypeScript for production patterns. We learned about retry logic, error handling, and containerization.
In this fifth unit, we bring everything together by focusing on production deployment and integration. While previous lessons taught us how to build reliable agents, this unit focuses on deploying them into real-world environments where they must operate continuously, integrate with existing workflows, and provide visibility into their behavior. We will explore comprehensive logging strategies, implement monitoring hooks that track agent activities, integrate with CI/CD pipelines using GitHub Actions, and set up metrics collection with Prometheus.
By the end of this lesson, we will have built a complete production system: a documentation review bot that runs automatically on every pull request, logs detailed information about its operations, tracks performance metrics, and handles errors gracefully. This represents the culmination of our learning journey, transforming our agent development skills into production-ready capabilities.
Before diving into code, let us understand the components that make up a production agent deployment. A robust production system consists of several interconnected layers, each serving a specific purpose.
At the core, we have the Agent SDK application itself, which can be written in either Python or TypeScript. This application defines our agent's behavior, configures its tools and permissions, and orchestrates the conversation with the model. Surrounding this core are several supporting components.
Custom tools extend the agent's capabilities beyond built-in functions. We create these using the @tool decorator, allowing our agents to interact with domain-specific systems and APIs. Hooks provide visibility and control, acting as Python callbacks that execute at key points during the agent's lifecycle, such as before and after tool execution.
MCP integration connects our agents to external services and data sources. This enables retrieval-augmented generation, database queries, and API integrations. Error handling ensures our agents recover gracefully from failures, implement retry strategies, and report issues clearly. Finally, monitoring and logging provide observability, capturing metrics about performance, costs, and behavior patterns.
These components work together to create a system that is not just functional but also observable, maintainable, and reliable in production environments.
Production systems require comprehensive logging to understand what happens during agent execution. Let us set up logging for our production documentation bot:
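A minimal version of that setup, using Python's standard logging module; the format string shown here is one common choice:

```python
import logging

# Configure the root logger at INFO level: important events are captured
# without the noise of DEBUG output. The format includes a timestamp,
# the logger name, the severity, and the message.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)

# A module-level logger named after the module, so log lines can be
# filtered and routed by their source.
logger = logging.getLogger(__name__)
logger.info("Logging configured")
```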
This configuration establishes structured logging for our application. The basicConfig() function sets up the logging system at the INFO level, which captures important events without overwhelming us with debug messages. The format string includes timestamps, log levels, and messages, making it easy to track when events occurred and their severity.
We create a logger specific to our module using __name__, which allows us to filter and route logs based on their source. In production environments, these logs can be sent to centralized systems like Elasticsearch or CloudWatch for analysis and alerting.
Hooks allow us to intercept and observe agent behavior at critical points. The pre_tool_hook executes just before the agent uses any tool, providing visibility into what actions the agent plans to take:
This hook receives three parameters: input_data contains information about the tool call, including the tool_name and arguments; tool_use_id uniquely identifies this specific tool invocation; and context provides additional state information. We extract the tool_name and log that the tool is starting.
The hook returns an empty dictionary, which is standard practice when we are not modifying the execution flow. However, we could return data here to influence how the tool executes or to pass information to subsequent hooks. This pattern is invaluable for auditing, security monitoring, and debugging agent behavior in production.
The post_tool_hook complements the pre-hook by executing after tool completion, allowing us to observe results and track execution patterns:
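A matching sketch for the post-hook; it mirrors the pre-hook's signature, with illustrative log wording:

```python
import logging

logger = logging.getLogger(__name__)

async def post_tool_hook(input_data, tool_use_id, context):
    """Runs after a tool finishes; pairs with pre_tool_hook timestamps."""
    tool_name = input_data.get("tool_name", "unknown")
    logger.info("Tool completed: %s (invocation %s)", tool_name, tool_use_id)
    return {}
```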
This hook mirrors the structure of the pre-hook but executes after the tool has finished. By logging tool completion, we can calculate execution durations by comparing timestamps from pre and post hooks. This information helps identify performance bottlenecks and tools that frequently fail or time out.
Together, the pre and post hooks create a complete audit trail of tool usage. In production systems, we might extend these hooks to record tool arguments, validate tool outputs, track error rates, or enforce usage policies. The hooks architecture provides flexibility to add observability without modifying the core agent logic.
Now let us construct the main class for our production documentation bot. This class encapsulates configuration, validation, and the hook methods we defined:
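A sketch of the class skeleton, folding the hooks in as methods; the exact log messages are illustrative:

```python
import logging
import os

logger = logging.getLogger(__name__)

class ProductionDocBot:
    """Documentation review bot with fail-fast configuration checks."""

    def __init__(self):
        # Fail fast: missing credentials should stop startup, not surface
        # later as a confusing API error mid-review.
        if not os.environ.get("ANTHROPIC_API_KEY"):
            raise ValueError("ANTHROPIC_API_KEY environment variable not set")
        logger.info("ProductionDocBot initialized")

    async def pre_tool_hook(self, input_data, tool_use_id, context):
        logger.info("Tool starting: %s", input_data.get("tool_name", "unknown"))
        return {}

    async def post_tool_hook(self, input_data, tool_use_id, context):
        logger.info("Tool completed: %s", input_data.get("tool_name", "unknown"))
        return {}
```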
The constructor performs essential validation by checking for the ANTHROPIC_API_KEY environment variable. Production agents must fail fast during initialization if critical configuration is missing, rather than failing silently during execution. This validation ensures we catch configuration issues before the agent attempts to make API calls.
We log successful initialization, which helps confirm that the ProductionDocBot started correctly in production environments. This simple logging statement can be invaluable when debugging deployment issues or verifying that environment variables were properly configured.
The review_file method implements the core functionality of reviewing a single documentation file. Let us examine how it configures the agent:
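A sketch of that configuration, assuming the claude-agent-sdk Python package and the hooks defined in the previous sections; the system prompt text is illustrative:

```python
from claude_agent_sdk import ClaudeAgentOptions, HookMatcher

options = ClaudeAgentOptions(
    system_prompt=(
        "You are a documentation reviewer. Check for broken links, unclear "
        "explanations, and missing sections. Report each finding with a "
        "severity level: critical, major, or minor."
    ),
    # Run unattended in CI, but deny tools that could modify the system.
    permission_mode="bypassPermissions",
    disallowed_tools=["Bash", "Write"],
    # Each event maps to a list of matchers, so multiple hooks can be
    # registered for the same event.
    hooks={
        "PreToolUse": [HookMatcher(hooks=[pre_tool_hook])],
        "PostToolUse": [HookMatcher(hooks=[post_tool_hook])],
    },
    max_turns=5,  # cap iterations to prevent runaway execution
)
```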
This configuration demonstrates several production best practices. The permission_mode="bypassPermissions" setting allows Claude to execute tools without prompting for approval, which is essential for automated CI/CD environments. However, we use disallowed_tools to explicitly block Bash and Write tools that could modify the system or files, even with bypassPermissions enabled. This deny-list approach provides a security boundary while allowing flexibility for the agent to use other tools like Read, Grep, and WebFetch as needed.
The system_prompt provides clear instructions about what to check and how to report findings, including severity levels for prioritization. The hooks dictionary registers our pre and post hooks using HookMatcher, which allows pattern-based hook selection. Each hook is wrapped in a list because we can register multiple hooks for the same event. The max_turns limit prevents runaway execution, capping the agent at five conversation turns.
With the agent configured, we process its response stream to extract the review results:
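A sketch of that processing loop, assuming the claude-agent-sdk package and the options object configured above; the exact prompt wording is illustrative:

```python
import logging

from claude_agent_sdk import ClaudeSDKClient, ResultMessage

logger = logging.getLogger(__name__)

async def review_file(filepath: str) -> dict:
    # The context manager guarantees the client is cleaned up on exit.
    async with ClaudeSDKClient(options=options) as client:
        await client.query(f"Review {filepath} for quality issues.")
        async for message in client.receive_response():
            if isinstance(message, ResultMessage):
                if message.subtype == "success":
                    return {
                        "filepath": filepath,
                        "result": message.result,
                        "total_cost_usd": message.total_cost_usd,
                        "duration_ms": message.duration_ms,
                    }
                # Failure: log it and return a structured error record.
                logger.error("Review failed: %s", message.subtype)
                return {
                    "filepath": filepath,
                    "error": True,
                    "result": f"Review failed: {message.subtype}",
                }
    return {"filepath": filepath, "error": True, "result": "No result received"}
```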
We use a context manager (async with) to ensure proper resource cleanup with ClaudeSDKClient. The query asks the agent to review the file for quality issues, and we iterate through the response stream looking for ResultMessage instances. When we find a success result, we return a comprehensive dictionary containing the filepath, the result, total_cost_usd, and duration_ms.
If the message.subtype indicates failure, we log the error and return a dictionary with an error flag. This structured return format makes it easy for calling code to handle both success and failure cases consistently. The inclusion of cost and duration data enables tracking of operational metrics over time.
The format_report method transforms raw review data into a human-readable markdown report:
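A sketch of the formatting logic, matching the fields described below; the exact markdown layout is one reasonable choice:

```python
def format_report(review: dict) -> str:
    """Render a review result dict as a human-readable markdown report."""
    if review.get("error"):
        return f"# Review failed: {review['filepath']}\n\n{review['result']}\n"
    return (
        f"# Review: {review['filepath']}\n\n"
        f"{review['result']}\n\n"
        "---\n"
        # Four decimal places keep small per-review costs visible.
        f"Cost: ${review['total_cost_usd']:.4f} | "
        f"Duration: {review['duration_ms']} ms\n"
    )
```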
This method checks if the review encountered an error and formats an appropriate message. For successful reviews, it builds a markdown document with the file name as a header, the full result content, a separator line, and metadata about cost and duration.
The cost formatting uses .4f to display four decimal places, providing precision for tracking expenses across many reviews. Duration in milliseconds helps identify performance patterns. This formatted report can be saved to files, posted as comments on pull requests, or sent to team communication channels.
The main function demonstrates how to use the production bot from the command line:
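A sketch of the entry point, assuming the ProductionDocBot class and methods above; the report filename review_report.md is an assumed name for illustration:

```python
import asyncio
import logging
import sys

logger = logging.getLogger(__name__)

def main() -> None:
    # Require exactly one argument: the file to review.
    if len(sys.argv) < 2:
        print("Usage: python production_doc_bot.py <filepath>")
        sys.exit(1)
    filepath = sys.argv[1]
    try:
        bot = ProductionDocBot()
        review = asyncio.run(bot.review_file(filepath))
        report = bot.format_report(review)
        print(report)
        with open("review_report.md", "w") as f:  # assumed output filename
            f.write(report)
    except Exception:
        # Log the traceback and exit non-zero so CI marks the step failed.
        logger.exception("Fatal error during review")
        sys.exit(1)

if __name__ == "__main__":
    main()
```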
The function validates command-line arguments via sys.argv, requiring a filepath parameter. It instantiates the bot, performs the review, formats the report, and both prints and saves the results. The try-except block ensures that any fatal errors are logged and the process exits with a non-zero status code, signaling failure to orchestration systems.
When we run python agents/production_doc_bot.py docs/guide.md, we see:
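Output along these lines; the timestamps, findings, and figures here are illustrative, not captured from a real run:

```text
2024-05-14 10:02:11 - __main__ - INFO - ProductionDocBot initialized
2024-05-14 10:02:13 - __main__ - INFO - Tool starting: Read
2024-05-14 10:02:14 - __main__ - INFO - Tool completed: Read
2024-05-14 10:02:29 - __main__ - INFO - Review complete: docs/guide.md

# Review: docs/guide.md

1. [MAJOR] The installation section references a command that is never defined.
2. [MINOR] "Getting Started" uses inconsistent heading capitalization.

---
Cost: $0.0214 | Duration: 18240 ms
```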
The output shows our logging in action, tracking initialization, file review, tool usage through hooks, and completion. The formatted report provides actionable feedback about documentation quality issues with severity levels, helping teams prioritize improvements.
To automate documentation reviews in our development workflow, we integrate the bot with GitHub Actions. The workflow configuration defines when and how reviews run:
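A sketch of the trigger section of that workflow file; the workflow and job names are illustrative:

```yaml
name: Documentation Review

on:
  pull_request:
    paths:
      - "docs/**/*.md"   # run only when docs markdown changes

jobs:
  review:
    runs-on: ubuntu-latest
```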
This workflow triggers on pull_request events that modify markdown files in the docs directory. The paths filter ensures we only run reviews when documentation actually changes, saving CI minutes and reducing noise. The jobs run on ubuntu-latest, providing a consistent Linux environment for execution.
This selective triggering is crucial for CI efficiency. If a pull request only modifies code files, the workflow remains idle. When documentation changes, the workflow activates automatically, providing immediate feedback to authors.
The workflow steps install dependencies, run reviews, and upload results:
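A sketch of those steps, matching the actions and versions named below; the upload-artifact version, script path, and report glob are assumptions for illustration:

```yaml
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0   # full history so git diff against the base works

      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install claude-agent-sdk

      - name: Review changed documentation
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          # Identify markdown files changed relative to the base branch.
          for file in $(git diff --name-only origin/${{ github.base_ref }}...HEAD -- 'docs/**/*.md'); do
            python agents/production_doc_bot.py "$file"
          done

      - name: Upload review reports
        if: always()
        uses: actions/upload-artifact@v3   # assumed artifact action version
        with:
          name: doc-review-reports
          path: "review_*.md"              # assumed report naming
```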
The workflow checks out code using actions/checkout@v3, sets up Python 3.11 via actions/setup-python@v4, and installs the claude-agent-sdk. The review step cleverly identifies changed files using git diff, filters for markdown files, and reviews each one. The ANTHROPIC_API_KEY comes from GitHub Secrets, keeping credentials secure.
Finally, the workflow uploads all generated markdown reports as artifacts, making them accessible from the GitHub Actions interface. Team members can download these reports to review findings, and we could extend the workflow to post them as pull request comments for even more immediate feedback.
Beyond logging, we need quantitative metrics to understand agent performance and costs over time. Prometheus provides a standard approach for collecting and querying time-series metrics:
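A sketch of the metric definitions using the prometheus_client library; the agent_ metric names are assumptions chosen to follow the conventions described below:

```python
from prometheus_client import Counter, Histogram

# Counters only ever increase, making them ideal for counting events.
REQUESTS_TOTAL = Counter(
    "agent_requests_total", "Total number of agent review requests"
)
ERRORS_TOTAL = Counter(
    "agent_errors_total", "Total number of failed agent requests"
)
# Histograms track value distributions, enabling percentile queries.
REQUEST_DURATION = Histogram(
    "agent_request_duration_seconds", "Agent request duration in seconds"
)
```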
We define three key metrics: a Counter for requests_total, a Histogram for request_duration, and another Counter for errors_total. Counters monotonically increase, making them ideal for counting events. Histograms track distributions of values, allowing us to calculate percentiles and understand latency patterns.
These metrics use descriptive names following Prometheus conventions: lowercase with underscores, ending in _total for counters or _seconds for time measurements. The description strings appear in Prometheus dashboards, helping teammates understand what each metric represents.
Now we wrap our agent queries with metric tracking to capture performance data:
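A sketch of that wrapper, assuming the metric objects defined above plus the claude-agent-sdk package:

```python
import asyncio

from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions, ResultMessage

async def run_with_metrics(prompt: str):
    REQUESTS_TOTAL.inc()  # record the attempt immediately
    loop = asyncio.get_running_loop()
    start = loop.time()   # event loop's monotonic clock

    options = ClaudeAgentOptions(
        permission_mode="bypassPermissions",   # unattended execution
        disallowed_tools=["Bash", "Write"],    # block destructive tools
    )
    try:
        async with ClaudeSDKClient(options=options) as client:
            await client.query(prompt)
            async for message in client.receive_response():
                if isinstance(message, ResultMessage):
                    # Record the duration regardless of outcome.
                    REQUEST_DURATION.observe(loop.time() - start)
                    if message.subtype != "success":
                        ERRORS_TOTAL.inc()
                        return None
                    return message.result
    except Exception:
        ERRORS_TOTAL.inc()
        raise
```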
The run_with_metrics function increments the requests_total counter immediately using inc(), recording that an attempt occurred. We capture the start time using the event loop's monotonic clock. The agent configuration uses bypassPermissions for automated execution while blocking potentially destructive tools via disallowed_tools. As we process the response stream, we calculate duration and record it in the histogram using observe().
If the ResultMessage indicates failure or if an exception occurs, we increment the errors_total counter. This tracking happens transparently around our existing agent code, demonstrating how metrics can be added without disrupting core functionality. The resulting data flows to Prometheus for visualization and alerting.
To expose metrics to Prometheus, we start an HTTP server that serves metrics in Prometheus format:
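Exposing the metrics takes a single call from prometheus_client:

```python
from prometheus_client import start_http_server

# Serve all registered metrics at http://localhost:8000/metrics.
start_http_server(8000)
```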
This single call to start_http_server starts a web server on port 8000 that exposes all our metrics at the /metrics endpoint. Prometheus scrapes this endpoint periodically, collecting the latest values and storing them in its time-series database.
When we access http://localhost:8000/metrics, we see output like:
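A sample of the Prometheus exposition format; the request and error totals match the figures discussed below, while the metric names, bucket boundaries, and bucket counts are illustrative:

```text
# HELP agent_requests_total Total number of agent review requests
# TYPE agent_requests_total counter
agent_requests_total 142.0
# HELP agent_request_duration_seconds Agent request duration in seconds
# TYPE agent_request_duration_seconds histogram
agent_request_duration_seconds_bucket{le="1.0"} 12.0
agent_request_duration_seconds_bucket{le="5.0"} 98.0
agent_request_duration_seconds_bucket{le="10.0"} 135.0
agent_request_duration_seconds_bucket{le="+Inf"} 142.0
agent_request_duration_seconds_sum 612.4
agent_request_duration_seconds_count 142.0
# HELP agent_errors_total Total number of failed agent requests
# TYPE agent_errors_total counter
agent_errors_total 3.0
```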
This format shows our metrics in action: 142.0 total requests with a histogram of durations bucketed by time ranges, and 3.0 errors. Prometheus uses this data to calculate rates, percentiles, and trends. We can visualize these metrics in Grafana dashboards and configure alerts when error rates exceed thresholds or when latency increases unexpectedly.
We have completed our journey through production deployment and integration, learning how to transform agents from development prototypes into robust production systems. We explored comprehensive logging strategies that provide visibility into agent behavior, implemented hooks that track tool usage and enable auditing, and integrated agents into CI/CD pipelines using GitHub Actions for automated workflow execution.
We also covered operational monitoring with Prometheus metrics, enabling us to track request volumes, latency distributions, error rates, and costs over time. This observability foundation allows teams to maintain agent systems confidently, quickly diagnosing issues and understanding usage patterns.
The key insight is that production readiness requires more than just functional code: it demands observability, integration with existing workflows, and robust error handling. The patterns we learned, such as structured logging, lifecycle hooks, CI/CD integration, and metrics collection, apply universally to production agent systems regardless of the specific tools or frameworks used.
With these skills, we are equipped to deploy agents that operate reliably in real-world environments, integrate seamlessly with development workflows, and provide the visibility needed for ongoing maintenance and optimization. The upcoming practice exercises will challenge you to implement your own production deployment pipeline, complete with monitoring, CI/CD integration, and comprehensive observability, bringing together everything we've learned throughout this course into a complete production system!
