Introduction & Context

In the previous lessons, you've built a comprehensive foundation for securing OpenAI agent workflows. You learned how to securely handle sensitive data using RunContextWrapper, monitor agent execution with lifecycle hooks, and protect against harmful inputs using input guardrails. Now, you're ready to implement the final critical layer of your security framework: output guardrails.

While input guardrails protect your agents from problematic user requests, output guardrails serve as your last line of defense by validating what your agents actually generate before those responses reach end users. This is particularly crucial in production applications, where agents might generate content that violates company policies, contains sensitive information, or includes inappropriate material despite passing input validation.

Consider real-world scenarios where output guardrails become essential. Your travel assistant might generate a perfectly reasonable response to a legitimate question about nightlife but inadvertently include references to adult entertainment venues. A customer service agent could accidentally expose internal company information while trying to be helpful. Or a content creation agent might produce material that, while technically responding to an appropriate prompt, crosses boundaries that weren't anticipated during input validation.

Output guardrails complete your security pipeline by ensuring that every response your agents generate undergoes final validation before reaching users. This creates a comprehensive protection system where you control both what goes into your agents and what comes out of them, giving you confidence to deploy sophisticated AI workflows in production environments.

Understanding Output Guardrails vs Input Guardrails

As a reminder from the previous lesson, input guardrails operate before your agent begins processing, validating user requests and blocking inappropriate inputs before any computational resources are consumed. Output guardrails work differently — they execute after your agent has completed its processing and generated a response, but before that response is delivered to the user.

This timing difference is crucial for understanding when and why to use each type of guardrail. Input guardrails are your first line of defense, preventing obviously problematic requests from wasting computational resources or potentially corrupting your agent's reasoning process. Output guardrails serve as your final quality gate, catching issues that might emerge during the agent's generation process even when the original input seemed perfectly acceptable.

In multi-agent workflows, output guardrails become even more important because they validate the final output regardless of how many agents were involved in generating it. An agent might receive a clean input, process it appropriately, but still produce output that needs validation due to the complex interactions between different agents or unexpected emergent behaviors in the generation process.

The complementary nature of input and output guardrails means they work together to provide comprehensive protection. Input guardrails prevent bad requests from entering your system, while output guardrails ensure that only appropriate responses leave your system. This dual-layer approach gives you maximum control over your agent's behavior and helps maintain trust with your users.

The @output_guardrail Decorator and Function Structure

Creating an output guardrail follows a similar pattern to the input guardrails you learned about in the previous lesson, but with a different function signature that reflects the post-generation validation context. You'll use the @output_guardrail decorator to register your function with the SDK's guardrail system.

The basic structure of an output guardrail function looks like this (the function name and placeholder check below are illustrative; the decorator, signature, and return type come from the SDK):
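```python
from agents import Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail

@output_guardrail
async def my_output_guardrail(
    ctx: RunContextWrapper, agent: Agent, output: str
) -> GuardrailFunctionOutput:
    # "output" is the agent's generated response (a string for plain-text
    # agents, or your structured output type if the agent defines one)
    violates_policy = "forbidden phrase" in output.lower()  # placeholder check

    return GuardrailFunctionOutput(
        output_info="Output reviewed against content policy",
        tripwire_triggered=violates_policy,  # True blocks the response
    )
```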

The key difference from input guardrails is that your function receives the agent’s generated output (output parameter) for validation, rather than the user’s input. The ctx parameter is still your secure context, and agent refers to the agent that produced the output.

The @output_guardrail decorator is required to register your function with the SDK's output validation system; without it, your function won't be called after the agent generates a response.

As with input guardrails, your function must return a GuardrailFunctionOutput object. Set output_info to a human-readable validation message, and tripwire_triggered to True to block the output or False to allow it through.

LLM-Based Output Guardrails

Just like you did for input guardrails in the previous lesson, you can use an LLM-based agent to validate outputs. You’ll reuse the same ContentCheckOutput class for structured validation, but this time, you’ll modify the agent’s instructions to focus on analyzing the agent’s generated output rather than the user’s request.

For output guardrails, the key difference is in the instructions you give to your guardrail agent. Instead of asking it to analyze a user request, you instruct it to review the agent's response for policy violations or inappropriate content. Here's a sketch that reuses a ContentCheckOutput model like the previous lesson's (the field names shown are illustrative):
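```python
from pydantic import BaseModel

from agents import Agent

# Structured result model, matching the one used for input guardrails in the
# previous lesson (the exact field names here are illustrative)
class ContentCheckOutput(BaseModel):
    is_appropriate: bool
    reasoning: str

output_check_agent = Agent(
    name="Output Content Checker",
    instructions=(
        "Review the agent's generated response, not the user's request. "
        "Determine whether it violates company policy, exposes sensitive "
        "information, or contains inappropriate content, and explain why."
    ),
    output_type=ContentCheckOutput,
)
```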

By reusing the same output model and simply updating the instructions, you ensure consistency in how validation results are structured, while adapting the guardrail agent’s focus to the output validation context. This approach allows you to leverage the same validation logic for both input and output guardrails, with only minor changes to the agent’s instructions.

Implementing Output Guardrail Functions

The implementation of an output guardrail function closely mirrors what you did for input guardrails. The main difference is that you decorate the function with @output_guardrail and pass the agent’s generated output (instead of the user’s input) to your guardrail agent for validation.
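Here's a minimal sketch, building on the output_check_agent and ContentCheckOutput defined above:

```python
from agents import Agent, GuardrailFunctionOutput, Runner, RunContextWrapper, output_guardrail

@output_guardrail
async def content_output_guardrail(
    ctx: RunContextWrapper, agent: Agent, output: str
) -> GuardrailFunctionOutput:
    # Run the LLM-based checker on the generated output, passing along the
    # same secure context the main agent used
    result = await Runner.run(output_check_agent, output, context=ctx.context)
    check: ContentCheckOutput = result.final_output

    return GuardrailFunctionOutput(
        output_info=check.reasoning,
        tripwire_triggered=not check.is_appropriate,
    )
```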

The only real changes from the input guardrail pattern are the @output_guardrail decorator and the fact that you validate the agent's output instead of the user's request. The rest of the logic (running the guardrail agent, interpreting its structured response, and returning a GuardrailFunctionOutput) remains the same.

Attaching Guardrails and Exception Handling

Once you've created your output guardrail function, you need to attach it to your agent using the output_guardrails parameter when creating your agent:
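```python
from agents import Agent

# The agent's name and instructions below are illustrative
travel_agent = Agent(
    name="Travel Assistant",
    instructions="Help users plan trips and answer travel questions.",
    output_guardrails=[content_output_guardrail],
)
```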

When you attach output guardrails to an agent, they become an integral part of that agent's response pipeline. Every time the agent generates a response, the output guardrails automatically execute, validating the response before it is delivered to the user. If a guardrail determines that the response should be blocked by setting tripwire_triggered=True, the SDK will raise an OutputGuardrailTripwireTriggered exception instead of returning the agent's response. This ensures that any inappropriate or policy-violating output is intercepted and never reaches the end user.
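Here's a minimal sketch of handling that exception around a run (the user message is just an example):

```python
import asyncio

from agents import OutputGuardrailTripwireTriggered, Runner

async def main():
    try:
        result = await Runner.run(travel_agent, "What's the nightlife like in Amsterdam?")
        print(result.final_output)
    except OutputGuardrailTripwireTriggered:
        # The response was generated but intercepted before reaching the user
        print("Sorry, I can't share that response.")

asyncio.run(main())
```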

Testing Output Guardrail Behavior

Let's test your complete output guardrail implementation with both inappropriate and appropriate requests to see how the system behaves:
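```python
import asyncio

from agents import OutputGuardrailTripwireTriggered, Runner

async def main():
    # Example requests: the first should trip the output guardrail,
    # the second should pass through cleanly
    requests = [
        "Where can I find adult entertainment in Amsterdam?",
        "What are the best hiking trails near Denver?",
    ]
    for request in requests:
        print(f"User: {request}")
        try:
            result = await Runner.run(travel_agent, request)
            # Each guardrail's structured result is attached to the run result
            for guardrail_result in result.output_guardrail_results:
                print(f"Guardrail analysis: {guardrail_result.output.output_info}")
            print(f"Agent: {result.final_output}\n")
        except OutputGuardrailTripwireTriggered:
            print("Blocked: the output guardrail flagged this response.\n")

asyncio.run(main())
```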

When you run this test with the inappropriate request, you'll see output similar to this:
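```
User: Where can I find adult entertainment in Amsterdam?
Blocked: the output guardrail flagged this response.
```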

For the appropriate hiking request, you'll see the guardrail agent's analysis followed by the travel agent's actual response:
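```
User: What are the best hiking trails near Denver?
Guardrail analysis: The response recommends popular, family-friendly hiking trails and contains no policy violations.
Agent: Some great options near Denver include Mount Falcon Park and the trails around Chautauqua in Boulder, ...
```

(The exact analysis and response text will vary from run to run.)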

This demonstrates how output guardrails work seamlessly with legitimate requests while blocking inappropriate content, ensuring that your agent can provide helpful responses while maintaining safety standards.

Summary: Your Complete Agent Security Framework

You've now mastered all four layers of comprehensive agent security in the OpenAI Agents SDK. Your security framework includes secure data handling through RunContextWrapper, comprehensive workflow monitoring through lifecycle hooks, proactive input validation through input guardrails, and final output validation through output guardrails.

The combination of these security mechanisms gives you the confidence to deploy sophisticated AI workflows in real-world applications where safety, compliance, and reliability are paramount. In the upcoming practice exercises, you'll apply these output guardrail implementation skills to build more complex validation scenarios and explore advanced patterns for protecting your agent systems.
