In the previous lessons, you learned how to securely handle sensitive data using `RunContextWrapper` and monitor agent workflows with lifecycle hooks. Now you're ready to tackle the next critical layer of agent security: input guardrails.
While context management protects your internal data and hooks give you visibility into agent behavior, input guardrails protect your agents from potentially harmful, inappropriate, or malicious user inputs before they even begin processing. Think about real-world scenarios where your agents might face problematic inputs. A travel assistant might receive requests for illegal activities, a customer service bot could be asked to perform tasks outside its scope, or a content creation agent might be prompted to generate inappropriate material. Without proper input validation, your agents could waste computational resources, violate company policies, or even expose security vulnerabilities.
The most common and critical use of guardrails is at the input stage—validating user requests before your agent begins processing. Input guardrails serve as your first line of defense, ensuring that only safe, appropriate, and policy-compliant inputs are allowed to reach your agents.
As you build more capable and autonomous OpenAI agents, security becomes a multi-layered challenge. Guardrails are a foundational security mechanism designed to protect your agents from a wide range of problematic scenarios—whether that’s malicious user input, requests that violate business policies, or attempts to push your agent outside its intended scope.
Guardrails act as checkpoints that enforce your rules and policies before, during, or after agent execution. They can validate, block, or even rewrite data at critical points in your workflow, ensuring that your agents operate safely and predictably. For example, input guardrails can prevent agents from processing requests for illegal activities or block attempts to access sensitive information, while output guardrails can sanitize or filter agent responses before they reach end users, ensuring that no inappropriate or policy-violating content is returned.
By implementing guardrails, you create a proactive defense system that reduces the risk of security breaches, policy violations, and unintended agent behavior. This approach is essential for maintaining trust, compliance, and reliability in any production AI system.
It’s important to distinguish between guardrails and hooks, as both play vital but different roles in agent security and control.
- Hooks are general-purpose lifecycle callbacks that give you visibility and control over agent execution. They're designed for monitoring, logging, dynamic context injection, and observability across your entire workflow. Hooks can run at various points during agent execution and are primarily focused on understanding and controlling what happens during the agent's processing.
- Guardrails are specialized functions that enforce validation or policy checks at specific points in the workflow. While hooks can observe and modify behavior throughout the agent lifecycle, guardrails are designed to make go/no-go decisions or transform data at well-defined checkpoints.
Input guardrails serve as your first line of defense by providing three key capabilities: validation (checking if inputs meet your criteria), blocking (preventing inappropriate requests from reaching your agents), and rewriting (modifying inputs to make them acceptable). This lesson will teach you how to implement these protective mechanisms using the OpenAI Agents SDK's specialized guardrail system.
Creating an input guardrail in the OpenAI Agents SDK requires following a specific function signature and using the `@input_guardrail` decorator. This decorator tells the SDK that your function is designed to validate inputs and should be called before agent processing begins.
The basic structure of an input guardrail function looks like this:
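Here is a minimal sketch of that structure; the function name and placeholder logic are illustrative, while the imports come from the `agents` package:

```python
from agents import (
    Agent,
    GuardrailFunctionOutput,
    RunContextWrapper,
    TResponseInputItem,
    input_guardrail,
)


@input_guardrail
async def my_input_guardrail(
    ctx: RunContextWrapper,                 # shared run context for the workflow
    agent: Agent,                           # the agent that would process this input
    input: str | list[TResponseInputItem],  # the raw user input to validate
) -> GuardrailFunctionOutput:
    # Placeholder validation logic: always allow the input to proceed.
    return GuardrailFunctionOutput(
        output_info="Input passed validation",
        tripwire_triggered=False,
    )
```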
Let's break down each parameter in the function signature:
- The `ctx` parameter is the same `RunContextWrapper` you learned about in the previous lesson; it provides access to your secure context data and allows you to share state across your workflow.
- The `agent` parameter is a reference to the specific agent that would process this input, giving you access to the agent's name, instructions, and other properties for context-aware validation.
- The `input` parameter contains the actual user input that needs validation, which can be either a simple string or a more complex list of input items.
The `@input_guardrail` decorator is essential because it registers your function with the SDK's guardrail system and ensures it gets called at the right time in the agent workflow. Without this decorator, your function would just be a regular async function that doesn't integrate with the agent's input validation pipeline.
After performing its checks, your guardrail function must return a `GuardrailFunctionOutput` object. This object communicates the validation result back to the system and determines what happens next. It has two critical fields:
- `output_info`: a human-readable string that explains the validation decision. This is useful for logging, debugging, or providing feedback to users.
- `tripwire_triggered`: a boolean value that decides whether to block the input (`True`) or allow it to proceed (`False`).
By following this structure and using the decorator, your guardrail function becomes an integral part of the agent's input validation pipeline, ensuring that every input is checked and handled according to your security and policy requirements before any agent processing begins.
Now that you understand the basic structure of input guardrails, let's implement a simple example that demonstrates these concepts in action. Before moving to sophisticated LLM-based validation, we'll start with a straightforward rule-based approach using keyword detection.
Here's a basic input guardrail that checks for inappropriate travel-related content:
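A possible implementation is sketched below; the keyword list, function name, and messages are illustrative choices rather than a prescribed set:

```python
from agents import Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail

# Illustrative list of terms this travel assistant should refuse to engage with.
PROHIBITED_KEYWORDS = ["drugs", "weapons", "smuggling"]


@input_guardrail
async def travel_content_guardrail(
    ctx: RunContextWrapper, agent: Agent, input: str | list
) -> GuardrailFunctionOutput:
    # Normalize the input to a single lowercase string for simple keyword matching.
    text = (input if isinstance(input, str) else str(input)).lower()

    for keyword in PROHIBITED_KEYWORDS:
        if keyword in text:
            return GuardrailFunctionOutput(
                output_info=f"Input mentions prohibited term '{keyword}'",
                tripwire_triggered=True,
            )

    return GuardrailFunctionOutput(
        output_info="No prohibited terms found",
        tripwire_triggered=False,
    )
```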
This simple guardrail follows the exact function signature you learned about: it uses the `@input_guardrail` decorator, accepts the required `ctx`, `agent`, and `input` parameters, and returns a `GuardrailFunctionOutput` object with appropriate `output_info` and `tripwire_triggered` values.
Once you've created your input guardrail function, you need to attach it to your agent so that it runs automatically before the agent processes any input. You do this by including your guardrail function in the `input_guardrails` parameter when creating your agent:
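For example, assuming the `travel_content_guardrail` sketched above, the agent definition might look like this (the agent name and instructions are illustrative):

```python
from agents import Agent

travel_agent = Agent(
    name="Travel Assistant",
    instructions="Help users plan trips, find flights, and book accommodations.",
    input_guardrails=[travel_content_guardrail],  # runs before the agent sees any input
)
```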
When you attach guardrails to an agent, they become an integral part of that agent's processing pipeline. Every time someone tries to run that agent with user input, the guardrails will automatically execute first, ensuring that your validation logic is consistently applied without requiring manual intervention or remembering to call validation functions.
Let's see what happens when you try to run your agent with an input that triggers the guardrail. When your guardrail sets `tripwire_triggered=True`, the SDK raises an `InputGuardrailTripwireTriggered` exception instead of proceeding with normal agent execution.
Here's how to test and handle this behavior:
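A minimal test harness, assuming the `travel_agent` defined above, could look like this sketch:

```python
import asyncio

from agents import InputGuardrailTripwireTriggered, Runner


async def main():
    try:
        result = await Runner.run(
            travel_agent,
            "Where can I buy drugs during my trip to Amsterdam?",
        )
        print(result.final_output)
    except InputGuardrailTripwireTriggered:
        # The guardrail blocked the request before the travel agent produced an answer.
        print("Sorry, this travel assistant can't help with that request.")


if __name__ == "__main__":
    asyncio.run(main())
```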
When you run this code, you'll see that the request gets blocked because it contains the prohibited term "drugs".
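With the illustrative handler above, the printed output would be:

```text
Sorry, this travel assistant can't help with that request.
```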
The `InputGuardrailTripwireTriggered` exception allows you to handle blocked inputs gracefully and provide appropriate feedback to users. This exception handling pattern is essential for production applications where you need to differentiate between successful agent execution and blocked requests, allowing you to log security incidents, provide user feedback, or redirect users to alternative resources.
It's important to understand that when you run an agent with input guardrails, both the guardrail validation and your main agent begin executing simultaneously. This concurrent execution means that if you're streaming the response from your main agent, you might see the beginning of the agent's response being printed before the guardrail completes its validation.
If the guardrail determines that the input should be blocked, the main agent's execution will be terminated and you'll receive the `InputGuardrailTripwireTriggered` exception, even if some output was already generated. This behavior is designed to maximize performance by not waiting for guardrail validation to complete before starting the main agent, but it means you should be prepared to handle cases where partial output might be generated before a request is ultimately blocked.
In production applications, you may want to consider whether to show partial responses to users or wait for complete validation before displaying any output, depending on your specific security and user experience requirements.
While rule-based guardrails are effective for obvious violations, they can be limited by their reliance on exact keyword matches. For more sophisticated content analysis, you can use another AI agent to intelligently evaluate user inputs. This creates a "guardrail agent" that can understand context, detect subtle attempts to circumvent restrictions, and make nuanced decisions about input appropriateness.
First, you need to define a structured output model that your guardrail agent will use to communicate its decisions:
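A minimal Pydantic model along these lines would work; the class and field names are the ones referenced below:

```python
from pydantic import BaseModel


class ContentCheckOutput(BaseModel):
    """Structured decision returned by the guardrail agent."""

    contains_prohibited_content: bool  # True if the input should be blocked
    reasoning: str                     # human-readable explanation of the decision
```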
This `ContentCheckOutput` model ensures that your guardrail agent provides both a clear boolean decision (`contains_prohibited_content`) and human-readable reasoning for that decision. The `reasoning` field is particularly valuable for debugging, logging, and providing feedback to users when their inputs are blocked.
Next, you create the guardrail agent itself, which is a specialized agent designed specifically for content validation:
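A possible definition, with an illustrative name and instructions, might look like this:

```python
from agents import Agent

content_guardrail_agent = Agent(
    name="Content Check Agent",
    instructions=(
        "You review requests sent to a travel assistant. Decide whether the request "
        "involves illegal activities, violence, or anything else a travel assistant "
        "should refuse to help with, and briefly explain your reasoning."
    ),
    output_type=ContentCheckOutput,
)
```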
This guardrail agent has focused instructions that tell it exactly what to look for in user inputs. By specifying `output_type=ContentCheckOutput`, you ensure that the agent's response will always follow your structured format, making it easy to programmatically process the validation results.
Now let's implement the actual guardrail function that uses your guardrail agent to validate inputs:
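Here is a sketch of that function, assuming the `content_guardrail_agent` defined above:

```python
from agents import (
    Agent,
    GuardrailFunctionOutput,
    RunContextWrapper,
    Runner,
    input_guardrail,
)


@input_guardrail
async def llm_content_guardrail(
    ctx: RunContextWrapper, agent: Agent, input: str | list
) -> GuardrailFunctionOutput:
    # Run the guardrail agent on the raw user input, reusing the same run context.
    result = await Runner.run(content_guardrail_agent, input, context=ctx.context)
    check = result.final_output_as(ContentCheckOutput)

    return GuardrailFunctionOutput(
        output_info=check.reasoning,
        tripwire_triggered=check.contains_prohibited_content,
    )
```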
This function demonstrates the complete flow of LLM-based input validation. You use `Runner.run()` to execute your guardrail agent with the user's input, just like running any other agent. The guardrail agent analyzes the input and returns a structured response containing both the decision and reasoning.
The `GuardrailFunctionOutput` object you return uses the guardrail agent's reasoning as the `output_info` and its boolean decision as the `tripwire_triggered` value. When `tripwire_triggered=True`, the SDK will prevent the input from reaching your main agent and raise an `InputGuardrailTripwireTriggered` exception.
You can now update your travel agent to use this more sophisticated guardrail:
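Swapping the guardrail in is a small change to the illustrative agent from earlier:

```python
from agents import Agent

travel_agent = Agent(
    name="Travel Assistant",
    instructions="Help users plan trips, find flights, and book accommodations.",
    input_guardrails=[llm_content_guardrail],
)
```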
Let's test the LLM-based guardrail with an inappropriate input:
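A sketch of such a test follows; the prompt is an illustrative example, and reading the guardrail's reasoning from the exception assumes the `guardrail_result` object the SDK attaches to `InputGuardrailTripwireTriggered`:

```python
import asyncio

from agents import InputGuardrailTripwireTriggered, Runner


async def main():
    try:
        result = await Runner.run(
            travel_agent,
            "Plan me a trip where I can bring goods back without declaring them at customs.",
        )
        print(result.final_output)
    except InputGuardrailTripwireTriggered as e:
        print("Request blocked by the content guardrail.")
        # The guardrail's output_info (the agent's reasoning) travels with the exception.
        print("Reasoning:", e.guardrail_result.output.output_info)


if __name__ == "__main__":
    asyncio.run(main())
```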
When you run this code, the request is blocked before the travel agent responds.
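With the illustrative test above, the output looks roughly like the following; the reasoning text is generated by the guardrail model, so the exact wording will vary between runs:

```text
Request blocked by the content guardrail.
Reasoning: The user is asking for help evading customs declarations, which is an
illegal activity that a travel assistant should not support.
```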
This output demonstrates how the LLM-based guardrail agent analyzes the input, provides reasoning for its decision, and blocks the inappropriate request. The guardrail agent can understand the context and intent behind the request, even when it doesn't use obvious prohibited keywords.
The `input_guardrails` parameter accepts a list of guardrail functions, which means you can attach multiple guardrails to a single agent for layered validation. When multiple guardrails are attached, they execute in the order they appear in the list, and all guardrails must pass for the input to proceed:
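For example, combining the two guardrails sketched earlier, with the fast keyword check listed before the LLM-based check:

```python
from agents import Agent

travel_agent = Agent(
    name="Travel Assistant",
    instructions="Help users plan trips, find flights, and book accommodations.",
    input_guardrails=[
        travel_content_guardrail,  # fast rule-based keyword check
        llm_content_guardrail,     # deeper LLM-based content analysis
    ],
)
```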
This design allows you to create layered validation strategies where fast rule-based checks catch obvious violations first, followed by more sophisticated LLM-based analysis for nuanced content evaluation. Each guardrail can focus on a specific aspect of validation while working together to provide comprehensive input protection.
In this lesson, you've mastered the implementation of input guardrails as a critical security layer for your OpenAI agent workflows. You learned how guardrails differ from the lifecycle hooks covered in the previous lesson, serving as specialized validation-focused functions that protect your agents from inappropriate or malicious inputs before processing begins.
Your security foundation now includes three complementary layers: secure data handling through `RunContextWrapper`, comprehensive workflow monitoring through lifecycle hooks, and proactive input validation through guardrails. In the upcoming practice exercises, you'll apply these input guardrail implementation skills to build more sophisticated validation scenarios, test edge cases, and explore advanced patterns for protecting your agent systems. After mastering input validation through hands-on practice, you'll learn about output guardrails in the next lesson to complete your comprehensive agent security toolkit.
