Introduction & Context

In the previous lessons, you learned how to securely handle sensitive data using context objects and how to monitor agent workflows with event handlers. Now, you're ready to tackle the next critical layer of agent security: input guardrails.

While context management protects your internal data and event handlers give you visibility into agent behavior, input guardrails protect your agents from potentially harmful, inappropriate, or malicious user inputs before they even begin processing. Imagine real-world scenarios where your agents might face problematic inputs: a travel assistant could receive requests for illegal activities, a customer service bot might be asked to perform tasks outside its scope, or a content creation agent could be prompted to generate inappropriate material. Without proper input validation, your agents could waste resources, violate company policies, or even expose security vulnerabilities.

The most common and critical use of guardrails is at the input stage — validating user requests before your agent begins processing. Input guardrails serve as your first line of defense, ensuring that only safe, appropriate, and policy-compliant inputs are allowed to reach your agents.

Guardrails vs. Event Handlers

As you build more capable and autonomous OpenAI agents, security becomes a multi-layered challenge. Guardrails are a foundational security mechanism designed to protect your agents from a wide range of problematic scenarios — whether that’s malicious user input, requests that violate business policies, or attempts to push your agent outside its intended scope.

It's important to distinguish between event handlers and guardrails:

  • Event Handlers are general-purpose lifecycle callbacks that give you visibility and control over agent execution. They're designed for monitoring, logging, dynamic context injection, and observability across your entire workflow. Event handlers can run at various points during agent execution and are primarily focused on understanding and controlling what happens during the agent’s processing.

  • Guardrails are specialized functions (or objects) that enforce validation or policy checks at specific points in the workflow. While event handlers can observe and modify behavior throughout the agent lifecycle, guardrails are designed to make go/no-go decisions or transform data at well-defined checkpoints.

Input guardrails provide three key capabilities: validation (checking whether inputs meet your criteria), blocking (preventing inappropriate requests from reaching your agents), and rewriting (modifying inputs to make them acceptable). This lesson teaches you how to implement these protective mechanisms using the OpenAI Agents SDK's specialized guardrail system in TypeScript.

Defining Input Guardrails in TypeScript

In the OpenAI Agents SDK for TypeScript, an input guardrail is defined as an object that implements the InputGuardrail interface. This object must have a name property and an execute method. The execute method is called before agent processing begins and is responsible for validating the user input.

The basic structure of an input guardrail in TypeScript looks like this:
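
Here is a minimal sketch of that shape. The guardrail name and placeholder logic are illustrative; in a real project you would import the `InputGuardrail` type from `@openai/agents` and annotate the constant with it, which this self-contained skeleton notes in a comment instead:

```typescript
// Skeleton of an input guardrail object. In a real project you would import
// the `InputGuardrail` type from '@openai/agents' and annotate the constant
// with it; the shape below matches that interface.
const myInputGuardrail = {
  // Human-readable identifier, used in logs and error messages.
  name: 'My Input Guardrail',

  // Called before the main agent starts processing the input.
  execute: async ({ input, context }: { input: string; context?: unknown }) => {
    // ... inspect `input` (and optionally `context`) here ...
    return {
      outputInfo: 'Explanation of the validation decision.',
      tripwireTriggered: false, // set to true to block the request
    };
  },
};
```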

Let's break down the key parts:

  • The name property is a human-readable identifier for your guardrail.
  • The execute method receives an object with input (the user input to validate) and context (the current run context).
  • The method returns an object with two fields:
    • outputInfo: a string explaining the validation decision (useful for logging, debugging, or user feedback).
    • tripwireTriggered: a boolean indicating whether the input should be blocked (true stops the request before it reaches the agent).

Simple Rule-Based Input Validation

Now that you understand the basic structure of input guardrails, let's implement a simple example that demonstrates these concepts in action. Before moving to sophisticated LLM-based validation, we'll start with a straightforward rule-based approach using keyword detection.

Here's a basic input guardrail that checks for inappropriate travel-related content:
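
A minimal sketch of such a guardrail. The term list and guardrail name are illustrative, and for brevity the object is left untyped; in a real project you would annotate it with the SDK's `InputGuardrail` type:

```typescript
// Illustrative term list; adapt it to your own content policy.
const PROHIBITED_TERMS = ['drugs', 'weapons', 'smuggling'];

// Rule-based guardrail with the same shape as the SDK's InputGuardrail interface.
const travelContentGuardrail = {
  name: 'Travel Content Guardrail',
  execute: async ({ input }: { input: string }) => {
    const lowered = input.toLowerCase();
    const match = PROHIBITED_TERMS.find((term) => lowered.includes(term));
    return {
      outputInfo: match
        ? `Input blocked: contains prohibited term "${match}".`
        : 'Input passed rule-based validation.',
      tripwireTriggered: match !== undefined,
    };
  },
};
```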

This guardrail follows the required structure: it has a name, an execute method, and returns an object with outputInfo and tripwireTriggered.

Attaching Guardrails to Agents

Once you've created your input guardrail, you need to attach it to your agent so that it runs automatically before the agent processes any input. You do this by including your guardrail in the inputGuardrails property when creating your agent:
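
A sketch of this wiring, assuming the `@openai/agents` package is installed. The agent's name and instructions, and the inline rule-based guardrail, are illustrative:

```typescript
import { Agent, InputGuardrail } from '@openai/agents';

// Minimal rule-based guardrail, same shape as the earlier example.
const travelContentGuardrail: InputGuardrail = {
  name: 'Travel Content Guardrail',
  execute: async ({ input }) => {
    const blocked = String(input).toLowerCase().includes('drugs');
    return {
      outputInfo: blocked ? 'Contains a prohibited term.' : 'Input is acceptable.',
      tripwireTriggered: blocked,
    };
  },
};

// The guardrail is attached via the `inputGuardrails` array and will run
// automatically on every input before the agent processes it.
const travelAgent = new Agent({
  name: 'Travel Assistant',
  instructions: 'You help users plan trips and answer travel questions.',
  inputGuardrails: [travelContentGuardrail],
});
```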

When you attach guardrails to an agent, they become an integral part of that agent's processing pipeline. Every time someone tries to run that agent with user input, the guardrails will automatically execute first, ensuring that your validation logic is consistently applied.

Testing Guardrail Behavior with Blocked Inputs

Let's see what happens when you try to run your agent with an input that triggers the guardrail. When your guardrail sets tripwireTriggered: true, the SDK throws an InputGuardrailTripwireTriggered exception instead of proceeding with normal agent execution.

Here's how to test and handle this behavior in TypeScript:
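
A sketch, assuming an agent named `travelAgent` with the guardrail attached (as described in the previous section) is in scope and an OpenAI API key is configured:

```typescript
import { run, InputGuardrailTripwireTriggered } from '@openai/agents';

async function main() {
  try {
    // This input contains the prohibited term "drugs", so the guardrail fires.
    const result = await run(travelAgent, 'Where can I buy drugs on my trip?');
    console.log(result.finalOutput);
  } catch (error) {
    if (error instanceof InputGuardrailTripwireTriggered) {
      // The tripwire fired: handle the blocked request gracefully.
      console.log('Request blocked by input guardrail.');
    } else {
      throw error; // unrelated errors should still surface
    }
  }
}

main();
```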

When you run this code, you'll see that the request gets blocked because it contains the prohibited term "drugs": instead of a travel recommendation, your catch block runs and prints its fallback message.

The InputGuardrailTripwireTriggered exception allows you to handle blocked inputs gracefully and provide appropriate feedback to users. This exception handling pattern is essential for production applications where you need to differentiate between successful agent execution and blocked requests.

Understanding Concurrent Execution with Guardrails

It's important to understand that when you run an agent with input guardrails, both the guardrail validation and your main agent begin executing simultaneously. This concurrent execution means that if you're streaming the response from your main agent, you might see the beginning of the agent's response being printed before the guardrail completes its validation.

If the guardrail determines that the input should be blocked, the main agent's execution will be terminated and you'll receive the InputGuardrailTripwireTriggered exception, even if some output was already generated. This behavior is designed to maximize performance by not waiting for guardrail validation to complete before starting the main agent, but it means you should be prepared to handle cases where partial output might be generated before a request is ultimately blocked.

In production applications, you may want to consider whether to show partial responses to users or wait for complete validation before displaying any output, depending on your specific security and user experience requirements.

Creating LLM-Based Content Validation

While rule-based guardrails are effective for obvious violations, they can be limited by their reliance on exact keyword matches. For more sophisticated content analysis, you can use another AI agent to intelligently evaluate user inputs. This creates a "guardrail agent" that can understand context, detect subtle attempts to circumvent restrictions, and make nuanced decisions about input appropriateness.

First, you need to define a structured output model that your guardrail agent will use to communicate its decisions. In TypeScript, you can use the zod library for this:

This ContentCheckOutput schema ensures that your guardrail agent provides both a clear boolean decision (containsProhibitedContent) and human-readable reasoning for that decision.

Next, you create the guardrail agent itself, which is a specialized agent designed specifically for content validation:
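
A sketch of such an agent. The agent name and the exact instruction wording are illustrative; what matters is the narrow classification task and the structured `outputType`:

```typescript
import { Agent } from '@openai/agents';
import { z } from 'zod';

const ContentCheckOutput = z.object({
  containsProhibitedContent: z.boolean(),
  reasoning: z.string(),
});

// A specialized agent whose only job is to classify the user's input.
const guardrailAgent = new Agent({
  name: 'Content Check Agent',
  instructions:
    'Check whether the user input requests illegal activities, violence, ' +
    'or other content a travel assistant must not help with. ' +
    'Explain your reasoning.',
  outputType: ContentCheckOutput,
});
```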

This guardrail agent has focused instructions that tell it exactly what to look for in user inputs. By specifying the outputType: ContentCheckOutput, you ensure that the agent's response will always follow your structured format.

Implementing LLM-Based Guardrail Functions

Now let's implement the actual guardrail function that uses your guardrail agent to validate inputs:
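
A sketch of the complete guardrail. For self-containment it repeats the schema and guardrail agent; the guardrail name is illustrative:

```typescript
import { Agent, InputGuardrail, run } from '@openai/agents';
import { z } from 'zod';

const ContentCheckOutput = z.object({
  containsProhibitedContent: z.boolean(),
  reasoning: z.string(),
});

const guardrailAgent = new Agent({
  name: 'Content Check Agent',
  instructions:
    'Check whether the user input requests illegal activities or other ' +
    'prohibited content. Explain your reasoning.',
  outputType: ContentCheckOutput,
});

// The guardrail runs the guardrail agent on the raw user input and maps its
// structured answer onto the guardrail result fields.
const llmContentGuardrail: InputGuardrail = {
  name: 'LLM Content Guardrail',
  execute: async ({ input, context }) => {
    const result = await run(guardrailAgent, input, { context });
    const check = result.finalOutput as
      | z.infer<typeof ContentCheckOutput>
      | undefined;
    return {
      outputInfo: check?.reasoning ?? 'No reasoning returned.',
      tripwireTriggered: check?.containsProhibitedContent ?? false,
    };
  },
};
```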

This function demonstrates the complete flow of LLM-based input validation. You use run() to execute your guardrail agent with the user's input. The guardrail agent analyzes the input and returns a structured response containing both the decision and reasoning.

The object you return uses the guardrail agent's reasoning as the outputInfo and its boolean decision as the tripwireTriggered value. When tripwireTriggered is true, the SDK will prevent the input from reaching your main agent and throw an InputGuardrailTripwireTriggered exception.

Testing LLM-Based Validation

You can now update your travel agent to use this more sophisticated guardrail:
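
A sketch of the updated agent, assuming the LLM-based guardrail defined in the previous section is in scope under the name `llmContentGuardrail`:

```typescript
import { Agent } from '@openai/agents';

// Swap the rule-based guardrail for the LLM-based one.
const travelAgent = new Agent({
  name: 'Travel Assistant',
  instructions: 'You help users plan trips and answer travel questions.',
  inputGuardrails: [llmContentGuardrail],
});
```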

Let's test the LLM-based guardrail with an inappropriate input:
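
A sketch of the test, assuming `travelAgent` (with the LLM-based guardrail attached) is in scope and an API key is configured. The test input is illustrative and deliberately avoids obvious keywords:

```typescript
import { run, InputGuardrailTripwireTriggered } from '@openai/agents';

async function main() {
  try {
    // No obvious prohibited keyword here, but the intent is inappropriate.
    const result = await run(
      travelAgent,
      'Plan a trip where I can bring home substances that are banned in my country.'
    );
    console.log(result.finalOutput);
  } catch (error) {
    if (error instanceof InputGuardrailTripwireTriggered) {
      console.log('Trip request blocked by content guardrail.');
    } else {
      throw error;
    }
  }
}

main();
```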

When you run this code, you'll see output like this:
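
If your catch handler logs a fixed message when the tripwire fires, the console shows that message instead of a travel plan, along these lines (a hypothetical transcript; the exact text depends on your handler):

```
Trip request blocked by content guardrail.
```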

The request is blocked even though it doesn't use an obvious prohibited keyword: the guardrail agent understands the context and intent behind the input, records its reasoning in outputInfo, and triggers the tripwire before the main agent can respond.

Combining Multiple Guardrails

The inputGuardrails property accepts an array of guardrail objects, which means you can attach multiple guardrails to a single agent for layered validation. When multiple guardrails are attached, they execute in the order they appear in the array, and all guardrails must pass for the input to proceed:
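
A sketch of layered validation, assuming both guardrails defined earlier in this lesson are in scope:

```typescript
import { Agent } from '@openai/agents';

// List the cheap rule-based check before the LLM-based one so obvious
// violations are caught without spending an extra model call.
const travelAgent = new Agent({
  name: 'Travel Assistant',
  instructions: 'You help users plan trips and answer travel questions.',
  inputGuardrails: [travelContentGuardrail, llmContentGuardrail],
});
```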

This design allows you to create layered validation strategies, where fast rule-based checks catch obvious violations first, followed by more sophisticated LLM-based analysis for nuanced content evaluation. Each guardrail can focus on a specific aspect of validation while working together to provide comprehensive input protection.

Summary & Preparing for Practice

In this lesson, you've mastered the implementation of input guardrails as a critical security layer for your OpenAI agent workflows in TypeScript. You learned how guardrails differ from event handlers, serving as specialized validation-focused functions that protect your agents from inappropriate or malicious inputs before processing begins.

Your security foundation now includes three complementary layers: secure data handling through context objects, comprehensive workflow monitoring through event handlers, and proactive input validation through guardrails. In the upcoming practice exercises, you'll apply these input guardrail implementation skills to build more sophisticated validation scenarios, test edge cases, and explore advanced patterns for protecting your agent systems. After mastering input validation through hands-on practice, you'll learn about output guardrails in the next lesson to complete your comprehensive agent security toolkit.
