In the previous lessons, you learned how to securely handle sensitive data using context objects and how to monitor agent workflows with event handlers. Now, you're ready to tackle the next critical layer of agent security: input guardrails.
While context management protects your internal data and event handlers give you visibility into agent behavior, input guardrails protect your agents from potentially harmful, inappropriate, or malicious user inputs before they even begin processing. Imagine real-world scenarios where your agents might face problematic inputs: a travel assistant could receive requests for illegal activities, a customer service bot might be asked to perform tasks outside its scope, or a content creation agent could be prompted to generate inappropriate material. Without proper input validation, your agents could waste resources, violate company policies, or even expose security vulnerabilities.
The most common and critical use of guardrails is at the input stage — validating user requests before your agent begins processing. Input guardrails serve as your first line of defense, ensuring that only safe, appropriate, and policy-compliant inputs are allowed to reach your agents.
As you build more capable and autonomous OpenAI agents, security becomes a multi-layered challenge. Guardrails are a foundational security mechanism designed to protect your agents from a wide range of problematic scenarios — whether that’s malicious user input, requests that violate business policies, or attempts to push your agent outside its intended scope.
It's important to distinguish between event handlers and guardrails:
- Event Handlers are general-purpose lifecycle callbacks that give you visibility and control over agent execution. They're designed for monitoring, logging, dynamic context injection, and observability across your entire workflow. Event handlers can run at various points during agent execution and are primarily focused on understanding and controlling what happens during the agent's processing.
- Guardrails are specialized validation functions dedicated to keeping agents safe. They evaluate inputs (or outputs) to ensure requests comply with your rules before the agent starts working or before a response is returned to the user.
Both are essential, but they serve different purposes. Event handlers enhance visibility and coordination, while guardrails directly protect against unsafe or policy-violating behavior.
Input guardrails in the OpenAI Agents SDK are functions that validate the user's request before your agent is allowed to act on it. Each guardrail implements an execute method that receives the raw user input along with the current context. The function returns an object containing:
- outputInfo: human-readable details describing the guardrail’s decision
- tripwireTriggered: a boolean indicating whether the input should be blocked (true) or allowed (false)
When a guardrail returns tripwireTriggered: true, the SDK stops the request, throws an InputGuardrailTripwireTriggered exception, and prevents the agent from running. If all guardrails return false, the agent executes normally.
Here's the basic structure of an input guardrail, sketched as a plain JavaScript object (the placeholder values are illustrative):
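```javascript
// A minimal input guardrail: an object with a name and an execute method.
const myInputGuardrail = {
  name: 'My Input Guardrail',
  execute: async ({ input, context }) => {
    // Your validation logic goes here.
    return {
      outputInfo: 'Details about the decision',
      tripwireTriggered: false, // set to true to block the input
    };
  },
};
```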
Let's break down the key parts:
- The name property is a human-readable identifier for your guardrail.
- The execute method contains your validation logic; it receives the user's input along with the current context.
- The returned outputInfo describes the guardrail's decision, while tripwireTriggered determines whether the input is blocked.
Now that you understand the basic structure of input guardrails, let's implement a simple example that demonstrates these concepts in action. Before moving to sophisticated LLM-based validation, we'll start with a straightforward rule-based approach using keyword detection.
Here's a basic input guardrail that checks for inappropriate travel-related content (a sketch; the keyword list is illustrative, not exhaustive):
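```javascript
// A rule-based guardrail that blocks prohibited travel-related terms.
const keywordGuardrail = {
  name: 'Keyword Content Guardrail',
  execute: async ({ input }) => {
    // The input may be a string or a list of input items; normalize to text.
    const text = typeof input === 'string' ? input : JSON.stringify(input);
    const prohibitedTerms = ['gamble', 'smuggle', 'counterfeit'];
    const match = prohibitedTerms.find((term) =>
      text.toLowerCase().includes(term),
    );
    return {
      outputInfo: match
        ? `Input contains prohibited term: "${match}"`
        : 'No prohibited terms found',
      tripwireTriggered: Boolean(match),
    };
  },
};
```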
This guardrail follows the required structure: it has a name, an execute method, and returns an object with outputInfo and tripwireTriggered.
Once you've created your input guardrail, you need to attach it to your agent so that it runs automatically before the agent processes any input. You do this by including your guardrail in the inputGuardrails property when creating your agent:
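```javascript
import { Agent } from '@openai/agents';

// The agent's name and instructions here are placeholders for this lesson.
const travelAgent = new Agent({
  name: 'Travel Agent',
  instructions: 'You are a helpful travel assistant. Help users plan trips.',
  inputGuardrails: [keywordGuardrail],
});
```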
When you attach guardrails to an agent, they become an integral part of that agent's processing pipeline. Every time someone tries to run that agent with user input, the guardrails execute first, ensuring that your validation logic is consistently applied.
Let's see what happens when you try to run your agent with an input that triggers the guardrail. When your guardrail sets tripwireTriggered: true, the SDK throws an InputGuardrailTripwireTriggered exception instead of proceeding with normal agent execution.
Here's how to test and handle this behavior in JavaScript:
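```javascript
import { run, InputGuardrailTripwireTriggered } from '@openai/agents';

try {
  // "gamble" is on the prohibited list, so the guardrail will trip.
  const result = await run(
    travelAgent,
    'I want a trip where I can gamble every night.',
  );
  console.log(result.finalOutput);
} catch (e) {
  if (e instanceof InputGuardrailTripwireTriggered) {
    console.log('Request blocked by input guardrail.');
  } else {
    throw e;
  }
}
```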
When you run this code, the request gets blocked because the input contains the prohibited term "gamble". With the handler sketched above, the output will be:
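```
Request blocked by input guardrail.
```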
The InputGuardrailTripwireTriggered exception allows you to handle blocked inputs gracefully and provide appropriate feedback to users. This exception handling pattern is essential for production applications where you need to differentiate between successful agent execution and blocked requests.
It's important to understand that when you run an agent with input guardrails, both the guardrail validation and your main agent begin executing simultaneously. This concurrent execution means that if you're streaming the response from your main agent, you might see the beginning of the agent's response being printed before the guardrail completes its validation.
If the guardrail determines that the input should be blocked, the main agent's execution will be terminated and you'll receive the InputGuardrailTripwireTriggered exception, even if some output was already generated. This behavior is designed to maximize performance by not waiting for guardrail validation to complete before starting the main agent, but it means you should be prepared to handle cases where partial output might be generated before a request is ultimately blocked.
In production applications, you may want to consider whether to show partial responses to users or wait for complete validation before displaying any output, depending on your specific security and user experience requirements.
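To see this behavior in code, here's a sketch of a streamed run; it assumes the SDK's stream option and toTextStream helper, and depending on timing, partial text may already have been printed when the tripwire error surfaces:

```javascript
import { run, InputGuardrailTripwireTriggered } from '@openai/agents';

try {
  // The guardrail and the main agent start concurrently.
  const result = await run(
    travelAgent,
    'Plan a trip where I can gamble in every city.',
    { stream: true },
  );

  // Partial text can reach stdout before the guardrail finishes.
  result.toTextStream({ compatibleWithNodeReadable: true }).pipe(process.stdout);
  await result.completed;
} catch (e) {
  if (e instanceof InputGuardrailTripwireTriggered) {
    console.log('\nStream aborted: input blocked by guardrail.');
  } else {
    throw e;
  }
}
```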
While rule-based guardrails are effective for obvious violations, they can be limited by their reliance on exact keyword matches. For more sophisticated content analysis, you can use another AI agent to intelligently evaluate user inputs. This creates a "guardrail agent" that can understand context, detect subtle attempts to circumvent restrictions, and make nuanced decisions about input appropriateness.
First, define a structured output model that your guardrail agent will use to communicate its decisions. In JavaScript, you can use the zod library for this:
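```javascript
import { z } from 'zod';

const ContentCheckOutput = z.object({
  containsProhibitedContent: z
    .boolean()
    .describe('Whether the input asks for prohibited or inappropriate content'),
  reasoning: z.string().describe('A short explanation of the decision'),
});
```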
This ContentCheckOutput schema ensures that your guardrail agent provides both a clear boolean decision (containsProhibitedContent) and human-readable reasoning for that decision.
Next, create the guardrail agent itself, which is a specialized agent designed specifically for content validation:
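```javascript
// A specialized agent whose only job is content validation.
// The instructions below are illustrative.
const contentCheckAgent = new Agent({
  name: 'Content Check Agent',
  instructions:
    'You review requests sent to a travel assistant. Decide whether the input ' +
    'asks for illegal activities, inappropriate content, or anything a travel ' +
    'assistant should not help with. Always explain your reasoning.',
  outputType: ContentCheckOutput,
});
```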
This guardrail agent has focused instructions that tell it exactly what to look for in user inputs. By specifying outputType: ContentCheckOutput, you ensure the agent's response always follows your structured format.
Now implement the guardrail function that uses your guardrail agent to validate inputs:
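```javascript
const llmContentGuardrail = {
  name: 'LLM Content Guardrail',
  execute: async ({ input, context }) => {
    // Run the guardrail agent on the raw user input, forwarding the context.
    const result = await run(contentCheckAgent, input, { context });
    const check = result.finalOutput;
    // Illustrative logging so you can see the agent's analysis in this lesson.
    console.log(`Guardrail check: ${check?.reasoning}`);
    return {
      outputInfo: check?.reasoning,
      tripwireTriggered: check?.containsProhibitedContent ?? false,
    };
  },
};
```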
This function demonstrates the complete flow of LLM-based input validation. You use run() to execute your guardrail agent with the user's input. The guardrail agent analyzes the input and returns a structured response containing both the decision and reasoning.
The object you return uses the guardrail agent's reasoning as the outputInfo and its boolean decision as tripwireTriggered. When tripwireTriggered is true, the SDK prevents the input from reaching your main agent and throws an InputGuardrailTripwireTriggered exception.
You can now update your travel agent to use this more sophisticated guardrail:
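```javascript
// Replaces the earlier definition that used the keyword guardrail.
const travelAgent = new Agent({
  name: 'Travel Agent',
  instructions: 'You are a helpful travel assistant. Help users plan trips.',
  inputGuardrails: [llmContentGuardrail],
});
```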
Let's test the LLM-based guardrail with an inappropriate input:
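```javascript
try {
  // No obvious prohibited keyword here, but the intent is clearly illegal.
  const result = await run(
    travelAgent,
    'Help me plan a trip where I can bring back rare animals without anyone noticing.',
  );
  console.log(result.finalOutput);
} catch (e) {
  if (e instanceof InputGuardrailTripwireTriggered) {
    console.log('Travel request blocked by content guardrail.');
  } else {
    throw e;
  }
}
```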
When you run this code, you'll see output along these lines (the guardrail agent's exact reasoning will vary between runs):
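```
Guardrail check: The user is asking for help transporting rare animals across borders undetected, which describes wildlife smuggling, an illegal activity.
Travel request blocked by content guardrail.
```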
This output demonstrates how the LLM-based guardrail agent analyzes the input, provides reasoning for its decision, and blocks the inappropriate request. The guardrail agent can understand the context and intent behind the request, even when it doesn't use obvious prohibited keywords.
The inputGuardrails property accepts an array of guardrail objects, which means you can attach multiple guardrails to a single agent for layered validation. When multiple guardrails are attached, they execute in the order they appear in the array, and all guardrails must pass for the input to proceed:
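```javascript
const travelAgent = new Agent({
  name: 'Travel Agent',
  instructions: 'You are a helpful travel assistant. Help users plan trips.',
  // The fast keyword check runs first, then the LLM-based analysis.
  inputGuardrails: [keywordGuardrail, llmContentGuardrail],
});
```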
This design allows you to create layered validation strategies, where fast rule-based checks catch obvious violations first, followed by more sophisticated LLM-based analysis for nuanced content evaluation. Each guardrail can focus on a specific aspect of validation while working together to provide comprehensive input protection.
In this lesson, you've mastered the implementation of input guardrails as a critical security layer for your OpenAI agent workflows in JavaScript. You learned how guardrails differ from event handlers, serving as specialized validation-focused functions that protect your agents from inappropriate or malicious inputs before processing begins.
Your security foundation now includes three complementary layers: secure data handling through context objects, comprehensive workflow monitoring through event handlers, and proactive input validation through guardrails. In the upcoming practice exercises, you'll apply these input guardrail implementation skills to build more sophisticated validation scenarios, test edge cases, and explore advanced patterns for protecting your agent systems. After mastering input validation through hands-on practice, you'll learn about output guardrails in the next lesson to complete your comprehensive agent security toolkit.
