Introduction

Welcome to the lesson on adding input rails to enhance the safety and reliability of interactions with language models. Input rails play a crucial role in ensuring that user interactions with chatbots remain appropriate and secure. In this lesson, we will explore how input rails can be configured and implemented to filter and validate user inputs effectively. This builds upon the foundational knowledge of NeMo and Colang you gained in the previous lesson.

Understanding Input Rails

Input rails are mechanisms that guide and control user interactions with language models by filtering and validating inputs. They are essential for maintaining the integrity of chatbot conversations by preventing inappropriate or harmful interactions. Input validation ensures that user messages comply with predefined policies, safeguarding both the user and the system.

Crafting Effective Prompts for Input Validation

Crafting clear and comprehensive prompts is vital for effective input validation. To set up input rails, we use a new YAML configuration file, called prompts.yml. Let's analyze the prompt used for checking user input:

We create a prompt called self_check_input, with the content defined as above. A well-crafted prompt includes the following three areas:

  1. Task Definition: The task is to verify if the user message adheres to the defined policy.

  2. Policy Guidelines: The policy outlines what constitutes inappropriate or unsafe content.

  3. User Request Evaluation: The prompt evaluates the user request against the policy, determining if it should be blocked.

When implementing actions such as the input rails, it's important to note that action names, such as self_check_input, are predefined and mapped to specific functions written in Python source code, and must be used consistently within the configuration files to ensure proper functionality.

Actions

On top of the basic rails and flows in Colang, NeMo Guardrails supports actions. Actions are custom functions written in Python, available to be invoked from flows. NeMo Guardrails comes bundles with a list actions, and it is possible to add even more complex functionalities yourself.

Two commonly used actions are self_check_input and self_check_output.

Variables

Building on top of actions, we can get into variables. In Colang, context variables are used to store dynamic information.

You can set the value of a context variable directly or by using the result of an action execution. For example:

Context variables in Colang are dynamically typed, allowing them to hold various data types, including booleans, integers, floats, and strings. While variables can also store complex types like lists and dictionaries, these types must be assigned through the return value of an action, as they cannot be directly initialized to such values.

For example, if an action returns a list of recommended destinations, the variable assignment might look like: $destinations = execute fetch_recommendations. You can later iterate through or access specific elements in this list in your logic, though Colang’s syntax for this is currently limited and more advanced manipulations may require support within the action itself.

Conditions

We can use these variables in conditions to craft more complex flows. Colang supports two types of conditions: if/else and when/else when.

To alter the greeting based on variables, use if:

For branching based on user response, use when:

The if/else statement evaluates context variables, while when/else when branches based on user intent.

Implementing Input Flow Checks

With this context covered, it is time to define the flow that is responsible for the input check. Let's examine the logic behind the self check input flow:

We execute the self_check_input action, which reads the prompt defined in the prompts.yml file, and store the result (Yes or No) in the safe variable. Next, we check whether it is safe or not. If it is not safe, we call the pre-defined utterance bot refuse to respond, and stop the thread.

The bot refuse to respond is defined as the utterance I'm sorry, I can't respond to that.

Configuring Input Rails in YAML

We are finally ready to utilize our implemented input rail. To set it up, we use our YAML configuration file. Let's walk through the structure of such a file using the provided example, building on top of our config/config.yml file. By adding the flow name (self check input) under the section rails.input.flows, we activate it:

With the current setup, NeMo Guardrails will run the self check input flow, which in turn calls the self_check_input action with the accompanying prompt, verifying the safety of the user prompt and whether it should be blocked or not.

Input flows are referenced by name in the rails.input.flows list. The order in which they appear in this list matters when using multiple flows, as they will execute sequentially. Consider placing flows with the highest-priority checks (e.g., detecting offensive content) at the top to minimize processing time.

Additionally, due to the non-deterministic nature of language models, the bot may occasionally fail to follow the prompt exactly, which can impact response accuracy. Keep this in mind when evaluating the bot's performance.

Summary and Preparation for Practice

In this lesson, you learned how to configure and implement input rails to enhance the safety of interactions with language models. By understanding the structure of YAML configuration files and the logic of input flow checks, you are now equipped to apply these concepts in practice. As you proceed to the practice exercises, remember to configure and test input rails in a simulated environment. Congratulations on reaching the end of the course! Your dedication to learning and applying these skills is commendable. Keep exploring and refining your prompt engineering abilities.

Sign up
Join the 1M+ learners on CodeSignal
Be a part of our community of 1M+ users who develop and demonstrate their skills on CodeSignal