Introduction

Welcome to today's lesson, where we will explore the concept of output rails. Output rails play a crucial role in ensuring the safety and reliability of responses generated by language models. Building on the foundation of input rails, output rails help filter and manage the content produced by the model, ensuring it aligns with predefined safety policies. In this lesson, we will delve into configuring output rails within the NVIDIA NeMo Guardrails framework and understand their significance in maintaining safe interactions.

Understanding Output Rails

Output rails are mechanisms designed to filter and manage the responses generated by language models. They ensure that the content produced by the model adheres to safety policies, preventing the dissemination of explicit, abusive, or harmful language. By implementing output rails, you can maintain a professional and respectful tone in all interactions, enhancing the overall safety and reliability of the language model.

Configuring Output Rails

To configure output rails, modify the config/config.yaml file, just as you did for input rails. This file tells NeMo Guardrails which flows to run when evaluating the assistant's responses. Begin by defining the output flow, which will check the assistant's responses against the safety policies:
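A minimal sketch of this configuration, using the built-in self check output flow, might look like the following:

```yaml
# config/config.yaml
rails:
  output:
    flows:
      # Run the built-in self-check flow on every bot response
      - self check output
```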

Here, the self check output flow is responsible for evaluating the assistant's responses.

Crafting Effective Output Prompts

Next, implement the self_check_output prompt in the config/prompts.yaml file. This prompt guides the output rail's decision-making process, so crafting it carefully is essential. Two practices help ensure compliance with the safety policies (a full example prompt follows this list):

  1. Define Clear Policies:
    Clearly define the safety policies that the assistant's responses must adhere to. This ensures that the model understands the criteria for evaluating responses.

  2. Provide Examples:
    Use examples to illustrate the types of responses that comply with the safety policies. This helps the model generalize the decision-making process. By providing clear policies and examples, you guide the model to evaluate responses accurately and consistently.
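Putting both practices together, a self_check_output prompt might look like the sketch below. The policy wording and the example line are illustrative; adapt them to your own safety policies.

```yaml
# config/prompts.yaml
prompts:
  - task: self_check_output
    content: |
      Your task is to check if the bot message below complies with the safety policy.

      Safety policy for bot messages:
      - must not contain explicit, abusive, or harmful language
      - must not contain offensive or disrespectful content
      - must keep a professional and respectful tone

      For example, "I'd be happy to help you reset your password." complies with the policy,
      while a message containing insults or profanity does not.

      Bot message: "{{ bot_response }}"

      Question: Should the message be blocked (Yes or No)?
      Answer:
```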

Defining the Flow

The most basic flow for checking the output mirrors the input check flow: it executes the self_check_output prompt, checks whether the response is safe, and decides what to do based on the result.
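In Colang, that flow might look like the following sketch, which mirrors the built-in self check output flow:

```colang
define flow self check output
  # Evaluate the bot's draft response with the self_check_output prompt
  $allowed = execute self_check_output

  if not $allowed
    # Replace the unsafe response with a refusal and stop further processing
    bot refuse to respond
    stop
```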

Introducing Subflows

Subflows are specialized types of flows designed for explicit invocation by other flows or subflows. Unlike regular flows, which are automatically triggered based on conversation context, subflows require a direct call to be executed. This is done using the do keyword followed by the subflow's name.

To create a subflow, define it with the necessary logic that can be reused across different parts of the conversation. For instance, a subflow for checking inventory might look like this:
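The sketch below assumes a hypothetical custom action named check_inventory that returns True when the item is in stock; the action and the bot message are illustrative, not part of NeMo Guardrails itself.

```colang
define subflow check inventory
  # Hypothetical custom action that returns True if the requested item is in stock
  $in_stock = execute check_inventory

  if not $in_stock
    bot inform item unavailable

define bot inform item unavailable
  "Sorry, that item is currently out of stock."
```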

In a main flow, you can call this subflow to ensure the item is available before proceeding with an order:
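Continuing the sketch above, a main flow can invoke the subflow with the do keyword; the user and bot message definitions here are illustrative:

```colang
define user request order
  "I'd like to order this item"
  "Can I buy this?"

define flow place order
  user request order
  # Run the inventory check before confirming the order
  do check inventory

  if $in_stock
    bot confirm order

define bot confirm order
  "Great, your order has been placed."
```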

Combining Our Output Flow with Subflows

To effectively combine a subflow with the output flow, you can create a subflow that handles specific tasks and then integrate it into the main output flow. For example, let's create a subflow that logs any blocked responses for further analysis:
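A sketch of such a subflow is shown below; log_blocked_response is a hypothetical custom action you would implement and register yourself, for example one that appends the blocked response to a log file:

```colang
define subflow log blocked response
  # Hypothetical custom action that records the blocked response for later analysis
  execute log_blocked_response
```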

Now, integrate this subflow into the self check output flow to ensure that any blocked responses are logged:
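Building on the self check output flow shown earlier, the integration might look like this:

```colang
define flow self check output
  $allowed = execute self_check_output

  if not $allowed
    # Log the blocked response before refusing to respond
    do log blocked response
    bot refuse to respond
    stop
```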

In this example, if the response is not allowed, the log blocked response subflow is called to log the occurrence of a blocked response before the bot refuses to respond. This integration allows for better tracking and analysis of blocked responses.

Note: Output flows, like input flows, are referenced by name in the rails.output.flows list. The order in which they appear in this list matters when using multiple flows, as they will execute sequentially. Consider placing flows with the highest-priority checks (e.g., detecting offensive content) at the top to minimize processing time.
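For example, with two output flows the configuration might look like the sketch below; the second flow name is a placeholder for any additional check you define:

```yaml
rails:
  output:
    flows:
      # Runs first: highest-priority safety check
      - self check output
      # Runs second: a placeholder for an additional custom output check
      - check output for disallowed topics
```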

Summary and Preparation for Practice

In this lesson, you learned about the importance of output rails in ensuring the safety and reliability of language model responses. We explored how to configure output rails in the config/config.yaml file and craft effective prompts to guide the output rail's decision-making process. By implementing output rails, you can maintain a professional and respectful tone in all interactions, enhancing the overall safety of the language model.

Congratulations on reaching the end of the course! You have gained valuable knowledge and skills in prompt engineering and LLM safety. As you move forward, I encourage you to apply what you've learned in real-world scenarios and continue refining your skills. Well done!
