Welcome to the second unit of our course on building a RAG-powered chatbot! In the previous lesson, we built a document processor that forms the retrieval component of our RAG system. Today, we'll focus on the conversational aspect by creating a chat engine that can maintain conversation history.
While our document processor is excellent at finding relevant information, a complete RAG system needs a way to interact with users in a natural, conversational manner. This is where our chat engine comes in. The chat engine is responsible for managing the conversation flow, formatting prompts with relevant context, and maintaining a history of the interaction.
The chat engine we'll build today will:
- Manage interactions with the language model
- Maintain a history of the conversation
- Format prompts with relevant context from our document processor
- Provide methods to reset the conversation when needed
By the end of this lesson, you'll have a fully functional chat engine that can be integrated with the document processor we built previously to create a complete RAG system. Let's get started!
Let's begin by setting up the basic structure of our `ChatEngine` class. This class will encapsulate all the functionality needed for managing conversations with the language model.
```python
from langchain_openai import ChatOpenAI
from langchain.schema.messages import SystemMessage, HumanMessage, AIMessage
from langchain.prompts import ChatPromptTemplate


class ChatEngine:
    def __init__(self):
        self.chat_model = ChatOpenAI()
        self.system_message = (
            "You are a helpful assistant that ONLY answers questions based on the "
            "provided context. If no relevant context is provided, politely inform "
            "the user that you don't have the necessary information to answer their "
            "question accurately."
        )

        # Initialize conversation history with system message
        self.conversation_history = [SystemMessage(content=self.system_message)]

        # Define the prompt template
        self.prompt = ChatPromptTemplate.from_template(
            "Answer the following question based ONLY on the provided context. "
            "If the context doesn't contain relevant information to answer the "
            "question, respond with 'I don't have enough information in the "
            "provided context to answer this question.'\n\n"
            "Context:\n{context}\n\n"
            "Question: {question}"
        )
```
In this initialization method, we're setting up several important components:
- Chat Model: We initialize `self.chat_model` using `ChatOpenAI()` to create an instance of the OpenAI chat model for generating responses.
- System Message: We define strict instructions that guide the AI's behavior, telling it to only answer questions based on provided context and to politely decline if no relevant context is available.
- Conversation History: We initialize this as a list containing our system message, which will store the entire conversation using LangChain's message schema classes.
- Prompt Template: We create a template using `ChatPromptTemplate.from_template()` that defines how we'll format prompts with placeholders for context and questions, with clear instructions to only use the provided context.
This structure ensures our chat engine can properly communicate with the language model while maintaining conversation state. The system message establishes strict boundaries for the AI's responses, which is particularly important in a RAG system where we want answers based solely on retrieved information.
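To get a feel for how the template works, you can format it by hand and inspect the result. This is just an exploratory snippet; the sample context and question are made up for illustration:

```python
from chat_engine import ChatEngine

engine = ChatEngine()

# Fill the placeholders manually to inspect the prompt the model will see
messages = engine.prompt.format_messages(
    context="Madrid is the capital of Spain.",
    question="What is the capital of Spain?"
)
print(messages[0].content)
```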
Now that we have our basic class structure, let's implement the core functionality: sending messages and receiving responses. The `send_message` method will handle this process.
```python
def send_message(self, user_message, context=""):
    """Send a message to the chat engine and get a response"""
    # Format the messages using the prompt template
    messages = self.prompt.format_messages(
        context=context,
        question=user_message
    )

    # Add the current message to the conversation history
    self.conversation_history.append(HumanMessage(content=user_message))

    # Get the response from the model
    response = self.chat_model.invoke(messages)

    # Add the response to conversation history
    self.conversation_history.append(AIMessage(content=response.content))

    return response.content
```
The `send_message` method is the heart of our chat engine. It takes two parameters: `user_message` (the question from the user) and `context` (optional relevant information from our document processor).
- Format Messages: We use our prompt template to fill in placeholders with the provided context and question.
- Update History: We add the user's message to our conversation history as a `HumanMessage`.
- Get Response: We invoke the chat model with our formatted messages using `self.chat_model.invoke(messages)`.
- Store Response: We add the AI's response to conversation history as an `AIMessage`.
- Return Result: We return the content of the response to be displayed to the user.
While we maintain conversation history, we don't currently use it in prompts to the model. This is intentional for our RAG system, where each query is answered based on retrieved context rather than previous exchanges. In future enhancements, you could include relevant parts of conversation history in the prompt.
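If you want to experiment with that enhancement, one option is to splice recent history into the messages passed to the model. The sketch below is a hypothetical variant, not part of this lesson's `ChatEngine`, and the four-message window is an arbitrary illustrative choice:

```python
def send_message_with_history(self, user_message, context=""):
    """Hypothetical variant that also shows the model recent exchanges."""
    formatted = self.prompt.format_messages(context=context, question=user_message)

    # Keep the system message, then the last few exchanges, then the new prompt
    system = self.conversation_history[0]
    recent = self.conversation_history[1:][-4:]  # up to two question/answer pairs
    messages = [system] + recent + formatted

    self.conversation_history.append(HumanMessage(content=user_message))
    response = self.chat_model.invoke(messages)
    self.conversation_history.append(AIMessage(content=response.content))
    return response.content
```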
An important aspect of any chat system is the ability to manage the conversation state. Let's implement a method to reset the conversation when needed:
```python
def reset_conversation(self):
    """Reset the conversation history"""
    self.conversation_history = [SystemMessage(content=self.system_message)]
```
The `reset_conversation` method is straightforward but essential. It resets the conversation history to its initial state, containing only the system message. This is useful in several scenarios:
- When starting a new topic or conversation
- When the conversation has gone on for too long and might be approaching token limits
- When the user explicitly requests to start fresh
Managing conversation state is crucial for long-running chat applications. Language models have context window limitations, meaning they can only process a certain number of tokens at once. By providing a reset mechanism, we ensure that users can continue using the system indefinitely without running into these limitations. In a more advanced implementation, you might consider automatically truncating the conversation history when it reaches a certain length or implementing a more sophisticated memory system that summarizes previous exchanges. For our current purposes, however, a simple reset functionality is sufficient.
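If you later want to try automatic truncation, a small helper like the hypothetical one below could be added to `ChatEngine`; the default cap of 20 messages is an arbitrary illustrative value:

```python
def trim_history(self, max_messages=20):
    """Drop the oldest exchanges once history exceeds max_messages,
    always preserving the system message at index 0."""
    if len(self.conversation_history) > max_messages:
        system = self.conversation_history[0]
        recent = self.conversation_history[-(max_messages - 1):]
        self.conversation_history = [system] + recent
```

You could call this at the end of `send_message` so the stored history never grows without bound.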
Now that we've built our chat engine, let's test it with some examples to see how it works in practice.
First, let's see how the chat engine responds when we don't provide any context:
```python
from chat_engine import ChatEngine

# Initialize the chat engine
chat_engine = ChatEngine()

# Send a message without context
query = "What is the capital of Spain?"

# Get response without providing any context
response = chat_engine.send_message(query)

# Display the question and answer
print(f"Question without context: {query}")
print(f"Answer: {response}")
```
When you run this code, you'll see output similar to:
```text
Question without context: What is the capital of Spain?
Answer: I don't have enough information in the provided context to answer this question.
```
As expected, the chat engine follows the instructions in our system message and refuses to answer without context. This is the desired behavior for our RAG system, which should only provide information based on the context it's given.
Now, let's see how the chat engine responds when we provide relevant context:
```python
# Define context about Madrid
context = """Madrid is the capital and most populous city of Spain.
The Royal Palace, Plaza Mayor, and Prado Museum are among its most famous landmarks."""

# Ask the same question but with context this time
query = "What is the capital of Spain?"

# Get response with context provided
response = chat_engine.send_message(query, context)

# Display the question and answer
print(f"\nQuestion with context: {query}")
print(f"Answer: {response}")
```
The output will look something like:
```text
Question with context: What is the capital of Spain?
Answer: Madrid
```
With context provided, the chat engine now gives an accurate answer based on the information available. This demonstrates how our RAG system will use retrieved documents to answer user queries.
Finally, let's test the reset functionality:
```python
# Reset the conversation history
chat_engine.reset_conversation()

# Define completely new context about a different topic
context = """Python is a high-level, interpreted programming language created by
Guido van Rossum and first released in 1991. It emphasizes code readability with
its notable use of significant whitespace."""

# Ask a question about the new topic
query = "When was Python first released?"

# Get response after reset with new context
response = chat_engine.send_message(query, context)

# Display the new question and answer
print(f"\nNew question with context: {query}")
print(f"Answer: {response}")
```
The output might look like this:
```text
New question with context: When was Python first released?
Answer: Python was first released in 1991.
```
After the reset, the conversation history contains only the system message again, and the chat engine responds based entirely on the new information about the Python programming language. This reset functionality is important for managing long conversations and allowing users to switch topics cleanly.
In this lesson, we've built a powerful chat engine for our RAG chatbot. We've learned how to:
- Create a `ChatEngine` class that manages conversations with a language model
- Define system messages to guide the AI's behavior
- Maintain conversation history using LangChain's message schema
- Format prompts with context and questions using templates
- Implement methods to send messages and reset conversations
- Test our chat engine with various scenarios
Our chat engine complements the document processor we built in the previous lesson. While the document processor handles the retrieval of relevant information, the chat engine manages the conversation flow and presents this information to the user in a natural way. In the next unit, we'll integrate the document processor and chat engine to create a complete RAG system. This integration will allow our chatbot to automatically retrieve relevant context from documents based on user queries, creating a seamless experience where users can ask questions about their documents and receive informed, contextual responses.
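As a preview of where this is headed, here is a rough sketch of what that glue code might look like. The `document_processor` import and its `retrieve_relevant_chunks` method are assumptions for illustration only; the real integration is the subject of the next unit:

```python
from chat_engine import ChatEngine
from document_processor import DocumentProcessor  # from the previous lesson

processor = DocumentProcessor()
engine = ChatEngine()

query = "What does this document say about Madrid?"

# Hypothetical retrieval call; the method name is assumed for this sketch
relevant_chunks = processor.retrieve_relevant_chunks(query)
context = "\n\n".join(relevant_chunks)

print(engine.send_message(query, context))
```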
Get ready to practice what you've learned and take your RAG chatbot to the next level!