Introduction

Welcome to the third lesson in our journey through "Scaling Up RAG with Vector Databases"! Well done, you're halfway through this course. In the previous lesson, you learned how to split or chunk your text data and store those chunks in a vector database collection. Now, we'll delve into retrieving the most relevant chunks for any given query and building an LLM prompt to produce more accurate, context-driven answers.

Retrieving the Most Relevant Chunks

Before your LLM can generate a coherent, context-rich answer, you need to fetch the right information. Your vector database (for instance, using Chroma) will rank which document chunks are most relevant for a given query.

Python
def retrieve_top_chunks(query, collection, top_k=2):
    """
    Retrieves the top_k chunks relevant to the given query from 'collection'.
    Returns a list of retrieved chunks, each containing 'chunk' text,
    'doc_id', and 'distance'.
    """
    # Search for top_k results matching the user's query
    results = collection.query(
        query_texts=[query],
        n_results=top_k
    )

    retrieved_chunks = []

    # Safeguard in case no results are found
    if not results['documents'] or not results['documents'][0]:
        return retrieved_chunks

    # Gather each retrieved chunk, along with its distance score
    for i in range(len(results['documents'][0])):
        retrieved_chunks.append({
            "chunk": results['documents'][0][i],
            "doc_id": results['ids'][0][i],
            "distance": results['distances'][0][i]
        })
    return retrieved_chunks

Let's break down this code in detail:

  • Function Definition:

    • retrieve_top_chunks takes three parameters:
      • query: The user's question or search term;
      • collection: The Chroma collection object containing our embedded documents;
      • top_k: The number of most relevant chunks to retrieve (default is 2).
  • Vector Search:

    • The collection.query() method performs a vector-based similarity search to pinpoint which chunks are most aligned with the query.
    • query_texts=[query] passes the user's query as a list (Chroma's API expects a list).
    • n_results=top_k specifies how many matching chunks to return.
  • Results Structure:

    • The query returns a dictionary with multiple keys:
      • 'documents': Contains the actual text chunks;
      • 'ids': Contains the document identifiers;
      • 'distances': Contains the distance scores, which indicate how semantically close each chunk is to your query — the lower the distance, the better the match.
    • Each of these keys maps to a nested list structure: [[item1, item2, ...]].
  • Processing Results:

    • For each result, the function creates a dictionary with three key pieces of information:
      • "chunk": The actual text content from results['documents'][0][i];
      • "doc_id": The document identifier from results['ids'][0][i];
      • "distance": The similarity score from results['distances'][0][i];
    • These dictionaries are appended to the retrieved_chunks list, which is then ultimately returned.
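
To make the nested result structure concrete, here is a small usage sketch. It assumes you already have a populated collection (like the one created later in this lesson); the query string, ids, and distances shown in the comments are purely illustrative.

Python
# Illustrative shape of the dictionary returned by collection.query()
# (actual text, ids, and distances depend on your data and embedding model):
# {
#     'documents': [['chunk text A', 'chunk text B']],
#     'ids': [['chunk_3_0', 'chunk_7_0']],
#     'distances': [[0.41, 0.56]],
#     ...
# }
top_chunks = retrieve_top_chunks("renewable energy", collection, top_k=2)
for rc in top_chunks:
    print(f"{rc['doc_id']} (distance={rc['distance']:.3f}): {rc['chunk'][:60]}...")
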
Building a Prompt for the LLM

Once you have your relevant chunks, the next step is constructing a prompt that ensures the LLM focuses on precisely those chunks. This helps maintain factual accuracy and a tight context.

Python
def build_prompt(query, retrieved_chunks):
    """
    Constructs a prompt by restating the 'query' and adding the retrieved chunks
    as an inline context for the LLM.
    """
    prompt = f"Question: {query}\nAnswer using only the following context:\n"
    for rc in retrieved_chunks:
        prompt += f"- {rc['chunk']}\n"
    prompt += "Answer:"
    return prompt

Why is this important?

  1. Controlled Context: By explicitly instructing the LLM to focus on the given context, you reduce the probability of hallucinations.
  2. Flexibility: You can modify the prompt format — like adding bullet points or rewording instructions — to direct the LLM's style or depth of response (one such variation is sketched after this list).
  3. Clarity: Including the question upfront reminds the model of the exact query it must address.
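
As a small illustration of the flexibility point above, here is one possible variation of build_prompt. This is a sketch, not part of the lesson's code: it numbers the context chunks and asks for a brief answer, and you can adapt the wording freely.

Python
def build_prompt_numbered(query, retrieved_chunks):
    """
    Variant prompt builder (illustrative only): numbers each context chunk
    and asks the LLM to keep its answer brief.
    """
    prompt = f"Question: {query}\nUse ONLY the numbered context below and answer in 2-3 sentences.\n"
    for idx, rc in enumerate(retrieved_chunks, start=1):
        prompt += f"{idx}. {rc['chunk']}\n"
    prompt += "Answer:"
    return prompt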

We'll be seeing an actual prompt example later in the lesson!

Integrating the Corpus and Creating the Collection

To see this in action, you'll first need to load your corpus data and create a collection in your vector database. This ensures your text chunks are accessible for the retrieval process.

Below is an example of how you might load documents from a JSON file, initialize an embedding model, and create (or retrieve) a collection in your chosen vector database:

Python
# Required imports: json for the corpus file, chromadb for the vector store
import json

from chromadb import Client
from chromadb.config import Settings
from chromadb.utils import embedding_functions

# Load corpus data from JSON file
with open('data/corpus.json', 'r') as f:
    corpus_data = json.load(f)

model_name = 'sentence-transformers/all-MiniLM-L6-v2'
embed_func = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)
client = Client(Settings())
collection = client.get_or_create_collection("rag_collection", embedding_function=embed_func)

# Batch add documents from the corpus data
documents = [doc['content'] for doc in corpus_data]
ids = [f"chunk_{doc['id']}_0" for doc in corpus_data]
collection.add(documents=documents, ids=ids)

Key Details

  • Embedding Function: Here, SentenceTransformerEmbeddingFunction is used for generating vector representations of your text. You can replace it with another embedding model suited to your needs, as sketched after this list.
  • Collection: Instead of manually creating a new collection each time, get_or_create_collection either retrieves an existing one or initializes a fresh collection for you.
  • Bulk Ingestion: By batching documents, you efficiently add multiple items to your vector database at once.
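
For example, swapping in a different embedding model only requires changing what you pass as the embedding function. The model name below is just an illustration; note that chunks embedded with one model are not comparable to queries embedded with another, so a different model generally means a separate collection.

Python
# Hypothetical alternative: a different Sentence Transformers model.
# Any model supported by the sentence-transformers library should work here.
alt_embed_func = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name='sentence-transformers/all-mpnet-base-v2'
)
alt_collection = client.get_or_create_collection(
    "rag_collection_mpnet", embedding_function=alt_embed_func
)
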
Querying the Database and Generating Answers

With your collection in place, it's time to retrieve the most relevant chunks and put them to use in your prompt. The snippet below ties everything together: from forming the query, to constructing the prompt, and finally getting the answer from your Large Language Model.

Python
1query = "What are some recent technological breakthroughs?" 2retrieved_docs = retrieve_top_chunks(query, collection, top_k=5) 3final_prompt = build_prompt(query, retrieved_docs) 4answer = get_llm_response(final_prompt) 5 6print("Prompt:\n") 7print(final_prompt) 8print("\nLLM Answer:", answer)

Here's what's happening step by step:

  1. Formulating the Query: We define a query string that reflects the user's question or information request.
  2. Retrieving Chunks: Using retrieve_top_chunks, you get the top five chunks that closely match the query based on semantic similarity.
  3. Prompt Construction: The function build_prompt takes the user's question and the retrieved chunks to assemble a cohesive prompt.
  4. LLM Response: Finally, get_llm_response is called with the constructed prompt, prompting the model to generate a context-informed answer.

By printing both the prompt and the answer, you can debug, refine, and further tailor your approach to retrieval and prompt design.
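
One loose end: get_llm_response is not defined in this lesson. A minimal sketch, assuming you are using the OpenAI Python client with an API key in your environment, might look like the following (the model name is only an example; substitute whichever provider and model your project uses).

Python
from openai import OpenAI

def get_llm_response(prompt):
    """
    Minimal sketch of an LLM call, assuming the OpenAI Python client.
    Swap in whichever provider or model your project actually uses.
    """
    llm_client = OpenAI()  # Reads OPENAI_API_KEY from the environment
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",  # Example model name; adjust as needed
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content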

Examining the Output

Below is an example of the system's final output after retrieving the most relevant chunks and assembling them into a prompt:

Prompt:

Question: What are some recent technological breakthroughs?
Answer using only the following context:
- The Industrial Revolution brought significant technological and social changes. It reshaped economies and altered the fabric of society. Scholars examine its impact on labor, innovation, and modern industrial practices.
- Breakthroughs in renewable energy technologies are reducing global dependence on fossil fuels. Solar and wind systems are becoming more efficient and affordable. These innovations are crucial to combating climate change and ensuring a sustainable future.
- The digital revolution is transforming how we approach health and wellness. Technological innovations, from fitness trackers to health apps, are empowering individuals to manage their well-being. This integration of technology and lifestyle is reshaping daily habits for a healthier future.
- Advances in medical technology are revolutionizing patient care through new diagnostic and treatment methods. Breakthroughs in imaging and robotics are enhancing the precision of medical procedures. Healthcare professionals are optimistic about the potential for improved outcomes.
- Scientists are developing renewable materials that could replace traditional plastics. Innovations in biopolymers are leading to sustainable manufacturing practices. These breakthroughs promise to reduce environmental waste and support a circular economy.
Answer:

LLM Answer: Recent technological breakthroughs include advancements in renewable energy technologies, which are making solar and wind systems more efficient and affordable, thereby reducing global dependence on fossil fuels and aiding in the fight against climate change. Additionally, the digital revolution is enhancing health and wellness through innovations like fitness trackers and health apps, empowering individuals to better manage their well-being. In the medical field, new diagnostic and treatment methods, along with improvements in imaging and robotics, are revolutionizing patient care and enhancing the precision of medical procedures. Furthermore, scientists are developing renewable materials, such as biopolymers, to replace traditional plastics, promoting sustainable manufacturing practices and supporting a circular economy.

In this snippet, the prompt clearly instructs the LLM to focus on the listed chunks. By doing so, the final LLM Answer highlights the key points about recent breakthroughs in renewable energy, healthcare innovations, and sustainable materials, reflecting the relevance of the context. Interestingly, the chunk referencing the Industrial Revolution is not directly invoked in the final answer, showcasing the LLM's ability to select and incorporate only the most suitable context. Notice how the relevant chunks are woven into a coherent, context-based response, demonstrating how RAG systems help reduce hallucinations and maintain factual alignment.

Conclusion and Next Steps

In this lesson, you discovered how to:

  • Retrieve the most relevant text chunks in your vector database through semantic similarity.
  • Construct a well-structured prompt so the LLM stays true to the provided text.

These steps are central to building a robust Retrieval-Augmented Generation pipeline. By creating focused, context-driven prompts, your LLM's responses tend to be more accurate and trustworthy.

Next, you'll have the opportunity to practice and solidify this knowledge. Look for the exercises that follow to test retrieving chunks with different queries, adjusting the prompt format, and experimenting with how the LLM responds. Keep pushing those boundaries — your mastery of RAG systems is well underway!
