Introduction

Welcome to the last lesson in our journey through Scaling Up RAG with Vector Databases; well done on making it to the end of the course! In the previous lesson, you learned how to split, or chunk, your text data and store those chunks in a vector database collection. Now, we'll delve into retrieving the most relevant chunks for any given query and building an LLM prompt that produces more accurate, context-driven answers.

Metadata Filtering

Chroma supports filtering by both metadata and document contents. To filter by metadata, you provide a where filter, which specifies conditions that a document's metadata must meet for it to be included in the results of a query. The where filter is structured as follows:
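In Python, a where filter is a dictionary that maps a metadata field to an operator and a value. A minimal sketch, using an illustrative field name and value:

```python
# General shape: {"<metadata_field>": {"<operator>": <value>}}
where = {
    "author": {"$eq": "Robert"}
}
```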

Supported Operators
  • $eq: Equal to (string, int, float)
  • $ne: Not equal to (string, int, float)
  • $gt: Greater than (int, float)
  • $gte: Greater than or equal to (int, float)
  • $lt: Less than (int, float)
  • $lte: Less than or equal to (int, float)
  • $in: A value is in a predefined list (string, int, float, bool)
  • $nin: A value is not in a predefined list (string, int, float, bool)

You can combine multiple filters using logical operators $and and $or.

  • $and: Returns results that match all of the filters in the list.
  • $or: Returns results that match any of the filters in the list.

For example, assuming the metadata fields are named category and author, the following filter selects all documents whose category is "food" and whose author is one of ["Robert", "John", "Daniel"]:
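```python
# Matches documents whose "category" metadata equals "food"
# AND whose "author" is one of the listed names
# (field names "category" and "author" are assumptions based on the example)
where = {
    "$and": [
        {"category": {"$eq": "food"}},
        {"author": {"$in": ["Robert", "John", "Daniel"]}}
    ]
}
```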

Filtering Documents with Full Text Search

To filter based on document contents, you need to provide a where_document filter dictionary in your query. This dictionary supports two filtering keys: $contains and $not_contains. The structure of the dictionary is as follows:
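A minimal sketch, with a placeholder search string:

```python
# General shape of a document-contents filter
where_document = {
    "$contains": "search_string"
}
```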

For example, the following filter selects all documents that contain the word "technology" but not the word "AI":
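```python
# Matches documents that contain "technology" but do not contain "AI"
where_document = {
    "$and": [
        {"$contains": "technology"},
        {"$not_contains": "AI"}
    ]
}
```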

Retrieving the Most Relevant Chunks

Before your LLM can generate a coherent, context-rich answer, you need to fetch the right information. Your vector database will rank which document chunks are most relevant for a given query.
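Below is a minimal sketch of what retrieveTopChunks might look like with Chroma's Python client; reading the document identifier from a doc_id metadata key is an assumption about how the chunks were stored in the previous lesson:

```python
def retrieveTopChunks(query, collection, topK):
    # Perform a vector-based similarity search against the collection
    results = collection.query(
        query_texts=[query],
        n_results=topK,
        include=["documents", "metadatas", "distances"],
    )

    retrievedChunks = []
    # Chroma returns parallel lists, one entry per query text; we sent a single query
    for chunk, metadata, distance in zip(
        results["documents"][0],
        results["metadatas"][0],
        results["distances"][0],
    ):
        retrievedChunks.append({
            "chunk": chunk,                    # the text chunk itself
            "doc_id": metadata.get("doc_id"),  # document identifier (assumed metadata key)
            "distance": distance,              # lower distance = closer semantic match
        })
    return retrievedChunks
```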

Let's break down this code in detail:

  • Function Definition:

    • retrieveTopChunks takes three parameters:
      • query: The user's question or search term;
      • collection: The vector database object containing our embedded documents;
      • topK: The number of most relevant chunks to retrieve.
  • Vector Search:

    • The collection.query() method performs a vector-based similarity search to pinpoint which chunks are most aligned with the query.
    • It returns the matching documents along with their metadata and distance scores.
  • Results Structure:

    • Each result includes:
      • chunk: Contains the actual text chunk;
      • doc_id: Contains the document identifier;
      • distance: Indicates how semantically close a chunk is to your query — the lower the distance, the better the match.
    • For each result, the function creates a dictionary with that information and appends it to the retrievedChunks list, which is ultimately returned.

Building a Prompt for the LLM

Once you have your relevant chunks, the next step is constructing a prompt that ensures the LLM focuses on precisely those chunks. This helps maintain factual accuracy and a tight context.
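The actual prompt used in this lesson appears later on; as a rough sketch, buildPrompt could assemble the question and the retrieved chunks along these lines:

```python
def buildPrompt(question, retrievedChunks):
    # List each retrieved chunk as a bullet in a single context block
    context = "\n".join(f"- {item['chunk']}" for item in retrievedChunks)

    # Put the question up front, then instruct the model to stick to the context
    return (
        f"Question: {question}\n\n"
        "Answer the question using only the context provided below.\n\n"
        f"Context:\n{context}"
    )
```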

Why is this important?

  1. Controlled Context: By explicitly instructing the LLM to focus on the given context, you reduce the probability of hallucinations.
  2. Flexibility: You can modify the prompt format — like adding bullet points or rewording instructions — to direct the LLM's style or depth of response.
  3. Clarity: Including the question upfront reminds the model of the exact query it must address.

We'll be seeing an actual prompt example later in the lesson!

Querying the Database and Generating Answers

With your collection in place, it's time to retrieve the most relevant chunks and put them to use in your prompt. The snippet below ties everything together: from forming the query, to constructing the prompt, and finally getting the answer from your Large Language Model.
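Here is a sketch of how the pieces might be wired together, assuming collection is the Chroma collection built in the previous lesson; the query string is illustrative, and get_llm_response stands in for whatever LLM call your setup provides:

```python
# 1. Formulate the query (illustrative question)
query = "What are the most recent breakthroughs in renewable energy and healthcare?"

# 2. Retrieve the top five chunks most similar to the query
topChunks = retrieveTopChunks(query, collection, topK=5)

# 3. Build a context-focused prompt from the question and the retrieved chunks
prompt = buildPrompt(query, topChunks)

# 4. Ask the LLM for a context-informed answer
answer = get_llm_response(prompt)

# Print both so you can inspect and refine retrieval and prompt design
print("Prompt:\n", prompt)
print("\nLLM Answer:\n", answer)
```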

Here's what's happening step by step:

  1. Formulating the Query: We define a query string that reflects the user's question or information request.
  2. Retrieving Chunks: Using retrieveTopChunks, you get the top five chunks that closely match the query based on semantic similarity.
  3. Prompt Construction: The function buildPrompt takes the user's question and the retrieved chunks to assemble a cohesive prompt.
  4. LLM Response: Finally, get_llm_response is called with the constructed prompt, and the model generates a context-informed answer.

By printing both the prompt and the answer, you can debug, refine, and further tailor your approach to retrieval and prompt design.

Examining the Output

Below is an example of the system's final output after retrieving the most relevant chunks and assembling them into a prompt:

In this snippet, the prompt clearly instructs the LLM to focus on the listed chunks. By doing so, the final LLM Answer highlights the key points about recent breakthroughs in renewable energy, healthcare innovations, and sustainable materials, reflecting the relevance of the context. Interestingly, the chunk referencing the Industrial Revolution is not directly invoked in the final answer, showcasing the LLM's ability to select and incorporate only the most suitable context. Notice how each retrieved chunk contributes to a coherent, context-based response, demonstrating how RAG systems help reduce hallucinations and maintain factual alignment.

Conclusion and Next Steps

In this lesson, you discovered how to:

  • Retrieve the most relevant text chunks from your vector database through semantic similarity.
  • Construct a well-structured prompt so the LLM stays true to the provided text.

These steps are central to building a robust Retrieval-Augmented Generation pipeline. By creating focused, context-driven prompts, your LLM's responses tend to be more accurate and trustworthy.

Next, you'll have the opportunity to practice and solidify this knowledge. Look for the exercises that follow to test retrieving chunks with different queries, adjusting the prompt format, and experimenting with how the LLM responds. Keep pushing those boundaries — your mastery of RAG systems is well underway!
