Introduction

We are now in the fourth and final lesson of this course, Beyond Basic RAG: Improving Our Pipeline! Up to this point, we have explored ways to enhance Retrieval-Augmented Generation (RAG) systems by refining chunking strategies and leveraging advanced retrieval methods. In this lesson, you will learn how to merge a lexical retrieval approach (Okapi BM25) with your existing embedding-based retrieval mechanism, creating a powerful hybrid retrieval pipeline.

By the end of this lesson, you should be able to:

  1. Grasp the intuition behind Okapi BM25 for lexical retrieval.
  2. Construct a BM25 index on your corpus.
  3. Combine BM25 scores with embedding-based retrieval scores using a configurable weight parameter, alpha.

Let’s start by understanding what Okapi BM25 is and why it’s useful for retrieval.

Understanding the Okapi BM25 Algorithm

Among lexical search methods, Okapi BM25 is a popular choice. It focuses on the presence of specific keywords, rewarding chunks that contain more occurrences of the query terms. At the same time, it avoids overemphasizing repeated words by incorporating a saturation effect.

A few core ideas behind BM25:

  • Term Frequency (TF): More keyword matches in a chunk can signal higher relevance.
  • Document Length Normalization: BM25 accounts for chunk length, so very long chunks with many repeated words are not unfairly favored.

BM25 handles term saturation through the k1 parameter: each additional occurrence of a query term contributes less and less to the score, so a chunk cannot dominate the ranking purely by repeating the same keyword. Even when k1 is left at a standard default, this behavior matters when interpreting why a chunk with many keyword repetitions might not rank as high as expected.

Although the underlying formula has several parameters and normalizations, the general purpose is straightforward: favor chunks containing the search terms, but don't let them dominate purely by repeating keywords.
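For reference, the standard Okapi BM25 score of a chunk $D$ for a query $Q$ is:

$$
\mathrm{score}(D, Q) = \sum_{q_i \in Q} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
$$

where $f(q_i, D)$ is how often term $q_i$ appears in $D$, $|D|$ is the chunk's length in tokens, $\mathrm{avgdl}$ is the average chunk length across the corpus, and $k_1$ and $b$ (commonly around 1.2 and 0.75) control term saturation and length normalization, respectively.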

Now that you have a sense of what BM25 does, let’s see how to build a BM25 index for your chunked corpus.

Building a BM25 Index

To create a BM25 index from your chunked corpus, you can use the Bm25Index struct. It precomputes sparse embeddings from your text, allowing for efficient scoring of queries against your document chunks.
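The exact implementation depends on your setup; here is a minimal, self-contained sketch of what such an index can look like. The tokenizer and the K1/B constants below are illustrative choices (using the standard BM25 defaults), not a fixed API:

```rust
use std::collections::HashMap;

/// A sparse embedding: term -> weight.
type SparseVec = HashMap<String, f32>;

/// Standard BM25 defaults: k1 controls term-frequency saturation,
/// b controls document-length normalization.
const K1: f32 = 1.2;
const B: f32 = 0.75;

struct Bm25Embedder {
    idf: HashMap<String, f32>, // inverse document frequency per term
    avgdl: f32,                // average chunk length, in tokens
}

impl Bm25Embedder {
    fn tokenize(text: &str) -> Vec<String> {
        text.to_lowercase()
            .split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
            .map(String::from)
            .collect()
    }

    /// Embed a chunk: each term's weight is its full BM25 contribution
    /// (IDF times the saturated, length-normalized term frequency).
    fn embed_chunk(&self, text: &str) -> SparseVec {
        let tokens = Self::tokenize(text);
        let len = tokens.len() as f32;
        let mut tf: HashMap<String, f32> = HashMap::new();
        for t in tokens {
            *tf.entry(t).or_insert(0.0) += 1.0;
        }
        tf.into_iter()
            .map(|(term, f)| {
                let idf = self.idf.get(&term).copied().unwrap_or(0.0);
                let sat = f * (K1 + 1.0) / (f + K1 * (1.0 - B + B * len / self.avgdl));
                (term, idf * sat)
            })
            .collect()
    }

    /// Embed a query: weight 1.0 per term, so a dot product with a
    /// chunk embedding yields the BM25 score.
    fn embed_query(&self, text: &str) -> SparseVec {
        Self::tokenize(text).into_iter().map(|t| (t, 1.0)).collect()
    }
}

pub struct Bm25Index {
    embedder: Bm25Embedder,
    doc_embeddings: Vec<SparseVec>, // precomputed, one per chunk
}

impl Bm25Index {
    /// Build the index: compute corpus statistics, then precompute a
    /// sparse embedding for every chunk.
    pub fn new(chunks: &[String]) -> Self {
        let tokenized: Vec<Vec<String>> =
            chunks.iter().map(|c| Bm25Embedder::tokenize(c)).collect();
        let n = chunks.len() as f32;
        let avgdl =
            tokenized.iter().map(|t| t.len()).sum::<usize>() as f32 / n.max(1.0);

        // Document frequency: in how many chunks does each term appear?
        let mut df: HashMap<String, f32> = HashMap::new();
        for tokens in &tokenized {
            let mut unique: Vec<&String> = tokens.iter().collect();
            unique.sort();
            unique.dedup();
            for t in unique {
                *df.entry(t.clone()).or_insert(0.0) += 1.0;
            }
        }
        // Okapi IDF with the usual +1 smoothing to keep weights positive.
        let idf = df
            .into_iter()
            .map(|(t, d)| (t, ((n - d + 0.5) / (d + 0.5) + 1.0).ln()))
            .collect();

        let embedder = Bm25Embedder { idf, avgdl };
        let doc_embeddings = chunks.iter().map(|c| embedder.embed_chunk(c)).collect();
        Self { embedder, doc_embeddings }
    }

    /// BM25 score of the query against every chunk: a dot product
    /// between the sparse query and chunk embeddings.
    pub fn score(&self, query: &str) -> Vec<f32> {
        let q = self.embedder.embed_query(query);
        self.doc_embeddings
            .iter()
            .map(|d| q.iter().map(|(t, w)| w * d.get(t).copied().unwrap_or(0.0)).sum())
            .collect()
    }
}
```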

In this snippet, the Bm25Index struct holds both the embedder and the precomputed document embeddings. The new function builds the index from your chunks, and the score function computes similarity scores between a query and each chunk using a dot product over sparse vectors. This approach allows you to efficiently retrieve chunks that share keywords with the query.

With a BM25 index in place, the next step is to see how we can combine its scores with those from an embedding-based retrieval system.

Merging BM25 and Embedding-Based Retrieval: BM25 Scoring & Similarity Calculation

To combine scores from BM25 and embedding-based retrieval, we define a hybrid_retrieval function. This function brings together the strengths of both approaches by calculating and normalizing their respective scores.
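Here is a sketch of the first half of that function, building on the Bm25Index above. The embedding_search argument is a hypothetical stand-in for your vector-store query; it is assumed to return (chunk index, distance) pairs:

```rust
use std::collections::HashMap;

/// Hybrid retrieval over both indices. `embedding_search` stands in
/// for the vector-store query and returns (chunk_index, distance)
/// pairs for the given query string.
fn hybrid_retrieval(
    bm25: &Bm25Index,
    embedding_search: impl Fn(&str) -> Vec<(usize, f32)>,
    query: &str,
    alpha: f32,
    top_k: usize,
) -> Vec<(usize, f32)> {
    // Raw BM25 scores for every chunk, normalized to the 0-1 range.
    let raw = bm25.score(query);
    let max_bm25 = raw.iter().cloned().fold(f32::EPSILON, f32::max);
    let bm25_norm: Vec<f32> = raw.iter().map(|s| s / max_bm25).collect();

    // Embedding retrieval returns distances; convert each one to a
    // similarity via 1 / (1 + distance) and store it in a hash map
    // keyed by chunk index for O(1) lookup during merging.
    let emb_sim: HashMap<usize, f32> = embedding_search(query)
        .into_iter()
        .map(|(i, dist)| (i, 1.0 / (1.0 + dist)))
        .collect();

    // Merging and ranking are shown in the next snippet.
    merge_and_rank(&bm25_norm, &emb_sim, alpha, top_k)
}
```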

In this function, we first compute the raw BM25 scores for all chunks and normalize them to a 0–1 range. We also run the embedding-based retrieval, which returns distances between the query and each chunk; these are converted to similarity scores using the formula 1 / (1 + distance). The results are stored in a hash map for efficient lookup. This setup prepares us to merge the two types of scores in the next step.

Let’s move on to how we actually combine and rank these scores to produce the final hybrid retrieval results.

Merging BM25 and Embedding-Based Retrieval: Normalizing Scores & Final Ranking

Now we combine and rank the results from both retrieval methods. The following code snippet demonstrates how to merge the normalized BM25 and embedding-based similarity scores using a weighted average controlled by the alpha parameter.
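Continuing the sketch above (merge_and_rank is an illustrative helper name, not a fixed API):

```rust
/// Merge normalized BM25 scores and embedding similarities with a
/// weighted average, then sort descending and keep the top-k chunks.
fn merge_and_rank(
    bm25_norm: &[f32],
    emb_sim: &HashMap<usize, f32>,
    alpha: f32,
    top_k: usize,
) -> Vec<(usize, f32)> {
    let mut combined: Vec<(usize, f32)> = bm25_norm
        .iter()
        .enumerate()
        .map(|(i, &bm25)| {
            // Chunks the vector search did not return get similarity 0.
            let emb = emb_sim.get(&i).copied().unwrap_or(0.0);
            (i, alpha * bm25 + (1.0 - alpha) * emb)
        })
        .collect();

    // Highest combined score first; keep only the top-k results.
    combined.sort_by(|a, b| b.1.total_cmp(&a.1));
    combined.truncate(top_k);
    combined
}
```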

Here, each chunk receives a final score that is a weighted sum of its normalized BM25 score and its embedding similarity score. The alpha parameter determines the balance between the two. When tuning alpha, remember that BM25 and embedding scores are normalized separately. If one method’s scores are often zero or very sparse, it may contribute less to the final ranking—even with a higher weight. It’s helpful to log or plot both score distributions during development to spot any imbalance.

After calculating the combined scores, we sort the chunks in descending order and select the top k results. This approach ensures that both exact keyword matches and semantically similar content are considered in the final ranking.

With the merging logic in place, let’s see how to integrate this hybrid retrieval into a complete RAG workflow.

Putting It All Together

The main program demonstrates how to use this hybrid strategy in a RAG workflow. It shows how to load your dataset, build the BM25 index and embedding collection, and run a hybrid retrieval query.
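A runnable toy version might look like the following. The inline chunks and the stubbed embedding_search are placeholders for your real dataset loader and vector store:

```rust
fn main() {
    // Toy corpus: in a real pipeline these chunks come from loading
    // and chunking your dataset.
    let chunks: Vec<String> = vec![
        "Okapi BM25 rewards exact keyword matches.".into(),
        "Dense embeddings capture semantic similarity between texts.".into(),
        "Hybrid retrieval blends lexical and semantic signals.".into(),
    ];

    // Build the BM25 index over the chunks.
    let bm25 = Bm25Index::new(&chunks);

    // Placeholder embedding search: a real pipeline would embed the
    // query and search a vector collection; this stub returns a fixed
    // (chunk_index, distance) list so the example runs end to end.
    let embedding_search = |_q: &str| vec![(1, 0.4_f32), (2, 0.9)];

    // Hybrid retrieval with alpha = 0.5: equal weight to both methods.
    let query = "blending keyword and semantic search";
    let results = hybrid_retrieval(&bm25, &embedding_search, query, 0.5, 3);

    for (idx, score) in results {
        println!("{score:.3}  {}", chunks[idx]);
    }
}
```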

In this example, we load and chunk the dataset, build both the BM25 and embedding-based indices, and then perform a hybrid retrieval for a sample query. The results are printed out, showing the top-ranked chunks along with their scores. This workflow can be adapted to your own datasets and queries.

Before you start experimenting, it’s important to understand how the alpha parameter affects the retrieval results.

Choosing the Alpha Parameter

The alpha parameter determines how much weight is given to BM25 (lexical) versus embedding-based (semantic) similarity.

  • High alpha (0.7 or above): prioritizes exact keyword matches.
  • Low alpha (0.3 or below): focuses more on meaning and semantic relevance.
  • Balanced alpha (around 0.5): a good default when you want both keyword matches and contextual relevance.

For example, if your queries are likely to use the same terminology as your documents, a higher alpha may be beneficial. If you expect users to phrase things differently, a lower alpha will help the system find semantically similar content. You can tweak alpha depending on your data and goals.
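One quick way to explore this, reusing the toy setup from the sketch above, is to rerun the same query under several alpha values and compare the rankings:

```rust
// Rerun one query under several alpha settings (illustrative values).
for alpha in [0.3_f32, 0.5, 0.7] {
    let results = hybrid_retrieval(&bm25, &embedding_search, query, alpha, 3);
    println!("alpha = {alpha}: {results:?}");
}
```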

Now that you know how to tune the hybrid retrieval, you’re ready to experiment and see the impact of different settings.

Conclusion and Next Steps

In this lesson, you explored how to enhance retrieval accuracy by combining Okapi BM25 with embedding-based methods. This hybrid retrieval strategy helps ensure you retrieve both exact matches and semantically related content. By adjusting the alpha parameter, you can fine-tune your system to prioritize precision, context, or both.

Next, you’ll have the opportunity to experiment with various settings and queries in practice exercises. Keep testing and iterating — you're now equipped to build more flexible and accurate RAG pipelines!
