Introduction: Inspecting Distance in Vector Search Results

Welcome back! In the last lesson, you learned how to combine full-text search with vector similarity to make your queries more precise and relevant. As a quick reminder, you used SQL to filter products by keywords in their descriptions and then ranked those results by how semantically similar they were to your search intent.

Now, let’s take your understanding a step further. In this lesson, you will learn how to inspect the actual distance values that pgvector calculates between your query embedding and each product’s embedding. Seeing these distance values in your results is important because it helps you understand why certain products are ranked higher than others. It also gives you more confidence in the results, since you can see exactly how similar each product is to your search query. By the end of this lesson, you will know how to write a query that not only returns the most relevant products but also shows you the distance score for each one.

Understanding the Distance Value in Results

You have already seen the <-> operator in action. In pgvector, this operator calculates the Euclidean (L2) distance between two vectors, which measures how similar or different two embeddings are. The result is a non-negative numeric value called the "distance." The smaller this value is, the more similar the two vectors are, and a distance of 0 means the vectors are identical. In other words, a product with a distance of 0.12 to your query embedding is more similar to your search intent than a product with a distance of 0.45.
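
To make the operator concrete, here is a minimal sketch that applies <-> to two literal vectors. It assumes the pgvector extension is already installed in your database, and the two vectors are made-up values for illustration rather than real embeddings.

```sql
-- Assumes the pgvector extension has been enabled (CREATE EXTENSION vector;).
-- The vectors are illustrative values, not real embeddings.
SELECT '[1,2,3]'::vector <-> '[1,2,4]'::vector AS distance;
-- The vectors differ by 1 in a single component,
-- so the Euclidean distance between them is 1.
```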

When you include the distance value in your query results, you can see exactly how close each product is to your search. This is especially useful when you want to understand the ranking of your results or when you need to set a threshold for what counts as a "good match." Keep in mind that a distance is only meaningful relative to other distances computed with the same embedding model and the same operator, so a suitable threshold is something you find by looking at real results, not a fixed universal number.

Example: Querying and Ranking with Distance Output

Let’s look at a practical example. Suppose you want to see not just the top products for your search, but also how similar each one is to your query. You can do this by selecting the distance value in your query and giving it a clear name, such as distance. Here is how you would write this query:
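
A minimal sketch of this query, assuming a products table with product_id, product_name, and embedding columns. ${QUERY_EMBEDDING} is a placeholder for your query vector and has to be replaced with an actual vector value before the statement can run.

```sql
SELECT product_id,
       product_name,
       embedding <-> ${QUERY_EMBEDDING} AS distance  -- distance to the query vector
FROM products
ORDER BY distance   -- smallest distance (most similar) first
LIMIT 10;           -- keep only the 10 closest matches
```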

This query selects product_id and product_name from the products table, just as before. The new part is embedding <-> ${QUERY_EMBEDDING} AS distance, where ${QUERY_EMBEDDING} stands for your query vector. It calculates the distance between each product’s embedding and the query embedding and labels the result as distance in the output. The ORDER BY distance clause sorts by that value in ascending order, so the most similar products (those with the smallest distance) appear at the top. Finally, LIMIT 10 restricts the output to the 10 closest matches.

Here is what a sample output might look like:

product_id   product_name       distance
1            AI Smart Speaker   0.12
3            AI Camera          0.19
5            Smart Thermostat   0.27
7            Home Assistant     0.33

In this table, you can see that "AI Smart Speaker" is the most similar to your query, with a distance of 0.12. The other products are ranked in order of increasing distance, so you can quickly see which ones are the best matches.
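
If you want to turn the "good match" idea from earlier into a filter, one option is to add a WHERE clause on the distance. The sketch below uses a cutoff of 0.3, which is a hypothetical value chosen only for illustration. PostgreSQL does not allow a WHERE clause to refer to a SELECT alias, so the distance expression is repeated instead of writing distance < 0.3.

```sql
-- 0.3 is a hypothetical cutoff, not a recommended value.
-- ${QUERY_EMBEDDING} is the lesson's placeholder for the query vector.
SELECT product_id,
       product_name,
       embedding <-> ${QUERY_EMBEDDING} AS distance
FROM products
WHERE (embedding <-> ${QUERY_EMBEDDING}) < 0.3  -- the alias "distance" is not visible here
ORDER BY distance
LIMIT 10;
```

With the sample values shown above, this filter would keep the first three products and drop Home Assistant, whose distance of 0.33 is above the cutoff.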

Summary and Practice Preview

In this lesson, you learned how to inspect the distance values that pgvector calculates between your query embedding and each product’s embedding. By including the distance in your query results, you gain a clearer understanding of why certain products are ranked higher than others. You also learned how to interpret these values: the smaller the distance is, the more similar the product is to your search intent.

Next, you will get a chance to practice writing queries that include and interpret distance values. This hands-on experience will help you build confidence in using distance inspection to fine-tune your search results and make your vector search engine even more effective.
