Now that you have learned how to run nearest neighbor queries using different distance metrics in pgvector, it is important to go a step further and understand the actual numbers behind those results. In the previous lesson, you saw how to retrieve the most similar products to a given embedding, but the queries only showed you the product IDs and names, ordered by similarity. While this is useful, sometimes you need to see the raw distance or similarity scores themselves. These scores can help you understand how close or far apart items are in the embedding space, set thresholds for filtering results, or debug your search system.
In this lesson, you will learn how to modify your queries to display these distance and similarity values directly in your results. This will give you more insight into how your vector search is working and help you make better decisions about which products to show or recommend.
Let’s start by looking at how to view the actual L2 (Euclidean) distance values in your search results. As a reminder, the <-> operator in pgvector calculates the Euclidean distance between two vectors. In the previous lesson, you used this operator to order your results, but you did not display the distance values themselves.
To include the distance in your output, add an extra column to your SELECT statement. Here is an example query that shows the product_id, product_name, and the L2 distance from your query embedding:
In this query, replace ${QUERY_EMBEDDING} with the embedding vector you want to compare against. The expression embedding <-> ${QUERY_EMBEDDING} calculates the Euclidean distance between each product’s embedding and your query embedding, and the result is shown in a column called distance. The results are ordered so that the products with the smallest distance (that is, the most similar ones) appear first.
For example, your output might look like this:
Here, you can see not only which products are most similar to your query but also how close they are in the embedding space. A smaller distance means a higher similarity.
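To make concrete what the distance column contains, here is a minimal Python sketch. It computes the same quantity as the <-> operator and sorts by it, mimicking the ORDER BY distance step. The product catalog, the three-dimensional embeddings, and the function names (l2_distance, nearest_by_l2) are hypothetical illustrations, not part of pgvector. In practice the database does this work for you.

```python
import math

# Hypothetical sample catalog: product_id -> (product_name, embedding).
# Real embeddings have hundreds of dimensions; three keep the example readable.
PRODUCTS = {
    1: ("Trail Running Shoes", [0.9, 0.1, 0.0]),
    2: ("Hiking Boots", [0.8, 0.3, 0.1]),
    3: ("Coffee Mug", [0.0, 0.2, 0.9]),
}


def l2_distance(a, b):
    """Euclidean distance: the quantity pgvector's <-> operator computes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_by_l2(query, products, limit=5):
    """Python analogue of SELECT ..., embedding <-> query AS distance
    ... ORDER BY distance LIMIT limit."""
    rows = [
        (product_id, name, l2_distance(embedding, query))
        for product_id, (name, embedding) in products.items()
    ]
    rows.sort(key=lambda row: row[2])  # smallest distance (most similar) first
    return rows[:limit]


if __name__ == "__main__":
    query_embedding = [1.0, 0.0, 0.0]  # hypothetical query vector
    for product_id, name, distance in nearest_by_l2(query_embedding, PRODUCTS):
        print(f"{product_id}\t{name}\t{distance:.4f}")
```

The sort is ascending because a distance of 0 means the two vectors are identical.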
Another common way to measure similarity between vectors is cosine similarity. In pgvector, the <=> operator gives you the cosine distance, which ranges from 0 to 2, where 0 means the vectors point in exactly the same direction. In many applications, however, it is more useful to see the cosine similarity, which ranges from 1 (same direction, most similar) to -1 (opposite directions, most dissimilar).
To convert cosine distance to cosine similarity, you can subtract the distance from 1. Here is how you can write a query to show the cosine similarity for each product:
In this query, embedding <=> ${QUERY_EMBEDDING} calculates the cosine distance, and 1 - (...) converts it to cosine similarity. The results are ordered so that the products with the highest similarity appear first.
A sample output might look like this:
Here, a higher cosine similarity means the product is more similar to your query in terms of direction in the embedding space. Because cosine similarity depends only on the angle between two vectors and ignores their lengths, it is especially useful in semantic search, where you care about what an embedding represents rather than the magnitude of its values.
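The distance-to-similarity conversion can be sketched in Python as follows. The sketch assumes cosine distance is 1 minus the cosine of the angle between the vectors, which matches the 0 to 2 range described above. The catalog and function names are hypothetical, and the sketch assumes no embedding is the zero vector.

```python
import math

# Hypothetical sample catalog: product_id -> (product_name, embedding).
PRODUCTS = {
    1: ("Trail Running Shoes", [0.9, 0.1, 0.0]),
    2: ("Hiking Boots", [0.8, 0.3, 0.1]),
    3: ("Coffee Mug", [0.0, 0.2, 0.9]),
}


def cosine_distance(a, b):
    """1 - cos(angle between a and b): 0 for same direction, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)  # assumes neither vector is all zeros


def cosine_similarity(a, b):
    """Same conversion as 1 - (embedding <=> query) in the SQL query."""
    return 1.0 - cosine_distance(a, b)


def rank_by_cosine_similarity(query, products, limit=5):
    rows = [
        (product_id, name, cosine_similarity(embedding, query))
        for product_id, (name, embedding) in products.items()
    ]
    rows.sort(key=lambda row: row[2], reverse=True)  # highest similarity first
    return rows[:limit]


if __name__ == "__main__":
    query_embedding = [1.0, 0.0, 0.0]  # hypothetical query vector
    for product_id, name, similarity in rank_by_cosine_similarity(query_embedding, PRODUCTS):
        print(f"{product_id}\t{name}\t{similarity:.4f}")
```

Sorting by similarity in descending order gives the same ranking as sorting by cosine distance in ascending order, because one is just 1 minus the other. Also note that cosine_distance([1, 0], [2, 0]) is 0. Scaling a vector does not change its direction, which is why cosine scores ignore magnitude.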
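When you compare the results from both queries, you will notice that the order of products is usually the same or very similar, but the scores themselves are different. L2 distance tells you how far apart two products are in the embedding space, while cosine similarity tells you how closely aligned they are in direction. If your embeddings are normalized to unit length, as the output of many embedding models already is, the two orderings are identical, because for unit vectors the squared L2 distance equals 2 × (1 − cosine similarity).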
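You can check the effect of normalization on the two rankings with a small Python sketch. The vectors A and B and the helper names are hypothetical. The sketch first shows an unnormalized case where L2 and cosine rank items differently. It then shows that once every vector is scaled to unit length, both metrics give the same order and squared L2 distance equals 2 × cosine distance.

```python
import math


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rankings(query, embeddings):
    """Return item keys ordered by L2 distance and by cosine distance."""
    by_l2 = sorted(embeddings, key=lambda k: l2_distance(embeddings[k], query))
    by_cos = sorted(embeddings, key=lambda k: cosine_distance(embeddings[k], query))
    return by_l2, by_cos


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Hypothetical embeddings: A points almost the same way as the query but is
    # much longer; B is short but points 45 degrees away.
    raw = {"A": [10.0, 1.0], "B": [0.5, 0.5]}
    print("raw:", rankings(query, raw))  # the two orders disagree

    unit_query = normalize(query)
    unit = {k: normalize(v) for k, v in raw.items()}
    print("normalized:", rankings(unit_query, unit))  # the two orders agree
    for k, v in unit.items():
        # For unit vectors: ||a - b||^2 == 2 * (1 - cos) == 2 * cosine distance
        print(k, l2_distance(v, unit_query) ** 2, 2 * cosine_distance(v, unit_query))
```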
For example, a product with a very small L2 distance or a very high cosine similarity is likely to be highly relevant to your query, while a product with a large distance or a low similarity is less relevant. These scores let you set thresholds for filtering results, such as only showing products with a cosine similarity above 0.95.
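Threshold filtering can be sketched in Python as shown below. In SQL you would express the same idea as a WHERE condition on the similarity expression. This sketch only illustrates the logic. The 0.95 cutoff comes from the example above, while the catalog, the query vector, and the function names are hypothetical.

```python
import math

# Hypothetical sample catalog: product_id -> (product_name, embedding).
PRODUCTS = {
    1: ("Trail Running Shoes", [0.9, 0.1, 0.0]),
    2: ("Hiking Boots", [0.8, 0.3, 0.1]),
    3: ("Coffee Mug", [0.0, 0.2, 0.9]),
}


def cosine_similarity(a, b):
    """cos(angle between a and b); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_products(query, products, min_similarity=0.95):
    """Keep products whose cosine similarity to query exceeds the cutoff,
    most similar first."""
    rows = [
        (product_id, name, cosine_similarity(embedding, query))
        for product_id, (name, embedding) in products.items()
    ]
    kept = [row for row in rows if row[2] > min_similarity]
    kept.sort(key=lambda row: row[2], reverse=True)
    return kept


if __name__ == "__main__":
    query_embedding = [1.0, 0.0, 0.0]  # hypothetical query vector
    print(similar_products(query_embedding, PRODUCTS))
```

With L2 distance the comparison flips: you would keep rows whose distance is below a maximum instead of above a minimum.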
It is also helpful to compare the actual values to get a feel for what is considered "close" or "similar" in your specific dataset. Over time, you will develop an intuition for what these numbers mean in practice.
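One way to build that intuition is to summarize how the scores are spread out for a sample query before choosing a cutoff. The sketch below uses Python's standard statistics module on a hypothetical list of cosine similarity scores. In practice you would use the values from the similarity column of your own results. The function names are illustrative only.

```python
import statistics


def summarize_scores(scores):
    """Return the min, median, and max of a list of similarity (or distance) scores."""
    return {
        "min": min(scores),
        "median": statistics.median(scores),
        "max": max(scores),
    }


def fraction_above(scores, threshold):
    """Share of scores strictly above a candidate threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)


if __name__ == "__main__":
    # Hypothetical cosine similarities for one query; replace with your own results.
    scores = [0.97, 0.95, 0.91, 0.88, 0.72, 0.40]
    print(summarize_scores(scores))
    for cutoff in (0.8, 0.9, 0.95):
        print(cutoff, fraction_above(scores, cutoff))
```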
In this lesson, you learned how to inspect the actual distance and similarity scores behind your nearest neighbor search results in pgvector. You saw how to display L2 (Euclidean) distance values and how to calculate and interpret cosine similarity scores in your SQL queries. Understanding these numbers will help you make better decisions about which products to show, set thresholds for filtering, and debug your search system.
Next, you will get a chance to practice writing and running these queries yourself. This hands-on experience will help you become more comfortable with interpreting distance and similarity scores and using them to improve your vector search results.
