Introduction: From Distance to Cosine Similarity

Welcome back! In the last lesson, you learned how to inspect the distance values that pgvector calculates between your query embedding and each product’s embedding. You saw that a smaller distance means a product is more similar to your search intent, and you practiced writing queries that include and order by this distance value.

Now, let’s take your understanding a step further. While distance is useful, it is often easier to work with a similarity score where higher means a better match: 1 means the two vectors point in the same direction (“very similar”), and values near 0 mean they are essentially unrelated. This is where cosine similarity comes in. In this lesson, you will learn how to extract cosine similarity scores directly in your query results. This lets you see at a glance how strong the match is between your query and each product, making your search results easier to interpret.

Understanding the Cosine Similarity Query

To get cosine similarity scores in your results, you will use the <=> operator in pgvector. As a reminder, this operator calculates the cosine distance between two vectors. Cosine distance measures how different the directions of two vectors are: 0 means they point in the same direction, 1 means they are perpendicular (unrelated), and 2 means they point in opposite directions. However, for many applications, it’s more intuitive to work with similarity rather than distance.

To convert cosine distance to cosine similarity, you simply subtract the distance from 1. This way, a score closer to 1 means the vectors are more similar, and a score closer to 0 means they are less similar. In SQL, you can write this as 1 - (embedding <=> ${QUERY_EMBEDDING}). Here, embedding is the vector column in your table, and ${QUERY_EMBEDDING} is a placeholder for the vector you are searching with. The parentheses are required: in PostgreSQL, - binds more tightly than <=>, so without them the expression would be read as (1 - embedding) <=> ${QUERY_EMBEDDING}.
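
To make the conversion concrete, here is a small sketch that compares literal vectors instead of table rows. It assumes the pgvector extension is already installed in your database; the two-dimensional vectors are made-up values chosen only because their results are easy to check by hand.

```sql
-- Illustrative only: the vectors below are invented, not course data.
SELECT
    -- Same direction: distance 0, so similarity 1 - 0 = 1
    '[1,0]'::vector <=> '[1,0]'::vector       AS distance_same_direction,
    1 - ('[1,0]'::vector <=> '[1,0]'::vector) AS similarity_same_direction,
    -- Perpendicular: distance 1, so similarity 1 - 1 = 0
    '[1,0]'::vector <=> '[0,1]'::vector       AS distance_perpendicular,
    1 - ('[1,0]'::vector <=> '[0,1]'::vector) AS similarity_perpendicular;
```

Each similarity column is just 1 minus the distance column next to it, which is the same expression you will apply to the embedding column.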

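Cosine similarity scores always fall between -1 and 1: 1 for vectors pointing in the same direction, 0 for perpendicular (unrelated) vectors, and negative values for vectors pointing in opposing directions. With embeddings of real-world text, scores often land between 0 and 1. This bounded range makes it easy to compare results and set thresholds for what you consider a “good” match.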
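
As a rough sketch of what a threshold could look like, the query below keeps only rows whose similarity reaches a cutoff. It uses the products table from this course; the value 0.7 is a hypothetical cutoff picked for illustration, not a recommendation, and a sensible value depends on your embedding model and data.

```sql
-- A sketch only: 0.7 is an arbitrary, hypothetical cutoff.
SELECT
    product_id,
    product_name,
    1 - (embedding <=> ${QUERY_EMBEDDING}) AS cosine_similarity
FROM products
-- WHERE cannot refer to the cosine_similarity alias, so the expression is repeated.
WHERE 1 - (embedding <=> ${QUERY_EMBEDDING}) >= 0.7
ORDER BY cosine_similarity DESC;
```

Repeating the expression in WHERE is needed because PostgreSQL evaluates WHERE before the output column aliases exist, while ORDER BY is allowed to use them.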

Example: Querying and Interpreting Cosine Similarity

Let’s look at a practical example using the products table. Suppose you want to find the top 10 products that are most similar to your query embedding, and you want to see their cosine similarity scores. You can write the following SQL query:
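
One way to write it, pieced together from the clauses described in the walkthrough that follows (the layout is a reconstruction, and ${QUERY_EMBEDDING} is a placeholder that your application replaces with the actual query vector):

```sql
SELECT
    product_id,
    product_name,
    -- Convert cosine distance to cosine similarity
    1 - (embedding <=> ${QUERY_EMBEDDING}) AS cosine_similarity
FROM products
-- Highest similarity first
ORDER BY cosine_similarity DESC
LIMIT 10;
```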

In this query, you are selecting the product_id and product_name from the products table, just as before. The new part is 1 - (embedding <=> ${QUERY_EMBEDDING}) AS cosine_similarity. This calculates the cosine similarity between each product’s embedding and your query embedding and labels the result as cosine_similarity in the output. The ORDER BY cosine_similarity DESC clause ensures that the most similar products (those with the highest similarity) appear at the top of your results. Finally, LIMIT 10 restricts the output to the top 10 matches.

Here is what a sample output might look like:

product_id | product_name     | cosine_similarity
-----------|------------------|------------------
1          | AI Smart Speaker | 0.88
3          | AI Camera        | 0.81
5          | Smart Thermostat | 0.73
7          | Home Assistant   | 0.67

In this table, you can see that "AI Smart Speaker" is the most similar to your query, with a cosine similarity of 0.88. The other products are ranked in order of decreasing similarity, so you can quickly see which ones are the best matches for your search.

Summary and Practice Preview

In this lesson, you learned how to extract cosine similarity scores from your vector search queries using pgvector. You saw how to use the <=> operator to calculate cosine distance and how to convert that distance into a similarity score that is easier to interpret. You also practiced reading and understanding the results, where higher scores mean stronger matches.

Next, you will get a chance to practice writing and running your own queries that extract and use cosine similarity scores. This hands-on experience will help you build confidence in using similarity scores to fine-tune your search results and make your vector search engine even more effective.
