Welcome to the first lesson of the course, where we begin our journey into advanced querying with pgvector. In this lesson, you will learn how to combine traditional SQL filters with vector similarity search. This is a powerful technique that allows you to narrow down your search results using product attributes like category
, stock
, and price
, and then rank those results by how semantically similar they are to a given query. This approach is common in real-world applications, such as e-commerce platforms, where users want to find products that not only match certain criteria but are also most relevant to their needs. By the end of this lesson, you will be able to write queries that combine these two types of filtering, setting a strong foundation for more advanced techniques in the rest of the course.
Before we dive into combining filters, let’s quickly review how vector search works in SQL using the pgvector extension. In pgvector, you can store vector embeddings — numerical representations of text or other data — in a special column type. To compare how similar two vectors are, you use the <->
operator, which calculates the L2 distance between them. The smaller the distance, the more similar the vectors are.
For example, if you have a table of products and each product has an embedding
column, you can find the products most similar to a query embedding like this:
This query returns the 10 products whose embeddings are closest to the query embedding. The ${QUERY_EMBEDDING}
is a placeholder for the vector you want to compare against, such as the embedding of a search phrase.
Now, let’s see how you can add traditional SQL filters to your vector search. In real-world scenarios, you often want to filter your data by certain attributes before ranking by similarity. For example, you might want to consider only products in the "Electronics" category, with more than 100 items in stock, and a price above $300. By applying these filters first, you reduce the number of items that need to be compared by vector similarity, which can improve both the relevance and performance of your search.
In SQL, you use the WHERE
clause to add these filters. You can combine multiple conditions using AND
to make sure all criteria are met before the vector similarity is calculated.
Let’s walk through a complete example that combines multiple filters with vector search, this time using a join to reference category names from a separate categories
table. Here is the query:
In this query, we are selecting the product_id
and product_name
from the products
table (aliased as p
). We use a JOIN
to connect the products
table to the categories
table (aliased as c
) using the category_id
field. The WHERE
clause applies three filters: the product must belong to the "Electronics" category (as found in the categories
table), have a stock_quantity
greater than 100, and a price
greater than 300. Only products that meet all these conditions will be considered in the next step.
After filtering, the ORDER BY p.embedding <-> ${QUERY_EMBEDDING}
part sorts the remaining products by how close their embedding is to the query embedding. The ensures that only the top 10 most similar products are returned.
In this lesson, you learned how to combine multiple SQL filters with vector similarity search using pgvector. By filtering your data first and then ranking by vector similarity, you can create more targeted and efficient search queries. This technique is especially useful in applications where both metadata and semantic relevance matter. In the next set of exercises, you’ll get hands-on practice writing and modifying queries like the one we covered here. This will help you build confidence in combining filters and vector search, preparing you for even more advanced querying techniques in the rest of the course.
