Full-Text Search Combined with Vector Similarity

Introduction: Adding Keyword Search to Vector Queries

Welcome back! In the previous lesson, you learned how to combine traditional SQL filters — such as category, stock, and price — with vector similarity search using pgvector. This approach allowed you to narrow down your results using product attributes and then rank them by semantic relevance.

In this lesson, we will take your querying skills a step further by introducing full-text search into the mix. Specifically, you will learn how to filter your results using keywords in product descriptions and then use vector similarity to rank those results. This combination is especially useful when you want to find items that not only match a certain topic or keyword but are also semantically similar to your search intent.

By the end of this lesson, you will be able to write queries that combine keyword filtering and vector similarity, making your searches even more powerful and precise.

Understanding the Query: Full-Text and Similarity Together

Let’s look at the query you will master in this lesson:

This query does two important things. First, it uses ILIKE '%AI%' in the WHERE clause to filter products whose descriptions contain the word "AI," regardless of case. The ILIKE operator is a case-insensitive version of LIKE, which is helpful when you want to match keywords without worrying about capitalization.

Second, it adds another condition: embedding <-> ${QUERY_EMBEDDING} < 0.5. This means that only products whose vector embedding is within a certain distance (in this case, less than 0.5) from your query embedding will be considered. The <-> operator calculates the distance between two vectors, and a smaller value means higher similarity.

After filtering, the query orders the results by similarity using ORDER BY embedding <-> ${QUERY_EMBEDDING} and limits the output to the top 10 most similar products.

Example: Searching for Products with "AI"

Let’s see how this works in practice. Imagine you have a products table with various items, and you want to find products that mention "AI" in their descriptions and are also semantically similar to your search intent. Here’s the query again:

Suppose your products table looks like this:

product_id	product_name	description	embedding
1	AI Smart Speaker	The latest in AI-powered speakers.	[0.1, 0.2, ...]
2	Fitness Tracker	Tracks your steps and heart rate.	[0.3, 0.4, ...]
3	AI Camera	Capture moments with AI features.	[0.5, 0.6, ...]
4	Blender	High-speed kitchen blender.	[0.7, 0.8, ...]

If your ${QUERY_EMBEDDING} represents a search for "AI home devices," the query will first filter out any products whose descriptions do not contain "AI" (so "Fitness Tracker" and "Blender" are excluded). Then, among the remaining products, it will keep only those whose embedding is within 0.5 distance of your query embedding. Finally, it will sort these by similarity and return the top 10. The output might look like this:

product_id	product_name
1	AI Smart Speaker
3	AI Camera

This result shows you the most relevant "AI" products, ranked by how closely they match your search intent.

Customizing the Query

You can easily adjust this query to fit different needs. For example, if you want to search for products related to "robotics," you can change the keyword in the ILIKE clause to '%robotics%'. If you want to be more or less strict about how similar the products need to be, you can adjust the similarity threshold from 0.5 to another value, such as 0.3 for stricter matches or 0.7 for broader results.

The ${QUERY_EMBEDDING} variable can also be changed to represent different search intents, depending on what you are looking for. Each of these changes will affect which products are returned and how they are ranked, giving you fine control over your search results.

Summary and Practice Preview

In this lesson, you learned how to combine full-text keyword search with vector similarity in a single SQL query using pgvector. By filtering products with a keyword in their descriptions and then ranking them by semantic similarity, you can create highly targeted and relevant search experiences.

This technique is especially useful when you want to ensure that your results are both topically and semantically aligned with your search intent. In the next set of practice exercises, you will get hands-on experience writing and modifying queries like the one we covered here. This will help you build confidence in combining full-text and vector search, preparing you for even more advanced querying techniques in the rest of the course.

Previous Lesson

Next Lesson: Distance Inspection and Ranking Explanation

Join the 1M+ learners on CodeSignal

Be a part of our community of 1M+ users who develop and demonstrate their skills on CodeSignal