Introduction: The Role of Indexes in Vector Search

Welcome to the first lesson of this course on indexing, optimization, and scaling with pgvector. In this lesson, you will learn why indexes are crucial for fast, efficient vector search. As you may know, vector search finds similar items, such as products, documents, or images, by comparing their vector representations. Without an index, the database must scan every row and compute its distance to the query vector, which becomes slow on large datasets. pgvector's indexes speed this up by organizing the vectors so that a search examines only a fraction of them. The result is approximate nearest neighbor search, which trades a small amount of accuracy for much faster queries. By the end of this lesson, you will know how to create, drop, and monitor indexes in pgvector, the foundation for building scalable, high-performance vector search systems.
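For reference, here is a minimal sketch of the kind of query these indexes accelerate. It assumes a `products` table with an `embedding` column of pgvector's `vector` type. The 3-dimensional column and the query vector are hypothetical stand-ins for real embeddings, which usually have hundreds of dimensions.

```sql
-- Assumed schema (hypothetical stand-in); does nothing if the table already exists.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS products (id bigserial PRIMARY KEY, embedding vector(3));

-- Five products closest to the query vector by L2 (Euclidean) distance, the <-> operator.
-- Without an index this computes the distance for every row; with a matching index,
-- the planner can examine only part of the data instead.
SELECT id, embedding <-> '[0.1,0.2,0.3]' AS distance
FROM products
ORDER BY embedding <-> '[0.1,0.2,0.3]'
LIMIT 5;
```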

Types of Indexes in pgvector

pgvector supports two main types of indexes for vector search: IVFFlat and HNSW. Each has its own strengths and is suited for different use cases.

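The IVFFlat index (Inverted File with Flat, that is, uncompressed, vector storage) is a popular choice for large datasets where you want to balance search speed and accuracy. It divides the vectors into clusters, called lists, and a search examines only the clusters closest to the query vector instead of the whole table. You set the number of clusters with the lists parameter. Because the clusters are computed from the rows already in the table, IVFFlat works best when it is built after the data is loaded. It builds quickly and uses relatively little memory, which makes it a good default for many applications that need fast, approximate nearest neighbor search.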

The HNSW index (Hierarchical Navigable Small World) is the other option, often used when search accuracy matters most. HNSW builds a multi-layer graph that links each vector to its near neighbors, and a search walks this graph toward the query vector, which gives very fast and accurate results. It takes longer to build and uses more memory than IVFFlat, but it generally offers a better trade-off between query speed and recall. Because it has no clustering step, it can also be created before the table contains any data.

Choosing between IVFFlat and HNSW depends on your specific needs. If you want faster index builds and lower memory use, IVFFlat is a solid choice. If you need the best balance of query speed and accuracy and can afford more memory and longer builds, HNSW is usually the better fit.

How to Drop and Recreate Indexes (with Examples)

Let’s look at how you can manage indexes in pgvector using SQL commands. Sometimes, you may need to drop and recreate an index, for example, if you want to change its parameters or switch to a different index type.

To drop an existing IVFFlat index on the embedding column of the products table, you can use the following command:
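Here is a minimal sketch. It assumes a hypothetical index name, `products_embedding_ivfflat_idx`. The first query lists the indexes that actually exist on `products`, so you can substitute the real name.

```sql
-- List the indexes that currently exist on products, with their full definitions.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'products';

-- Drop the IVFFlat index. IF EXISTS turns a missing index into a notice instead of an error.
-- products_embedding_ivfflat_idx is a hypothetical name: use the one returned above.
DROP INDEX IF EXISTS products_embedding_ivfflat_idx;
```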

This command safely removes the index if it exists, so you can recreate it with new settings. To create a new IVFFlat index, you can use:
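Here is a minimal sketch. It assumes a hypothetical index name and a stand-in 3-dimensional `products` schema. The setup lines do nothing if the extension and table already exist.

```sql
-- Assumed schema (hypothetical stand-in); does nothing if the table already exists.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS products (id bigserial PRIMARY KEY, embedding vector(3));

-- IVFFlat index on embedding, using L2 distance and 100 lists (clusters).
-- Build it after the table holds data: the clusters are computed from existing rows.
CREATE INDEX products_embedding_ivfflat_idx
    ON products USING ivfflat (embedding vector_l2_ops)
    WITH (lists = 100);
```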

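Here, lists = 100 sets the number of clusters for the IVFFlat index. More lists means smaller clusters, so each search scans fewer vectors and runs faster. However, a query checks only a few clusters (just one by default), so recall tends to drop unless the search is told to check more of them. More lists also makes the index take longer to build. The vector_l2_ops operator class tells pgvector to use L2 (Euclidean) distance, the <-> operator, to measure similarity between vectors during search.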
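Recall with IVFFlat also depends on a query-time setting, `ivfflat.probes` (default 1), which sets how many lists each search examines. Raising it trades speed for recall. The pgvector documentation suggests starting with `lists` around rows / 1000 for tables up to 1M rows (sqrt(rows) above that) and `probes` around sqrt(lists). Here is a minimal sketch. It assumes the IVFFlat index described above and a hypothetical 3-dimensional query vector.

```sql
-- Assumed schema (hypothetical stand-in); does nothing if the table already exists.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS products (id bigserial PRIMARY KEY, embedding vector(3));

-- Suggested starting value for lists: rows / 1000 up to 1M rows, sqrt(rows) beyond that.
SELECT CASE WHEN n <= 1000000 THEN greatest(n / 1000, 1)
            ELSE floor(sqrt(n))::bigint END AS suggested_lists
FROM (SELECT count(*) AS n FROM products) AS t;

-- For this session, search 10 of the 100 lists (about sqrt(100)) instead of the default 1.
SET ivfflat.probes = 10;

SELECT id
FROM products
ORDER BY embedding <-> '[0.1,0.2,0.3]'
LIMIT 5;
```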

If you want to use an HNSW index instead, you would first drop the old index (if it exists) and then create the new one:
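Here is a minimal sketch. It assumes hypothetical index names (`products_embedding_ivfflat_idx`, `products_embedding_hnsw_idx`) and a hypothetical stand-in schema for `products`. The `m` and `ef_construction` values are written out only to show where HNSW's build parameters go. They equal pgvector's defaults.

```sql
-- Assumed schema (hypothetical stand-in); does nothing if the table already exists.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS products (id bigserial PRIMARY KEY, embedding vector(3));

-- Remove any old index first (hypothetical names; check pg_indexes for yours).
DROP INDEX IF EXISTS products_embedding_ivfflat_idx;
DROP INDEX IF EXISTS products_embedding_hnsw_idx;

-- HNSW graph index using cosine distance. m = 16 and ef_construction = 64 are pgvector's
-- defaults; larger values usually improve recall at the cost of build time (and, for m, memory).
CREATE INDEX products_embedding_hnsw_idx
    ON products USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```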

In this example, the vector_cosine_ops operator class tells pgvector to measure cosine distance (the <=> operator) during search. You can change the index type and operator class to suit your search needs.
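A query only benefits from an index when its distance operator matches the index's operator class: `<=>` for `vector_cosine_ops`, `<->` for `vector_l2_ops`. Here is a minimal sketch of a cosine search. It assumes the HNSW index described above and a hypothetical 3-dimensional query vector. `hnsw.ef_search` is pgvector's query-time setting for HNSW (default 40). Raising it widens the graph search for better recall, at some cost in speed.

```sql
-- Assumed schema (hypothetical stand-in); does nothing if the table already exists.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS products (id bigserial PRIMARY KEY, embedding vector(3));

-- Explore more graph candidates per query in this session (default 40).
SET hnsw.ef_search = 100;

-- <=> is cosine distance, matching vector_cosine_ops, so an HNSW cosine index can serve
-- this ORDER BY. 1 minus the distance converts it back to cosine similarity.
SELECT id, 1 - (embedding <=> '[0.1,0.2,0.3]') AS cosine_similarity
FROM products
ORDER BY embedding <=> '[0.1,0.2,0.3]'
LIMIT 5;
```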

On CodeSignal, these indexes are usually pre-created for you, so you may not have permission to run these commands in practice. However, it is important to understand how to manage indexes on your own systems.

Monitoring Index Creation Progress

Building an index, especially on large tables, can take some time. It is helpful to monitor the progress so you know when the index will be ready. PostgreSQL provides a system view called pg_stat_progress_create_index that shows the current phase and progress of index creation.

You can check the progress with this SQL query:
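Here is a minimal sketch, adapted from the progress query in the pgvector documentation. The percentage is aliased as `progress` to match the sample output below. The view exists in PostgreSQL 12 and later. It has rows only while some session is building an index, so run the query from a second session during the build.

```sql
-- One row per CREATE INDEX currently running.
-- blocks_done / blocks_total gives the percentage for the current phase;
-- nullif returns NULL instead of dividing by zero when a phase reports no blocks.
SELECT pid,
       relid::regclass AS table_name,
       phase,
       round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS progress
FROM pg_stat_progress_create_index;
```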

This will show you the current phase of the index build and the percentage completed. For example, you might see output like:

phase          | progress
---------------+---------
scanning table |     45.3

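This means the index build is a little under halfway through its current phase. The percentage covers only that phase, not the whole build, and the phase names depend on the index type. pgvector's IVFFlat builds, for example, go through phases such as "performing k-means" and "loading tuples". Monitoring progress is especially useful when working with large datasets, so you can plan your work accordingly.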

Summary and What’s Next

In this lesson, you learned why indexes are important for vector search and how pgvector supports two main types: IVFFlat and HNSW. You saw how to drop and recreate these indexes using SQL commands, and how to monitor the progress of index creation. Understanding these basics is essential for building and maintaining efficient vector search systems.

Since the indexes are already pre-created for you, in the next step we'll simply review which IVFFlat and HNSW indexes exist on the products table.
