Welcome to the first lesson of this course on storing and managing embeddings in PostgreSQL with pgvector. In this course, you will learn how to work with vector data, specifically embeddings, inside a PostgreSQL database. Embeddings are numerical representations of data, such as text or images, that capture its meaning as a list of numbers that can be compared mathematically. They are widely used in modern applications for tasks like semantic search, recommendation systems, and natural language processing.
Storing embeddings in a database allows you to efficiently search, compare, and analyze large collections of data based on their semantic similarity, rather than just exact matches. PostgreSQL, with the help of the `pgvector` extension, makes it possible to store and query these high-dimensional vectors directly in your database tables. This lesson will guide you through the initial setup required to work with embeddings in PostgreSQL, setting the stage for more advanced operations in later lessons.
To store and search embeddings in PostgreSQL, you need the `pgvector` extension. `pgvector` adds a new data type called `vector`, which is designed for storing fixed-length arrays of numbers, a good fit for embeddings generated by machine learning models.
On your own machine, you would typically install `pgvector` and then enable it in your database by running a `CREATE EXTENSION` command for `vector`.
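A minimal sketch of that command follows. The extension is registered under the name `vector`; the `IF NOT EXISTS` clause is an assumption here, inferred from the "if it is not already available" behavior described in this lesson:

```sql
-- Enable pgvector in the current database.
-- IF NOT EXISTS (assumed) makes the statement safe to run more than once.
CREATE EXTENSION IF NOT EXISTS vector;
```

Extensions are enabled per database, so the statement would need to be run in each database where the `vector` type is used.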
This command tells PostgreSQL to add the `vector` data type to your database if it is not already available. On CodeSignal, the `pgvector` extension is already installed and enabled for you, so you do not need to run this command in the CodeSignal environment. However, it is important to know how to enable it in case you work on your own setup in the future.
If you do try to run the `CREATE EXTENSION` command in CodeSignal, you might see a notice about it in the warning tab.
This is because we have already created the extension for you as part of the setup, so PostgreSQL is letting you know that the extension is already present and does not need to be created again.
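To make the notice concrete, the sketch below shows the form PostgreSQL uses when the statement is run with `IF NOT EXISTS` (an assumed clause) against a database where the extension is already present. The exact presentation in the CodeSignal warning tab may differ:

```sql
-- Re-running the statement does not fail; PostgreSQL reports a notice instead.
CREATE EXTENSION IF NOT EXISTS vector;
-- NOTICE:  extension "vector" already exists, skipping
```

A notice is informational only: the statement still completes successfully and the existing extension is left untouched.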
After enabling the `pgvector` extension, it is a good idea to verify that it is active in your database. You can do this using the `\dx` command in the PostgreSQL command-line interface (`psql`). This command lists all installed extensions in your current database.
For example, after running `\dx`, you should see a list of the extensions installed in the database. If `vector` appears in that list, then the `pgvector` extension is enabled and ready to use.
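Because `\dx` is a `psql` meta-command rather than SQL, it is not available from every client. As a sketch of an alternative that is not part of the original lesson, the `pg_extension` system catalog can be queried directly:

```sql
-- One row means the extension is installed in the current database;
-- zero rows means CREATE EXTENSION has not been run here yet.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'vector';
```

The version reported depends on which pgvector release is installed, so no particular value should be expected.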
With `pgvector` enabled, you can now store embeddings in your tables. In this course, you will work with a table called `products`. This table is designed to store information about products, including their embeddings.
The key column for this course is `embedding`, which uses the `vector(384)` data type. This means each product has an associated embedding (a list of 384 floating-point numbers) that represents its features in a way that can be used for similarity search and other vector operations.
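As a rough illustration of how a fixed-dimension column behaves, the sketch below defines a hypothetical table and runs a few casts. Only the `embedding vector(384)` column comes from this lesson; the table name `products_sketch` and the `id` and `name` columns are assumptions made for the example and are not the course's actual schema:

```sql
-- Hypothetical table; only the embedding column's type is taken from the lesson.
CREATE TABLE IF NOT EXISTS products_sketch (
    id        BIGSERIAL PRIMARY KEY,  -- assumed surrogate key
    name      TEXT NOT NULL,          -- assumed product name
    embedding vector(384)             -- every stored value must have exactly 384 elements
);

-- vector_dims reports how many elements a vector holds (3 for this literal).
SELECT vector_dims('[0.1, 0.2, 0.3]'::vector);

-- A literal whose length matches the declared dimension is accepted.
SELECT '[0.1, 0.2, 0.3]'::vector(3);

-- A length mismatch is rejected with an error, on casts and on inserts alike:
-- SELECT '[0.1, 0.2]'::vector(3);
```

Fixing the dimension in the column type means an embedding produced by a model with a different output size fails at write time instead of silently corrupting later comparisons.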
Let’s walk through the process of enabling `pgvector` and inspecting the table structure. As mentioned earlier, on CodeSignal, the extension is already enabled, but here is how you would do it in a typical PostgreSQL environment:
First, you would enable the extension by running the `CREATE EXTENSION` command for `vector`.
Next, you can verify that the extension is enabled by running `\dx` in `psql`. You should see output that includes the `vector` extension, confirming it is active.
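The first two steps can be sketched as a short `psql` session. This assumes the commands are run inside `psql`, since `\dx` is not understood by other clients; the `IF NOT EXISTS` clause and the `vector` pattern passed to `\dx` are choices made for this example:

```sql
-- Step 1: enable the extension (IF NOT EXISTS makes re-runs harmless).
CREATE EXTENSION IF NOT EXISTS vector;

-- Step 2: list installed extensions; the optional pattern narrows the list to "vector".
\dx vector
```

Passing a pattern is optional. Plain `\dx` lists every extension, which is useful when checking a database you did not set up yourself.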
Finally, to inspect the structure of the `products` table and see the `embedding` column, you can use the `\d products` command in `psql`. The output lists each column with its data type, and the `embedding` column should appear with the type `vector(384)`.
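Outside `psql`, similar information can be read from the system catalogs. The query below is a sketch that is not part of the original lesson and assumes the `products` table is visible on the current search path:

```sql
-- List each column of products with its fully formatted type,
-- including the dimension of vector columns.
SELECT a.attname AS column_name,
       format_type(a.atttypid, a.atttypmod) AS data_type
FROM pg_attribute AS a
WHERE a.attrelid = 'products'::regclass
  AND a.attnum > 0          -- skip system columns
  AND NOT a.attisdropped    -- skip dropped columns
ORDER BY a.attnum;
```

`format_type` is used instead of `information_schema.columns` because the latter reports extension types only as `USER-DEFINED` and does not show the declared dimension.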
In this lesson, you learned what embeddings are and why they are important for modern data applications. You saw how to enable the `pgvector` extension in PostgreSQL, verified its installation, and reviewed the structure of a table designed to store embeddings. These steps are the foundation for working with vector data in your database.
Next, you will get a chance to practice these steps yourself. In the following lesson, we will explore the data stored in the `products` table and see how embeddings are actually stored and managed. This hands-on experience will help you become comfortable with the basics before moving on to more advanced vector search queries.
