Generating Embeddings and Setting Up pgvector in PostgreSQL

Introduction: Embeddings and Their Role in PostgreSQL

Welcome to the first lesson of this course on storing and managing embeddings in PostgreSQL with pgvector. In this course, you will learn how to work with vector data, specifically embeddings, inside a PostgreSQL database. Embeddings are numerical representations of data such as text or images: lists of numbers arranged so that items with similar meaning end up with vectors that are close together. They are widely used in modern applications for tasks like semantic search, recommendation systems, and natural language processing.

Storing embeddings in a database allows you to efficiently search, compare, and analyze large collections of data based on their semantic similarity, rather than just exact matches. PostgreSQL, with the help of the pgvector extension, makes it possible to store and query these high-dimensional vectors directly in your database tables. This lesson will guide you through the initial setup required to work with embeddings in PostgreSQL, setting the stage for more advanced operations in later lessons.

Setting Up pgvector in PostgreSQL

To store and search embeddings in PostgreSQL, you need the pgvector extension. pgvector adds a new data type called vector, which is designed for storing fixed-length arrays of numbers — perfect for embeddings generated by machine learning models.

On your own machine, you would typically install pgvector and then enable it in your database using the following SQL command:
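In its usual form, as given in the pgvector documentation, the statement is a single line. This sketch assumes the extension files are already installed on the server; the IF NOT EXISTS clause is what makes it safe to run more than once:

```sql
-- Enable pgvector in the current database (extensions are enabled per database).
CREATE EXTENSION IF NOT EXISTS vector;
```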

This command enables pgvector in the current database, which makes the vector data type available there; if the extension is already enabled, the command does nothing. Extensions are enabled per database, so the command must be run in each database where you want to use vectors. On CodeSignal, the pgvector extension is already installed and enabled for you, so you do not need to run this command in the CodeSignal environment. However, it is important to know how to enable it in case you work on your own setup in the future.

If you do try to run the CREATE EXTENSION command in CodeSignal, you might see the following message in the warning tab:
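The text below is PostgreSQL's standard notice for an extension that already exists, shown here as a comment beneath the statement that triggers it. It is a notice rather than an error, so the statement still completes successfully:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
-- NOTICE:  extension "vector" already exists, skipping
```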

This is because we have already created the extension for you as part of the setup, so PostgreSQL is letting you know that the extension is already present and does not need to be created again.

Verifying pgvector Installation

After enabling the pgvector extension, it is a good idea to verify that it is active in your database. You can do this using the \dx command in the PostgreSQL command-line interface (psql). This command lists all installed extensions in your current database.

After running \dx, look for a row named vector in the Name column of the resulting list.

If you see vector listed, then the pgvector extension is enabled and ready to use.
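The \dx meta-command only works inside psql. In other clients, a similar check can be made with plain SQL. The following is a minimal sketch: the first query reads the pg_extension system catalog, and the second casts a small three-element literal (chosen only for illustration) to the vector type and counts its dimensions with pgvector's vector_dims function:

```sql
-- Catalog check: returns one row if pgvector is enabled in this database.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'vector';

-- Smoke test: cast a text literal to the vector type and count its dimensions.
SELECT '[1, 2, 3]'::vector AS sample,
       vector_dims('[1, 2, 3]'::vector) AS dims;
```

If the extension is not enabled, the first query returns no rows and the second fails because the vector type does not exist. Otherwise dims should be 3.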

Reviewing the Products Table Structure

With pgvector enabled, you can now store embeddings in your tables. In this course, you will work with a table called products. This table is designed to store information about products, including their embeddings. Here is the schema for the products table:

Column Name	Data Type	Description
product_id	integer	Unique identifier for each product
product_name	text	Name of the product
category	text	Product category
price	numeric	Price of the product
stock_quantity	integer	Number of items in stock
created_at	timestamp	When the product was added
description	text	Description of the product
embedding	vector(384)	Embedding vector (length 384) for the item

The key column for this course is embedding, which uses the vector(384) data type. This means each product has an associated embedding — a list of 384 floating-point numbers — that represents its features in a way that can be used for similarity search and other vector operations.
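To make the schema concrete, here is a sketch of a CREATE TABLE statement that would produce the same columns. The table name products_sketch is hypothetical, chosen so the real products table is left untouched, and the PRIMARY KEY constraint is an assumption, since the lesson does not list constraints. The second query runs against the course's products table and uses vector_dims to confirm the length of the stored embeddings:

```sql
-- Hypothetical table name, so the real products table is not modified.
CREATE TABLE IF NOT EXISTS products_sketch (
    product_id     integer PRIMARY KEY,  -- assumption: key constraint not stated in the lesson
    product_name   text,
    category       text,
    price          numeric,
    stock_quantity integer,
    created_at     timestamp,
    description    text,
    embedding      vector(384)           -- non-NULL values must have exactly 384 dimensions
);

-- Check the dimension of a few embeddings already stored in the course table.
SELECT product_id, vector_dims(embedding) AS dims
FROM products
LIMIT 5;
```

Because the column is declared as vector(384), PostgreSQL rejects any attempt to store a vector of a different length in it.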

Example: Enabling pgvector and Inspecting the Table

Let’s walk through the process of enabling pgvector and inspecting the table structure. As mentioned earlier, on CodeSignal, the extension is already enabled, but here is how you would do it in a typical PostgreSQL environment:

First, you would enable the extension by running CREATE EXTENSION IF NOT EXISTS vector; in your database.

Next, you can verify that the extension is enabled by running \dx in psql. You should see output that includes the vector extension, confirming it is active.

Finally, to inspect the structure of the products table and see the embedding column, you can use the \d products command in psql. The output lists each column with its data type, and the embedding column appears with the type vector(384).
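In clients that do not support psql meta-commands, a catalog query gives a comparable column listing. This is a sketch that assumes the products table is visible on the current search_path; format_type renders each type together with its modifier, so the embedding column is reported as vector(384):

```sql
-- List the columns of products with their full type names, in table order.
SELECT a.attname AS column_name,
       format_type(a.atttypid, a.atttypmod) AS data_type
FROM pg_attribute AS a
WHERE a.attrelid = 'products'::regclass
  AND a.attnum > 0          -- skip system columns
  AND NOT a.attisdropped    -- skip dropped columns
ORDER BY a.attnum;
```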

Summary and What’s Next

In this lesson, you learned what embeddings are and why they are important for modern data applications. You saw how to enable the pgvector extension in PostgreSQL, verified its installation, and reviewed the structure of a table designed to store embeddings. These steps are the foundation for working with vector data in your database.

Next, you will get a chance to practice these steps yourself. In the following lesson, we will explore the data stored in the products table and see how embeddings are actually stored and managed. This hands-on experience will help you become comfortable with the basics before moving on to more advanced vector search queries.

