Exploring the Data and Stored Embeddings in PostgreSQL

Introduction: Getting Familiar with Stored Data

Now that you have set up your PostgreSQL database with the pgvector extension and reviewed the structure of the products table, it’s time to take the next step: exploring the actual data stored in the table. In the previous lesson, you learned about the purpose of embeddings and how the table is designed to store them. In this lesson, you will see how to view the data itself, including the embeddings, so you can become comfortable with what is stored and how it appears in the database. This is an important step before you start running more advanced queries or similarity searches, as it helps you understand the foundation you are working with.

Key Columns for Exploration

As a quick reminder, the products table contains several columns, but for the purpose of exploring stored embeddings, you will focus on four key columns: product_id, product_name, description, and embedding. The product_id is a unique identifier for each product, while product_name and description provide basic information about the product. The embedding column is where the vector representation of each product is stored. These columns are the most relevant when you want to inspect the data and see how embeddings are associated with each product.
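
If you want to confirm these columns before querying the data, one option is to ask PostgreSQL's standard information_schema catalog. This is a minimal sketch, assuming the table is named products and that no other schema contains a table with the same name; the embedding column should appear with a data_type of USER-DEFINED and a udt_name of vector, because the vector type comes from the pgvector extension.

```sql
-- List the columns of the products table in their defined order.
-- Assumption: only one table named "products" is visible in this database.
SELECT column_name, data_type, udt_name
FROM information_schema.columns
WHERE table_name = 'products'
ORDER BY ordinal_position;
```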

Example: Selecting Products and Their Embeddings

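To view the data in the products table, you can use a simple SQL SELECT statement. A query that selects product_id, product_name, description, and embedding from products with LIMIT 5 retrieves five products, including their embeddings. Without an ORDER BY clause, PostgreSQL does not guarantee which five rows come back, so add ORDER BY product_id if you want the same rows every time.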
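
A minimal sketch of such a query, assuming the table and column names described above (products, product_id, product_name, description, embedding). The ORDER BY clause is an addition made here so that the result is repeatable; it is not required for the query to run.

```sql
-- Retrieve five products together with their stored embeddings.
SELECT product_id, product_name, description, embedding
FROM products
ORDER BY product_id  -- added for a stable row order (assumption, optional)
LIMIT 5;
```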

When you run this query, the result contains one row per product, with one column for each field you selected.

In this output, each row represents a product, and the embedding column contains a vector: a long list of numbers inside square brackets, separated by commas. The actual embedding has many more numbers than fit comfortably on one line (for example, 384 values if you are using a 384-dimensional embedding), so lesson output and some client tools may show only a portion of the vector or an abbreviated version.
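
Because full vectors are hard to scan, it can help to look at their size and a short preview instead. The sketch below assumes the pgvector extension is installed and uses its vector_dims function together with PostgreSQL's built-in left function. The column aliases and the 60-character preview length are arbitrary choices for illustration.

```sql
-- Show how many values each embedding holds, plus a short text preview.
SELECT
    product_id,
    product_name,
    vector_dims(embedding)    AS dimensions,         -- number of values in the vector
    left(embedding::text, 60) AS embedding_preview   -- first 60 characters of the text form
FROM products
ORDER BY product_id
LIMIT 5;
```

If every row reports the same number of dimensions (384 in the example above), the embeddings were most likely generated by the same model.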

Understanding the Output

When you look at the results of the query, you will notice that the embedding column is quite different from the others. While product_id, product_name, and description are easy to read and understand, the embedding column contains a sequence of numbers. These numbers are the embedding values generated by a machine learning model. They are not meant to be interpreted by humans directly, but they are what makes similarity search possible: each embedding captures the meaning of the product description, so products can be compared by semantic similarity rather than by matching keywords. For now, it is enough to recognize that these vectors are present and stored as part of each product’s data.
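
One quick way to check that a vector really is stored for each product is to count rows with and without an embedding. This is a minimal sketch, assuming the embedding column allows NULL for products that have not been embedded yet; the column aliases are illustrative.

```sql
-- Compare the total number of products with the number that have an embedding.
SELECT
    count(*)                    AS total_products,
    count(embedding)            AS products_with_embedding,    -- count(column) skips NULLs
    count(*) - count(embedding) AS products_missing_embedding
FROM products;
```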

Summary and What’s Next

In this lesson, you learned how to explore the data stored in your products table, focusing on how to view and interpret the embeddings alongside other product information. You practiced using a simple SQL query to select and inspect the key columns, including the embedding vectors. Understanding how embeddings are stored and displayed in your database is an important foundation for the next steps in this course. In the upcoming practice exercises, you will get hands-on experience running these queries yourself and exploring the data further. This will prepare you for more advanced topics, such as running similarity searches and analyzing results, in future lessons.
