Introduction: Why Encode Tracks and Users as Vectors?

Welcome to the first lesson of our Embedding-Based Recommendation with Similarity Scoring course. In this lesson, we will lay the foundation for building a smart music recommendation system by learning how to represent both music tracks and user preferences as vectors (also called embeddings).

Why do we need to encode tracks and users as vectors? The answer is simple: computers work best with numbers. By turning information about tracks (like genre, mood, tempo, and energy) and user listening history into vectors, we can use math to compare them. This makes it possible to find songs that are similar to each other or that match a user's taste, which is the core of any recommendation system.

By the end of this lesson, you will understand how to transform both tracks and user profiles into a format that is ready for similarity scoring and recommendations.

Recap: Setting Up the Music Data Environment

Before we dive into encoding, let's quickly review how we access our music data. In this course, we work with a dataset of tracks and user listening histories. On CodeSignal, the necessary libraries and data access functions are already set up for you, but it's good to know how this works in general.

The basic setup revolves around two data-access functions:

  • get_all_tracks() returns a DataFrame with all available tracks and their features.
  • get_user_listening_history(user_id) returns a DataFrame with the tracks a specific user has listened to.
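
A minimal usage sketch of these helpers (the user ID "u42" is just an illustrative placeholder):

    # Both helpers are preloaded in the CodeSignal environment.
    tracks_df = get_all_tracks()                     # every track plus its features
    history_df = get_user_listening_history("u42")   # one user's listening history

    print(tracks_df.head())                          # inspect the feature columns
    print(len(history_df), "tracks in this user's history")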

This setup allows us to work with both the track data and user data in the next steps.

Feature Selection and Preparation

To create useful embeddings, we need to decide which features of each track to use. In our example, we focus on four features:

  • genre (categorical)
  • mood (categorical)
  • tempo (numerical)
  • energy (numerical)

Before encoding, we must handle missing values and make sure each feature is in the right format.
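
A minimal sketch of this imputation step, assuming pandas DataFrames and the four feature columns above (the helper name impute_missing_values and the column lists are illustrative, not necessarily the course's exact code):

    from sklearn.impute import SimpleImputer

    NUMERICAL_FEATURES = ["tempo", "energy"]
    CATEGORICAL_FEATURES = ["genre", "mood"]

    def impute_missing_values(tracks_df):
        """Fill missing values: mean for numerical columns, mode for categorical ones."""
        tracks_df = tracks_df.copy()

        # Numerical features: replace NaN with the column mean.
        num_imputer = SimpleImputer(strategy="mean")
        tracks_df[NUMERICAL_FEATURES] = num_imputer.fit_transform(
            tracks_df[NUMERICAL_FEATURES]
        )

        # Categorical features: replace NaN with the most frequent category.
        cat_imputer = SimpleImputer(strategy="most_frequent")
        tracks_df[CATEGORICAL_FEATURES] = cat_imputer.fit_transform(
            tracks_df[CATEGORICAL_FEATURES]
        )

        return tracks_df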

Explanation:

  • The SimpleImputer from sklearn.impute is a preprocessing tool that automatically fills in missing values in your dataset. Many machine learning algorithms—and even transformers like OneHotEncoder—cannot work properly if the input has NaN (missing) values. That's why we impute (i.e., fill in) those gaps before continuing.
  • For numerical features like tempo and energy, we use the mean (average) because it preserves the overall distribution of the values and avoids introducing bias. Imagine 10 songs, 2 of which have missing tempo values: replacing those with the average tempo gives a reasonable approximation without skewing the results too high or too low.
  • For categorical features like genre and mood, we use the most frequent value (the mode). Why? Because there is no meaningful "average" category. Filling in missing genres with the most common one reduces noise while still assigning the most likely musical label.

Encoding Tracks into Embeddings

Now, let's see how we turn each track into a vector. This process is called embedding. Think of it like translating music into coordinates in a multi-dimensional space.

Why do we do this? Because mathematical operations like similarity comparison or clustering only work on numbers. By embedding tracks, we can say things like: "Track A is closer to Track B than to Track C"—which is exactly what we need for building recommendations.

We use two main techniques:

  • Standardization for numerical features (so they are on the same scale)
  • One-hot encoding for categorical features (turning categories into numbers)

The relevant code lives in src/user_model.py and is organized around two functions:

_get_or_create_preprocessor(tracks_df):

This function builds a preprocessing pipeline that knows how to convert raw feature columns (tempo, energy, genre, mood) into a numerical vector format suitable for machine learning.

  • We use StandardScaler for standardization, which rescales numerical values so they have a mean of 0 and a standard deviation of 1. Features like tempo (which might range from 60 to 200 BPM) and energy (say, 0.0 to 1.0) are on different scales. Without standardization, one feature might dominate the others when comparing vectors, even if it is less important. After scaling, they all contribute equally to similarity scoring.
  • We use OneHotEncoder to turn each category (like "rock" or "pop" for genre) into a separate column with a 1 or 0. This method ensures that all categories are treated equally—no implicit order is assumed (like “rock” being greater than “jazz”).
  • These transformations are combined using ColumnTransformer, which ensures that each transformation is applied only to the correct columns, as shown in the sketch after this list.
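
A sketch of what this function might look like, reconstructed from the description above (the actual code in src/user_model.py may differ):

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    def _get_or_create_preprocessor(tracks_df):
        """Build and fit a ColumnTransformer for the track feature columns."""
        preprocessor = ColumnTransformer(transformers=[
            # Standardize numerical features to mean 0, standard deviation 1.
            ("num", StandardScaler(), ["tempo", "energy"]),
            # One-hot encode categories as dense 0/1 columns (sklearn >= 1.2
            # syntax), ignoring categories unseen at fit time.
            ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False),
             ["genre", "mood"]),
        ])
        preprocessor.fit(tracks_df)
        return preprocessor
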
get_track_embeddings():

This function uses the fitted preprocessor to convert all tracks into numerical embeddings.

  • It loads the full track dataset using get_all_tracks().
  • It fills in missing values via the imputation step described earlier.
  • It builds the preprocessor using _get_or_create_preprocessor(...).
  • It applies the preprocessor to the tracks, producing a transformed matrix in which each row is the embedding for one track.
  • It returns two things: a list of track IDs (to keep track of which row corresponds to which song) and the matrix of embeddings (to be used for similarity comparison or user modeling). A sketch of the full function follows this list.
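
A sketch consistent with the steps just listed (it reuses the illustrative impute_missing_values helper from earlier and assumes the track ID column is named id, as in the example output below):

    def get_track_embeddings():
        """Return (track_ids, embeddings) for every track in the catalog."""
        tracks_df = get_all_tracks()                  # provided by the environment
        tracks_df = impute_missing_values(tracks_df)  # fill NaNs before encoding

        preprocessor = _get_or_create_preprocessor(tracks_df)
        embeddings = preprocessor.transform(tracks_df)  # one row per track

        track_ids = tracks_df["id"].tolist()
        return track_ids, embeddings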

Note: The embeddings returned here are not L2-normalized. If you plan to use cosine similarity to compare vectors, it's recommended to apply L2 normalization using normalize(..., norm='l2') from sklearn.preprocessing. This ensures that all vectors have unit length, which is important when comparing directions rather than magnitudes.
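
For example, building on the sketch above:

    from sklearn.preprocessing import normalize

    track_ids, embeddings = get_track_embeddings()
    embeddings = normalize(embeddings, norm="l2")  # every row now has unit length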

Example Output:
Suppose we have three tracks. After encoding, the output might look like this:

    id     embedding (first 8 values)
    101    [0.2, -1.1, 1, 0, 0, 1, 0, 0, ...]
    102    [-0.5, 0.3, 0, 1, 0, 0, 1, 0, ...]
    103    [1.1, 0.8, 0, 0, 1, 0, 0, 1, ...]
  • The first two values (such as 0.2 and -1.1) come from the standardized numerical features, tempo and energy. These can take on any real number, positive or negative, depending on how far a value deviates from the mean.
  • The remaining 1s and 0s come from the one-hot encoded categorical features, genre and mood: each category gets its own column, which is 1 for the track's category and 0 otherwise.

Generating User Profile Vectors

To recommend music to a user, we need to represent their taste as a vector. We do this by averaging the embeddings of the tracks they have listened to.

A user's musical taste is reflected in the kinds of tracks they listen to. If we take the embedding (i.e., vector representation) of each track a user listens to, averaging those embeddings gives a single vector that summarizes the user's overall preferences. For example, if a user listens mostly to high-energy jazz tracks with fast tempos, the average of those embeddings will lean in that direction. This averaged vector can then be compared against track embeddings to find new tracks that are similar, which is the basis for personalized recommendations.

Here is how the code for generating a user profile vector might look:
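
A minimal sketch, assuming the helpers from earlier and a track_id column in the user's history DataFrame (both the function name and the column name are illustrative assumptions):

    import numpy as np

    def build_user_profile(user_id):
        """Average the embeddings of the tracks a user has listened to."""
        history_df = get_user_listening_history(user_id)
        track_ids, embeddings = get_track_embeddings()

        # Map each track ID to its row in the embedding matrix.
        id_to_row = {track_id: row for row, track_id in enumerate(track_ids)}
        listened_rows = [id_to_row[t] for t in history_df["track_id"] if t in id_to_row]

        # The mean of the listened-to embeddings is the user's profile vector.
        return np.mean(embeddings[listened_rows], axis=0)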

Explanation:

  • We get the list of tracks the user has listened to.
  • For each track, we find its embedding.
  • We take the average of all these embeddings to create the user's profile vector.

Example Output:
If a user has listened to three tracks, their profile vector might look like this:
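
    [0.27, 0.0, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, ...]

(Each entry here is the element-wise mean of the corresponding entries in the three track embeddings from the earlier example output.)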

This vector summarizes the user's musical taste in a way that can be compared to other tracks.

Summary and What’s Next

In this lesson, you learned how to:

  • Select and prepare track features for encoding
  • Transform tracks into numerical embeddings using standardization and one-hot encoding
  • Generate a user profile vector by averaging the embeddings of tracks they have listened to

These embeddings are the building blocks for making personalized music recommendations. In the next practice exercises, you will get hands-on experience with these concepts by encoding tracks and users yourself. This will prepare you for building similarity scoring and recommendation logic in future lessons.
