Introduction: Why Similarity Matters in Recommendations

Welcome back! In the previous lesson, you learned how to represent both music tracks and user preferences as vectors, called embeddings. This lets us compare musical taste mathematically — the first step toward making personalized recommendations.

Now it’s time to put those vectors into action. In this lesson, you’ll learn how to:

  • Compute similarity between a user’s preferences and available tracks
  • Filter out already-listened songs
  • Rank and recommend the top matches

We'll also walk through a dedicated test file (test_recommend.py) that shows you exactly how the recommendation logic works.

Recap: Accessing User and Track Embeddings

Before we dive into similarity, let’s quickly remind ourselves how we get the vectors for users and tracks. You have already seen how to generate these embeddings in the previous lesson; here’s a quick reminder of the two helper functions, with a short snippet after the list showing how you might call them:

  • generate_user_profile_vector(user_id) returns a vector representing the user's preferences.
  • get_track_embeddings() returns a list of all track IDs and their corresponding vectors.
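
In code, calling them might look like this. This is a minimal sketch: the import path and the exact return shape of get_track_embeddings are assumptions, since the real definitions live in the course’s source files.

  from embeddings import generate_user_profile_vector, get_track_embeddings  # assumed module path

  # Vector representing the user's preferences (None if the user has no plays)
  user_vector = generate_user_profile_vector("user_42")  # hypothetical user ID

  # Assumed to return parallel lists: one of track IDs, one of embedding vectors
  track_ids, track_vectors = get_track_embeddings()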

These vectors are the building blocks for making recommendations.

Cosine Similarity Explained With a Simple Example

Cosine similarity is a way to measure how similar two vectors are, regardless of their size. Imagine each vector as an arrow pointing in space. Cosine similarity looks at the angle between these arrows:

  • If the arrows point in the same direction, the similarity is 1 (very similar).
  • If they point in opposite directions, the similarity is -1 (very different).
  • If they are at 90 degrees, the similarity is 0 (unrelated).

The cosine similarity between two vectors A and B is defined as:

  similarity(A, B) = (A ⋅ B) / (||A|| × ||B||)

Where:

  • A⋅B is the dot product of the two vectors.
  • ||A|| and ||B|| are the L2 norms (lengths) of the vectors.

This formula computes the cosine of the angle between two vectors. The result ranges from -1 (opposite directions) to 1 (same direction).

In the context of music recommendations, if a user’s preference vector and a track’s vector point in the same direction, it means the user is likely to enjoy that track.

Here’s a simple example using NumPy, with small made-up vectors standing in for real user and track embeddings:
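
  import numpy as np

  def cosine_similarity(a, b):
      # Dot product divided by the product of the L2 norms
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  # Made-up 3-dimensional embeddings, just for illustration
  user_vector = np.array([1.0, 2.0, 3.0])
  track_vector = np.array([2.0, 3.0, 4.0])

  similarity = cosine_similarity(user_vector, track_vector)
  print(f"Cosine similarity: {similarity:.4f}")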

Output:
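
  Cosine similarity: 0.9926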

This high score means the user and track are very similar, so the track is a good recommendation.

Note: You might wonder why we don’t just use Euclidean distance or subtract one vector from another. The reason is that cosine similarity focuses on the angle, not the magnitude. Two vectors pointing in the same direction are considered similar, even if one is longer than the other. This works especially well for comparing preference patterns like musical taste, where it’s the direction (i.e., the relative importance of features) that matters more than the absolute values.
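
To make that concrete, here is a quick sketch with two made-up vectors: b points in exactly the same direction as a but is twice as long, so cosine similarity treats them as identical while Euclidean distance does not.

  import numpy as np

  a = np.array([1.0, 2.0])
  b = np.array([2.0, 4.0])  # same direction as a, twice the magnitude

  cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
  distance = np.linalg.norm(a - b)

  print(cosine)    # ~1.0: identical preference pattern
  print(distance)  # ~2.24: Euclidean distance still sees a gap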

Walking Through recommend_tracks_by_similarity

Let’s break down the main function in src/recommend.py that puts everything together to recommend tracks. The sketch below reconstructs its logic from the steps described in this lesson, so treat it as illustrative rather than the exact course code (get_listened_track_ids, in particular, is a hypothetical helper for fetching the user’s listening history):
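
  import numpy as np
  import pandas as pd

  from embeddings import generate_user_profile_vector, get_track_embeddings  # assumed module path

  def recommend_tracks_by_similarity(user_id, top_n=5):
      # Step 1: Get the user's profile vector (None if the user has no plays)
      user_vector = generate_user_profile_vector(user_id)
      if user_vector is None:
          return None  # no listening history, nothing to recommend

      # Step 2: Get all track embeddings as parallel lists
      track_ids, track_vectors = get_track_embeddings()
      track_matrix = np.array(track_vectors)
      user_vector = np.array(user_vector)

      # Step 3: Cosine similarity between the user vector and every track vector
      norms = np.linalg.norm(track_matrix, axis=1) * np.linalg.norm(user_vector)
      similarities = (track_matrix @ user_vector) / norms

      # Step 4: A DataFrame makes sorting and filtering easy
      df = pd.DataFrame({"track_id": track_ids, "similarity": similarities})

      # Step 5: Drop already-listened tracks, then rank by similarity
      listened = get_listened_track_ids(user_id)  # hypothetical helper
      df = df[~df["track_id"].isin(listened)]
      return df.sort_values("similarity", ascending=False).head(top_n)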

Let’s go through each step:

  • Get the user’s profile vector:
    This represents the user’s music taste. If the user has never played any tracks, this function will return None, which we check for early to avoid running similarity on an empty profile.

  • Get all track embeddings:
    These are the vectors for every track in the system.

  • Compute cosine similarity:
    This calculates how similar the user is to each track.

  • Create a DataFrame:
    This makes it easy to sort and filter the results.

  • Filter out already-listened tracks:
    Tracks the user has already played are removed, so the recommendations stay fresh.

  • Sort and return the top matches:
    The remaining tracks are sorted by similarity in descending order, and the highest-scoring ones are returned as the recommendations.

Summary And What’s Next

In this lesson, you learned how to use cosine similarity to compare user preferences with track features and recommend the best matches. You saw how the recommend_tracks_by_similarity function works step by step, from getting vectors to filtering and sorting recommendations.

Here’s a quick recap:

  • Cosine similarity measures how close two vectors are in direction.
  • We use it to find tracks that match a user’s taste.
  • The function filters out tracks the user already knows and sorts the rest by similarity.

Now, you are ready to practice these concepts yourself. In the next exercises, you’ll get hands-on experience using and modifying the recommendation function. This will help you build a deeper understanding of how embedding-based recommendations work in real applications. Good luck!
