
Introduction

Welcome back! You've journeyed through the basics of recommendation systems, starting with baseline predictions and learning about similarity measures like Pearson correlation. Knowing how similar two users are lets us predict the ratings a user hasn't given yet more accurately. In this lesson, we build on that knowledge with a practical approach to predicting user ratings: a weighted average of other users' ratings, where each rating is weighted by that user's Pearson similarity to the target user. Because similar users' ratings count for more, the predictions are personalized. By the end of the lesson, you'll be able to predict a user's rating for an item, a core skill for building more sophisticated recommendation systems.

Recap: Using Pearson Similarity

Before diving into this lesson's main topic, let's quickly revisit the Pearson correlation function we discussed in the previous lesson. This function is key in determining how similar two users are based on their rating patterns.

Here's the function we'll use:
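The original listing is not reproduced here, so the following is a minimal sketch of a standard Pearson correlation in Python. It assumes the function (hypothetically named pearson_correlation) receives two equal-length lists in which position k holds both users' ratings of the same item. Returning 0.0 when a list is empty or has no variance is also an assumption of this sketch, since the coefficient is undefined in those cases.

```python
import math


def pearson_correlation(ratings1, ratings2):
    """Pearson correlation of two aligned rating lists.

    ratings1[k] and ratings2[k] must be the two users' ratings of the same item.
    Returns 0.0 when the lists are empty or either has zero variance (assumption:
    the coefficient is undefined there, so we treat it as "no similarity").
    """
    if len(ratings1) != len(ratings2):
        raise ValueError("rating lists must have the same length")
    n = len(ratings1)
    if n == 0:
        return 0.0
    mean1 = sum(ratings1) / n
    mean2 = sum(ratings2) / n
    # Numerator: sum of products of each user's deviations from their own mean.
    numerator = sum((a - mean1) * (b - mean2) for a, b in zip(ratings1, ratings2))
    # Denominator: product of the square roots of each user's sum of squared deviations.
    spread1 = math.sqrt(sum((a - mean1) ** 2 for a in ratings1))
    spread2 = math.sqrt(sum((b - mean2) ** 2 for b in ratings2))
    if spread1 == 0 or spread2 == 0:
        return 0.0
    return numerator / (spread1 * spread2)
```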

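This function measures how closely two users' ratings of the same items move together. The result ranges from -1 to 1: values near 1 mean the users rate items alike, values near -1 mean their tastes are opposed, and values near 0 mean there is little linear relationship. These scores are what we'll use as weights in today's task: predicting ratings based on user similarity.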

Reading the User-Item Rating Matrix

To make predictions, we first need to read and interpret our user-item rating data. This data is stored in a file named user_items_matrix.txt. Let's explore how the file is structured and how to load this information.

The file is organized with each line representing a user's rating for a specific item. It has three comma-separated values: User, Item, and Rating. Here's an example:

We'll use Python to read this data into a user-item dictionary, allowing us to easily access any user's ratings:
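A minimal sketch of this loading step, assuming each line is exactly User,Item,Rating with no header row and numeric ratings. The helper names parse_rating_lines and read_users_items_matrix are hypothetical; separating parsing from file I/O simply makes the parser easy to try on a few in-memory lines.

```python
def parse_rating_lines(lines):
    """Build {user: {item: rating}} from lines of the form "User,Item,Rating"."""
    users_items_matrix = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. a trailing newline at end of file
        # Split into the three fields and trim stray spaces around the commas.
        user, item, rating = (part.strip() for part in line.split(","))
        # Create the user's inner dictionary on first sight, then store the rating.
        users_items_matrix.setdefault(user, {})[item] = float(rating)
    return users_items_matrix


def read_users_items_matrix(path="user_items_matrix.txt"):
    """Read the rating file line by line and parse it into the nested dictionary."""
    with open(path, encoding="utf-8") as f:
        return parse_rating_lines(f)
```

With this layout, users_items_matrix[user][item] retrieves a single rating, and users_items_matrix[user] returns every rating that user has given.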

The code reads the file line by line, splitting each line into user, item, and rating, and then stores this data in a dictionary users_items_matrix. This structure allows for easy retrieval and manipulation of ratings, facilitating our upcoming calculations.

Calculating Non-weighted Average Rating

Before making predictions using weighted averages, it's beneficial to understand non-weighted averages, which are simpler aggregates of ratings for a specific item across all users.

Let's look at how to compute this:
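One possible version of calculate_non_weighted_average, assuming the nested {user: {item: rating}} dictionary described above. The two-argument signature and the None return for an item nobody has rated are assumptions of this sketch. The usage line uses only the ItemC ratings from the example later in this lesson (User1 gave 4, User2 gave 2).

```python
def calculate_non_weighted_average(users_items_matrix, target_item):
    """Plain mean of every rating given to target_item; None if nobody rated it."""
    # Collect the target item's rating from every user who has one.
    ratings = [
        user_ratings[target_item]
        for user_ratings in users_items_matrix.values()
        if target_item in user_ratings
    ]
    if not ratings:
        return None
    # Every rating counts equally, regardless of who gave it.
    return sum(ratings) / len(ratings)


example = {"User1": {"ItemC": 4}, "User2": {"ItemC": 2}, "User3": {}}
print(calculate_non_weighted_average(example, "ItemC"))  # 3.0
```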

This function, calculate_non_weighted_average, gathers all ratings for a specified item (e.g., 'ItemC') from the user-item matrix and returns their mean. It's straightforward, but every user's rating counts equally, no matter how similar that user is to the target user. The weighted prediction we explore next addresses this.

Formula

Let's consider the formula for predicting a rating using a weighted average based on Pearson similarity:

$$\text{Predicted Rating} = \frac{\sum_{v \in U} \text{sim}(u, v) \times r_{v,i}}{\sum_{v \in U} |\text{sim}(u, v)|}$$

Where:

- $\text{sim}(u, v)$ is the Pearson similarity between the target user $u$ and another user $v$.
- $r_{v,i}$ is the rating given by user $v$ to the target item $i$.
- $U$ is the set of users, other than the target user $u$, who have rated item $i$.

Multiplying each rating by the similarity between the two users gives more weight to ratings from users who are similar to the target user and less weight to ratings from users who are not. The denominator uses the absolute value of each similarity because Pearson similarities can be negative: summing the raw values would let positive and negative similarities cancel out, so the denominator could shrink toward zero or even become negative.
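The formula translates almost term for term into a small function. In this sketch, the hypothetical helper predict_from_similarities takes one (sim(u, v), r_{v,i}) pair per user in U; returning None when every similarity is zero is an assumption, since the formula is undefined then. The example call shows why the absolute value matters: with similarities 0.5 and -0.5, a denominator built from raw similarities would be exactly 0.

```python
def predict_from_similarities(neighbors):
    """Similarity-weighted average of ratings.

    neighbors: list of (similarity, rating) pairs, one per user v in U.
    """
    # Numerator: sum over v of sim(u, v) * r_{v,i}
    weighted_sum = sum(sim * rating for sim, rating in neighbors)
    # Denominator: sum over v of |sim(u, v)|, so negative similarities cannot cancel positive ones
    normalizer = sum(abs(sim) for sim, _ in neighbors)
    if normalizer == 0:
        return None  # no similarity information, prediction undefined
    return weighted_sum / normalizer


# Raw similarities would sum to 0.5 + (-0.5) = 0 here; absolute values give 1.0.
print(predict_from_similarities([(0.5, 5), (-0.5, 1)]))  # 2.0
```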

Preparing to Predict Ratings Using Weighted Average

To predict ratings with the weighted average approach, we first collect the target user's ratings into an array (a Python list) that excludes the item we want to predict. The target item has to be left out because its rating is the unknown we are estimating; the remaining ratings are the ones compared against other users' ratings when calculating Pearson similarity.

Here's how you can derive the target_ratings variable:
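A minimal sketch, assuming the nested dictionary from earlier. It is written as a hypothetical helper, get_target_ratings, that also returns the matching item names; keeping the names is a design choice of this sketch, so that another user's ratings can later be lined up item by item before computing Pearson similarity. The ratings for the hypothetical UserX are illustrative only.

```python
def get_target_ratings(users_items_matrix, target_user, target_item):
    """Return the target user's rated items (minus target_item) and the matching ratings."""
    user_ratings = users_items_matrix[target_user]
    # Keep every item except the one being predicted, preserving dict order.
    target_items = [item for item in user_ratings if item != target_item]
    target_ratings = [user_ratings[item] for item in target_items]
    return target_items, target_ratings


# Hypothetical ratings; ItemC is dropped even though UserX has rated it.
matrix = {"UserX": {"ItemA": 5, "ItemB": 3, "ItemC": 4}}
items, target_ratings = get_target_ratings(matrix, "UserX", "ItemC")
print(items, target_ratings)  # ['ItemA', 'ItemB'] [5, 3]
```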

By processing the target_ratings, you establish the foundation for calculating Pearson similarity with other users, a crucial factor in making an informed prediction.

Predicting Rating Using Weighted Average

Now, let's move to the core of this lesson: predicting ratings using a weighted average that's informed by Pearson similarity. This method considers the similarity between users when calculating the predicted rating. Here’s a detailed implementation of this approach with explanations:
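The lesson's original listing is not reproduced here. The following is a self-contained sketch of weighted_rating_prediction under several assumptions: the nested {user: {item: rating}} dictionary from earlier, a three-argument signature, similarity computed only over items both users rated (excluding the target item), a compact copy of the Pearson helper, and a None return when no user with nonzero similarity has rated the item.

```python
import math


def pearson_correlation(ratings1, ratings2):
    """Pearson correlation of two aligned rating lists (0.0 when undefined)."""
    n = len(ratings1)
    if n == 0:
        return 0.0
    mean1 = sum(ratings1) / n
    mean2 = sum(ratings2) / n
    numerator = sum((a - mean1) * (b - mean2) for a, b in zip(ratings1, ratings2))
    denominator = math.sqrt(sum((a - mean1) ** 2 for a in ratings1)) * math.sqrt(
        sum((b - mean2) ** 2 for b in ratings2)
    )
    return numerator / denominator if denominator else 0.0


def weighted_rating_prediction(users_items_matrix, target_user, target_item):
    """Predict target_user's rating for target_item as a similarity-weighted average.

    Returns None when no other user with nonzero similarity has rated the item.
    """
    target = users_items_matrix[target_user]
    # The target user's rated items, leaving out the item being predicted.
    target_items = [item for item in target if item != target_item]

    weighted_sum = 0.0  # running sum of sim(u, v) * r_{v,i}
    similarity_total = 0.0  # running sum of |sim(u, v)|
    for other_user, other_ratings in users_items_matrix.items():
        # U: users other than the target who have rated the target item.
        if other_user == target_user or target_item not in other_ratings:
            continue
        # 1. Gather the similarity score over the items both users rated.
        common = [item for item in target_items if item in other_ratings]
        similarity = pearson_correlation(
            [target[item] for item in common],
            [other_ratings[item] for item in common],
        )
        # 2. Add this user's similarity-weighted rating to the running sum.
        weighted_sum += similarity * other_ratings[target_item]
        similarity_total += abs(similarity)

    # 3. Normalize by the sum of absolute similarity weights.
    if similarity_total == 0:
        return None
    return weighted_sum / similarity_total
```

With the matrix loaded from user_items_matrix.txt, a call such as weighted_rating_prediction(users_items_matrix, 'User3', 'ItemC') would return User3's predicted rating for ItemC under these assumptions.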

This function, weighted_rating_prediction, predicts the rating for a specified user ('User3') on a target item ('ItemC') by:

Gathering Similarity Scores: Computing the Pearson correlation between the target user and each other user who has rated the target item.
Calculating a Weighted Sum: Multiplying each of those users' ratings for the target item by their similarity score and summing the products.
Normalizing by Sum of Similarity Weights: Dividing the weighted sum by the sum of the absolute similarity scores to produce a personalized rating prediction.

Unlike the plain average, this method gives users whose preferences are closer to the target user's a larger say in the prediction, which makes the resulting recommendations more personalized.

Example Data and Interpreting Results

To better understand how the weighted rating prediction works, let's look at an example with three users (User1, User2, and User3) and their ratings for various items, including ItemC.

Example Data:

We want to predict User3's rating for ItemC. Here’s how this process unfolds:

Calculate Similarities:
- User1's past ratings (e.g., on ItemA and ItemB) are only moderately similar to User3's, giving a Pearson correlation of 0.5.
- User2's ratings are more similar to User3's, giving a higher correlation of approximately 0.86.
Weighted Sum Calculation:
- Because User1 is less similar to User3, User1's rating of 4 for ItemC receives the smaller weight (0.5); its contribution to the weighted sum is $4 \cdot 0.5 = 2$.
- User2's rating of 2 for ItemC receives the larger weight (0.86) because of the higher similarity; its contribution is $2 \cdot 0.86 = 1.72$.
Compute Predicted Rating:
- Dividing the weighted sum by the sum of the absolute similarities gives the predicted rating: $\frac{2 + 1.72}{0.5 + 0.86} = \frac{3.72}{1.36} \approx 2.74$.
- The plain average rating for ItemC is $(4 + 2)/2 = 3$; the weighted prediction falls below it, pulled toward the rating of User2, the more similar user.
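The arithmetic above can be checked in a few lines of Python. The similarities are the rounded values quoted in the example (0.86 is approximate), so the result is only as precise as those inputs.

```python
# Similarity to User3 and rating for ItemC, as quoted in the worked example.
neighbors = {"User1": (0.5, 4), "User2": (0.86, 2)}

weighted_sum = sum(sim * rating for sim, rating in neighbors.values())
similarity_total = sum(abs(sim) for sim, _ in neighbors.values())
predicted = weighted_sum / similarity_total
plain_average = sum(rating for _, rating in neighbors.values()) / len(neighbors)

print(f"weighted sum     = {weighted_sum:.2f}")      # 3.72
print(f"similarity total = {similarity_total:.2f}")  # 1.36
print(f"prediction       = {predicted:.2f}")         # 2.74
print(f"plain average    = {plain_average:.2f}")     # 3.00
```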

The final predicted rating suggests that, although the plain average is 3, User3's preference is expected to align more closely with User2's because the two users are more similar. This is the essence of weighted rating prediction: using user similarity to give a more personalized estimate of how much a user will like an item.

Summary and Preparation for Practice

You’ve now learned to predict user ratings using a weighted average informed by user similarity. This technique is a form of memory-based (neighborhood-based) collaborative filtering: predictions are computed directly from other users' stored ratings, weighted so that they better reflect individual user preferences.

As you move into the practice exercises, apply these techniques to different datasets and observe how the predictions change as user similarities change. You are building the foundational knowledge needed to create effective recommendation systems.
