Lesson 3
Rating Prediction Using Weighted Average and Pearson Similarity
Introduction

Welcome back! You've journeyed through the basics of recommendation systems, starting with baseline predictions and learning about similarity measures like Pearson Correlation. Understanding user similarity is crucial in recommendation systems, enabling more accurate predictions of unknown ratings. In this lesson, we will build upon that knowledge and focus on a practical approach to predicting user ratings using weighted averages combined with Pearson similarity. This technique allows us to make personalized recommendations by accounting for the weighted influence of similar users' ratings. By the end of the lesson, you’ll be able to effectively predict a user's rating for an item—a vital skill in crafting sophisticated recommendation systems.

Recap: Using Pearson Similarity

Before diving into this lesson's main topic, let's quickly revisit the Pearson correlation function we discussed in the previous lesson. This function is key in determining how similar two users are based on their rating patterns.

Here's the function we'll use:

Python
import numpy as np

def pearson_correlation(ratings1, ratings2):
    # Both rating vectors must cover the same items, in the same order
    n = len(ratings1)
    assert n == len(ratings2)

    mean1 = np.mean(ratings1)
    mean2 = np.mean(ratings2)

    # Deviation of each rating from that user's mean rating
    diff1 = ratings1 - mean1
    diff2 = ratings2 - mean2

    numerator = np.sum(diff1 * diff2)
    denominator = np.sqrt(np.sum(diff1 ** 2) * np.sum(diff2 ** 2))

    # A user who gave identical ratings has zero variance; treat that as no correlation
    if denominator == 0:
        return 0
    else:
        return numerator / denominator

This function calculates how closely two sets of user ratings align. Higher values indicate greater similarity, which will be important for today's task: predicting ratings based on these similarities.
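To make the range of possible values concrete, here is a small sketch that calls the function on made-up rating vectors. The vectors are illustrative assumptions, not data from the lesson file, and the function is repeated so the snippet runs on its own.

```python
import numpy as np

def pearson_correlation(ratings1, ratings2):
    assert len(ratings1) == len(ratings2)
    diff1 = ratings1 - np.mean(ratings1)
    diff2 = ratings2 - np.mean(ratings2)
    numerator = np.sum(diff1 * diff2)
    denominator = np.sqrt(np.sum(diff1 ** 2) * np.sum(diff2 ** 2))
    return 0 if denominator == 0 else numerator / denominator

# Hypothetical rating vectors over the same three items
base = np.array([1, 2, 3])
same_trend = np.array([2, 4, 6])      # rises exactly when `base` rises
opposite_trend = np.array([3, 2, 1])  # falls exactly when `base` rises
constant = np.array([4, 4, 4])        # no variation at all

sim_same = pearson_correlation(base, same_trend)          # close to 1
sim_opposite = pearson_correlation(base, opposite_trend)  # close to -1
sim_constant = pearson_correlation(base, constant)        # 0 (zero-variance guard)

print(sim_same, sim_opposite, sim_constant)
```

Values near 1 mean the two users rate in the same pattern, values near -1 mean opposite patterns, and the guard returns 0 when one user's ratings never vary.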

Reading the User-Item Rating Matrix

To make predictions, we first need to read and interpret our user-item rating data. This data is stored in a file named user_items_matrix.txt. Let's explore how the file is structured and how to load this information.

The file is organized with each line representing one user's rating for one item. Each line holds three comma-separated values: user, item, and rating. Here's an example:

User1,ItemA,5
User1,ItemB,4
User2,ItemA,3

We'll use Python to read this data into a user-item dictionary, allowing us to easily access any user's ratings:

Python
def read_users_items_matrix(file_path):
    users_items_matrix = {}
    with open(file_path, 'r') as file:
        for line in file:
            line = line.strip()
            # Skip blank lines, such as separators between groups of ratings
            if not line:
                continue
            user, item, rating = line.split(',')
            if user not in users_items_matrix:
                users_items_matrix[user] = {}
            users_items_matrix[user][item] = int(rating)
    return users_items_matrix

# Example usage:
file_path = 'user_items_matrix.txt'
users_items_matrix = read_users_items_matrix(file_path)

The code reads the file line by line, splitting each line into user, item, and rating, and then stores this data in a dictionary users_items_matrix. This structure allows for easy retrieval and manipulation of ratings, facilitating our upcoming calculations.
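To see the resulting structure, the sketch below writes the three sample lines to a temporary file and loads them. The temporary file stands in for user_items_matrix.txt, and the reader is restated in a slightly more defensive form; skipping blank lines and using setdefault are assumptions added for this example.

```python
import os
import tempfile

def read_users_items_matrix(file_path):
    users_items_matrix = {}
    with open(file_path, 'r') as file:
        for line in file:
            line = line.strip()
            if not line:  # assumption: blank lines carry no data
                continue
            user, item, rating = line.split(',')
            users_items_matrix.setdefault(user, {})[item] = int(rating)
    return users_items_matrix

# Write the sample rows to a throwaway file
sample = "User1,ItemA,5\nUser1,ItemB,4\nUser2,ItemA,3\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

matrix = read_users_items_matrix(sample_path)
os.remove(sample_path)

print(matrix)
# {'User1': {'ItemA': 5, 'ItemB': 4}, 'User2': {'ItemA': 3}}
```

The outer keys are users and the inner keys are items, so a single rating is read as matrix['User1']['ItemB'].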

Calculating Non-weighted Average Rating

Before making predictions using weighted averages, it's beneficial to understand non-weighted averages, which are simpler aggregates of ratings for a specific item across all users.

Let's look at how to compute this:

Python
def calculate_non_weighted_average(target_item, user_ratings):
    # Collect every rating given to the target item, across all users
    item_ratings = [ratings[target_item] for ratings in user_ratings.values() if target_item in ratings]
    # Nobody has rated the item, so there is nothing to average
    if not item_ratings:
        return None
    return np.mean(item_ratings)

# Example usage:
non_weighted_average = calculate_non_weighted_average('ItemC', users_items_matrix)
print(f"Non-Weighted Average Rating for ItemC: {non_weighted_average}")

This function, calculate_non_weighted_average, gathers all ratings for a specified item (e.g., 'ItemC') from the user-item matrix and calculates the average. It's a straightforward method but does not consider user similarity, unlike the weighted prediction—which we’ll explore next.
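As a quick check, the sketch below runs the same calculation on an in-memory dictionary instead of a file. The dictionary mirrors the three-user example data used later in this lesson; 'ItemZ' is a hypothetical item that nobody has rated.

```python
import numpy as np

def calculate_non_weighted_average(target_item, user_ratings):
    item_ratings = [r[target_item] for r in user_ratings.values() if target_item in r]
    if not item_ratings:
        return None
    return np.mean(item_ratings)

# In-memory stand-in for the loaded user-item matrix
example_matrix = {
    'User1': {'ItemA': 4, 'ItemB': 4, 'ItemC': 4, 'ItemD': 5},
    'User2': {'ItemA': 3, 'ItemB': 2, 'ItemC': 2, 'ItemD': 4},
    'User3': {'ItemA': 5, 'ItemB': 3, 'ItemD': 5},
}

average_item_c = calculate_non_weighted_average('ItemC', example_matrix)
average_unrated = calculate_non_weighted_average('ItemZ', example_matrix)

print(average_item_c)   # 3.0, the mean of the ratings 4 and 2
print(average_unrated)  # None, since no user rated 'ItemZ'
```

Every user who rated ItemC counts equally here, no matter how closely their tastes match those of the user we are predicting for.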

Formula

Let's consider the formula for predicting a rating using a weighted average based on Pearson similarity:

$$\text{Predicted Rating} = \frac{\sum_{v \in U} \text{sim}(u, v) \times r_{v,i}}{\sum_{v \in U} |\text{sim}(u, v)|}$$

Where:

  • $\text{sim}(u, v)$ is the Pearson similarity between the target user $u$ and another user $v$.
  • $r_{v,i}$ is the rating given by user $v$ to the target item $i$.
  • $U$ is the set of all users except the target user $u$ who have provided a rating for item $i$.

By multiplying each rating by the similarity between users, we give more weight to the ratings of users who are similar to the target user and less weight to the ratings of those who are different. Note that the denominator uses the absolute value of each similarity because Pearson similarities can be negative; without it, positive and negative weights could cancel out and leave a denominator near zero.
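The effect of the absolute value is easiest to see with numbers. The following sketch applies the formula to two made-up neighbors, one of them negatively correlated with the target user; the similarities and ratings are illustrative assumptions.

```python
# Hypothetical neighbors of the target user: (similarity, rating for the item)
neighbors = [
    (0.8, 5),   # similar user who rated the item 5
    (-0.4, 2),  # dissimilar user who rated the item 2
]

weighted_sum = sum(sim * rating for sim, rating in neighbors)
abs_weight_total = sum(abs(sim) for sim, _ in neighbors)
signed_weight_total = sum(sim for sim, _ in neighbors)

# Formula from the lesson: absolute similarities in the denominator
prediction = weighted_sum / abs_weight_total

# What would happen with signed similarities in the denominator instead
prediction_without_abs = weighted_sum / signed_weight_total

print(round(prediction, 2))              # about 2.67
print(round(prediction_without_abs, 2))  # about 8.0, far above both input ratings
```

With signed weights the denominator shrinks to about 0.4 and the result is larger than any rating that went into it, which is why the formula normalizes by absolute values.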

Preparing to Predict Ratings Using Weighted Average

To predict ratings using the weighted average approach, an essential preparatory step involves transforming the ratings of the target user into an array. This array will exclude the item that we aim to predict. This simplification allows us to focus on the set of ratings that are pivotal for calculating similarity with other users.

Here's how you can derive the target_ratings variable:

Python
def generate_target_ratings(target_user, target_item, user_ratings):
    # Extract the ratings of the target user, excluding the target item
    target_ratings = np.array([rating for item, rating in user_ratings[target_user].items() if item != target_item])
    return target_ratings

# Example usage:
target_user = 'User3'
target_item = 'ItemC'
target_ratings = generate_target_ratings(target_user, target_item, users_items_matrix)

By processing the target_ratings, you establish the foundation for calculating Pearson similarity with other users, a crucial factor in making an informed prediction.

Predicting Rating Using Weighted Average

Now, let's move to the core of this lesson: predicting ratings using a weighted average that's informed by Pearson similarity. This method considers the similarity between users when calculating the predicted rating. Here’s a detailed implementation of this approach with explanations:

Python
def weighted_rating_prediction(target_user, target_item, user_ratings):
    weighted_sum = 0
    sum_of_weights = 0

    # Retrieve the target user's ratings, excluding the target item
    target_ratings = np.array([rating for item, rating in user_ratings[target_user].items() if item != target_item])

    for user, ratings in user_ratings.items():
        # Skip the target user, and any user who has not rated the target item
        if user != target_user and target_item in ratings:
            # Retrieve and prepare the other user's ratings, excluding the target item
            # (this assumes both users rated the same remaining items, in the same order)
            other_ratings = np.array([rating for item, rating in ratings.items() if item != target_item])

            # Calculate Pearson similarity between the target user and the other user
            similarity = pearson_correlation(target_ratings, other_ratings)

            # Accumulate the weighted sum of ratings and the running total of absolute similarities
            weighted_sum += similarity * ratings[target_item]
            sum_of_weights += abs(similarity)

    # Return zero if there are no weights to prevent division by zero
    if sum_of_weights == 0:
        return 0
    else:
        # Compute and return the final weighted average rating prediction
        return weighted_sum / sum_of_weights

# Example usage and output:
predicted_rating = weighted_rating_prediction('User3', 'ItemC', users_items_matrix)
print(f"Predicted Rating for User3 on ItemC (Weighted Average): {predicted_rating}")

This function, weighted_rating_prediction, predicts the rating for a specified user ('User3') on a target item ('ItemC') by:

  1. Gathering Similarity Scores: Evaluate the closeness between the target user and each other user who rated the target item, expressed as a Pearson correlation score.
  2. Calculating a Weighted Sum: Using the similarity scores as weights, sum the product of each user's similarity and their rating for the target item.
  3. Normalizing by Sum of Similarity Weights: Divide the weighted sum by the sum of the absolute values of the similarity scores to produce a personalized rating prediction.

This method is more nuanced as it adjusts ratings based on the closeness of users' preferences, thus providing more personalized recommendations.
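The similarity step compares two rating arrays position by position, so it only works as intended when both users have rated the same items in the same order. Real data rarely guarantees that. One possible way to handle it, sketched below, is to restrict both users to the items they have in common; the helper name aligned_ratings and the sample data are hypothetical and not part of the lesson's code.

```python
import numpy as np

def aligned_ratings(target_user, other_user, target_item, user_ratings):
    """Return the co-rated items and one rating array per user over those items."""
    target = user_ratings[target_user]
    other = user_ratings[other_user]
    # Items rated by both users, minus the item being predicted; sorted for a stable order
    common_items = sorted((set(target) & set(other)) - {target_item})
    target_vector = np.array([target[item] for item in common_items])
    other_vector = np.array([other[item] for item in common_items])
    return common_items, target_vector, other_vector

# Hypothetical data: the users rated different items, stored in a different order
sample_ratings = {
    'User1': {'ItemB': 4, 'ItemA': 4, 'ItemC': 4, 'ItemE': 1},
    'User3': {'ItemA': 5, 'ItemB': 3, 'ItemD': 5},
}

items, user3_vector, user1_vector = aligned_ratings('User3', 'User1', 'ItemC', sample_ratings)
print(items)                       # ['ItemA', 'ItemB']
print(user3_vector, user1_vector)  # [5 3] [4 4]
```

The two arrays always have equal length and matching positions, so they can be passed to pearson_correlation safely. When two users share fewer than two items the correlation is not meaningful, and a caller would probably want to skip that neighbor.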

Example Data and Interpreting Results

To better understand how the weighted rating prediction works, let's look at an example with three users (User1, User2, and User3) and their ratings for various items, including ItemC.

Example Data:

Plain text
User1,ItemA,4
User1,ItemB,4
User1,ItemC,4
User1,ItemD,5

User2,ItemA,3
User2,ItemB,2
User2,ItemC,2
User2,ItemD,4

User3,ItemA,5
User3,ItemB,3
User3,ItemD,5

We want to predict User3's rating for ItemC. Here’s how this process unfolds:

  1. Calculate Similarities:

    • User1's ratings on the shared items (ItemA, ItemB, and ItemD) are only moderately similar to User3's, giving a Pearson correlation of 0.5.
    • User2's ratings are more similar to User3's, resulting in a higher correlation of approximately 0.87.
  2. Weighted Sum Calculation:

    • Since User1's similarity to User3 is lower, User1's rating of 4 for ItemC carries less weight. Its contribution is $4 \cdot 0.5 = 2$.
    • Conversely, User2's rating of 2 for ItemC is more influential due to the higher similarity. Its contribution is $2 \cdot 0.87 = 1.74$.
  3. Compute Predicted Rating:

    • Dividing the weighted sum by the sum of the absolute similarities gives the prediction: $\frac{2 + 1.74}{0.5 + 0.87} = \frac{3.74}{1.37} \approx 2.73$.
    • The non-weighted average rating for ItemC is 3, so the prediction is pulled below that average, toward the rating of 2 given by the more similar User2.

The final predicted rating suggests that despite the average being 3, User3's preference is expected to align closer with that of User2 due to their similarity. This highlights the essence of weighted rating predictions: leveraging user similarity to provide a more personalized estimate of a user's potential interest in an item.
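The numbers in this walkthrough can be reproduced with a short, self-contained script. It uses an in-memory dictionary in place of the data file and compact versions of the lesson's two functions; the variable names are chosen for this sketch.

```python
import numpy as np

def pearson_correlation(ratings1, ratings2):
    diff1 = ratings1 - np.mean(ratings1)
    diff2 = ratings2 - np.mean(ratings2)
    denominator = np.sqrt(np.sum(diff1 ** 2) * np.sum(diff2 ** 2))
    return 0 if denominator == 0 else np.sum(diff1 * diff2) / denominator

def ratings_without(user, excluded_item, user_ratings):
    # Ratings of one user as an array, leaving out the item being predicted
    return np.array([r for item, r in user_ratings[user].items() if item != excluded_item])

example_matrix = {
    'User1': {'ItemA': 4, 'ItemB': 4, 'ItemC': 4, 'ItemD': 5},
    'User2': {'ItemA': 3, 'ItemB': 2, 'ItemC': 2, 'ItemD': 4},
    'User3': {'ItemA': 5, 'ItemB': 3, 'ItemD': 5},
}

target = ratings_without('User3', 'ItemC', example_matrix)
sim_user1 = pearson_correlation(target, ratings_without('User1', 'ItemC', example_matrix))
sim_user2 = pearson_correlation(target, ratings_without('User2', 'ItemC', example_matrix))

# Weighted average of the two known ItemC ratings (4 from User1, 2 from User2)
weighted_sum = sim_user1 * 4 + sim_user2 * 2
prediction = weighted_sum / (abs(sim_user1) + abs(sim_user2))

print(f"sim(User3, User1) = {sim_user1:.2f}")  # 0.50
print(f"sim(User3, User2) = {sim_user2:.2f}")  # 0.87
print(f"prediction = {prediction:.2f}")        # 2.73
```

Working with unrounded similarities gives a prediction of about 2.732, which matches the hand calculation to two decimal places.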

Summary and Preparation for Practice

You’ve now learned to predict user ratings using a weighted average approach informed by user similarity. This lesson has enhanced your understanding of similarity-based (user-based collaborative filtering) recommendation systems, allowing you to make predictions that better reflect individual user preferences.

As you move into the practice exercises, take the opportunity to apply these techniques to different datasets and observe how recommendations alter based on user similarity. You are building the foundational knowledge to create effective recommendation systems.
