Lesson 4
Improved Prediction Using Adjusted Weighted Average
Introduction to Adjusted Predictions

Welcome to the final lesson of this course on recommendation systems, where we will explore adjusted weighted averages. Previously, we used raw ratings to predict user preferences. However, this approach can introduce bias because it doesn't account for each user's rating tendencies. In this lesson, you'll learn how predicting from the difference between a rating and the user's average rating, rather than from the raw rating, can improve accuracy by reducing this bias.

Recap of Previous Setup

Let's briefly revisit the code setup that we've built upon throughout this course. You should already be familiar with reading a user-item rating matrix from a text file and setting the stage for using this data in predictions. Here's a quick code reminder:

Python
import numpy as np

# Read user-item rating matrix from a text file
def read_users_items_matrix(file_path):
    users_items_matrix = {}
    with open(file_path, 'r') as file:
        for line in file:
            user, item, rating = line.strip().split(',')
            if user not in users_items_matrix:
                users_items_matrix[user] = {}
            users_items_matrix[user][item] = int(rating)
    return users_items_matrix

# Define the path to the text file
file_path = 'user_items_matrix.txt'

# Read the user-item matrix
users_items_matrix = read_users_items_matrix(file_path)

This code reads the user-item matrix from a file, setting up our essential data structure for further manipulations. Remember, understanding this setup is crucial as we now proceed to modify our prediction approach.
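
To make the resulting data structure concrete, here is a minimal sketch that runs the same parsing logic on a tiny sample file. The sample lines and user names are hypothetical, and the file format (one `user,item,rating` triple per line) is inferred from the reader above rather than from the course's actual data file.

```python
import os
import tempfile

def read_users_items_matrix(file_path):
    """Parse 'user,item,rating' lines into {user: {item: rating}}."""
    users_items_matrix = {}
    with open(file_path, 'r') as file:
        for line in file:
            user, item, rating = line.strip().split(',')
            users_items_matrix.setdefault(user, {})[item] = int(rating)
    return users_items_matrix

# Hypothetical sample data; the real file's contents are not shown in this lesson.
sample_lines = ["alice,item1,5", "alice,item2,3", "bob,item1,4"]

# Write the sample to a temporary file so the reader can be exercised as-is.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("\n".join(sample_lines) + "\n")
    sample_path = tmp.name

sample_matrix = read_users_items_matrix(sample_path)
os.remove(sample_path)

print(sample_matrix)
# {'alice': {'item1': 5, 'item2': 3}, 'bob': {'item1': 4}}
```

Each user maps to a dictionary of the items they rated, which is the shape the prediction function expects.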

Understanding the Switch in Attributes

When we use raw ratings in recommendation systems, we might introduce bias because different users have different rating tendencies. Here's what that means:

  • Consistently High Raters: Some users might generally give high ratings to most items, regardless of their true preference. For example, a user might rate most movies 4 or 5 stars.
  • Consistently Low Raters: Conversely, some users might rate items lower on average, even if they like them. They might give most movies 2 or 3 stars.

These tendencies can skew predictions because the system might interpret a high rating as a strong preference, even if it's just the user's habit. To reduce this bias and improve the accuracy of our recommendation system, we adjust the ratings by subtracting the average rating of each user.

By using rating differences rather than raw ratings, we can better identify genuine preferences:

  • This adjustment ensures that predictions are based more on relative preferences rather than absolute ratings.
  • It helps to normalize user ratings, making comparisons between users more equitable.
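
As a rough illustration of this normalization, the sketch below mean-centers the ratings of two made-up users who rank the same items identically but on different parts of the scale. The user data and the `mean_center` helper are hypothetical and exist only for this example.

```python
import numpy as np

# Hypothetical ratings: both users prefer item1 and item3 over item2 and item4,
# but one rates generously and the other harshly.
high_rater = {"item1": 5, "item2": 4, "item3": 5, "item4": 4}
low_rater = {"item1": 3, "item2": 2, "item3": 3, "item4": 2}

def mean_center(ratings):
    """Return each rating minus the user's own average rating."""
    avg = np.mean(list(ratings.values()))
    return {item: float(rating - avg) for item, rating in ratings.items()}

high_centered = mean_center(high_rater)
low_centered = mean_center(low_rater)

print(high_centered)  # {'item1': 0.5, 'item2': -0.5, 'item3': 0.5, 'item4': -0.5}
print(low_centered)   # {'item1': 0.5, 'item2': -0.5, 'item3': 0.5, 'item4': -0.5}
```

After centering, the two users look identical, even though their raw ratings never overlap.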

Formula

The formula for predicting a rating using adjusted weighted averages is:

$$\text{Predicted Rating} = \bar{r}_{u} + \frac{\sum_{v \in U} \text{sim}(u, v) \times (r_{v,i} - \bar{r}_{v})}{\sum_{v \in U} |\text{sim}(u, v)|}$$

Where:

  • $\bar{r}_{u}$ is the average rating of the target user $u$.
  • $\text{sim}(u, v)$ is the similarity between the target user $u$ and another user $v$.
  • $r_{v,i}$ is the rating given by user $v$ to item $i$.
  • $\bar{r}_{v}$ is the average rating of user $v$.
  • $U$ is the set of all users except the target user $u$ who have rated item $i$.

This formula accounts for individual rating tendencies and emphasizes relative preferences, which reduces bias. Because the weighted average is taken over deviations from each user's average rather than over the ratings themselves, its result is itself a deviation. We therefore add the target user's average, $\bar{r}_{u}$, to bring the prediction back onto the rating scale.
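
To see the arithmetic behind the formula, here is a small worked example in plain Python. All numbers (the target user's average, the similarities, and the neighbors' ratings and averages) are invented for illustration and do not come from the course data.

```python
# Hypothetical inputs for one prediction.
target_user_avg = 3.5

# Each neighbor: (similarity to target user, rating of the item, neighbor's average rating)
neighbors = [
    (0.9, 5, 4.0),   # deviation +1.0
    (0.6, 3, 3.5),   # deviation -0.5
    (-0.5, 2, 3.0),  # deviation -1.0, negative similarity flips its effect
]

# Numerator: similarity-weighted sum of deviations from each neighbor's average.
numerator = sum(sim * (rating - avg) for sim, rating, avg in neighbors)

# Denominator: sum of absolute similarities.
denominator = sum(abs(sim) for sim, _, _ in neighbors)

predicted_rating = target_user_avg + numerator / denominator
print(round(predicted_rating, 2))  # approximately 4.05
```

The numerator works out to about 0.9 - 0.3 + 0.5 = 1.1 and the denominator to 2.0, so the predicted deviation is about 0.55 above the target user's average.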

Step-by-Step Code Modification: Step 1

Now, let's walk through the specific code modifications needed to implement these changes. The key is to adjust the computation to use the difference between a rating and the user's average rating in our weighted rating prediction function.

Modify the calculation of rating differences by subtracting each user's average rating, like so:

Python
# Inside weighted_rating_prediction function
avg_user_rating = np.mean(list(ratings.values()))
rating_diff = ratings[target_item] - avg_user_rating

Here, avg_user_rating is calculated as the mean of all ratings given by a user. The rating_diff is the difference between the item rating and this average.
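
The following sketch applies these two lines to a pair of hypothetical users who both gave the target item a 4. The user dictionaries and the `rating_difference` helper are assumptions made for this example only.

```python
import numpy as np

target_item = "item1"

# Hypothetical users: both rate item1 as 4, but their habits differ.
generous_user = {"item1": 4, "item2": 5, "item3": 5, "item4": 4}
strict_user = {"item1": 4, "item2": 2, "item3": 3}

def rating_difference(ratings, item):
    """Difference between the item's rating and the user's average rating."""
    avg_user_rating = np.mean(list(ratings.values()))
    return float(ratings[item] - avg_user_rating)

generous_diff = rating_difference(generous_user, target_item)
strict_diff = rating_difference(strict_user, target_item)

print(generous_diff)  # -0.5: below this user's usual 4.5
print(strict_diff)    # 1.0: above this user's usual 3.0
```

The same raw rating of 4 signals mild disappointment for one user and clear enthusiasm for the other, which is exactly the information the raw value hides.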

Step-by-Step Code Modification: Step 2

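Next, accumulate the similarity-weighted differences instead of the similarity-weighted raw ratings. The weights are summed as absolute values, matching the denominator of the formula, so that negative similarities do not cancel out positive ones:

Python
weighted_sum += similarity * rating_diff
sum_of_weights += abs(similarity)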

This modification makes sure the predictions leverage differences rather than raw ratings, aligning with the theoretical benefits discussed earlier.
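
A short sketch of why the denominator in the formula uses absolute similarities: with one positively and one negatively correlated neighbor, a signed sum of weights can shrink toward zero and inflate the result. The similarity and difference values below are hypothetical.

```python
# Hypothetical (similarity, rating_diff) pairs for two neighbors.
neighbor_terms = [(0.75, 1.0), (-0.5, -1.0)]

weighted_sum = 0.0
signed_weights = 0.0    # what we would get without abs()
absolute_weights = 0.0  # what the formula prescribes

for similarity, rating_diff in neighbor_terms:
    weighted_sum += similarity * rating_diff
    signed_weights += similarity
    absolute_weights += abs(similarity)

print(weighted_sum / signed_weights)    # 5.0: an implausibly large adjustment
print(weighted_sum / absolute_weights)  # 1.0: stays within the size of the inputs
```

Both neighbors agree that the item deserves about one point above average, and only the absolute-value denominator reproduces that.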

Implementing Adjusted Ratings in Predictions

With these adjustments, the weighted rating prediction function is revised to incorporate adjusted ratings. Let's consider the entire prediction function:

Python
def weighted_rating_prediction(target_user, target_item, user_ratings):
    similarities = []
    weighted_sum = 0
    sum_of_weights = 0

    # Average rating of the target user, added back at the end
    target_ratings = user_ratings[target_user]
    avg_target_user_rating = np.mean(list(target_ratings.values()))

    for user, ratings in user_ratings.items():
        # Only other users who have rated the target item contribute
        if user != target_user and target_item in ratings:
            similarity = pearson_correlation(target_ratings, ratings)
            similarities.append((user, similarity))

            # Deviation of this user's rating from their own average
            avg_user_rating = np.mean(list(ratings.values()))
            rating_diff = ratings[target_item] - avg_user_rating

            weighted_sum += similarity * rating_diff
            sum_of_weights += abs(similarity)

    # Fall back to the target user's average when no neighbor carries weight
    if sum_of_weights == 0:
        return avg_target_user_rating
    else:
        return avg_target_user_rating + (weighted_sum / sum_of_weights)

The predicted rating now compensates for user biases, leading to recommendations that better reflect each user's true preferences. Note that the weighted average predicts how far the target item's rating deviates from the target user's average rating, so we add it to avg_target_user_rating to get the final rating. If no other user contributes any weight (sum_of_weights is 0), the function returns avg_target_user_rating on its own.
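
To try the function end to end, here is a self-contained sketch. The `pearson_correlation` shown here is only a stand-in written for this example (computed over co-rated items); the version from the earlier lessons may differ. The users and ratings are hypothetical.

```python
import numpy as np

def pearson_correlation(ratings_a, ratings_b):
    """Stand-in similarity (an assumption): Pearson correlation over co-rated items."""
    common = [item for item in ratings_a if item in ratings_b]
    if len(common) < 2:
        return 0.0
    a = np.array([ratings_a[item] for item in common], dtype=float)
    b = np.array([ratings_b[item] for item in common], dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    if denominator == 0:
        return 0.0
    return float((a_centered * b_centered).sum() / denominator)

def weighted_rating_prediction(target_user, target_item, user_ratings):
    """Adjusted weighted average, following the lesson's prediction function."""
    target_ratings = user_ratings[target_user]
    avg_target_user_rating = np.mean(list(target_ratings.values()))
    weighted_sum = 0.0
    sum_of_weights = 0.0
    for user, ratings in user_ratings.items():
        if user != target_user and target_item in ratings:
            similarity = pearson_correlation(target_ratings, ratings)
            rating_diff = ratings[target_item] - np.mean(list(ratings.values()))
            weighted_sum += similarity * rating_diff
            sum_of_weights += abs(similarity)
    if sum_of_weights == 0:
        return float(avg_target_user_rating)
    return float(avg_target_user_rating + weighted_sum / sum_of_weights)

# Hypothetical ratings: bob agrees with alice, carol has opposite tastes.
user_ratings = {
    "alice": {"item1": 4, "item2": 2, "item3": 3},
    "bob": {"item1": 4, "item2": 2, "item3": 3, "item4": 5},
    "carol": {"item1": 1, "item2": 3, "item3": 2, "item4": 1},
}

prediction = weighted_rating_prediction("alice", "item4", user_ratings)
fallback = weighted_rating_prediction("alice", "item9", user_ratings)

print(prediction)  # 4.125
print(fallback)    # 3.0, alice's average, because nobody has rated item9
```

Here bob's above-average rating and carol's below-average rating both push alice's prediction upward, because carol's similarity to alice is negative.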

Review, Summary, and Preparation for Practice

Congratulations on reaching the end of this course! Let's summarize what you have covered in this lesson:

  • You learned about using adjusted weighted averages to improve prediction accuracy by reducing bias in user-item matrices.
  • You explored the specific code modifications needed to use rating differences rather than raw ratings, so that generous and strict raters are compared on equal terms in similarity-based recommendations.

In the practice exercises that follow, you'll have the chance to apply these concepts hands-on, solidifying your understanding. Thank you for your dedication and hard work throughout this journey. Your newfound expertise in recommendation systems positions you well for further exploration and application in real-world projects. Well done!
