Welcome back! You've journeyed through the basics of recommendation systems, starting with baseline predictions and learning about similarity measures like Pearson Correlation. Understanding user similarity is crucial in recommendation systems, enabling more accurate predictions of unknown ratings. In this lesson, we will build upon that knowledge and focus on a practical approach to predicting user ratings using weighted averages combined with Pearson similarity. This technique allows us to make personalized recommendations by accounting for the weighted influence of similar users' ratings. By the end of the lesson, you’ll be able to effectively predict a user's rating for an item—a vital skill in crafting sophisticated recommendation systems.
Before diving into this lesson's main topic, let's quickly revisit the Pearson correlation function we discussed in the previous lesson. This function is key in determining how similar two users are based on their rating patterns.
Here's the function we'll use:
```python
import numpy as np

def pearson_correlation(ratings1, ratings2):
    n = len(ratings1)
    assert n == len(ratings2)

    mean1 = np.mean(ratings1)
    mean2 = np.mean(ratings2)

    diff1 = ratings1 - mean1
    diff2 = ratings2 - mean2

    numerator = np.sum(diff1 * diff2)
    denominator = np.sqrt(np.sum(diff1 ** 2) * np.sum(diff2 ** 2))

    if denominator == 0:
        return 0
    else:
        return numerator / denominator
```
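This function calculates how closely two sets of user ratings align. The result ranges from -1 to 1: values near 1 indicate similar rating patterns, values near -1 indicate opposite patterns, and 0 is returned when either user's ratings have no variance. These similarities will be important for today's task: predicting ratings based on them.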
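As a quick sanity check, the sketch below evaluates the same formula on a few hand-picked rating vectors. It uses a helper named `pearson`, a hypothetical standard-library stand-in for `pearson_correlation`, and the sample vectors are made up for illustration.

```python
import math

def pearson(ratings1, ratings2):
    """Plain-Python restatement of the Pearson formula (illustrative stand-in)."""
    assert len(ratings1) == len(ratings2)
    mean1 = sum(ratings1) / len(ratings1)
    mean2 = sum(ratings2) / len(ratings2)
    diff1 = [r - mean1 for r in ratings1]
    diff2 = [r - mean2 for r in ratings2]
    numerator = sum(a * b for a, b in zip(diff1, diff2))
    denominator = math.sqrt(sum(a * a for a in diff1) * sum(b * b for b in diff2))
    return 0.0 if denominator == 0 else numerator / denominator

same_trend = pearson([1, 2, 3], [2, 4, 6])      # ratings rise together
opposite_trend = pearson([1, 2, 3], [3, 2, 1])  # ratings move in opposite directions
flat_user = pearson([4, 4, 4], [1, 3, 5])       # no variance, so the guard returns 0

print(same_trend, opposite_trend, flat_user)    # 1.0 -1.0 0.0
```

The third case shows why the zero-denominator guard matters: a user who gives every item the same rating has no variance, so the correlation is undefined and the function falls back to 0.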
To make predictions, we first need to read and interpret our user-item rating data. This data is stored in a file named `user_items_matrix.txt`. Let's explore how the file is structured and how to load this information.
The file is organized with each line representing a user's rating for a specific item. Each line has three comma-separated values: `User`, `Item`, and `Rating`. Here's an example:
```
User1,ItemA,5
User1,ItemB,4
User2,ItemA,3
```
We'll use Python to read this data into a user-item dictionary, allowing us to easily access any user's ratings:
```python
def read_users_items_matrix(file_path):
    users_items_matrix = {}
    with open(file_path, 'r') as file:
        for line in file:
            line = line.strip()
            # Skip blank lines, such as separators between users
            if not line:
                continue
            user, item, rating = line.split(',')
            if user not in users_items_matrix:
                users_items_matrix[user] = {}
            users_items_matrix[user][item] = int(rating)
    return users_items_matrix

# Example usage:
file_path = 'user_items_matrix.txt'
users_items_matrix = read_users_items_matrix(file_path)
```
The code reads the file line by line, splitting each line into `user`, `item`, and `rating`, and then stores this data in a dictionary `users_items_matrix`. This structure allows for easy retrieval and manipulation of ratings, facilitating our upcoming calculations.
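To make the resulting structure concrete, here is a minimal sketch that parses the three sample lines into the nested dictionary. It assumes an in-memory `io.StringIO` object in place of the real file, and uses `setdefault` as a compact alternative to the explicit membership check; both choices are illustrative only.

```python
import io

# In-memory stand-in for user_items_matrix.txt (includes one blank line)
sample_file = io.StringIO("User1,ItemA,5\nUser1,ItemB,4\n\nUser2,ItemA,3\n")

matrix = {}
for line in sample_file:
    line = line.strip()
    if not line:
        continue  # ignore blank lines
    user, item, rating = line.split(',')
    # Create the inner dictionary on first sight of a user, then store the rating
    matrix.setdefault(user, {})[item] = int(rating)

print(matrix)
# {'User1': {'ItemA': 5, 'ItemB': 4}, 'User2': {'ItemA': 3}}
```

The outer keys are users and the inner keys are items, so a single rating is looked up as `matrix['User1']['ItemB']`.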
Before making predictions using weighted averages, it's beneficial to understand non-weighted averages, which are simpler aggregates of ratings for a specific item across all users.
Let's look at how to compute this:
```python
def calculate_non_weighted_average(target_item, user_ratings):
    # Collect every rating given to target_item, skipping users who have not rated it
    item_ratings = [ratings[target_item] for ratings in user_ratings.values() if target_item in ratings]
    if not item_ratings:
        return None
    return np.mean(item_ratings)

# Example usage:
non_weighted_average = calculate_non_weighted_average('ItemC', users_items_matrix)
print(f"Non-Weighted Average Rating for ItemC: {non_weighted_average}")
```
This function, `calculate_non_weighted_average`, gathers all ratings for a specified item (e.g., `'ItemC'`) from the user-item matrix and calculates the average. It's a straightforward method but does not consider user similarity, unlike the weighted prediction, which we'll explore next.
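A small self-contained sketch of the same idea follows. The dictionary `ratings_by_user` and the helper name `plain_average` are made up for illustration, and the mean is computed with built-ins rather than NumPy.

```python
# Hypothetical ratings: User3 has not rated ItemC, and nobody has rated ItemZ
ratings_by_user = {
    'User1': {'ItemA': 4, 'ItemC': 4},
    'User2': {'ItemA': 3, 'ItemC': 2},
    'User3': {'ItemA': 5},
}

def plain_average(target_item, user_ratings):
    # Keep only the ratings from users who actually rated the item
    item_ratings = [r[target_item] for r in user_ratings.values() if target_item in r]
    return sum(item_ratings) / len(item_ratings) if item_ratings else None

print(plain_average('ItemC', ratings_by_user))  # 3.0
print(plain_average('ItemZ', ratings_by_user))  # None
```

Every rater counts equally here, so the result is the same no matter which user we are predicting for.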
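Let's consider the formula for predicting a rating using a weighted average based on Pearson similarity:

$$\hat{r}_{u,i} = \frac{\sum_{v \in U_i} \mathrm{sim}(u, v) \cdot r_{v,i}}{\sum_{v \in U_i} \left| \mathrm{sim}(u, v) \right|}$$

Where:
- $\hat{r}_{u,i}$ is the predicted rating of the target user $u$ for the target item $i$.
- $\mathrm{sim}(u, v)$ is the Pearson similarity between the target user $u$ and another user $v$.
- $r_{v,i}$ is the rating given by user $v$ to the target item $i$.
- $U_i$ is the set of all users except the target user who have provided a rating for item $i$.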
By multiplying each rating by the similarity between users, we give more weight to the ratings of users who are similar to the target user, and less weight to the ratings of those who are different. Note that we use the absolute value of the similarity in the denominator, because similarities can be negative and could otherwise cancel each other out.
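The effect of the absolute value can be seen with a little arithmetic. The sketch below uses two hypothetical neighbors with made-up similarities and ratings, and compares normalizing by the sum of absolute similarities with normalizing by the signed sum.

```python
# Hypothetical neighbors: (similarity to the target user, rating of the target item)
neighbors = [(0.8, 5), (-0.4, 2)]

weighted_sum = sum(sim * rating for sim, rating in neighbors)
abs_weight_total = sum(abs(sim) for sim, _ in neighbors)   # denominator used in the formula
signed_weight_total = sum(sim for sim, _ in neighbors)     # denominator without abs()

prediction = weighted_sum / abs_weight_total
without_abs = weighted_sum / signed_weight_total

print(round(prediction, 2), round(without_abs, 2))  # 2.67 8.0
```

With signed weights the positive and negative similarities partly cancel, the denominator shrinks, and the result lands far outside a 1 to 5 rating scale. Summing absolute values keeps the denominator from collapsing in this way.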
To predict ratings using the weighted average approach, an essential preparatory step involves transforming the ratings of the target user into an array. This array will exclude the item that we aim to predict. This simplification allows us to focus on the set of ratings that are pivotal for calculating similarity with other users.
Here's how you can derive the `target_ratings` variable:
```python
def generate_target_ratings(target_user, target_item, user_ratings):
    # Extract the ratings of the target user, excluding the target item
    target_ratings = np.array([rating for item, rating in user_ratings[target_user].items() if item != target_item])
    return target_ratings

# Example usage:
target_user = 'User3'
target_item = 'ItemC'
target_ratings = generate_target_ratings(target_user, target_item, users_items_matrix)
```
By preparing `target_ratings`, you establish the foundation for calculating Pearson similarity with other users, a crucial factor in making an informed prediction.
Now, let's move to the core of this lesson: predicting ratings using a weighted average that's informed by Pearson similarity. This method considers the similarity between users when calculating the predicted rating. Here’s a detailed implementation of this approach with explanations:
```python
def weighted_rating_prediction(target_user, target_item, user_ratings):
    weighted_sum = 0
    sum_of_weights = 0

    # Retrieve the target user's ratings, excluding the target item
    target_ratings = np.array([rating for item, rating in user_ratings[target_user].items() if item != target_item])

    for user, ratings in user_ratings.items():
        # Skip the target user and any user who has not rated the target item
        if user != target_user and target_item in ratings:
            # Retrieve and prepare the other user's ratings, excluding the target item
            # (this assumes both users rated the same remaining items, in the same order)
            other_ratings = np.array([rating for item, rating in ratings.items() if item != target_item])

            # Calculate Pearson similarity between the target user and the other user
            similarity = pearson_correlation(target_ratings, other_ratings)

            # Accumulate weighted sum of ratings and running total of absolute similarities
            weighted_sum += similarity * ratings[target_item]
            sum_of_weights += abs(similarity)

    # Return zero if there are no weights to prevent division by zero
    if sum_of_weights == 0:
        return 0
    else:
        # Compute and return the final weighted average rating prediction
        return weighted_sum / sum_of_weights

# Example usage:
predicted_rating = weighted_rating_prediction('User3', 'ItemC', users_items_matrix)
print(f"Predicted Rating for User3 on ItemC (Weighted Average): {predicted_rating}")
```
This function, `weighted_rating_prediction`, predicts the rating for a specified user (`'User3'`) on a target item (`'ItemC'`) by:
- Gathering Similarity Scores: Evaluating the closeness between the target user and each other user who has rated the target item, expressed as a Pearson correlation score.
- Calculating a Weighted Sum: Using the similarity scores as weights, summing the product of each user's similarity and their rating for the target item.
- Normalizing by Sum of Similarity Weights: Dividing the weighted sum by the sum of the absolute values of the similarity scores to produce a personalized rating prediction.
This method is more nuanced as it adjusts ratings based on the closeness of users' preferences, thus providing more personalized recommendations.
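One caveat: `weighted_rating_prediction` compares all of each user's remaining ratings position by position, which only works when every user has rated the same other items in the same order. A possible variant, not part of this lesson's code, restricts the comparison to items both users have rated. The sketch below is one way to do that under stated assumptions: the names `pearson` and `predict_on_shared_items` and the book ratings are hypothetical, and it relies on the standard library only.

```python
import math

def pearson(xs, ys):
    """Plain-Python Pearson correlation (illustrative stand-in)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return 0.0 if denominator == 0 else sum(a * b for a, b in zip(dx, dy)) / denominator

def predict_on_shared_items(target_user, target_item, user_ratings):
    target = user_ratings[target_user]
    weighted_sum = 0.0
    sum_of_weights = 0.0
    for user, ratings in user_ratings.items():
        if user == target_user or target_item not in ratings:
            continue
        # Compare only items that both users have rated, in one consistent order
        shared = [item for item in target if item != target_item and item in ratings]
        if len(shared) < 2:
            continue  # a correlation needs at least two shared ratings
        similarity = pearson([target[i] for i in shared], [ratings[i] for i in shared])
        weighted_sum += similarity * ratings[target_item]
        sum_of_weights += abs(similarity)
    return weighted_sum / sum_of_weights if sum_of_weights else 0

# Made-up data in which users rated different items, stored in different orders
book_ratings = {
    'Ann': {'Book1': 5, 'Book2': 3, 'Book3': 4},
    'Ben': {'Book2': 2, 'Book1': 4, 'Book4': 5},
    'Cat': {'Book1': 3, 'Book2': 2, 'Book3': 1, 'Book4': 2, 'Book5': 4},
}

prediction = predict_on_shared_items('Ann', 'Book4', book_ratings)
print(prediction)  # 4.0
```

Here Ben correlates perfectly with Ann on their two shared books (similarity 1.0) and Cat only partially (0.5), so the prediction of 4.0 sits closer to Ben's rating of 5 than the plain average of 3.5 does.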
To better understand how the weighted rating prediction works, let's look at an example with three users (User1, User2, and User3) and their ratings for various items, including `ItemC`.
Example Data:
```
User1,ItemA,4
User1,ItemB,4
User1,ItemC,4
User1,ItemD,5

User2,ItemA,3
User2,ItemB,2
User2,ItemC,2
User2,ItemD,4

User3,ItemA,5
User3,ItemB,3
User3,ItemD,5
```
We want to predict User3's rating for `ItemC`. Here's how this process unfolds:

- Calculate Similarities:
  - User1's past ratings (on `ItemA`, `ItemB`, and `ItemD`) follow User3's ratings only loosely, giving a medium Pearson correlation of `0.5`.
  - User2's ratings are more similar to User3's ratings, resulting in a higher correlation of approximately `0.866`.
- Weighted Sum Calculation:
  - Since User1's similarity to User3 is lower, User1's rating of `4` for `ItemC` carries less weight. Its contribution to the weighted sum is $0.5 \times 4 = 2$.
  - Conversely, User2's rating of `2` for `ItemC` carries more weight due to the higher similarity. Its contribution is $0.866 \times 2 \approx 1.73$.
- Compute Predicted Rating:
  - Using the weighted average method, the predicted rating is derived by combining ratings adjusted for user similarity: $\frac{2 + 1.73}{0.5 + 0.866} \approx 2.73$.
  - The plain average rating for `ItemC` is `3`, so the predicted rating for User3 skews towards the more similar User2's rating of `2`.
The final predicted rating of about 2.73 shows that although the plain average is `3`, User3's preference is expected to align more closely with User2's due to their similarity. This highlights the essence of weighted rating predictions: leveraging user similarity to provide a more personalized estimate of a user's potential interest in an item.
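The figures in this walkthrough can be reproduced from the example data with a few lines of standard-library Python. This is only a verification sketch: the helper `pearson` and the variable names are assumptions made for illustration, not the lesson's NumPy implementation.

```python
import math

ratings = {
    'User1': {'ItemA': 4, 'ItemB': 4, 'ItemC': 4, 'ItemD': 5},
    'User2': {'ItemA': 3, 'ItemB': 2, 'ItemC': 2, 'ItemD': 4},
    'User3': {'ItemA': 5, 'ItemB': 3, 'ItemD': 5},
}

def pearson(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return 0.0 if denominator == 0 else sum(a * b for a, b in zip(dx, dy)) / denominator

# Items rated by all three users, excluding the target item ItemC
shared_items = ['ItemA', 'ItemB', 'ItemD']
target = [ratings['User3'][item] for item in shared_items]

similarities = {
    user: pearson(target, [ratings[user][item] for item in shared_items])
    for user in ('User1', 'User2')
}
weighted_sum = sum(similarities[u] * ratings[u]['ItemC'] for u in similarities)
prediction = weighted_sum / sum(abs(s) for s in similarities.values())
plain_average = sum(ratings[u]['ItemC'] for u in similarities) / len(similarities)

print({u: round(s, 3) for u, s in similarities.items()})  # {'User1': 0.5, 'User2': 0.866}
print(round(prediction, 2), plain_average)                # 2.73 3.0
```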
You've now learned to predict user ratings using a weighted average approach informed by user similarity. This lesson has enhanced your understanding of similarity-based (user-based) recommendation systems, allowing you to make predictions that better reflect individual user preferences.
As you move into the practice exercises, take the opportunity to apply these techniques to different datasets and observe how the predictions change with user similarity. You are building the foundational knowledge to create effective recommendation systems.