Welcome to the exciting world of recommendation systems. You've likely experienced the power of these systems when accessing platforms like Netflix, Amazon, or Spotify. Recommendation systems analyze user data to suggest items you might like. The key task in these systems is to predict the rating a user might give to an unseen item. In this lesson, you will learn how to make these predictions using a simple baseline model.
Baseline models are essential in recommendation systems, serving as a standard to measure the performance of more complex models. Today, you will explore one such baseline model using global averages.
In recommendation systems, the user-item rating matrix is a foundational concept. This matrix helps organize user interactions with different items. Each row represents a user, and each column represents an item. The cell values are the ratings given by users to the items. Rating could be an explicit feedback, where user actually rates an item, for example giving 4 stars to a watch movie. But rating could also be some implicit metric that evaluates how user liked an item. For example, for songs, it could be calculated based on how often user listens to this song.
For example, consider the following user-item matrix where users have rated some items:
ItemA | ItemB | ItemC | |
---|---|---|---|
User1 | 5 | 3 | 4 |
User2 | 3 | 1 | 2 |
User3 | 4 | 3 | ? |
Notice the missing rating for User3 on ItemC. This might mean that User3 didn't interact with ItemC. We want to predict what rating User3 would give to ItemC: If this rating is high, we might recommend this item.
It's important to note that predicted ratings can be floats, as they represent averages. Even though the example ratings are integers, predictions do not have to be rounded to the nearest integer.
Let's create a user-item rating matrix using a Python dictionary. This matrix will be handy when you predict missing ratings.
Python1# Creating a user-item rating matrix with one missing rating 2users_items_matrix = { 3 'User1': {'ItemA': 5, 'ItemB': 3, 'ItemC': 4}, 4 'User2': {'ItemA': 3, 'ItemB': 1, 'ItemC': 2}, 5 'User3': {'ItemA': 4, 'ItemB': 3} # User3's rating for ItemC is missing 6}
Here, you have a dictionary representing the ratings given by each user for each item. Notice how User3 has not rated ItemC, which you will predict using a baseline model. In this unit, we do not consider situations where the item is fresh new, meaning it was just added to our database, like a video that was just uploaded. It is known as the 'cold start' problem. We assume that the item is already present in our matrix and has ratings.
We wish to start with a simple baseline model, which will pivot future model development. The global average, the average of ratings that the item received from all users, is a simple approach to predicting missing ratings. You can suggest that other users would give this item ratings that are close to the global average.
Here's how you can implement this:
Python1# Baseline model - predicting based on the item's global average 2def compute_global_averages(user_ratings): 3 total_ratings = {} 4 count_ratings = {} 5 6 for user, ratings in user_ratings.items(): 7 for item, rating in ratings.items(): 8 if item not in total_ratings: 9 total_ratings[item] = 0 10 count_ratings[item] = 0 11 total_ratings[item] += rating 12 count_ratings[item] += 1 13 14 global_averages = {item: total_ratings[item] / count_ratings[item] for item in total_ratings} 15 return global_averages 16 17global_averages = compute_global_averages(users_items_matrix) 18predicted_rating_itemC_user3 = global_averages['ItemC'] # Predict User3's rating for ItemC 19print(f"Predicted Rating for User3 on ItemC: {predicted_rating_itemC_user3}") # Predicted Rating for User3 on ItemC: 3.0
Let's review the algorithm
-
Initialization:
total_ratings
: A dictionary to store the sum of all ratings for each item.count_ratings
: A dictionary to store the count of ratings for each item.
-
Iterating through User Ratings:
- For each user and their corresponding ratings, iterate through the items they have rated.
- If an item is encountered for the first time, initialize its entry in both
total_ratings
andcount_ratings
to zero. - Add the current rating to the
total_ratings
for the respective item. - Increment the count for this item in
count_ratings
.
-
Compute Global Averages:
- Use a dictionary comprehension to compute the global average for each item by dividing the total ratings by the count of ratings.
- This results in
global_averages
, which holds the average rating of each item.
-
Prediction:
- To predict the rating for User3 on ItemC, retrieve the item's global average from the
global_averages
dictionary.
- To predict the rating for User3 on ItemC, retrieve the item's global average from the
You've just completed your first steps into the world of recommendation systems, tackling user-item matrices and baseline predictions with global averages. Remember, while the global average method is basic, it gives you a foundation for understanding more advanced techniques.
Looking ahead, you will engage in hands-on exercises designed to solidify these concepts. You'll practice constructing matrices and prediction models, enhancing your understanding of prediction algorithms within recommendation systems. Keep up the great work and prepare yourself for more complex challenges ahead!