Welcome back! You've journeyed through the basics of recommendation systems, starting with baseline predictions and similarity measures like Pearson correlation. Understanding user similarity matters because it enables more accurate predictions of unknown ratings. In this lesson, we build on that knowledge and focus on a practical approach to predicting user ratings using weighted averages combined with Pearson similarity. We will also keep a non-weighted (global item) average as a baseline and as a fallback when similarity information is weak or unavailable. By the end of the lesson, you'll be able to predict a user's rating for an item, a vital skill in crafting sophisticated recommendation systems.
Why keep the non-weighted average?
- It’s a baseline to compare against the weighted method so you can see how similarity improves predictions.
- It’s a safe fallback when there’s no usable similarity signal (sum of weights equals 0).
- It provides a sanity check during debugging.
Before diving into this lesson's main topic, let's quickly revisit the Pearson correlation function we discussed in the previous lesson. This function is key in determining how similar two users are based on their rating patterns.
Here's the function we'll use in JavaScript, utilizing mathjs for vectorized operations:
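The exact helper from the previous lesson isn't reproduced here, so the snippet below is a minimal sketch of one common implementation; the name pearsonCorrelation and the assumption that both arrays are already aligned item-by-item are ours, not fixed by the lesson.

```javascript
const math = require('mathjs');

// Pearson correlation between two equally long, item-aligned rating arrays.
// Returns a value in [-1, 1]; 0 is returned when either user has zero variance.
function pearsonCorrelation(ratings1, ratings2) {
  const mean1 = math.mean(ratings1);
  const mean2 = math.mean(ratings2);

  // Center each rating vector around its own mean
  const centered1 = math.subtract(ratings1, mean1);
  const centered2 = math.subtract(ratings2, mean2);

  const numerator = math.sum(math.dotMultiply(centered1, centered2));
  const denominator =
    math.sqrt(math.sum(math.dotMultiply(centered1, centered1))) *
    math.sqrt(math.sum(math.dotMultiply(centered2, centered2)));

  return denominator === 0 ? 0 : numerator / denominator;
}
```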
This function calculates how closely two sets of user ratings align. Higher values indicate greater similarity, which will be important for today's task: predicting ratings based on these similarities. In practice, similarity should be computed on items both users have rated; you’ll refine this in a later exercise.
To make predictions, we first need to read and interpret our user-item rating data. This data is stored in a file named user_items_matrix.txt. Let's explore how the file is structured and how to load this information.
The file is organized with each line representing a user's rating for a specific item. It has three comma-separated values: User, Item, and Rating. Here's an example:
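A few illustrative lines (the user names, item names, and values are just placeholders to show the format):

```
User1,ItemA,5
User1,ItemB,3
User2,ItemA,3
```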
We'll use JavaScript to read this data into a user-item dictionary, allowing us to easily access any user's ratings. (Assume this code runs in a Node.js environment.)
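Here is a sketch of the loading step, assuming user_items_matrix.txt sits next to the script; synchronous reading keeps the example short.

```javascript
const fs = require('fs');

// user -> { item: rating } dictionary
const usersItemsMatrix = {};

const lines = fs
  .readFileSync('user_items_matrix.txt', 'utf-8')
  .trim()
  .split('\n');

for (const line of lines) {
  const [user, item, rating] = line.trim().split(',');
  if (!usersItemsMatrix[user]) {
    usersItemsMatrix[user] = {};
  }
  usersItemsMatrix[user][item] = parseFloat(rating);
}
```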
The code reads the file line by line, splitting each line into user, item, and rating, and then stores this data in a dictionary usersItemsMatrix. This structure allows for easy retrieval and manipulation of ratings, facilitating our upcoming calculations.
Before making predictions using weighted averages, it's beneficial to understand non-weighted averages, which are simpler aggregates of ratings for a specific item across all users. We will also use this as a fallback when there isn’t enough similarity signal.
Let's look at how to compute this in JavaScript:
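One possible implementation is sketched below; only the name calculateNonWeightedAverage comes from the lesson, the parameters are an assumption.

```javascript
// Average of every available rating for `item`, regardless of who gave it.
function calculateNonWeightedAverage(usersItemsMatrix, item) {
  const ratings = [];
  for (const userRatings of Object.values(usersItemsMatrix)) {
    if (userRatings[item] !== undefined) {
      ratings.push(userRatings[item]);
    }
  }
  // If nobody has rated the item, there is no meaningful average.
  if (ratings.length === 0) return null;
  return ratings.reduce((sum, r) => sum + r, 0) / ratings.length;
}
```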
This function, calculateNonWeightedAverage, gathers all ratings for a specified item (e.g., 'ItemC') from the user-item matrix and calculates the average. It’s a straightforward baseline and a practical fallback for the weighted predictor.
Let's consider the formula for predicting a rating using a weighted average based on Pearson similarity:
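One common form of this formula (and the one the code later in this lesson follows) predicts user $u$'s rating for item $i$ as:

$$
\hat{r}_{u,i} = \frac{\sum_{v \neq u} \operatorname{sim}(u, v)\, r_{v,i}}{\sum_{v \neq u} \operatorname{sim}(u, v)}
$$

where $\operatorname{sim}(u, v)$ is the Pearson similarity between users $u$ and $v$, $r_{v,i}$ is $v$'s rating for item $i$, and the sums run over the users who have actually rated $i$. If the denominator is 0, we fall back to the non-weighted item average.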
To predict ratings with the weighted-average approach, an essential preparatory step is to collect the target user's ratings into an array, excluding the item we aim to predict. Working with this flat array keeps the focus on the ratings used to calculate similarity with other users.
Here's how you can derive the targetRatings variable in JavaScript:
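A sketch, using 'User3' and 'ItemC' as the target user and item (as in the rest of the lesson):

```javascript
const targetUser = 'User3';
const targetItem = 'ItemC';

// All of the target user's ratings except the item we want to predict.
const targetRatings = Object.entries(usersItemsMatrix[targetUser])
  .filter(([item]) => item !== targetItem)
  .map(([, rating]) => rating);
```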
By processing the targetRatings, you establish the foundation for calculating Pearson similarity with other users, a crucial factor in making an informed prediction. In later practice, you will refine this to compare only items both users rated.
Now, let's move to the core of this lesson: predicting ratings using a weighted average that's informed by Pearson similarity. This method considers the similarity between users when calculating the predicted rating. We’ll also add a fallback to the non-weighted item average if similarity cannot be computed reliably.
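Here is a sketch of one way to implement this, reusing pearsonCorrelation and calculateNonWeightedAverage from above; the naive item alignment (skipping users whose remaining rating arrays don't match in length) is exactly the simplification flagged earlier.

```javascript
function weightedRatingPrediction(usersItemsMatrix, targetUser, targetItem) {
  // Target user's ratings, excluding the item we want to predict.
  const targetRatings = Object.entries(usersItemsMatrix[targetUser])
    .filter(([item]) => item !== targetItem)
    .map(([, rating]) => rating);

  let weightedSum = 0;
  let similaritySum = 0;

  for (const [user, ratings] of Object.entries(usersItemsMatrix)) {
    // Skip the target user and anyone who hasn't rated the target item.
    if (user === targetUser || ratings[targetItem] === undefined) continue;

    const otherRatings = Object.entries(ratings)
      .filter(([item]) => item !== targetItem)
      .map(([, rating]) => rating);

    // Naive alignment: only compare users with the same number of remaining
    // ratings (refined in a later exercise to align on commonly rated items).
    if (otherRatings.length !== targetRatings.length) continue;

    const similarity = pearsonCorrelation(targetRatings, otherRatings);
    weightedSum += similarity * ratings[targetItem];
    similaritySum += similarity;
  }

  // Fallback: no usable similarity signal, so use the item's plain average.
  if (similaritySum === 0) {
    return calculateNonWeightedAverage(usersItemsMatrix, targetItem);
  }
  return weightedSum / similaritySum;
}
```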
This function, weightedRatingPrediction, predicts the rating for a specified user ('User3') on a target item ('ItemC') by:
- Gathering similarity scores between the target user and each other user.
- Calculating a weighted sum using similarities as weights, then dividing by the sum of those similarities.
- Falling back to the non-weighted item average when the sum of similarities is 0.
To better understand how the weighted rating prediction works, let's look at an example with three users (User1, User2, and User3) and their ratings for various items, including ItemC.
Example Data (as a text file):
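The lesson doesn't fix an exact dataset, so the small made-up file below is used for the walkthrough; note that User3 has rated ItemA, ItemB, and ItemD, but not ItemC.

```
User1,ItemA,5
User1,ItemB,3
User1,ItemC,4
User1,ItemD,4
User2,ItemA,3
User2,ItemB,2
User2,ItemC,5
User2,ItemD,5
User3,ItemA,4
User3,ItemB,3
User3,ItemD,5
```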
We want to predict User3's rating for ItemC. Here’s how this process unfolds:
- Calculate similarities:
- User1 and User3 have moderate similarity.
- User2 and User3 may be more or less similar, depending on how much their ratings overlap and how closely their rating trends match.
- Weighted sum calculation:
- More similar users have greater influence on the prediction.
- Compute predicted rating:
- If similarities produce a meaningful signal, the prediction will lean toward the ratings of the most similar users (see the worked numbers after this list).
- If similarities are weak or undefined, the predictor falls back to the non-weighted item average for ItemC.
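With the made-up data above (aligning on ItemA, ItemB, and ItemD), the numbers work out roughly as follows:

$$
\operatorname{sim}(\text{User3}, \text{User1}) = 0.5, \qquad \operatorname{sim}(\text{User3}, \text{User2}) \approx 0.98
$$

$$
\hat{r}_{\text{User3},\,\text{ItemC}} \approx \frac{0.5 \cdot 4 + 0.98 \cdot 5}{0.5 + 0.98} \approx 4.66
$$

compared with a non-weighted average of $(4 + 5)/2 = 4.5$ for ItemC. The weighted prediction is pulled toward User2's rating of 5 because User2 is the more similar user.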
Let’s print both predictions for comparison:
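A short sketch of the comparison, using the function names assumed above:

```javascript
const weighted = weightedRatingPrediction(usersItemsMatrix, 'User3', 'ItemC');
const nonWeighted = calculateNonWeightedAverage(usersItemsMatrix, 'ItemC');

// With the sample data above this prints roughly 4.66 vs. 4.50.
console.log(`Weighted prediction for User3 on ItemC: ${weighted.toFixed(2)}`);
console.log(`Non-weighted average for ItemC: ${nonWeighted.toFixed(2)}`);
```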
Interpreting the comparison:
- When similarities provide strong signal, the weighted prediction diverges from the simple average and reflects the preferences of the most similar users.
- When similarity is weak or overlap is limited, the weighted prediction gracefully falls back to the non-weighted average, ensuring robust behavior.
You’ve now learned to predict user ratings using a weighted average approach informed by user similarity, with a clear baseline and fallback using the non-weighted item average. This lesson has deepened your understanding of memory-based (user-based collaborative filtering) recommendation systems, allowing you to make predictions that better reflect individual user preferences.
As you move into the practice exercises, you will:
- Read user–item matrices from files,
- Compute Pearson similarity more robustly by aligning on commonly rated items, and
- Apply the weighted prediction method to real datasets.
Keep an eye on how the weighted method compares with the non-weighted baseline—this contrast will strengthen your intuition about when similarity truly helps.
