Lesson 1
Exploring User-Item Explicit Rating Matrix
Introduction to Recommendation Systems

Welcome to the first lesson of our course on Collaborative Filtering with ALS! In this lesson, we will begin our journey into the world of recommendation systems, an integral technology that powers many of the platforms we interact with daily. From helping you find movies you might like to suggesting products based on your past purchases, recommendation systems are all around us. We will start by understanding user-item interactions and explicit ratings, which form the foundation of collaborative filtering techniques. This lesson will equip you with the skills to handle explicit rating matrices, setting the stage for implementing the ALS (Alternating Least Squares) algorithm.

Explicit Rating Matrix

An explicit rating matrix is a foundational concept in recommendation systems. It is a table of users and items whose cells hold the ratings that users have explicitly given. For instance, on a movie streaming service, you might rate a movie from 1 to 5 stars. These ratings help the service recommend other movies you might enjoy. The explicit rating matrix records these ratings in a structured form, making it easier to generate personalized recommendations.

Consider a movie platform where users have provided ratings for different films. Here's how a simplified explicit rating matrix might look:

Users \ Movies    Movie A    Movie B    Movie C
User 1            5          3          -1
User 2            2          -1         4
User 3            3          4          2

In this table, each cell contains the rating a user has assigned to a movie, with -1 indicating that the user has not rated the movie. This type of matrix is crucial for generating recommendations based on user preferences.
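
As a minimal sketch, the table above can be written as a NumPy array, with a Boolean mask separating the known ratings from the -1 placeholders. The names ratings, known, and user_means are illustrative and not part of the lesson's own code.

```python
import numpy as np

# The example table: rows are users, columns are Movies A, B, C; -1 = not rated
ratings = np.array([
    [5, 3, -1],
    [2, -1, 4],
    [3, 4, 2],
])

known = ratings != -1  # True where a rating exists

# Average each user's known ratings, ignoring the -1 placeholders
user_means = np.where(known, ratings, 0).sum(axis=1) / known.sum(axis=1)

print(known.sum())  # 7 known ratings
print(user_means)   # [4. 3. 3.]
```

Masking first matters: averaging the raw rows would treat -1 as a real rating and drag the means down.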

Reading Data

Let's set up the explicit rating matrix using a dataset of user-item interactions. The matrix is loaded from the explicit-ratings.txt file, which we will use for further exploration and manipulation. The format of the file is the following:

Plain text
-1, 2, 5
5, 3, -1
3, 3, 3

Each row represents a user, and each comma-separated column represents an item. A value of -1 means that the rating for this user-item pair is missing.

Python
import numpy as np

# Load user-item interaction matrix with explicit feedback from file
R = []
with open('explicit-ratings.txt', 'r') as file:
    for line in file:
        if not line.strip():
            continue  # Skip blank lines
        # Values are comma-separated; int() tolerates the surrounding spaces
        ratings = [int(value) for value in line.split(',')]
        R.append(ratings)
R = np.array(R)

# Resulting matrix R
# array([[-1,  2,  5],
#        [ 5,  3, -1],
#        [ 3,  3,  3]])

In this block of code, we imported the necessary library (numpy) and read user ratings from a file. Each line in the file corresponds to one user's ratings, and we're storing these in a NumPy array, R, for efficient mathematical operations.
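
After loading, it can help to sanity-check the matrix before using it. The sketch below reads the same sample rows from an in-memory stand-in for explicit-ratings.txt, then checks that the rows form a rectangle and that every entry is either -1 or a rating from 1 to 5. The 1-to-5 range is an assumption carried over from the movie example, and sample_file, R_check, and valid are hypothetical names.

```python
import io

import numpy as np

# In-memory stand-in for explicit-ratings.txt (same contents as the sample file)
sample_file = io.StringIO("-1, 2, 5\n5, 3, -1\n3, 3, 3\n")

rows = [
    [int(value) for value in line.split(",")]
    for line in sample_file
    if line.strip()
]

# All rows must have the same number of items to form a rectangular matrix
assert len({len(row) for row in rows}) == 1, "rows have different lengths"
R_check = np.array(rows)

# Assumed convention: ratings are 1-5, and -1 marks a missing rating
valid = (R_check == -1) | ((R_check >= 1) & (R_check <= 5))

print(R_check.shape)  # (3, 3)
print(valid.all())    # True
```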

Marking Missing Entries for Testing

In recommendation systems, hiding some of the known ratings is a standard way to test and validate a model. We mark a fraction of the known entries as missing, train on what remains, and then compare the algorithm's predictions with the true values we hid.

Python
import random

num_users, num_items = R.shape
original_R = R.copy()  # Create a copy to track original values
missing_ratio = 0.1    # Fraction of entries to exclude
num_missing = int(missing_ratio * np.count_nonzero(R != -1))
missing_indices = random.sample(list(zip(*np.where(R != -1))), num_missing)

for (u, i) in missing_indices:
    R[u, i] = -1  # Mark selected original values as missing

Here, we use Python's random library to randomly select and mark a fraction of the ratings as missing. We achieve this by creating a copy of R to retain original ratings, calculating how many entries to mark as missing based on the missing ratio, and then updating R with -1 at those positions. This prepares the matrix for model testing, allowing us to evaluate its ability to predict these missing ratings.
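
To show how the hidden entries are used afterwards, here is a rough sketch that scores a placeholder predictor with RMSE (root mean squared error) on the held-out ratings. The global-mean predictor stands in for a real model such as ALS, and R_train, held_out, and global_mean are hypothetical names. The sketch hides two ratings outright because, with only seven known ratings in the sample matrix, a ratio of 0.1 would round down to zero hidden entries.

```python
import random

import numpy as np

random.seed(0)  # Fixed seed so the split is reproducible

original_R = np.array([
    [-1, 2, 5],
    [5, 3, -1],
    [3, 3, 3],
])
R_train = original_R.copy()

# Hide two known ratings (a fixed count, since this matrix is tiny)
known_positions = list(zip(*np.where(original_R != -1)))
held_out = random.sample(known_positions, 2)
for u, i in held_out:
    R_train[u, i] = -1

# Placeholder model: predict the mean of the remaining known ratings everywhere
global_mean = R_train[R_train != -1].mean()

# Compare the predictions with the true values that were hidden
errors = [original_R[u, i] - global_mean for u, i in held_out]
rmse = np.sqrt(np.mean(np.square(errors)))
print(f"RMSE on held-out ratings: {rmse:.3f}")
```

A lower RMSE means the predictions sit closer to the hidden ratings, so a trained model should be expected to beat this simple baseline.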

Missing Data in Rating Matrices

Explicit feedback is very valuable because it states the user's preferences directly. Unfortunately, users often skip leaving feedback, so in practice most entries in the rating matrix are missing. A common remedy is to draw on implicit feedback as well, such as users' actions, clicks, and engagement metrics (for example, time spent with an item). We will explore this in the following lessons.
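
One way to quantify how much of a matrix is missing is its sparsity, the fraction of user-item pairs with no rating. The following is a small sketch on the sample matrix from this lesson; R_example, sparsity, and ratings_per_user are illustrative names.

```python
import numpy as np

R_example = np.array([
    [-1, 2, 5],
    [5, 3, -1],
    [3, 3, 3],
])

missing = R_example == -1
sparsity = missing.mean()                  # Fraction of pairs with no rating
ratings_per_user = (~missing).sum(axis=1)  # Known ratings in each row

print(f"{sparsity:.2%} of entries are missing")  # 22.22% of entries are missing
print(ratings_per_user)                          # [2 2 3]
```

Real-world rating matrices are typically far sparser than this toy example, which is why the per-user counts are worth checking before holding out test entries.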

Summary and Preparation for Practice

In this lesson, you learned the significance of the explicit rating matrix in recommendation systems. You practiced setting it up using a dataset, marking entries as missing for testing purposes, and gained an introductory understanding of handling missing data. These skills are essential as you progress in the field of recommendation systems.

As you move on to the practice exercises, consider how marking missing entries can aid in evaluating model performance. These exercises will reinforce your understanding and provide hands-on experience in dealing with explicit rating matrices. This first step is crucial, laying the groundwork for more advanced topics you will encounter soon. Happy coding, and I look forward to seeing your progress!