Welcome to the first lesson of our course on Collaborative Filtering with ALS! In this lesson, we will begin our journey into the world of recommendation systems, an integral technology that powers many of the platforms we interact with daily. From helping you find movies you might like to suggesting products based on your past purchases, recommendation systems are all around us. We will start with user-item interactions and explicit ratings, which form the foundation of collaborative filtering techniques. This lesson will equip you with the skills to handle explicit rating matrices, setting the stage for implementing the ALS (Alternating Least Squares) algorithm.
We begin with explicit ratings because they clearly show what problem we want to solve: predict ratings that are missing from a sparse user-item matrix. ALS is especially useful when this matrix becomes large and sparse, because it learns compact user and item factors instead of relying only on direct comparisons between users or items. Later in the course, we will move to implicit feedback, which is more common in production systems but also noisier and less direct than explicit ratings.
An explicit rating matrix is a foundational concept in recommendation systems. It is a table in which each row corresponds to a user, each column corresponds to an item, and each cell holds the rating that user explicitly gave to that item. For instance, on a movie streaming service, you might rate a movie from 1 to 5 stars. These ratings help the service recommend other movies you might enjoy. The explicit rating matrix records these ratings in a structured form, making it easier to generate personalized recommendations.
Consider a movie platform where users have provided ratings for different films. Here's how a simplified explicit rating matrix might look:
In this table, each cell contains the rating a user has assigned to a movie, with -1 indicating that the user has not rated the movie. In this course, -1 is a simple internal sentinel value for a missing rating. In production systems, people also use masks, sparse formats, or special floating-point values, but -1 keeps our examples easy to read.
Let's set up the explicit rating matrix using a dataset of user-item interactions. The matrix is loaded from the explicit_ratings.txt file, which we will use for further exploration and manipulation. The file is plain text, with integer ratings separated by spaces.
Each row represents a user, and each column represents an item. A value of -1 means that the rating for this user-item pair is missing.
Below is how you can read this data into a two-dimensional slice in Go:
In this code, we open the ratings file, read each line, split the line by spaces, trim spaces, convert each value to an integer, and store the results in a two-dimensional slice called R. Each row in R represents a user's ratings for all items.
In the context of recommendation systems, marking certain entries as missing is fundamental when it comes to testing and validating models. We intentionally hide some ratings so that we can later compare an algorithm's predictions against the true values we removed. This gives us a controlled way to measure whether the model is learning useful patterns.
Below is how you can randomly mark a fraction of the existing ratings as missing in Go:
Here, we create a copy of R to retain the original ratings, because once we hide some entries in R, we still need the true values later for evaluation. We then iterate through all entries and randomly mark a fraction of the non-missing ratings as missing based on the missing ratio. This probabilistic masking prepares the matrix for model testing, allowing us to evaluate its ability to predict these missing ratings.
Explicit feedback is very valuable because users state their preferences directly. Unfortunately, users rarely leave feedback, so most entries in a real rating matrix are missing. A common way to compensate is to use implicit feedback, inferred from users' actions such as clicks and engagement metrics (for example, time spent with an item). We will explore this in the following lessons.
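A quick way to check how much of a rating matrix is actually missing is to count the sentinel values. The sketch below assumes -1 marks a missing rating; the function name `missingFraction` and the sample matrix are invented for illustration.

```go
package main

import "fmt"

// missing is the sentinel value for an unrated user-item pair.
const missing = -1

// missingFraction returns the share of cells in R that hold the
// missing sentinel, as a value between 0 and 1.
func missingFraction(R [][]int) float64 {
	total, miss := 0, 0
	for _, row := range R {
		for _, v := range row {
			total++
			if v == missing {
				miss++
			}
		}
	}
	if total == 0 {
		return 0 // an empty matrix has no cells to count
	}
	return float64(miss) / float64(total)
}

func main() {
	// Invented ratings, for illustration only.
	R := [][]int{
		{5, 3, -1, 1},
		{4, -1, -1, 1},
		{1, 1, -1, 5},
	}
	fmt.Printf("missing: %.1f%%\n", 100*missingFraction(R))
}
```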
In this lesson, you learned the significance of the explicit rating matrix in recommendation systems. You practiced loading it from a dataset and marking entries as missing for testing purposes, and you gained an introductory understanding of handling missing data. These skills are essential as you progress in the field of recommendation systems.
As you move on to the practice exercises, consider how marking missing entries can aid in evaluating model performance. These exercises will reinforce your understanding and provide hands-on experience in dealing with explicit rating matrices. This first step is crucial, laying the groundwork for more advanced topics you will encounter soon. Happy coding, and I look forward to seeing your progress!
