Lesson 3
Understanding Implicit Feedback in Recommendation Systems
Introduction to Implicit Feedback

Welcome back! As you continue your journey through recommendation systems, it's important to understand not just explicit feedback, such as star ratings, but also implicit feedback. Implicit feedback is inferred from user behavior, such as watch times or click histories. It is much easier to gather than explicit feedback, but it does not directly reveal how satisfied a user was.

Most classical models use either implicit or explicit feedback, not both, because integrating the two types into a unified system is complex. In this course, we'll focus on analyzing implicit feedback on its own.

Binary Matrix of Interactions

Now, let's delve into the binary matrix of interactions. In the context of implicit feedback, this matrix is a simplified representation showing whether a user interacted with an item or not. Each entry in the matrix is a binary value:

  • 1 indicates an interaction (e.g., a user watched an item),
  • 0 implies no interaction.

For example, let's say User 1 interacted with Items 1, 2, and 4. The binary matrix would look like this:

Markdown
| User/Item | Item 1 | Item 2 | Item 3 | Item 4 |
|-----------|--------|--------|--------|--------|
| User 1    | 1      | 1      | 0      | 1      |

This matrix is crucial as it helps algorithms understand which items have been interacted with, providing a baseline for recommending new items to users.
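As a minimal sketch, the row for User 1 could be built from a list of the items that user interacted with. The names `interacted_items` and `n_items` are illustrative assumptions, not part of the course dataset.

```python
import numpy as np

# Hypothetical input: the 1-based item numbers User 1 interacted with
interacted_items = [1, 2, 4]
n_items = 4  # assumed catalog size for this toy example

# Start with all zeros (no interaction), then mark the interacted items
binary_row = np.zeros(n_items, dtype=int)
binary_row[np.array(interacted_items) - 1] = 1  # shift to zero-based indices

print(binary_row)  # [1 1 0 1]
```

Item 3 stays at 0 because it never appears in the list of interactions.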

Confidence Matrix Explanation

The confidence matrix goes beyond the binary matrix by incorporating the confidence we have in each interaction. This confidence is calculated based on user behaviors such as watch times. Longer watch times suggest higher interest and thus, greater confidence in the interaction.

Here's how you might compute a confidence matrix, where watch_time plays a significant role:

Python
import numpy as np

# Initializing a sample confidence matrix
confidence_matrix = np.zeros((1, 4))

# Let's assume some watch times for User 1
watch_times = [30, 28, 11, 51]  # For Items 1, 2, 3, 4 respectively
alpha = 40  # Constant factor

# Fill the confidence matrix using the formula
confidence_matrix[0, :] = [1 + alpha * time for time in watch_times]

This might result in:

Markdown
| User/Item | Item 1 | Item 2 | Item 3 | Item 4 |
|-----------|--------|--------|--------|--------|
| User 1    | 1201   | 1121   | 441    | 2041   |

Here, higher values denote greater confidence that the user is interested in those items, which is invaluable for personalizing recommendations.

There are various ways to turn implicit feedback into numeric values, and you can come up with your own. The approach described here comes from the paper Collaborative Filtering for Implicit Feedback Datasets, published by researchers from AT&T Labs. In the next lesson, we will use this approach to train IALS, a version of ALS designed to work efficiently with implicit feedback.

Data Format and Reading

The dataset is a JSON file containing a list of records. Each record has the fields 'user', 'item', 'rating' and 'watch_time' and describes one interaction a user had with an item.

JSON
[
  {"user": 1, "item": 1, "rating": 2, "watch_time": 30},
  {"user": 1, "item": 2, "rating": 2, "watch_time": 28},
  {"user": 1, "item": 4, "rating": -1, "watch_time": 11},
  ...
]

Here's an excerpt showing how to read the data:

Python
import json

# Load the JSON data
with open('ratings.json', 'r') as file:
    data = json.load(file)

# Determine the size of the matrices
max_user = max(entry['user'] for entry in data)
max_item = max(entry['item'] for entry in data)

In this block, we read the JSON file and calculate max_user and max_item to ascertain the dimensions of our matrices.

Initializing and Filling the Matrices

Following the data read, we initialize the matrices and populate them with interactions and confidence values:

Python
import numpy as np

# Initialize the matrices
interaction_matrix = np.zeros((max_user, max_item), dtype=int)
confidence_matrix = np.zeros((max_user, max_item))

# Fill the matrices with interactions and confidence values
alpha = 40  # Constant for scaling confidence

for entry in data:
    user_id = entry['user'] - 1  # Convert to zero-index
    item_id = entry['item'] - 1  # Convert to zero-index
    interaction_matrix[user_id, item_id] = 1
    confidence_matrix[user_id, item_id] = 1 + alpha * entry['watch_time']

Here, interaction_matrix is filled with 1s indicating a user-item interaction, while confidence_matrix is filled using the formula:

$$\text{confidence} = 1 + \alpha \times \text{watch\_time}$$

This formula is taken from the article mentioned earlier. In practice, you can experiment with other ways of calculating the implicit feedback value. For example, in the same article, the authors propose an alternative confidence formula that also worked well for them:

$$\text{confidence} = 1 + \alpha \times \ln\left(1 + \frac{\text{watch\_time}}{\epsilon}\right)$$
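To make the difference between the two formulas concrete, here is a small sketch that applies both to the same sample watch times. The value of `epsilon` is an assumption chosen only for illustration. The lesson does not prescribe one.

```python
import numpy as np

# Sample watch times for one user (Items 1, 2, 3, 4)
watch_times = np.array([30, 28, 11, 51], dtype=float)
alpha = 40      # scaling constant, as in the linear formula
epsilon = 10.0  # assumed value, chosen only for this illustration

# Linear formula: 1 + alpha * watch_time
linear_confidence = 1 + alpha * watch_times

# Logarithmic formula: 1 + alpha * ln(1 + watch_time / epsilon)
# np.log1p(x) computes ln(1 + x)
log_confidence = 1 + alpha * np.log1p(watch_times / epsilon)

print(linear_confidence)
print(np.round(log_confidence, 1))
```

Both variants rank the items in the same order, but the logarithm compresses large watch times, so a single very long session dominates less.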

Example of Resulting Matrices

Here's a short example showing how the resulting matrices might look:

Plain text
Interaction Matrix (Binary):
[[1 1 0 0]
 [0 0 1 1]]

Confidence Matrix:
[[1201 1121    0    0]
 [   0    0  601  881]]

This output reflects interactions and confidence levels across users and items.
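For reference, here is a self-contained sketch that produces matrices with these values, using NumPy index arrays in place of an explicit loop. The records are hypothetical, chosen so that the linear formula with alpha = 40 yields the numbers above. The confidence matrix uses an integer dtype here only to keep the printout compact.

```python
import numpy as np

# Hypothetical records, chosen to reproduce the example matrices
data = [
    {"user": 1, "item": 1, "watch_time": 30},
    {"user": 1, "item": 2, "watch_time": 28},
    {"user": 2, "item": 3, "watch_time": 15},
    {"user": 2, "item": 4, "watch_time": 22},
]
alpha = 40  # constant for scaling confidence

# Collect zero-based row/column indices and the watch times
users = np.array([entry["user"] for entry in data]) - 1
items = np.array([entry["item"] for entry in data]) - 1
watch = np.array([entry["watch_time"] for entry in data])

n_users, n_items = users.max() + 1, items.max() + 1
interaction_matrix = np.zeros((n_users, n_items), dtype=int)
confidence_matrix = np.zeros((n_users, n_items), dtype=int)

# Assign all observed entries at once via index arrays
interaction_matrix[users, items] = 1
confidence_matrix[users, items] = 1 + alpha * watch

print("Interaction Matrix (Binary):")
print(interaction_matrix)
print("Confidence Matrix:")
print(confidence_matrix)
```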

You might wonder, why don't we use only the confidence matrix, as it contains all the information? The reason is that splitting user preferences (interactions) and our confidence in their preferences allows us to work with these values distinctly and construct a model that treats them separately. It generally improves the model's performance.
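As a rough illustration of what treating the two matrices separately can mean, the sketch below computes a confidence-weighted squared error: the interaction matrix is the target, and the confidence matrix weights each error term. This assumes the weighted objective from the paper mentioned earlier, with regularization omitted. The factor matrices are random placeholders, not a trained model, and `n_factors` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Preferences (binary interactions) and confidence, as in the example above
preferences = np.array([[1, 1, 0, 0],
                        [0, 0, 1, 1]], dtype=float)
confidence = np.array([[1201, 1121, 0, 0],
                       [0, 0, 601, 881]], dtype=float)

n_factors = 2  # assumed latent dimension, for illustration only
user_factors = rng.normal(size=(2, n_factors))  # placeholder, untrained
item_factors = rng.normal(size=(4, n_factors))  # placeholder, untrained

# Predicted preference for every user-item pair
predicted = user_factors @ item_factors.T

# Each squared error is scaled by how confident we are in that entry
weighted_error = np.sum(confidence * (preferences - predicted) ** 2)
print(weighted_error)
```

With the matrices as built in this lesson, entries without an interaction have confidence 0 and therefore drop out of the sum.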

In the next lesson, we will train one example of such a model. But before that, let's wrap it up and have some practice!

Summary and Preparation for Practice

In this lesson, you focused on understanding and creating interaction and confidence matrices based on implicit feedback like user watch times. You now have both the theoretical understanding and practical skills to process implicit feedback. This enables you to create a more nuanced and personalized recommendation system.

In the next session, you'll have the opportunity to explore practice exercises that reinforce today's lesson. These exercises will help solidify your understanding and make the transition to advanced models seamless. Keep up the great work as you advance towards mastering recommendation systems with ALS!
