Lesson 3
Diversity in Recommendation Systems
Introduction to Diversity in Recommendation Systems

Welcome to today's lesson on diversity in recommendation systems. In our previous lesson, we explored coverage and novelty metrics. Now, we will dive into diversity, an equally important concept that is crucial in enhancing user satisfaction and engagement with recommendation systems. By ensuring that users receive a diverse range of recommendations, we maintain their interest and cater to varied tastes, which ultimately leads to a richer user experience.

Setup

Before we dive into the code, let’s quickly ensure we have the necessary setup in place. For this lesson, we need user predictions and item vectors. As a reminder, here’s a brief setup using a simple dictionary for predictions and item vectors.

Python
import numpy as np

# Example user predictions: each user receives a list of recommended items
user_predictions = {
    'user1': ['item1', 'item2', 'item3'],
    'user2': ['item2', 'item3', 'item4'],
    'user3': ['item1', 'item4', 'item5']
}

# Example item vectors representing characteristics of items in a multi-dimensional space
item_vectors = {
    'item1': np.array([1, 0, 0]),
    'item2': np.array([0, 1, 0]),
    'item3': np.array([0, 0, 1]),
    'item4': np.array([1, 1, 0]),
    'item5': np.array([0, 1, 1]),
}

These data structures are essential for calculating diversity and should be loaded into your environment beforehand.

Cosine Similarity Revisit

As a reminder, cosine similarity is a measure used to determine the similarity between two non-zero vectors. In recommendation systems, it helps measure how similar or diverse the recommended items are based on their vectors. A cosine similarity of 1 means the vectors point in the same direction (the items are maximally similar), while a value of 0 means the vectors are orthogonal and share no common features.

For two item vectors, A and B, the cosine similarity is calculated as:

\text{Cosine Similarity}(A, B) = \frac{A \cdot B}{||A|| \times ||B||}

Where:

  • A · B is the dot product of the vectors.
  • ||A|| and ||B|| are the magnitudes (Euclidean norms) of the vectors.

Understanding this concept is crucial as it is the foundation for calculating diversity.
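To make the formula concrete, here is a minimal sketch that applies it to item4 and item5 from the setup above (the helper name cosine_sim is ours, not part of the lesson's code):

```python
import numpy as np

def cosine_sim(a, b):
    # Dot product divided by the product of the magnitudes
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1, 1, 0])  # item4 from the setup
b = np.array([0, 1, 1])  # item5 from the setup
print(round(cosine_sim(a, b), 2))  # 0.5: the items share one of their two features
```

Here the dot product is 1 and each magnitude is √2, so the similarity is 1 / 2 = 0.5.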

Step-by-Step Code Walkthrough: Part 1

Let's break down the diversity function and understand its components. First, we process each user's list of recommended items to transform them into vectors using the item_vectors dictionary:

Python
def diversity(predictions, item_vectors):
    # Convert item recommendations to vectors
    item_indices = [
        [item_vectors[item] for item in items if item in item_vectors]
        for items in predictions.values()
    ]
Calculating Similarities:

After processing each user's recommended items into vectors, we calculate the pairwise cosine similarity for the vectors and adjust for self-similarity (diagonal values).

Here's how the pairwise similarity matrix looks for a list of items:

Plain text
Example Items: ['item1', 'item2', 'item3']

Similarity Matrix:
[[1.0, 0.7, 0.3],
 [0.7, 1.0, 0.5],
 [0.3, 0.5, 1.0]]

In the matrix, the diagonal elements represent self-similarity, i.e., each item is identical to itself, hence the value 1. To calculate the diversity of recommendations, we are interested in similarities between different items, not self-similarity.
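Note that the matrix above uses generic values for illustration. With the one-hot item vectors from the setup, user1's actual similarity matrix happens to be the identity matrix, since item1, item2, and item3 are mutually orthogonal. A quick check:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# user1's recommended items from the setup: item1, item2, item3
vectors = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])]
print(cosine_similarity(vectors))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```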

To exclude these diagonal values, we subtract len(items) from the sum of all elements in the similarity matrix:

Python
sum_similarities = np.sum(similarities) - len(items)

Subtracting len(items) precisely eliminates the diagonal ones because the diagonal consists of len(items) ones, as each item is completely similar to itself. This adjustment ensures that the diversity calculation focuses solely on the similarity between different items, providing a more accurate assessment of diversity.
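As a quick sanity check of this adjustment, applied to the example matrix shown above:

```python
import numpy as np

similarities = np.array([[1.0, 0.7, 0.3],
                         [0.7, 1.0, 0.5],
                         [0.3, 0.5, 1.0]])

n = similarities.shape[0]  # plays the role of len(items)
sum_similarities = np.sum(similarities) - n  # removes the three diagonal 1s
print(sum_similarities)  # ≈ 3.0, the sum of the six off-diagonal entries
```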

Step-by-Step Code Walkthrough: Part 2

Now, let's implement it:

Python
# Calculate pairwise cosine similarity for each user's recommended items
total_similarity = 0
count = 0
for items in item_indices:
    if len(items) < 2:
        continue
    similarities = cosine_similarity(items)
    sum_similarities = np.sum(similarities) - len(items)  # Subtract diagonal (self-similarity)

We accumulate the total similarity and keep a count of the off-diagonal entries — len(items) * (len(items) - 1) ordered pairs per user — so we can later derive the average similarity.

Python
    # inside the same loop:
    total_similarity += sum_similarities
    count += len(items) * (len(items) - 1)

Finally, we can return the answer:

Python
# outside the loop:
average_similarity = (total_similarity / count) if count != 0 else 0
return 1 - average_similarity

By subtracting the average similarity from 1, we calculate the diversity score, which indicates how diverse the recommendations are.

Full Code Snippet

Here is the full function for calculating diversity in recommendation systems using cosine similarity:

Python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def diversity(predictions, item_vectors):
    # Convert item recommendations to vectors
    item_indices = [
        [item_vectors[item] for item in items if item in item_vectors]
        for items in predictions.values()
    ]

    # Calculate pairwise cosine similarity for each user's recommended items
    total_similarity = 0
    count = 0
    for items in item_indices:
        if len(items) < 2:
            continue
        similarities = cosine_similarity(items)
        sum_similarities = np.sum(similarities) - len(items)  # Subtract diagonal (self-similarity)
        total_similarity += sum_similarities
        count += len(items) * (len(items) - 1)

    # Calculate average similarity and derive diversity
    average_similarity = (total_similarity / count) if count != 0 else 0
    return 1 - average_similarity

This complete code snippet incorporates each step discussed previously.

Calculating and Interpreting the Diversity Score

After implementing the function, we can calculate the diversity score:

Python
diversity_score = diversity(user_predictions, item_vectors)
print(f"Diversity: {diversity_score:.2f}")

Output:

Plain text
Diversity: 0.79

A diversity score close to 1 indicates a high diversity level, meaning the recommended items are quite different. Conversely, a score near 0 indicates a lack of diversity.
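To see the two extremes of the score, here is a small self-contained sketch; it re-implements the per-user similarity averaging inline (in a hypothetical helper, score) rather than calling the diversity function above:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def score(items):
    # Average off-diagonal similarity, subtracted from 1
    sims = cosine_similarity(items)
    n = len(items)
    avg = (np.sum(sims) - n) / (n * (n - 1))
    return 1 - avg

orthogonal = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])]
identical = [np.array([1, 0, 0])] * 3

print(score(orthogonal))  # 1.0: the items share nothing, maximal diversity
print(score(identical))   # 0.0: every recommendation is the same item
```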

Conclusion and Next Steps

In this lesson, we've explored the concept of diversity in recommendation systems, learned about cosine similarity, and understood how to calculate a diversity score with a practical code example. Understanding diversity is essential as it enhances the robustness and appeal of recommendation systems.

Now, you're encouraged to proceed to the practice exercises where you can apply these concepts using different datasets and configurations. Congratulations on progressing through the lesson, and keep up the strong momentum in your learning journey!
