Welcome to today's lesson on diversity in recommendation systems. In our previous lesson, we explored coverage and novelty metrics. Now, we will dive into diversity, an equally important concept for enhancing user satisfaction and engagement with recommendation systems. By ensuring that users receive a diverse range of recommendations, we maintain their interest and cater to varied tastes, which ultimately leads to a richer user experience.
Before we dive into the code, let’s quickly ensure we have the necessary setup in place. For this lesson, we need user predictions and item vectors. As a reminder, here’s a brief setup using a simple dictionary for predictions and item vectors.
```python
import numpy as np

# Example user predictions: each user receives a list of recommended items
user_predictions = {
    'user1': ['item1', 'item2', 'item3'],
    'user2': ['item2', 'item3', 'item4'],
    'user3': ['item1', 'item4', 'item5']
}

# Example item vectors representing characteristics of items in a multi-dimensional space
item_vectors = {
    'item1': np.array([1, 0, 0]),
    'item2': np.array([0, 1, 0]),
    'item3': np.array([0, 0, 1]),
    'item4': np.array([1, 1, 0]),
    'item5': np.array([0, 1, 1]),
}
```
These data structures are essential for calculating diversity and should be loaded into your environment beforehand.
As a reminder, cosine similarity is a measure of the similarity between two non-zero vectors. In recommendation systems, it helps measure how similar or diverse the recommended items are based on their vectors. A cosine similarity of 1 means the vectors point in the same direction (maximally similar), while a value of 0 means they are orthogonal (completely dissimilar).
For two item vectors $A$ and $B$, the cosine similarity is calculated as:

$$\text{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

Where:

- $A \cdot B$ is the dot product of the vectors.
- $\|A\|$ and $\|B\|$ are the magnitudes of the vectors.
Understanding this concept is crucial as it is the foundation for calculating diversity.
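To make the formula concrete, here is a minimal sketch computing cosine similarity directly from the definition with NumPy. The vectors `A` and `B` below are hypothetical, chosen only to match the shape of our item vectors:

```python
import numpy as np

# Hypothetical 3-dimensional vectors for illustration
A = np.array([1, 1, 0])
B = np.array([0, 1, 0])

# Cosine similarity straight from the formula:
# dot product divided by the product of the magnitudes
cos_sim = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(round(cos_sim, 4))  # 1/sqrt(2) ≈ 0.7071
```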
Let's break down the `diversity` function and understand its components. First, we process each user's list of recommended items, transforming it into vectors using the `item_vectors` dictionary:
```python
def diversity(predictions, item_vectors):
    # Convert item recommendations to vectors
    item_indices = [
        [item_vectors[item] for item in items if item in item_vectors]
        for items in predictions.values()
    ]
```
After processing each user's recommended items into vectors, we calculate the pairwise cosine similarity for the vectors and adjust for self-similarity (diagonal values).
Here's how the pairwise similarity matrix might look for a hypothetical list of items (the values below are illustrative):
```text
Example Items: ['item1', 'item2', 'item3']

Similarity Matrix:
[[1.0, 0.7, 0.3],
 [0.7, 1.0, 0.5],
 [0.3, 0.5, 1.0]]
```
In the matrix, the diagonal elements represent self-similarity, i.e., each item is identical to itself, hence the value 1. To calculate the diversity of recommendations, we are interested in similarities between different items, not self-similarity.
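For instance, computing an actual matrix with the first three item vectors from our setup, the diagonal is all ones; the off-diagonal entries happen to be 0 here simply because those particular vectors are orthogonal:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# The vectors for item1, item2, item3 from our setup
vectors = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])]
sim = cosine_similarity(vectors)

print(np.diag(sim))  # [1. 1. 1.] -- every item is identical to itself
```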
To exclude these diagonal values, we subtract `len(items)` from the sum of all elements in the similarity matrix:
```python
sum_similarities = np.sum(similarities) - len(items)
```
Subtracting `len(items)` eliminates exactly the diagonal ones, because the diagonal consists of `len(items)` ones, each item being completely similar to itself. This adjustment ensures that the diversity calculation focuses solely on the similarity between different items, providing a more accurate assessment of diversity.
Now, let's implement it:
```python
from sklearn.metrics.pairwise import cosine_similarity

# Calculate pairwise cosine similarity for each user's recommended items
total_similarity = 0
count = 0
for items in item_indices:
    if len(items) < 2:
        continue
    similarities = cosine_similarity(items)
    sum_similarities = np.sum(similarities) - len(items)  # Subtract diagonal (self-similarity)
```
We accumulate the total similarity and keep a count to later derive the average similarity.
```python
# inside the same loop:
    total_similarity += sum_similarities
    count += len(items) * (len(items) - 1)
```
Finally, we can return the answer:
```python
# outside the loop:
average_similarity = (total_similarity / count) if count != 0 else 0
return 1 - average_similarity
```
By subtracting the average similarity from 1, we calculate the diversity score, which indicates how diverse the recommendations are.
Here is the full function for calculating diversity in recommendation systems using cosine similarity:
```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def diversity(predictions, item_vectors):
    # Convert item recommendations to vectors
    item_indices = [
        [item_vectors[item] for item in items if item in item_vectors]
        for items in predictions.values()
    ]

    # Calculate pairwise cosine similarity for each user's recommended items
    total_similarity = 0
    count = 0
    for items in item_indices:
        if len(items) < 2:
            continue
        similarities = cosine_similarity(items)
        sum_similarities = np.sum(similarities) - len(items)  # Subtract diagonal (self-similarity)
        total_similarity += sum_similarities
        count += len(items) * (len(items) - 1)

    # Calculate average similarity and derive diversity
    average_similarity = (total_similarity / count) if count != 0 else 0
    return 1 - average_similarity
```
This complete code snippet incorporates each step discussed previously.
After implementing the function, we can calculate the diversity score:
```python
diversity_score = diversity(user_predictions, item_vectors)
print(f"Diversity: {diversity_score:.2f}")
```
Output:
```text
Diversity: 0.79
```
A diversity score close to 1 indicates a high diversity level, meaning the recommended items are quite different. Conversely, a score near 0 indicates a lack of diversity.
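As a quick sanity check of these extremes, here is a self-contained sketch (redefining the `diversity` function from above and using toy 2-D vectors) in which identical-direction recommendations score 0 and orthogonal ones score 1:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def diversity(predictions, item_vectors):
    item_indices = [
        [item_vectors[item] for item in items if item in item_vectors]
        for items in predictions.values()
    ]
    total_similarity = 0
    count = 0
    for items in item_indices:
        if len(items) < 2:
            continue
        similarities = cosine_similarity(items)
        total_similarity += np.sum(similarities) - len(items)
        count += len(items) * (len(items) - 1)
    return 1 - (total_similarity / count if count != 0 else 0)

# Toy 2-D vectors: 'a' and 'a2' point the same way; 'a' and 'b' are orthogonal
toy_vectors = {
    'a':  np.array([1, 0]),
    'a2': np.array([2, 0]),
    'b':  np.array([0, 1]),
}

print(diversity({'u': ['a', 'a2']}, toy_vectors))  # identical directions -> 0.0
print(diversity({'u': ['a', 'b']}, toy_vectors))   # orthogonal -> 1.0
```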
In this lesson, we've explored the concept of diversity in recommendation systems, learned about cosine similarity, and understood how to calculate a diversity score with a practical code example. Understanding diversity is essential as it enhances the robustness and appeal of recommendation systems.
Now, you're encouraged to proceed to the practice exercises where you can apply these concepts using different datasets and configurations. Congratulations on progressing through the lesson, and keep up the strong momentum in your learning journey!