Pearson Correlation in Go

Introduction to Similarity Measures in Recommendation Systems

Recommendation systems depend on understanding how similar users (or items) are to one another, and this understanding forms the backbone of accurate recommendations. Similarity measures let us identify users with similar preferences, so that items one user enjoyed can be recommended to others like them.

In this lesson, we will explore Pearson Correlation, a tool used to measure similarity based on patterns in ratings. By the end, you will be able to implement this measure and understand its application in recommendation systems.

Recap of Essential Setup Steps

Before we dive in, let's quickly recap the setup from previous lessons. We will be using Go slices to represent user ratings in this lesson.

Here's a simple code block to demonstrate setting up user rating datasets using Go slices:
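A minimal sketch of that setup, assuming ratings on a 1 to 5 scale for five items that both users rated. The values themselves are illustrative.

```go
package main

import "fmt"

// Illustrative ratings (assumed values): index i is the same item for both users.
var (
	user1Ratings = []float64{5, 3, 4, 4, 2}
	user2Ratings = []float64{4, 2, 5, 3, 1}
)

func main() {
	// An index-by-index comparison only makes sense if both slices have the same length.
	if len(user1Ratings) != len(user2Ratings) {
		panic("rating slices are not aligned")
	}
	fmt.Println("items rated by both users:", len(user1Ratings))
}
```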

Each index refers to the same item in both slices: user1Ratings[i] and user2Ratings[i] are the two users' ratings of that item.

These slices can be extracted from a user-item matrix, but for simplicity, we define them directly here. In practice, if a rating is missing for either user in the user-item matrix, you should first filter the data to include only the items that both users have rated. This ensures that only ratings for commonly rated items are compared when calculating similarity. In this lesson, we assume that the input slices are already filtered and aligned, containing ratings for the same set of items in the same order, with no missing values.

For example, if you start from a user-item map, you would first collect only the co-rated items and build two aligned []float64 slices before calling PearsonCorrelation:
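A minimal sketch of that filtering step, assuming each user's ratings are stored as a map from an item ID (a string) to a rating. The helper name alignRatings is hypothetical. Both slices are built from the same list of shared item IDs, which is what keeps them aligned; sorting that list is optional but makes the result repeatable, since Go randomizes map iteration order.

```go
package main

import (
	"fmt"
	"sort"
)

// alignRatings is a hypothetical helper: it keeps only the items rated by
// both users and returns two slices that list those items in the same order.
func alignRatings(a, b map[string]float64) ([]float64, []float64) {
	// Collect the IDs of co-rated items.
	var common []string
	for item := range a {
		if _, ok := b[item]; ok {
			common = append(common, item)
		}
	}
	// Map iteration order is random in Go; sorting gives a repeatable order.
	sort.Strings(common)

	x := make([]float64, 0, len(common))
	y := make([]float64, 0, len(common))
	for _, item := range common {
		x = append(x, a[item])
		y = append(y, b[item])
	}
	return x, y
}

func main() {
	// Illustrative ratings: "C" and "D" are each rated by only one user.
	alice := map[string]float64{"A": 5, "B": 3, "C": 4}
	bob := map[string]float64{"A": 4, "B": 2, "D": 1}

	x, y := alignRatings(alice, bob)
	fmt.Println(x, y) // [5 3] [4 2]
}
```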

If your raw ratings are stored as int, convert them to float64 when building these slices. Pearson correlation uses means, differences from means, and a fractional result, so float64 is the right type for the calculation.
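A small sketch of the conversion; the helper toFloat64 is hypothetical. It also shows why the type matters: averaging the int ratings 5, 4, 4 with integer division yields 4, while float64 arithmetic gives about 4.333.

```go
package main

import "fmt"

// toFloat64 is a hypothetical helper that copies int ratings into a
// []float64 so that means and deviations can hold fractional values.
func toFloat64(ratings []int) []float64 {
	out := make([]float64, len(ratings))
	for i, r := range ratings {
		out[i] = float64(r)
	}
	return out
}

func main() {
	raw := []int{5, 4, 4} // illustrative int ratings

	// Integer arithmetic: 13 / 3 truncates to 4.
	intSum := 0
	for _, r := range raw {
		intSum += r
	}
	fmt.Println(intSum / len(raw)) // 4

	// float64 arithmetic keeps the fractional part.
	ratings := toFloat64(raw)
	floatSum := 0.0
	for _, r := range ratings {
		floatSum += r
	}
	fmt.Printf("%.3f\n", floatSum/float64(len(ratings))) // 4.333
}
```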

Understanding Pearson Correlation

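Pearson Correlation measures the strength and direction of a linear relationship between two sets of data. Its value ranges from -1 (a perfect negative linear relationship) through 0 (no linear relationship) to 1 (a perfect positive linear relationship). It is popular in recommendation systems because it compares users by their rating trends rather than their absolute ratings: subtracting each user's mean rating removes differences in rating habits, so a generous rater and a strict rater whose ratings rise and fall together can still score as highly similar.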

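For two users with ratings $x_i$ and $y_i$ on each co-rated item $i$, and mean ratings $\bar{x}$ and $\bar{y}$, the formula for Pearson Correlation is: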

$r = \frac{\sum (x_i - \bar{x}) (y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
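As a quick sanity check of the formula, take two made-up users who rated three items as $x = (1, 2, 3)$ and $y = (1, 3, 2)$:

$\bar{x} = \frac{1 + 2 + 3}{3} = 2, \quad \bar{y} = \frac{1 + 3 + 2}{3} = 2$

$\sum (x_i - \bar{x})(y_i - \bar{y}) = (-1)(-1) + (0)(1) + (1)(0) = 1$

$\sum (x_i - \bar{x})^2 = 1 + 0 + 1 = 2, \quad \sum (y_i - \bar{y})^2 = 1 + 1 + 0 = 2$

$r = \frac{1}{\sqrt{2 \cdot 2}} = 0.5$

Both users rate item 1 lowest but swap the order of items 2 and 3, which is why $r$ is positive but well below 1.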

Step-by-step Implementation

Let’s break down the implementation of the PearsonCorrelation function to understand how to calculate it step by step in Go.

Note: The PearsonCorrelation function below assumes that the input slices contain ratings for the same set of items in the same order, with no missing values. In a real-world scenario, users typically rate different subsets of items, and you would need to filter the ratings to include only the items that both users have rated before applying this function. For simplicity and clarity, we use aligned slices in this lesson, but keep in mind that handling missing data is an important step in practical recommendation system implementations.

Calculate Means: We compute the mean rating for each user to understand their general preference level.
Difference from Mean: We find how each rating deviates from the mean.
Numerator and Denominator: The numerator is the sum of the products of the paired deviations; the denominator is the square root of the product of each user's sum of squared deviations. Their ratio is the correlation.

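You can think of the calculation as a short recipe:

Compute mean1 and mean2.
For each item i, compute the deviations user1Ratings[i] - mean1 and user2Ratings[i] - mean2.
Add the product of the two deviations to a running numerator.
Add the square of each deviation to that user's running sum of squares.
Divide the numerator by the square root of the product of the two sums of squares.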

Example Application and Interpretation

Let's apply this function to our previous example user ratings and see how it works in practice.


In this context, a Pearson Correlation of 0.7 indicates a fairly strong positive relationship between the two users' rating trends: items one user rates above their own average tend to be rated above average by the other as well. This suggests similar interests, so items that one of them rates highly are good candidates to recommend to the other.

Understanding Correlation with Additional Example

The term correlation refers to the statistical measure that describes the extent to which two variables change together. In recommendation systems, correlation focuses on how users' ratings move together, not just the number of items they have both rated or how often they give the same rating.

Let's illustrate this with a third user:

In this example, all three users have rated the same set of 5 items. Notice that User 1 and User 2 even give the same rating (2) to the first item and the same rating (4) to the last item. However, their overall rating patterns are quite different, resulting in a weak negative correlation of -0.32. On the other hand, User 1 and User 3 do not always give the same ratings for individual items, but their ratings move together in a similar pattern: when User 1's ratings go up, so do User 3's. This results in a much higher correlation of 0.96.

This demonstrates that Pearson correlation measures the similarity in rating trends, not just the number of items rated in common or the number of identical ratings. Even if two users have several identical ratings, their overall trends may differ, and correlation will reflect that. Other similarity metrics, such as cosine similarity, focus on different aspects of user similarity.

Overview and Preparation for Practice

Throughout this lesson, you've learned to calculate and apply Pearson Correlation to measure user similarity based on their ratings. This measure is a powerful tool in crafting accurate and personalized recommendations.

With these concepts and coding steps in mind, prepare yourself for the practice exercises that follow. Try implementing the code on your own and explore how varying datasets influence the correlation results. Remember, this hands-on experience will strengthen your understanding and competence in developing effective recommendation systems.

