Hello! In today's lesson, we will dive into the concept of correlation and focus specifically on highlighting certain correlation values within the diamonds dataset.
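Correlation is a statistical measure that describes the extent to which two variables change together. It is usually summarized as a coefficient between -1 and 1: values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship. Understanding correlations is crucial in data analysis because it helps us identify relationships between different variables.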
For example:
- Positive Correlation: As one variable increases, the other also increases (e.g., height and weight).
- Negative Correlation: As one variable increases, the other decreases (e.g., speed and travel time).
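To make the two cases concrete, here is a small sketch with made-up numbers. The values and column names below are illustrative assumptions, not taken from the diamonds dataset:

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [52, 58, 67, 75, 88],
    "speed_kmh": [20, 40, 60, 80, 100],
})

# Travel time for a fixed 120 km trip shrinks as speed grows
toy["travel_time_h"] = 120 / toy["speed_kmh"]

# Series.corr computes the Pearson correlation coefficient by default
positive = toy["height_cm"].corr(toy["weight_kg"])
negative = toy["speed_kmh"].corr(toy["travel_time_h"])

print(f"height vs. weight: {positive:.2f}")          # positive coefficient
print(f"speed vs. travel time: {negative:.2f}")      # negative coefficient
```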
By the end of this lesson, you will be able to compute, mask, and visually represent these correlations to get a clearer picture of the underlying data relationships.
Let's compute the correlation matrix for our prepared diamonds dataset. As mentioned before, the correlation matrix is a table of correlation coefficients for a set of variables. Each cell shows the correlation between the two variables named by its row and its column.
You might be familiar with the process by now, but here's how to compute and display the correlation matrix using pandas:
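A minimal sketch of this step might look like the following. It uses a small stand-in DataFrame with diamonds-style column names, which is an assumption for illustration; in the lesson you would use your own prepared diamonds DataFrame instead:

```python
import pandas as pd

# Stand-in for the prepared diamonds DataFrame: the column names mirror
# the real dataset, but these rows are hypothetical values for illustration.
diamonds = pd.DataFrame({
    "carat": [0.23, 0.31, 0.70, 1.01, 1.50],
    "cut": ["Ideal", "Good", "Premium", "Very Good", "Ideal"],
    "depth": [61.5, 63.3, 62.4, 60.8, 61.9],
    "price": [326, 335, 2757, 4500, 9800],
})

# Keep only numeric columns, since correlation is undefined for text labels
numeric = diamonds.select_dtypes(include="number")

# Pairwise Pearson correlation between every pair of numeric columns
correlation_matrix = numeric.corr()

print(correlation_matrix)
```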
Output:
