Hello and welcome! In this lesson, you'll learn how to compute and visualize a correlation matrix using the diamonds dataset. The goal is to understand how different features in the dataset relate to each other through correlations and visualize these relationships using a heatmap.
The diamonds dataset contains categorical features such as cut, color, and clarity. Correlation matrices require numerical data, so we need to convert these categorical variables to numerical codes.
First, let's identify the categorical columns that need conversion, then we'll convert them using astype('category').cat.codes. Calling astype('category') casts the column to the pandas categorical type, after which .cat.codes replaces each category with an integer code ranging from 0 to number_of_categories - 1. Every distinct category gets its own code, so rows that share a category also share a code.
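To make the code range concrete, here is a minimal sketch on a single column. The values below are made up for illustration and are not read from the real dataset. Note that when the column holds plain strings, pandas sorts the categories, so the codes follow alphabetical order rather than any quality ranking.

```python
import pandas as pd

# Made-up sample of cut values (an assumption, not real diamonds rows)
cut = pd.Series(["Ideal", "Premium", "Good", "Ideal", "Fair"])

# Cast to the categorical type; for string data the categories are sorted
as_category = cut.astype("category")

# Replace each category with its integer code
codes = as_category.cat.codes

print(list(as_category.cat.categories))  # ['Fair', 'Good', 'Ideal', 'Premium']
print(codes.tolist())                    # [2, 3, 1, 2, 0]
```

With four distinct categories, the codes run from 0 to 3, which matches the number_of_categories - 1 upper bound described above.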
By converting these columns, you enable the dataset to be used in correlation computations where all features need to be numerical:
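The conversion could look roughly like the sketch below. It uses a small hand-built DataFrame as a stand-in for the diamonds dataset (the rows are invented, and only the column names follow the lesson), and it picks out the categorical columns with select_dtypes(exclude="number"), which is just one possible way to identify them.

```python
import pandas as pd

# Hypothetical stand-in for the diamonds dataset: invented rows, real column names
diamonds = pd.DataFrame({
    "carat": [0.23, 0.21, 0.29, 0.31],
    "cut": ["Ideal", "Premium", "Good", "Ideal"],
    "color": ["E", "E", "I", "J"],
    "clarity": ["SI2", "SI1", "VS2", "SI2"],
    "price": [326, 326, 334, 335],
})

# Identify the columns that need conversion: everything that is not numeric
categorical_columns = diamonds.select_dtypes(exclude="number").columns.tolist()

# Work on a copy so the original labels stay available
encoded = diamonds.copy()
for column in categorical_columns:
    encoded[column] = encoded[column].astype("category").cat.codes

print(categorical_columns)
print(encoded.dtypes)

# With every column numeric, the correlation matrix can be computed
correlation_matrix = encoded.corr()
```

Keeping the original DataFrame untouched is a design choice here: the readable labels remain available for later plots, while the encoded copy feeds the correlation computation.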
