Topic Overview

Hello and welcome! In this lesson, you'll learn how to compute and visualize a correlation matrix using the diamonds dataset. The goal is to understand how different features in the dataset relate to each other through correlations and visualize these relationships using a heatmap.

Convert Categorical Variables

The diamonds dataset contains categorical features such as cut, color, and clarity. Correlation matrices require numerical data, so we need to convert these categorical variables to numerical codes.

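First, let's identify the categorical columns that need conversion, then convert them with astype('category').cat.codes. Calling astype('category') ensures the column has a categorical dtype, and .cat.codes then maps each category to an integer code from 0 to number_of_categories - 1 (missing values are coded as -1). Keep in mind that the codes follow the category order, which is alphabetical for plain string columns, so they do not necessarily reflect a quality ranking.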

Once these columns are converted, every feature is numerical and the dataset can be used in correlation computations.
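The conversion described above could be sketched roughly as follows. This is a minimal illustration, not the lesson's original code: the small DataFrame is a hypothetical stand-in for a few rows of the diamonds dataset, and using select_dtypes to find the non-numeric columns is just one assumed way to identify them.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the diamonds dataset (assumed values)
diamonds = pd.DataFrame({
    "carat": [0.23, 0.21, 0.29, 0.31],
    "cut": ["Ideal", "Premium", "Good", "Ideal"],
    "color": ["E", "E", "I", "J"],
    "clarity": ["SI2", "SI1", "VS2", "SI2"],
    "price": [326, 326, 334, 335],
})

# Identify the columns that are not numeric (here: cut, color, clarity)
categorical_cols = list(diamonds.select_dtypes(exclude="number").columns)

# Replace each categorical column with its integer category codes
for col in categorical_cols:
    diamonds[col] = diamonds[col].astype("category").cat.codes

print(categorical_cols)
print(diamonds.dtypes)
```

After this step every column is numeric, so a call such as diamonds.corr() can include all of them.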
