An Introduction to Principal Component Analysis (PCA)

Let's dive into Principal Component Analysis (PCA), a technique often used in machine learning to simplify complex data while keeping important details. PCA transforms datasets with lots of closely connected parts into datasets with parts that do not directly relate to each other. Think of it like organizing a messy room and putting everything in clear, separate bins.

Make A Simple Dataset

We can start using the PCA by creating our own little dataset. For this lesson, we'll make a 3D (three-dimensional) dataset of 200 points:

Standardizing the Dataset

Before PCA, we need to bring all features of our dataset to a common standard to avoid bias. This just means making sure every feature's average value is 0, and the spread of their values is the same:

The above code calculates the dataset's average (np.mean) and spread (np.std) and then adjusts each point accordingly.

Covariance Matrix
Sign up
Join the 1M+ learners on CodeSignal
Be a part of our community of 1M+ users who develop and demonstrate their skills on CodeSignal