Lesson Introduction

Hello! Today, we are diving into the world of data scaling techniques. Imagine you are playing a game where you need to fit different shapes into matching holes. If your shapes vary greatly in size, it can be challenging. Similarly, in data analysis and machine learning, features (or columns) in your dataset may have vastly different scales. This can hurt the performance of your analysis or model: a feature measured in the thousands can outweigh one measured in single digits simply because its numbers are bigger, not because it matters more.

Our goal for this lesson is to understand two key data scaling techniques: Standard Scaling and Min-Max Scaling. By the end of this lesson, you'll be able to apply these techniques to scale features in a dataset, making them easier to work with.

Understanding Standard Scaling

Standard Scaling is like leveling the playing field for your data. It transforms your data so it has a mean (average) of 0 and a standard deviation (how spread out the numbers are) of 1. This is especially useful when you want features on comparable scales, centered at zero. Note that it only shifts and rescales the values; it does not change the shape of the distribution, so skewed data stays skewed.

The formula for standard scaling is:

$$z = \frac{X - \mu}{\sigma}$$

Where:

  • $X$ is the original value.
  • $\mu$ is the mean of the values.
  • $\sigma$ is the standard deviation of the values.

In simpler terms, you subtract the average value from each data point and then divide by how much your data varies from the average.
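
To see the formula at work, here is a minimal sketch in plain Python. The helper name `standard_scale` and the sample `ages` list are made up for illustration, and the sketch assumes the population standard deviation (dividing by the number of values).

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Return the z-score of each value: (x - mean) / standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (divides by n)
    if sigma == 0:
        raise ValueError("cannot standard-scale a column whose values are all equal")
    return [(x - mu) / sigma for x in values]


ages = [20, 30, 40, 50, 60]  # hypothetical sample values
scaled_ages = standard_scale(ages)
print(scaled_ages)  # approximately [-1.41, -0.71, 0.0, 0.71, 1.41]
```

The middle value (40) equals the mean, so it maps to 0; values below the mean become negative and values above it become positive.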

Applying Standard Scaling

Let's use the Titanic dataset to perform Standard Scaling on the age and fare columns.
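
One way this might look is sketched below with pandas and scikit-learn's `StandardScaler`. The small DataFrame is a hand-made stand-in with invented values, not the real Titanic data; only the column names `age` and `fare` come from the lesson.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in rows; the real Titanic dataset has many more rows and columns.
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86],
})

scaler = StandardScaler()
# fit_transform learns each column's mean and standard deviation,
# then applies (X - mean) / std to every value in that column.
df[["age", "fare"]] = scaler.fit_transform(df[["age", "fare"]])

print(df)
print(df.mean().round(6))       # each column's mean is now about 0
print(df.std(ddof=0).round(6))  # each column's population std is now about 1
```

`StandardScaler` divides by the population standard deviation, which is why the check uses `ddof=0`. The pandas default `std()` divides by n - 1 and would report a value slightly above 1.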

Understanding Min-Max Scaling

Min-Max Scaling adjusts the scale of your data to fit within a specific range, typically between 0 and 1. This is like resizing shapes to fit in a smaller box, making them easier to compare.

The formula for Min-Max Scaling is:

$$X' = \frac{X - X_{min}}{X_{max} - X_{min}}$$

Where:

  • $X$ is the original value.
  • $X_{min}$ is the minimum value in the feature.
  • $X_{max}$ is the maximum value in the feature.

In simpler terms, you subtract the smallest value from each data point and then divide by the range (difference between the largest and smallest values).
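
The same arithmetic can be sketched in plain Python. The helper name `min_max_scale` and the sample `fares` list are made up for illustration.

```python
def min_max_scale(values):
    """Rescale values to the range [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a column whose values are all equal")
    return [(x - lo) / (hi - lo) for x in values]


fares = [10.0, 20.0, 40.0, 110.0]  # hypothetical sample values
scaled_fares = min_max_scale(fares)
print(scaled_fares)  # [0.0, 0.1, 0.3, 1.0]
```

The smallest value always maps to 0 and the largest to 1. Here the single large fare (110) pushes the other values toward the low end of the range, which shows how sensitive this technique is to extreme values.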

Applying Min-Max Scaling

Let's apply Min-Max Scaling to the age and fare columns in the Titanic dataset.
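
A possible version using pandas and scikit-learn's `MinMaxScaler` is sketched below. As before, the DataFrame is a hand-made stand-in with invented values rather than the real Titanic data; only the column names `age` and `fare` come from the lesson.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in rows; the real Titanic dataset has many more rows and columns.
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86],
})

scaler = MinMaxScaler()  # the default feature_range is (0, 1)
# fit_transform learns each column's minimum and maximum,
# then rescales every value in that column into the target range.
df[["age", "fare"]] = scaler.fit_transform(df[["age", "fare"]])

print(df)
print(df.min())  # each column's minimum is now 0
print(df.max())  # each column's maximum is now 1
```

After scaling, the youngest passenger in this sample has an `age` of 0 and the oldest has an `age` of 1, with everyone else in between.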

Lesson Summary

Great job! Today, you learned about the importance of data scaling and explored two common techniques: Standard Scaling and Min-Max Scaling. These techniques bring features to a common scale, making them easier to analyze and to work with in machine learning models.

Now it's time for some hands-on practice. You'll apply Standard Scaling and Min-Max Scaling to different columns in a dataset using the CodeSignal IDE. This will solidify your understanding and give you practical experience in scaling data. Enjoy scaling your data!
