Section 1 - Instruction

Welcome to scaling numeric features! Imagine comparing heights in meters versus millimeters - same data, but vastly different numbers.

Many machine learning models treat larger numbers as more important, so features measured on very different scales can lead to poor predictions.

Engagement Message

Can you name another pair of features you've seen with very different numeric ranges?

Section 2 - Instruction

Here's the problem: many models, such as k-nearest neighbors and k-means clustering, calculate distances between data points. A feature with large numbers (like income) will dominate features with small numbers (like age).

It's like shouting over whispers - the loud feature drowns out the quiet ones!
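
To see the effect in numbers, here's a minimal sketch using two made-up people described by income and age. The values and variable names are assumptions chosen for illustration, not data from the lesson.

```python
import numpy as np

# Hypothetical people described as [income in dollars, age in years]
person_a = np.array([50_000.0, 25.0])
person_b = np.array([51_000.0, 60.0])

# Euclidean distance using the raw, unscaled features
diff = person_b - person_a
squared = diff ** 2
distance = np.sqrt(squared.sum())

# Share of the squared distance contributed by each feature
income_share = squared[0] / squared.sum()
age_share = squared[1] / squared.sum()

print(f"distance: {distance:.2f}")
print(f"income share: {income_share:.4f}, age share: {age_share:.4f}")
```

Even though the two people are 35 years apart in age, the $1,000 income gap accounts for more than 99% of the squared distance.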

Engagement Message

Which difference will influence distance more: $1,000 in income or 1 year in age?

Section 3 - Instruction

Scaling transforms all features to similar ranges so that no feature dominates just because of its units. There are two main approaches: min-max scaling and standardization.

Min-max scaling squeezes all values into a 0-1 range. Standardization centers values around zero and rescales them to a standard deviation of 1.

Engagement Message

Does either of these approaches sound familiar?

Section 4 - Instruction

Min-max scaling uses this formula: (value - minimum) ÷ (maximum - minimum)

Let's say we have car ages: 2, 5, 8 years. Minimum = 2, Maximum = 8.

For age 5: (5 - 2) ÷ (8 - 2) = 3 ÷ 6 = 0.5

Engagement Message

What would age 2 become after min-max scaling?

Section 5 - Instruction

Let's complete our car age example:

  • Age 2: (2 - 2) ÷ (8 - 2) = 0 ÷ 6 = 0
  • Age 5: (5 - 2) ÷ (8 - 2) = 3 ÷ 6 = 0.5
  • Age 8: (8 - 2) ÷ (8 - 2) = 6 ÷ 6 = 1

Notice how all values now fall between 0 and 1?

Engagement Message

Why does the oldest car scale to 1 and the youngest to 0?

Section 6 - Instruction

You can scale an entire array at once using NumPy:
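
Here's a minimal sketch, assuming the car ages from our example are stored in a NumPy array. The names `ages` and `scaled` are chosen just for illustration.

```python
import numpy as np

# Car ages from the worked example
ages = np.array([2, 5, 8])

# Min-max scaling: (value - minimum) / (maximum - minimum)
# NumPy applies the arithmetic to every element of the array at once
scaled = (ages - ages.min()) / (ages.max() - ages.min())

print(scaled)  # values: 0, 0.5, 1
```

One caveat for this sketch: if every value in the array is the same, the maximum equals the minimum and the division is by zero, so real code would need to handle that case.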

Engagement Message

Notice how all the values are now between 0 and 1, just like when we did it by hand?

Section 7 - Instruction

Standardization uses: (value - mean) ÷ standard deviation

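Same car ages: 2, 5, 8. Mean = 5. The squared differences from the mean are 9, 0, and 9, which average to 6, so the (population) standard deviation = √6 ≈ 2.45.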

For age 5: (5 - 5) ÷ 2.45 = 0 ÷ 2.45 = 0

Engagement Message

This centers the average value at zero. What would age 8 become?

Section 8 - Instruction

Let's complete our car age example using standardization:

  • Age 2: (2 - 5) ÷ 2.45 = (-3) ÷ 2.45 ≈ -1.22
  • Age 5: (5 - 5) ÷ 2.45 = 0 ÷ 2.45 = 0
  • Age 8: (8 - 5) ÷ 2.45 = 3 ÷ 2.45 ≈ 1.22

Now the average car age maps to 0, below-average ages become negative, and above-average ages become positive.

Engagement Message

Why do you think standardization can help when features have very different spreads or outliers?

Section 9 - Instruction

You can also standardize an entire array at once using NumPy:
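
Here's a minimal sketch, again assuming the car ages are stored in a NumPy array named `ages` (an illustrative name). Note that `np.std` and the array's `.std()` method compute the population standard deviation by default (`ddof=0`), which matches the 2.45 used in the hand calculation.

```python
import numpy as np

# Car ages from the worked example
ages = np.array([2, 5, 8])

# Standardization: (value - mean) / standard deviation
# .std() uses the population standard deviation by default (ddof=0)
standardized = (ages - ages.mean()) / ages.std()

print(standardized)         # approximately -1.22, 0, 1.22
print(standardized.mean())  # 0 (up to floating-point rounding)
print(standardized.std())   # 1 (up to floating-point rounding)
```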

Now the average value is 0 and the standard deviation is 1!

Engagement Message

Does this make sense?

Section 10 - Instruction

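Use min-max scaling when you want values bounded between 0 and 1. It works well when you know the expected range in advance, and it's a common choice for neural network inputs.

Use standardization when your data is roughly normally distributed or has no natural bounds. It's also less sensitive to a single extreme value than min-max scaling, although strong outliers still shift the mean and standard deviation. Many algorithms, such as linear models, support vector machines, and PCA, work well with standardized inputs.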

Engagement Message

Which would work better for test scores ranging 0-100?

Section 11 - Practice

Type

Multiple Choice

Practice Question

Let's practice min-max scaling! Scale these house prices to the 0-1 range: $100,000, $150,000, $200,000.

What's the scaled value for $150,000?

Formula: (value - min) ÷ (max - min)

  • A. 0.25
  • B. 0.5
  • C. 0.75
  • D. 1.0

Suggested Answers

  • A
  • B - Correct
  • C
  • D