Section 1 - Instruction

We've covered two crucial data cleaning steps: handling missing values and scaling numeric features. Now it's time to practice deciding which technique to use in different situations.

Engagement Message

Are you ready?

Section 2 - Practice

Type

Multiple Choice

Practice Question

Your dataset has 10,000 rows. The 'Age' value is missing in 200 of them (2% of the data). What is the simplest and most common strategy to handle this?

A. Impute the missing values with the mean age.
B. Remove the 200 rows with missing ages.
C. Delete the entire 'Age' column.
D. Replace missing ages with zero.

Suggested Answers

A
B - Correct
C
D
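
The row-removal strategy from this question can be sketched in plain Python. The tiny `rows` list, its field names, and the `drop_missing` helper are made up for illustration; with a library such as pandas, the same step is typically a single `dropna` call restricted to the 'Age' column.

```python
# Hypothetical mini-dataset; None marks a missing age.
rows = [
    {"name": "Ana", "age": 34},
    {"name": "Ben", "age": None},
    {"name": "Cy", "age": 51},
    {"name": "Dee", "age": 28},
]

def drop_missing(records, column):
    """Return only the records where `column` has a value."""
    return [r for r in records if r.get(column) is not None]

cleaned = drop_missing(rows, "age")

# Share of rows lost by dropping; worth checking before committing to removal.
missing_share = 1 - len(cleaned) / len(rows)

print(len(cleaned))    # number of rows kept
print(missing_share)   # fraction of rows removed
```

In the question's dataset the removed share is only 2%, which is why dropping rows is usually considered acceptable there. In this toy list it is much higher, which is the kind of situation where imputation may be the better choice.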

Section 3 - Practice

Type

Sort Into Boxes

Practice Question

When should you use min-max scaling versus standardization? Sort these use cases.

Labels

First Box Label: Use Min-Max Scaling
Second Box Label: Use Standardization

First Box Items

Input for neural networks
You need values between 0 and 1

Second Box Items

Data has significant outliers
Algorithm is sensitive to feature range
Feature follows a Gaussian distribution
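
As a rough reference for the two methods being sorted, here is a minimal standard-library sketch. The function names and the sample `ages` list are illustrative assumptions, and both helpers assume the input values are not all identical (otherwise the denominator would be zero).

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to mean 0 and scale to a population standard deviation of 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

ages = [20, 30, 40, 50, 60]  # made-up sample data

scaled = min_max_scale(ages)   # smallest value maps to 0, largest to 1
z_scores = standardize(ages)   # values centered on 0

print(scaled)
print(z_scores)
```

Min-max scaling depends entirely on the smallest and largest values, so a single extreme value compresses everything else toward one end of the range. Standardization relies on the mean and standard deviation instead, which is one reason it is often suggested when outliers are present.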

Section 4 - Practice

Type

Fill In The Blanks

Markdown With Blanks

Let's practice imputation. Fill in the blanks with the correct imputed values.

For a numeric 'Temperature' column with a mean of 25, a missing value would be imputed with [[blank:25]]. For a categorical 'Shirt Color' column where 'Blue' is the most frequent, a missing value would be imputed with [[blank:Blue]].

Suggested Answers

25
Blue
0
Red
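
The two imputation rules in this exercise (mean for numeric columns, most frequent value for categorical ones) can be sketched as follows. The sample lists and the `impute` helper are hypothetical; they are chosen so that the temperature mean is 25 and 'Blue' is the most frequent color, matching the exercise.

```python
from statistics import mean, mode

def impute(values, fill):
    """Replace every missing entry (None) with the given fill value."""
    return [fill if v is None else v for v in values]

temperatures = [20, 30, None, 25]        # observed values average to 25
colors = ["Blue", "Red", "Blue", None]   # 'Blue' is the most frequent

# Compute the fill values from the observed (non-missing) entries only.
temp_fill = mean([v for v in temperatures if v is not None])
color_fill = mode([v for v in colors if v is not None])

print(impute(temperatures, temp_fill))
print(impute(colors, color_fill))
```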

Section 5 - Practice

Type

Swipe Left or Right

Practice Question

Which scaling method is generally preferred for these scenarios? Swipe each one to the correct method.

Labels

Left Label: Min-Max Scaling
Right Label: Standardization

Left Label Items

When you need values between 0 and 1
For data that should stay within fixed boundaries
When working with percentages or ratings

Right Label Items

When your data has many outliers
For most machine learning models
When your data values are spread normally around an average

Section 6 - Practice

Type

Multiple Choice

Practice Question

You have a feature for 'Years of Experience' ranging from 0 to 40, and another for 'Annual Salary' ranging from $30,000 to $500,000. Why is scaling important here?

A. To make the data easier to read.
B. To prevent the 'Annual Salary' feature from dominating the model's calculations.
C. To convert the data into integers.
D. To remove any missing values.

Suggested Answers

A
B - Correct
C
D
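
To see why the larger-range feature can dominate, consider a distance-based calculation such as the one used by nearest-neighbor methods. The sketch below is illustrative only: the two people are invented, and the feature ranges (0 to 40 years, $30,000 to $500,000) are taken from the question.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max(value, lo, hi):
    """Rescale a single value into [0, 1] given the feature's range."""
    return (value - lo) / (hi - lo)

# (years of experience, annual salary); hypothetical people
junior = (2, 60_000)
senior = (35, 61_000)

# Unscaled: the $1,000 salary gap swamps the 33-year experience gap.
raw_dist = euclidean(junior, senior)

def scale(person):
    years, salary = person
    return (min_max(years, 0, 40), min_max(salary, 30_000, 500_000))

# Scaled: both features now contribute on a comparable 0-to-1 footing.
scaled_dist = euclidean(scale(junior), scale(senior))

print(raw_dist)
print(scaled_dist)
```

Before scaling, the distance is almost exactly the salary difference, so the large gap in experience barely registers. After scaling, the experience gap accounts for nearly all of the distance, which better reflects how different these two people are.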
