Section 1 - Instruction

Let's recap: systematic cleaning involves handling missing values, fixing inconsistencies, and dealing with outliers. The right strategy depends on the context of the data and your analysis goals.
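To make the three steps concrete, here is a minimal sketch using only the Python standard library. The records, the field names "country" and "amount", the alias table, and the 1.5 × IQR outlier rule are illustrative assumptions, not part of the lesson. Note that the code counts missing values and flags outliers rather than deleting them, because the right action depends on context.

```python
from statistics import quantiles

# Hypothetical sales records; field names and values are illustrative assumptions.
records = [
    {"country": "USA", "amount": 95.0},
    {"country": "US", "amount": 100.0},
    {"country": "United States", "amount": 105.0},
    {"country": "usa", "amount": None},
    {"country": "US", "amount": 110.0},
    {"country": "USA", "amount": 115.0},
    {"country": "US", "amount": 120.0},
    {"country": "USA", "amount": 5000.0},
]

# 1. Missing values: count them first so the gap can be investigated.
missing_amounts = sum(1 for r in records if r["amount"] is None)

# 2. Inconsistencies: map known spellings to one canonical label (assumed alias table).
COUNTRY_ALIASES = {"usa": "United States", "us": "United States",
                   "united states": "United States"}
for r in records:
    r["country"] = COUNTRY_ALIASES.get(r["country"].strip().lower(), r["country"])

# 3. Outliers: flag (don't delete) values outside 1.5 * IQR beyond the quartiles.
amounts = [r["amount"] for r in records if r["amount"] is not None]
q1, _, q3 = quantiles(amounts, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
flagged = [a for a in amounts if a < low or a > high]

print(missing_amounts)                   # 1
print({r["country"] for r in records})   # {'United States'}
print(flagged)                           # [5000.0]
```

Whether the flagged 5000.0 is an error or a genuine large order is a judgment call that the data alone cannot make.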

Engagement Message

Let's practice choosing the best approach for different situations. What's the first thing you'd check before deleting rows with missing data?

Section 2 - Practice

Type

Multiple Choice

Practice Question

In a sales dataset of 10,000 records, 50 records are missing the Order_Date. What is the most reasonable first step?

A. Delete the 50 rows immediately
B. Delete the entire Order_Date column
C. Leave the missing dates as-is
D. Investigate why those dates are missing

Suggested Answers

  • A
  • B
  • C
  • D - Correct
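Investigating usually starts by checking whether the missing rows share something in common. A minimal sketch, assuming each order also carries a hypothetical "Source" field (not in the original question), counts how often each source appears among the rows with no Order_Date. If the gaps cluster in one source, the cause is likely systematic, such as an import or form that drops the date, and deleting those rows would bias the analysis.

```python
from collections import Counter

# Hypothetical orders; "Source" is an assumed field used to look for a pattern.
orders = [
    {"Order_ID": 1, "Order_Date": "2024-01-05", "Source": "web"},
    {"Order_ID": 2, "Order_Date": None, "Source": "phone"},
    {"Order_ID": 3, "Order_Date": "2024-01-06", "Source": "web"},
    {"Order_ID": 4, "Order_Date": None, "Source": "phone"},
    {"Order_ID": 5, "Order_Date": "2024-01-07", "Source": "phone"},
    {"Order_ID": 6, "Order_Date": None, "Source": "phone"},
]

# Rows with no Order_Date, and what share of the dataset they represent.
missing = [o for o in orders if o["Order_Date"] is None]
share = len(missing) / len(orders)

# Count which source the missing rows came from; a strong skew suggests a systematic cause.
by_source = Counter(o["Source"] for o in missing)

print(f"{len(missing)} of {len(orders)} rows missing Order_Date ({share:.1%})")
print(by_source.most_common())
```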

Section 3 - Practice

Type

Swipe Left or Right

Practice Question

Is the described action a good or bad data cleaning practice? Swipe left for "Good Practice" and right for "Bad Practice".

Labels

  • Left Label: Good Practice
  • Right Label: Bad Practice

Left Label Items

  • Converting "USA" and "US" to a standard "United States"
  • Removing a customer with a birth year of 1850 after verifying it is a data-entry error
  • Documenting that you removed 15 duplicate entries

Right Label Items

  • Deleting a column because you don't understand it
  • Filling all missing salaries with the average salary
  • Changing "N/A" to 0 without checking what "N/A" means
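Several of the good practices above can be combined in one small pass. The sketch below is a minimal, standard-library illustration with hypothetical rows and field names. It assumes that, after checking, "N/A" in the salary column turned out to mean "not recorded", so it becomes None (unknown) rather than 0, which would drag down any average. It also writes each step to a log, so the number of removed duplicates is documented.

```python
# Hypothetical raw rows; field names and values are illustrative assumptions.
raw = [
    {"id": 1, "country": "USA", "salary": "52000"},
    {"id": 2, "country": "US", "salary": "N/A"},
    {"id": 1, "country": "USA", "salary": "52000"},  # exact duplicate of the first row
    {"id": 3, "country": "United States", "salary": "61000"},
]

COUNTRY_ALIASES = {"usa": "United States", "us": "United States"}
cleaning_log = []

# Remove exact duplicates (all fields equal) and record how many were dropped.
seen, deduped = set(), []
for row in raw:
    key = tuple(sorted(row.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(row)
cleaning_log.append(f"Removed {len(raw) - len(deduped)} duplicate row(s)")

cleaned = []
for row in deduped:
    country = COUNTRY_ALIASES.get(row["country"].strip().lower(), row["country"])
    # Assumption: "N/A" was confirmed to mean "not recorded", so store None, not 0.
    salary = None if row["salary"].strip().upper() == "N/A" else float(row["salary"])
    cleaned.append({"id": row["id"], "country": country, "salary": salary})
cleaning_log.append("Standardized country aliases to 'United States'")

for entry in cleaning_log:
    print(entry)
```

Keeping None instead of 0 means later steps can decide how to treat the unknown salaries instead of inheriting a silent error.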

Section 4 - Practice