Let's recap: systematic cleaning involves handling missing values, fixing inconsistencies, and dealing with outliers. The right strategy depends on the context of the data and your analysis goals.
Engagement Message
Let's practice choosing the best approach for different situations. What's the first thing you'd check before deleting rows with missing data?
Type
Multiple Choice
Practice Question
In a sales dataset of 10,000 records, 50 records are missing the Order_Date. What is the most reasonable first step?
A. Delete the 50 rows immediately
B. Delete the entire Order_Date column
C. Leave the missing dates as-is
D. Investigate why those dates are missing
Suggested Answers
- A
- B
- C
- D - Correct
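One way to start investigating is to check whether the missing dates cluster around some other attribute of the records. The sketch below is only an illustration: the sample records, the "Source" field, and the helper names are hypothetical and do not come from the dataset in the question.

```python
from collections import Counter

# Hypothetical sample records; the "Source" field and its values are assumptions.
records = [
    {"Order_ID": 1, "Order_Date": "2024-01-05", "Source": "web"},
    {"Order_ID": 2, "Order_Date": None, "Source": "legacy_import"},
    {"Order_ID": 3, "Order_Date": "2024-01-07", "Source": "web"},
    {"Order_ID": 4, "Order_Date": "", "Source": "legacy_import"},
    {"Order_ID": 5, "Order_Date": "2024-01-09", "Source": "store"},
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def missing_by_group(rows, target, group_key):
    """Count rows whose `target` value is missing, grouped by `group_key`."""
    return Counter(row[group_key] for row in rows if is_missing(row[target]))


missing_counts = missing_by_group(records, "Order_Date", "Source")
missing_share = sum(missing_counts.values()) / len(records)

print(missing_counts)
print(f"{missing_share:.1%} of records are missing Order_Date")
```

If the missing values turn out to be concentrated in one group, that points to a cause worth fixing at the source before deciding whether to delete or fill the affected rows.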
Type
Swipe Left or Right
Practice Question
Is the described action a good or bad data cleaning practice? Swipe left for "Good Practice" and right for "Bad Practice".
Labels
- Left Label: Good Practice
- Right Label: Bad Practice
Left Label Items
- Converting "USA" and "US" to a standard "United States"
- Removing a customer record with a birth year of 1850 after verifying it is an error
- Documenting that you removed 15 duplicate entries
Right Label Items
- Deleting a column because you don't understand it
- Filling all missing salaries with the average salary
- Changing "N/A" to 0 without checking what "N/A" means
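Two of the practices above can be sketched in a few lines: standardizing country labels, and keeping "N/A" as a missing value instead of turning it into 0. This is a minimal illustration under assumptions: the alias mapping, field names, and sample rows are hypothetical, and whether "N/A" really means "missing" still has to be confirmed for your own data.

```python
# Hypothetical alias mapping; extend it only with variants you have verified.
COUNTRY_ALIASES = {
    "usa": "United States",
    "us": "United States",
    "united states": "United States",
}


def standardize_country(value):
    """Map known variants to one standard label; leave unknown values as they are."""
    cleaned = value.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


def parse_salary(value):
    """Return the salary as a float, or None when the value is a missing-data marker.

    "N/A" is kept as missing (None) rather than converted to 0, because 0 would
    be read as a real salary in later calculations.
    """
    if value is None or value.strip().upper() in {"", "N/A"}:
        return None
    return float(value)


# Hypothetical sample rows.
rows = [
    {"Country": "USA", "Salary": "52000"},
    {"Country": " us ", "Salary": "N/A"},
    {"Country": "Canada", "Salary": "61000"},
]

cleaned = [
    {"Country": standardize_country(r["Country"]), "Salary": parse_salary(r["Salary"])}
    for r in rows
]
print(cleaned)
```

Keeping missing salaries as None makes the gap visible, so the decision about how to fill it (if at all) can be made deliberately and documented.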
