Section 1 - Instruction

Welcome to working with real-world data! Here's the truth: most business data you'll encounter is messy, incomplete, and inconsistent.

Unlike textbook examples, real data comes with problems that need fixing before analysis.

Engagement Message

What’s one clue that told you a dataset was "off" or incomplete?

Section 2 - Instruction

Let's start with missing values - gaps where data should exist but doesn't. Imagine a customer database where some people didn't provide their age or income.

These blank cells create holes in your analysis and can lead to wrong conclusions.
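
To make this concrete, here is a minimal sketch in plain Python. The customer records and the `count_missing` helper are made up for illustration, and `None` is assumed to mark a value the customer did not provide.

```python
# Hypothetical customer records; None marks a value that was not provided.
customers = [
    {"name": "Ana", "age": 34, "income": 52000},
    {"name": "Ben", "age": None, "income": 61000},
    {"name": "Cara", "age": 29, "income": None},
    {"name": "Dev", "age": None, "income": None},
]


def count_missing(records, field):
    """Return how many records have no value for the given field."""
    return sum(1 for record in records if record.get(field) is None)


missing_age = count_missing(customers, "age")
missing_income = count_missing(customers, "income")

# Averaging only the known ages silently ignores the customers with gaps.
known_ages = [c["age"] for c in customers if c["age"] is not None]
average_known_age = sum(known_ages) / len(known_ages)

print(f"Missing age: {missing_age} of {len(customers)}")
print(f"Missing income: {missing_income} of {len(customers)}")
print(f"Average age (known values only): {average_known_age}")
```

Notice that the average is based on only two of the four customers, so it may not describe the whole group.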

Engagement Message

Can you think of why someone might skip providing their income?

Section 3 - Instruction

Next up: duplicate records. These happen when the same customer, transaction, or product appears multiple times in your dataset.

For example, if John Smith appears twice with slightly different spellings, your analysis might count him as two different customers.
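
One way to spot near-duplicates like this is to compare names after normalizing them. The sketch below uses Python's standard `difflib` module. The sample names, the helper functions, and the 0.85 similarity threshold are all assumptions chosen for illustration, not a recommended setting.

```python
from difflib import SequenceMatcher

# Hypothetical customer names, including near-duplicates of "John Smith".
names = ["John Smith", "Jon Smith", "Maria Lopez", "john smith "]


def normalize(name):
    """Lowercase the name and collapse extra whitespace."""
    return " ".join(name.lower().split())


def is_similar(a, b, threshold=0.85):
    """Treat two names as a likely match when their similarity ratio is high."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def group_likely_duplicates(all_names, threshold=0.85):
    """Put each name into the first group whose first member it resembles."""
    groups = []
    for name in all_names:
        for group in groups:
            if is_similar(name, group[0], threshold):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


naive_count = len(set(names))           # exact-match counting
groups = group_likely_duplicates(names)  # similarity-based counting

print(f"Customers by exact name: {naive_count}")
print(f"Customers after grouping similar names: {len(groups)}")
```

Similarity matching can also merge two genuinely different people who happen to have similar names, so flagged groups are best treated as candidates to review rather than confirmed duplicates.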

Engagement Message

How might duplicates affect your customer count analysis?

Section 4 - Instruction

Finally, inconsistencies: cases where the same information is recorded differently across your data. Think "USA", "United States", and "US" all meaning the same country.

Or dates written as "Jan 15" in one place and "1/15" in another.
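
Here is a small sketch of how inconsistent country labels can split one total into several. The sales figures, the `COUNTRY_ALIASES` table, and both helper functions are hypothetical. A real alias table would need to cover whatever spellings actually appear in your data.

```python
from collections import defaultdict

# Hypothetical sales rows: (country as recorded, sale amount).
sales = [
    ("USA", 120.0),
    ("United States", 80.0),
    ("US", 50.0),
    ("Canada", 70.0),
]

# Assumed alias table mapping known variants to one canonical label.
COUNTRY_ALIASES = {
    "usa": "United States",
    "us": "United States",
    "united states": "United States",
}


def canonical_country(raw):
    """Return the canonical label for a country, or the trimmed input if unknown."""
    key = raw.strip().lower()
    return COUNTRY_ALIASES.get(key, raw.strip())


def total_by_country(rows, normalize=False):
    """Sum sale amounts per country label, optionally normalizing labels first."""
    totals = defaultdict(float)
    for country, amount in rows:
        label = canonical_country(country) if normalize else country
        totals[label] += amount
    return dict(totals)


raw_totals = total_by_country(sales)
clean_totals = total_by_country(sales, normalize=True)

print("Before normalizing:", raw_totals)
print("After normalizing:", clean_totals)
```

Before normalizing, the same country shows up under three separate labels, each with only part of its sales.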

Engagement Message

What problems might this create when analyzing sales by country?

Section 5 - Instruction

Why does this matter for business decisions? Messy data leads to wrong insights, which lead to poor decisions.

If 30% of your customer ages are missing, can you trust an analysis of how preferences vary by age?
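
A quick worked example shows how much uncertainty a 30% gap can hide. Suppose, purely as an assumption for illustration, that 40% of the customers with a known age are under 30. The sketch below computes the lowest and highest possible share of under-30 customers overall, depending on who the missing 30% turn out to be. The `share_bounds` helper is hypothetical.

```python
def share_bounds(share_among_known, missing_fraction):
    """Return the (lowest, highest) possible overall share of a group.

    share_among_known: share of the group among records with a known value.
    missing_fraction: fraction of all records where the value is missing.
    """
    known_fraction = 1 - missing_fraction
    # Lowest case: none of the customers with missing ages belong to the group.
    lowest = share_among_known * known_fraction
    # Highest case: all of the customers with missing ages belong to the group.
    highest = lowest + missing_fraction
    return lowest, highest


lowest, highest = share_bounds(share_among_known=0.40, missing_fraction=0.30)
print(f"Share of customers under 30: between {lowest:.0%} and {highest:.0%}")
```

With these assumed numbers the true share could be anywhere from about 28% to about 58%, which is a wide enough range to change a business decision.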

Engagement Message

What business decision might go wrong with incomplete customer data?

Section 6 - Instruction

The good news: recognizing these issues is the first step to fixing them. In upcoming units, we'll learn systematic approaches to cleaning data.
