Section 1 - Instruction

Besides missing values, another common data issue is duplicate rows. Imagine a customer accidentally submitting the same order twice: if you don't remove the duplicate, your sales totals will be overstated. This is why finding and removing duplicates is critical.

Engagement Message

How might duplicate customer orders affect your business decisions?

Section 2 - Instruction

Pandas gives us a simple method to find these identical rows: .duplicated(). This method scans your DataFrame row by row and checks whether each row is an exact copy (by default, across all columns) of a row that appeared earlier in the dataset.

Engagement Message

What do you think the output of this method looks like?

Section 3 - Instruction

Just like .isnull(), the .duplicated() method returns a boolean Series, with one True or False value per row. By default, it marks the first occurrence of a row as False and every later identical row as True.
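To make this concrete, here is a minimal sketch using a small, hypothetical orders table (the column names and values are made up for illustration) in which one order was submitted twice. It calls .duplicated() with its default settings and prints the resulting boolean Series.

```python
import pandas as pd

# Hypothetical orders table: Ana's order 101 was accidentally submitted twice
orders = pd.DataFrame({
    "customer": ["Ana", "Ben", "Ana", "Cara"],
    "order_id": [101, 102, 101, 103],
    "amount": [25.0, 40.0, 25.0, 15.0],
})

# One boolean per row: True means the row repeats an earlier row exactly
is_dup = orders.duplicated()
print(is_dup)
# 0    False
# 1    False
# 2     True
# 3    False
# dtype: bool

# Summing the booleans counts how many duplicate rows were found
print(is_dup.sum())  # 1
```

Only row 2 is flagged: row 0 is the first occurrence of that order, so it is kept as False.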

Engagement Message

Why do you think it's designed to keep the first instance and flag the others?

Section 4 - Instruction

Once you've identified the duplicates, you need to remove them. For this, Pandas provides another convenient method: .drop_duplicates(). It removes all the rows that .duplicated() would have marked as True.
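Continuing with the same hypothetical orders table, this sketch shows .drop_duplicates() with its default settings. It also checks that the result matches keeping only the rows where .duplicated() is False, and shows how the sales total changes once the repeated order is gone.

```python
import pandas as pd

# Hypothetical orders table: order 101 appears twice
orders = pd.DataFrame({
    "customer": ["Ana", "Ben", "Ana", "Cara"],
    "order_id": [101, 102, 101, 103],
    "amount": [25.0, 40.0, 25.0, 15.0],
})

# Returns a new DataFrame; the original `orders` is left unchanged
cleaned = orders.drop_duplicates()
print(cleaned)

# Equivalent to keeping only the rows that .duplicated() marks as False
same_rows = orders[~orders.duplicated()]
print(cleaned.equals(same_rows))  # True

# The duplicate inflated the total: 105.0 before vs 80.0 after
print(orders["amount"].sum(), cleaned["amount"].sum())
```

Note that the surviving rows keep their original index labels (0, 1, 3); calling .reset_index(drop=True) afterwards renumbers them if needed.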
