Section 1 - Instruction

Text data can be messy in its own unique way. Imagine a city column with entries like 'New York', ' new york ', and 'new york'. To a computer, these are three different cities, so any count or grouping by city would split one city's rows three ways.
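
Here is a minimal sketch of the problem. The DataFrame name df and its contents are made up for illustration:

```python
import pandas as pd

# Hypothetical data: the same city typed three different ways
df = pd.DataFrame({"city": ["New York", " new york ", "new york"]})

# Each spelling is counted as its own distinct value
n_distinct = df["city"].nunique()
print(n_distinct)  # 3
```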

Engagement Message

Why is it so important for identical items to be written in the exact same way?

Section 2 - Instruction

To clean text, we use string methods. In Pandas, you access them through the .str accessor on a Series. For example, to work on a city column, you would start with df['city'].str and then add a method name. The accessor tells Pandas to apply that string method to every entry in the column.
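
As a small sketch, assuming the same made-up city column, here is one string method called through .str. The method used, .str.len(), simply returns the length of each entry:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"city": ["New York", " new york ", "new york"]})

# .str exposes string methods that run on every entry of the Series
lengths = df["city"].str.len()
print(lengths.tolist())  # [8, 10, 8]
```

The middle entry is two characters longer because of its leading and trailing space.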

Engagement Message

Why do you think Pandas requires this extra .str step?

Section 3 - Instruction

A common issue is extra whitespace. The .str.strip() method removes whitespace (spaces, tabs, and newlines) from the beginning and end of each string, leaving any spaces in the middle untouched. So, ' chicago ' becomes 'chicago'. This is a crucial first step for cleaning up user-entered text.
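
A short sketch of .str.strip() on made-up data. Called with no arguments, it strips all leading and trailing whitespace, including tabs and newlines:

```python
import pandas as pd

# Hypothetical data: padding made of spaces, a tab, and a newline
df = pd.DataFrame({"city": [" chicago ", "chicago", "\tchicago\n"]})

# strip() returns a new Series, so assign it back to keep the result
df["city"] = df["city"].str.strip()
print(df["city"].tolist())  # ['chicago', 'chicago', 'chicago']
```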

Engagement Message

What kind of common data entry errors does .str.strip() help fix?

Section 4 - Instruction

Next, we tackle inconsistent capitalization. The .str.lower() method converts every character in a string to lowercase. This ensures that 'Boston', 'boston', and 'BOSTON' are all treated as the same value: 'boston'.
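
A minimal sketch of .str.lower(), again on made-up data:

```python
import pandas as pd

# Hypothetical data: one city in three capitalizations
df = pd.DataFrame({"city": ["Boston", "boston", "BOSTON"]})

# lower() also returns a new Series, so assign it back
df["city"] = df["city"].str.lower()
print(df["city"].tolist())   # ['boston', 'boston', 'boston']
print(df["city"].nunique())  # 1
```

The two steps can also be chained in one line, for example df["city"].str.strip().str.lower().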
