In our last unit, we used .describe() for numerical columns. But what about text-based columns, like 'Department' or 'City'? Calculating an "average" city doesn't make sense.
Engagement Message
Why wouldn't calculating an average make sense for categorical data like city names?
Instead, we count how many times each unique value appears. This works well for categorical data: information that falls into distinct groups. For example, the 'Color' column in a car dataset would be categorical.
Engagement Message
What other categorical columns might you find in a car dataset?
Pandas gives us a simple method for this: .value_counts(). You apply it to a single column (a Series) to see the frequency of each unique entry. It's a powerful way to understand the distribution of categorical data.
Engagement Message
What do you expect .value_counts() to show for a 'Status' column with values like 'Active' and 'Inactive'?
The syntax is df['ColumnName'].value_counts(). For an employee dataset, df['Department'].value_counts() might show:
Engineering 50
Sales 35
Marketing 15
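To try this yourself, here is a minimal sketch. The DataFrame is made up for illustration (it is not a real dataset) and is built so that the counts match the example above; the variable name dept_counts is also just an assumption.

```python
import pandas as pd

# Hypothetical employee data, constructed to match the example counts above
df = pd.DataFrame({
    "Department": ["Engineering"] * 50 + ["Sales"] * 35 + ["Marketing"] * 15
})

# Count how often each unique department appears.
# By default, the most frequent value is listed first.
dept_counts = df["Department"].value_counts()
print(dept_counts)
```

The result is itself a Series indexed by the unique values, so you can look up a single count with something like dept_counts["Sales"].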
