In the last unit, we learned to drop missing data. But what if a row with a missing value still contains lots of useful information? Deleting it might mean losing valuable insights. The alternative is to fill in the gaps instead.
Engagement Message
When might filling a blank be better than throwing away the whole row?
Pandas provides the .fillna() method for this exact purpose. It works like a "find and replace" for missing data, scanning for NaN values and replacing them with a specific value that you provide. This lets you keep the row.
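To make the contrast with dropping concrete, here is a minimal sketch. The DataFrame, its column names, and its values are made up for illustration; only .dropna() and .fillna() come from pandas itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data: the second row is missing its items_sold value.
df = pd.DataFrame({
    "store_id": [1, 2, 3],
    "items_sold": [12.0, np.nan, 7.0],
})

# Dropping removes the entire row that contains the NaN.
dropped = df.dropna()

# Filling returns a new column with the NaN replaced; every row is kept.
filled = df["items_sold"].fillna(0)

print(len(dropped))  # number of rows left after dropping
print(len(filled))   # number of values after filling
```

Note that .fillna() returns a new object here, so df itself still contains the NaN until the result is assigned back.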
Engagement Message
What would make a replacement value 'safe' versus potentially misleading?
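A common strategy is filling with a constant value. For a numerical column like items_sold, you could replace NaNs with 0. The code would look like this: df['items_sold'] = df['items_sold'].fillna(0). Because .fillna() returns a new column rather than changing the original, the result is assigned back to df['items_sold']. This is a safe bet when zero is a logical default.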
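The sketch below shows the constant fill on a small, made-up items_sold column (the values are assumptions for illustration). It also compares the column mean before and after, since a filled zero counts toward the average while a NaN is skipped.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two of the four sales counts are missing.
df = pd.DataFrame({"items_sold": [5.0, np.nan, 3.0, np.nan]})

# By default, mean() skips NaN, so only 5.0 and 3.0 are averaged.
mean_before = df["items_sold"].mean()

# Replace NaN with 0 and assign the result back to the column.
df["items_sold"] = df["items_sold"].fillna(0)

# Now all four values, including the two zeros, are averaged.
mean_after = df["items_sold"].mean()

print(mean_before, mean_after)
```

If the average drops after filling, that is a hint to check whether zero really means "nothing sold" or just "not recorded".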
Engagement Message
For a 'discount_applied' column, why might filling with 0 be a good choice?
This method works well for text columns, too. If a city column has missing values, you could fill them with a placeholder string like 'Unknown'. This keeps your data tidy and ensures every record has a value: df['city'] = df['city'].fillna('Unknown'). As with numbers, the result is assigned back so the change is kept.
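As a rough illustration of the text case, the sketch below fills a made-up city column (the city names are assumptions) and then counts the values, so the placeholder shows up as its own category.

```python
import numpy as np
import pandas as pd

# Hypothetical data: missing cities appear as NaN or None.
df = pd.DataFrame({"city": ["Austin", np.nan, "Denver", None]})

# Replace every missing city with a placeholder string and keep the result.
df["city"] = df["city"].fillna("Unknown")

# The placeholder is now counted like any other city name.
counts = df["city"].value_counts()

print(df["city"].tolist())
print(counts)
```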
