Besides missing values and duplicates, another common issue is incorrect data types. Imagine a column for age
where the numbers are stored as text. You can't calculate an average age if the computer thinks it's looking at words!
Engagement Message
What would happen if you tried to calculate the average of ages stored as text?
You can quickly check the data type of every column in your DataFrame using the .dtypes attribute. This will show you types like object (usually for text), int64 (for whole numbers), and float64 (for numbers with decimals).
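Here is a minimal sketch of checking .dtypes on a small DataFrame. The column names and values are made up for illustration, and the age column is deliberately stored as text. On many pandas versions the text columns show up as object, but some newer releases may display a dedicated string type instead.

```python
import pandas as pd

# Hypothetical example data: "age" is deliberately stored as text.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo"],
    "age": ["25", "30", "41"],       # numbers stored as strings
    "height_m": [1.62, 1.80, 1.75],  # decimals -> float64
    "visits": [3, 5, 2],             # whole numbers -> int64
})

# Prints one line per column showing its data type.
print(df.dtypes)
```

The age column will not be listed as a numeric type, even though every value looks like a number. That mismatch is exactly what this check is meant to catch.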
Engagement Message
Why would checking .dtypes be a good habit before performing any analysis?
A frequent problem is finding numbers stored as object type. For example, a quantity column might contain '5' instead of 5. If you try to perform math on this column, you'll get errors or wrong answers: adding '5' and '3' as text simply joins them into '53' instead of producing 8.
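The sketch below shows the problem and one common fix. It uses a hypothetical quantity column. Summing the text values glues the strings together, while converting with pd.to_numeric first gives a real total. An explicit dtype=object is used so the example matches the object case described above.

```python
import pandas as pd

# Hypothetical "quantity" column that was read in as text (object dtype).
quantity = pd.Series(["5", "3", "12"], dtype=object)

# Summing text concatenates the strings instead of adding the numbers.
text_total = quantity.sum()
print(text_total)  # prints 5312

# Convert to real numbers first. (Passing errors="coerce" would turn
# unparseable entries into NaN instead of raising an error.)
numeric_quantity = pd.to_numeric(quantity)
print(numeric_quantity.dtype)  # prints int64
print(numeric_quantity.sum())  # prints 20
```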
Engagement Message
What do you think would happen if you tried to find the highest value in a column of text-based numbers?
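One way to see the answer is the small sketch below, which uses hypothetical scores stored as text. Strings are compared character by character, so "9" counts as larger than "10" because "9" comes after "1". Converting to numbers first restores the expected result.

```python
import pandas as pd

# Hypothetical scores stored as text.
scores_text = pd.Series(["5", "10", "9"], dtype=object)

# Text comparison: "9" > "10" because the first characters are compared ("9" > "1").
highest_text = scores_text.max()

# After conversion the comparison is numeric, so 10 is the maximum.
highest_number = pd.to_numeric(scores_text).max()

print(highest_text, highest_number)  # prints: 9 10
```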
