Machine learning models are like calculators - they only understand numbers, not words or categories like "Red", "Blue", or "Toyota".
But many of our features are categorical! Car brands, colors, cities - all text that models can't process directly.
Engagement Message
How do you think we convert "Toyota" into numbers a model can use?
Converting categories into numbers is called "encoding." It's like translating text into a language that machines understand.
There are two main approaches: one-hot encoding and ordinal encoding. Each has different strengths for different situations.
Engagement Message
Which do you think would work better: converting "Red" to 1, "Blue" to 2, or something else?
One-hot encoding turns each category into its own column with 1s and 0s. If something IS that category, it gets a 1. If it's NOT, it gets a 0.
Think of it like checkboxes - you check the box that applies and leave others empty.
Engagement Message
How many columns would you need for car colors: Red, Blue, Green?
Here's one-hot encoding in action with car colors:
Original: Red, Blue, Red, Green
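Becomes:

| Original | Red | Blue | Green |
|----------|-----|------|-------|
| Red      | 1   | 0    | 0     |
| Blue     | 0   | 1    | 0     |
| Red      | 1   | 0    | 0     |
| Green    | 0   | 0    | 1     |

Notice how each row has exactly one 1 and the rest are 0s.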
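The car-color example can be reproduced in a few lines of plain Python. This is only a minimal sketch to show the mechanics: the helper name `one_hot_encode` and the column order Red, Blue, Green are assumptions made for illustration, not part of the lesson.

```python
def one_hot_encode(values, categories):
    """Return one row per value, with a 1 in the column of its category."""
    rows = []
    for value in values:
        if value not in categories:
            # A category without a column cannot be represented.
            raise ValueError(f"Unknown category: {value!r}")
        rows.append([1 if value == category else 0 for category in categories])
    return rows


colors = ["Red", "Blue", "Red", "Green"]
columns = ["Red", "Blue", "Green"]  # assumed column order

encoded = one_hot_encode(colors, columns)

print(columns)
for color, row in zip(colors, encoded):
    print(color, row)
```

In practice, libraries such as pandas (`get_dummies`) or scikit-learn (`OneHotEncoder`) are commonly used for this step; the hand-written version here just makes the "checkbox" idea visible.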
Engagement Message
How would a new color like Yellow look in this table?
Ordinal encoding assigns numbers based on meaningful order or ranking. Size categories like "Small, Medium, Large" become 1, 2, 3.
This preserves the natural ordering - the model understands that Large (3) is "more" than Small (1).
Engagement Message
Can you think of another category that has natural ordering?
Here's ordinal encoding with education levels:
Original: High School, Bachelor's, Master's, PhD
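Becomes:

| Education level | Encoded value |
|-----------------|---------------|
| High School     | 1             |
| Bachelor's      | 2             |
| Master's        | 3             |
| PhD             | 4             |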
The numbers reflect the natural progression from less to more education.
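Ordinal encoding can be sketched in plain Python as a lookup from each category to its rank. The helper name `ordinal_encode` is hypothetical, and starting the numbering at 1 is an assumption that follows the "Small, Medium, Large become 1, 2, 3" convention used earlier. The key point is that you supply the order explicitly, because the code cannot guess it from the text.

```python
# The ranking is supplied by us, from lowest to highest.
EDUCATION_ORDER = ["High School", "Bachelor's", "Master's", "PhD"]


def ordinal_encode(values, ordered_categories, start=1):
    """Map each value to its position in ordered_categories (start is assumed to be 1)."""
    codes = {
        category: rank
        for rank, category in enumerate(ordered_categories, start=start)
    }
    return [codes[value] for value in values]


levels = ["High School", "Bachelor's", "Master's", "PhD"]
encoded_levels = ordinal_encode(levels, EDUCATION_ORDER)

for level, code in zip(levels, encoded_levels):
    print(level, code)
```

Library encoders may number categories differently (scikit-learn's `OrdinalEncoder`, for instance, starts at 0), but only the relative order matters to the model.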
Engagement Message
What number would you assign to "Associate's Degree"?
Choose one-hot encoding when categories have no natural order (like car brands or colors). Use ordinal encoding when there's a clear ranking (like education levels or satisfaction ratings).
Using ordinal encoding for unordered categories can mislead models: they might treat Toyota (1) as "less than" Ford (2), even though car brands have no ranking!
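A small sketch can make this concrete by measuring how "far apart" two brands look under each encoding. The numbers assigned to the brands are arbitrary, and Honda is a hypothetical third brand added only for illustration.

```python
import math

# Arbitrary ordinal codes for unordered brands (Honda is a made-up addition).
ordinal = {"Toyota": 1, "Ford": 2, "Honda": 3}

# One-hot rows for the same brands, columns in the order Toyota, Ford, Honda.
one_hot = {
    "Toyota": [1, 0, 0],
    "Ford": [0, 1, 0],
    "Honda": [0, 0, 1],
}


def ordinal_distance(a, b):
    """Absolute difference between two ordinal codes."""
    return abs(ordinal[a] - ordinal[b])


def one_hot_distance(a, b):
    """Euclidean distance between two one-hot rows."""
    return math.dist(one_hot[a], one_hot[b])


for pair in [("Toyota", "Ford"), ("Toyota", "Honda")]:
    print(pair, ordinal_distance(*pair), round(one_hot_distance(*pair), 3))
```

With ordinal codes, Toyota appears twice as far from Honda as from Ford, which is purely a side effect of the numbering. With one-hot encoding, every pair of brands is the same distance apart, so no false ordering is introduced.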
Engagement Message
Would you use one-hot or ordinal encoding for movie genres?
Type
Fill In The Blanks
Markdown With Blanks
Let's practice encoding car transmission types. Convert this categorical data using one-hot encoding.
The original data, in order, is: Manual, Automatic, Manual, CVT.
Suggested Answers
- Manual
- Automatic
- CVT
