Section 1 - Instruction

Most machine learning models are like calculators - they work with numbers, not words or categories like "Red", "Blue", or "Toyota".

But many of our features are categorical! Car brands, colors, cities - all text that these models can't process directly.

Engagement Message

How do you think we convert "Toyota" into numbers a model can use?

Section 2 - Instruction

Converting categories into numbers is called "encoding." It's like translating text into a language that machines understand.

Two common approaches are one-hot encoding and ordinal encoding. Each suits a different kind of category.

Engagement Message

Which do you think would work better: converting "Red" to 1, "Blue" to 2, or something else?

Section 3 - Instruction

One-hot encoding turns each category into its own column with 1s and 0s. If something IS that category, it gets a 1. If it's NOT, it gets a 0.

Think of it like checkboxes - you check the box that applies and leave others empty.

Engagement Message

How many columns would you need for car colors: Red, Blue, Green?

Section 4 - Instruction

Here's one-hot encoding in action with car colors:

Original: Red, Blue, Red, Green

Becomes:

| Red | Blue | Green |
|-----|------|-------|
| 1   | 0    | 0     |
| 0   | 1    | 0     |
| 1   | 0    | 0     |
| 0   | 0    | 1     |

Notice how each row has exactly one 1 and every other value is 0?
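The same transformation can be sketched in a few lines of plain Python. This is only an illustration: `one_hot_encode` is a hypothetical helper name, not a library function, and it assumes columns should appear in the order each color is first seen.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row has a 1 in its own category's column."""
    # dict.fromkeys keeps the first occurrence of each value, in order,
    # so the columns come out as Red, Blue, Green for the data below.
    categories = list(dict.fromkeys(values))
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


colors = ["Red", "Blue", "Red", "Green"]
columns, encoded = one_hot_encode(colors)

print(columns)   # ['Red', 'Blue', 'Green']
for row in encoded:
    print(row)   # [1, 0, 0] then [0, 1, 0] then [1, 0, 0] then [0, 0, 1]
```

In practice you would usually reach for a library routine rather than write this by hand, but the hand-written version makes the "one checkbox per category" idea visible.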

Engagement Message

How would a new color like Yellow look in this table?

Section 5 - Instruction

Ordinal encoding assigns numbers based on a meaningful order or ranking. Size categories like "Small, Medium, Large" become 1, 2, 3.

This preserves the natural ordering - the model understands that Large (3) is "more" than Small (1).

Engagement Message

Can you think of another category that has natural ordering?

Section 6 - Instruction

Here's ordinal encoding with education levels:

Original: High School, Bachelor's, Master's, PhD

Becomes:

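| Original    | Education Level |
|-------------|-----------------|
| High School | 1               |
| Bachelor's  | 2               |
| Master's    | 3               |
| PhD         | 4               |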

The numbers reflect the natural progression from less to more education.
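As a rough Python sketch of this mapping: the key point is that the order is something we supply, not something the code discovers. `EDUCATION_ORDER` and `ordinal_encode` are illustrative names, and the shuffled `survey` list is made-up sample data.

```python
# The ranking is stated explicitly, from least to most education.
EDUCATION_ORDER = ["High School", "Bachelor's", "Master's", "PhD"]


def ordinal_encode(values, ordered_categories):
    """Replace each value with its 1-based position in ordered_categories."""
    rank = {
        category: position
        for position, category in enumerate(ordered_categories, start=1)
    }
    return [rank[value] for value in values]


original = ["High School", "Bachelor's", "Master's", "PhD"]
print(ordinal_encode(original, EDUCATION_ORDER))  # [1, 2, 3, 4]

# Hypothetical data in a different order: the codes follow the ranking,
# not the order in which values happen to appear.
survey = ["Master's", "High School", "PhD", "Bachelor's"]
print(ordinal_encode(survey, EDUCATION_ORDER))    # [3, 1, 4, 2]
```

Because the ranks come from the explicit list, comparisons such as "3 is greater than 1" line up with "Master's is more education than High School".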

Engagement Message

What number would you assign to "Associate's Degree"?

Section 7 - Instruction

Choose one-hot encoding when categories have no natural order (like car brands or colors). Use ordinal encoding when there's a clear ranking (like education levels or satisfaction ratings).

Using ordinal encoding for unordered categories can mislead models - they might treat "Toyota = 1" as "less than" "Ford = 2", an ordering that doesn't exist!
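One way to see the problem is to compare how "far apart" two brands look under each encoding. The sketch below is an illustration only: Honda is a hypothetical third brand added for the comparison, the codes 1, 2, 3 are arbitrary, and `ordinal_gap` and `one_hot_gap` are made-up helper names.

```python
brands = ["Toyota", "Ford", "Honda"]  # Honda is a hypothetical addition

# Ordinal codes: Toyota = 1, Ford = 2, Honda = 3 (an arbitrary assignment).
ordinal = {brand: code for code, brand in enumerate(brands, start=1)}

# One-hot vectors: one column per brand.
one_hot = {
    brand: [1 if brand == other else 0 for other in brands]
    for brand in brands
}


def ordinal_gap(a, b):
    """Difference between the two ordinal codes."""
    return abs(ordinal[a] - ordinal[b])


def one_hot_gap(a, b):
    """Number of columns in which the two one-hot vectors differ."""
    return sum(x != y for x, y in zip(one_hot[a], one_hot[b]))


# Ordinal codes suggest Toyota is "closer" to Ford than to Honda.
print(ordinal_gap("Toyota", "Ford"), ordinal_gap("Toyota", "Honda"))  # 1 2

# One-hot vectors keep every pair of brands equally far apart.
print(one_hot_gap("Toyota", "Ford"), one_hot_gap("Toyota", "Honda"))  # 2 2
```

The ordinal gaps are an accident of the numbering, which is exactly the kind of false signal a model can pick up.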

Engagement Message

Would you use one-hot or ordinal encoding for movie genres?

Section 8 - Practice

Type

Fill In The Blanks

Markdown With Blanks

Let's practice encoding car transmission types. Convert this categorical data using one-hot encoding.

The original data, in order, is: Manual, Automatic, Manual, CVT.

| [[blank:Manual]] | [[blank:Automatic]] | [[blank:CVT]] |
|---|---|---|
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 0 | 0 | 1 |

Suggested Answers

  • Manual
  • Automatic
  • CVT