Welcome to the course path on Machine Learning! To get started with ML, it's important to first understand data—beginning with the basics of tabular data.
Tabular data is information organized in rows and columns, just like a spreadsheet. Think of a simple table about cars with columns for Brand, Year, and Price.
Here's an example:
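As a minimal sketch, a table like this can be written in plain Python as a list of rows, where each row maps a column name to a value. The three cars and their values below are made up for illustration.

```python
# Hypothetical car table: the values are invented for illustration.
cars = [
    {"Brand": "Toyota", "Year": 2015, "Price": 15000},
    {"Brand": "Ford", "Year": 2020, "Price": 25000},
    {"Brand": "BMW", "Year": 2018, "Price": 32000},
]

# Column names come from the keys of the first row.
columns = list(cars[0].keys())

# Print a header line, then one line per car.
print(" | ".join(f"{name:<8}" for name in columns))
for row in cars:
    print(" | ".join(f"{str(row[name]):<8}" for name in columns))
```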
Engagement Message
Have you worked with spreadsheets before?
In tabular data, each row represents one example or record. In our car table, each row would be one specific car.
Each column represents one attribute or characteristic. Brand, Year, and Price are all attributes that describe each car.
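To make the row and column idea concrete, here is a small sketch that pulls one row and one column out of a table stored as a list of dictionaries. The car values are assumptions chosen for illustration.

```python
# Hypothetical car table: the values are invented for illustration.
cars = [
    {"Brand": "Toyota", "Year": 2015, "Price": 15000},
    {"Brand": "Ford", "Year": 2020, "Price": 25000},
    {"Brand": "BMW", "Year": 2018, "Price": 32000},
]

# A row is one record: everything known about a single car.
first_row = cars[0]

# A column is one attribute collected across all records.
year_column = [car["Year"] for car in cars]

print(first_row)
print(year_column)
```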
Engagement Message
Can you think of another attribute we might include?
Here's a key term: we call these columns "features". Features are the individual pieces of information that describe each example in your dataset.
So Brand, Year, and Price are all features of our car dataset.
Engagement Message
If you could only pick three features to describe a car, which would you choose?
There's one special type of column called a "label" or "target." This is what you're trying to predict or understand.
If we want to predict car prices, then Price becomes our label. The other columns (Brand, Year) become our input features.
Here's how that looks in a table:
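One way to sketch this split in Python, assuming a few made-up car rows: the label column (Price) is separated from the input features (Brand, Year). The names `LABEL`, `X`, and `y` are illustrative choices, not part of the lesson.

```python
# Hypothetical car table: the values are invented for illustration.
cars = [
    {"Brand": "Toyota", "Year": 2015, "Price": 15000},
    {"Brand": "Ford", "Year": 2020, "Price": 25000},
    {"Brand": "BMW", "Year": 2018, "Price": 32000},
]

LABEL = "Price"  # the column we want to predict

# Input features: every column except the label.
X = [{name: value for name, value in car.items() if name != LABEL} for car in cars]

# Label: the single column being predicted.
y = [car[LABEL] for car in cars]

print(X)
print(y)
```

Changing `LABEL` to another column name changes which column is treated as the target and which ones remain as input features.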
Engagement Message
Which column would be the label if we wanted to predict car age?
Features come in different types. Numerical features are numbers you can do math with, like Year (2015, 2020) or Price ($15000, $25000).
Categorical features are categories or groups, like Brand (Toyota, Ford, BMW) or Color (Red, Blue, Black).
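The distinction can be sketched with a simple check on how values are stored. The helper `feature_type` below is hypothetical and only a rough heuristic: it labels a column numerical when every value is an int or float, and categorical otherwise.

```python
def feature_type(values):
    """Return 'numerical' if every value is an int or float, else 'categorical'."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in values
    )
    return "numerical" if numeric else "categorical"


# Example values taken from the lesson text.
columns = {
    "Year": [2015, 2020],
    "Price": [15000, 25000],
    "Brand": ["Toyota", "Ford", "BMW"],
    "Color": ["Red", "Blue", "Black"],
}

types = {name: feature_type(values) for name, values in columns.items()}
print(types)
```

This check only looks at storage type, so it is a starting point rather than a final answer about how a column should be treated.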
Engagement Message
Is "Number of Doors" numerical or categorical?
Quality issues are common in real datasets. Look for: missing values (blank cells), inconsistent formats (2020 vs "twenty-twenty"), or obvious errors (car year = 3025).
These problems can confuse models, so spotting them early is crucial.
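A rough sketch of how such checks might look in Python. The rows below are made up to contain one problem of each kind, and the plausible year range (`MIN_YEAR`, `MAX_YEAR`) is an assumption chosen for illustration.

```python
# Hypothetical rows, each seeded with a different quality issue.
cars = [
    {"Brand": "Toyota", "Year": 2015, "Price": 15000},
    {"Brand": "Ford", "Year": "twenty-twenty", "Price": 25000},
    {"Brand": "BMW", "Year": 3025, "Price": None},
]

MIN_YEAR = 1900  # assumed lower bound for a plausible car year
MAX_YEAR = 2030  # assumed upper bound for a plausible car year


def find_issues(rows):
    """Return a list of (row_index, column, problem) tuples."""
    issues = []
    for i, row in enumerate(rows):
        # Missing values: blank or empty cells in any column.
        for column, value in row.items():
            if value is None or value == "":
                issues.append((i, column, "missing value"))

        year = row.get("Year")
        if year is None or year == "":
            continue  # already reported as missing
        if not isinstance(year, int):
            # Inconsistent format: the year is not stored as a number.
            issues.append((i, "Year", "inconsistent format"))
        elif not MIN_YEAR <= year <= MAX_YEAR:
            # Obvious error: the year is outside the assumed range.
            issues.append((i, "Year", "implausible value"))
    return issues


for issue in find_issues(cars):
    print(issue)
```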
Engagement Message
What quality issue do you see in: Brand column with "Toyota", "TOYOTA", "toyota"?
Let's practice identifying features versus labels. Imagine we want to predict house prices using various house characteristics.
Fill in the blanks below with either Feature or Label:
Suggested Answers
- Feature
- Label
