Section 1 - Instruction

Welcome to the course path on Machine Learning! To get started with ML, it's important to first understand data, beginning with the basics of tabular data.

Tabular data is information organized in rows and columns, just like a spreadsheet. Think of a simple table about cars with columns for Brand, Year, and Price.

Here's an example:

| Brand | Year | Price |
| --- | --- | --- |
| Toyota | 2018 | $15,000 |
| Ford | 2020 | $18,500 |
| BMW | 2017 | $22,000 |

Engagement Message

Have you worked with spreadsheets before?

Section 2 - Instruction

In tabular data, each row represents one example or record. In our car table, each row would be one specific car.

Each column represents one attribute or characteristic. Brand, Year, and Price are all attributes that describe each car.
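
As a minimal sketch, the car table can be written in plain Python as a list of dictionaries, where each dictionary is one row and each key is one column. The variable names (`cars`, `first_row`, `year_column`) are illustrative choices, and prices are stored as plain numbers without the dollar sign.

```python
# Each dictionary is one row (one car); each key is one column (one attribute).
cars = [
    {"Brand": "Toyota", "Year": 2018, "Price": 15000},
    {"Brand": "Ford", "Year": 2020, "Price": 18500},
    {"Brand": "BMW", "Year": 2017, "Price": 22000},
]

# A row: every attribute of one specific car.
first_row = cars[0]

# A column: one attribute collected across all cars.
year_column = [car["Year"] for car in cars]

print(first_row)
print(year_column)
```

Reading across a dictionary gives you one car; reading the same key down the list gives you one attribute for every car.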

Engagement Message

Can you think of another attribute we might include?

Section 3 - Instruction

Here's crucial terminology: we call these columns "features". Features are the individual pieces of information that describe each example in your dataset.

So Brand, Year, and Price are all features of our car dataset.

Engagement Message

If you could only pick three features to describe a car, which would you choose?

Section 4 - Instruction

There's one special type of column called a "label" or "target." This is what you're trying to predict or understand.

If we want to predict car prices, then Price becomes our label. The other columns (Brand, Year) become our input features.

Here's how that looks in a table:

| Brand | Year | Price (Label) |
| --- | --- | --- |
| Toyota | 2018 | $15,000 |
| Ford | 2020 | $18,500 |
| BMW | 2017 | $22,000 |
| Honda | 2019 | |
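
One way to picture the split between input features and the label is the sketch below. It assumes the table is stored as a list of dictionaries and that the Honda row's empty Price is represented as `None`; the names `X`, `y`, and `to_predict` are common conventions, not requirements.

```python
cars = [
    {"Brand": "Toyota", "Year": 2018, "Price": 15000},
    {"Brand": "Ford", "Year": 2020, "Price": 18500},
    {"Brand": "BMW", "Year": 2017, "Price": 22000},
    {"Brand": "Honda", "Year": 2019, "Price": None},  # price not known (assumed None)
]

label_name = "Price"

# Rows with a known label can serve as examples to learn from.
labeled = [car for car in cars if car[label_name] is not None]

# Input features: every column except the label.
X = [{k: v for k, v in car.items() if k != label_name} for car in labeled]

# Labels: the value we want to predict.
y = [car[label_name] for car in labeled]

# Rows without a label are the ones we would want a prediction for.
to_predict = [car for car in cars if car[label_name] is None]

print(X)
print(y)
print(to_predict)
```

The choice of label depends on the question being asked: setting `label_name` to a different column turns that column into the target and the remaining columns into input features.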

Engagement Message

Which column would be the label if we wanted to predict car age?

Section 5 - Instruction

Features come in different types. Numerical features are numbers you can do math with, like Year (2015, 2020) or Price ($15,000, $25,000).

Categorical features are categories or groups, like Brand (Toyota, Ford, BMW) or Color (Red, Blue, Black).
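
The distinction can be sketched with a rough heuristic: treat a column as numerical if every value is a number, and as categorical otherwise. The helper `feature_type` is hypothetical and written only for illustration. A column stored as numbers is not always best treated as numerical, so a check like this is a starting point rather than a final answer.

```python
def feature_type(values):
    """Return 'numerical' if every value is a number, else 'categorical'."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "numerical" if is_number else "categorical"


# Example columns taken from the text above (prices as plain numbers).
columns = {
    "Year": [2015, 2020],
    "Price": [15000, 25000],
    "Brand": ["Toyota", "Ford", "BMW"],
    "Color": ["Red", "Blue", "Black"],
}

types = {name: feature_type(values) for name, values in columns.items()}

for name, kind in types.items():
    print(name, "->", kind)
```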

Engagement Message

Is "Number of Doors" numerical or categorical?

Section 6 - Instruction

Quality issues are common in real datasets. Look for: missing values (blank cells), inconsistent formats (2020 vs "twenty-twenty"), or obvious errors (car year = 3025).

These problems can confuse models, so spotting them early is crucial.
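
A minimal sketch of such checks on a Year column is shown below. The sample rows are made up to mirror the three issues listed above, the helper `check_year` is hypothetical, and the plausible year range (`MIN_YEAR`, `MAX_YEAR`) is an assumption you would adjust for your own data.

```python
# Made-up rows that mirror the three kinds of issues described above.
rows = [
    {"Brand": "Toyota", "Year": 2018},
    {"Brand": "Ford", "Year": None},            # missing value
    {"Brand": "BMW", "Year": "twenty-twenty"},  # inconsistent format
    {"Brand": "Honda", "Year": 3025},           # obvious error
]

# Assumed plausible range for a car's model year; adjust as needed.
MIN_YEAR, MAX_YEAR = 1886, 2030


def check_year(value):
    """Classify a single Year value as ok or as one kind of quality issue."""
    if value is None or value == "":
        return "missing"
    if not isinstance(value, int):
        return "inconsistent format"
    if not MIN_YEAR <= value <= MAX_YEAR:
        return "out of range"
    return "ok"


issues = [(row["Brand"], check_year(row["Year"])) for row in rows]

for brand, status in issues:
    print(brand, "->", status)
```

Running a check like this before training makes problem rows visible, so you can decide whether to fix, fill, or drop them.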

Engagement Message

What quality issue do you see in a Brand column that contains "Toyota", "TOYOTA", and "toyota"?

Section 7 - Practice

Type

Fill In The Blanks

Markdown With Blanks

Let's practice identifying features versus labels. Imagine we want to predict house prices using various house characteristics.

Fill in the blanks below with either Feature or Label:

| Square footage ([[blank:Feature]]) | Neighborhood ([[blank:Feature]]) | House price ([[blank:Label]]) |
| --- | --- | --- |
| 1800 | Maple Grove | $350,000 |
| 2200 | Oakwood | $420,000 |
| 1500 | Pinecrest | $280,000 |
| 2000 | Maple Grove | $390,000 |

Suggested Answers

  • Feature
  • Label