Lesson Introduction

This lesson provides a quick refresher on the core concepts of linear regression, focusing on key steps and implementation in Python using sklearn.

By the end of this lesson, you'll be ready to load datasets, split them, create and train a linear regression model, make predictions, and evaluate the model.

Loading Data

We'll start by loading the diabetes dataset from sklearn. This dataset contains ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements), which were obtained for each of 442 diabetes patients. The target is a quantitative measure of disease progression one year after baseline.

Note that the features and the target of this dataset are available through the .data and .target attributes.

This code prints out the first two rows of the dataset, so we can observe its structure:
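A minimal sketch of such a snippet, assuming the load_diabetes loader from sklearn.datasets; the variable names diabetes, X, and y are illustrative choices, not taken from the text:

```python
from sklearn.datasets import load_diabetes

# Load the dataset as an object exposing .data and .target
diabetes = load_diabetes()
X = diabetes.data      # feature matrix: 442 patients x 10 baseline variables
y = diabetes.target    # disease progression one year after baseline

# Print the first two rows of features and their targets
print(X[:2])
print(y[:2])
```

Each row of X holds the ten baseline variables for one patient, and the matching entry of y is that patient's target value.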

There is also a shortcut for loading X and y:
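As a sketch, the shortcut might look like this; the names X and y are assumed here for illustration:

```python
from sklearn.datasets import load_diabetes

# Unpack features and target directly instead of going through .data / .target
X, y = load_diabetes(return_X_y=True)

print(X.shape, y.shape)  # (442, 10) (442,)
```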

The return_X_y=True parameter makes the loader return the features and the target as two separate arrays instead of a single dataset object. Use whichever approach you find more comfortable.

Splitting the Dataset

Next, we'll split our data into training and testing sets, as we did before. As a reminder, we use the train_test_split function for this.
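A minimal sketch of the split, assuming test_size=0.2 as discussed in this section; random_state=42 is an assumption added only to make the split reproducible and is not taken from the text:

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Hold out 20% of the samples for testing.
# random_state=42 is an assumed value, used only for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Print the shapes of the resulting feature matrices
print(X_train.shape, X_test.shape)  # (353, 10) (89, 10)
```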

Output:

The size of the test set, test_size, is set to 0.2, which is 20%. It is common to set the test set size to 20-30%.

Creating the Model

Let's create a Linear Regression model and train it:
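One way this step could look, sketched with sklearn's LinearRegression; the variable name model and the random_state value are assumptions for illustration:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # random_state is an assumed value
)

# Create the model and fit it on the training data only
model = LinearRegression()
model.fit(X_train, y_train)

# After fitting, the model holds one coefficient per feature plus an intercept
print(model.coef_.shape, model.intercept_)
```

Fitting on the training set only keeps the test set unseen, so it can serve as a fair check later.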

Making Predictions

Using the trained model, let's make predictions on the test set:
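A sketch of the prediction step, repeating the earlier setup so that it runs on its own; y_pred is an assumed variable name, and random_state=42 is again an assumption:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # random_state is an assumed value
)
model = LinearRegression().fit(X_train, y_train)

# Predict the target for every sample in the test set
y_pred = model.predict(X_test)

# Show the first 5 predicted values
print(y_pred[:5])
```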

We print out the first 5 predictions to observe their values.

Now we can evaluate the model's performance using a metric. Here we will apply the Mean Squared Error (MSE), the average of the squared differences between the predicted and the actual values:
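A minimal sketch of the evaluation, assuming mean_squared_error from sklearn.metrics; the exact value depends on how the data was split, and the random_state used here is an assumption:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # random_state is an assumed value
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE: mean of the squared differences between actual and predicted targets
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
```

A lower MSE means the predictions are closer to the actual values on average. Because the errors are squared, the metric is expressed in squared units of the target.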

Output:

Lesson Summary

You've refreshed your knowledge on:

  • Loading datasets
  • Splitting data into training and testing sets
  • Creating and training a linear regression model
  • Making predictions
  • Evaluating the model using MSE

Now, you're prepared for the practice session to reinforce these concepts. Let's dive in!
