Lesson Introduction

This lesson provides a quick refresher on the core concepts of linear regression, focusing on key steps and implementation in Python using sklearn.

By the end of this lesson, you'll be ready to load datasets, split them, create and train a linear regression model, make predictions, and evaluate the model.

Loading Data

We'll start by loading the diabetes dataset from sklearn. This dataset contains ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) recorded for each of 442 diabetes patients. The target is a quantitative measure of disease progression one year after baseline.

Note that we can access the features and the target of this dataset through its .data and .target attributes.

This code prints out the first two rows of the dataset, so we can observe its structure:
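A minimal sketch of that step, assuming scikit-learn is installed. The variable names diabetes, X, and y are illustrative choices, not names fixed by the lesson:

```python
from sklearn.datasets import load_diabetes

# Load the dataset as a single object that exposes .data and .target
diabetes = load_diabetes()

X = diabetes.data    # feature matrix: one row per patient, one column per variable
y = diabetes.target  # target vector: disease progression one year after baseline

# Print the first two rows of features and their matching target values
print(X[:2])
print(y[:2])
```

Each row of X holds the ten baseline variables for one patient, and the value at the same position in y is that patient's target.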

There is also a shortcut for loading X and y:
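A minimal sketch of the shortcut, assuming the same load_diabetes loader from sklearn.datasets. The names X and y are again illustrative:

```python
from sklearn.datasets import load_diabetes

# With return_X_y=True the loader returns a (features, target) tuple
# instead of a single dataset object
X, y = load_diabetes(return_X_y=True)

# Check the dimensions: 442 patients, 10 features each
print(X.shape, y.shape)  # (442, 10) (442,)
```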

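Passing return_X_y=True makes the loader return the features and the target as a tuple (X, y) instead of a single dataset object, so they are already separated when loading. Both approaches give the same data, so use whichever you find more comfortable.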
