This lesson provides a quick refresher on the core concepts of linear regression, focusing on the key steps and their implementation in Python using sklearn.
By the end of this lesson, you'll be ready to load datasets, split them, create and train a linear regression model, make predictions, and evaluate the model.
We'll start by loading the diabetes dataset from sklearn. This dataset contains ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) recorded for each of 442 diabetes patients. The target is a quantitative measure of disease progression one year after baseline.
Note that we can access the features and the target of this dataset through its .data and .target attributes.
This code prints out the first two rows of the dataset, so we can observe its structure:
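A minimal sketch of this step, assuming the variable names diabetes, X, and y (they are illustrative and not taken from the original listing):

```python
from sklearn.datasets import load_diabetes

# Load the dataset as a Bunch object that exposes .data and .target
diabetes = load_diabetes()
X = diabetes.data      # feature matrix, one row per patient
y = diabetes.target    # disease progression measure, one value per patient

# Show the first two rows of features and the matching targets
print(X[:2])
print(y[:2])
```

Each row of X holds the ten baseline variables for one patient, and the matching entry of y is that patient's target value.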
There is also a shortcut for loading X and y:
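A short sketch of the shortcut, again assuming the names X and y for the returned arrays:

```python
from sklearn.datasets import load_diabetes

# return_X_y=True returns the (features, target) pair directly
X, y = load_diabetes(return_X_y=True)

print(X.shape, y.shape)  # (442, 10) (442,)
```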
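The return_X_y=True parameter makes the loader return the features and the target as two separate arrays instead of a single dataset object. Both approaches give the same data, so use whichever you find more comfortable.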
Next, we'll split our data into training and testing sets, like we did before. As a reminder, we use the train_test_split function for this.
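The split can be sketched as follows. The random_state=42 seed is an assumption added here only to make the split reproducible; it is not stated in the lesson.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Hold out 20% of the samples for testing.
# random_state=42 is an assumed seed, used only for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 442 samples split into 353 for training and 89 for testing
print(X_train.shape, X_test.shape)  # (353, 10) (89, 10)
```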
The size of the test set, test_size, is set to 0.2, which means 20% of the samples are held out for testing. A test set of 20-30% of the data is a common choice.
Let's create a Linear Regression model and train it:
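One way this might look, assuming the variable name model and the same assumed random_state=42 split so the example runs on its own:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed seed
)

# Create the model and fit it on the training data only
model = LinearRegression()
model.fit(X_train, y_train)

# One learned coefficient per feature, plus an intercept
print(model.coef_.shape)  # (10,)
print(model.intercept_)
```

Fitting on the training set only keeps the test set unseen, so the later evaluation reflects performance on new data.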
Using the trained model, let's make predictions on the test set:
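A self-contained sketch of the prediction step, assuming the name y_pred for the predictions and the same assumed random_state=42 split:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed seed
)

model = LinearRegression()
model.fit(X_train, y_train)

# Predict the target for every sample in the test set
y_pred = model.predict(X_test)

# Inspect the first five predicted values
print(y_pred[:5])
```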
We print out the first 5 predictions to observe their values.
Now we can evaluate the model's performance with a metric. Here we use the Mean Squared Error (MSE), the average of the squared differences between the predicted and the actual values; lower values indicate a better fit:
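A sketch of the evaluation, assuming the names y_pred and mse and the same assumed random_state=42 split. The manual NumPy computation is added here only to show what the metric computes.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed seed
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE: mean of the squared differences between actual and predicted values
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# The same quantity computed by hand, as a sanity check
mse_manual = np.mean((y_test - y_pred) ** 2)
print(np.isclose(mse, mse_manual))  # True
```

Because the errors are squared, MSE is expressed in squared units of the target and penalizes large errors more heavily than small ones.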
You've refreshed your knowledge on:
- Loading datasets
- Splitting data into training and testing sets
- Creating and training a linear regression model
- Making predictions
- Evaluating the model using MSE
Now, you're prepared for the practice session to reinforce these concepts. Let's dive in!
