Topic Overview

Hello and welcome! In today's lesson, we will learn how to make predictions using a trained Linear Regression model and evaluate the model's performance using the Mean Squared Error (MSE) metric. We will use the diamonds dataset to demonstrate this process.

Recap of the Trained Model

Before we dive into making predictions, let's briefly recap the steps we took to prepare and train our Linear Regression model.

First, we loaded the diamonds dataset using seaborn and prepared it by converting categorical variables into dummy variables for numerical compatibility. Next, we selected our features and target variable, and split the data into training and testing sets to ensure our model would generalize well to unseen data. Finally, we created and trained our Linear Regression model:
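A minimal sketch of those preparation and training steps might look like the following; the specific feature handling (dummy variables for all categorical columns) and split parameters (test_size, random_state) shown here are illustrative assumptions rather than the lesson's exact setup:

    import seaborn as sns
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression

    # Load the diamonds dataset and convert categorical columns to dummy variables
    diamonds = sns.load_dataset("diamonds")
    diamonds = pd.get_dummies(diamonds, drop_first=True)

    # Features are all columns except the target variable, price
    X = diamonds.drop(columns=["price"])
    y = diamonds["price"]

    # Split the data into training and testing sets (assumed 80/20 split)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Create and train the Linear Regression model
    model = LinearRegression()
    model.fit(X_train, y_train)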

With the trained model ready, we can now move on to making predictions.

Making Predictions on Test Data

To make predictions with our trained model, we use the predict method provided by the LinearRegression class. This method will generate predicted values for our test data.

Here’s how to use the predict method and display the first 10 predictions:
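Assuming the trained model and the test features are named model and X_test, as in the sketch above, the prediction step might look like this:

    # Generate predicted prices for every diamond in the test set
    y_pred = model.predict(X_test)

    # Display the first 10 predictions
    print(y_pred[:10])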

Running this code prints an array of the first ten predicted diamond prices. Each number corresponds to the model's prediction of a diamond's price within the test dataset.

By generating predictions, we can now compare these predicted values to the actual values in our test set to evaluate the model's performance.
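One simple way to eyeball that comparison (a hypothetical snippet, assuming y_test and y_pred as defined above) is to line up a few actual and predicted prices side by side:

    import pandas as pd

    # Show actual vs. predicted prices for the first five test diamonds
    comparison = pd.DataFrame({
        "actual": y_test.iloc[:5].to_numpy(),
        "predicted": y_pred[:5],
    })
    print(comparison)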

Calculating and Understanding Mean Squared Error (MSE)

The Mean Squared Error (MSE) is a metric that measures the average of the squares of the errors, that is, the average squared difference between the predicted values and the actual values. In mathematical terms, MSE is defined as:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $n$ is the number of observations, $y_i$ are the actual values, and $\hat{y}_i$ are the predicted values. A lower MSE indicates a better fit of the model to the data, while a higher MSE indicates larger errors in the predictions. Keep in mind, however, that MSE is sensitive to outliers: because the errors are squared, a single large error can disproportionately inflate the MSE value.
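As a quick illustration with made-up numbers (not taken from the diamonds data), suppose the actual values are 3 and 5 and the model predicts 2.5 and 5.5. Then

$$\text{MSE} = \frac{(3 - 2.5)^2 + (5 - 5.5)^2}{2} = \frac{0.25 + 0.25}{2} = 0.25$$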

To calculate the MSE in code, we use the mean_squared_error function from the sklearn.metrics module. Here’s the code to perform this calculation and print the result:
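A minimal sketch of that calculation, assuming y_test and y_pred hold the actual and predicted prices from the previous steps:

    from sklearn.metrics import mean_squared_error

    # Compare the actual test prices with the model's predictions
    mse = mean_squared_error(y_test, y_pred)
    print("Mean Squared Error:", mse)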

The mean_squared_error function compares the predicted values with the actual values in the test set and computes the MSE, giving us a sense of the model's accuracy. For this model, the result is approximately 1,288,705; in other words, the average squared difference between the model's predictions and the actual diamond prices is roughly 1,288,705.

This highlights how much the model's predictions can deviate from actual prices and points to room for improving the model's accuracy.

Lesson Summary

In this lesson, we learned how to make predictions using a trained Linear Regression model and how to evaluate the model's performance using Mean Squared Error (MSE). Understanding prediction and evaluation is crucial for making informed decisions based on model outputs. Keep going, and happy coding!
