Lesson Introduction

Hello! Today, we're diving into Polynomial Regression, an advanced form of regression analysis for modeling complex relationships between variables. We'll learn how to use Python and Scikit-Learn to perform polynomial regression. By the end, you'll know how to create polynomial features, train a model, and make predictions.

Polynomial regression is useful for capturing non-linear relationships. For instance, predicting exam scores (the target) based on study hours (the feature) might not follow a simple linear pattern. Polynomial regression can help in such cases.

Understanding Polynomial Features

Why do we need polynomial features? To fit a curve instead of a straight line, we create new features that include polynomial terms (like x^2 and x^3). This helps in modeling more complex relationships.

Scikit-Learn offers PolynomialFeatures to transform our input data. Here's how it works:

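The lesson's original code isn't reproduced here, but a minimal sketch of the transformation might look like this (the sample values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2], [3]])  # two samples, one feature each

# degree=2 adds squared terms; include_bias=True (the default) adds a column of 1s
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print(X_poly)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```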
The new X_poly includes the original term, its square, and an intercept term (the first column).

Loading and Preparing Data

We'll create data to work with. We'll generate random values between -1 and 1 as features, and our target variable will follow the quadratic equation y = 3x^2 + 2x + noise, simulating realistic data with some noise.
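One way to generate such a dataset (the sample count, noise scale, and seed are assumptions, not values from the lesson):

```python
import numpy as np

np.random.seed(42)  # fixed seed for reproducibility

X = np.random.uniform(-1, 1, (100, 1))       # 100 feature values in [-1, 1]
noise = np.random.normal(0, 0.1, 100)        # small Gaussian noise
y = 3 * X[:, 0] ** 2 + 2 * X[:, 0] + noise   # quadratic target: y = 3x^2 + 2x + noise
```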

Now, we have the data where our target variable has a non-linear relationship with the feature.

Splitting Data into Training and Test Sets

As always, we'll split our data into training and test sets to train and evaluate our model. We will use X_train to train the model and X_test to evaluate its performance.
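A sketch of the split, assuming the synthetic dataset above and a conventional 80/20 split (the exact proportions in the lesson are not shown):

```python
import numpy as np
from sklearn.model_selection import train_test_split

np.random.seed(42)
X = np.random.uniform(-1, 1, (100, 1))
y = 3 * X[:, 0] ** 2 + 2 * X[:, 0] + np.random.normal(0, 0.1, 100)

# Hold out 20% of the samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 1) (20, 1)
```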

Training a Simple Linear Regression Model

First, we'll train a simple linear regression model without polynomial features, like we did in the first lesson.
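A minimal sketch of the baseline model, continuing from the assumed dataset and split above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

np.random.seed(42)
X = np.random.uniform(-1, 1, (100, 1))
y = 3 * X[:, 0] ** 2 + 2 * X[:, 0] + np.random.normal(0, 0.1, 100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a plain linear model on the raw (untransformed) feature
linear_model = LinearRegression().fit(X_train, y_train)
linear_mse = mean_squared_error(y_test, linear_model.predict(X_test))
print(f"Linear regression MSE: {linear_mse:.4f}")
```

A straight line cannot capture the 3x^2 term, so we should expect this MSE to be relatively high.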

Now we have the MSE score for a regular linear regression model. On its own it doesn't tell us much, but it gives us a baseline to compare other models against. Let's train a smarter polynomial regression model and check whether it works better.

Transforming Features and Training a Polynomial Regression Model

Next, we'll transform the input data to include polynomial terms and train a polynomial regression model.

By applying PolynomialFeatures(degree=2).fit_transform() to our data (both X_train and X_test), we create new features that model a quadratic relationship.
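A sketch of the full polynomial pipeline, again assuming the synthetic dataset above. Note that here the transformer is fitted on the training data and then applied to the test data; for deterministic polynomial expansion this gives the same result as calling fit_transform on each split, but it is the idiomatic scikit-learn pattern:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(42)
X = np.random.uniform(-1, 1, (100, 1))
y = 3 * X[:, 0] ** 2 + 2 * X[:, 0] + np.random.normal(0, 0.1, 100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Expand the single feature into [1, x, x^2]
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Fit a linear model on the expanded features
poly_model = LinearRegression().fit(X_train_poly, y_train)
poly_mse = mean_squared_error(y_test, poly_model.predict(X_test_poly))
print(f"Polynomial regression MSE: {poly_mse:.4f}")
```

Because the data was generated from a quadratic formula, the polynomial model's MSE should come out close to the noise variance, far below the linear baseline.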

Having trained both models, we can now compare their performance using the mean squared error (MSE).

The polynomial regression model has a much lower MSE, indicating it fits the data much better.

Lesson Summary

Great job! We covered polynomial regression, from creating polynomial features to training a model and making predictions. Here’s a quick recap:

  • Polynomial Features: We used PolynomialFeatures to transform our features.
  • Sample Data: We created a sample dataset using a quadratic formula with noise.
  • Train/Test Split: We split the data into training and test sets.
  • Model Training: We trained both a simple linear regression model and a polynomial regression model.
  • Evaluation: We compared their performance using MSE.

Next, you'll move to practice, where you'll apply what you've learned. You'll generate your own polynomial features, train models, and make predictions.

Happy coding!
