Feature Selection in R

Introduction

Welcome back! In this lesson, we’ll explore practical techniques for feature selection in R. Feature selection helps you focus on the most relevant variables, which can improve model performance, interpretability, and training efficiency. We’ll use a model-based approach with linear regression and rank features by the magnitude of their standardized coefficients.

To keep everything self-contained and portable, we’ll work with the built-in mtcars dataset and predict mpg (miles per gallon) from the remaining columns.

Exploring the Dataset

Use head(mtcars) to preview the first six rows of the data. For this lesson, we define:

Target: mpg
Predictors: the other 10 columns (cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb)
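Here is a minimal base-R sketch of this step. The variable names target and predictors are illustrative choices for this example, not part of any package.

```r
# Load the built-in mtcars data (ships with base R's datasets package)
data(mtcars)

# Preview the first rows and the column types
print(head(mtcars))
str(mtcars)

# Define the target and use every remaining column as a predictor
target <- "mpg"
predictors <- setdiff(names(mtcars), target)
print(predictors)
```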

Training a Linear Model (with Standardization)

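Raw regression coefficients depend on each feature’s units: a coefficient on wt (measured in 1000 lbs) cannot be compared directly with one on disp (measured in cubic inches). To compare feature importance fairly, we’ll standardize the predictors (mean 0, standard deviation 1) before fitting the model.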
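The sketch below standardizes and fits using only base R’s scale() and lm(). It assumes that only the predictors are standardized while mpg stays in its original units; the names X_std, train_df, raw_model, and check are illustrative. The final check shows why scaling matters: each standardized coefficient equals the raw coefficient multiplied by that predictor’s standard deviation.

```r
# Load the built-in data and define the target and predictors
data(mtcars)
target <- "mpg"
predictors <- setdiff(names(mtcars), target)

# Standardize each predictor to mean 0, sd 1.
# Assumption: only predictors are scaled; mpg keeps its original units.
# scale() returns a matrix, so convert it back to a data frame.
X_std <- as.data.frame(scale(mtcars[, predictors]))
train_df <- data.frame(mpg = mtcars[[target]], X_std)

# Fit ordinary least squares on all standardized predictors
model <- lm(mpg ~ ., data = train_df)
print(summary(model))

# Sanity check: each standardized coefficient equals the raw
# coefficient multiplied by that predictor's standard deviation
raw_model <- lm(mpg ~ ., data = mtcars)
check <- all.equal(coef(model)[-1],
                   coef(raw_model)[-1] * sapply(mtcars[, predictors], sd))
print(check)
```

Because the predictors are centered, the fitted intercept equals the mean of mpg, and the slopes are on a common per-standard-deviation scale.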

Selecting Important Features (Coefficient Magnitude)

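We’ll measure importance using the absolute value of the standardized coefficients. Because all predictors share a common scale, a larger absolute coefficient means the predicted mpg changes more for a one-standard-deviation change in that predictor, holding the other predictors fixed.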
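The ranking and a top-k selection can be sketched as follows. To stay self-contained, the code refits the standardized model before sorting the coefficients; k = 3 is an arbitrary illustrative value, and std_coefs, importance, and top_k are hypothetical names.

```r
# Refit the standardized model (predictors scaled, mpg in original units)
data(mtcars)
predictors <- setdiff(names(mtcars), "mpg")
X_std <- as.data.frame(scale(mtcars[, predictors]))
model <- lm(mpg ~ ., data = data.frame(mpg = mtcars$mpg, X_std))

# Drop the intercept; it says nothing about individual features
std_coefs <- coef(model)[-1]

# Rank features by |standardized coefficient|, largest first
importance <- sort(abs(std_coefs), decreasing = TRUE)
print(round(importance, 3))

# Top-k selection: keep the k highest-ranked features
k <- 3
top_k <- names(importance)[seq_len(k)]
print(top_k)
```

Treat the ranking as a heuristic: cyl, disp, hp, and wt are strongly correlated in mtcars, so their individual coefficients can shift noticeably when other features are added or removed.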

Threshold-Based Selection

Instead of choosing a fixed number of features (top-k), you can keep every feature whose |standardized coefficient| exceeds a chosen threshold.
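A sketch of threshold-based selection under the same standardized setup. The cutoff of 1 (roughly one mpg of predicted change per standard deviation of the predictor) is an arbitrary illustrative value, and refitting a reduced model on the selected features is an optional extra step, not something the lesson prescribes; selected and reduced_model are hypothetical names.

```r
# Refit the standardized model and compute |standardized coefficients|
data(mtcars)
predictors <- setdiff(names(mtcars), "mpg")
X_std <- as.data.frame(scale(mtcars[, predictors]))
model <- lm(mpg ~ ., data = data.frame(mpg = mtcars$mpg, X_std))
importance <- sort(abs(coef(model)[-1]), decreasing = TRUE)

# Keep every feature whose |standardized coefficient| exceeds the cutoff
threshold <- 1  # illustrative choice, not a recommended default
selected <- names(importance)[importance > threshold]
print(selected)

# Optional: refit a reduced model on the selected features only
reduced_df <- data.frame(mpg = mtcars$mpg, X_std[, selected, drop = FALSE])
reduced_model <- lm(mpg ~ ., data = reduced_df)
print(summary(reduced_model)$adj.r.squared)
```

Unlike top-k, the number of selected features here depends on the data, so it is worth checking how the selected set changes as you vary the threshold.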

Lesson Summary

In this lesson, you:

Loaded a built-in dataset (mtcars) and defined a target (mpg).
Standardized predictors to make coefficients comparable.
Fit a linear model and ranked features by |standardized coefficients|.
Selected features via top-k and threshold approaches.
