Welcome! Today's topic is an essential technique in data science and machine learning, called Recursive Feature Elimination (RFE). It's a method used for feature selection—choosing the most relevant input variables in our training data.
In Recursive Feature Elimination, we initially fit the model using all available features. Then, we recursively eliminate the least important features and fit the model again. We continue this process until we are left with the specified number of features. The result is a model that’s potentially more efficient and can generalize better.
The idea behind Recursive Feature Elimination is simple yet powerful: recursively remove the least important features from the model. The process involves the following steps, sketched in code below:
- Fit the model using all available features.
- Rank the features based on their importance (coefficients, impurity-based importance, etc.).
- Remove the least important feature(s).
- Repeat steps 1–3 until the desired number of features is reached.
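To make the loop concrete, here is a minimal manual sketch in R. It assumes a feature data frame `X` and a factor target `y` (built in the next step), uses rpart's impurity-based importance for the ranking, and drops one feature per iteration. The `manual_rfe` helper is purely illustrative, not a caret function.

```r
library(rpart)

# Minimal manual RFE loop: fit, rank, drop the weakest feature, repeat.
# `manual_rfe` is an illustrative helper, not part of caret.
manual_rfe <- function(X, y, n_keep) {
  feats <- colnames(X)
  while (length(feats) > n_keep) {
    dat <- cbind(X[, feats, drop = FALSE], .target = y)
    fit <- rpart(.target ~ ., data = dat)
    # rpart omits unused features from variable.importance; treat them as 0
    scores <- setNames(numeric(length(feats)), feats)
    imp <- fit$variable.importance
    if (!is.null(imp)) scores[names(imp)] <- imp
    feats <- setdiff(feats, names(which.min(scores)))
  }
  feats
}
```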
We’ll generate a synthetic dataset with informative and noisy features using mlbench::mlbench.friedman1, then convert its continuous response into a binary classification target.
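A sketch of the data setup. The sample size of 200 and the median split used to form the two classes are illustrative assumptions:

```r
library(mlbench)

set.seed(42)
sim <- mlbench.friedman1(n = 200, sd = 1)  # 5 informative + 5 noise features
X <- as.data.frame(sim$x)
colnames(X) <- paste0("F", 1:10)

# Binarize the continuous response at its median to get a classification target
y <- factor(ifelse(sim$y > median(sim$y), "High", "Low"))
```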
To avoid a mismatch between the fitting and ranking hooks, we’ll use caret::caretFuncs as the RFE function set together with a train(method = "rpart") model; caretFuncs then ranks features with varImp.train.
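A sketch of the RFE call under those choices. The candidate subset sizes and 5-fold cross-validation are assumptions, not requirements:

```r
library(caret)

# caretFuncs fits via caret::train() and ranks via varImp.train(),
# so the fitting and ranking hooks are guaranteed to match.
ctrl <- rfeControl(functions = caretFuncs, method = "cv", number = 5)

set.seed(42)
rfe_result <- rfe(
  x = X, y = y,
  sizes = c(2, 4, 6, 8),  # candidate feature-subset sizes (illustrative)
  rfeControl = ctrl,
  method = "rpart"        # forwarded to train() inside caretFuncs$fit
)
rfe_result
```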
Useful attributes:
- `rfe_result$optVariables`: names of the selected features.
- `rfe_result$variables`: ranking of all features across resamples.
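For example:

```r
# Names of the features in the best subset found
rfe_result$optVariables

# Per-resample importance scores for every candidate feature
head(rfe_result$variables)
```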
Interpreting the output:
- The "Selected Features" output shows the names of the top features chosen by RFE.
- The feature rankings table lists each feature (`var`), its importance score (`Overall`), the number of variables considered in that resample (`Variables`), and the resample fold (`Resample`). Higher `Overall` values indicate greater importance for that feature in the model.
- Features with an importance of `0.00000` are considered uninformative by the model in that fold.
Feature selection improves the efficiency of the model by reducing computational complexity and can improve performance by eliminating irrelevant and redundant features. It also increases interpretability—highlighting which variables the model relies on most.
You’ve learned how to apply RFE in R using a consistent rpart + caret workflow and how to interpret the selected features and rankings.
