Welcome back! In this lesson, we will delve into another powerful technique for feature selection: SelectFromModel. This technique is particularly useful when you have a trained model and want to select the most important features based on the model's importance criterion.
SelectFromModel is a meta-transformer that can be used with any estimator that assigns an importance to each feature through a specific attribute (such as coef_ or feature_importances_). You can then set a threshold, and SelectFromModel will keep only those features whose importance exceeds that threshold.
So, in essence, SelectFromModel does the heavy lifting of identifying and choosing the right features based on the model's importance criterion, which is a significant advantage for any machine learning practitioner!
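To make this concrete, here is a minimal sketch of the idea, using a RandomForestRegressor on synthetic data purely for illustration (the dataset, estimator, and threshold choice here are assumptions, not part of the lesson's main example):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic regression data: 10 features, only 5 of which are informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, random_state=42)

# Fit an estimator that exposes feature_importances_
model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)

# Keep only the features whose importance exceeds the mean importance
selector = SelectFromModel(model, threshold="mean", prefit=True)
X_selected = selector.transform(X)

print("Original shape:", X.shape)          # (200, 10)
print("Reduced shape:", X_selected.shape)  # fewer columns remain
print("Selected feature mask:", selector.get_support())
```

Note that threshold can also be a numeric value or a string such as "median"; the "mean" setting shown here is simply one common choice.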
While the theory of dimensionality reduction and feature selection is important, these ideas become far more concrete when we apply them to a real-world dataset. For this lesson, we will work with the California Housing dataset, available in scikit-learn's built-in collection of datasets.
Let's begin by loading and briefly exploring the dataset:
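A sketch of that step, using scikit-learn's fetch_california_housing loader with as_frame=True so the data comes back as a pandas DataFrame, might look like this:

```python
from sklearn.datasets import fetch_california_housing

# Load the dataset as pandas objects for easier exploration
housing = fetch_california_housing(as_frame=True)
X = housing.data    # feature matrix (8 numeric features)
y = housing.target  # median house value, in units of $100,000

# A quick look at the data
print(X.head())
print(X.describe())
print("Shape:", X.shape)  # (20640, 8)
```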
