Welcome to this lesson on Feature Engineering! Today, we'll explore how to derive new features from our existing data to enhance our predictive models. These derived features could provide more insightful information that our original data might not capture directly.
Feature Engineering is an essential part of machine learning: it is the process of using domain knowledge to create features that help machine learning algorithms perform better. Although modern machine learning methods can derive some features automatically, manually combining existing features, guided by human intuition and industry expertise, can often produce better results.
Why is Feature Engineering vital? Consider this parallel: Artistic talent won't help a painter without paints, and a high-quality dataset may be useless without proper features. The process of Feature Engineering ensures you have the 'right paint' to create your masterpiece!
Let's use the Titanic dataset as an example. We could create a new feature, age_group, to categorize age into different groups, or another feature, family_size, by adding sibsp (number of siblings/spouses aboard) and parch (number of parents/children aboard). Let's dive in!
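As a quick preview of the age_group idea, here is a minimal sketch using pandas. The toy rows, the bin edges, and the group labels are all assumptions chosen for illustration; they are not part of the Titanic dataset, and other cut-offs may suit your analysis better.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Titanic data (one age is missing).
df = pd.DataFrame({"age": [4.0, 15.0, 30.0, 70.0, None]})

# Assumed bin edges and labels, picked only for illustration.
# With the default right=True, the bins are (0, 12], (12, 18], (18, 60], (60, 120].
bins = [0, 12, 18, 60, 120]
labels = ["child", "teen", "adult", "senior"]

# pd.cut assigns each age to its bin; missing ages stay missing.
df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels)
print(df)
```

Note that rows with a missing age get a missing age_group, so you may want to decide how to handle them before modeling.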
We'll start by creating the family_size feature. This is simply the sibsp and parch features added together, plus one for the passenger themselves. You might be wondering why we are creating this feature. The reason is that family size might have had a significant impact on a person's chance of survival. For instance, a person with a big family might have gotten confused and lost in the crowd, or might have spent time looking for family members, delaying their escape.
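The calculation described above can be sketched in a few lines of pandas. The small DataFrame here is made-up sample data that only borrows the column names sibsp, parch, and survived; in practice you would load the real Titanic data instead.

```python
import pandas as pd

# Hypothetical toy rows; the values are invented for illustration only.
df = pd.DataFrame({
    "sibsp": [1, 0, 3, 0],     # siblings/spouses aboard
    "parch": [0, 0, 2, 1],     # parents/children aboard
    "survived": [0, 1, 0, 1],  # 1 = survived, 0 = did not
})

# family_size = siblings/spouses + parents/children + the passenger.
df["family_size"] = df["sibsp"] + df["parch"] + 1

# Average survival per family size, to check whether the new feature
# carries any signal.
survival_by_family_size = df.groupby("family_size")["survived"].mean()
print(survival_by_family_size)
```

Grouping by the new column is one simple way to see whether survival rates differ across family sizes; with only four invented rows, the numbers here mean nothing on their own.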
