Welcome back to our course - Data Cleaning and Preprocessing in Machine Learning. Today's mission revolves around Feature Engineering on the Titanic dataset. By the end of today's lesson, your toolkit will be loaded with skills that revolve around feature creation, modification, and encoding. Your expertise in Python and Pandas will also be put into practice, reinforcing your knowledge in the process.
Feature engineering is the process of creating optimized features that improve the effectiveness of machine learning algorithms. This process utilizes the data to create new features and modify existing ones. This might involve creating new features, transforming existing features, or identifying and removing irrelevant ones. For instance, in our Titanic dataset, we have properties or indicators like age
, sex
, pclass
, etc., which might need some optimizing.
Let's take sibsp
and parch
as an example: sibsp
shows the number of siblings/spouses aboard while parch
shows the number of parents/children onboard. Because these features both indicate the number of family members onboard for each individual, one might see them as similar features or even overlapping. Hence, we can combine these two features to create a new feature: family_size
.
