Introduction

Our destination for today's learning journey is Outlier Detection in Passenger Data. We'll look at data preparation for machine learning, using the Titanic Dataset as our running example. So, why are we focusing on outliers?

Outliers are data points that deviate significantly from the other data points in our dataset. They can drastically influence the outcomes of our data analysis and machine learning models, possibly leading to erroneous results. While exploring the Titanic Dataset, we may encounter outliers in variables such as age (extreme values) or fare (abnormally high ticket prices).

In this lesson, we'll use Python and the Pandas library to detect outliers and handle them appropriately. Our itinerary includes understanding the concept of outliers, learning various techniques for their detection, and then exploring strategies to handle them effectively.

Three common methods for outlier detection are Z-score (identifying data points whose absolute Z-score is greater than 3 as outliers), IQR (defining outliers as observations outside the range from $Q_1 - 1.5 \cdot IQR$ to $Q_3 + 1.5 \cdot IQR$), and standard deviation (categorizing data points more than three standard deviations from the mean as outliers).
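To make these rules concrete, here is a minimal Pandas sketch of all three checks. The `fares` values are made up for illustration and are not real Titanic records, and the helper names (`zscore_outliers`, `iqr_outliers`, `std_outliers`) are hypothetical, not part of Pandas. The thresholds (3 and 1.5) follow the conventions described above and can be adjusted.

```python
import pandas as pd

# Hypothetical fare values for illustration only (not real Titanic data).
fares = pd.Series(
    [7.25, 8.05, 7.9, 13.0, 26.0, 10.5, 8.66, 15.5, 21.0, 9.5, 12.35, 30.0, 512.33]
)


def zscore_outliers(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return values whose absolute Z-score exceeds the threshold."""
    # Series.std() uses the sample standard deviation (ddof=1) by default.
    z = (s - s.mean()) / s.std()
    return s[z.abs() > threshold]


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


def std_outliers(s: pd.Series, n_std: float = 3.0) -> pd.Series:
    """Return values more than n_std standard deviations from the mean."""
    mean, std = s.mean(), s.std()
    lower, upper = mean - n_std * std, mean + n_std * std
    return s[(s < lower) | (s > upper)]


print("Z-score:", zscore_outliers(fares).tolist())
print("IQR:", iqr_outliers(fares).tolist())
print("Std dev:", std_outliers(fares).tolist())
```

On this small sample, all three checks flag only the 512.33 fare. The Z-score and standard deviation checks express the same criterion in two ways, so they always flag the same points. The IQR rule relies on quartiles rather than the mean, which makes it less sensitive to the extreme values it is trying to find.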
