Introduction to Outlier Detection and Treatment

Welcome to our detailed exploration of outlier detection and treatment in predictive modeling. Using real-world scenarios such as uneven pricing in housing markets, we will apply statistical methods to identify outliers. Imagine an apartment priced far below, or a mansion priced far above, the typical home in an area: such data points can skew the average and distort our predictive analysis. In this session, we will use the California Housing Dataset to identify these critical data points and apply robust strategies for treating them.
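To make the claim about skewed averages concrete, here is a minimal sketch using made-up prices (hypothetical values in thousands of dollars, not taken from the California Housing Dataset). It compares the mean and median of a small neighborhood before and after a single mansion is added.

```python
from statistics import mean, median

# Hypothetical neighborhood prices in thousands of dollars (illustrative only).
typical_prices = [210, 225, 230, 240, 250]
# The same neighborhood after one mansion is listed far above the rest.
with_mansion = typical_prices + [2500]

mean_before, median_before = mean(typical_prices), median(typical_prices)
mean_after, median_after = mean(with_mansion), median(with_mansion)

print(f"mean {mean_before:.1f} -> {mean_after:.1f}")        # mean 231.0 -> 609.2
print(f"median {median_before:.1f} -> {median_after:.1f}")  # median 230.0 -> 235.0
```

One extreme listing pushes the mean from 231 to about 609, while the median moves only from 230 to 235. This is why a handful of outliers can mislead any model or summary that leans on the average.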

Detecting Outliers with Z-Scores

To identify outliers systematically, we start with the z-score method, a statistical measure of how many standard deviations a data point lies from the mean. For a data point x drawn from a feature with mean μ and standard deviation σ, the z-score z is calculated as:

z = \frac{x - \mu}{\sigma}
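The formula translates directly into code. The sketch below is one possible implementation using only Python's standard library; the helper names `z_scores` and `flag_outliers`, the price list, and the cutoff of 3 are all illustrative assumptions rather than part of the lesson. It uses the population standard deviation for σ and flags any point whose absolute z-score exceeds the cutoff.

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Return (x - mu) / sigma for every value, using the population SD for sigma."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(x - mu) / sigma for x in values]

def flag_outliers(values, threshold=3.0):
    """Return (value, z) pairs whose |z| exceeds threshold (3.0 is an assumed cutoff)."""
    return [(x, z) for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical prices in thousands of dollars: ten ordinary homes and one mansion.
prices = [200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 2500]
outliers = flag_outliers(prices)
print([x for x, _ in outliers])  # → [2500]
```

Here the mansion sits about 3.16 standard deviations above the mean of 450, so it is the only point flagged. Note that with the population standard deviation, no point in a sample of n values can have |z| larger than √(n − 1), so a cutoff of 3 can never trigger on samples of ten or fewer values.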