Ready for our next lesson? Today, we're delving into quantiles and the Interquartile Range (IQR). Quantiles divide our data into equal-sized parts, and the IQR shows where the middle half of our data lies. These tools help us understand the distribution of our data and identify outliers. We'll explore how to calculate these measures with Python's pandas and NumPy libraries.
Quantiles split sorted data into groups that each contain the same share of the observations. For example, when we divide a group of student grades into four equal parts, we use quartiles: Q1 (the 25th percentile), Q2 (the 50th percentile, or median), and Q3 (the 75th percentile).
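To see what "four equal parts" means in practice, here is a minimal sketch using pandas' qcut(), which cuts the data at its sample quartiles. The grades and the group labels are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical student grades, made up for illustration
grades = pd.Series([55, 62, 68, 71, 77, 84, 90, 96])

# qcut places the bin edges at the quartiles, so each group
# ends up holding (roughly) the same number of grades
groups = pd.qcut(
    grades,
    q=4,
    labels=["lowest 25%", "25-50%", "50-75%", "highest 25%"],
)

print(pd.DataFrame({"grade": grades, "group": groups}))
print(groups.value_counts().sort_index())
```

With eight distinct grades, every group receives exactly two of them. With many tied values, the groups can come out uneven.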
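The Interquartile Range (IQR), calculated as Q3 - Q1, shows where the middle 50% of our data lies. It's resistant to outliers because it ignores the lowest 25% and the highest 25% of values. For instance, when analyzing salaries, a few extremely high earners barely move the IQR, so it still describes the range where typical salaries fall.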
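The following sketch shows how the IQR resists outliers by comparing it with the full range (max - min) before and after adding one extreme salary. The salary figures are hypothetical, and iqr() is a helper defined here for illustration. It is not a NumPy function.

```python
import numpy as np

def iqr(values):
    """Hypothetical helper: IQR = Q3 - Q1, using NumPy's default linear interpolation."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1

# Hypothetical salaries in thousands, made up for illustration
salaries = np.array([42, 45, 48, 50, 52, 55, 58, 60])
# The same salaries plus one extreme value
with_outlier = np.append(salaries, 400)

for name, data in [("without outlier", salaries), ("with outlier", with_outlier)]:
    full_range = data.max() - data.min()
    print(f"{name}: range = {full_range}, IQR = {iqr(data)}")
```

Adding the single extreme value stretches the full range from 18 to 358. The IQR only moves from 8.5 to 10.0, because the outlier lands in the top 25% that the IQR ignores.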
NumPy's percentile() function calculates quantiles.
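Here is a short sketch that computes all three quartiles with NumPy's percentile() and, for comparison, with pandas' Series.quantile(). NumPy takes percentiles on a 0 to 100 scale, while pandas takes fractions between 0 and 1. Both use linear interpolation by default, so they agree. The scores are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical scores, made up for illustration
scores = [70, 85, 62, 90, 78, 88, 95, 73]

# NumPy: percentiles on a 0-100 scale; several can be requested at once
q1, q2, q3 = np.percentile(scores, [25, 50, 75])

# pandas: quantiles expressed as fractions between 0 and 1
quartiles = pd.Series(scores).quantile([0.25, 0.5, 0.75])

print(q1, q2, q3)
print(quartiles)
```

Neither function needs the input to be sorted first. Both sort internally.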
Quantiles are simply cut points in your data once it is sorted in ascending order. The first quartile (Q1) is the point below which 25% of the data falls, and the third quartile (Q3) is the point below which 75% of the data falls. The second quartile (Q2), or median, is the midpoint of the sorted data.
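To make the "cut point in sorted data" idea concrete, the sketch below computes a quantile by hand. It sorts the values, places the cut at position p * (n - 1), and interpolates linearly between the two neighboring values, which is how NumPy's default (linear) method works. quantile_linear() is a hypothetical helper written for illustration. Other quantile definitions exist, so hand calculations with a different method can give slightly different numbers.

```python
import numpy as np

def quantile_linear(values, p):
    """Hypothetical helper: quantile at fraction p (0 <= p <= 1) using
    linear interpolation between neighboring values of the sorted data."""
    data = sorted(values)
    pos = p * (len(data) - 1)              # fractional position of the cut
    lower = int(pos)                       # index of the value just below the cut
    upper = min(lower + 1, len(data) - 1)  # index of the value just above it
    frac = pos - lower                     # how far the cut sits between them
    return data[lower] + (data[upper] - data[lower]) * frac

# Hypothetical data, deliberately unsorted; sorted it is [3, 5, 7, 9, 12, 15, 20]
data = [12, 7, 3, 15, 9, 20, 5]

for p in (0.25, 0.5, 0.75):
    print(p, quantile_linear(data, p), np.percentile(data, p * 100))
```

With seven values, the median lands exactly on the middle element (9). Q1 and Q3 fall halfway between neighbors, giving 6.0 and 13.5.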
These values help us gauge the spread and skewness of our data. Let's consider a dataset of student scores:
