Welcome back! Our journey into Descriptive Statistics continues with Measures of Dispersion. These measures, including range, variance, and standard deviation, tell us how spread out our data is. We'll use Python's numpy and pandas libraries to paint a comprehensive picture of our data's dispersion. Let's dive right in!
Measures of Dispersion capture the spread within a dataset. For example, knowing the average test score (a Measure of Centrality) tells only part of the story; knowing how far the individual scores vary from that average gives a fuller picture. This understanding is vital in everyday data analysis.
This graph illustrates two normal distributions with different standard deviations. Standard deviation measures how far data points typically lie from the mean. Notice the width of each curve: the narrower one (blue curve) reflects a smaller standard deviation, where most of the data points are close to the mean. In contrast, the wider one (green curve) reflects a greater standard deviation, where data points vary more widely around the mean.
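To make the comparison concrete, here is a minimal sketch that draws two samples from normal distributions and compares their standard deviations. The seed, sample size, and scale values are assumptions chosen for illustration; they are not the values behind the graph.

```python
import numpy as np

# Hypothetical samples (illustrative only): same mean, different spread.
rng = np.random.default_rng(seed=0)
narrow = rng.normal(loc=0.0, scale=1.0, size=10_000)  # tighter around the mean
wide = rng.normal(loc=0.0, scale=3.0, size=10_000)    # more spread out

# np.std computes the population standard deviation by default (ddof=0).
narrow_std = np.std(narrow)
wide_std = np.std(wide)

print(f"narrow sample std: {narrow_std:.2f}")
print(f"wide sample std:   {wide_std:.2f}")
```

The wide sample's standard deviation should come out roughly three times that of the narrow one, mirroring the difference in curve width.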
The Range, simply the difference between the highest and lowest values, illustrates the spread between the extremes of our dataset. Python's numpy library has a function, ptp() (peak to peak), to calculate the range. Here are the test scores of five students:
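As a minimal sketch, assume the five scores below; they are hypothetical values made up for illustration. The example computes the range with numpy's ptp function and checks it against the manual max-minus-min calculation.

```python
import numpy as np

# Hypothetical test scores for five students (illustrative values).
scores = np.array([72, 88, 80, 96, 85])

# Peak to peak: the highest value minus the lowest value.
score_range = np.ptp(scores)

# The same result computed by hand, for comparison.
manual_range = scores.max() - scores.min()

print(score_range)   # 24
print(manual_range)  # 24
```

Because the range depends only on the two extreme values, a single unusually high or low score can change it substantially.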
