Descriptive Statistics in R: Diving Into Measures of Dispersion

Introduction and Overview

Welcome back! Our journey into Descriptive Statistics continues with Measures of Dispersion. These measures, which include the range, variance, and standard deviation, tell us how spread out our data is. R's built-in functions, such as range(), var(), and sd(), provide everything we need to quantify dispersion in our data. Let's dive right in!
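Before looking at each measure in turn, here is a small sketch of how the variance and standard deviation relate to each other in R. The vector x is hypothetical illustrative data. We compute the sample variance by hand and compare it with var() and sd(). Note that base R's var() and sd() divide by n - 1 (the sample version). Dividing by n instead would give the population variance, which is 4 for this data.

```r
# Hypothetical illustrative data (not from the lesson)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(x)
x_bar <- mean(x)  # the mean is 5

# Sample variance by hand: sum of squared deviations from the mean, divided by n - 1
var_manual <- sum((x - x_bar)^2) / (n - 1)  # 32 / 7, about 4.571

# Base R's var() uses the same n - 1 denominator
var_builtin <- var(x)

# The standard deviation is the square root of the variance
sd_builtin <- sd(x)

print(var_manual)
print(var_builtin)
print(sd_builtin)
print(sqrt(var_builtin))  # same value as sd(x)

# Population variance (divide by n) for comparison
print(sum((x - x_bar)^2) / n)  # 4
```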

Understanding Measures of Dispersion

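Measures of Dispersion capture the spread within a dataset. For example, knowing the average test score (a Measure of Centrality) isn't enough on its own. Two classes can share the same average even though one has scores clustered tightly around it and the other has scores ranging from very low to very high. Understanding how the scores vary around the average gives a fuller picture, which is why dispersion is a routine part of everyday data analysis.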
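To make this concrete, here is a minimal sketch with two hypothetical classes whose scores are invented for illustration. Both classes have the same mean. However, their deviations from that mean differ greatly, and that difference is exactly what measures of dispersion summarize.

```r
# Hypothetical test scores for two classes (illustrative data only)
class_a <- c(68, 69, 70, 71, 72)
class_b <- c(40, 55, 70, 85, 100)

# The centre is identical for both classes
print(mean(class_a))  # 70
print(mean(class_b))  # 70

# ...but the distances of individual scores from that centre are very different
print(class_a - mean(class_a))  # -2 -1 0 1 2
print(class_b - mean(class_b))  # -30 -15 0 15 30
```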

Visualizing Measures of Dispersion

The graph below illustrates two normal distributions with different standard deviations. The standard deviation measures how far data points typically lie from the mean. Compare the widths of the two curves. The narrower blue curve corresponds to a smaller standard deviation: most of its data points lie close to the mean. In contrast, the wider green curve reflects a larger standard deviation, indicating that data points are spread more widely around the mean.
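We can reproduce the idea behind the graph with simulated data. The figure's exact parameters are not stated, so this sketch assumes a mean of 0 and standard deviations of 1 (narrow) and 3 (wide). It draws a large sample from each distribution with rnorm() and compares their sd() values. It also compares the share of points falling within 1 unit of the mean. The exact printed numbers depend on the random draws, so they are not listed here. They should land close to 1 and 3 respectively.

```r
set.seed(42)  # make the random draws reproducible

# Two simulated samples with the same mean but different standard deviations
# (assumed values: sd = 1 for the narrow curve, sd = 3 for the wide curve)
narrow <- rnorm(10000, mean = 0, sd = 1)
wide   <- rnorm(10000, mean = 0, sd = 3)

# Sample standard deviations come out close to the generating values
print(sd(narrow))
print(sd(wide))

# Proportion of points within 1 unit of the sample mean:
# much larger for the narrow distribution
print(mean(abs(narrow - mean(narrow)) < 1))
print(mean(abs(wide - mean(wide)) < 1))

# Redraw the two theoretical density curves, in the spirit of the figure
curve(dnorm(x, mean = 0, sd = 1), from = -10, to = 10,
      col = "blue", ylab = "Density")
curve(dnorm(x, mean = 0, sd = 3), add = TRUE, col = "darkgreen")
```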

Calculating Range in R

The range, simply the difference between the highest and lowest values, describes the spread between the extremes of our dataset. Note that R's built-in range() function returns the two extremes themselves (the minimum and the maximum) rather than their difference. To get the range as a single number, we can use diff(range(x)) or max(x) - min(x). Here, we calculate the range of test scores for five students:
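The following sketch shows both steps. The five scores below are hypothetical values chosen for illustration, not the lesson's original data. range() gives the minimum and maximum, and diff() turns that pair into a single spread value.

```r
# Hypothetical test scores for five students (illustrative data only)
scores <- c(72, 85, 90, 64, 78)

# range() returns the minimum and maximum, not their difference
print(range(scores))  # 64 90

# The range as a single number: largest value minus smallest value
score_range <- diff(range(scores))
print(score_range)  # 26

# Equivalent calculation with max() and min()
print(max(scores) - min(scores))  # 26
```

Because the range depends only on the two most extreme values, a single unusually high or low score can change it substantially.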
