Are you ready for another captivating session? Today, we are taking a step further into the world of data visualization by learning how to use box plots. A box plot gives a snapshot of a dataset's distribution and highlights potential outliers, all in one plot!
Box plots are useful for understanding the Titanic dataset, particularly for exploring the relationships between survival, passenger class, and fare. They help us answer our central question: How did passenger class and fare correlate with survival?
A box plot, also known as a box-and-whisker plot, is a standardized way of displaying the distribution of data based on a five-number summary: the minimum, the first quartile, the sample median, the third quartile, and the maximum. The box shows the quartiles of the dataset, while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the interquartile range (by default in Seaborn, points more than 1.5 times the interquartile range beyond the quartiles).
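To make the five-number summary and the outlier rule concrete, here is a small sketch that uses only Python's standard library. The fares are made-up values for illustration, the function names are hypothetical, and the 1.5 × IQR cutoff is the common convention (it matches Seaborn's default `whis=1.5`), assumed here rather than taken from this lesson:

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of numbers."""
    data = sorted(values)
    # n=4 splits the data into quartiles; "inclusive" treats the data as the
    # whole population, so the minimum and maximum are the 0th and 100th
    # percentiles.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def iqr_outliers(values, whis=1.5):
    """Return the points lying more than whis * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1  # interquartile range: the height of the box
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return [v for v in values if v < lower or v > upper]


fares = [7, 8, 8, 10, 13, 15, 26, 30, 80]  # made-up fares, not Titanic data
print(five_number_summary(fares))  # (7, 8.0, 13.0, 26.0, 80)
print(iqr_outliers(fares))         # [80]
```

Here 80 lies above Q3 + 1.5 × IQR = 26 + 27 = 53, so a box plot would draw it as an individual point, and the upper whisker would stop at 30, the largest fare inside that limit.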
We can create a box plot using the boxplot() function in the Python Seaborn library. First, let's start with pclass (passenger class) against fare:
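A minimal sketch of that call, assuming the data is a pandas DataFrame with `pclass` and `fare` columns, as in the copy of the Titanic dataset bundled with Seaborn. The helper name `plot_fare_by_class` and the axis label text are hypothetical choices made for this example:

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_fare_by_class(df):
    """Draw one box of fares per passenger class and return the Axes."""
    fig, ax = plt.subplots()
    # x is the categorical variable (one box per class),
    # y is the numeric variable whose distribution each box summarizes.
    sns.boxplot(data=df, x="pclass", y="fare", ax=ax)
    ax.set_xlabel("Passenger class")
    ax.set_ylabel("Fare")
    return ax


# Usage (load_dataset downloads Seaborn's sample data, so it needs network access):
# titanic = sns.load_dataset("titanic")
# plot_fare_by_class(titanic)
# plt.show()
```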
In the box plot:
- The box represents the interquartile range (i.e., 25th to 75th percentile) of the fares in each passenger class.
