Diving into Box Plots: Passenger Class, Fare, and Survival

Are you ready for another session? Today, we take a step further into data visualization by learning how to use box plots. A box plot summarizes a dataset's distribution and flags potential outliers, all in one plot!

Box plots are well suited to exploring the Titanic dataset, particularly the relationships between survival, passenger class, and fare. They help us approach our central question: how did passenger class and fare relate to survival?

Introducing Box Plots

A box plot, also known as a box-and-whisker plot, is a standardized way of displaying a data distribution based on the five-number summary: the minimum, the first quartile, the median, the third quartile, and the maximum. The box shows the quartiles of the dataset, while the whiskers extend to show the rest of the distribution, except for points classified as “outliers” by a rule based on the interquartile range (IQR). By default, Seaborn treats points lying more than 1.5 × IQR beyond the box as outliers.
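To make the five-number summary concrete, here is a minimal sketch that computes it with Python's standard library and applies the common 1.5 × IQR rule for outliers. The fare values are hypothetical, chosen only for illustration, and the helper names `five_number_summary` and `outlier_fences` are our own. Quartiles can be computed in several slightly different ways; this sketch uses the "inclusive" linear-interpolation method.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    # method="inclusive" interpolates linearly between neighbouring values.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits beyond which points count as outliers."""
    iqr = q3 - q1  # interquartile range: the height of the box
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical fares, NOT taken from the Titanic dataset.
fares = [7, 8, 8, 10, 13, 15, 26, 30, 52, 80, 263]

minimum, q1, median, q3, maximum = five_number_summary(fares)
lower, upper = outlier_fences(q1, q3)
outliers = [f for f in fares if f < lower or f > upper]

print("five-number summary:", (minimum, q1, median, q3, maximum))
print("outlier fences:", (lower, upper))
print("outliers:", outliers)
```

For these sample values, the single very large fare falls above the upper fence, so a box plot would draw it as an individual point rather than extending the whisker to reach it.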

We can create a box plot using the boxplot() function in the Python Seaborn library. First, let's start with pclass (passenger class) against fare:
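The snippet below is a sketch of what such a call might look like, not necessarily the exact code used in this lesson. It assumes the column names `pclass` and `fare`, as in Seaborn's bundled Titanic dataset, and uses a small made-up DataFrame so that it runs without downloading anything. To plot the real data, you could replace the toy DataFrame with `sns.load_dataset("titanic")`, which needs an internet connection.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; omit this line in a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy stand-in data: the values are invented for illustration only.
# For the real dataset: titanic = sns.load_dataset("titanic")
titanic = pd.DataFrame({
    "pclass": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "fare": [30.0, 52.0, 80.0, 263.0, 10.5, 13.0, 26.0, 30.0,
             7.0, 8.0, 8.0, 15.0],
})

# One box per passenger class, with fare on the vertical axis.
ax = sns.boxplot(x="pclass", y="fare", data=titanic)
ax.set_xlabel("Passenger class (pclass)")
ax.set_ylabel("Fare")
ax.set_title("Fare by passenger class")

plt.savefig("fare_by_pclass.png")  # or plt.show() in an interactive session
```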

[Plot: box plot of fare by passenger class (pclass)]

In the box plot:

  • The box represents the interquartile range (i.e., 25th to 75th percentile) of the fares in each passenger class.