Welcome to our course, Intro to Data Visualization with Titanic, an in-depth exploration of data visualization techniques and methodologies using Python. This course is designed to give you comprehensive insights into real-world scenarios, helping you understand data visualization and its applications in today's data-driven world.
In the first lesson of this course, we will explore the detailed properties of the Titanic dataset available from Seaborn. The dataset contains demographic and travel information for 891 passengers, survivors and non-survivors alike, out of the roughly 2,200 people on board the Titanic.
Understanding the data we're working with is foundational in data analysis because it lets us gain better insights into it and spot potential errors. It also gives us a reliable basis for more intricate analysis later. How long this exploration takes depends on the characteristics of the dataset and on what we intend to learn from it.
So, let's dive in and explore the Titanic dataset to learn more about the people who sailed on the Titanic.
We shall begin our voyage into the dataset by understanding the various attributes of the Titanic dataset.
First, let's briefly go over the features of the Titanic dataset:
- `survived`: Whether the passenger survived (`0` = No; `1` = Yes).
- `pclass`: Passenger class (`1` = 1st; `2` = 2nd; `3` = 3rd).
- `sex`: Sex of the passenger (`male` or `female`).
- `age`: Age of the passenger (float number).
- `sibsp`: Number of siblings/spouses aboard.
- `parch`: Number of parents/children aboard.
- `fare`: Passenger fare (in British pounds).
- `embarked`: Port of Embarkation (`C` = Cherbourg; `Q` = Queenstown; `S` = Southampton).
- ... and more!
With these attributes in mind, let's familiarize ourselves with the Titanic dataset available in Seaborn.
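Loading the dataset might look like the following minimal sketch. It assumes Seaborn is installed and that its example datasets can be downloaded (they are fetched on first use and then cached). The variable name `titanic_df` matches the one used in this lesson.

```python
import seaborn as sns

# Load the Titanic example dataset that ships with Seaborn's sample data.
# Assumption: network access is available the first time this runs.
titanic_df = sns.load_dataset("titanic")

# Preview the first five rows (head() defaults to n=5).
print(titanic_df.head())
```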
The `head` command displays the first rows of the dataset as a table. Each row represents a different passenger on the ship, while each column corresponds to one of the features described above.
Our dataset (`titanic_df`) is a Pandas DataFrame, and it comes with many built-in functions that we can use to inspect the data:
- `head(n)`: Displays the first `n` entries of the DataFrame.
- `tail(n)`: Displays the last `n` entries of the DataFrame.
- `shape`: Returns the number of rows and columns of the DataFrame (an attribute, so it is used without parentheses).
- `info()`: Provides a concise summary of the DataFrame.
- `describe()`: Generates descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset's distribution.
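The inspection tools listed above can be tried out as in the sketch below. To keep it runnable without downloading anything, it uses `sample_df`, a hypothetical six-row stand-in with a few of the Titanic column names. Swapping in `titanic_df` gives the full-dataset results discussed in this lesson.

```python
import pandas as pd

# Hypothetical stand-in for titanic_df: a few illustrative rows only.
sample_df = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0, 0],
    "pclass": [3, 1, 3, 1, 3, 3],
    "sex": ["male", "female", "female", "female", "male", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0, float("nan")],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583],
    "embarked": ["S", "C", "S", "S", "S", "Q"],
})

print(sample_df.head(3))      # first three rows
print(sample_df.tail(2))      # last two rows
print(sample_df.shape)        # (rows, columns) tuple
sample_df.info()              # prints dtypes and non-null counts itself
print(sample_df.describe())   # summary statistics for the numeric columns
```

Note that `info()` prints its summary directly, so it does not need to be wrapped in `print`.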
Each of these functions offers a different perspective on the Titanic dataset. Calling them on `titanic_df` shows the following:
- The `head` command outputs the first five rows, as in the preview described earlier.
- The `tail` command outputs the last five rows of the DataFrame.
- The `shape` command returns `(891, 15)`, indicating the DataFrame has 891 rows and 15 columns.
- The `info` command prints a concise summary, including the number of non-null entries for each column.
- The `describe` command provides a statistics table for the DataFrame's numerical columns.
You will notice from these summaries that the dataset contains some missing values in features like `age` and `embarked`, something we will learn to handle in later lessons.
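One way to make the missing values explicit is to count them per column, as in this minimal sketch. It again uses `sample_df`, a hypothetical stand-in for `titanic_df` with one missing `age` and one missing `embarked` value planted for illustration.

```python
import pandas as pd

# Hypothetical stand-in for titanic_df with two planted missing values.
sample_df = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0, 0],
    "sex": ["male", "female", "female", "female", "male", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0, float("nan")],
    "embarked": ["S", "C", "S", None, "S", "Q"],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = sample_df.isnull().sum()
print(missing_per_column)
```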
The `value_counts()` function can also be quite helpful in understanding the distribution of categorical data. For example, to count how many male and female passengers are in the dataset, you could call it on the `sex` column.
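A minimal sketch of that call is shown here on `sample_df`, a hypothetical stand-in for `titanic_df`. The same line works on the real DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for titanic_df: only the column we need here.
sample_df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "male", "male"],
})

# Count how often each category appears, most frequent first.
sex_counts = sample_df["sex"].value_counts()
print(sex_counts)

# normalize=True returns proportions instead of raw counts.
sex_share = sample_df["sex"].value_counts(normalize=True)
print(sex_share)
```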
The `nunique()` and `unique()` functions could also come in handy to identify unique entries within your dataset. The former gives the count of unique entries, and the latter gives the actual unique entries.
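The difference between the two can be seen in this short sketch, again using a hypothetical `sample_df` in place of `titanic_df`.

```python
import pandas as pd

# Hypothetical stand-in for titanic_df with two categorical columns.
sample_df = pd.DataFrame({
    "pclass": [3, 1, 3, 1, 3, 3],
    "embarked": ["S", "C", "S", "S", "S", "Q"],
})

# nunique() returns how many distinct values a column holds.
n_ports = sample_df["embarked"].nunique()
n_classes = sample_df["pclass"].nunique()

# unique() returns the distinct values themselves, in order of first appearance.
ports = sample_df["embarked"].unique()

print(n_ports, n_classes)
print(ports)
```

By default, `nunique()` ignores missing values, while `unique()` includes them in its result, so the two can disagree by one on columns with gaps.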
These additional functions make your exploratory data analysis even more powerful!
Congratulations! You've now learned to explore and understand the basic features and characteristics of the Titanic dataset using Python and Pandas. We dove into the dataset's contents and got to know the Titanic passengers and their tragic journey. Today's deep dive sets the foundation for more advanced data visualizations.
In this lesson, we learned how to:
- Load a dataset using Seaborn.
- Explore the dataset using the various built-in functions provided by Pandas.
We encourage you to apply what you've learned in this beginner-friendly exploration. Take the time to explore the dataset further: check the missing values, investigate the descriptive statistics, and try out other Pandas functionality.
Good luck with your journey in data visualization! Happy sailing!
