Welcome to the next guide on our remarkable voyage! Moving into our discussion on multivariate data visualization, we'll introduce you to scatter plots, one of the most powerful tools for visualizing the relationship between multiple variables. We'll guide you through plotting scatter plots for different variable pairs in our Titanic
dataset and stepping further into correlating these variables.
Why is it important to understand the correlation among variables? Imagine, we want to know whether passengers in the higher classes were more likely to survive. Or maybe we are interested in the fare paid for a ticket correlates with the survival on Titanic
. Finding correlations among variables will help us generate hypotheses, create insightful visualizations, and eventually enable efficient predictive modeling.
By the end of this lesson, you'll be conversant with how scatter plots and correlation techniques can be used to explore and visualize relationships between different features present in a multivariate dataset.
A scatter plot is a versatile visualization tool that can disclose the relationship, if any exists, between two variables. Each point on the plot represents an observation in the dataset, with its position along the X and Y axes representing the values of two variables.
Let's initiate with a scatter plot depicting the relationship between age
and fare
.
