Hello and welcome! In today's lesson, we will focus on visualizing the distribution of diamond prices using histograms and Kernel Density Estimates (KDE). This visualization is a crucial part of Exploratory Data Analysis (EDA) and helps us uncover patterns in our data.
By the end of this lesson, you will be able to create a histogram, overlay it with a KDE, and interpret the resulting visualization effectively.
A histogram is a type of bar plot that groups data points into specified ranges (bins) and then displays the number of points that fall into each bin. This makes histograms useful for understanding the distribution, central tendency, and variability of your data. Here is a simple example, with the corresponding figure below:
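A minimal sketch of such a histogram, assuming synthetic right-skewed values in place of the real diamond prices (the names `prices`, `counts`, and `edges` are illustrative, not taken from the dataset):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for diamond prices (assumption: right-skewed, positive values).
rng = np.random.default_rng(seed=0)
prices = rng.lognormal(mean=8.0, sigma=0.8, size=1000)

fig, ax = plt.subplots(figsize=(8, 5))
# hist() returns the per-bin counts, the bin edges, and the drawn bar patches.
counts, edges, patches = ax.hist(prices, bins=30, edgecolor="black")
ax.set_xlabel("Price")
ax.set_ylabel("Number of diamonds")
ax.set_title("Distribution of synthetic diamond prices")
# plt.show()  # uncomment to display the figure interactively
```

With 30 bins there are 31 bin edges, and the counts add up to the number of data points, because by default the bins span the full range of the data.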
The hist() function in Matplotlib takes several parameters:
- x: The data array for which the histogram will be generated.
- bins: The number of intervals the data range is divided into. It can also be a sequence defining the bin edges.
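- range: A (lower, upper) pair giving the limits of the bins. Values outside this range are ignored; by default, the range spans the minimum and maximum of the data.
- density: If True, it normalizes the histogram to form a probability density, so that the total area of the bars equals 1.
- cumulative: If True, it computes a cumulative histogram, where each bin holds its own count plus the counts of all bins for smaller values.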
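The parameters above can be sketched in a short example. The data here is again synthetic, and the specific bin counts, edges, and range limits are arbitrary choices for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)
prices = rng.lognormal(mean=8.0, sigma=0.8, size=1000)  # synthetic stand-in

fig, (ax_density, ax_cumulative) = plt.subplots(1, 2, figsize=(12, 4))

# bins + range + density: 20 equal-width bins between 0 and 10,000.
# Values outside the range are ignored, and the bar areas sum to 1.
dens, dens_edges, _ = ax_density.hist(
    prices, bins=20, range=(0, 10_000), density=True
)
ax_density.set_title("density=True")

# bins as explicit edges + cumulative: each bar also includes all lower bins.
edges = [0, 1_000, 2_500, 5_000, 10_000, 20_000]
cum, cum_edges, _ = ax_cumulative.hist(prices, bins=edges, cumulative=True)
ax_cumulative.set_title("cumulative=True")

# Total bar area under density=True (bar height times bar width, summed).
area = float(np.sum(dens * np.diff(dens_edges)))
```

Passing a sequence to `bins` allows unequal bin widths, which can help with skewed data such as prices, where most values cluster at the low end.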
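A Kernel Density Estimate (KDE) is a smooth estimate of the probability density function of a continuous variable. Unlike a histogram, whose shape depends on the number and placement of the bins, a KDE produces a smooth curve representing the data distribution, as presented below. This can make features such as peaks and skew easier to see, although the result depends on how much smoothing (the bandwidth) is applied.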
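To make the idea concrete, here is a minimal sketch that computes a Gaussian KDE by hand with NumPy and draws it over a density-normalized histogram. The synthetic data, the helper name `gaussian_kde_curve`, and the use of Silverman's rule of thumb for the bandwidth are all assumptions made for illustration. In practice, ready-made versions such as seaborn's `histplot(..., kde=True)` or SciPy's `scipy.stats.gaussian_kde` are likely more convenient.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def gaussian_kde_curve(data, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth`, one centered on each data point."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth


rng = np.random.default_rng(seed=2)
prices = rng.lognormal(mean=8.0, sigma=0.8, size=1000)  # synthetic stand-in

# Silverman's rule of thumb for the bandwidth (one common default, not the only one).
q25, q75 = np.percentile(prices, [25, 75])
spread = min(prices.std(ddof=1), (q75 - q25) / 1.34)
bandwidth = 0.9 * spread * len(prices) ** (-1 / 5)

# Evaluate the KDE on a grid slightly wider than the data.
grid = np.linspace(prices.min() - 4 * bandwidth, prices.max() + 4 * bandwidth, 500)
density = gaussian_kde_curve(prices, grid, bandwidth)

fig, ax = plt.subplots(figsize=(8, 5))
# density=True puts the histogram on the same scale as the KDE curve.
ax.hist(prices, bins=30, density=True, alpha=0.5, edgecolor="black")
ax.plot(grid, density, linewidth=2)
ax.set_xlabel("Price")
ax.set_ylabel("Density")

# Sanity check: a density should integrate to roughly 1 (simple Riemann sum).
total_area = float(np.sum(density) * (grid[1] - grid[0]))
```

A smaller bandwidth gives a wigglier curve that follows individual points more closely, while a larger one smooths over detail, much as bin width does for a histogram.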
