Introduction to LDA and Its Role in Supervised Learning

Welcome to the world of Linear Discriminant Analysis (LDA), a technique widely used for dimensionality reduction in supervised learning. In this lesson, we're delving into LDA and building a Python implementation. We'll use the popular Iris dataset for a practical example.

Understanding the Algorithm of LDA

LDA reduces dimensionality by constructing a feature space that optimally separates the classes in the data. The axes of this space are linear combinations of the original features, obtained as eigenvectors of a matrix built from the data's scatter matrices. The LDA algorithm consists of several steps: calculating class mean vectors and scatter matrices, then finding the eigenvalues and eigenvectors that map the original feature space onto a lower-dimensional space.

The Inner Workings of LDA: Mathematics and Intuition

To understand LDA, let's start with a simple two-dimensional example. Suppose we have data points scattered across a 2-D space, with each point belonging to one of two possible classes.

In LDA, we want to project these points onto a line such that when the points are projected, the two classes are as separated as possible. The goal here is twofold:

  1. Maximize the distance between the means of the two classes.
  2. Minimize the variation (in other words, the scatter) within each category.

This forms the intuition behind LDA. The crux of an LDA transformation involves formulating a weight matrix and transforming our input data by multiplying it with this weight matrix. The weights here help in increasing class separability. Let's see how we can achieve this mathematically using scatter matrices.
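
One standard way to combine these two goals into a single objective is the Fisher criterion, which LDA maximizes with respect to the projection direction $w$:

$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$

Here $S_B$ and $S_W$ are the between-class and within-class scatter matrices introduced below: maximizing $J(w)$ pushes the projected class means apart (large numerator) while keeping each projected class compact (small denominator).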

Scatter Matrices: Capturing Variability

Scatter matrices are a critical part of LDA. To understand them, we need to differentiate between two types of scatter matrices:

  1. The within-class scatter matrix measures the variability of data points within each class.
  2. The between-class scatter matrix captures the variability between different classes.

Mathematically, the within-class scatter matrix $S_W$ is computed as the sum of the scatter matrices for each class, and the between-class scatter matrix $S_B$ is calculated based on the means of the different classes.

The LDA Algorithm: A Step-by-Step Breakdown

Given the mathematical foundations of within-class and between-class scatter matrices, the steps involved in the LDA algorithm can be conceptualized as:

  1. Compute class means: Calculate the mean vector of each class.
  2. Compute the within-class scatter matrix $S_W$ and the between-class scatter matrix $S_B$.
  3. Compute the weights that maximize separability: In the two-class case, this amounts to multiplying the inverse of $S_W$ by the difference between the class means; more generally, the weights are the leading eigenvectors of $S_W^{-1} S_B$ (see the sketch after this list).
  4. Use the weights to transform our input data.
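
As a rough illustration of step 3 in the two-class setting, here is a minimal sketch with synthetic data (the array names and class centers are made up for this example):

```python
import numpy as np

# Two synthetic 2-D classes (made-up data for illustration only)
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))  # class 1
X2 = rng.normal(loc=[2.0, 1.0], scale=0.5, size=(50, 2))  # class 2

# Step 1: class means
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Step 2: within-class scatter matrix
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Step 3: Fisher direction that maximizes class separability
w = np.linalg.inv(S_W) @ (m1 - m2)

# Step 4: project the data onto w
z1, z2 = X1 @ w, X2 @ w
```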

Preparing the Data for LDA

Let's load the Iris dataset for LDA. It is a 3-class dataset with 4 features (sepal length, sepal width, petal length, and petal width).
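
A minimal sketch of this step, assuming scikit-learn is available and using standardization (zero mean, unit variance per feature) as the normalization, might look like this:

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the Iris dataset: 150 samples, 4 features, 3 classes
iris = load_iris()
X, y = iris.data, iris.target

# Normalize each feature to zero mean and unit variance
X = (X - X.mean(axis=0)) / X.std(axis=0)
```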

The code above loads the Iris dataset and normalizes the data. Normalization puts the input variables on a comparable scale before they are fed into the machine learning algorithm.

Building Simple LDA from Scratch - Part 1: Defining the Class and its Constructor

To build our LDA from scratch, we start by defining a Python class called LDA. The class has two methods for calculating within-class and between-class scatter matrices, and finally, a fit method for calculating the transformation matrix.

Each method of the LDA class takes self as its first parameter so that the instance's variables and methods can be accessed inside the class. We define the __init__ method, which Python calls when you create a new instance of the class. This method sets up the object's state by defining the attribute W, which will hold the transformation matrix.
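
A sketch of the class definition and its constructor might look like this (only the attribute W is assumed here; the scatter-matrix and fit methods are filled in below):

```python
import numpy as np

class LDA:
    def __init__(self):
        # W will hold the transformation (projection) matrix learned by fit
        self.W = None
```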

Building Simple LDA from Scratch - Part 2: Computing Within-class Scatter Matrix

The within-class scatter matrix is a crucial component of the LDA algorithm: it encapsulates the total scatter within each class. To calculate it, we first compute the scatter matrix for each class and then add them all up to obtain the within-class scatter matrix, $S_W$.

Mathematically, the within-class scatter matrix is computed via:

$$S_W = \sum_{i=1}^{g} \sum_{j=1}^{N_i} (x_{ij} - m_i)(x_{ij} - m_i)^T$$

where $g$ is the number of classes, $N_i$ is the number of samples in class $i$, $x_{ij}$ is the $j$-th sample of class $i$, and $m_i$ is the mean vector of class $i$. The inner sum is the scatter matrix of a single class; the outer sum adds these per-class scatter matrices together.
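
One possible implementation of this step as a method of the LDA class from the sketch above (the method name _within_class_scatter is an illustrative choice) is:

```python
    def _within_class_scatter(self, X, y):
        n_features = X.shape[1]
        S_W = np.zeros((n_features, n_features))
        # Add up each class's scatter around its own mean vector
        for c in np.unique(y):
            X_c = X[y == c]
            m_c = X_c.mean(axis=0)
            S_W += (X_c - m_c).T @ (X_c - m_c)
        return S_W
```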

Building Simple LDA from Scratch - Part 3: Computing Between-class Scatter Matrix

Another pivotal component in the LDA algorithm is the between-class scatter matrix. Unlike the within-class scatter matrix, which characterizes the variance within individual classes, the between-class scatter matrix is all about capturing the variance between different classes—hence the name.

To compute the between-class scatter matrix, we need to look at the means, or central tendencies, of our different classes. Just as in our example from the previous section—children with a preference for chocolate versus vanilla flavor—if we measure the average preference strength for chocolate among all chocolate-loving children and do the same for the vanilla-loving children, we'll have the two class mean vectors.

The between-class scatter matrix computes the covariance of these class means. Simply put, it measures how different the two classes are from each other in terms of their central tendencies.

Just imagine a scatter on a graph: Points that represent individual children are scattered around the mean point (average taste preference strength) for their class—either chocolate or vanilla. The scatter within class captures how far apart these children are spread out within their flavor preference.

Now think about the means of both classes on this graph: Are the lovers of chocolate, in general (on average), similar in taste strength to the lovers of vanilla, or are they different? The between-class scatter captures this difference—the scatter among the class means.

In other words, the between-class scatter matrix captures the 'distance' between the two classes—in our example, the distance between chocolate-lovers and vanilla-lovers, based on their average preference strength.

Mathematically, the between-class scatter matrix $S_B$ is calculated as follows:

$$S_B = \sum_{i=1}^{g} N_i (m_i - m)(m_i - m)^T$$

where $m_i$ is the mean vector of class $i$, $N_i$ is the number of samples in class $i$, and $m$ is the overall mean of all samples.
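
Continuing the class sketch, a possible _between_class_scatter method (again, the name is an illustrative choice) is:

```python
    def _between_class_scatter(self, X, y):
        n_features = X.shape[1]
        overall_mean = X.mean(axis=0)
        S_B = np.zeros((n_features, n_features))
        # Weight each class mean's deviation from the overall mean by the class size
        for c in np.unique(y):
            X_c = X[y == c]
            n_c = X_c.shape[0]
            mean_diff = (X_c.mean(axis=0) - overall_mean).reshape(-1, 1)
            S_B += n_c * (mean_diff @ mean_diff.T)
        return S_B
```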

Building Simple LDA from Scratch - Part 4: Eigenvalues, Eigenvectors, and Subspace Transformation

Now that we've calculated the scatter matrices, we can write the fit method. This method finds the generalized eigenvalues and corresponding eigenvectors, and selects a subset of the eigenvectors to form the transformation matrix.

Additionally, we define a transform method that performs the final projection:
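
A sketch of both methods, under the same assumptions as the class sketch above, might be:

```python
    def fit(self, X, y, n_components):
        S_W = self._within_class_scatter(X, y)
        S_B = self._between_class_scatter(X, y)

        # Solve the eigenvalue problem for S_W^{-1} S_B
        eigenvalues, eigenvectors = np.linalg.eig(np.linalg.inv(S_W) @ S_B)

        # Sort eigenvectors by decreasing eigenvalue and keep the top n_components
        order = np.argsort(eigenvalues.real)[::-1]
        self.W = eigenvectors.real[:, order[:n_components]]
        return self

    def transform(self, X):
        # Project the samples onto the learned subspace
        return X @ self.W
```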

Let's understand the implementation step by step. The heart of the LDA algorithm lies in solving an eigenvalue problem: after computing the scatter matrices $S_W$ and $S_B$, we solve the (generalized) eigenvalue problem for the matrix $S_W^{-1} S_B$, which, in the sketch above, corresponds to the following line of the fit method:
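
```python
# Eigen-decomposition of S_W^{-1} S_B (from the fit sketch above)
eigenvalues, eigenvectors = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
```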

Transforming the Samples onto the New Subspace

Next, we create an instance of the LDA class and call the fit method, passing in our original feature space, the target variable, and the number of components we need for the transformation.

We then project the dataset from the original feature space onto the new subspace. Afterward, we plot the projected samples.
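
Putting the pieces together, and assuming X and y are the standardized Iris features and labels from the earlier sketch, this step might look like:

```python
# Fit LDA and project the 4-D Iris features onto a 2-D subspace
lda = LDA()
lda.fit(X, y, n_components=2)
X_lda = lda.transform(X)
```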

Finally, we can plot the transformed data:
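
One way to plot the projection, assuming matplotlib is available and reusing iris, y, and X_lda from the sketches above, is:

```python
import matplotlib.pyplot as plt

# Scatter plot of the two LDA components, colored by class
for c, name in zip(np.unique(y), iris.target_names):
    plt.scatter(X_lda[y == c, 0], X_lda[y == c, 1], label=name)
plt.xlabel('LDA Component 1')
plt.ylabel('LDA Component 2')
plt.legend()
plt.show()
```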

[Scatter plot of the Iris samples projected onto the two LDA components, colored by class]

Lesson Summary and Upcoming Practice

Great job! You've successfully understood the concepts of LDA and its mathematical foundation, and built a simple LDA from scratch using Python. Now, get ready for practice exercises to apply your newly acquired knowledge and consolidate learning! Happy learning!
