Introduction to the Lesson

Let's embark on another captivating adventure within our Intro to Unsupervised Machine Learning course. We have already delved into key topics, including unsupervised machine learning techniques, the concept of clustering with k-means, and the secrets of dimensionality reduction methods such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA).

This lesson introduces another critical dimensionality reduction tool: t-Distributed Stochastic Neighbor Embedding (t-SNE). This advanced technique offers an impressive way to visualize high-dimensional data by minimizing the divergence between two probability distributions: one that models pairwise similarities between points in the original high-dimensional space, and one that models them in the corresponding low-dimensional space.
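
For reference, this objective can be written as the Kullback-Leibler divergence between the high-dimensional similarity distribution P and the low-dimensional similarity distribution Q (we will unpack how the individual similarities are computed later in the lesson):

$$
C = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}
$$

Here, each term p_ij measures how similar points i and j are in the original space, and q_ij measures their similarity in the low-dimensional embedding; t-SNE adjusts the embedded points to make Q match P as closely as possible.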

Our primary objective in this lesson is to provide you with an in-depth understanding of the mechanism and theory underlying the t-SNE algorithm. Using hands-on examples, we will transition from theory to practice and implement it in Python using the scikit-learn library. To keep things consistent, we will continue using the Iris dataset, a popular dataset in machine learning. Now, let's delve into the fascinating world of t-SNE.
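
As a quick preview of where we are headed, here is a minimal sketch of what running t-SNE on the Iris dataset with scikit-learn could look like. The specific settings (such as perplexity=30 and random_state=42) are illustrative choices, not prescribed values; we will discuss them in more detail as we go.

```python
# Minimal sketch: embed the 4-dimensional Iris data into 2D with t-SNE
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Load the Iris feature matrix (4 features per flower) and the species labels
iris = load_iris()
X, y = iris.data, iris.target

# Reduce the 4 features to 2 dimensions; perplexity and random_state are example settings
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_embedded = tsne.fit_transform(X)

# Scatter-plot the 2D embedding, coloring points by their Iris species
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y)
plt.xlabel("t-SNE dimension 1")
plt.ylabel("t-SNE dimension 2")
plt.title("Iris dataset embedded with t-SNE")
plt.show()
```

Even in this short sketch, you can see the typical workflow: load the data, fit the t-SNE model to obtain a 2D embedding, and visualize the result.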

Introduction to t-Distributed Stochastic Neighbor Embedding

Visualizing high-dimensional data can be quite challenging. Imagine plotting points in a space with more than three dimensions - it's almost impossible for our human brains to comprehend! However, t-SNE, a non-linear dimensionality reduction technique, comes to our rescue. t-SNE is particularly great for visualizing high-dimensional datasets in a 2D or even 3D space.

This method was developed by Laurens van der Maaten and Geoffrey Hinton in 2008. Simply put, t-SNE maps high-dimensional data points to a lower-dimensional space (2D or 3D). Fascinatingly, it keeps similar data points close together and dissimilar data points far apart in this lower-dimensional space. Neat, right?
