Welcome to the lesson on K-Nearest Neighbors (KNN)! Today, we'll explore this simple yet intuitive algorithm, which is used for both classification and regression tasks. Our goal is to understand how KNN works and implement it in Python using Scikit-Learn. By the end, you'll be able to classify data points based on their features.
What is KNN? Imagine trying to identify a fruit as an apple or an orange. Instead of looking it up in a reference, you ask the people nearest to you for their opinions, and the majority wins. This is the idea behind KNN: classify a data point based on the classes of its nearest neighbors.
Let's take a look at an example:
In this image, the target point (black cross) is the point whose class we want to predict. Its three nearest neighbors are two red points and one green point. Since the majority of those neighbors are red, the target point will also be classified as red.
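The three-neighbor vote described above can be sketched with Scikit-Learn. The coordinates and class names below are made up for illustration; they are not the points from the image:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy 2D points: two "red" points sit near the target, one "green" point
# is also close, and two more "green" points are far away.
X = [[1.0, 1.0], [1.5, 1.2], [1.2, 2.0], [5.0, 5.0], [6.0, 6.0]]
y = ["red", "red", "green", "green", "green"]

# Classify by majority vote among the 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# The target's three nearest neighbors are red, red, and green,
# so the majority vote labels it red.
target = [[1.2, 1.2]]
print(knn.predict(target))  # → ['red']
```

With `n_neighbors=3`, the prediction is simply the most common label among the three closest training points, exactly the show-of-hands vote from the fruit analogy.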
Why use KNN? It's easy to understand and implement, and it's useful in practice, for example in recommending products or recognizing patterns in medical data.
Let's load the Iris dataset, which contains information about different flowers. Here's how we do it using Scikit-Learn:
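A minimal sketch of that loading step (the variable names `X` and `y` follow the usual Scikit-Learn convention):

```python
from sklearn.datasets import load_iris

# Load the Iris dataset bundled with Scikit-Learn.
iris = load_iris()

# X holds the four measurements per flower; y holds the class labels (0, 1, 2).
X = iris.data
y = iris.target

print(X.shape)  # → (150, 4): 150 flowers, 4 features each
```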
This code loads the Iris dataset and splits it into features `X` and labels `y`. Each flower is described by four measurements: Sepal Length, Sepal Width, Petal Length, and Petal Width. Our goal is to predict the type of the flower: Setosa, Versicolour, or Virginica.
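Putting the pieces together, here is a short sketch of training and evaluating a KNN classifier on the Iris data. The split proportions, `random_state`, and `n_neighbors=3` are illustrative choices, not values prescribed by the lesson:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the data and hold out 30% of the flowers as a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit a 3-nearest-neighbor classifier on the training flowers.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Score it on the held-out flowers.
accuracy = accuracy_score(y_test, knn.predict(X_test))
print(f"Accuracy: {accuracy:.2f}")
```

Each prediction classifies a test flower by the majority class among its three nearest training flowers, measured in the four-dimensional feature space of sepal and petal sizes.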
