KNN Algorithm – K-Nearest Neighbors Classifiers and Model Example

The K-Nearest Neighbors (K-NN) algorithm is a popular Machine Learning algorithm used mostly for solving classification problems.

In this article, you'll learn how the K-NN algorithm works with practical examples.

We'll use diagrams as well as sample data to show how you can classify data using the K-NN algorithm. We'll also discuss the advantages and disadvantages of using the algorithm.

How Does the K-Nearest Neighbors Algorithm Work?

The K-NN algorithm compares a new data entry to the values in a given data set (with different classes or categories).

Based on how close or similar it is to a chosen number (K) of neighbors, the algorithm assigns the new data entry to a class or category in the data set (the training data).

Let's break that down into steps:

Step #1 - Assign a value to K.

Step #2 - Calculate the distance between the new data entry and all other existing data entries (you'll learn how to do this shortly). Arrange them in ascending order.

Step #3 - Find the K nearest neighbors to the new entry based on the calculated distances.

Step #4 - Assign the new data entry to the majority class among its K nearest neighbors.

Don't worry if the steps above seem confusing at the moment. The examples in the sections that follow will help you understand better.
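
If it helps to see the steps as code, here's a minimal Python sketch of the whole procedure (the function and variable names are our own, not from any particular library):

```python
import math
from collections import Counter

def knn_classify(new_point, dataset, k):
    """Classify new_point by majority vote among its k nearest neighbors.

    dataset is a list of (features, label) pairs, where features is a
    tuple of numbers such as (brightness, saturation).
    """
    # Step #2: calculate the distance from the new entry to every
    # existing entry, and arrange the results in ascending order.
    distances = sorted(
        (math.dist(new_point, features), label)
        for features, label in dataset
    )
    # Step #3: keep only the K nearest neighbors.
    nearest = distances[:k]
    # Step #4: assign the new entry to the majority class.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Step #1 is simply the value you pass in as the k argument.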

K-Nearest Neighbors Classifiers and Model Example With Diagrams

With the aid of diagrams, this section will help you understand the steps listed in the previous section.

Consider the diagram below:

[Diagram 1: a scatter plot of the data set, with points in two classes, red and blue]

The graph above represents a data set consisting of two classes — red and blue.

[Diagram 2: the same plot with a new, unclassified green point added]

A new data entry has been introduced to the data set. This is represented by the green point in the graph above.

We'll then assign a value to K which denotes the number of neighbors to consider before classifying the new data entry. Let's assume the value of K is 3.

[Diagram 3: the 3 nearest neighbors to the green point highlighted]

Since the value of K is 3, the algorithm will only consider the 3 nearest neighbors to the green point (new entry). This is represented in the graph above.

Out of the 3 nearest neighbors in the diagram above, the majority class is red, so the new entry will be assigned to that class.

[Diagram 4: the green point now colored red, matching the majority class]

The new data entry has been classified as red.

K-Nearest Neighbors Classifiers and Model Example With Data Set

In the last section, we saw an example of the K-NN algorithm using diagrams. But we didn't discuss how to calculate the distance between the new entry and the other values in the data set.

In this section, we'll dive a bit deeper. Along with the steps followed in the last section, you'll learn how to calculate the distance between a new entry and other existing values using the Euclidean distance formula.

Note that you can also calculate the distance using the Manhattan and Minkowski distance formulas.
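
For two points (X₁, Y₁) and (X₂, Y₂), those alternatives look like this (the Minkowski distance generalizes the other two: p = 1 gives Manhattan, p = 2 gives Euclidean):

Manhattan distance: |X₂ - X₁| + |Y₂ - Y₁|
Minkowski distance: (|X₂ - X₁|ᵖ + |Y₂ - Y₁|ᵖ)^(1/p)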

Let's get started!

Brightness | Saturation | Class
40         | 20         | Red
50         | 50         | Blue
60         | 90         | Blue
10         | 25         | Red
70         | 70         | Blue
60         | 10         | Red
25         | 80         | Blue

The table above represents our data set. We have two columns — Brightness and Saturation. Each row in the table has a class of either Red or Blue.

Before we introduce a new data entry, let's assume the value of K is 5.

How to Calculate Euclidean Distance in the K-Nearest Neighbors Algorithm

Here's the new data entry:

Brightness | Saturation | Class
20         | 35         | ?

We have a new entry but it doesn't have a class yet. To know its class, we have to calculate the distance from the new entry to other entries in the data set using the Euclidean distance formula.

Here's the formula: √((X₂ - X₁)² + (Y₂ - Y₁)²)

Where:

  • X₂ = the new entry's brightness (20).
  • X₁ = the existing entry's brightness.
  • Y₂ = the new entry's saturation (35).
  • Y₁ = the existing entry's saturation.
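
Before we work through the numbers by hand, here's the same formula as a small Python function (a minimal sketch; the function name is our own):

```python
import math

def euclidean_distance(x1, y1, x2, y2):
    """Straight-line distance between points (x1, y1) and (x2, y2)."""
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

# Distance from the first existing entry (40, 20) to the new entry (20, 35):
print(euclidean_distance(40, 20, 20, 35))  # 25.0
```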

Let's do the calculation together. I'll calculate the first three.

Distance #1

For the first row, d1:

Brightness | Saturation | Class
40         | 20         | Red

d1 = √((20 - 40)² + (35 - 20)²)
   = √(400 + 225)
   = √625
   = 25

We now know the distance from the new data entry to the first entry in the table. Let's update the table.

Brightness | Saturation | Class | Distance
40         | 20         | Red   | 25
50         | 50         | Blue  | ?
60         | 90         | Blue  | ?
10         | 25         | Red   | ?
70         | 70         | Blue  | ?
60         | 10         | Red   | ?
25         | 80         | Blue  | ?

Distance #2

For the second row, d2:

Brightness | Saturation | Class | Distance
50         | 50         | Blue  | ?

d2 = √((20 - 50)² + (35 - 50)²)
   = √(900 + 225)
   = √1125
   ≈ 33.54

Here's the table with the updated distance:

Brightness | Saturation | Class | Distance
40         | 20         | Red   | 25
50         | 50         | Blue  | 33.54
60         | 90         | Blue  | ?
10         | 25         | Red   | ?
70         | 70         | Blue  | ?
60         | 10         | Red   | ?
25         | 80         | Blue  | ?

Distance #3

For the third row, d3:

Brightness | Saturation | Class | Distance
60         | 90         | Blue  | ?

d3 = √((20 - 60)² + (35 - 90)²)
   = √(1600 + 3025)
   = √4625
   ≈ 68.01

Updated table:

Brightness | Saturation | Class | Distance
40         | 20         | Red   | 25
50         | 50         | Blue  | 33.54
60         | 90         | Blue  | 68.01
10         | 25         | Red   | ?
70         | 70         | Blue  | ?
60         | 10         | Red   | ?
25         | 80         | Blue  | ?

At this point, you should understand how the calculation works. Attempt to calculate the distance for the last four rows.

Here's what the table will look like after all the distances have been calculated:

Brightness | Saturation | Class | Distance
40         | 20         | Red   | 25
50         | 50         | Blue  | 33.54
60         | 90         | Blue  | 68.01
10         | 25         | Red   | 14.14
70         | 70         | Blue  | 61.03
60         | 10         | Red   | 47.17
25         | 80         | Blue  | 45.28
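
If you'd like to double-check your arithmetic, this snippet recomputes every distance in the table, reusing the euclidean_distance function defined earlier:

```python
rows = [
    (40, 20, "Red"), (50, 50, "Blue"), (60, 90, "Blue"), (10, 25, "Red"),
    (70, 70, "Blue"), (60, 10, "Red"), (25, 80, "Blue"),
]

# Distance from each existing entry to the new entry (20, 35).
for brightness, saturation, label in rows:
    d = euclidean_distance(brightness, saturation, 20, 35)
    print(f"({brightness}, {saturation}) {label}: {d:.2f}")
```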

Let's rearrange the distances in ascending order:

Brightness | Saturation | Class | Distance
10         | 25         | Red   | 14.14
40         | 20         | Red   | 25
50         | 50         | Blue  | 33.54
25         | 80         | Blue  | 45.28
60         | 10         | Red   | 47.17
70         | 70         | Blue  | 61.03
60         | 90         | Blue  | 68.01

Since we chose 5 as the value of K, we'll only consider the first five rows. That is:

Brightness | Saturation | Class | Distance
10         | 25         | Red   | 14.14
40         | 20         | Red   | 25
50         | 50         | Blue  | 33.54
25         | 80         | Blue  | 45.28
60         | 10         | Red   | 47.17

As you can see above, the majority class within the 5 nearest neighbors to the new entry is Red (three Red neighbors to two Blue). Therefore, we'll classify the new entry as Red.
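
Here's the same sort-and-vote step expressed in code, using the distances we calculated above:

```python
from collections import Counter

# (class, distance) pairs from the table above.
distances = [
    ("Red", 25.0), ("Blue", 33.54), ("Blue", 68.01), ("Red", 14.14),
    ("Blue", 61.03), ("Red", 47.17), ("Blue", 45.28),
]

k = 5
# Sort by distance and keep the K nearest neighbors.
nearest = sorted(distances, key=lambda pair: pair[1])[:k]
votes = Counter(label for label, _ in nearest)
print(votes)                       # Counter({'Red': 3, 'Blue': 2})
print(votes.most_common(1)[0][0])  # Red
```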

Here's the updated table:

Brightness | Saturation | Class
40         | 20         | Red
50         | 50         | Blue
60         | 90         | Blue
10         | 25         | Red
70         | 70         | Blue
60         | 10         | Red
25         | 80         | Blue
20         | 35         | Red

How to Choose the Value of K in the K-NN Algorithm

There is no single rule for choosing the value of K, but here are some common conventions to keep in mind:

  • A very low value (such as K = 1) makes the model sensitive to noise and outliers, and will most likely lead to inaccurate predictions.
  • K = 5 is a commonly used starting value.
  • Prefer an odd value of K so that the majority vote can't end in a tie (at least in two-class problems). For a more systematic approach, see the sketch below.
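
A more systematic approach is to try a range of K values and keep the one with the best cross-validated accuracy. Here's a sketch using scikit-learn; the synthetic data from make_classification is just a stand-in for your own features and labels:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; replace with your own features and labels.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

scores = {}
for k in range(1, 20, 2):  # odd values of K only, to avoid tied votes
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```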

Advantages of K-NN Algorithm

  • It is simple to implement and easy to understand.
  • There is no explicit training phase: the algorithm simply stores the data set and defers all computation until a new entry needs to be classified.

Disadvantages of K-NN Algorithm

  • Classification can be computationally expensive on large data sets, since the distance to every stored entry has to be calculated for each new prediction.
  • A lot of memory is required, because the entire data set has to be kept available at prediction time.
  • Choosing the right value of K can be tricky.

Summary

In this article, we talked about the K-Nearest Neighbors algorithm. It is often used for classification problems.

We saw an example using diagrams to explain how the algorithm works.

We also saw an example using sample data to see the steps involved in classifying a new data entry.

Lastly, we discussed the advantages and disadvantages of the algorithm, and how you can choose the value of K.

Happy coding!
