Transcript for:
Understanding K-Nearest Neighbors Algorithm

k-nearest neighbors is a super simple supervised machine learning algorithm that can be used for both classification and regression problems. Here is a simple two-dimensional example to help you better understand this algorithm. Let's say we want to classify a given point into one of three groups.

In order to find the k nearest neighbors of the given point, we need to calculate the distances between the given point and the other points. There are many distance functions, but Euclidean distance is the most commonly used. Then we sort the neighbors of the given point by distance in increasing order.

For the classification problem, the point is classified by a vote of its neighbors: the point is assigned to the class most common among its k nearest neighbors. The k value here controls the balance between overfitting and underfitting. The best value can be found with cross-validation and learning curves. A small k usually leads to low bias but high variance, and a large k usually leads to high bias but low variance. It is important to find a balance between them. For the regression problem, we simply return the average of the k nearest neighbors' labels as the prediction. A minimal from-scratch sketch of these steps appears after the transcript.

Here is a simple code example. We use the very famous Iris dataset and take only the first two features for demonstration purposes. The k-nearest neighbors algorithm is from sklearn, and the code is self-explanatory. I encourage you to try it yourself with different parameters. A comparable setup is sketched below, after the transcript. These two plots are visualizations produced by the previous code example with different k settings. The left plot shows the classification decision boundary with k equal to 15, and the right plot is for k equal to 3.

Thanks for watching. This is a bite-sized ML concept from Intuitive Machine Learning. If you like this video and want to learn more, make sure to comment, like and subscribe to our channel. See you at the next one.
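
The video does not include the following code; this is a minimal from-scratch sketch of the steps described above (Euclidean distance, sorting by distance, majority vote for classification, averaging for regression). The function name `knn_predict` and the toy data are hypothetical, chosen for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3, task="classification"):
    # Euclidean distance from the query point to every training point
    distances = np.sqrt(np.sum((X_train - query) ** 2, axis=1))
    # Sort neighbors by distance in increasing order and keep the k closest
    nearest = np.argsort(distances)[:k]
    neighbor_labels = y_train[nearest]
    if task == "classification":
        # Classification: majority vote among the k nearest neighbors
        return Counter(neighbor_labels).most_common(1)[0][0]
    # Regression: average of the k nearest neighbors' labels
    return neighbor_labels.mean()

# Tiny two-dimensional example with three groups (hypothetical data)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0],
                    [5.2, 4.8], [9.0, 1.0], [8.8, 1.2]])
y_train = np.array([0, 0, 1, 1, 2, 2])
print(knn_predict(X_train, y_train, np.array([5.1, 5.1]), k=3))  # expected: 1
```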
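
The exact script shown in the video is not part of the transcript, so the following is a sketch of a comparable setup, assuming scikit-learn's `KNeighborsClassifier` and matplotlib: the Iris dataset restricted to its first two features, with decision boundaries plotted for k = 15 and k = 3.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris dataset and keep only the first two features for demonstration
iris = load_iris()
X, y = iris.data[:, :2], iris.target

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, k in zip(axes, (15, 3)):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)

    # Evaluate the classifier on a grid to visualize the decision boundary
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                         np.arange(y_min, y_max, 0.02))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    ax.contourf(xx, yy, Z, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
    ax.set_title(f"k = {k}")

plt.show()
```

Trying different values of `n_neighbors` here makes the bias-variance trade-off visible: a small k produces a jagged, low-bias/high-variance boundary, while a large k produces a smoother, higher-bias/lower-variance one.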