Understanding K-Nearest Neighbors Algorithm

Sep 9, 2024

Lecture Notes: K-Nearest Neighbors (KNN) Algorithm

Overview

  • Focus on K-Nearest Neighbors (KNN) algorithm for classification.
  • Objective: Predict the presence of kyphosis (a spinal condition) in children from features such as Age, Number, and Start.
  • KNN can be used for regression tasks, but the focus is on classification.

Basics of KNN

  • KNN finds similar data points in training data to make predictions.
  • Example: Classifying t-shirt sizes (large or small) based on weight (kg) and height (cm).
    • Red class: large size.
    • Blue class: small size.

Working of KNN

  1. Select a Value for K
    • K represents the number of neighbors considered.
    • It is a tunable hyperparameter.
  2. Calculate Euclidean Distance
    • Distance between a new data point and all points in the dataset.
    • Formula: \( \text{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \)
  3. Pick K Closest Data Points
    • Choose points with the smallest distances.
  4. Majority Vote
    • Determine the class by majority vote among the K selected neighbors.
    • Assign the new point to that dominant class (a minimal code sketch of these steps follows below).
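
A minimal sketch of these four steps in plain NumPy. The height/weight values and labels below are purely illustrative placeholders, not the lecture's actual dataset:

```python
import numpy as np
from collections import Counter

# Illustrative training data: [height (cm), weight (kg)] with t-shirt sizes
X_train = np.array([
    [158, 58], [160, 59], [163, 61], [165, 62],   # smaller customers
    [170, 68], [172, 70], [175, 75], [178, 77],   # larger customers
])
y_train = np.array(["small", "small", "small", "small",
                    "large", "large", "large", "large"])

def knn_predict(x_new, X, y, k=5):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(((X - x_new) ** 2).sum(axis=1))
    # Step 3: pick the K closest points
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among the K neighbors
    votes = Counter(y[nearest])
    return votes.most_common(1)[0][0]

# Step 1: choose K, then classify a new customer (167 cm, 63 kg)
print(knn_predict(np.array([167, 63]), X_train, y_train, k=5))
```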

Example Process

  • Example adapted from ListenData.com.
  • Data includes height, weight, and t-shirt size (small or large).
  • Calculate Euclidean distances and rank them.
  • Select K (e.g., K=5) and identify the five closest data points.
  • Perform majority voting among these points to classify the new data point.
    • If the majority of the five neighbors are small, classify the new point as small; otherwise, classify it as large (see the scikit-learn sketch below).
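
The same workflow can be expressed with scikit-learn's KNeighborsClassifier. The data values here are placeholders standing in for the ListenData-style height/weight table, not the actual figures used in the lecture:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder height (cm) / weight (kg) data and t-shirt sizes
X = np.array([[158, 58], [160, 60], [163, 61], [165, 64],
              [168, 66], [170, 68], [173, 72], [175, 75]])
y = ["small", "small", "small", "small", "large", "large", "large", "large"]

# K=5 neighbors with Euclidean distance (scikit-learn's default metric)
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X, y)

# Classify a new customer and inspect the neighbors behind the decision
new_point = [[166, 63]]
print(model.predict(new_point))      # majority-vote class
print(model.kneighbors(new_point))   # distances and indices of the 5 nearest neighbors
```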

Visual Representation

  • Training data: Small size vs. Large size.
  • New data point classification via KNN.
  • Majority voting decides the final class (a plotting sketch follows).
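
One way to reproduce this picture with matplotlib, reusing the illustrative data from the sketches above and following the lecture's colour convention (red = large, blue = small):

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative training data: blue = small, red = large
small = np.array([[158, 58], [160, 60], [163, 61], [165, 64]])
large = np.array([[168, 66], [170, 68], [173, 72], [175, 75]])
new_point = np.array([166, 63])

plt.scatter(small[:, 0], small[:, 1], c="blue", label="small")
plt.scatter(large[:, 0], large[:, 1], c="red", label="large")
plt.scatter(*new_point, c="green", marker="*", s=200, label="new point")
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.legend()
plt.title("KNN: new point classified by majority vote of nearest neighbors")
plt.show()
```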

Next Steps

  • Next lesson: KNN in AWS SageMaker.
    • Discuss CPU and data requirements.
    • Demonstration of coding the algorithm in SageMaker Studio.

  • Stay tuned for the next lesson.
  • Best of luck!