📊

K-Means Clustering Algorithm

Jul 19, 2024

K-Means Clustering Algorithm

Introduction

  • Goal: Divide dataset into clusters using K-Means and Euclidean distance.
  • Data Points: A1, A2, A3, B1, B2, B3, C1, C2.
  • Initial Centroids: A1, B1, C1 (provided)
    • If not provided, any points can be chosen as initial centroids.

Steps to Apply K-Means Algorithm

  1. Calculate distances: Use Euclidean distance formula to compute distances from data points to centroids.
    • Euclidean Distance: ( \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} )
  2. Assign Clusters: Select the smallest distance to assign points to clusters
  3. Update Centroids: Calculate the new centroids for each cluster.
  4. Repeat: Recalculate distances with new centroids and reassign points until convergence.

Distance Calculation Example

  • Formula: [ \text{Distance} = \sqrt{(x2 - x1)^2 + (y2 - y1)^2} ]
    • Example: Distance between A1 (2, 10) and centroid (2, 10)
      • [ \sqrt{(2-2)^2 + (10-10)^2} = 0 ]
    • Example: Distance for another point calculation:
      • [ \sqrt{(2-2)^2 + (10-5)^2} = 5 ]

First Iteration

  • Clusters assignment based on distances
    • A1 to Cluster 1
    • A2, C1 to Cluster 3
    • A3, B1, B2, B3, C2 to Cluster 2
  • New Centroids Calculation:
    • Cluster 1: A1
    • Cluster 2: Mean of A3, B1, B2, B3, C2
    • Cluster 3: Mean of A2, C1

Following Iterations

  • Recalculate distances using new centroids.
  • Reassign clusters: Based on the smallest distance.
  • Update centroids as necessary.
  • Continue until cluster assignments do not change.

Convergence

  • Final clusters established when assignments stop changing.
    • Example final clusters:
      • Cluster 1: A1, B1, C2
      • Cluster 2: A3, B2, B3
      • Cluster 3: A2, C1

Conclusion

  • K-Means Algorithm effectively partitions data points into clusters using iterative distance recalculations and centroid updates.