📊

K-Means Clustering Algorithm

Jul 19, 2024

K-Means Clustering Algorithm

Introduction

Goal: Divide dataset into clusters using K-Means and Euclidean distance.
Data Points: A1, A2, A3, B1, B2, B3, C1, C2.
Initial Centroids: A1, B1, C1 (provided)
- If not provided, any points can be chosen as initial centroids.

Steps to Apply K-Means Algorithm

Calculate distances: Use Euclidean distance formula to compute distances from data points to centroids.
- Euclidean Distance: ( \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} )
Assign Clusters: Select the smallest distance to assign points to clusters
Update Centroids: Calculate the new centroids for each cluster.
Repeat: Recalculate distances with new centroids and reassign points until convergence.

Distance Calculation Example

Formula: [ \text{Distance} = \sqrt{(x2 - x1)^2 + (y2 - y1)^2} ]
- Example: Distance between A1 (2, 10) and centroid (2, 10)
  - [ \sqrt{(2-2)^2 + (10-10)^2} = 0 ]
- Example: Distance for another point calculation:
  - [ \sqrt{(2-2)^2 + (10-5)^2} = 5 ]

First Iteration

Clusters assignment based on distances
- A1 to Cluster 1
- A2, C1 to Cluster 3
- A3, B1, B2, B3, C2 to Cluster 2
New Centroids Calculation:
- Cluster 1: A1
- Cluster 2: Mean of A3, B1, B2, B3, C2
- Cluster 3: Mean of A2, C1

Following Iterations

Recalculate distances using new centroids.
Reassign clusters: Based on the smallest distance.
Update centroids as necessary.
Continue until cluster assignments do not change.

Convergence

Final clusters established when assignments stop changing.
- Example final clusters:
  - Cluster 1: A1, B1, C2
  - Cluster 2: A3, B2, B3
  - Cluster 3: A2, C1

Conclusion

K-Means Algorithm effectively partitions data points into clusters using iterative distance recalculations and centroid updates.

Full transcript