Transcript for:
K-Means Clustering Algorithm

Welcome back. In this video I will discuss how to use the k-means clustering algorithm to divide a given data set into different clusters. In this case we have been given eight data points: A1, A2, A3, B1, B2, B3, C1 and C2. We need to use Euclidean distance and the k-means clustering algorithm. We have also been given the initial centroids: A1, B1 and C1 should be considered as the initial centroids. Considering this, we need to divide the data set into different clusters. If initial centroids are not given, you can select any of these points as the initial centroids and then continue applying the k-means algorithm.

Now, these are the data points given to us. Given these data points, we need to calculate the distance from each data point to the initial centroids. The initial centroids are A1, B1 and C1, so I will write A1, B1, C1 over here. What we need to do is calculate the distance from A1 to (2, 10), A2 to (2, 10), A3 to (2, 10), and so on.

The next question in front of us is how to calculate the distance between these data points and these centroids. To calculate the distance we have to use the Euclidean distance, as stated in the problem definition. The Euclidean distance between two points P1 and P2 is the square root of (x2 minus x1) squared plus (y2 minus y1) squared.

Now I will discuss how to calculate the distance between these data points and this centroid. According to this formula, let us assume that this column is x1 and this one is y1, this is x2 and this is y2. We need to put these values into this equation so that we get the distances. For data point A1, x1 is 2 and y1 is 10; x2 and y2 remain the same for this whole column. Now x2 in this case is 2 and x1 is 2, so 2 minus 2
squared, plus the second term squared, where y2 is 10 and y1 is also 10. If we calculate this, it becomes (2 minus 2) squared plus (10 minus 10) squared, that is 0, so the answer is 0 here.

Similarly, I will show one more example. Again, x2 is 2. We have to take the square root of 2 minus x1, which is again 2, squared, plus y2 minus y1 squared; y2 is 10 in this case and y1 is 5, so (10 minus 5) squared. (2 minus 2) squared is 0. 10 minus 5 is 5, 5 squared is 25, and the square root of 25 is 5, so the distance for this data point is 5. Similarly, we need to calculate the remaining distances, and once you calculate them you will get all these values. We also need to calculate the distance from each data point to this centroid as well as to this centroid; once you calculate, you will get these values.

Once the distance calculation is over, we need to assign each data point to one of these clusters. We look at the distances and assign the data point to the cluster with the smallest value. Let us say this is cluster number one, this is cluster number two and this is cluster number three. If you look at these distances, this is the smallest one, so this data point is assigned to the first cluster. If you look at this data point, the distances are 5, 4.24 and 3.16, so definitely 3.16 is the smallest, and we assign it to the third cluster. The same thing is done for all the data points. So after the first iteration we get this
assignment: the first data point, A1, is assigned to cluster one; A3, B1, B2, B3 and C2 are assigned to the second cluster; and A2 and C1 are assigned to the third cluster.

Once these assignments are done, we need to calculate the new centroids. For the first cluster we have only one data point, so the centroid remains the same. The second cluster has one, two, three, four, five data points, so we need to calculate the centroid, that is, 8 plus 5 plus 7 plus 6 plus 4 divided by 5, and similarly 4 plus 8 plus 5 plus 4 plus 9 divided by 5. Once you do this for the second cluster, we need to do it for the third cluster also. The third cluster has only two data points, that is, 2 plus 1 divided by 2 and 5 plus 2 divided by 2. Once you do the calculation, you get these new centroids.

Once you get these new centroids, we consider them as the current centroids and continue from here. So I will make these the current centroids. Now we need to calculate the distance from the data points to these centroids. Again I will write the centroids over here, and then we calculate the distances using the same formula. Once you calculate the distances, again you have to check which one is the smallest out of the three distances. This one is the smallest, so it is assigned to the first cluster. Between these three, this is the smallest, so it is assigned to the third cluster, and the same is repeated for all the data points. This is how the assignment takes place; this is called the new cluster assignment. Now if you look at this, in the previous case C2 was assigned to the second cluster, but now it
is assigned to the first cluster. This means a data point has moved from one cluster to another, so we cannot say that the algorithm has converged. We need to calculate the new centroids again. In the first cluster we have two data points, A1 and C2, so we need to calculate the centroid of these two, that is, 2 plus 4 divided by 2 and 10 plus 9 divided by 2; that will be the new centroid. Similarly, we have to do it for the second cluster and the third cluster, so we get these three new centroids.

Once you get the new centroids we consider them as the current centroids; before that, the new clusters are copied over here as the current clusters. That is what I have done. Then, from the data points to the current centroids, we calculate the distances. Once you calculate the distances, again we start the assignment: between these three, 1.12 is definitely the smallest; between these three, this one is the smallest; so they are assigned to the first cluster, the third cluster, and so on. This is how the assignment looks. Now again, if you look here, previously data point B1 was assigned to cluster number two, but now it is assigned to cluster number one. This means a data point has again moved from one cluster to another, so we need to calculate the new centroids.

Once you calculate the new centroids you get something like this. The new clusters become the current clusters, we consider the new centroids as the current centroids, and we calculate the distances to these centroids. Again, once you calculate these distances, we have to assign the data points to one of these clusters over
here. In this case, this data point has 1.94, 7.56 and 6.52 as the distances, so this is the smallest one and it is assigned to cluster 1. The same process has to be followed for all the data points. After the cluster assignment, it looks something like this, and if you compare the previous assignment and the current assignment, both are exactly the same. This shows that all the data points have converged to these new clusters.

Once that is done, we need to write down the final clusters. The final clusters are: A1, B1 and C2 belong to the first cluster; A2 and C1 belong to the third cluster; and the remaining points, A3, B2 and B3, belong to the second cluster. This is how we can use the k-means clustering algorithm with Euclidean distance to divide the data points into different clusters.

I hope the concept is clear. If you like the video, do like and share with your friends. Press the Subscribe button for more videos, press the Bell icon for regular updates. Thank you for watching.
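
The walkthrough above can be sketched in Python. This is a minimal sketch and not the presenter's own code. The point coordinates are an assumption: they are reconstructed from the sums read out during the centroid calculations, since the on-screen table is not part of the transcript. The names POINTS, euclidean, assign, update and kmeans are illustrative, and clusters one, two and three are represented by indices 0, 1 and 2.

```python
from math import sqrt

# Assumed coordinates, reconstructed from the arithmetic spoken in the video.
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}


def euclidean(p, q):
    """Square root of (x2 - x1)^2 + (y2 - y1)^2."""
    return sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def assign(points, centroids):
    """Map each point name to the index of its nearest centroid."""
    return {
        name: min(range(len(centroids)),
                  key=lambda k: euclidean(p, centroids[k]))
        for name, p in points.items()
    }


def update(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [points[n] for n, c in labels.items() if c == k]
        if not members:
            new_centroids.append(old)  # leave an empty cluster's centroid as is
            continue
        new_centroids.append((
            sum(x for x, _ in members) / len(members),
            sum(y for _, y in members) / len(members),
        ))
    return new_centroids


def kmeans(points, centroids):
    """Repeat assign/update until no point changes cluster."""
    labels = None
    iterations = 0
    while True:
        new_labels = assign(points, centroids)
        iterations += 1
        if new_labels == labels:  # same assignment as before: converged
            return labels, centroids, iterations
        labels = new_labels
        centroids = update(points, labels, centroids)


# Initial centroids A1, B1 and C1, as given in the problem.
labels, centroids, iterations = kmeans(
    POINTS, [POINTS["A1"], POINTS["B1"], POINTS["C1"]]
)
for name in sorted(labels):
    print(name, "-> cluster", labels[name] + 1)
print("assignment passes:", iterations)
```

The loop stops on the first pass whose assignment equals the previous one, which mirrors the comparison of the previous and current assignments described in the video.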