Transcript for:
Understanding Hierarchical Clustering

Hello all, today we will be discussing the intuition behind hierarchical clustering, and we will try to understand the maths behind it. In our previous video, we have already seen something about the k-means clustering algorithm; in this particular video we will focus on hierarchical clustering, which is a very good unsupervised machine learning technique. It works towards the same goal as k-means clustering, only the technique itself is completely different.

So this is my hierarchical clustering example; just let me remove all these things. Now, how does hierarchical clustering work? As I said, it is an unsupervised machine learning technique. Initially I'll have some points; suppose I mark them as these black points. You can see that I have just six points here. What happens with these six points in hierarchical clustering? First of all, each and every point is initially treated as a separate cluster. So all of these are different clusters.

Now what happens is that we'll try to find the two nearest points, or the two nearest clusters, since we are considering each point as a single cluster. Suppose these two points are the nearest. I want to record them on the right-hand side in another diagram, which is called a dendrogram. In the dendrogram, the x-axis shows the points and the y-axis shows the distance. If I name these two points P1 and P2, and suppose the distance between them is 0.5, what I'm going to do is combine them, and this join will be at height 0.5. So this is the first piece of the dendrogram, created for P1 and P2.
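The merge step just described can be sketched in plain Python. This is a minimal illustration, not the video's own code: the coordinates are made up (chosen so that P1 and P2 sit 0.5 apart, as in the example), and `closest_pair` is a hypothetical helper, not part of any library.

```python
# Minimal sketch of the first agglomerative step: treat every point as
# its own cluster, then find the pair of clusters whose closest members
# are nearest (single-linkage distance) and merge them.
import math

# Made-up coordinates for the six example points.
points = {"P1": (1.0, 1.0), "P2": (1.3, 1.4),
          "P3": (4.0, 4.0), "P4": (4.6, 4.8),
          "P5": (8.0, 1.0), "P6": (8.9, 2.2)}

def euclidean(a, b):
    """Straight-line distance between two 2-D points."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

# Every point starts out as its own singleton cluster.
clusters = [[name] for name in points]

def closest_pair(clusters):
    """Return (distance, i, j) for the two nearest clusters."""
    best = (float("inf"), None, None)
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = min(euclidean(points[a], points[b])
                    for a in clusters[i] for b in clusters[j])
            if d < best[0]:
                best = (d, i, j)
    return best

d, i, j = closest_pair(clusters)
print(clusters[i], clusters[j], round(d, 2))  # → ['P1'] ['P2'] 0.5
```

Repeating this step, merging the winning pair into one cluster each time, is exactly the process the dendrogram records.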
Then after that, we'll try to find the distance between this cluster and the remaining points, and find out which are the next nearest. Suppose I find that these two points are the nearest; I specify them as P3 and P4. So what I'm going to do is define another pair, P3 and P4, and suppose this distance is somewhere around 1; I'm going to combine them, and this join will be at height 1. Currently I've got two groups, and the rest are still individual points. Again, suppose the next nearest pair is P5 and P6; I calculate the distance and it comes out around 1.5, so I combine them at that height. Then I'll try to find out which clusters are nearest to each other, and suppose I find that these two clusters are very, very near. What I'm going to do is combine them, so the group containing P3, P4, P5, and P6 comes together at a distance of 2. Finally, I'll combine this whole group into one cluster by including P1 and P2, because they are the next nearest; suppose this distance is 2.5, so P1 and P2 get connected here. So this is how my dendrogram looks.

But what is the main aim? Just remember that this is an unsupervised machine learning technique, and our main aim is to find out exactly how many clusters I should use in order to classify my points properly.
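The whole dendrogram-building process above can be reproduced with SciPy. This is a sketch with invented coordinates, chosen so that the first three merge heights match the example's 0.5, 1.0, and 1.5; the later heights will differ from the rough diagram.

```python
# Build the merge hierarchy with SciPy and read off the merge distances
# that appear on the dendrogram's y-axis.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[1.0, 1.0], [1.3, 1.4],    # P1, P2
              [4.0, 4.0], [4.6, 4.8],    # P3, P4
              [8.0, 1.0], [8.9, 2.2]])   # P5, P6

# 'single' linkage merges the two clusters whose closest members are nearest.
Z = linkage(X, method="single", metric="euclidean")

# Each row of Z is (cluster_a, cluster_b, merge_distance, new_cluster_size),
# in the order the merges happen.
for a, b, dist, size in Z:
    print(int(a), int(b), round(dist, 2), int(size))

# To draw the actual tree:
# import matplotlib.pyplot as plt
# dendrogram(Z, labels=["P1", "P2", "P3", "P4", "P5", "P6"])
# plt.show()
```

The first three rows show the pairwise merges at distances 0.5, 1.0, and 1.5, just like the P1–P2, P3–P4, and P5–P6 joins described above.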
Not classify, rather group my points properly. Clustering basically works on the similarity of the data, and that similarity is what we are calculating here in terms of distance. This distance is measured by something called the Euclidean distance; I have also discussed the Euclidean distance in my previous video. The formula is given as follows: if I have two points (x1, y1) and (x2, y2), the distance is the square root of (x2 - x1) squared plus (y2 - y1) squared. So this will be the distance between the two points P1 and P2, which are represented by (x1, y1) and (x2, y2).

Now the next thing is: how can I find the exact number of clusters I need for this problem? There is a simple hack that is used to find out how many groups there are. We need to find the longest vertical line in the dendrogram such that no horizontal line passes through it. So I can't consider this particular line, because there is a horizontal line that passes through this point. Similarly, I can't consider this line either, because there is a horizontal line passing through this point. And similarly, I can't consider this one, this one, or this one, because a horizontal line passes through each of them. Now, I can consider this particular line, because it is the longest compared to all the others. So what I'll do is draw a straight line through this point and see how many vertical lines it passes through. It passes through two, so I can say that the number of clusters I can use for this problem is two. This is the hack basically used with a dendrogram, and this is just a rough diagram; your cluster count may also come out as three, depending on how many lines the cut passes through once I find the longest vertical line. Once I am able to do this, I'll use scikit-learn and try to use
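The "longest vertical line" hack can also be read off the linkage matrix in code: the widest gap between consecutive merge heights is the longest stretch a vertical line can run without crossing a horizontal one. Below is a sketch with made-up coordinates; note that for this toy layout the hack happens to suggest three clusters, which fits the caveat above that the count depends on the data.

```python
# Pick the cluster count by cutting the dendrogram in the widest gap
# between consecutive merge heights.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.3, 1.4],
              [4.0, 4.0], [4.6, 4.8],
              [8.0, 1.0], [8.9, 2.2]])
Z = linkage(X, method="single", metric="euclidean")

heights = Z[:, 2]                # merge distances, in increasing order
gaps = np.diff(heights)          # vertical gaps between horizontal lines
# Cut in the middle of the widest gap.
cut = heights[gaps.argmax()] + gaps.max() / 2

labels = fcluster(Z, t=cut, criterion="distance")
print(labels, len(set(labels)))  # cluster label per point, and the count
```

`fcluster` with `criterion="distance"` applies every merge below the cut height, leaving one flat cluster per vertical line the cut crosses.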
hierarchical clustering, and I'll be able to group this whole data into two clusters based on similarity, that is, based on Euclidean distance. So this was the whole idea of hierarchical clustering. I hope you liked this particular video, guys. Please subscribe to the channel if you have not done so already, and keep learning. I'll see you in the next video. Thank you, one and all. Have a great day.
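As a minimal sketch of that last step: scikit-learn's `AgglomerativeClustering` groups the data once the cluster count has been chosen from the dendrogram. The six coordinates are invented for illustration, and `n_clusters=2` reflects the two-cluster answer in the example.

```python
# Group the example points into two clusters with scikit-learn's
# hierarchical (agglomerative) clustering.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1.0, 1.0], [1.3, 1.4],
              [4.0, 4.0], [4.6, 4.8],
              [8.0, 1.0], [8.9, 2.2]])

# linkage="single" mirrors the nearest-pair merging described above;
# the default distance is Euclidean.
model = AgglomerativeClustering(n_clusters=2, linkage="single")
labels = model.fit_predict(X)
print(labels)  # points sharing a label belong to the same cluster
```

`fit_predict` runs the full bottom-up merging internally and stops once only `n_clusters` groups remain, so no dendrogram needs to be drawn at prediction time.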