Understanding AVL Trees and Operations

all right welcome back to data structures land today we continue and complete our segment on binary trees so this is part two if you missed part one go back and watch part one um last time we talked about binary trees in general uh we had uh each node stored an item uh which and also a left pointer and a right pointer to other nodes and a parent pointer to another node uh this was an example of a tree B and C or A's children a is the parent of B and C and also the root of the entire tree we defined the height of a node we didn't use this too much yet but we're going to use it a lot today so remember the height is as drawn in red here um height of the node is the length of the longest downward path counting edges so B for example has a length two path so we write a two here you can also think of it is if you just live within the sub tree rooted at b b's subtree uh then uh what is the maximum depth of those nodes if you prefer to think about it that way either way is fine um and in particular we distinguished H the height of the root node as the height of the entire tree and what we achieved last time was basically all of our operations ran in order H time so um we had subtree insert subtree delete subtree first and last we could compute the predecessor and successor of a node all in order H time so as long as H was small uh we were happy um and remember what does predecessor and successor mean it's talking about an implicit order in the tree which is what we call traversal order which is defined recursively as recursively Traverse left sub tree then output the root then recursively Traverse the right sub tree so in this example the traversal order is f is the if you go all the way left that was the first in the traversal order uh then we have right make some space here uh then we have D then we have B then we do the right sub tree of B which is e then we have the root because we finished the left sub tree of the root so that's a and then we have C okay so there's an implicit linear order encoded by this tree and the whole point of binary trees is that we can efficiently update the tree uh much faster than we could explicitly write down an order in an array or something like that so binary trees let us quickly Now quickly is not so quick right now because everything is order H and in the worst case h is linear because we could have a tree like this but today we're going to make we're going to guarantee that H is log n and so the goal of today is to take all of these operations that run in order H time and get them to run in order log n time just by modifying the data structure we've already seen so we've done a lot of the hard work just a little bit more work we need to do today in something called AVL trees or height balance okay but before we get there I want to talk a little bit more at the very end of last leure we talked about once you have these subtree operations so I can insert and delete into subtree um how do I actually use that to solve the problems that we care about in this class which are sequence data structure and set data structure so um we talked mostly about the set data structure last time so in general we're going to Define what traversal order we maintain by a binary tree and so for a set um because we're for the set interface we're interested in doing uh queries like find next and find previous given a key if it's not there tell me the previous one or the next one this is something we could do with binary search and so the the big cool thing that binary trees let us do if we let the traversal order always be uh all of the items stored in increasing key order then we are effectively maintaining the items in order in in the traversal order sense again we're not explicitly maintaining them in order but up here we're maintaining a tree that that represents items in key order and so this lets us do uh a sub tree find operation which you could easily use to implement find and find previous and so on um as follows we start at the root of the tree so we could say node equals root initially and then we can recursively search for a key K as follows we check well if the item at the root has a key that's bigger than K let me draw a little picture uh so we're at some node here this is a node and it has a left sub tree and a right sub tree and there's some item with some key so if the key we're looking for is less than the nodes item that means it's down here in the left sub tree and so we recurse on node. left if they're equal that means that this item is the item we're looking for so we can just turn it or the node depending on what you're looking for and if the key in here is greater than the key we're looking for then we'll recurse to the right this if you think about it a little bit this is exactly binary search on an array it just happens to be on a tree instead if you think of an array um like this B where does binary stretch do it first looks at the key in the middle I'm going to draw that as the root uh and then uh it recurses either on the left chunk which I will draw recursively or on the right chunk and so if you happen to have a perfect binary tree like this one it is simulating exactly uh binary search in this array but this we're going to be able to maintain dynamically not perfect anymore but close uh whereas this we could not maintain in sorted order okay so uh this is like a generalization of binary search to work on trees instead of on arrays and for this reason set binary trees are called binary search trees because they are the tree version of binary search so there's many equivalent names so binary search tree is another name for set binary tree and the key thing that makes this algorithm work is so-called binary search tree property which is um all the keys in the left sub tree of a node are less than uh the root and all or of that node and that key that key is less than all the keys in the right sub tree and this is true recursively all the way down and so that's that's how you prove that this algorithm is correct by this property why is this true because uh if we can maintain traversal order to be increasing key then that's exactly what traversal order means it tells you all the things in the left sub tree come before the root which come before all the things in the right sub tree so uh this property implies this one and how do you maintain things in increasing key order it's pretty easy uh if you want to insert an item where does it belong well you do this search to find where it would belong if it was there if it's there you can overwrite the values stored with that key if it's not um this search will fall off the tree at some point and that's where you insert your uh a new node in your tree okay that was discovered in recitation so I don't want to dwell on it what I want to focus on today is the other application how do we this is for representing a set which is relatively easy uh a CH we sort of set ourselves up for but we need a little more work is to make sequence binary trees so suppose I have a binary tree and what I would like mentioned at the end of last time is that I want the traversal order of my tree to be the sequence order the order that I'm trying to represent that's changed by operations like insert at and so I just like to do the same thing but now I have to think about how do I do a search how do I do insert at that sort of thing and here is an algorithm for what I would like to work but it's not going to quite work yet so suppose I give you a sub tree so specified by a node so there's all the descendants of that node and I'd like to know what is in the traversal order of that subtree which starts here and ends here and the root will be somewhere in the middle give me the node so this if I ask I equals z i want to get this leftmost descendant if I ask for I equals the size of the tree minus one I want to get The rightmost Descendant that was the first and last in the sub tree that we talked about but we don't we know how to find the first and last just walk left or walk right but we don't know how to find the I node in order H time is the goal right now not log in and the idea is well it seems like size matters sorry if you heard otherwise um so in particular I I mentioned size when I was talking about the the last node in the sequence that the index of that node is size of this subtree minus one so let's define the size of a node to be uh the number of nodes in its sub tree we're calling that subtree paren node okay including the node itself so if I somehow knew the size this seems important for understanding indexes let's just assume that I knew that magically in constant time okay uh then I claim that the size of the left sub tree so why don't I expand this diagram a little bit so we we have node as before but we have a left sub tree and a right sub tree so this node here is node. left this node here is node. right they might not exist but let's ignore those exceptional cases for now let's suppose we knew not only the size of node but we knew the size of node. left so that is the size of this tree on the left I'm going to call that NL so let's suppose that there are NL nodes down here why I claim that lets me do the equivalent of a binary search cuz I'm looking for some index I and if I is less than NL then I know that it must be down here for example if I equals z it's going to be in the left sub tree as long as NL is greater than zero right so uh that's exactly this check if I is less than NL I'm going to recurse to the left call subt tree at of no do left comma I that's what's written here um if I equals NL if you think about it for a second so NL is the number of nodes here and so that means this node has index NL the index of this node is NL and so if I equals the if the one index we're looking for is that one then we just return this node we're done and otherwise I is greater than NL and that means that the node we're looking for is in the right sub tree because it comes after the root again that's what it means that's what traversal order means so if we Define it to be sequence order uh then we know all the things that come after this node which is index NL must be over here now when we recurse down here our numbering system changes because out in for node zero is here and then for node. right Zer is here so we need to do a little bit of subtraction here which is when we recurse to the right we take IUS NL minus one minus NL for these guys minus one for the root node um and that will give us the index we're looking for within this sub okay so it's my point is this algorithm is basically the same as this algorithm but this one uses keys because we're dealing with a set and in sets we assume items have keys over here we items don't have to have keys in fact we're not touching the items at all we're just asking what is the I item in my sequence which is the same thing as what is the I item in my traversal order which is the same thing as asking what is the I node in the traversal order and this algorithm gives it to you exactly the same way in order H time okay now I'm not going to show you all the operations but you can use subt tree at to implement uh get at and set at you just find the appropriate node and return the item or modify the item uh or you can use it to most crucially you can use it to do insert at and delete at this is a new thing we've never been able to do before uh what do you do just like over here if I'm trying to insert an item I search for that item over here uh so if I'm trying to insert at I for example I look for I and then uh for insert at I I want to add a new item just before that one and conveniently we already have I didn't mention but we have a subtree insert we had two versions before and after I think we cover it after which is success before use predecessor uh but we can just call subtree insert before at that node and boom we will have added a new item just before it um and great so magically somehow uh we have inserted in the middle of the sequence and all of the indices update because I'm not storing indices instead I'm to search for an item at index I I'm using the search algorithm but there's a problem what's the problem this seems a little too good to be true I insert in the middle of this tree and then somehow I can magically search and still find the I item even though all the indices to the right of that item incremented by one it's almost true question answer yeah because we have to update the SI because we have to update the sizes right I didn't say how do I compute the size of the left sub tree so that is the next topic we're almost done this this will actually work it's really quite awesome but uh for it to work we need something called sub tree augmentation which I'll talk about generally and then we'll apply it to size so the idea with subtree augmentation is that uh each node in our binary tree uh can store uh constant number of extra Fields why not and in particular if these fields are of a particular type or maybe I'll call them properties I'm going to define a a subtree property uh to be something that can be computed from the properties of the nodes children so I should say this is of a node so children are node. left and no. right uh you can also access constant amount of other stuff for example the node itself uh but the point of a subtree property is it's downward looking if I have a node here and I want to compute some property about it call it uh we want to store P of the node and suppose we already know p over here the property computed for the left subtree or for the left node and we already know the property for the right node then uh what I'd like is for this to be computable in constant time so I can compute P of this node given P of the left node and P of the right node that's a sub tree property now in particular size is a sub tree property why because I can write this kind of recurrence uh node. size equals no do left do size is very tedious to write plus no do right. size plus one thank you size of the sub of the entire sub tree here called node um is the size of the left sub tree plus the size of the right sub tree plus one for that node itself okay so this is an update rule takes constant time to evaluate it's two additions sorry my T's look kind of like plus signs make the pluses a little bigger okay so we're summing those three things boom we can get node do size so I claim that uh if as long as my property has this feature I can maintain it dynamically as I'm changing the tree okay now this is a little bit of a forward reference because we haven't said exactly how we're going to change the tree yet but uh question if um no SI is recursive yeah how is it happen would okay why is this okay good question so one natural way you can think of this as a recursion which gives you a recursive algorithm uh so I wrote but I didn't write it but I could have written size of node equals this size of node. left plus and that would give you a linear time algorithm to count the size and if you don't have any information that is what you would do and that would be very painful so that would would make this algorithm really slow if I'm calling sizes a recursive function it's bad uh what I'm instead doing is storing the sizes on every single node okay and precomputing this so in fact I'm going to Define uh size of node in so this is the definition mathematically but the algorithm for this function is just going to be return node. size okay so that's constant time so I the challenge now is I have to keep these sizes up to dat no matter what I do to the tree and you could look back at last lecture and see okay what were all the changes that I did in a tree uh we only did changes during insert and delete and I will just claim to you when we did uh insert and delete what they ended up doing uh in the end they add or remove a leaf of the tree okay remember Leaf was a node with no children um so let's just think about if I add a new Leaf in a tree here's a tree suppose I add a leaf here which sub trees change well which sub trees contain that node it it is its own new sub tree then its parents it's in it's in its parents sub tree and its grandparents sub tree and and the overall sub tree okay in general these nodes are called the ancestors of this node that we added and those are the ones that update this sub tree over here didn't change it didn't change size and because it's a sub tree property no subtree property will change over here because the sub tree was untouched okay so um when I touch this guy I just have to update the sub tree property here update the subtree property here update the subtree property here how many of these are there yeah H I'll say order H to be safe but I think it's exactly H um so also when I remove a leaf the same thing if I remove this Leaf then the the sub trees that change are exactly its former ancestors uh cool so um we're going to update those order H uh ancestors um in order up the tree so what do I mean by update I mean apply this rule for sub for size it's this rule but for in general the subtree property gives me an update rule that takes constant time and so I'm going to apply that update rule to this node which will fix whatever property is stored in there maybe there's more than one property and then I'll apply it to this node and because this is already correct by induction and this is already correct because I didn't touch this subre it's unchanged then I can update the value at this node the property at this node in constant time then I update this one and because this one is already correct by induction and this one is already correct because this sub tree is unchanged I can update the property correctly here in constant time okay so when I make a change in order H time because I'm making H calls to this constant time algorithm I can update a constant number of sub tree properties this is very powerful data structure argumentation is super useful you will use it on your problem set we will use it again today let me give you some examples of sub tree properties they could would be uh common ones are like the sum or the product or the Min or the max or sum of squares or all sorts of things of some feature of every node in subtree okay in fact uh sub tree size is an example of such a thing it is the sum over all nodes in the sub tree of the value one okay that's another way to say count the number of nodes okay but you could also say what's the sum of the keys in these nodes or you could say what's the maximum key in these nodes or you could say uh um what is the maximum value in these notes you can take any property it doesn't have to be key it doesn't have to be anything in particular it's very powerful you can take all sums products and Main maintain them as long as they're downward looking as long as you're only thinking about the sub tree okay uh some examples of things you cannot maintain are not a nodes index so if you get a little bit too excited about augmentation you might think oh it could do everything right I needed to support subtree at or let's just say uh get at globally I wanted to know what is the I node in my tree well I'll just use data structure augumentation and store in every node what is its index zero through n minus one I can't maintain that efficiently because if I insert at the beginning of my traversal order then all the indices change right so that that's an example of a of a edit so if I insert a new Noe over here so this guy's index was Zero now it's one this guy's index was one now it's two this was two two now it's three and so on every node changes its index index is not a subtree property and that's why we can't maintain it because it depends on all of the nodes in the tree or it depends on all the nodes to its left all the predecessors so for example this guy's index depends on how many nodes are over here on the left which is not in the sub tree of that Noe so that's where you have to be careful don't use Global properties of the tree you can only use subtree sub tree properties okay another example is uh depth depth is annoying to maintain uh but it's not obvious why yet we will see that in a moment okay the rest of today is about going from order H to order log n which is what this slide is showing us so at this point you should believe that we can do all of the sequence data structure operations in order H time except for build and iterate which take linear time uh and that we can do all of the set operations in order H time except build and iterate which Take N log n and n respectively and our goal is to now bound uh H by log n we know it's possible at some level because there are trees that have logarithmic height that's like this Perfect Tree here but we also know we have to be careful because there are some bad trees like this chain so um if H equals log n we call this a balanced binary tree there are many balanced binary trees in the world maybe a dozen or two a lot of different data structures question hi you um you said like not to think about things on a global level think of them at a sub level can you explain what that okay what does it mean for property to be uh local to a sub tree versus global the best answer is this definition but that's maybe not the most intuitive definition this is what I mean something that can be computed just knowing information about your left and right children that is the meaning of subre property and those are the only things you're allowed to maintain because those are the only things uh that are easy to update by walking up this path and the contrast is that a global property like index I it's Global in particular because I can do one change add one no node and all of the nodes properties change so that's an extreme example of global we want this very particular notion of local uh because that's what we can actually afford to recompute hopefully that clarifies yeah doesn't size you're right that if we added oh no let's okay let's think about that if we added a new parent to the tree this is not something that we've ever done um but even if we did that which sub trees change only this one right this node it's it's a new totally new sub tree but the sub tree of this node is completely unchanged because sub trees are always downward looking if I added a new root um I didn't change any sub except for one so size is a sube property now there are I mean I can completely redraw the tree and that's an operation that requires everything to be recomputed okay so it is limited exactly what I'm allowed to do in the tree but I claim everything will do last class and today uh we can afford this augmentation so it's a feature not of all binary trees necessarily but the ones that we would cover yeah what is a Min binary binary tree yeah okay let's uh this will make a little more sense in a moment when I say what we're actually going to do with trees we uh we need a new tool for manipulating a tree what we've done so far we've done some swapping of items and we did adding removing a leaf that's not enough we're going to need something else to let us guarantee logarithmic height and that's something else is called a rotation what does this something else need to do this is just a tool for rebalancing the tree so it should not change the data that's represented by the tree what is the data represented by the tree traversal order traversal order is sacran where not allowed to touch it it's already defined two different ways depending on whether you're using a set or sequence so we want to modify the tree in a way that doesn't modify the traversal order so this we're exploiting redundancy if you wrote down the traversal order in an array there's exactly one representation of a given order but in a tree there's many representations it could be a long line it could be a balanced thing they could represent the exact same order on the nodes if you label them right okay in fact they're exponentially many different representations of the same thing and we're going to exploit that the same order and Define this is just a thing you need to know me label a x b y c you can tell I've drawn this diagram before many many times this is a very powerful tool in all tree data structures which are most of data structures uh and they are called right rotate of Y and left rotate of x uh so if I have this tree which I'm just blackboxing some of the sub into little triangles if I have a node and it it has a left child then I'm allowed to right rotate this Edge which means take this Edge and go like this 90 Dees kind of uh or you could just think of it as rewriting it this way now you might also keep in track of the parent pointer parent pointer uh moves around before this was the parent of Y now it's the parent of X okay so we X and Y are switching places but if we we couldn't just swap these items around that would change traversal order in this picture X comes before y because X is in the left sub tree of Y in traversal order and over here now Y is in the right sub tree of X so it comes after X so in both cases X comes before Y and indeed in all of these pictures uh the traversal order I mean not just for X and Y but also for a b and c the traversal order is consistent it is a uh x b y c where when I write a triangle I mean recursively the traversal order of all the things in the Triangle so if you just apply the traversal order algorithm here versus here you get the same output which means these these operations preserve traversal order great so this is a thing that we can do in a tree that won't affect any of the stuff we've done so far it's a tool that we can used to rebalance notice um you know how deep things are in the tree changes our problem with this linear tree is that there are some nodes of linear depth we want to get rid of those how we could take these edges and start rotating them up if you look at depths um you know in this picture A and B are deeper than C and in this picture uh B and C are deeper than a so you know there a trade-off this one moved up this one moved down this one stayed at the same depth so hopefully if if a is too deep and C is too shallow uh they can trade off like this okay it may sound difficult but in fact there's a pretty simple way uh which are called AVL trees uh that maintain balance in a particular way called height balance this is uh if we take the height of no dot left uh actually I'd prefer to no do right minus height of node do left so this thing is called um skew of the node I want this to always be minus1 0 or plus one okay so this is saying that if I have any node and I look if it's left sube and it's right sub tree I measure their heights remember that's downward dist to maximum distance to a leaf I measure the height of this tree maximum height I measure the maximum height of this sube I want these to be within one of each other maybe ideally they're equal that would be the perfect case but let's let them differ by one so maybe this is K and this is k + one or maybe this is K and this is K minus one what's the in this picture what is the height of this node it's good practice k+ 2 good what's the longest path from this node to a leaf well it could go through this sub tree and that would be length k+ 1 cuz it's K in here plus one for this Edge or it could be through here and that's k+ 1 plus one so the biggest is to go to the right so the height if I told you the height of these sub trees we can derive the height of this note we're going to use that a lot in a moment so first claim is that if I could maintain height balance then I can uh I will guarantee that H equals log n so in other words height balance implies balance so let's prove that first quickly and then the interesting part is how do we actually prove or how do we actually maintain the balance property we're going to do that using rotations but how is a big question okay so why does height balance imply balance so uh what this is saying is that all height balanc trees have logarithmic height so what I'd like to think about is sort of the Le least balanced height balanced tree uh the least balanced one is going to have at every node a mismatch the the left let's say the left sube is shallower than the right subtree by one and recursively all the way down so every node has a a Gap a uh what do we call it uh a skew of one okay which I'm going to write I'm going to introduce some notation I'll write a descending rightward Arrow if this one is is higher than the left sub tree uh so the easy way to think about this is this is sort of the our worst case this is going to be the the fewest nodes for the maximum depth let's just count how many nodes are in this tree I'm going to write that as a recurrence um which is the number of nodes in a tree of height H so if this whole tree has height h H uh as we said in this picture if I just subtract two from all these numbers then this one has height uh H minus 2 and this one has height H minus one okay so how many nodes are in here well this is a recurrence I'm going to write so this will be N Sub h-2 this will be N Sub hus1 and then I just count how many nodes are in this picture it is n subh -1 plus n subh - 2 + 1 or this note now you might ask what is NH or recurrence for uh but it is the number of nodes in this sort of worst case uh for if the worst case has total height H so you can also think of it as what is the minimum number of nodes I could have in an AVL tree which is a height balance tree uh that has height h in a height balanced tree okay so now I just need to solve this recurrence this recurrence look familiar is it's like Fibonacci numbers if I remove the plus one it's Fibonacci and if you happen to know the Fibonacci numbers grow as like a golden ratio to the N then we know that this is exponential which is what we want because if NH is exponential in h that means H is logarithmic in N because log is inverse of exponential okay but maybe you don't know about Fibonacci numbers and uh so we can still uh easily show that this is exponential as follows I want to prove that it's at least an exponential because that gives me that H is at most logarithmic um so we need a lower bound and so we have these two terms which are hard to compare n subh minus 1 and N Sub hus 2 it's kind of ugly uh but if we're allowed to be sloppy and we'll see if we're not too sloppy and still get an exponential answer um let's just make them equal uh like so um so this is a true statement in fact strictly greater than uh why because so I removed the plus one that should only make something smaller and um I replaced n subh minus1 with n subh minus 2 here I'm implic using a fact which is obvious by induction that uh this tree on height if I take this tree versus this tree this one has more nodes than this one right if I have larger height this construction is going to build a bigger tree at least as big doesn't need even need to be strictly bigger so certainly n subh minus one is greater than or equal to n subh minus 2 now this is 2 * n subus 2 and this is an easy recurrence this is just powers of two right I keep multiplying by two and subtracting two from H so this solves to 2 to the h 2 maybe with a floor or something okay but uh it is I'm using a base case here which is N Sub 0al 1 uh maybe it's a ceiling then but the point is this is exponential so this implies that the height is always at most 2 * log n uh this two corresponds to this to if you just invert this formula this was number of nodes is going to be at least uh 2 the H2 and so H is at most to log n so it's not log n that would be perfect but it's within a factor of two of log n so AVL trees are always quite balanced number of levels is at most double what you need to store end nodes great we're left with the main Magic not domain magic that's different uh and let's see we're going to use subt tree augmentation keep that big remaining challenge is how do we maintain this height balance property using rotations we have all the ingredients lined up for us we have sub Tre augmentation what does that let me do that's relevant to AVL trees well it lets me store height right I need to be able to compute the height of a node that in general takes linear time because I have to look at all the downward paths all the leaves within that sub tree but height is a sub tree property so uh yes height why because uh let me just write it here node. height equals 1 + Max of node. left. height and no do right. height and of Max let me put this in a box this equation or I guess it's an assignment operation this is a one is the thing we've been doing over and over when I said what is the height of this node you just figured that out right you took the height of the left sub tree maxed with the height of the right sub tree and added one for to count for these edges okay so this is a general update rule it matches this subtree property pattern if I have the property of left and right I can compute it for node and this takes constant time to do and so it's a sub tree property and so I can maintain through all the things I'm doing the height of every node oh by the way whenever I do a rotation I'm also going to have to update my sube properties when I rotate this Edge a does not change B does not change C does not change so that's good but X's sub tree changes it now has why it didn't before so we're going to have to also uh update the augmentation here Y and we're going to have to update the augmentation in X and we're going to have to update the augmentation of all of the ancestors of X eventually um so rotation is locally just changing a constant number of pointers so I usually think of rotations as taking constant time uh but eventually we will have to do this is constant time locally uh but we will need to up uh H ancestors in order to store all of keep all of our augmentations up to date okay we'll worry about that later all right so great now we have the height of all the nodes we can compute the SK skew of all the nodes cool uh we have this rotation operation and we want to maintain uh this height balance property height of left node left and right of every node is within is plus or minus one or zero um cool so uh I said over here somewhere whenever we so the only things that change the tree are when we insert or delete a new node and it and the way that we implemented those so far is to add or remove a leaf so we should still be thinking about adding or removing a leaf the problem is when I add a new Leaf now maybe this tree is higher than it used to be so um some node here may no longer be height balanced but because height is a subtree property the only nodes we need to check are the ones up this ancestor path so and there's only log n of them because now height is log n that's what we just proved as long as we have this property now we right now don't have it for like maybe these few notes but it will be it was log n before it's at most log n 2 log n plus one right now cuz we just added a Noe so what I want to do is check all of these ancestor nodes in sequence from bottom up and find one that's out of balance so let's take the um lowest uh outof balance node I'm going to call that X now because we just inserted or deleted a single leaf it's only out of balance by one right because we only changed height one height went up by one or one height went down by one and before all of our SKS were plus or minus one or zero so now it's going the bad case is when it's plus or minus two if it happens to still be in this range for all the nodes we're happy but if it's outside this range it's only going to be out by one uh so this means the skew is in plus 2 orus 2 and let's say that it's two by symmetry so um my picture is I'm going to draw double right arrow to say that this sub tree is two higher than this sub tree okay so that's bad and we want to fix it the obvious thing to do is to rotate this Edge because that'll make this this is too this is too high and this is too low so if we rotate this should go down by one and this should go up by one and that works most of the time so case one is that skew of Y what is y I want y to be the right child of X because we have positive skew we know there is a right child now uh because this was the lowest bad we know that Y is actually good it's either right heavy or even the two sub trees have the same height or left heavy the easy cases uh are when skew of Y is either one or zero which I will draw all right so double right arrow let's say single right arrow so this uh I'm just going to add some labels here to make this picture consistent uh k + one K plus two I'm writing the heights so this is an example where C is is taller than b uh B A and B are the same height and then if you compute the heights up here indeed uh this one is right leaning this one is doubly right leaning this one has height K plus one this one has height K minus one that's bad but if we do this right rotation on X we get exactly what we want so I'm just going to copy the labels on ABC we have k minus1 k minus one and K and then recompute that means this guy has height K this one has height k + 1 and now all the nodes in this picture that I've highlighted A and C haven't changed they were height balanced before they still are um but now X and Y X wasn't height balanced before y was now both X and Y are height balanced okay that's case one in case two the SK of Y is flat which means that this is a k and this is a k and this is a k + one and this this is a K plus 2 okay but still all the nodes are balanced height balanced they're still plus or minus one so those are the easy cases unfortunately there is a hard case case three but there's only one and it's not that much harder so it's when skew of Y is minus one in this case we need to look at the left child of Y uh okay and to be alphabetical I'm going to rename this to Z sorry so this one again is double right arrow this one is now left arrow and this is letter Y and so we have a b c and d potential sub trees hanging off of them and I'm going to label the heights of these things these are each K minus one or K minus 2 this one's K minus one and now compute the inside so this is going to be height k for this to be uh left leaning so this is k + 1 and this is k + 2 okay but the problem is this is two higher than this the height of Z is two higher than the height of a in this case if I do this rotation things get worse actually I'll just tell you the right thing to do is this is the one thing you need to memorize and let me draw the result you can also just think of it as redrawing the tree like this uh but it's easier from an analysis perspective to think about it is two rotations so then we can just reduce as long as we know rotations work then we know that this thing works Works meaning it preserves traversal order and we can maintain all the augmentations so now if I copy over these labels the height labels I have K minus one I have for these two guys K minus one or k minus 2 biggest one is Kus one this is K minus one and so this will be K this will be K this will be k + one and lo and behold we have a nice height balanc Tree in all three cases for this one node now this was the low node once we update this one it could be that um we changed the height of the root before it was k + 2 now it's k+ one or sometimes we keep it the same like over in this case uh and so now we have to check the parent maybe the parent is out of balance we just keep walking up the node and also maintain all the augmentations as we go then we'll keep track of height and subtree size if we want them or any other augmentations and after order H operations we will have restored the height balance property which means all the way through H equals order log n and so all of our operations now are magically order log n

Transcript for:Understanding AVL Trees and Operations

Transcript for:
Understanding AVL Trees and Operations