Transcript for:
Lecture on Binary Tries

welcome back in the last video I introduced the concept of binary tries how these are tries that we treat as sequences and we look at the integer index kind of as a string but in its binary representation uh in this video what I want to start doing is drawing some of these and show you what they look like uh once again the binary try is an immutable data structure and so likee a lot of our data structures where you have an outer class that has nodes inside of it the elements of the binary try are the entire thing and I will refer to these as vectors uh and at a fundamental level they have arrays inside of them and there are two types of nodes in our binary try uh one node is the leaf node and the other node is an internal node and so Leaf nodes store data and internal nodes store other nodes uh the we'll we'll see kind of how these work for indexing and for adding start with adding for my example I am actually going to focus on the situation where we break things in twos so in the last video I showed how we could do this kind of in group our bits into fours for this because it's easier to draw I am going to group things into twos now in reality every one of our integers is 32 bits uh which if we were grouping them into fours means that there would be eight levels to our tree um but when your sequence is short that's really not efficient it's not it's not useful to go 0000000000 0 and most of the time you're going to be indexing things that start with a lot of zeros uh instead what we do is we only keep the parts of the tree that we need and we keep track kind of of what portion of the number we're looking at in particular the leaves are always going to look at our smallest grouping so if we were grouping by fours we would have what I did earlier where we pull off the lowest four bits when we Group by twos it'll be the lowest two bits because two bits uh can only represent 0 through three we wind up storing these things in arrays that can't be bigger than four if we were doing four bits they would be arrays that can't be bigger than 16 so to start off with uh you have kind of a nil value an empty for your uh vector and that empty Vector would actually have an array of size zero in it I don't really have a good way of of drawing that out but if we add to that what we get back is a new Vector uh I will make a vector we'll call it V1 and it has as as I mentioned an array but this array only has one element in it and I'll use letters as the contents here and so V1 refers to our array with a single letter in it what happens if we call add on V1 so if I said V1 do add a b uh well that is going to make a new Leaf node that has two elements in it and so we make a new Leaf node the first element gets copied over the SEC second element gets set and V2 refers to that new uh that new Vector okay fairly straightforward note that this was a full copy in here um what happens when we call add again well as I said earlier we can go up to a size of four we're not up to that size yet so we wind up with three elements uh when we call add on V2 to make V3 our V3 has the A and the B copied over and then it adds the C to it uh all of these elements are you know independent the fact that we added something to V1 doesn't change V1 adding something to V2 doesn't change V2 these these chunks of memory are still here if we weren't referring to them anymore the garbage collector come and pick them up if we are referring to them they will be kept around so that we can use them but they're not being mutated oh there's one more ad that's kind of you know another layer of of AD that's not that interesting and that is if we add one more we get an array with four values in it and I'll call it V4 okay the next ad is where things get interesting though our uh arrays can't get bigger than four and so this now has to these have all been Leaf nodes at this point the next step when we make our V5 and so we're going to declare a variable it'll be V5 uh when we call add this node it checks itself and it says hey I already have four elements in me I cannot uh add another element in here so what it does is it creates an internal node and that internal node I'm going to scooch up our V5 a little bit here that internal node has an array with two elements in it and those elements are not our data those elements are the the uh our references to other nodes in this case they are Leaf nodes but if we were higher up in the in the try they would be other internal nodes and the first one of these actually refers to our full node here we're going to reuse it at this point this is part of what makes the the binary try uh efficient if we kept copying things over and over again that would be an immutable array uh which winds up being not very very efficient for for adding and removing and then we have another leaf that has our one single element in it and it points to that new node there and of course V5 points to this internal node so we have this internal node that has two the first child has the first four elements we added the second child has the other element we added what happens if we call add on V5 my diagram is going to get a little bit interesting here but so we're going to make a V6 and it's the result of adding to V5 now V5 is an internal node once again we can't change these nodes they're immutable the first part is full still so what we're going to do is we basically need to make a new Final leaf and in order to reference it we're going to have a new internal node so I have a new internal node it still only needs two elements in it the first element though is going to keep referencing that full node that we had the second element is going to be an array of two with our the one value we had copied over and our one new value let's see see actually let's do that and just connect them in different order select okay so we get a picture that looks like that and we need to connect up V6 here V7 would have three elements V8 would have two full leaves and at v9 we're just going to assume dot dot dot uh so by v9 we would have for V8 we would have had another leaf that would have been efgh we need to make a new internal node but this time our internal node has to go up to three elements and we need another leaf that has our one new element the first connects to there the second connects to there the third connects to there and that would be v9 now in order to keep track of this turns out the only data we store in our leaves is going to be array of whatever our element type is so if our element type is e this is going to be an array of e uh at the internal level we need to store two pieces of information one piece of information is the array and it will be an array of our Vector type because it could be leaves it could be internals both of which are subtypes of our Vector type it also needs to store how far up it is in the tree and the reason for that is that we need to know how far to downshift so uh however far up it is in the tree we need to downshift by in some ways we could call it level times the group size so however many bits were grouping together uh up here I was grouping them into four for a drawing example I grouped them into just twos but whatever level we're at uh times that group size that way we know how far to downshift and by the way if I were only doing two bits this would need to be and three uh not not an F uh I will note that if I write this in code I would put 0x3 even though 0x3 and just plane three are the exact same thing by putting the 0x3 it makes it clear to anyone reading this that I care about the binary representation and I'm treating this as as a binary it's it's very standard to to write these things out in hexadecimal if you write them out in decimal it's potentially less clear to uh to your reader what you're doing uh so that's it for this video what I'm going to do is I'm going to come back in the next one and we'll start laying out the code a little bit to to talk through how we would create these operations