Transcript for:
Hashing Lecture Notes

welcome to the fourth lecture of 6006. um today we are going to be talking about hashing last lecture on tuesday um professor solomon was talking about uh set data structures right storing things so that you can query items by their key right by what they intrinsically are versus what professor domain was talking about last week which was sequence data structures where we impose an external order on these items and we want you to maintain those but i'm not i don't really i'm not supporting operations where i'm looking stuff up based on what they are right that's what the set interface is for right so we're going to be talking a little bit more about the set interface today so last on tuesday you saw two ways of implementing the set interface right one using just a unsorted array just i threw these things in an array and i could do a linear scan of my items to support basically any of these operations it's a little exercise you can go through i think they show it to you in the recitation notes but if you'd like to implement it for yourself that's fine uh in and then we saw a slightly better data structure at least for the find operations can i look something up whether this key is in my uh set interface we can do that faster we can do that in log n time with a build overhead that's about n log n right because we showed you three ways to sort two of them were n squared one of them was n log n which is as good as we showed you how to do yesterday so the question then becomes can i build that data structure faster that'll be a subject of next week's thursday lecture but this week we're going to concentrate on this static find right right we got log n which is an exponential improvement over linear right but the question here now becomes can i do faster than log n time and what we're going to do at the first part of this lecture is show you that no you what's up what no okay that you can't do faster than log n time in the caveat that we are in a slightly more restricted model of computation that we were then what we introduced to you a couple weeks ago and then so if we're not in that that more constrained model of computation we can actually do faster right and doing faster is i mean log n is already pretty good right log n is not going to be larger than like 30 for any problem that you're going to be talk talking about uh in the real world on real computers but uh a factor of 30 is still bad right i would prefer to do faster with those constant factors when i can it's not a constant factor it's a logarithmic factor but you get what i'm saying okay so what we're going to do is first prove that you can't do faster for for fine does everyone understand remember what find key meant right i have a key i have a bunch of items that have keys associated with them and i want to see if one of the items that i'm storing contains a key that is the same as the one that i searched for right the item might contain other things but it in particular has a search key that i'm maintaining the set on so that it supports find operation search operations based on that key quickly does that make sense so there's the find one that we want to improve and we also want to improve this insert delete we want to be make this this data structure dynamic right because we we might uh do those operations quite a bit and so this lecture is about optimizing those three things okay so first i'm going to show you that we can't do faster than log n for find which is a little weird okay the model of computation i'm going to be proving this lower bound on right i'm saying that how i'm going to approach this is i'm going to say that any way that i store these the items that i'm storing in this data structure for any way i store these things any algorithm of this certain type is going to require at least logarithmic time that's that's what we're going to try to prove and the model of computation that's that's weaker than what we've been talking about previously is what i'm going to call the comparison model and a comparison model means is that the items the objects i'm storing i can kind of think of them as black boxes i don't get to touch these things except the only way that i can distinguish between them is to say given a key and an item or two items i'm i can do a comparison on those keys right are these keys the same uh is this key bigger than this one is it smaller than this one those are kind of the only operations i get to do with them i don't get to look at what the say if the keys are numbers i don't get to look at what number that is right i just get to take two keys and compare them and actually all of the search algorithms that we saw on tuesday were comparison sort algorithms right what you did was you stepped through the program at some point you came to a branch and you looked at two keys and you branched based on whether one key was bigger than another right that was a comparison and then you moved some stuff around but that was the general paradigm those those three sorting operations uh lived in this comparison model right you've got uh you know comparison operations like are they equal less than greater than maybe greater than or equal less than or equal right generally you have all these operations that you could do maybe not equal right but the key thing here is that there are only two possible outputs to each of these comparators right there's only two things there's there's only one one thing that i can branch on it's going to branch into two different lines right it's either true and i do some other computation or it's false and i'll do a different set of computation right that makes sense so what i'm going to do is i'm going to view a comparison an algorithm in the comparison model as what i like to call a decision tree right so if i specify an algorithm to you the first thing it's going to do if i don't compare items at all i'm kind of screwed because i'll never be able to tell if my keys and they're not so i have to do some comparisons so you know i'll do some computation you know maybe i find out the length of the array and i do some constant time stuff but at some point i'll do a comparison and i'll i'll branch right i'll come to this node and if the comparison like maybe a less than right if it's true i'm going to go this way in my computation and if it's false i'm going to go this way in my computation right and i'm going to keep doing that with various comparisons um sure until i get down here to some [Music] leaf in which i'm i'm not branching right the internal nodes here are representing comparisons but the leaves are representing i stopped my computation i'm outputting something does that make sense what i'm kind of trying to do i'm kind of changing my algorithm to be put in this kind of graphical way where i'm branching what my program could possibly do based on the comparisons that i do right i'm not i'm not actually counting the rest of the work that the program does right i'm really only looking at the comparisons right because i know that i i'll need to compare some things eventually to figure out what my items are and if that's the only way i can distinguish items then i have to do those comparisons to to find out right does that make sense all right so what i have is a binary tree that's representing the comparisons done by my algorithm okay so it starts at one comparison and then it branches how many leaves must i have in my tree what does that question mean right like in in terms of the program what's up the number comparison the number of comparisons no that's the number of internal nodes that i have in the algorithm and actually the number of comparisons that i do in an execution of the algorithm is just along a path from here to the to a leaf right so what do the leaves actually represent those are represent outputs right i'm going to output something here and yeah the number of i okay so i need at least so what is the output to my search algorithm maybe it's the an index of an item that contains this key or maybe i return the item uh is the output right the item so the of the thing i'm storing and i'm storing n things so i need at least n outputs right because i need to be able to return any of the items that i'm storing based on a different search key parameter if it's going to be correct i actually need one more output why do i need one more output if it's not in there right right so any correct comparison searching algorithm i've i'm doing some comparisons to find this thing needs to have at least n plus one leaves right otherwise it can't be correct because i could look up the one that i i'm not returning in that set and it would never be able to return that value does that make sense yeah what's n n is always the num in for a data structure n is the number of things stored in that data structure at that time right so the number of items in the data structure that's what it means in all of these tables any other questions okay so now we get to the fun part how many comparisons does this algorithm have to do yeah up there what's up all right your colleague is jumping ahead for a second but really i have to do as many comparisons in the worst case as the longest route to leaf path in this tree right because as i'm executing this algorithm i'll go you know down this thing always branching down and at some point i'll get to a leaf and in the worst case if i happen to need to return this particular output right then i'll have to walk down the longest thing right it's the longest path so and the longest path is the same as the height of the tree okay so the question then becomes what is the minimum height of any binary tree that has at least n plus one leaves ever understand why we're asking that question okay so unrest yeah why it needs n plus one leads if it's a correct algorithm it needs to return it needs to be able to return any of the n items that i'm storing or say that the key that i'm looking for is not there great question okay so what is the minimum height of any binary tree that has n plus one at least n plus one leaves you'll you can solve that uh you can actually state a recurrence for that and solve that you're going to do that in your recitation but it's log n right like the best you can do is if this is a balanced binary tree right so the min height is going to be at least log n height right right where the min height is logarithmic right so it's actually theta right here but but if i just said height here i would be lower bounding the height right if i could have a linear height right if i just chained comparisons down one by one if i was doing a linear search for example right all right so this is saying that if i'm just restricting to comparisons i have to spend at least logarithmic time to be able to find whether this key is in my set right but i don't want logarithmic time i want faster so how can i do that i have one operation in my model of computation i presented a couple weeks ago that allows me to do faster which allows me to do something stronger than comparisons comparisons have a constant branching factor in particular i can i if i do this operation this constant time operation i can branch to two different locations right it's like an if kind of situation if or else right and in fact if i had constant branching factor for any constant here right if i had three or four if it was bounded by a constant the height of this tree would still be bounded by a log base the constant of that number of leaves okay so i need in some sense to be able to branch a non-constant amount right so how can i branch a non-constant amount this is a little tricky right we had this really neat operation in the random access machine that we could randomly uh go to any place in memory in constant time based on a number right it was a super powerful thing because within a single constant time operation i could go to any space in memory right that's that's potentially much larger than linear branching factor depending on the size of my model and the size of my machine right so that's a very powerful operation can we use that to find quicker anyone have any ideas sure uh we're going to get to hashing in a second but this is a simpler concept a concept in hashing hashing something you probably are familiar with already we've kind of been using it implicitly in some of our sequence data structure things what we're going to do is if i have a an item that has key 10 okay i'm going to keep an array and store that item 10 spaces away from the front of the array right at index nine or the tenth index does that make sense if i store that item at that location in memory i can use this random access to that location and see if there's something there if there's something there i return that item does that make sense this is what i call a direct access array it's really no different than the arrays that we've been talking about earlier in the class we got an array and if i have an item here with key equals 10 i'll stick it here in the 10th place now i can only now store one item with the key 10 in my thing and that's one of the stipulations we had on our set data structures right if we tried to insert something with the same key as something already stored there we're going to replace the item right that's what the semantics of our set interface was but that's okay that's that's that's satisfying uh the conditions of our send interface so if we store it there that's fantastic how how long does it take to find if we have an item with the the key 10 it takes constant time worst case great how about inserting or deleting something what's what's that again constant time we've solved all of our problems this is amazing okay what's not amazing about this why don't we just do this all the time ah i don't know how high the numbers go right so let's say i'm storing i don't know a number associated with the the three or four hundred of you that are in this classroom right but i'm storing your mit ids how big are those numbers those are like nine digit numbers right pretty long numbers so what i would need to do in in if i was storing your keys as uh mit ids i would need an array that has indices that span the entire space of nine digit numbers right that's like 10 to the nine or nine uh 10 to the nine thank you ten to the nine is the the size of a direct access array i would have to build to be able to use this um [Music] this technique to to create a direct access array to search on your mit ids when there's only really 300 of you in here right so 300 or 400 is an n that's much smaller than the size of the numbers that i'm trying to store what i'm going to use as a variable to to talk about the size of the numbers i'm storing i'm going to say u is the maximum size of any number that i'm storing okay it's the size of the universe of space of keys that i'm storing does that make sense okay so to instantiate a direct access array of that size i have to allocate that amount of space and so if that is much bigger than n then i'm kind of screwed right because i'm using much more space and these order operations are bad also right because essentially if i am storing these things uh non-continuously i kind of just have to scan down the thing to find the next element for example right okay what's your question a direct access array is a set data structure that's why it's a set interface up there so uh this is uh your colleague is asking whether you can use a direct access array to implement a set i mean a sequence and actually i think you'll see in your recitation notes you have code that can take a set data structure and implement a sequence data structure and take a sequence data structure and implement a set data structure they just won't necessarily have very good run time so this direct access array semantics is really just good good for these specific set operations that make sense yeah what you u is this the size of the largest key that i'm allowed to store that makes sense right and the the direct access array is supporting you up to you size keys does that make sense okay we're going to move on for a second so what's that that's the problem right we when you largest key we are we're assuming integers here right integer keys right so in the comparison model we could store any arbitrary objects that that supported a comparison here we really need to have integer keys or else we're not going to be able to use those as addresses right so we're making a an assumption on the inputs that i can only store integers now i can't store arbitrary objects items with keys and in particular i also need to this is a subtlety that's in the word ram model how can i be assured that these keys can be looked up in constant time how does my seat my this little cpu right it's got some number of registers it can act upon how big is those registers what well so they're they're right now they're 64 bits but in general they're w they're the size of your word on your machine that's how many uh two to the w is the number of addresses i can access so implicitly i'm kind of if i'm going to be able to use this direct access array i need to make sure that the u is you know less than 2 to the w right if i want these operations to run in constant time right if i have keys that are much larger than this i'm going to need to do something else okay but this is this is kind of the assumption in in this class when we give you like an array of integers or an array of strings or something like that on your problem set or on an exam right the assumption is unless we give you a bounds on the size of those things right like the number of characters in your string or the size of the number and you can assume that those things will fit in in one word of memory okay w is the word size of your machine right the number of bits that your machine can do operations on in constant time any other questions okay so we have this problem we're using way too much space right when we have a large universe of keys so how do we get around that problem any ideas sure okay so what your colleague is saying instead of just storing one value at each place maybe store more than one value if we're using this uh idea right where i am storing my key at the index of the key that's getting around the us having to have unique keys in our data structure it's not getting around this space usage problem does that make sense we will end up storing multiple things at indices but there's an another trick that i'm looking for right now right we have a lot of space that we would need to allocate for this data structure what's an alternative instead of allocating a lot of space we allocate less space right let's allocate less space all right so we have a really like this is our space of keys right you right but instead i want to store those things in a direct access array of maybe size n right something like the order of the things that i'm going to be storing i'm going to relax that and say we're going to make this a length m that's you know around the size of the things i'm storing okay and what i'm going to do is i'm going to try to map this space of keys this large space of keys from like zero to u minus one or something like that right down to a range that's zero to m minus one okay right i'm gonna want a function this is what i'm going to call h which maps this range down to a smaller range does that make sense i'm gonna have some function that takes that large space of keys sticks them down here okay and instead of storing at an index of the key i'm going to put the key through this function the key space into a compressed space and store it at that index location does that make sense sure oh your colleague is comes comes up with the the question i was going to ask right away which was what's the problem here the problem is it's the potential that we might be story have to store more than one thing at the same index location right if i have a function that matches this big space down to this small space i gotta have multiple of these things going to the same places here right it's got a it can't be injective right but just based on pigeonhole principle i have more of these things at least two of them have to go to something over here in fact if i have say u is bigger than n squared for example right there for any function i give you that maps this large space down to the small space n of these things will map to the same place right so if i choose a bad function here then i'll have to store n things at the same index location and if i go there i have to kind of check to see whether any of those are the things that i'm looking for i haven't gained anything right i really want a hash function that will evenly distribute keys over this space right does that make sense but we have a problem here if we need to store multiple things at a given location in memory can't do that i have one thing i can put there so i have two options on how to deal what what i call collisions right if i have two items here like a and b these are different keys right in my universe of space but it's possible that they both map down to some hash that has the same value right so where do i if i if i first hash a and a is i put put a there where do i put b there are kind of two options um is the second data structure a linked list so that um it can store me okay so what your colleague was saying can i store this one as a linked list and then i can just insert a guy right next to where it was what's the problem there is linked lists are linked lists good with direct accessing by an index no they're terrible with get at and set at right they take linear time there right so really the whole point of direct access array is that there is an array underneath and i can do this index from arithmetic and go down to the next thing so i really don't want to replace a linked list as this data structure yeah what's up we can make it really unlikely sure i don't know what likely means because i'm giving you a hash function one hash function and i don't know what the inputs are yeah okay right so there are actually two solutions here one is i i maybe if i choose m to be larger than n right there's going to be extra space in here i'll just stick it somewhere else in the existing array right how i find an open space is a little complicated but this is a technique called open addressing which is much more common than the technique we're going to be talking about today in implementations python uses an open addressing scheme which is essentially find another place in the array to to put this collision but it's open addressing is notoriously difficult to analyze so we're not going to do that in this class there's a much easier technique that we have an implementation for you in the recitation handouts it's what your colleague up here i can't find him over there was saying was instead of storing it somewhere else in the existing direct access array down here which we usually call the hash table right instead of storing it somewhere else in that hash table we'll instead add that key store a pointer to another data structure right some other data structure that can store a bunch of things just like any sequence data structure like a dynamic array or a linked list or anything right all i need to do is be able to stick a bunch of things on there uh when there are collisions and then when i go up to look for that thing i'll just look through all of the things in that data structure and see if my key exists does that make sense now we want to make sure that those additional data structure structures which i'll call chains right we want to make sure that those chains are short right they don't i don't want them to be long right so what i'm going to do is when i have this collision here instead i'll have a pointer to some i don't know maybe make it a dynamic array or a linked list or something like that and i'll put a here and i'll put b here and then later when i look up key k right or look up key a or b let's look up key b i'll go to this hashed value here i'll put it through the hash function i'll go to this index i'll go to the data structure the chain associated with that index and i'll look at all of these items i'm just going to do a linear find i'm going to look i could put any data structure here but i'm going to look at this one see if it's b it's not b look at this one it is b i return yes does that make sense this is an idea called chaining i can put anything i want there commonly we talk about putting a linked list there but you can put you know a dynamic array there you can put a sorted array there to make it easier to check whether the key is there you can put anything you want there the point of this lecture is going to try to show that there's a choice of hash function i can make that make sure that these chains are small that it really doesn't matter how i store them there right because i can just i have i have time if there's a constant number of things stored there i can just look at all of them and do whatever i want and still get constant time yeah so and like let's just say like for some reason the number of things is that most of them get like multiple yeah so what your your colleague is saying is kind of at initialization what is stored here right initially it points to an empty data structure right i'm just going to initialize all of these things to have now you get some overhead here right we're paying something for this some extra space in having pointer and another data structure at all of these things or you could have the semantics where if i only have one thing here i'm going to store that thing at this location but if i have multiple it points to a data structure these are kind of complicated you know implementation details but you get the basic idea right if i just have a zero size data structure at all of these things i'm still going to have a constant factor overhead right it's still going to be a linear size data structure as long as m is linear in n does that make sense okay so how do we pick a good hash function i already told you that any fixed hash function i give you is going to experience collisions and if u is large right then there is the possibility that i for some input all of the things in my set go directly to the same hashed index value so that ain't great let's ignore that for a second what's the easiest way to get down from this large space of keys down to a small one what's the easiest thing you could do yeah modulus great this is called the division method okay and what it's the function is is essentially it's going to take a key and it's going to set it equal to b k mod m okay i'm going to take something of a large space i'm going to mod it so that it kind of just wraps around right perfectly valid thing to do it satisfies what we're doing in a hash table and if my keys are completely uniformly distributed ran like if if when i use my hash function all of the keys here are uniformly distributed over this this larger space then actually this isn't a such a bad thing right but that's imposing some kind of distribution requirements on the type of inputs i'm allowed to use with this hash function for it to be of have good performance right but this plus a little bit of extra you know mixing and bit manipulation is essentially what python does it essentially all it does is kind of jumbles up that key for some fixed amount of jumbling and then mods it m and sticks it there but there are some it's it's hard-coded in the the python library what this hash function is and so there exists some sequences of inserts into a hash table in python which will be really bad in terms of performance because these these chain lengths or the amount number of collisions that i'll get at a single hash is going to be large right but they do that for other reasons they want a deterministic hash function they want something that i do the program again it's going to do the same thing underneath right but sometimes python gets it wrong but if your data that you're storing is sufficiently uncorrelated to the hash function that they've chosen which usually it is this is a pretty good performance okay but this is not a uh practical class well it is a practical class but one of the things that we are uh that's the emphasis of this class is making sure we can prove that this is good in theory as well right i don't want to know that sometimes this will be good i really want to know that if i choose if i if i make this data structure and i put some inputs on it i want a running time that is independent on what inputs i decided to use independent of what keys i decided to store does that make sense right in some sense i want but it's impossible for me to pick a fixed hash function that will achieve this right because i just told you that if u is large right this is if u is large then there exists inputs that map everything to one place so i feel i'm screwed right i can't there's no way to solve this problem that's true if i want a deterministic hash function right i want the thing to be repeatable to do the same thing over and over again for any set of inputs what can i do instead weaken my notion of what constant time is to do better okay use a non-deterministic what does not what does non-deterministic mean right it means don't choose a hash function up front choose one randomly later right so have the user they pick whatever inputs they're gonna do and then i'm gonna pick a hash function randomly they don't know which hash function i'm gonna pick so it's hard for them to give me an input that's bad right okay so how do i'm going to choose a random hash function how do i choose a random can i choose a a hash function from the space of all hash functions what is the space of all hash functions of this form right for every one of these values i give a value in here right that would be a completely like i choose for each one of these independently random number between this range how many such hash functions are there m m to the this number that's a lot of things right so i can't do that what i can do is fix a family of hash functions where if i choose one from randomly i get good performance and so the hash function i'm going to use and we're going to spend the rest of the time on is what i call a universal hash function it has it satisfies what we call a universal hash property so universal hash function and this is a little bit of a weird nomenclature because i'm defining this to you as the universal hash function but actually universal is a uh you know a descriptor right there exists many universal hash functions this just happens to be an example of one of them okay so the hash function uh or really right so we're gonna so here's the hash function it doesn't look actually all that different goodness gracious how many parentheses are there mod p mod m okay so it's kind of doing the same thing as what's happening up here right but before modding by m i'm multiplying it by a number i'm adding a number i'm taking it mod another number and then i'm modding by m okay this is a little weird and not only that this is still a fixed hash function i don't want that i want to generalize this to be a family of hash functions which are this h a b k for some random choice of a b in this larger range all right this is a lot of notation here essentially what this is saying is i have a hash family okay it's parameterized by the length of my hash function and some fixed large random prime that's bigger than u just i'm going to pick some large prime number okay and that's going to be fixed when i make the hash table okay and then when i instantiate the hash table i'm going to choose randomly one of these things by choosing a random a and a random b from this range does that make sense this is a not equal to zero right if i had zero here i kind of lose the key information and that's no good does this make sense so what this is do is multiplying this key by some random number adding some random number modding by this prime and then modding by the size of my thing okay so it's doing a bunch of jumbling and there's some randomness involved here i'm choosing the hash function by choosing an a b randomly from this thing so when i start up my program right i'm going to instantiate this thing with some random a and b not deterministically right the user when they're using this thing doesn't know which a and b i picked right so it's really hard for them to give me a bad example right and this universal hash function this universal hash family shall we say really this is a family of functions and i'm choosing one randomly within that family is universal and universality says that what is it what is the property of universality it means that the probability by choosing a hash function from this hash family that a certain key collides with another key is less than or equal to one over m for all any uh different two keys in my universe does that make sense so if i basically this thing has the property that if i randomly pick or if i for any two keys right that i pick in my universe space if i randomly choose a hash function the probability that these things collide is less than one over m why is that good this is in some sense a measure of how well distributed these things are i want these things to collide with one over m probability so that these things don't collide very likely it's not very likely for these things to collide does that make sense so we won't prove that this hash family satisfies this universality property you'll do that no four six but we can use this result to show that if we use a universal ha this universal hash family that the length of our change chains is expected to be constant length okay so we're going to use this property to prove that okay how do we prove that we're going to do a little probability okay so how are we going to prove that i'm going to define a random variable an indicator random variable does anyone remember what an indicator random variable is yeah it's a it's a variable that with some amount of probability is one and one minus that probability is zero right so i'm going to define this indicator random variable i x i j is a random variable over my choice over choice of a hash function in my hash family and what does this mean it means x i j equals one if hash ki equals hkj these things collide right and zero otherwise okay so i'm choosing randomly over this hash family if for two keys i and j key i and key j if these things collide that's going to be one if they don't then it's zero okay then how can we write a formula for the length of a a chain in this model right so the size of a chain right or let's put it here the size of the chain at i right at i in my hash table is going to equal i'm going to call that the random variable x i that's going to equal the sum over j equals 0 to is it over i think u minus 1 of summation or sorry of x i j right so basically if i fix this uh this location i right if i fix this location i this is where this key goes right so i'm looking at sorry this is the size of chain at h of k i sorry so i look at wherever k i goes is hashed right and i see how many things collide with it right i'm just summing over all of these things right because this is one if there's a collision and zero if there's not does that make sense so this is the size of the chain at the index location mapped to by ki okay so here's where your probability comes in what's the expected value of this chain length over my random choice okay expected value over choosing a hash function from this universal hash family of this chain length well that's just i can put in my definition here that's the expected value of the summation over j of x i j what do i know about expectations and summations if these variables are independent from each other say say what linearity of expectation right basically the expectation of the sum of these independent random variables is the same as the summation of their expectations right so this is equal to the summation over j of the expectations of these individual ones okay one of these j's is the same as i right j j loops over all of the things from zero to u minus one right one of them is i right so when x h i is h j what is what is the expected value that they collide one right so i'm going to refactor this as being this where j does not equal i plus one are people okay with that because if i equals if if if j and i are equal they definitely collide right they're the same key right so i'm expected to have one guy there which was the original key x i right but otherwise we can use this universal property right that says if they're not equal and they collide which is exactly this case right the probability that that happens is 1 over m right and since it's an indicator random variable the expectation is their outcomes times their probabilities right so 1 times that probability plus 0 times 1 minus that probability right which is just 1 over m right so now we get the summation of one over m for j not equal to i plus one oh and this sorry i did this wrong this isn't u this is n we're storing n keys right okay so now i'm looping over j this over all of those things how many things are there and minus one things right so this should equal one plus n minus one over m okay so that's what universality gives us so as long as we choose m to be like larger than n or at least linear in n then we're expected to have our chain lengths be constant right because this thing becomes a constant if m is at least order n does that make sense okay the last thing i'm going to leave you with is how do we make this thing dynamic if we're growing the number of things we're storing in this thing it's possible that as we grow n for a fixed m this thing will stop being m will stop being linear in n right well then all we have to do is if we get too far we rebuild the entire thing the entire hash table with the new m right just like we did with a dynamic array and you can prove we're not going to do that here but you can prove that you won't do that operation too often if you're resizing in the right way and so you just rebuild completely after a certain number of operations okay so that's hashing next week we're going to be talking about doing a faster sort