Hey everyone, welcome back to my channel. The new video in this video is starting with Hash Maps, one of the most important topics for data structures and algorithms. And if you don't already know, we're doing the complete data structures, algorithms and interview preparation bootcamp.
So the link to the playlist to the course can be found in the description below. You can check out the previous videos as well. And the best way to support is to like, share and comment and subscribe if you haven't already. But Hash maps, very important topic.
I know I say it with every new concept like okay this is the most important topic but yeah hash maps is very important topic because hash maps, the concept of hash maps is used in other sort of problems. It may be used with trees, with the dynamic programming, with sorting, with heaps. It is used quite a lot with heaps as well and we've covered an introduction to heaps so we'll do hash maps.
We'll do some nice questions in another video but this video is about this video is about HashMaps so we'll learn what are HashMaps, implement from scratch, see a lot of theory around HashMaps and around various ways by which people implement HashMaps, some of the limitations and the solutions to those limitations. So I'll tell you a lot of theory part but there will be a lot of code part in this video as well because we'll be implementing various versions of HashMaps ourselves and the code can be found in the description below the code that i'll be writing so you can run it on your browser easily and check it out as well so yeah code sample and everything in the description below also the notes and assignments and everything can also be found in the description below and yeah i'm very excited because it's a very fun topic i love hash maps and this video is more than enough for you to learn about hash maps and how they work and everything next video will do some questions Let's get started. Okay, so let's start with Hash Maps.
The code and the notes that I'm writing will also be found in the description below. But what I just recommend is just watch the video. You don't have to make any notes because it will be available.
The lecture will be online and you can google search whenever you get in a doubt. But still, if you want, you can make notes. The most important thing is that the video will be available online all the time on YouTube. and then you can revise anytime and all the important things I say you can make a timestamp note or whatever that's one of the best ways to you know go ahead with it but yeah let's get started hash maps so as you know I always start my lectures like you don't know anything about hash maps by the way if you have been following the data sector algorithms bootcamp please do that because we are continuing that series so in the previous sections we have done binary search trees we have done heaps and all sorts of things so with binary search tree you know we can get element or whatever in o of log of n time by industry very simple we've already done this but now the question is what if i want to get stuff in constant amount of time i have inserted 1000 elements in a data structure and i want to get that in constant amount of time in just one in one one step okay no searching no nothing needed no traversal no loops nothing okay so there are various ways to store items right so if you store number of items in like an array you store many items in an array and you're like okay try to search for an item which is so on and so forth try to search for 14 so what do you do you just start traversing i'll start checking from here and i will keep looking keep looking where the item is this is what this is o of n time we all know this binary search tree we know binary search tree will do o of log n time in order to search but the question is i want to search in constant time if someone tells search for this element in this data structure no traversing nothing needed no for loop needed i need to search it in constant time how can we do that we can do this using hash maps First we will look at why we are using it and then we will look at how it works.
So don't worry about the internal implementation right now. I am telling you you can access in constant amount of time. Yes, take my word for it. Believe me.
How we are doing it, what are the mechanism behind it, how it works internally, we will look into this video only so don't worry. Okay. So for now we know hash maps we can access data in constant amount of time. Okay how do we do that? No idea.
We will look at it later on. But for now we know constant amount of time. So the structure of hash map is sort of like this you have key value pairs you have a key and you have a value pair so for example key represents the name value represents the marks for example this is let's say what a hash map is key value pairs one entry in the hash map is kunal kunal is one of the keys value is 88 this is actually how much i got in my computer science exam in in high school. So people often ask me Kunal were you really good at coding when you were a little kid? No, I got 88 out of 100 which in India is not very good.
So I started coding seriously in first year of college when I did not when I was like really upset for not getting in a good college. Before first year of college I was very careless about studies. So I was not the best person in my high school for coding. It was actually Karan Singh Bora, my good friend.
Shout out to Karan. But Kunal 88 marks that is one element in the HashMap. Again don't worry about Kunal, how is this key value being stored? No, don't worry about it. We will look at the internal implementation.
For now just take a look at it in a diagram form only. Let's say it's being stored in a diagram form. How it is being stored?
What is the internal implementation? How is it made? We'll take a look at it later on. okay what sort of data structure and stuff is used to create hash map don't worry about it just theory part okay focus on the theory right now not at all about the code implementation please because i know people get worried about it okay forget the code part again for like i don't know i'll keep explaining this it will take some time maybe one hour i don't know i'm recording live so forget about code till i mention code okay no code needed right now we will look at theory and how it works and everything Let's say you have a key and a value pair in one entry of one element of a hash map.
Okay, so first element is having consisting of a key value pair, which is Kunal, 88 marks. We have Karan, 99 marks. We have Rahul, which is 95 marks, so on and so forth. So this hash map here has three items, these three items. And every item has a key and that key has a value.
okay let's say the reference variable of this hash map is map so you can say map dot get and i can pass kunal over here this will give me this will return the value which is equal to 88 in constant amount of time constant time i search for rahul i say map dot get rahul hey map give me the value of Rahul. Map will be like okay the value of Rahul is 95. Map will not be like okay let me search first Kunal is this what the user is asking? No.
Move forward to the next Karan is this what user is asking? No. Move forward to the next Rahul is this what the user is asking?
No. Yes. Then it will give Rahul. No.
That is not how it works. There is no traversing. Directly it will jump on to Rahul and it will give me that answer.
How? Don't worry about it. We'll cover it later. But that is what it's doing.
That is what it is doing. So I hope you are able to understand what HashMap is doing and how it's like actually giving me the answer and how we're trying to store it in HashMap the data in a theory sense. Let's move forward.
So what are some of the real world like implementations of HashMap? Why are we using HashMaps at all? It's used in a lot of places.
So in like compilers and interpreters for like C, C++, Java, Python, whatever, it is used there as well internally when the programming language is built to convert, to point the names to the variables. Okay, when you're like, hey, print A and A is equal to 10. So that value internally how the programming is too in-depth, but internally what I'm saying is. It also uses HashMaps when forming the programming language.
Simple example I can give you is network routers. So routing the IP addresses that also uses HashMaps because you need access in like constant time. And for network servers you have the port numbers that directs to like your socket or your application.
Virtual memory, so for example sometimes you create virtual machines or your virtual memory that is linked to physical memory or whatever. Right. and so many places it's being used now other places it is being used in like also in cryptography that is where hashing is used quite a lot. String search, grep, for example if you use the grep command that uses hash maps.
Okay, so a lot of places where maps are being used, hash maps. Okay, so let's see how this actually now works. Okay, now let me explain it to you in a very simple way how it works. So I'll explain theory first, we'll do code later and everything but please try not to get confused. I request you, it is not that complex.
Just think about what I'm saying. Okay, let's say we have a hash table. We have a hash table. Let's say it's in the form of an array. Okay, we have a table like this.
Let's say in the form of an array. Let's say I have an array like this. Okay.
let's say i'm storing it in an array let's say the idea is that it contains some elements and i'm like does this element exist in the array or not give me in constant amount of time or better just return the index where it is stored so if this contains like say let's say it contains kunal this contains rahul this contains sivo and I'm like give me the index of CVO it will not search it will not be like okay is this CVO is this CVO is this CVO is this CVO is it will not do linear search it will give me in constant time it will be like okay CVO exists yes that is true CVO exists how so how do we do this so the idea for hashing is one thing you have to understand is the concept of hash code now hash code is nothing really It is just a way of getting a number, an integer. Okay, you give it anything. It can be an object, it can be an integer, it can be a name, a string, a character, a float. Whatever you want to give it, you give it to the function and it will create a number out of it. Using some formula.
Using some formula. So... two problems arise over here in this scenario that in the array which is the keys have to be integers because sometimes you're like kunal so kunal you need a number for kunal how are you going to convert kunal into a number you use the hash code function okay problem one we need all elements as positive numbers.
We use the hash code function. I'll show you in just a minute. And I know it's confusing so don't worry I will explain it in more detail what I'm just saying. So bear with me for five minutes.
The second problem is the hash code that I get can be very large. So the idea is reduce it. okay this is known as hashing okay so what i can say is that reduce i'm not going to write it now i can write it reduce all elements in table to a size m okay so what i mean by this is let me let me just give you an example let me just give you an example let's say initially it's empty now let me explain it to you let me explain let's say initially the size of the array is one two three four five six seven eight or i can just say nine ten initially let's say the size of the hash hash table is 10 internally it's an array now i'm like hey kunal please insert the word cvo in it in the hash table which is formed like an array so you're going to be like okay this is actually not an integer i need to convert it into an integer so i will be like give me the hash of cvo java is going to be like okay this is equal to let's say something like 83479 Now this is a really big number. This index, what you can do is, let's say for every for every what you can do is, let's say Okay, I got an idea.
I got an idea to explain it to you. Okay. Let's say initially the size is equal to 10 of the array. Let's say the hash of Cvo is equal to giving me a number 7. Please don't get confused with hash. Okay, it's just, let's say a function in Java.
It may not be called hash. I will tell you what it's called. You just give it anything.
It will convert it into a number. Now, why does it matter how it converts it into a number? Maybe it's taking ASCII values or whatever. Maybe if you say hash of 20,000, it will give you what? Some other number.
Maybe it will be 20,000. Because I think to a particular number, the hash of a number it's the number itself you're not understanding let me let me first explain to you what hash number means let me just show you the code give me a second let me show you a code java hash maps I'll explain this later, okay? Don't worry. It's very simple, don't worry.
I'll explain it to you in just a minute. Hash code. Returns the hashed code. Okay. I'll just explain to you what hash code means.
Give me a second. Okay? So here in the object class, there is a hash code method over here.
So it just returns the hash code value for the object. If you have done object oriented programming, then you know that I can run it for any, I can run it for any because object class is, you know, the parent class. All the other classes are inside the object class.
So if I'm zooming in, as you can see over here, if I run it again, so it's like name Kunal. what is the hash code of kunal it's saying seven seven two eight five eight four four seven now you'll be like what is this is this the reference way reference address number no it's a random it's an integer that's it not random because there's some sort of method so for example i can say string class hash no hash code javascript hash code if we want to see the internal implementation Stack overflow like this. Okay, so maybe if you ask for hash code, it will be like taking the Ascii value and doing some you know calculation And doing some left shift right shift, whatever you want to call it some addition subtraction and then converting the string into a number You can just check the official java docs if you want to look at it Similarly, if I say Rahul, it will again take r, ascii value of r and a and a and convert it and merge it and do left shift, right shift, whatever you want to call it and then give me a number.
Here you go. So, if the value is same, the hash code will always be same because behind the function, it's only a formula. Please don't ask me about object class.
I will get very angry. I have explained like a complete object oriented programming playlist if you are asking me Kunal how are you doing name dot hash code I can't help you please watch object oriented programming playlist okay because this is object class method so I'm applying it in every other class I can apply I can do it integer also if I'm saying int a is equal to something like this what is going to be the hash code of this Usually the hash code of a number is the number itself to a particular value Exact answer you can find in the Java documentation So the hash code of 4,2,3,5,6,7,8 will be 4,2,3,5,6,7,8 because this is a unique number This is a unique number I use hash code Y because I want a unique number value and every number is a unique number So what is the point of changing this number into some other number? It will give the... Oh sorry, that was a primitive.
There we go. Same number itself. I can run it again.
See same number it's giving me. Okay, this is what hash code is. Give it anything it will convert it into a Number. Okay, so moving on here if I say hash code of cvo is let's say 7 So i'm going to be like, okay just insert it into index number 7. So here I will add SIVO. I will say hash code of Kunal is equal to let's say it calculates and give me another number called 3. So I'll be like okay I will add Kunal over here.
Hash of Rahul is going to be equal to let's say 2. I will add Rahul over here. Now when you try to get the item, I will like give me the item Kunal. It will be like, okay, let me just calculate the hash of Kunal.
Hash of Kunal is coming out to be 3. This because is a mathematical formula, so it will be, let's say, constant time. We can say, okay, don't go into too much details. I know some people who are too over smart. They will be like, oh, it's not constant.
It's average constant time. No. you are also wrong it's called amortized constant time which i will explain later for now just think constant time okay for now let's imagine it's all constant time the hash function it's giving me three so in constant time you will be like three and you will be like okay yeah it exists so that's how we are solving the problem of getting stuff in constant time but the reality is that this number is not one digit number as you just saw it's giving me a lot of if it's giving something like this for example what will you do if it's giving me a really big number it's not giving a smaller number it's giving a really big number but the size of my array is 10. so now we can do hashing hashing basically means if you're getting like larger numbers okay you're getting some big enough bigger numbers you can either do something like this number itself in the index of this number so have the array of the size of 7,34,985 no not good i want to insert all these items in the size 10 only but yeah that's what i want to do so we want to insert cvo it will be like okay let me calculate hash of cvo it will give this number now where to put this number so now it's known as hashing hashing now basically means we are trying to reduce all of these numbers in the format of let's say size 10. So all the numbers should come less than 10. So how do I create all these numbers less than 10? I can just take remainder with 10 modulo 10. Can I not do that? If I take any number let's say see 734985 modulo 10 getting me the remainder which is 5. this is coming out to be 5 so CO will be here similarly this modulo 10 will come out to be 7 Kunal will be here modulo 10 will be come out to be 3 Rahul will be here simple now also is constant time only because hash function getting me the number in constant time and i am reducing that number into a smaller digit of size 10. Not a problem?
Simple? Very easy? How we're doing it?
Okay. But now you'll be like Kunal, would it not be possible that that multiple, sometimes it may be possible that two items, even though they may not have the same hash code, now you'll be like Kunal, how do you know they will not have the same hash code? Prove it. I will prove it. Okay, don't worry, I will not let it go.
But for example, let's say after you do the modulo, there can be let's say if i do if i do something like if i do something like hash of armo this is coming out to be three nine four seven eight three now i do modulo ten it's giving me three so at index three i should put armo but now at index three rahul is already there what do i do this is known as a collision this is known as a collision when you get multiple indices as same that's known as a collision there are ways to deal with collisions also let's see okay so how do we deal with collisions and by the way this um you know uh how i got like these these are known as hash functions um that i have over here the hash function like this hash function is just modulo by the array size okay but there are multiple hash functions that we'll look and we'll look into it right now so collision two ways to resolve collision way number one is chaining and way number two is open addressing okay sounds good so what is chaining chaining basically means let's say i have an array at every element in the array let's say there is a link list i want to insert kunal kunal will go over here let's say in the link list Kunal then I want to insert Rahul let's say Rahul goes over here Sivo goes over here then I have Armo and let's say the Armo hash so getting the hash of Armo modulo by the array size is giving me the same number as Kunal so I will not replace Kunal since it is a linked list I will add Armo over here let's say i take another element that is also adding in the same number so it will be like okay now another element let's say ps5 something random okay sounds good so you can see the worst case here is kunal what is what if there is a possible worst case that all the keys belong to the same element only let's say you're in in in in in putting 10 names and all the 10 names when you get the hash value hash code of it and modulo by the array size it's giving you three only so it's going to be like one n n size list linked list because if you want to search for armo now it will be like okay get me the hash of armo it will be like okay after the modulo and everything it's giving me three so at the third index i will start searching here i will traverse is the first element armo no Is the second element Rmo? Yes, that's how I will get it. So even though at this index I am getting in constant time, there is a traversal here. So in the worst case, let's say worst possible scenario, let's say your life is like very panoti, which is a Hindi word for bad luck, every item you're inserting is giving the same index value. So you'll keep on inserting in that linked list.
Yeah, that will happen. Okay, so how do we solve this problem? how do we solve this problem?
how do we solve this? so we try to we try our best how do we solve this problem? this problem we try our best that not all the elements come on the same index all the time we try our best so this is known as we cheat a little bit we cheat a little bit how we use simple uniform hashing i know this is the doubt people have but kunal what if all the hash code that you get and divide it by the array size or whatever hash hash hashing hashing function whatever the hashing function gives me the number what is that number is three all the time all the time let's say it's three yeah that is possible in that case your whole hash map point of view is gone because it's no more constant time so we cheat a little bit it is known as simple uniform hashing we make an assumption the assumption is that every key is equally likely to be hashed to any slot in the table independent of where all the previous keys are hashed so if we say n is equal to what?
total number of keys in table m is equal to what size of the table then the load factor which is equal to alpha is going to be equal to n divided by m but what does this mean this basically means let's say if you have n number of items you want to put in the hash table total slots are m then n divided by m is the expected keys per item okay expected number of keys in per slot per slot because we are making this assumption that every key is equally likely to be put in every so the probability we are saying is we are making an assumption that the probability if you enter a new key the probability that it will go in the index number 0 1 2 3 4 5 6 7 8 9 10 all the indices any one of the indices it will go to the probability is same okay so let's say the size is 10 and you're inserting 20 items then we are saying that we are expecting at least two items in one index because 20 divided by 10 is 2. okay so what does this mean if we are assuming It means that the time complexity now is equal to 1, O. This 1 is for the hash function that we applied and got into that point. And now for the linked list, it will be...
What is the maximum size that we are assuming of a linked list? Alpha. Assuming, please we are assuming over here that maximum it will be alpha. Okay, this is for the list.
link list. I will prove it a little bit more. Okay.
Don't worry. Okay. And if alpha itself is equal to O of 1 means constant time, then this also comes out to be constant time. Okay. And this happens when the size of the table is equal to big omega of n.
If you don't know what big omega means, time complexity lecture, check it out. So I hope you are able to understand what I mean by assumption. We are assuming that every item that I put, you are putting Kunal, Rahul, Armo, PS5, Sivo, it does not matter what the items are already in the array. It does not matter. The probability that Kunal or Armo or any other new word that I want to put is equally likely to be in any one of these indices it's not like okay index number three we are assuming is higher chance index number four we are assuming in higher chance here let's say i say index number three is higher chance because it already has three items no for every item i'm assuming that the probability is same that it will lie in every single any single one of those okay now when we talk about the hash functions that is used to achieve this thing hash function let's see The first hash function that we have seen is the division method.
So the hash value, key value, where I will put it, index value of a key k is going to be equal to what? k modulo m. Now this can be anything, this can be like your... size of array for example but the size of array it's like not very efficient if you will in terms of like okay it may give let's say it's not very that much random so what people do is they sometimes take m is equal to a prime number okay why because it will give you more random generated answer but not too close to the power of 2 or 10 because these are common in the real world okay sounds good this is one of the hash functions that we have already looked into right now there's another way multiplicative method multiplication method What is multiplication method? I can say that the hash key of a key k is equal to, I can say the formula a multiplied by k modulo 2 raised to the power m shift w minus r.
Here a is equal to random number. There's nothing to be confused over here. This is just formula.
Scientists and people, they have created these formulas. Not a problem. Okay. W is equal to number of bits in K.
Okay. And M is equal to, I can say not M here. This will be W over here. and I can say one more relation is m is equal to 2 raised to the power r. Many times multiplication method is used because multiplication here is faster, the bit operations than division.
And this is just one of the popular formulas that people have to calculate the hash value where the item will go. This is practical when a is odd number. Okay. And 2 raised to power w minus 1 is less than a is less than 2 raised to power w. And also a is not too close to 2 raised to power w minus 1 or 2 raised to power w.
This will give us a very random number and that's another function that people can use. There's not much to explain this, this is a mathematical formula and in interviews no one is going to ask you to generate this formula or code this formula but if you want to code it, it's very easy but no one will ask you much information about you know this this stuff. In a simple way we've learned about the division method but I hope you understand what that means. So it's very simple. Again, don't be confused.
It's just a math formula. Okay, don't be confused. Okay.
Now, moving on. Another method of hashing that I want to tell you is universal hashing. right now i'm just telling you how numbers are converted into like you know like the indices and stuff so the hash value so here i can say for example hash for a key i can say is equal to a into k plus b okay modulo p where i can say oh also modulo m and here i can say that a and b are random numbers this much is much simpler okay this formula that belongs to can say 0 to p minus 1 0 say 2 p minus 1 okay so this is a very simple formula you give it a hash of k for a number let's say for a key and it's going to be like okay multiplied by a plus b modulo p modulo m so on for foot p is also a large prime number p is a large prime number.
Okay, sounds good. The reason I'm sharing is because using this formula, it says using this formula that the probability that if you have two different keys, and the probability that hash value of key 1 is equal to hash value of key 2 this is equal to 1 divided by m using this formula this is what the probability comes out we know we were talking about what if we do multiple numbers multiple items and they have the same number so the probability for that here is 1 divided by m ok So moving on we have just learned about like some basic functions. We have not even started HashMaps in Java yet.
Okay I will show you the interface and map and everything but it's important to learn about the internal details. Okay doubt you will be having. The doubt you may be hacking is how much should be the size of the table.
Okay you said initially we took random 10 for example. How long should be the size and what to do if let's say you are having a finite size of the table first of all you have 10 then you keep inserting thousand elements in it obviously we know chaining will happen obviously because slots are 10 and you have inserted thousand elements so how do you make sure what the size of the array is how large the table should be so size of the table you want the size of the table to be theta of n all the time okay again theta we've covered in space and time complexity right and if you have the size as very small then it means that your table will be slow if you have it very big it means that you are wasting resources wasteful okay so the idea is Start small and then grow. Okay, cool.
So the idea is let's say you have five elements at first you have just inserted 9 8 3 and 5 here you've inserted 7 here so there are five elements let's say you have inserted the size is five of the five of the table and you have inserted five elements already so after the table is full what you do is you double the size so you will now create 10 you will add in the items 9 8 7 3 5 and then you will add in new items okay so you can double it or you can let's say increase it one by one also if you increase it one by one let's say when number of elements becomes equal to the size you let's say increase the size by one So at every step what you're doing is you're increasing the size by one. Increase size by one, copy all the elements, then add in a new element. Increase size by one, copy all the elements, then add in a new element.
So n inserts will cost how much? O of 1 plus 2 because you're inserting the elements again and again. Copy, pasting, 6 plus 7 plus at every step 8. For nth element you will be doing n operations because copying it n time then adding a number one okay this comes out to be what sum of n numbers o of n squared so in order to add n elements you are doing o of n squared time complexity so better way to do is multiply by 2 means double the size okay so what do you mean by double the size do size equal to 2 okay table doubling and the idea is that let's say you have one item so you just insert in one item in the array now you want to insert another item so what you will do is you will double it and then you will insert another item so if you double it it will take two after that third item will just go there fourth item you have to double it again and so for example initially let's say you have just one so you added one over here now you want to insert three so it will be like okay it is full so i will double it and then insert three okay you want to insert another item let's say you want to insert a four so what you'll do is you will double it again now you insert it four okay then you want to insert 5 then you want to insert 6 so you want to insert 6 you will double it again you will insert 6 so this double cost what 4 then you want to insert again so at every 2 raised to power i you are doubling very simple math right so for n till n this is going to be equal to what if you want to insert n elements this is coming out to be o of n so to insert so in simple terms when you double the table cost to insert n items comes out to be o of n because of this formula because of this thing okay this is very simple just simple maths right okay so n items are on an average cost let's say on an average in quotation on an average inserting an item's cost o of n so inserting one item is going to cost o of 1. this is known as amortized constant time or like average constant time on an average okay remember how we did array list in array list what was happening array list also internally is an array but when it gets full or half or whatever we double it we double it In that case also, insert and in arraylist and stuff is also O amortized constant time.
We say it's constant time but it's on an average constant time. Okay, cool. Similarly if the size of the array becomes if it has half the number of elements then I can shrink the array.
okay sounds good but what you're doing is this is also not very efficient because this is o of n per operation so instead for shrinking what we have to do is when m when the size here i can say that when n becomes m by 4 number of elements become m by 4 then half the size now this is o of 1 amortized time okay sounds good easy let's move forward okay now we looked at chaining one of the ways to prevent collisions another way to prevent collision is known as open addressing again i have not looked at the code or the java part right now so don't worry we will look into it later on like after this after the theory part but now we're talking about open addressing what does open addressing mean so how do we deal collisions using open addressing it means no chaining so we use all the indices or elements like slots in the table only we will not create any linked list or whatever okay so idea here is only one item per slot okay so the size should be greater than equal to the number of items so what the hash function is going to do is it's going to probe probe basically means it's going to try it's going to try to insert an item okay so for example i have an empty let's say i have a i have a table over here let's say i create a table like this okay this has some items already 139 46 88 and i want to insert something so i will find the hash value for i want to insert 33 at the first try or the zeroth try okay zero means the first try and let's say this points me to this index so it will be like okay this is already available try something for this now on the second try second try it will be like okay now it's pointing to this it's gonna be like okay this is also available insert 33 on the third try it will let's say point to this index it'll be like okay this is empty i will add 33 over here so instead of creating a linked list or whatever if you create a if you get the hash function gives you a value and that value index is already occupied Previously we were creating a linked list now to be like, okay, we won't do that. We will look at the next possible item That's basically what it means. Okay sounds good.
Okay now since when I'm searching for 33, right? So it's like 33, I'm searching, let's say if I'm searching for 33, then it will first search for 139. Then it will let's say search for 46 and then it will come to 33. Make sense? So it's going searching for 139, 46 and then 33. So if I want to delete an item, I can't just make this empty. I will have to put a pointer here or some note here called.
delete like a flag I have put over here so that when I'm trying to search for 33 it will be like 139 delete it means that there was something here that got deleted and after that 33 okay I will show you how this thing happens how it moves let's let's look at it right now so what are some of the probing strategies let's say we found an index but it doesn't work so it will look for other index how will it look for other index probing strategies first one is known as linear probing okay h of key for the i-th time is going to be equal to it's going to be equal to what calculate the hash function normally okay plus i modulo m okay so i just calculate a hash function 0 33 it's pointing to this then it's going to be like okay i will look at the second try 33 of 1 whatever item was plus 1 means the next index so i'm linearly jumping first it was over here then it will check this one then this one then this one then this one then this one linearly jumping so it will add item here similarly i go over here it will add item here so it is linearly jumping so let's say if i have like some elements like this right let's say index number m minus one 0 h of k of i because it's linear it will be like in in like this size or something okay because i'm jumping linearly all right there's a problem here a cluster is being formed problem is clustering you're not utilizing all the space and a cluster is being formed please jump a little farther okay so what you can do is uh double hashing h of key for the i earth try is going to be equal to first hash function i use plus i multiplied by second hash function that i will use modulo m so it remains in the same size of the array now i'm not just increasing by linear numbers i'm actually increasing by thing by a another factor so it's making a more jump okay cool right Also, this will actually hit all the slots like in permutation if h2 of k is relative prime to m for all k. See, the thing is, these sorts of things that I am mentioning are for people who really like maths like me. Interviewer will not ask you this. and these things are also not required, you know these mathematical complex stuff for data structures algorithms, not required.
Okay but for fun fact I'm telling you, if you're the kind of person who doesn't like math that much, you can skip over it, it's fine. But I'm just saying if you're not able to understand what I'm writing mathematical complex formulas, don't worry, look at some of the basics, learn some basics, but you can ideally totally skip it as well. You can directly jump onto the code part when I do.
how hash maps function in java this is just for general knowledge okay why because here i can say why is this like going to hit all the slots of the permutation if this thing is relatively prime to m because if i write the formula h1k plus i multiplied by h2k mod m okay this i can say is actually equal to h1k plus j multiplied by h2k mod m and here i can say this means m divides i minus j okay so for example here i can say that m is equal to let's say 2 raised to power r so make if you make the second key always odd so for example size of the array let's say 2 raised to power 4 which is equal to 16 then h2 of k can be equal to 5. Okay, so I would say don't worry much about this thing. This was a little extra thing. I'd say you can skip it. It's fine.
No one will ask you that much in detail about how double hashing works internally. No one will ask you about double hashing either way. But if I talk about like, you know, the hashing function, let me just tell you one other last important one before we move on to the code. This is uniform hashing assumption. It is not.
same as the simple uniform hashing that I looked really. So the assumption is that every key is equally likely to have every key is equally likely to have m factorial permutations as the probe sequence. Okay, because there are m elements, so how many sequences can a key have. First it will try this one, then this array, then this this one, this one, this one, this one, this one, this one, this one.
So there are many sequences. First it is trying index number 3, 4, 5, 8, 1, 2. Like that's a permutation. So how many total permutations can it have? m factorial. So the assumption now we are making is that every key is equally likely to have m factorial permutations.
Okay so let's say you use Open addressing to insert n items into a table of size M, right? So under the uniform hashing assumption I can say that the cost of the next operation Okay is going to be equal to cost of next Operation is going to be less than equal to 1 upon 1 minus alpha. I will explain this in just a minute how that is possible because as alpha grows the number of expected probes that you need will also grow means that we will resize the table for example when alpha becomes high so we'll try to keep alpha small okay so for example i can say that okay we all know what alpha is load factor n divided by m which is all always equal to less than one so if i say alpha is equal to 90 let's say 90 it is full then i can have 10 expected probes okay if you want to look at a proof like i know what this sound to know how did you just put this random number one upon one minus alpha let's take a look okay let me explain to you the cost of next operation how it's coming this so uh how do we do it So let's say you want to insert an item with the key k and let's say that item is not in the table.
What is the probability that the very first item, very first element that it finds will be the empty element? Let's say there are total m slots. n you have already inserted. Then how many are the empty slots?
m minus n. First, success. chances because we are assuming that everyone has an equally likely chance.
Let's say this is equal to probability P. If the first one fails, then the second success is equal to what? Remaining slots minus that current slot, total minus 1, out of total minus 1. This is obviously I know greater than equal to what? m minus n upon m.
and I know this is equal to p, right? So p comes out to be what? 1 upon n minus m which is equal to 1 minus alpha. Sounds good? So with every trial I can say for the next trial if you want to do third success for example right m minus n divided by m minus 2 which is also greater than equal to p right so this probability p is coming out to be so as the number of operations increase that will always be equal to greater than equal to p and p is equal to what 1 minus alpha so in mathematics you know when you get the expected number of trials using a probability you just have to reverse it so expected trials comes out to be 1 upon P which is equal to 1 upon 1 minus alpha this thing so this is the cost of next operation okay how many operations it will take for next item okay similarly delete will also take o of 1 upon 1 minus alpha sorry my notes are a little messy but the important thing is the video keeps going okay so this is the expected number of trial for success okay that's open addressing i hope this one was clear so very simple stuff not not much complicated just try to think what we're doing over here everything i've written on the board But now the real question is open addressing versus the chaining that we did previously.
So when to use which? Open addressing, better cache performance. Someone asks you in interview, better cache performance, like better memory usage because pointers are not needed. pointers not needed and chaining when this is less sensitive to hash functions because open addressing requires some extra care to avoid clustering and the load factor, for example, open addressing degrades past 70% because the array is full now, number of slots are getting smaller like the number of slots sounds good cool that was basically about it we'll now look at java and how you can do java you'll have more fun now because i know theory can be a little bit boring but uh yeah hashing being used in cryptographs as well for example when you store your password you don't just store your password as a text you convert it into a hash then you store it and digital signatures and all sorts of things um you do it that way but let's take a look at some code samples and some internal implementation of how these things work. We'll implement the chaining and stuff but yeah and also see what map java offers inbuilt.
So yeah one more thing I want to tell you is that the items that are stored in the hash map are not in sorted order because you know it's getting random numbers so it will be stored randomly. But I hope you understand how that works. Now let's take a look at java.
So in java there's this interface called map and there's a set interface in java also which is like you can add collection of items with no duplicate elements and you can get items in constant time but there's no key value okay just one item is there so here's a hash set and linked hash set reset so on and so forth i'll show you the code in just a minute but here you can see interface map it has some implementing classes Enum map, the main ones we'll look at is HashMap. There's the Linked HashMap. For example, you can check out what the linked hash... So it has the hash table and linked list, implementation and the one we just looked. Okay?
So that one is there. It also has something that you might want to look at called TreeMap. Remember I told you that you can't store in sorted order, but in TreeMap, the... items you store in the map are in sorted order.
Okay, and this is a red and black tree based implementation We have not done red and black trees right now, but we'll do it later on but Yeah, and also it's giving a note that it is not synchronized So if multiple threads access the map concurrently it will modify the structure So if you want to use tree if you want to use a map But also want to keep items in sorted order use tree map It's very simple to use like because these are abstract data types you know meaning it will just give you the methods like get, set, you know delete and things like that and you don't have to worry about the internal implementation. Contains key, size and get object, you know find the first key, last key, put all, put, remove so on and so forth. So you don't have to worry about the internal implementation of T-Map, but you can take a look at the Java code But we'll take a look at as well.
We'll also, you know, obviously create our own implementation from scratch and everything Um, but yeah, the most popular one is HashMap. So you can see over here simple key value pair a very simple stuff and Yeah, some of the methods are available over here. Let's say put key value remove this key and return a collection of values contained in the map the size of the map you can get so on and so forth if it contains the key so let's just look into that so if i'm doing hash hash maps if i'm just like zoom out a little bit hope you can see here i can say hash map map is equal to new what i can do is i can add in a type hash map of type string comma int map dot put kunal 989 gotten 99 rahul map dot get marks of current oh what is wrong with me today 99 okay current's mark is saying 99 so it did not do any traversal or anything it just gave me 99. other functions you can try it is entry set retainers return a set of the view set it will just return a set of the set view of the mappings contained in the map okay you can get um get or default that's another one so you can either search for it if it doesn't exist then it will give you a default value okay get or default get or default let's say apoor apoor is not in the database so just give me 78 instead see 78 so just explore the documentation contains key contains value if it contains key map dot you know contains key current yes it is true it contains current true easy is empty key set it will return all the keys similarly you have set over here right you can have a hash set so it is just a collection of items that are not duplicate and okay also you can access the items in constant time so i can say actually let me just create function public no never mind i'll just do it like this it's okay this can be a hash set of integers okay hash set has add so i can add and i can can clear and i can can return an iterator so i can iterate upon it and i can get the size of this so on and so forth so set dot add 56 set dot add i'll just try to add a bunch of things in it let's say i add 56 again it will obviously not Include it twice.
See, no duplicates. 56 I inserted twice, but it will only give me once. Okay, has set We will see in future videos how we are utilizing HashMap and HashSet and everything. Similarly you can use TreeMap, just convert it into a TreeMap. So internally it will be same way.
Okay internally it's same, like it's internally it's different but the way you're using it is same. But TreeMap now it's storing it in sorted order. Okay. so tree map if you want to get items it will also give you items in sorted order descending key set return a reverse order navigate navigable set view of the keys in the map so you know object-oriented programming now so you can explore i'll just say explore all of these methods on your own. Easy stuff, not a problem.
All the documentation, everything is given, right? But what I'm going to do is, this all is fine, but what we're going to do is, we're going to do our own implementation. We're going to do our own implementation of HashMaps, okay? I'll try to do, let's say I'll do chaining using linked list, okay?
Let's try to do that. Let's do our own implementation of HashMaps. One more thing I wanted to mention here, you can see a HashMap and HashTable. So what is the difference between these two? The best way to find it is actually looking into it.
So HashMap, here I can say that HashMap is not synchronized, so it is not thread safe and cannot be shared between many threads without using proper synchronization code, whereas HashTable is synchronized. Okay. So all the things we learned about load factor and initial capacity everything default load factor 0.75 Uh all the things theory thing we did you can already see over here So we've done we've done some pretty good so far in the lecture.
Okay. Also hash map allows one null key and multiple null values whereas hash table does not allow any you know null key or value and hash map is basically preferred over hash table if thread synchronization is not needed. Now you may be asking why hash table does not allow null and hash map does?
Well because if you want to successfully store and retrieve objects from a hash table, the objects used as keys must implement the hash code method and the equals method. Okay since null is not an object it cannot implement these methods. HashMap is basically an advanced version and improvement of HashTable and HashMap was created later on that is why it has this functionality but that's the answer to the question why HashTable doesn't allow null but HashMap does it's all written in the documentation i love reading documentation you should as well okay so i'll just do a simple implementation um this is obviously not very optimized but just to explain to you i can say public not public i can just say class map using hash and here I will create a constructor actually just copy this so that I don't have to add in again and again what I'm going to do is internally it will be an array entity type entities and I can say entities is equal to new entity array of let's say 100 elements for now and what i'm going to do is i'm going to create an mtt as well so private class entity this is going to have what string every entity has what key and value constructor this dot key is equal to key this dot value is equal to value okay good enough right now some basic methods we'll make uh let's say we can do put public void put string key string value you can obviously use the you know the what do you call it i forgot um the the magic ones that uh mixing python with java wild cards think yeah so you know um wild cards in java not wild cards um what is called the the type that you can provide Generic right it's known as generic. I have covered generics.
Okay allow a type of Compile type safety so generics in Java you can You can use generics If you want this is a proof that you don't have to always know everything in this Google And that's how I showed you people often ask me. How do you know so many things? I don't I forget stuff all the time and I use Google and that's okay because i never memorize like the syntax and the theory and keyword and parts and stuff like that basics always should be clear but anyway i want to put an item how do we do that we have to get the hash first math dot absolute i can say key dot hash code modulo the size of the array entities dot length okay then entities of hash is equal to i'll create a new entity obviously this is not good one because it is overriding no chaining nothing if you find something that is same same it will override it okay i can get if i give it a key i need to get that so again i will just get the hash constant time if entities of hash is not equal to null and entities of hash dot key dot equals the key that I am searching for return entities of hash dot value otherwise return null.
this check I added because there might be the case that you have removed that key. Speaking of which, I can remove a key also. I can again get the hash. After I get the hash, I will check if entities of hash is not equal to null and the key is the one that I'm searching for. Right?
In that case, what am I going to do? I'm going to remove it. Simple, but not very optimized.
We just made our own hash map. Map using hash. Mango, king of fruits, map.put, sweet red fruit, map.put.
What's the spelling of lychee? Maybe that's it. nals favorite fruit map dot get apple in constant time amortized constant time it will give me the answer okay so i will run this sweet red fruit not a problem but it's not very well optimized okay sounds good cool what i can do now is instead of this i can do link list so map using link list let's do that okay so let's make it a little bit prettier and nicer so i can say hash map separate chain or i can just say smaller name i can say hash map final okay so internal implementation we will have around using linked list and stuff hash map final let's try to make it using generics so i will add in key comma value all right sounds good we will have an array list of link list of type entity so every element in the array list or the array that i have will be a link list and that link list will have types of entity right and the entity i can copy it's the same as here private class entity okay so private class entity i can hide this so that you can see it clearly string key value and the simple stuff all right that looks good actually I'm going to put it inside nice that looks good what are we going to do now I'm going to have an initial size I can say private int size is equal to initially 0 and I will have a load factor private load load factor is 0.5 load so it means that 0.5 we all know what this means load factors inserted items divided by the total size of the array so for example if you need to add 100 items and you have already added 50 items then if you add the 51st item it is going to double the size copy the all elements then add in the 51st item amortized constant type the same thing we did in theory okay i'll create a constructor right gonna create a list new array list now i can say for int i is equal to 0 i is less than 10 i plus plus let's say the initial size i can say list dot add new linked list okay initially let's say i've added 10 empty linked list Okay, so far we have done nothing. We have just set some stuff in. But the main part is if I want to add an element.
public void put kkey value I will just get the hash key dot hash code modulo list dot size okay now i'll say link list of type entity so you have a particular element and let's say you want to insert that element um you have you have got an element let's say kunal you've got the number so you need to get this link list first so we'll go to the overall array lists that index it will give you a linked list of type entity so linked list of type entity is equal to in the main list dot get that index now you have that index now try to search for these entity you can say for entity in entities i can say if entity dot key dot equals the key that I have provided you can say entity.value means you can update it is equal to value and then return entities.add you can say new entity key comma value but before adding I have to check the load factor if float of size divided by list dot size so current elements that i have added divided by list dot size if this is greater than the load factor then rehash means double the list and whenever you add you also do size plus plus okay so double the list so how do we do rehash rehash is very simple I hope you're able to understand because I'm only doing the things that I did in the theory. So now we are rehashing. I can just say system.out.println. We are now rehashing and I can say that, okay, let's say This array is full.
So Sorry, these are 50% full So I will copy every single linked list at every single index into a new array list of double the size Okay, array list of type linked list of type entity old is equal to list And I can say list is equal to new array list size let's say initially 0 and i can say copy all the previous elements for int i is equal to 0 i is less than old dot size multiplied by 2 and i plus plus so i will just say list.add new linked list. Okay, so what I did was I just saved the old list in another variable and I just created the new list and that size of that new list is currently empty but it is double the size of the previous list. Now I will just take every element from the previous list and add it in the new list.
So I can say for every linked list in the previous list and every entry in that list, just insert it. So accordingly it will do the hash functions and everything and it will insert it. Okay, very easy simple stuff now similarly I can say Public value get pass a key.
It will give you the value I can all I can I need the hash So I will calculate the hash. Here you go actually put it here okay got the hash value link list of type entity entities is equal to list dot get at that index and now in that index of the array you have a link list so at that index you have a link list and try to search if the answer you want the key that you want is in this link list or not if entity dot key dot equals the key that you are searching for then return the value otherwise it is not found return none okay sounds good similarly i can create a remove public void remove key you can also return the value over here if you want int hash calculates entities is equal to list dot hash got it target entity is initially let's say null for entity in entities like all i'm just gonna similarly i was searching i will search as it searches again if i found it then i can say i have found it target is equal to entity break entities.remove target and also don't forget to do size minus minus that's it pretty simple stuff you can also make other functions like contains key or not public boolean contains key I can say return get key is not equal to null okay that is return true if you found the key otherwise return false pulse okay similarly if you want to you know how we did uh just print the object so if i say hash map final map is equal to new hash map final and i try to print that map first of all it will give me some gibberish value but i can do that i can fix that by overriding the toString method i've covered this entire use case in object-oriented programming how to pretty print objects custom objects that i have built how to pretty print those using the override function the override annotation i'm not going to explain the override one because i've done it in detail how to pretty print objects i have done it in detail in the object when programming playlist and please don't ask me what override means so string builder builder is equal to new string builder builder.append let's say my array will start my if my output will start with the closing brackets my output will end with this and then i can say for link list type entity entities in the main list and for every entity in that list builder.append i can say and entry dot dot key and i can say builder.append equals entry dot value and also then append a comma Simple. That's it. Created our own HashMap with linked list, generic, all those terms that we learned as well.
Very simple. So we did a lot of stuff. Don't worry about it.
You can run this using this like just go to the main function, run it. It's pretty straightforward. But thanks a lot for watching. I'd say if you like this video, make sure you like, share and subscribe.
In the next videos, we'll do some more questions. If you have any questions, leave those in the comment section below but still I'll run this before we wrap up. One mistake I made while copy pasting is this should be of type K, type V, type K and type V. Okay, to fix in some typos I can go to my main, I can say hash map final and it should print a sweet red fruit.
Cool? Similarly what I can do is I can... just print the map as well run see lychee kunal's favorite fruit comma mango kunal king's king of fruit comma apple a sweet red fruit there you go because of the overriding the two string method okay start with this end with this added this so when you print an object directly just a reminder it will look at the two string method and because of the hierarchy it will first look at the overridden one. I'm not, why am I explaining? Just go watch the object-oriented programming playlist.
But that's basically about it and yeah make sure you like, share and subscribe. Leave the comments, comment helps as well and you can take a screenshot, share it on socials using hashtag DSA with Kunal, share what you learned about HashMaps, you know share your journey. join the learning public initiative assignments code everything can be found in the description below if you want to run this code directly like i'm running just using this browser i will leave it in the description below so you can check out the url and i will update the assignments and stuff after i'm done the next few videos will be around code around questions interview questions and some other concepts that utilize hash maps so we'll do that but uh yeah i'm looking forward to you know seeing your learning journey and yeah any questions or any appreciation let me know in the comment section below and check out the links in the description and the previous dsa videos that i've done and other bootcamps and i'll see you in the next one have a great day