Anagram Concepts and Algorithms

Hi everyone, my name is Shashwat Tiwari and in today's video we are going to talk about anagrams. So without any delay, let's start today's lecture. The question is, given two strings A and B consisting of lowercase characters, the task is to check whether two strings are anagrams of each other or not. An anagram of a string is another string that contains same characters. Only the order of characters can be different. For example, act and tack are anagrams of each other. So you will be given two strings, basically you have a Boolean function here which you have to implement, isEnagram. I will give you two strings and you have to tell whether those strings are enagrams or not. Now what is enagram? Ok, so what are enagrams? I have written a very good example here. We have two strings, string A and string B. A string is my ABCD and B's values are DABC. Now if you look carefully, both the characters in both the strings are exactly same. So if someone ever asks you what is anagram, so two such strings whose characters are exactly the same but their order is different here normal ABCD is written and here D, A, B, C are written the same characters, same frequency, no new character is added no character is removed, just jumbled if you give a string and jumble it it becomes anagram, so here A and B are anagrams The question is very simple, I will give you these two strings A and B and I will tell you if these two strings are anagrams or not. So how can we solve this question? See, it is very simple, you should have already come to your mind that here these two strings are the same number of characters, only their order is different. So I will compare their order by doing the same thing. And how will the order be the same? Obviously, you have to bring these two in a common state and what is that common state? Sorting. If I sort A, then its value will be A, B, C, D only. But if I sort B, then its value will be, by the way, is sort. I got used to writing half. A, B, C, D. Now see, let's compare it. Where A is here, A will be here. If it is not like this, then it is not anagram. Because sorting puts it in the same order. So A here, A here, let's move ahead. B is same, C is same, D is same, done. Now if there is an extra E here, then you say that there are no anagrams. Or if a D disappears here, then you say that there are no anagrams. So its length should be the same and its character should be exactly the same. But what is the complexity of sorting time? As far as I remember, I have taught you all the algorithms in n square, in the worst case. Which one I have not taught you? One I have not taught you Merce sort and the other I have not taught you Quit sort. So their is a little better, n log n times. But that is not a better solution. What could be a better solution? I will compare both of them by using normal sorting. Match is fine till then. Exit as soon as match fails. I will show you how to do this. First of all, I have to sort both of them. But unfortunately, string doesn't have dot sort method. I will convert it into character array. And then we will sort it. char arr Okay, I'll assign and then I'm Lisa K a dot to care Harry. Okay, okay So obviously copy carleta copy and same here But please be dot to care a day. Okay, this M. Lick's a carry of B Up sorting kill a man a coin bill functions in but I know or Rupert say I'm a rip us both sorry functions I mean, we'll get a raise dot sort. Okay, yeah, apparently second a error of a Okay, then a raise dot sort arr of b. Now both these are sort. Now simple int index a is equal to 0 and int index b is equal to 0. Now what I have to write here is while index a is less than a.length. One more thing you have to keep in mind here that if their length is not the same then there is no point in checking anything. because if one is longer than the other, it means that they don't have the same number of characters so we can normally check here that if a.length is not equal to b.length if their length is not the same, then what we can do is return false these can never be anagrams now when their length is same, I converted them into character arrays now because at this point you know that the length of both the strings is same so I will denote it with a.length or b.length, there will be no change or you can write it as a.length for continuity whatever you write, we made the character array with the same string so the length of both will be same obviously So if it is less than the length of your index, and similarly your index is less than the length of your index. Now array index is not equal to array index . If the characters are not the same, then there can't be any anagrams. So I can say return false. Otherwise, what I have to do here is, index a++ and index b++ and after this story ends, return true this is a normal concept that if their length is not same then return then anagram can't be done length is same then sort both of them point one from index a and one from index b and keep checking if both the characters are same simultaneously I have to increment both of them at the same time because positioning is very important if it is sorted then it should be same then only we will get the anagrams and here let not equal to so now let's run it compile and run perfect, now it is coming right let's submit it too I guess all the test cases will run or maybe TLE will come because it is not a good solution so now it is running but actually we have to optimize this solution now see Length will be same, it will be N and N will be N only. That's why we are checking the anagram otherwise you saw we were returning on the first line. Its time complexity is N square plus N square in worst case. You are using N plus N for traversal but you know that N square is taken as time complexity. Why am I saying N square again and again and not N log N? Because I haven't taught you Merge sort and Quick sort yet. Although there is Erase sort, it uses Tim sort. Tim sort is actually a combination of Merge sort. and insertion sort in, a combination of both of them makes a Tim sort. So they use it, that's why it takes so much time complexity of n log n. Right? So if someone asks you, you have to give the same answer, n log n. But I haven't taught you yet, so I said O . Okay? No problem. Now we understood that n square is n log n, but solution is taking time. It is giving a problem. I have to solve its problems. If you look at this question carefully, then A, B, C, D, okay? And D, A, C, B. I have to check both these strings. Now, can I say that we can play cut-and-cut in this? If I minus both these strings, what will be the answer? I know that strings don't get minus. Don't say that, bro, you are crazy. I don't know that. But let's say, A to A got cancelled. B to B got cancelled. C to C got cancelled. D to D got cancelled. So, what happened here? Nothing happened. Empty was left. So, what is empty? Enagrams. But I can't do this minus sign directly on string. So what I have to do now? I have to use my brain. So what was the meaning of showing this cut-and-cut? Basically, we are checking the frequency here. If A has come here once, then it should be here only once. It should not be 0 times or more than 1. It should be exactly one time. What I will do is, I will take out the count of all these. I have checked it. I have made an array. 1, 2, 3. 0, 1, 2, 3. Let's say 0th index is of A, 1st index is of B and so on. So now I have taken out all the frequencies, so I know that this is it. Now when I take out its frequency, what will I do? I will make an array. 1, 2, 3. And here also I will check that if it is D, then I did 1 on D. If it is A, then I did 1 on A. If it is C, then I did 1 on C. If it is B, then I did 1 on B. Now when you go, you can compare it again. See this, its frequency is 1, its frequency is 1, its This one and this one are exactly same. So what happened to you? Anagrams are done. Okay, but here you saved the sorting. How much time did it take you to create this array? O of n. How much time did it take you to create this array? O of n. Now you will say that this can also be O of m. Both of them are not the same length. Brother, we have already checked. If the length is not the same, then we are not performing anything. Then we should not check anything. Okay, so what happened here? That you created this with this traversal. you created this from the travel set. How much size did you use? 2 of n. You made two arrays, right? And then you used extra O and M. Apart from this calculation, to compare them, whether they are same or not. The approach is good, but we can optimize it a little more. You must be seeing that it is exactly same. So why should I use two arrays? I could have used a single array. So what I will do is, let's remove it again. So, how will you get the positioning of A? If you know the question, then there will be lower case characters. That means from small a to small z. Nothing else will happen. I can map it very nicely. I have taken capital here. So if I say its ASCII value is 65, then what is its value? 66 and so on. If I subtract 65 from all of them, the same concept that we used in Radix sort, Counting sort and all algorithms, the same concept we used in pigeonhole, I said, value minus 1, I just want to find the frequency. So I will write the value of A minus 65, so its ASCII is 65. 65 minus 65 will give you 0. So you will get to know that it is the 0th index. Similarly, in case of D, 68 minus 65 is 3. So it is the 3rd index. Like this. So you understood this mapping. Now in the question, because I have lower case, I will have to subtract from 97. Because 97 has the same value. Small a to small z. So it starts from 97. 97, 98 and so on. So, I will iterate from here. First, I got A. A has 97. 97 minus 97 is 0. Whatever is on the 0th index, go there and add 1. Then B. B has 98. 98 minus 97 is 1. Go to the first index and add 1. Similarly, C has 99. So, 99 minus 97 is 2. Second index plus 1. D has 3. And the same error will be of any size. All the alphabets you have. In English, there are 26 alphabets. Okay? Now I have to traverse the second string. Till now I have used O time plus O space. I haven't done anything else. Now let's see, D is given. D's S will be the third index. Minus it and you can see that 100 minus 97 is the third index. Now what I will do here is, I will do minus 1. So when I saw these values, I did plus 1. When I saw these values, I did minus 1. So if these two are anagrams, So the final result will be minus 1, minus 1, minus 1 which means it will be completely zero. Okay? So you have O and O . Okay? This is to traverse the first string, this is to traverse the second string, and finally to traverse this array. To check if I am getting any non-zero. And what was O ? This is your space. Okay? You used extra space. So this method is comparatively better. Okay? You are going 3N times 4. but if you haven't used sorting then you can't find n square logic or n log n logic so let's quickly code it first check length a.length length function is not equal to b.length if both of them are not same then immediately return false don't think anything else, just write false next you have to make frequency array int frequency array it will be of size new and how many characters are there? 26 alphabets and now what I have to do is I can say that I can calculate both of them in one loop so index a is equal to 0 and index b is equal to 0 so the 2n that I told you to calculate a and b you can do that in single loop and make n from 2n like both of them have same length so there will be no problem what we can say is if while index a is less than len let's find the length so int len is equal to a dot leng th why i did this? because both a and b have same length now if it is less than len and index b is less than len in this case what i have to do in first string i.e in a string the corresponding s of the character has to be found and its frequency is plus 1 what does it mean? what is the character? charA I can write it a.charAt index a now let's do it copy then paste this will be your index b this is your bth character and I can write it charB now let's find the frequency index I can write it index of a, this will be your charA-97 similarly we will remove b, so first a's copy paste this will be your charB, this will be your freakIndexB ok, now very simple, go to the frequency array go to freakIndexA and write here plus equal 1 and here you have to write and paste what will it do? but which one? frequency b means if a is a character then it is plus 1 and b is minus 1 now if we cancel out plus minus then we will know that it should not be non zero frequency array should be zero only neither negative nor positive if we get even one non zero then it means these are not anagrams but first of all because we have to finish this loop so I will say index a++ we have seen one character so we will read further index b++ because this is your while loop not for loop If you want, you can do it. we can use for loop to avoid this now for int i is equal to 0 i is less than 26 characters and then i plus plus 1 frequency is non zero i can say frequency of i is not equal to 0 non zero 1 is also available so return false send it back from here otherwise return true ok So yeah, that's it. Let's run it quickly. I hope it should run. I hope I haven't made any mistake. It's done. Let's submit it and see. So it's done. This was today's video. We saved the time complexity. We used n instead of 2n. But you know that because their powers are the same, 2n and n will be considered linear. It means that it will be considered big of n even after solving it. So there is not much optimization theoretically. But yeah, there is practically optimization. So this was today's video, I hope you understood today's video and liked today's question. Now see you in the next video, till then bye bye and take care.

An anagram of a string is another string that contains same characters. Only the order of characters can be different. For example, act and tack are anagrams of each other.

So you will be given two strings, basically you have a Boolean function here which you have to implement, isEnagram. I will give you two strings and you have to tell whether those strings are enagrams or not. Now what is enagram? Ok, so what are enagrams?

I have written a very good example here. We have two strings, string A and string B. A string is my ABCD and B's values are DABC. Now if you look carefully, both the characters in both the strings are exactly same. So if someone ever asks you what is anagram, so two such strings whose characters are exactly the same but their order is different here normal ABCD is written and here D, A, B, C are written the same characters, same frequency, no new character is added no character is removed, just jumbled if you give a string and jumble it it becomes anagram, so here A and B are anagrams The question is very simple, I will give you these two strings A and B and I will tell you if these two strings are anagrams or not.

So how can we solve this question? See, it is very simple, you should have already come to your mind that here these two strings are the same number of characters, only their order is different. So I will compare their order by doing the same thing. And how will the order be the same?

Obviously, you have to bring these two in a common state and what is that common state? Sorting. If I sort A, then its value will be A, B, C, D only.

But if I sort B, then its value will be, by the way, is sort. I got used to writing half. A, B, C, D. Now see, let's compare it.

Where A is here, A will be here. If it is not like this, then it is not anagram. Because sorting puts it in the same order.

So A here, A here, let's move ahead. B is same, C is same, D is same, done. Now if there is an extra E here, then you say that there are no anagrams.

Or if a D disappears here, then you say that there are no anagrams. So its length should be the same and its character should be exactly the same. But what is the complexity of sorting time? As far as I remember, I have taught you all the algorithms in n square, in the worst case.

Which one I have not taught you? One I have not taught you Merce sort and the other I have not taught you Quit sort. So their is a little better, n log n times. But that is not a better solution. What could be a better solution?

I will compare both of them by using normal sorting. Match is fine till then. Exit as soon as match fails.

I will show you how to do this. First of all, I have to sort both of them. But unfortunately, string doesn't have dot sort method.

I will convert it into character array. And then we will sort it. char arr Okay, I'll assign and then I'm Lisa K a dot to care Harry.

Okay, okay So obviously copy carleta copy and same here But please be dot to care a day. Okay, this M. Lick's a carry of B Up sorting kill a man a coin bill functions in but I know or Rupert say I'm a rip us both sorry functions I mean, we'll get a raise dot sort.

Okay, yeah, apparently second a error of a Okay, then a raise dot sort arr of b. Now both these are sort. Now simple int index a is equal to 0 and int index b is equal to 0. Now what I have to write here is while index a is less than a.length. One more thing you have to keep in mind here that if their length is not the same then there is no point in checking anything.

because if one is longer than the other, it means that they don't have the same number of characters so we can normally check here that if a.length is not equal to b.length if their length is not the same, then what we can do is return false these can never be anagrams now when their length is same, I converted them into character arrays now because at this point you know that the length of both the strings is same so I will denote it with a.length or b.length, there will be no change or you can write it as a.length for continuity whatever you write, we made the character array with the same string so the length of both will be same obviously So if it is less than the length of your index, and similarly your index is less than the length of your index. Now array index is not equal to array index . If the characters are not the same, then there can't be any anagrams. So I can say return false.

Otherwise, what I have to do here is, index a++ and index b++ and after this story ends, return true this is a normal concept that if their length is not same then return then anagram can't be done length is same then sort both of them point one from index a and one from index b and keep checking if both the characters are same simultaneously I have to increment both of them at the same time because positioning is very important if it is sorted then it should be same then only we will get the anagrams and here let not equal to so now let's run it compile and run perfect, now it is coming right let's submit it too I guess all the test cases will run or maybe TLE will come because it is not a good solution so now it is running but actually we have to optimize this solution now see Length will be same, it will be N and N will be N only. That's why we are checking the anagram otherwise you saw we were returning on the first line. Its time complexity is N square plus N square in worst case.

You are using N plus N for traversal but you know that N square is taken as time complexity. Why am I saying N square again and again and not N log N? Because I haven't taught you Merge sort and Quick sort yet. Although there is Erase sort, it uses Tim sort.

Tim sort is actually a combination of Merge sort. and insertion sort in, a combination of both of them makes a Tim sort. So they use it, that's why it takes so much time complexity of n log n. Right?

So if someone asks you, you have to give the same answer, n log n. But I haven't taught you yet, so I said O . Okay? No problem.

Now we understood that n square is n log n, but solution is taking time. It is giving a problem. I have to solve its problems.

If you look at this question carefully, then A, B, C, D, okay? And D, A, C, B. I have to check both these strings. Now, can I say that we can play cut-and-cut in this?

If I minus both these strings, what will be the answer? I know that strings don't get minus. Don't say that, bro, you are crazy.

I don't know that. But let's say, A to A got cancelled. B to B got cancelled. C to C got cancelled. D to D got cancelled.

So, what happened here? Nothing happened. Empty was left. So, what is empty?

Enagrams. But I can't do this minus sign directly on string. So what I have to do now?

I have to use my brain. So what was the meaning of showing this cut-and-cut? Basically, we are checking the frequency here.

If A has come here once, then it should be here only once. It should not be 0 times or more than 1. It should be exactly one time. What I will do is, I will take out the count of all these. I have checked it. I have made an array.

1, 2, 3. 0, 1, 2, 3. Let's say 0th index is of A, 1st index is of B and so on. So now I have taken out all the frequencies, so I know that this is it. Now when I take out its frequency, what will I do?

I will make an array. 1, 2, 3. And here also I will check that if it is D, then I did 1 on D. If it is A, then I did 1 on A. If it is C, then I did 1 on C. If it is B, then I did 1 on B.

Now when you go, you can compare it again. See this, its frequency is 1, its frequency is 1, its This one and this one are exactly same. So what happened to you?

Anagrams are done. Okay, but here you saved the sorting. How much time did it take you to create this array?

O of n. How much time did it take you to create this array? O of n. Now you will say that this can also be O of m. Both of them are not the same length.

Brother, we have already checked. If the length is not the same, then we are not performing anything. Then we should not check anything.

Okay, so what happened here? That you created this with this traversal. you created this from the travel set.

How much size did you use? 2 of n. You made two arrays, right? And then you used extra O and M.

Apart from this calculation, to compare them, whether they are same or not. The approach is good, but we can optimize it a little more. You must be seeing that it is exactly same. So why should I use two arrays?

I could have used a single array. So what I will do is, let's remove it again. So, how will you get the positioning of A? If you know the question, then there will be lower case characters.

That means from small a to small z. Nothing else will happen. I can map it very nicely. I have taken capital here. So if I say its ASCII value is 65, then what is its value?

66 and so on. If I subtract 65 from all of them, the same concept that we used in Radix sort, Counting sort and all algorithms, the same concept we used in pigeonhole, I said, value minus 1, I just want to find the frequency. So I will write the value of A minus 65, so its ASCII is 65. 65 minus 65 will give you 0. So you will get to know that it is the 0th index.

Similarly, in case of D, 68 minus 65 is 3. So it is the 3rd index. Like this. So you understood this mapping.

Now in the question, because I have lower case, I will have to subtract from 97. Because 97 has the same value. Small a to small z. So it starts from 97. 97, 98 and so on. So, I will iterate from here.

First, I got A. A has 97. 97 minus 97 is 0. Whatever is on the 0th index, go there and add 1. Then B. B has 98. 98 minus 97 is 1. Go to the first index and add 1. Similarly, C has 99. So, 99 minus 97 is 2. Second index plus 1. D has 3. And the same error will be of any size. All the alphabets you have.

In English, there are 26 alphabets. Okay? Now I have to traverse the second string. Till now I have used O time plus O space.

I haven't done anything else. Now let's see, D is given. D's S will be the third index. Minus it and you can see that 100 minus 97 is the third index. Now what I will do here is, I will do minus 1. So when I saw these values, I did plus 1. When I saw these values, I did minus 1. So if these two are anagrams, So the final result will be minus 1, minus 1, minus 1 which means it will be completely zero.

Okay? So you have O and O . Okay? This is to traverse the first string, this is to traverse the second string, and finally to traverse this array.

To check if I am getting any non-zero. And what was O ? This is your space. Okay? You used extra space.

So this method is comparatively better. Okay? You are going 3N times 4. but if you haven't used sorting then you can't find n square logic or n log n logic so let's quickly code it first check length a.length length function is not equal to b.length if both of them are not same then immediately return false don't think anything else, just write false next you have to make frequency array int frequency array it will be of size new and how many characters are there?

26 alphabets and now what I have to do is I can say that I can calculate both of them in one loop so index a is equal to 0 and index b is equal to 0 so the 2n that I told you to calculate a and b you can do that in single loop and make n from 2n like both of them have same length so there will be no problem what we can say is if while index a is less than len let's find the length so int len is equal to a dot leng th why i did this? because both a and b have same length now if it is less than len and index b is less than len in this case what i have to do in first string i.e in a string the corresponding s of the character has to be found and its frequency is plus 1 what does it mean? what is the character? charA I can write it a.charAt index a now let's do it copy then paste this will be your index b this is your bth character and I can write it charB now let's find the frequency index I can write it index of a, this will be your charA-97 similarly we will remove b, so first a's copy paste this will be your charB, this will be your freakIndexB ok, now very simple, go to the frequency array go to freakIndexA and write here plus equal 1 and here you have to write and paste what will it do?

but which one? frequency b means if a is a character then it is plus 1 and b is minus 1 now if we cancel out plus minus then we will know that it should not be non zero frequency array should be zero only neither negative nor positive if we get even one non zero then it means these are not anagrams but first of all because we have to finish this loop so I will say index a++ we have seen one character so we will read further index b++ because this is your while loop not for loop If you want, you can do it. we can use for loop to avoid this now for int i is equal to 0 i is less than 26 characters and then i plus plus 1 frequency is non zero i can say frequency of i is not equal to 0 non zero 1 is also available so return false send it back from here otherwise return true ok So yeah, that's it. Let's run it quickly. I hope it should run.

I hope I haven't made any mistake. It's done. Let's submit it and see. So it's done.

This was today's video. We saved the time complexity. We used n instead of 2n.

But you know that because their powers are the same, 2n and n will be considered linear. It means that it will be considered big of n even after solving it. So there is not much optimization theoretically.

But yeah, there is practically optimization. So this was today's video, I hope you understood today's video and liked today's question. Now see you in the next video, till then bye bye and take care.

Transcript for:Anagram Concepts and Algorithms

Transcript for:
Anagram Concepts and Algorithms