So this is CS224 Advanced Algorithms. My name is Jelani Nelson. And we have a TF who's in the back.
It's Jeffrey, with his hand up. If you want to contact us, you should email cs224-f14-staff at cs.harvard.edu. Also, there's a yellow sheet of paper going around.
You should fill it out. Let's see, what else should I say? And there's a course website. So I won't bother writing the URL of the course website on the board.
Just Google the course or Google my name, and it's linked to my website. One thing I will say about the course website is we have a mailing list. So please go to the website and sign up. Put yourself on the mailing list. Before I get started with things, I guess I'll tell you some logistical things about the course.
Then I'll tell you what the goals of the course are. And then I'll start on something. OK. So logistics. Oh, I completely did it backwards.
OK. So there are three components to this course in terms of grading. One is scribing and this is 10% of your grade.
Basically, you just do it and you get the 10%. There's no textbook for this course, so students will take turns taking notes on what I say in lecture. The course is recorded, so you can go back over the lecture and catch anything you missed, and then write up lecture notes in LaTeX describing the lecture; there's a LaTeX template on the website to use. Two is psets; that's 60% of your grade. And the third component is a final project, which is the remaining 30%, and that's just written.
So there are details on the website about the final project. You do it, you submit it on the last day of reading period, and I grade it.
Okay, so let's see. So there's a proposal for your final project. It's on the website, but I think it's October 30th. And then project due last day of reading period. Okay.
So yeah, I'll read through the proposals and make sure I like the idea of the project and give you feedback regarding it, and then you spend the last six weeks working on the project. Roughly six weeks. Let's see what else.
Psets. All psets should be LaTeXed, and you submit them by email.
Also, psets have page limits, meaning your psets should not be longer than the specified page limit; this is to discourage people from typing mindlessly when they don't know the answer to a problem. Actually, brief solutions are appreciated.
There's one more part of the course. We'll see how many people actually stay in this class; right now it looks like a lot, but it's shopping period. Depending on the final size of the class (actually, this will most likely happen), students will also take turns being graders. You probably only have to do it once during the semester.
A team of maybe three to four students, together with the TF, will meet once a week, or once per pset, and grade that pset. OK. So that's a required part of the class: students have to be graders at least once.
Possibly twice, if the class gets very small. And these things, scribing and grading, are first come, first served: if there's a date you really want, where you know you'll be available to scribe, for example, sign up right away before it gets taken. Also, scribe notes are due the following day.
So you have a little more than 24 hours to scribe your lecture notes; they're due 9 p.m. the following day. Is there something else I wanted to say about that?
Yeah, I mean... I know it's a short amount of time, so just do the best job you can, and then I might make a pass over them myself and make some edits. Okay. Good. So, I think that's all I want to say about logistics.
Any questions about that? Yeah? [Inaudible audience question.] Okay, maybe talk to me afterward. Anything else?
Yeah. [Audience: so only one student will be scribing each lecture?] Yes, one person will be the scribe for each lecture. Actually, yeah, I need a scribe for today. So, who thinks they're going to be in this class for sure and is willing to scribe today's lecture?
Okay. Good. Any other questions?
Okay, good. So this is Advanced Algorithms. Who here has taken CS 124 or some form of algorithms course before this? Okay, most people, I guess. So I guess the main difference between CS 124 and CS 224: well, first of all, algorithms is very broad, so even though 124 was a whole semester of algorithms, we didn't see all there is to know about algorithms, even in terms of topics. So in 224, we'll see some measures of efficiency, and models for analyzing algorithms, that we didn't see in 124. Also, it will be more theory focused.
There won't be any programming assignments, although for the final project you can do an implementation project; that's described on the website. But the psets will be purely written psets, no programming. What else do I want to say? So the goals of this course are, I guess, what you would expect: an increased ability to analyze and create algorithms. We're going to see lots of different techniques in this class for analyzing algorithms, many of which were not in 124. And also modeling: looking at different models within which to analyze algorithms.
So in 124, we usually just looked at, say, running time. And we never really defined running time; it was just the number of steps of the algorithm.
And we also looked at memory, minimizing the amount of memory used by the algorithm. But here there will be other parameters that we'll look at as well. Okay.
So I think I'm just going to get started. I used the wrong board first. So speaking of models, so who here has seen sorting?
Okay, good. That's what I expected. Who here knows that you can't sort n numbers faster than n log n? Okay. So actually, I lied to you: you can sort n numbers faster than n log n.
Okay, so today and the next lecture, we're not going to do the full sorting algorithm, but we're going to look at a related problem, a data structural problem, which is predecessor. We'll look at the static predecessor problem. So this is a data structural problem: the data structure represents a set S of items, x_1 up to x_n, and we support...
One kind of query, which is predecessor. The predecessor of z is the maximum element of S which is less than z; let me write it like this: pred(z) = max { x in S : x < z }. We want low space and fast query. And this word static: in data structure speak, static just means that the set S of items doesn't change.
If you had a dynamic data structural problem, like dynamic predecessor, you'd also support the insertion and deletion of items. So, static versus dynamic: static means no insertions, dynamic means insertions. Okay. So what's one way someone knows how to solve dynamic or static predecessor quickly, using, say, linear space? A binary search tree.
Oh, store the numbers and do binary search. Yeah, so an example solution that works: store the numbers sorted and then do binary search. Is this static or dynamic? Okay, so this is static, and what's the query time? Log n. Okay, and what if you wanted log n query time dynamically? Yeah, so log n dynamic query using a balanced BST, say a red-black tree or something. And that second solution also supports log n updates, insertions and deletions.
OK. So this is not usually the way people teach sorting algorithms at first, but I claim that if you have such a solution to dynamic predecessor, then you can get a fast sorting algorithm (let's say all the numbers are distinct). Right, so what do you do to get a fast sorting algorithm using dynamic predecessor?
You first go through your input in linear time and find the maximum. Then you go through the input again and insert everything into a dynamic predecessor data structure. Now you output the max, compute its predecessor, which gives you the second biggest, compute that one's predecessor, et cetera, and you'll retrieve all the items in sorted order. So you've just sorted the elements using a dynamic predecessor data structure.
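Here's a minimal Python sketch of that reduction. The interface (a make_ds() factory returning an object with .insert(x) and .pred(x), where .pred returns the largest stored element below x, or None) is my naming for illustration, not anything fixed in lecture:

```python
def sort_with_pred(xs, make_ds):
    """Sort distinct integers via a dynamic predecessor structure:
    n inserts plus n pred calls, so O(n * t(n)) for per-op cost t(n)."""
    ds = make_ds()
    cur = max(xs)              # one linear pass to find the maximum
    for x in xs:
        ds.insert(x)           # n insertions
    out = [cur]
    while True:                # walk backward through predecessors
        cur = ds.pred(cur)
        if cur is None:
            break
        out.append(cur)
    out.reverse()
    return out
```

So any dynamic predecessor structure with o(log n) operations immediately gives o(n log n) sorting.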
And what I'm going to show you today, as well as Thursday, is a faster dynamic predecessor, faster than log n. Well, I won't exactly show you that: today I'll show you two data structures, one of which is dynamic, and for the other I'll show the static version. It can be made dynamic, but it's more complicated, so I'll just show you the basic ideas needed to get a static data structure; I promise I'll give you a reference showing it can be made dynamic. And if you use those data structures, you can beat n log n for sorting. But some people raised their hands when I asked who knew that sorting can't be done faster than n log n.
So for the people who said they know sorting can't be done faster than n log n, what assumption are you making about the sorting algorithm? Comparison, it's comparison sort. It's comparison-based sorting, right?
Which means you have n items, and in each step, your algorithm is allowed to choose two items and compare them, and based on the results of the comparisons, it can make further comparisons. Okay. But that's not how real computers work. When you code in C, first of all, all the input numbers are integers, let's say, or floats: they're something that fits in some, say, 32- or 64-bit word. And you can do bitwise XOR and bit shifting and all kinds of other operations which are not just comparisons. So that inspires the word RAM model. OK: items are integers in the range from 0 up to 2 to the w minus 1, where w is the word size.
And the universe size u is 2 to the w, so the items are the integers from 0 up to u minus 1, which is 2 to the w minus 1. We also assume that pointers fit in a word. For that last assumption: if you have a data structure that's storing n items, presumably your data structure is using at least n space just to remember what the items were. So we know that the space is at least n.
Okay, and if a pointer fits in a word, well, a pointer is an address into our space, so w should be at least log of the space, which we just said is at least log n. So we're always going to assume that our word size w is at least log n. And what I'm going to show you today and on Thursday are two different predecessor data structures that get different bounds. One is going to be better when w is small, close to log n; one is going to be better when w is very large. So, two data structures.
So, one is called van Emde Boas trees; that first "van" is lowercase. This is from some time in the 70s. I'll frequently put the conference or journal name and the year on the board; this is for the scribes. This one is due to van Emde Boas, and if you Google it, you'll find a reference, so please put a reference in the scribe notes.
And what this gets is update. This is dynamic, so it supports updates. Update and query are both going to be log w time.
The unfortunate thing, though, is going to be the space: the space is going to be u, and I'd like linear space, independent of the universe size. Imagine you have a 64-bit machine; u is 2 to the 64, and I don't want to use 2 to the 64 space, ever. We'll see that the space can be made theta of n with randomization.
We'll also see a related data structure called y-fast tries, which gets the same bounds; this is due to Willard, in IPL 1983. So originally, van Emde Boas's paper didn't get linear space.
But the move from u space to linear space is going to turn out to be pretty simple. Now, the second data structure we're going to see: this is the one that can be made dynamic, but I'm only going to present the static version in class, otherwise it gets too complicated. These are fusion trees, due to Fredman and Willard, I believe in JCSS 1993. These support query in time log base w of n, also with linear space.
So already this beats binary search trees: since w is at least log n, and log base w of n is the same thing as log n over log w, this is never going to be more than log n over log log n.
But of course, we know the machine we're coding on, so we know w, and we can choose the better of fusion trees and, let's say, van Emde Boas trees or y-fast tries. So that implies we can achieve the min of log w and log base w of n. Right?
And if we want to maximize this expression, the min is largest when the two terms are equal, which means log w equals log n over log w, which means log n is the square of log w. So this min is always at most the square root of log n. Okay, good.
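Written out, for the scribe notes:

\[
\min\!\left(\log w,\ \log_w n\right) \;=\; \min\!\left(\log w,\ \frac{\log n}{\log w}\right) \;\le\; \sqrt{\log n},
\]

since the min of the two terms is largest exactly when they are equal, i.e. when $(\log w)^2 = \log n$, at which point each term is $\sqrt{\log n}$.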
And I mentioned that this can be made dynamic. So in particular, that means you can sort in time n times the square root of log n. This is something I won't cover in this class, but dynamic fusion trees imply O of n root log n sorting.
Okay. Questions?
Yeah, so there's an issue which I haven't discussed, which is the preprocessing time to actually create the data structure. So in the dynamic case, when you start with an empty data structure, that doesn't come into play. But with the static case, we're going to spend polynomial time to actually create this fusion tree, and that's going to be bad for sorting.
Any other questions? Okay, so you could ask, you know, is n root log n the best sorting algorithm in this model? And you can actually get faster sorting.
So you can get O of n log log n deterministic; this is due to Han, in STOC 2002. You can also get O of n square root of log log n expected time, randomized; this is due to Han and Thorup, in FOCS 2002, which is about five months later. And it's an open question whether or not you can get a linear-time sorting algorithm in this model. So it's possible.
There's nothing saying that you can't do it. OK. And let me go back to the word RAM model before I actually present the van Emde Boas data structure. So I mentioned we can do more than just compare; so what can we do? In the word RAM, assume that given x, y fitting in a word, we can do basically all the things that you can do in, say, C. So you can do integer arithmetic: plus, minus, times, and divide, where divide is integer division, so it rounds down. You can also do bitwise negation, XOR, OR, and AND. (I can't write the AND symbol properly.) And you can also do bit shifting, by some fixed constant or by each other. Yeah?
Yeah. So we'll assume that for multiplication, it fits in two words. So the upper bits will be in the second word.
Yeah. OK. Any other questions?
I think it's also accurate to say that we don't need to make that assumption. There could be integer overflow, in which case we'll get the correct answer modulo 2 to the w, but you can simulate multiplication of bigger numbers in the word RAM anyway, so maybe I'll leave that as an exercise; you might need to use a couple of words yourself when you do the arithmetic. Okay, so we can do these in constant time.
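Since Python integers are unbounded, one way to make this instruction set concrete is to mask everything down to w bits; this toy snippet (my own, purely illustrative) exercises the operations just listed:

```python
W = 64
MASK = (1 << W) - 1                 # simulate w-bit words by masking

x, y = 0xDEADBEEF, 41
assert (x + y) & MASK == (x + y) % (1 << W)   # arithmetic wraps mod 2^w
hi, lo = (x * y) >> W, (x * y) & MASK         # a product occupies two words
print(x // y,                        # integer division (rounds down)
      x ^ y, x | y, x & y,           # bitwise XOR, OR, AND
      ~x & MASK,                     # bitwise negation, within the word
      (x << 3) & MASK, x >> 3)       # shifts, by a constant or by a variable
```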
So just out of curiosity, who's seen van Emde Boas trees? One person. Who's seen fusion trees? Okay, good; just making sure I'm not teaching you something you've seen. Okay, so we're doing van Emde Boas. The basic idea is, well, I guess you guessed that we're going to do something with fudging with bits, because we can't just do comparisons. The basic idea is some kind of divide and conquer.
So the VEB tree will be defined recursively, and it'll be parametrized by the universe size. So let's say this one is on a universe of size u. If I open up what it looks like inside, it'll have root u VEB data structures, each on a universe of size root u. And there will also be a top one, the summary, also on a universe of size root u. And separately, we'll store one extra element, which is the minimum element in the data structure; I'm going to say more about that. So let's say you're using some object-oriented programming language and you wanted to declare the fields that your VEB data structure has.
So, the fields of VEB on a size-u universe: you'd have a root-u-size array. Let's call the whole structure v; v is our VEB data structure. We'd have v.cluster[0] up until v.cluster[root u minus 1], and each of these is a VEB(root u) data structure, meaning the elements in it are numbers between 0 and root u minus 1. We'll also store the max; let me write that down here too. We also have v.summary, which is a VEB(root u) instance as well. And v.min and v.max are integers in the range from 0 up to u minus 1.
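Here's a hypothetical Python rendering of those fields, following the description above. It assumes u = 2^w with w a power of two, so root u is exact; clusters are allocated lazily (a slot of None means that cluster is empty), and we'll shrink this array to a hash table later in the lecture:

```python
class VEB:
    """Sketch of a van Emde Boas tree over {0, ..., u-1}.
    Assumes u = 2^w with w a power of two, so sqrt(u) is an integer."""
    def __init__(self, u):
        self.u = u
        self.min = None    # the minimum lives ONLY here, never recursively
        self.max = None
        if u > 2:
            self.lo_bits = (u.bit_length() - 1) // 2      # w/2
            self.sqrt_u = 1 << self.lo_bits               # sqrt(u)
            self.cluster = [None] * self.sqrt_u  # lazily created VEB(sqrt u)'s
            self.summary = None                  # VEB(sqrt u) of non-empty cluster ids

    def split(self, x):
        """View x as <c, i>: c = high half of the bits, i = low half."""
        return x >> self.lo_bits, x & (self.sqrt_u - 1)
```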
OK. Any questions? I haven't actually told you how you query or how you insert into this data structure yet; this will be a dynamic data structure. So let's see that. Say we have an item x that we want to have living in this data structure; x is some integer.
We can write x in binary and divide it into the leftmost half of the bits and the rightmost half of the bits. Call the left half c and the right half i, so let's write x as (c, i). Notice that c and i are each in the range from 0 up to root u minus 1, right? OK. So we're basically writing x in base root u. And the idea behind van Emde Boas trees is that if x lives in the data structure, then we store the number i in the c-th cluster in this picture.
So now tell me: given what I just said, how would you do a query for the predecessor of x? Hopefully people agree that you can extract c and i each in constant time, just by bitwise ANDing and shifting. So how would you search for the predecessor of x in this recursive data structure as I've defined it? Oh, and let me also tell you what the summary does. I told you that I'll insert i into the c-th cluster; also, if the c-th cluster happened to be empty when I did the insertion, I'll insert c itself into the summary. So the summary keeps track of which clusters are non-empty; that's the point of the summary. Okay, so now how would you do a predecessor? Yeah?
[Audience member suggests extracting c, looking in the c-th cluster, finding its lowest element, and recursing in that cluster.]
OK, yeah, so what you said works. There's one recursive call you could save, though, which is that we store the min explicitly. So let me repeat what you said, but using that fact. (To a student walking in: are you here for Advanced Algorithms? Yeah? OK, don't worry about it.) OK, so here is the idea. I can extract c and i each in constant time using shifts and masking with bitwise AND.
Then what I do is I look in the c-th cluster, at its minimum element. If I'm bigger than that min, then I know my predecessor lives in the same cluster as me, and I can just recursively do a predecessor on i in that cluster. If there is no minimum element, if that cluster happened to be empty, or if I'm less than or equal to the min, then I know my predecessor is not in my cluster: it's in the largest non-empty cluster before mine. And how do I find that? By doing a predecessor on c in the summary. And then I return the max inside of that cluster. Okay?
Okay, good. I don't need to recurse on that cluster. I just return the max. So let me write that down. So...
So predecessor takes as input v as well as this x, which I write as (c, i). The first if: if x is bigger than v.max, so if x is bigger than everything in my data structure, I just return v.max. Otherwise, I look at the c-th cluster of v and compare its min to me: if v.cluster[c].min is less than i (the low bits of x), then I just recurse in that cluster. Otherwise, I have to look in the summary for the predecessor cluster, so c' = pred(v.summary, c), and then I return the maximum element in that cluster, v.cluster[c'].max. OK.
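Continuing the hypothetical sketch from above, here's that predecessor query as a method of the VEB class; note that every branch makes at most one recursive call, which is what the running-time analysis below relies on:

```python
def pred(self, x):
    """Max element of the structure that is < x, or None."""
    if self.min is None or x <= self.min:
        return None                      # nothing stored below x
    if x > self.max:
        return self.max                  # x beats everything in here
    if self.u <= 2:
        return self.min                  # base case: min < x <= max here
    c, i = self.split(x)
    cl = self.cluster[c]
    if cl is not None and cl.min is not None and cl.min < i:
        return (c << self.lo_bits) | cl.pred(i)   # predecessor is in my own cluster
    cp = self.summary.pred(c) if self.summary is not None else None
    if cp is None:
        return self.min                  # only the buffered min is below x
    return (cp << self.lo_bits) | self.cluster[cp].max

VEB.pred = pred   # attach to the sketch class from above
```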
Okay, so the next thing is the insertion algorithm. The first thing, and we're going to see why in a moment, is that I'm going to treat the minimum element as special: the minimum element will be stored in the min field, but I won't also store it in its appropriate cluster. So it could be that v.min is some non-null value and everything else in the data structure is empty. So if v is empty, I say v.min gets assigned x, and then I return.
Sorry if my pseudocode is changing between C and Python; you know what I mean, I think. Okay. Otherwise, what do I do? Given the constraint that the min lives only in v.min, I'll do one thing first: if x is less than v.min, then I swap x with v.min before continuing. So now I know that the x I'm actually inserting is not the minimum element. Think of this v.min field as a buffer: the minimum always lives in that buffer, and it's not actually recursively stored. So first I make sure that the thing I'm inserting recursively into the structure is not the minimum. Then where do I have to put it? Let's pretend that when I do this swap, c and i are recomputed for the new x; I don't want to write more code.
So what I do is, oh, there's also this issue that before I recursively insert into the cluster, I should check the summary to see if that cluster was empty; if so, I need to insert the cluster ID into the summary as well. How can I check whether the cluster is empty? I just check whether its minimum element is null. So: if v.cluster[c].min is a null value, then I insert c into the summary, that is, I insert the value c into v.summary. And then I insert i into cluster c. Okay. And you can think about what you would do for deletion; it's not really that much different conceptually.
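And insertion, continuing the same sketch: the swap with the min buffer and the check-the-summary-first step are exactly as just described (keys assumed distinct). A tiny smoke test follows:

```python
def insert(self, x):
    if self.min is None:                 # empty structure: x sits in the min buffer
        self.min = self.max = x
        return
    if x < self.min:
        self.min, x = x, self.min        # swap, so we recursively store the non-min
    if x > self.max:
        self.max = x
    if self.u <= 2:
        return                           # min/max fields already capture everything
    c, i = self.split(x)                 # c, i computed for the (possibly swapped) x
    if self.cluster[c] is None:
        self.cluster[c] = VEB(self.sqrt_u)
    if self.cluster[c].min is None:      # cluster was empty: record c in the summary
        if self.summary is None:
            self.summary = VEB(self.sqrt_u)
        self.summary.insert(c)           # deep call, but then the next line is O(1)
    self.cluster[c].insert(i)            # just sets the min if the cluster was empty

VEB.insert = insert   # attach to the sketch class

v = VEB(16)
for y in (3, 9, 12):
    v.insert(y)
assert v.pred(10) == 9 and v.pred(9) == 3 and v.pred(12) == 9 and v.pred(3) is None
```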
Okay, so let's analyze the running time of these procedures. So, predecessor: there's only ever one recursive call, right? In any of the if cases, you make at most one recursive call. So for the predecessor time, we have the recurrence T(u) = T(root u) + O(1): we start with a universe of size u and recurse on a universe of size root u. And if you remember your recurrences, this implies that T(u) is O(log log u), which equals O(log w), since u is basically 2 to the w.
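Written out, since this recurrence pattern recurs all lecture:

\[
T(u) = T(\sqrt{u}) + O(1).
\]

Substituting $u = 2^w$, so that $\sqrt{u} = 2^{w/2}$, and setting $T'(w) = T(2^w)$ gives $T'(w) = T'(w/2) + O(1)$, which unrolls after $O(\log w)$ halvings to $T(u) = O(\log w) = O(\log\log u)$.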
OK, how about insertion? [Audience: accessing any element of that size-root-u array is constant time, not logarithmic?] That's right: in constant time, you can follow a pointer and read the value at that memory address. So how about insertion time? It looks a little worrisome, right?
Because this if doesn't necessarily return afterward: if v.cluster[c].min is null, you insert c into the summary, and then you again do another insertion, into the cluster. So it looks like T(u) is at most 2 T(root u) + O(1), right? Another way of thinking about this, and maybe this will make it more obvious: in terms of the number of bits, T(w) is at most 2 T(w/2) + O(1), because if u gets square-rooted, that's the same as saying w got cut in two. So what does this solve to?
W, yeah. So this solves to w. That's not great; we're trying to get log w here. But I claim that this recurrence is overly pessimistic. Why? Yeah, exactly. So what was said is: if the first if actually happens, that is, if the cluster was empty and we insert c into the summary, then the second insertion will be shallow and not recurse further, because the cluster insert hits the empty case and immediately returns after setting the min. So actually, this 2 really can be written as a 1; you can think of it as moving this line up here and making this an else. And this implies that T(u) for insertion is also O(log log u). OK. So that's the basic van Emde Boas tree data structure.
And that's also why we stored the min separately: to make the insertion cost log log u, as opposed to w, which would be log u. If you treat the min the same as any other element and store it recursively in the data structure, then there will be times when you have to recursively insert into the summary and also recursively insert into the cluster, and that will make things w. Oh yeah, I keep forgetting about these maxes. [Audience asks about updating the max.] No, you don't have to recurse for the max; you just update the field. But yes, that's a good point. And where's that paper where people are signing up?
Has it been passed around? Everyone saw the paper? Raise your hand if you didn't write your name down on that piece of paper. So, toward the front, I guess. Okay, so how about the space? What's the space of the van Emde Boas data structure? What's the recurrence?
S(u) is equal to root u plus 1 (for the summary) times S(root u), plus a constant to store, say, the min. Okay. I'm not going to solve it here, but this implies that S(u) is theta of u. So this is u space, which is not great; we're going to get linear space instead. So what's something that intuitively seems a little bit silly about this space requirement? You have a lot of empty clusters, right?
Yeah, that's it. OK, so what could you imagine doing instead? Who here has seen hashing? OK, so I gave you a hint: there's something called hashing. What would you do with hashing in order to improve the space? Yeah, so we're going to have a hash table. What does this hash table store? In hash tables, there are keys and values; so what are the keys that will live in this hash table, and what will the values be? Yes, the keys will be the cluster IDs. So keys are cluster IDs c. And what will the value be? A VEB; yeah, a pointer to one.
So the value is a pointer to the corresponding non-empty cluster. The empty clusters don't live in the table at all. And I claim that the space of this scheme is now linear: it's theta of n. Why is it theta of n? How would you account for all the space that's being used, in a way that makes it clear it's theta of n? Yeah, each tree has a minimum element, right? So we can charge this key-value pair, which is like two words: we charge the cost of storing the cluster ID and pointer to the minimum element contained in that cluster. Now, every element is stored as a minimum somewhere, maybe at a leaf cluster deep in this recursion, but it's stored as a minimum somewhere. And each minimum element is charged exactly once in this way: it's charged in the parent VEB tree that contains it. So: charge the cost of storing (c, pointer to cluster c) to the minimum element of cluster c. Each item in the set, each x in S, is charged exactly once.
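In the Python sketch above, this change is essentially one line, with Python's dict standing in for the constant-time hash table being black-boxed here:

```python
# In VEB.__init__, replace the length-sqrt(u) array with a dictionary holding
# only the non-empty clusters, keyed by cluster id:
self.cluster = {}                  # c -> VEB(sqrt u), present only if non-empty
# ...and read it everywhere with self.cluster.get(c), so that a missing key
# means "empty cluster". The charging argument above then gives Theta(n) space.
```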
Does that make sense? Questions about it? Yeah? [Audience asks about the guarantees of the hash table.] Yeah, so I haven't covered hashing yet; we will see hashing later in the course. But it turns out that it's possible. So first let me say something about hashing, and tell me if it doesn't answer your question. I think it's good to think in terms of problems, and then there are algorithms or data structures that solve those problems. So let's forget about hashing.
What's the problem that we want to solve? It's what's called the dictionary problem. So, short aside: we have something called the dictionary problem, and the goal is to store key-value pairs. I'm going to assume that the keys and the values each fit in a machine word. The semantics are as follows. query(k): what should that do? It should return the value associated with key k, or null if k is not associated to anything. And there's also insert(k, v), which you can think of as associate: it associates value v with key k. You can look at both the static and dynamic versions of the dictionary problem. In the static version, you're told all the key-value pairs up front, and you never do future associations.
In the dynamic version, you can make updates to what keys are associated with, and you can also delete keys from the dictionary. And it turns out, I don't know if we'll see exactly this solution later in the course, but we'll see some solutions to the dictionary problem: a dynamic dictionary is possible with linear space, constant worst-case query time, as well as constant expected time for insertions, updates, and deletions. This is actually fairly recent; I need to look back at the reference, but it's by Dietzfelbinger and others, and it's some form of hash table. If you know about universal hashing: that wouldn't get worst-case constant query, it would get expected constant query.
It turns out you can get worst case. So the point is that whether it's expected or worst case (and actually, you can do better than expected constant time: you can do constant time with high probability), what we really need here is a dictionary. We need a dictionary that stores the mapping whose keys are the cluster IDs and whose values are the pointers to the non-empty clusters. And that's a solved problem, which we'll just take as a black box for now. Did that answer your question? Okay.
Any other questions? [Audience asks whether the dictionary bounds can be improved.] Yeah, I believe, well, you can even make this work with high probability. So I guess the only thing you could hope to improve here is to make it fully deterministic, and I don't know offhand whether that's been ruled out as a possibility. Okay, so I have a little more than 15 minutes left. I mentioned that there's another data structure called the y-fast trie, which also gets this bound.
I'll sort of just sketch the idea, which apparently, I think, is the original way van Emde Boas trees were made to support the bounds I stated; much later, other people came along, reinterpreted the data structural idea, and came up with what I showed you. Originally, van Emde Boas data structures were described in a way that got u space, and from the way it was described, it wasn't as clear as "change this to a hash table" to make it linear space. So I'll show you another way to get the same bound. Pretend you didn't care about query time and you were willing to use u space. What's a very, very simple data structure for predecessor?
I want you to use exactly u bits of space. Yeah, a bit array. Okay, so have a bit array of length u, where the i-th bit is 1 if i is in your set and 0 otherwise. So, another solution: use a bit array of length u. Say u is 16 or something: 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0. This corresponds to elements 0, 1, 2, 3, up to 15. And what's the running time of predecessor now? It's u.
OK, that's terrible. So we're going to make one small tweak, which is we're going to build a perfect binary tree over all of these leaves, and each internal node is going to store the OR of its two children. We end up with a tree that looks like this. Okay. Now, suppose someone gives me a 0 and asks for its predecessor.
Actually, let me do one more thing first: I'll store all the 1s in a doubly linked list. So suppose someone asked me for the predecessor of a 1. What would I do? Yeah, go back one in the linked list; constant time. Now suppose someone asked me for the predecessor of a 0, say, this element here. What would I do?
OK, and then what? Go down, right. So I go up the tree until I find a 1. Here, when I transitioned into a 1, I came up from the right side, so I know that I am bigger than everything in the left subtree. So I should just find the maximum element in that left subtree, which I can do by descending toward 1s. Okay, but what if someone asked me for the predecessor of this other element? I would go up, and here's the first 1, but I came up from the left side. So now I don't go and find the minimum element of the sibling subtree; that's not my predecessor. What is that minimum element? It's my successor. And all the 1s are stored in a doubly linked list, so I can go back one from my successor, and that will be my predecessor. So: I keep going up until I find a 1, and then I'll find either the predecessor or the successor, depending on which way I came up, and either way I can get the predecessor. Now, there's still one problem, which is: what's the running time of this? What's the height of this tree? Log u.
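Here's a compact static Python sketch of this structure, my own rendering: the OR-tree in heap layout (foreshadowing the array trick coming next), with back-pointers among the 1s standing in for the doubly linked list. One liberty taken: instead of the successor-then-linked-list trick for the came-up-from-the-left case, this version just keeps climbing until it can step into a left sibling containing a 1, which is the same O(log u) walk:

```python
class ORTreePred:
    """Bit array of length u (a power of two) with a perfect binary OR-tree
    over it, stored in heap layout: root at 0, children of v at 2v+1, 2v+2."""
    def __init__(self, bits):
        self.u = len(bits)
        self.t = [0] * (2 * self.u - 1)
        leaf0 = self.u - 1                     # index of the leaf for element 0
        for i, b in enumerate(bits):
            self.t[leaf0 + i] = b
        for v in range(leaf0 - 1, -1, -1):
            self.t[v] = self.t[2*v+1] | self.t[2*v+2]   # OR of the two children
        ones = [i for i, b in enumerate(bits) if b]
        self.prev = {x: p for p, x in zip(ones, ones[1:])}  # linked-list back-pointers

    def _max_below(self, v):
        while 2*v + 1 < len(self.t):           # descend, preferring the right 1
            v = 2*v + 2 if self.t[2*v+2] else 2*v + 1
        return v - (self.u - 1)

    def pred(self, x):
        v = self.u - 1 + x
        if self.t[v]:                          # x itself is a 1: use the list
            return self.prev.get(x)
        while v:                               # climb until a left sibling has a 1
            parent = (v - 1) // 2
            left = 2 * parent + 1
            if v != left and self.t[left]:     # came from the right; go left
                return self._max_below(left)   # predecessor = max of that subtree
            v = parent
        return None                            # nothing below x

d = ORTreePred([1, 0, 1, 1, 0, 0, 0, 1])
assert d.pred(5) == 3 and d.pred(7) == 3 and d.pred(1) == 0 and d.pred(0) is None
```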
So my running time seems like it would be log u, and I'm trying to get log log u. So what does that suggest? Binary search. Yeah, actually, I do want to do a binary search. So justify it: where am I doing a binary search, and why am I allowed to? The lowest 1, right. The point is that on any leaf-to-root path, the bits are monotone: they're 0 for a while, and then they're 1 from some point on.
So I can binary search to find the lowest 1 on the path, and that will give me log log u. There's still a catch, though: how do I binary search when the path is a chain of pointers? If I need to follow seven pointers, I'd have to actually follow them; how can I directly access the node that's seven above me? You see what I mean? Any ideas on that? Yeah, if we store the tree the right way. So what's the right way? You could store this entire tree...
So, I don't know if people remember binary heaps: you could store this entire tree as an array. This is index 0, that's index 1, and that's index 2. In general, let me write this: store the tree as an array; the root is at index 0, and node v has its left child at 2v+1 and its right child at 2v+2. So if I want to find the ancestor who's k levels above me, what do I do? Divide by 2 to the k, with integer division, right? And that's a bit shift. So I can find anyone who's k levels above me in constant time. This implies we can find the k-th ancestor in constant time by doing a bit shift to the right by k.
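As a quick sketch of that indexing: with the 0-indexed heap layout just described, the halving picks up a small plus-and-minus-one correction (with 1-indexing, where the root is 1 and the children of v are 2v and 2v+1, the k-th ancestor is exactly v >> k):

```python
def kth_ancestor(v, k):
    """Ancestor k levels above node v, heap layout with root at index 0
    and children of v at 2v+1 and 2v+2."""
    return ((v + 1) >> k) - 1

assert kth_ancestor(12, 1) == (12 - 1) // 2 == 5   # parent
assert kth_ancestor(12, 2) == 2                    # grandparent
```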
Another thing you could do, which uses more space, is to actually store the tree using pointers, but have each node store not just its parent but its 2-to-the-k-th ancestor for every k, for k going from 0 up to log log u, because the tree has height log u. That would make the space of the data structure u times log log u, that is, u times log w. Either way, the dominant part of the space is still u.
[Audience question about those ancestor pointers, referring to a height-16 example.] You mean the height of the tree is 16? Yeah, say the height is 16. So from any node, the binary search first jumps to height 8, then from 8 it goes to either 12 or 4, from 12 to either 10 or 14, and so on; so maybe we don't need to store all of those pointers. Hmm, let's do a bigger power of 2, say 1,024: suppose the jump to 512 fails, and now I have to go to 256; that fails, and I have to go to 128. Won't I need all of those? Oh, I see, I could just continue from the ancestor pointers of that intermediate node. [The audience member proposes: for every node, take the highest power of 2 dividing its height, say 2 to the l, and store pointers to the nodes at distance 2 to the l over 2 above and below it.] I see. Okay, well, maybe what you're saying is accurate; I'd have to think about it. But anyway, I'm going to do better on space right now, because I'm still using more than u space, and I want to use linear space. So: the main trick to getting linear space in the van Emde Boas data structure was to use a hash table to store the non-empty clusters.
So I want to do something similar here. Any thoughts? Again, using hash tables. This one is maybe a little trickier, so I'll just say it: to save space, only store the 1s in a hash table. At each level of the tree, there are at most n 1s, so this hash table contains at most n times w items, meaning I'm using n times w space. Or you can imagine having w different hash tables, one for each level of the tree, where the table for a level stores the locations of the 1s in that level. As I'm doing my binary search, if I want to know whether a node is a 1 or a 0, I just look it up in the hash table; if it's a miss, if it's not in my dictionary, then I know it's a 0. This implies the space is n times w.
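Here's a sketch of just that binary search over levels in Python, under my own layout assumption that level[k] is a hash set containing the length-k prefixes (as integers) of every stored element, for k = 0 to w; converting the answer into a predecessor then uses the subtree-max and linked-list steps from before:

```python
def longest_present_prefix(x, w, level):
    """Binary search for the deepest ancestor of leaf x that is a 1.
    level[k]: set of all k-bit prefixes of stored elements. O(log w) probes."""
    lo, hi = 0, w              # invariant: length-lo prefix present, > hi absent
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if (x >> (w - mid)) in level[mid]:
            lo = mid           # prefix of length mid exists: search deeper
        else:
            hi = mid - 1
    return lo                  # length of the longest present prefix of x

w = 4
elems = {0b1010, 0b1100}
level = [{e >> (w - k) for e in elems} for k in range(w + 1)]
assert longest_present_prefix(0b1011, w, level) == 3   # shares the prefix 101
assert longest_present_prefix(0b0000, w, level) == 0   # only the root matches
```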
And this is called an x-fast trie. So if you look at the paper: I mentioned that y-fast tries are due to Willard; that was in 1983. It's like a four-page paper, including the intro and bibliography. In the first page or two, he describes x-fast tries, and then he gives y-fast tries in the very next section, which eliminates that factor of w.
Okay. And the trick to eliminating that w is a trick that is good to know. I mentioned that one goal in this course is knowing the right techniques to apply to data structures and algorithms, and this is a technique that gets used a lot. So, to go from x-fast to y-fast, use what's called indirection. The basic idea is: I'll have my x-fast trie, which takes n w space, but it will be an x-fast trie on only n over w items. What do you know: that uses n space. But what's each item?
So these are my n over w items. Each item, if you open it up, is actually a super-item that contains roughly w of the original items: approximately, this one contains the first w items, the next one contains the next w items, et cetera, and one element of each group will be the representative that actually lives in the x-fast trie. And the way I'll store the items inside a super-item is as a balanced binary search tree on them, of size theta of w, let's say. So, I mean, there's a problem besides the space.
OK, so there's basically one minute left, so let me just sketch this. You've already seen a solution that gets this bound, so if you want the full details of the y-fast trie, you can read about them on your own; I just want to give you a flavor of what's going on. So the space of the x-fast trie is nw, but actually, the update time is also w instead of log w. Why is the time w? Because if you think about this tree, when I insert a guy and change him to a 1, I need to OR all the way up the tree: I need to change w things. So the time is w, and the space is off by a factor of w, and this indirection idea fixes both issues. In terms of space, this is now definitely linear. In terms of query time, it's still log log u, because I can search in the x-fast trie in log log u time.
And then I reach the BST containing that super-item, and searching in that BST takes log w time, which is again log log u. How about insertions? So the point is, these super-items contain somewhere between, let's say, w over 2 and 2w items; they contain theta of w items. And the basic idea is that when I do an insertion, I'm not going to actually insert into the x-fast trie. I'm going to find where the item goes and insert it into the balanced BST there, which costs log w. Only when that BST overflows, when it becomes bigger than size 2w, do I split it into two super-items and insert the new representative into the x-fast structure above. But that only happens roughly once every w insertions, so in an amortized sense, the expensive x-fast update costs me only a constant per insertion.
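One way to write down the amortized accounting being sketched here (my bookkeeping; the lecture leaves it at "once every w steps"):

\[
\text{amortized insert} \;=\; \underbrace{O(\log w)}_{\text{BST insert}} \;+\; \underbrace{\frac{O(w)}{\Theta(w)}}_{\text{x-fast split, amortized}} \;=\; O(\log w) \;=\; O(\log\log u),
\]

since splitting a bucket triggers an $O(w)$ x-fast insertion of a new representative, but a bucket of $\Theta(w)$ items overflows only once per $\Theta(w)$ insertions.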
So I know I didn't go into the full details on that, but I'm not going to get too deep into y-fast tries, since they get the same exact bound as the, I think simpler, van Emde Boas structure. That's the basic idea. Okay, so questions? Class is basically done; any final questions before next time, when we'll see fusion trees? Oh, and actually, I do want to say one thing: I told you the bounds for fusion trees, and I told you the bounds for van Emde Boas trees. There's actually a paper that shows a matching lower bound: you can never do better than the min of van Emde Boas and fusion trees (or a slightly tweaked version of van Emde Boas). So these are known to be optimal for nearly linear space data structures. Okay. See you all.