Transcript for:
Optimistic vs. Pessimistic Concurrency Control

i want to spend few moments to discuss the difference between optimistic concurrency control and pessimistic currency concurrency control now you might say i'm saying didn't you discuss this topic yes i did but every time i come across a database engine i learn a thing or two and and i add to my bag of knowledge title base that that makes me want to talk more about it and kind of flush things out and here's another thing by with running a youtube channel when you get comments some comments are so valuable they point how dumb you are right and he says oh hussein you you you forgot that mongodb version 3.2 two years ago they added this particular feature and they are no longer going to do this so it kind of invalidates a lot of how you you were thinking in that video so you need to make another video to kind of add that into consideration and see how your thinking changes so optimistic concurrency control and pessimistic concurrency control is so critical it's it's basically the atomic level of how you pick which path decides basically what database you're gonna run let's discuss this guys before we start if you love these kind of content databases stuff make sure to check out my introduction to database engineering udemy course over 14 hours or worth of exclusive or content this that you won't find anywhere in the world well you're going to find it in your enemy but yeah you get my point yes good stuff we have great questions uh i try to as much as possible to answer most of these great questions so there's a catalog of great q and a happening there already uh so great lectures over 100 lectures i believe more than that so yeah check it out uh the the link will be in the pinned comment below thank you so much welcome to the back in engineering show with your host josiah nasserodes and today i'm going to talk about optimistic versus concurrence pessimistic concurrency control if you don't want to support concurrency control at all you want to build a single user database that nobody else is going to just one single user is going to slam and edit this stuff right there's no multi-users there is no ability to uh to do transactions and and there's no ability to do isolation you know you're not you're not trying to isolate anyone from your reads or rights for that matter so if you do that if you don't do that you don't have to worry about anything you can click off this video but if you do which most databases do postgres mysql mongodb most databases need to support multi-user and don't think of multi-user as web users we're talking about multiple transactions that doesn't have to come from the same user let's say you say there are a thousand viewers those are technically multi-user but to the database they might come from the same database instance user you're not going to create a different database user for every single web interface right you don't do that so but but it's the idea of concurrency is that is the key here and how do you forget about all this mambo jumbo that we're talking about concurrency controlling all this stuff how would you stop and think about it forget about all the papers i've been writing about database systems and take this question and try to answer it yourself how would you prevent someone from changing a value that you just change and you want to persist how do you do that so you start a transaction and you and you change some value how do you then prevent someone else from changing that value because that's dangerous right unless that value is committed and the other transaction starts it should always see that latest change and then have the idea of changing it if you allow this update to happen then there is this thing as called lost update you did you made a change but you turn around and it's lost it there's a lot of misconception about this because a lost update only happens in the same transaction you wrote something right you didn't commit yet and you attempt to read it and you it's not there that is dangerous because basically there it causes corruption this is not lost update that i'm gonna describe you start a transaction you change a field value you commit and then you read it and it's not there that is not a lost transaction why because after you commit someone else could have started a transaction change the field and then commit it and then you read that committed value that that someone that transaction that changed so that's technically not a lost update so measuring lost update is very hard but effectively is it is almost like you're you're just reading this transaction history right so how would you build a system that prevents these kind of things a classic example is to use something they call called locking hey i'm updating this row i'm gonna lock it why because i don't want anyone to change that value as long as my transaction is running i want to go my own thing i'm doing update to other fields to other columns to other tables but but that value shouldn't really change as long as you touched it you changed it you don't want it to change during your transaction that's the definition of lost update right how do you do that if you write that value you can say lock that you can tell the database to lock it and by definition this is a global concept that tells any other transaction that says hey if another times i should attempt to write to this law right to this row that particular field we're talking about rows not columns here right there's no a constant of column based uh lock although it's fun but there isn't there there's only raw logs for simplicity you can build something like that but it's so expensive so i lock the roll so other person want to update another field of the same room they can't they and you can define what that means can't what does that mean you can fail or you can make them wait and that's what locks do what this model is called pessimistic because you are so pessimistic that you don't trust anyone you just say i'm gonna lock this thing and i'm just gonna prevent anyone from changing this thing and i don't care if anyone didn't because someone you might you might just go through the whole transaction and nobody actually tries to touch your row right but you take that you're pessimistic uh i think uh the example i give in in my other video that i made was uh it's like taking your umbrella every time you go out regardless that's pessimistic right because you think it's gonna rain don't get me wrong i love rain but if you don't like rain and you think it's gonna rain you always gonna take the umbrella with you i'm talking when you leave the car not in the car of course you can have it in the car you can throw it there but even if you go to the grocery store you will go for a walk you take the umbrella with you all the time that's pessimistic concurrency control right if it rained you have it rare right you might say what's wrong with this that seems like this is the best approach ever always take the umbrella with you the umbrella is heavy the umbrella cost strength right my skinny arm cannot handle the umbrella locks are the same way locks are expensive lux needs management you need to manage locks that are free you think locks are free they are not nothing in this world is free you gotta understand that everything is a program and everything needs to be coded and locks need to be coded and they are so goddamn expensive they are so god dang hard to get right and now you can get to this like oh there's the idea of raw level locks is an idea of table locks there is an idea of page logs right and then you can define what all these mean a table look that means hey don't don't allow anyone to touch this or table row lock that means don't allow anyone to update this now i'm going to ask you this question how would you implement raw level logs it could be another table right some some simple example keep in memory a running tr uh and running ledger of all the rows that have been locked some of you might say that's dumb why because what if a transaction updates a million row your memory is gonna need to manage a million raw oh yeah oh yeah well you have to have a millionaire our memory is memory free no memory is not free my friends memory is not free so lock management is expansive that's french by the way for expensive if you didn't get it lock management is expensive especially raw level locks that's why most databases try not to implement raw level logs and we can get into this role management and dive deep into how each database manage locks and that's how you pick a database by the way guys it's not because what is faster once it really depends on on you and what do you need i i still didn't come to the other council this is we're talking still pessimistic we're pessimistic concurrency control so lock management is short effect difficult right and then so so some some databases like sql server for example they manage locks in memory at the moment you put roblox in memory you're gonna blow up your memory very quickly so uh you can so sql server have this idea of something called lock escalation where it says okay i'm about to lock seven 7000 rows you know what if you're touching this many rows i don't trust you i'm gonna save memory and i'm gonna lock i'm gonna put one lock on the entire table yikes yes you just lock the entire table right which is which is sometimes it is memory efficient but you cause more weights on other transactions and if those transactions are loading running that up little rows you are doomed my friends you are doomed so secure server you can tune this that's where database tuning comes in dark horse coffee roasters are one of the best come to san diego and check them out not sponsored by the way other databases which is my favorite postgres manage rows row locks in disk and you might say that that's just dumb because that means i have to write to desk to lock the table well if you think about it if you write to a row aren't you writing to desk anyway how about we just lock it there okay so they are they are smart man i don't have time to explain the details of that i don't even get the level of details there are these inline hint bits they they manage all this stuff and i would like to talk to one someone from the postgres maintenance to kind of pick their brains on how they build things but but oh this is what interests me i love this stuff i love to talk about these different decisions that engineers make based on like if you go this way you go this way if you go this way then you go this way right you can you can build completely that's why database call completely different path based in a simple decision that they make a branch decision so postcards don't put the lower extra memory but they they do it on this so that means they have more io sql server on the other hand uh put it on memory but yeah you have lock escalation and all that stuff right my sequel i don't remember why sequel have uh i believe i believe it might be a memory as well i might be wrong there but they use a different approach i forgot but yeah every database uses different things so let's stop talking about pessimistic concurrency control and let's go to the lock free optimistic concurrency control we are optimistic people we are open-minded we like life and we are pro-humanity i don't know what i'm saying we are so optimistic that we say sir you can update this field you can update this raw and we're not going to unlock it why because chances that some other transactions are gonna updated are really low let's be let's see let's let's let's be optimistic a little bit don't take that damn umbrella with you if it rain well you can just go to the store and buy one or just run okay okay you're gonna get a little bit wet who cares so that's optimistic con currency control so what does that mean that means if you update a row just do your own thing and then commit but during the commit we're going to do a slight teeny beanie check at the commit level we're gonna say if the row change underneath you we're gonna fail your transaction some of you might cringe at that i understand but hey just retry it you gotta get wet go change your clothes son it's okay who cares it's just a little water it's a little bit of water british is a lip little bit of water ali so it doesn't little agua doesn't hurt okay so it's okay so if it fails just roll back fail give the user a little bit of a beautiful error that says okay just chill it's not a big deal and mongodb is is this model uses this thing that's called wire tiger and that i'm i'm researching more about wire target is exclusively it looks like it's exclusively uh log structure merge tree and designed around the idea of having uh it uses multi-core uses tries to suck the life out of your c multiple core cpu try as much as possible try to utilize them to the to the maximum efficiency right and that's what they claim i'm just reading right and the the most important part is they use optimistic concurrency control they say well i don't want to deal with the headache of managing my locks i don't want to take my umbrella with me everywhere i go umbrellas are heavy stop taking your umbrellas with you and instead just be optimistic my friend be optimistic why are you like this why it's okay if it fails just retry your transaction so now you put some more burden on the client side to retry the transactions and and most developers might be familiar with this model says okay yeah i know i get these videos all the time just retry which doesn't sound like a big deal yeah a little bit of code at your end and to be honest i i don't i don't have a favorite i just think every use case really depends on on how which path to take you want to take the optimistic path or you want to take the pessimistic path right usually usually databases don't call themselves pessimistic you you're not going to go to postcards and you're going to read their documents oh we're using pessimistic country currency control no no nobody says that because that's kind of a negative way right nobody writes that but you're going to see mongodb talk about all this optimistic concurrency because optimistic is way better than pessimistic right but technically speaking postgres mysql sql server oracle even they are pessimistic this is we can deal with locks son and we're going to lock this stuff all right obviously i might have missed some of the things leave them in the comments so below this below this video youtube video comment and i'm gonna see on the next one what do you think is best in your personal opinion let's just have a discussion let's have a discussion what what do you think has been i don't think there's a better way to be honest so it's like i i kind of prefer pessimistic because it's like hey i'm a pessimistic guy it's like lock my stuff don't touch it don't touch my stuff don't touch my rows as long but but also i think about this management of the locks right that's why i prefer postgres right yeah you can see the right amplification results of postgres because of all this managing the logs and the rights and all that stuff becomes a little a little bit too much sometimes what if it's a cost you have to pay compared to memory you might say hey i'm going to use sql server i'm going to put all my lux on memory i'm going to have 2 terabytes worth of ram i don't know i don't even know if that's the thing can you put it to terror by ram is there a motherboard that supports it i am outdated when it comes to hardware i don't know anything i'm gonna watch some lioness ltt to update my knowledge tyler pez all right guys i'm gonna see on the next one you guys awesome goodbye