Transcript for:
Understanding Database Normalization for Beginners

hello and welcome to this beginner's guide to first second and third normal form when looking at database normalization so let's get started first normal form this one is nice and simple all rows must be unique and all cells must contain atomic values well what does that mean well it's really quite simple each row in the table must be unique and therefore there cannot be any duplicate rows in the table so you can't have two different rows where all of the data contained within those those rows is identical and we'll look give you an example in a minute and each cell must only contain a single value so you can't have a cell in a table that has a list of values in that cell that's a no-no and finally each value should be non divisible which means that if you've got something in a Cell you can't break that down into further sensible parts so let's have a look at an example of each of those so each rows must be it each room has been unique so let's a look here we've got a table of takeaway orders here and we got for each takeaway order we've got customer name and a customer order so for instance Bob Jones orders here he ordered burger fries and coke Fred ordered nuggets lemonade and fries and then Bob also ordered burger fries and coke now that's fine apart from we don't we can't tell the difference between this row here and this row here so we have no idea for instance which which order was ordered first no idea at all so we need to change that because they need to be uniquely identifiable and the way we do that nice and simple is we just add an order ID column so here order ID number one all that was Bob Jones and he ordered his favorites their order I too was was Fred's and he ordered all those and then the final order was Bob Jones and he ordered his favorite I'd order again there so because we got these order ID that row there he's now on no longer identical to that row there there we go good so that's the first part of first normal form done second part let's have a look each cell must only contain a single value well yep as you can see here this cell here doesn't contain a single value it contains three items a burger fries and coke if at all of them do so these ones here these and this in this area the actual order I do they all do so that's no good so what we need to do there is we need to move those into a separate table so that each row only has one item and some alone so it's going to look a bit like that this table here has an order ID in one column and the item corresponding to that order ID there so order ID one has a burger order ID one also has fries and order ID one also has a coke in there and as you can say each item in this table now is unique and there's no doop-doop and multiple values per cell here so that's sorted now that's the second bit Goods third thing all data must be atomic or non divisible well what's the problem well if you ever look over here at customer named Bob Jones you can actually see that Bob Jones isn't atomic it can be divided further into Bob's fur into his first name and last name and that's no good okay because this that makes it some kind of composite value that's no good it needs to be separated out because there's a different things it looks just like that so you create a column for their first name a column for that last name and you just put them in separate cells looks like that good so that's how you put a database table into first normal form so now let's move on to second normal form now second normal form not quite as much to it all you need is there are no partial dependencies well what does that mean well what it means is first of all that the table must for be in first normal form so you can't have a table that's in second normal form until you've gone through all those rules that we just did from the previous one from first normal form it must must do this first and then we must have all non-prime attribute should be fully dependent on the candidate key and it's fully dependent on the whole candidate key well what does that mean it sounds a bit complicated but it's not so let's have a look here we've got a list here of students so the student ID the course ID that that student is doing and how much that course costs so here this student they're doing course so student 1 he's doing course ID number 1 and that course costs 500 and student so look so student ID 2 here he's doing course 3 and it's 750 and student 3 is as fish dude 2 is doing is doing course 3 and here student 3 is also doing course three and it's costing 750 well what's the problem with his table and the problem is is that the course free has got nothing to do with the student ID so the course fee is a set fee that only depends on the course ID so this particular course course one costs 500 course two cost of thousands course three cost 750 and as you can see here it doesn't matter whether it's student ID student 2 is doing course 3 or student 3 is doing course free for either of those students it costs 750 for each and that's because the course fee has nothing to do with the student ID now it's important here to note that the course fee is not dependent on the student ID and the student ID forms part of this composite key here of student ID and course ID so this is the composite key but this bit is only dependent on that bit and it's not dependent on that beer and that's where you get this partial bit because it's not dependent it is dependent on this part of the composite key but it's not depending on this part of the composite key so that's where we get the partial part of it so what have you got to do to fix this well nice and simple all we do is remove this information to a separate table just like that so we have a table about course fees and each course ID has a course for your touch to it of course one is 500 of course 2 is a thousand etc and the other table we pretty much keep the same we just but we don't need this extra course fee information we just get rid of that column and we only have the student ID and the corresponding course ID so student 1 is doing course one student 1 is also doing close to student 2 is doing course 1 and course 3 etc etc so that is no partial dependency it must be fully dependent on each of these parts of the composite key if you've got one okay so that's second normal form third normal form know transitive dependency so it's similar to the previous one but we're not talking this time about the primary key or the composite key we're talking about something else so just like when you try to put something into second normal form it had to already be in first normal form well third normal form is the same idea what you need to do is it needs to be in second normal form but of course that also means that therefore if it needs to be in second normal form it must also be in first normal form because the to go back so anything that's in thirds got to be in second and obviously anything this in seconds already gotta be in first so it needs to tick all those over boxes it can't be cheaply curds it can't be divisible there can be multiple values it also can't have a partial dependency so all of those and then it must also do this third this other other item here which is no transitive dependency again sounds a bit weird but all it means is that all fields must be only determinable by the primary or composite key and you can't determine the any of the fields by other keys in there so what does that mean well let me show you a nice simple example here we go so here we've got a list of tournaments that I've called from Wikipedia and here it's ER there's the torn-up name and that's the year the tournament was in and then we got the winner of the tournament and the winners date of birth so this works well you it's it all fits through to first and second normal form but there is a problem with this with this table and the problem is is that you can determine one of these keys by looking at another one of those keys and you see which one it is well nice and simple it's here you can tell who the winner what the winners date of birth is by going to the winner's name so here you can go you know that the winners date of birth is this this day of birth because that's his name again he's got that name so although it's not dependent on these bits here you can't tell the winners date of birth from the tournament name or the year you can tell the winners date of birth from the winner's name okay so that's no good so this information as it stands can't sit inside of the table so what are we going to do we've got to fix that and you could do that by having a separate table of winners here so for instance here we've got the in the tournament name the year and the winners the winner there and we have a separate table that says okay Alfredson his date of birth is this and we split the two in split the two and this this works because here the winner is dependent on the tournament the tournament name and the year because that tournament happens every year so you can't work out who the winner is just by the name of the tournament itself and there are multiple tournaments in one year which means you can't determine the winner just by looking at the year you have to use both of the name of the tournament and the year in which it was in order to work out who the winner is so that fits perfectly and then over here again the date of birth corresponds to that particular winner there you go okay good so that is first second and third normal form if you like that so then please subscribe to my channel if you've got any comments or any suggestions for other videos then please just drop me a comment and good luck thank you very much