Okay, welcome. This is just a short tutorial on how to identify normal forms in a database design. Okay, so it's full of rules and you really kind of have to know the rules to be able to make any sense of this. So let's just begin. Okay, so the idea is that there's some rules and assume that you've been given a database that's in horrible shape and it's got all sorts of violations on it and your job is to go through and identify the violations and then after you've identified them, you can go in there and make the fixes. So step one in fixing a broke database is to identify what's wrong. That kind of makes sense. So that's what we're going to do. So you get a table and you go through these things called normal forms. That's a term that the database people use. And there's a a set of rules and you just go through them in order. And the first one is does it meet the definition of a relation? And then we go through first normal form and second normal form all the way through. So let's just start with the rules. The first one is what the heck is a relation? Well, the term relation is kind of an unusual one in the database world. It talks about a set of rules. Basically, there's eight rules. Um, and let's just go through them. The first rule basically is that the the table that we're talking about only contains information about one thing. That's kind of what it's all about. And that a single row on that table contains information about a single thing. In other words, you can't have one row that has like two students in it. Make sense, right? So, if you had a stu a table for students, every row is just one student. You can't have student number one, then student number two, and then student number three, and student number four. Make sense? Okay, cool. Then, and the next rule is like every column in a table has to have stuff that's somehow directly related to the topic. And that one is a little weird and we'll go into that one a little bit later. So, for example, if this is a student table, I' i'd expect to have the students, you know, first name and last name and what their major is and maybe what their grade point average or something, but I wouldn't I wouldn't expect to find like the room number for the class that they're taking because students don't have room numbers. Okay, that that's basically what that means. The next one talks about that if you have a column in like let's say you have a column in an Excel spreadsheet in an Excel spreadsheet you could have numbers and text and you know all sorts of things in there but in a database you pick one particular data type and everything in that column has to has to keep that data type. So if you pick number it's going to be numbers forever. If you pick text it's going to be text forever. That's what that means. every cell, you know, every combination of a row and column uh contain information about just one thing, not two things. Okay? So, you can't have multiple things being shared in a particular cell. And the next couple are kind of weird. It basically says that um you can't depend on the rows being in any particular order and you can't depend on the columns being in any particular order. And then lastly, you can't have any duplicates. Okay, the most important one when when talking about normal forums is that second one where all columns contains data related to a single entity. All right, so let's do an example. So here's an example of a horrible design. I got student ID, first name, last name, major, student type, and phone. So let's see how many violations we can find. So right off the bat, um I have inconsistent data types. Clearly, some of these are numbers and some of these are text. That one's easy to spot. How about the rule for the cell can't have more than one value. There's a good one there. How about row order? I told you that the rows can't be in a particular order. So, William Bonan has two phone numbers and they just wrote the second one underneath, but there's no there's no guarantee that this row is always going to be immediately after that row. So, that's not going to work. And then lastly, I actually have a duplicate. I got John Doe math and John Doe math down here. So I actually have a duplicate. So those that's kind of a simple example of how to use the relation rules to uh see if if a table is in meets those rules. Okay. So that's not too difficult, is it? The worst one, the most complicated one is the one we're going to be talking about an awful lot. So let's move on to first normal form. So a first normal form is not really anything special. All it basically says is that the table's already a relation which means those eight rules already apply plus it has a primary key. Now just as a review primary key means some sort of a key that you have designated that can uniquely identify a row. So that's really just a practical application of uh that last rule in the relation that says you can't have two rows that are the same. The practical way you do that is through a primary key. Okay. Now, this talks about there's a term they use to describe this. It's called a functional dependency. And a functional dependency is a kind of a strange thing. It basically what it says is if you give me one piece, I'll be able to tell you something else. So to test whether something has a functional dependency, just say it out loud. Given a student ID like a primary key, I should be able to tell the first name and the last name and the major. That that all that sounds right, doesn't it? But given a student ID, I should not be able to tell the room number for the class he's currently attending. Not not directly because students don't have room numbers. Students have first names and last names, but they don't have room numbers. Make sense? Okay, that's the most complicated part. And the functional dependency again the concept is not tough. It's just that the idea of everything has to be related. That's the the fundamental thing on a first normal form violation. So given an employee ID, I should be able to tell the name, address, title, position, all sorts of things about the employee. Let's do an example. So here we go. Here's a stock uh inventory of some sort and I have an item number and so item number 12 is a kumquat and I have a unit price and the quantity on hand and I have a customer and order. So say it out loud given an an item number, a stock number, should I be able to tell the description of the product? Yeah. If you gave me a stock number, I should be able to tell you what the what the product is. Should I be able to tell you what the unit price is? Yeah. How about the quantity on hand at the grocery store? Yeah, I go along with that. But given an item number, should I be able to tell the customer? The customer's name? No. No. So, this item is the primary key. Description, that's fine. Unit price, that's fine. Quantity on hand, that's fine. But no, the item number does not determine that's the functional dependency term does not determine the customer and it doesn't determine the order order number. Now, we're not saying that you don't need this information, you know, the customer and order number, right? We're not saying, oh, we need to eliminate that. What we're saying is you need to move it someplace else. Okay, cool. So this might look subjective just looking at it going, you know, there's, you know, you could go, well, yeah, it could or couldn't, but in reality it's not. When you get a little bit more experience in there, it's the rules are pretty strict and you can easily spot them. Okay, so let's move on. Second normal form. Second normal form is a still still the same thing. It's still that functional dependency thing except sometimes you have a table that has a compound primary key which means you have more than one column that's needed to make things unique. Remember that. Okay, cool. So if it has a compound primary key then the rule basically says that all the columns have to match all of the key not just part of the key. So they have to be related to the whole thing not part of it. Okay. So, for example, given a player in a game. So, there's a primary key that has two parts, player and game. I should be able to tell the number of hits, runs, and runners batted in. Make sense? Okay, good. So, here's an example of a bad one. So, I have an item number and a warehouse. So, apparently we have what, three warehouses. Uh, we have, well, I guess it's just two two warehouses. We have a main and a south. And then a quantity on hand and a description. So, right off the bat, you might be looking at at this going, I don't see anything wrong. So, the the primary key is made up of both of these items. So far, so good. The quantity on hand, yeah, because the quantity owned hand of of 14 in the main is zero and in the south is 40. So, that makes good sense. But the description the description is not dependent on the warehouse. So, that makes sense, right? is if you got item number 12, it doesn't matter what warehouse it's in that it's always an axe. Make sense? So, this is a violation of second normal form because the description isn't dependent upon the whole thing. It's dependent only on part of the thing. Okay, good. Moving on. Third normal form. Third normal form introduces a new term called transitive dependency. And it's basically you know how transitive transitive anything works like I trust you and you trust Bob therefore I trust Bob that's a transitive relationship you know like a h things kind of hop along well that's exactly what we're talking about here so the rules are that all columns have to be related to only the primary key you can't have another column in there you can't have another hop in the thing so if some other column is being used as a key that would be a violation a transitive dependency. So, here's a cool thing that they talk about. If you're trying to describe the first three forms, it's the key, the whole key, and nothing but the key. That's kind of cute. All right, so let's do an example. So, here we go. So, here's uh a tournament and a year, and who the winner is. So, right off the bat, this is this is a compound primary key, right? Comp a primary key made of both pieces. and the winner. Yeah, because the winner clearly is has nothing to do with the year alone or the tournament alone. It has to be the combination of the two. But can't you kind of look at this for a minute and see that the winner's date of birth really should be related to the winner and not the combin not the PK? Make sense? So just kind of eyeball for this for just a minute. What they're basically saying is winner would be a better key for the winner's date of birth. So this is a transitive dependency meaning that tournament year determined the winner but the winner. So you see how it does a little hop the winner determined. Okay. Now boyce cod normal form typically B CNF is not a new form. It's really not. All it is is closing a loophole of third normal form. When they wrote the third normal form thing, they said a nonkey can't be acting as a key. Okay, that's what they said and they forgot I guess that it could be a candidate key and therefore technically it's not a violation of third normal form. So they created this kind of I would call this like 3.1 if it were me, but they didn't. They called it boycod normal form. It's just that the rules were too tight on third normal form. This is nothing new. It is not a new type. All they did was just tweak the wording in the in the rule. That's all they did. So, this one's a little bit more difficult to describe and even the examples are kind of weird. So, just bear with me. I'll try my best to explain it. Okay. So, here's a BCNF. So, a student in a major determines the advisor. Yeah. Okay. I go along with that. That that makes sense. But isn't it also true that um that the advisor and the major could determine the student? So, that's called a candidate key. A candidate key is something that could be a primary key, but you chose not to. Like so a better example of candidate key would be let's say you have a student ID and a student email. Both of those are unique. I could choose either one to be my primary key. If I choose student ID, the other one is the candidate key. Okay, that's all what candidate key means. So this B, C, and F basically happens when you have an over overlapping candidate key. I have like two columns over here with as my primary key, but I have a candidate key that has two columns and it's overlapping one of the keys. So, whenever you have overlapping candidate keys, you have a BCNF problem. I know this is difficult to see, so you just have to listen to the words. Whenever you have an overlapping compound uh primary key, candidate key, then you have this issue. Okay, we're coming up on the 15 minute mark. So, I'm going to stop this video and we'll just pick it up in part two.