Transcript for:
Understanding Database Normalization Steps

In this video we will deal with the topic of normalization from the field of databases. Normalization is necessary to avoid anomalies in relational databases. I have already made a video on this topic, or rather, I have already shown what anomalies are and what problems they lead to. I link the video above. Again and again this leads to problems or difficulties. But it is enough to remember the rules and just apply them. I will explain these rules to you step by step. And it should be said that the normalization is a database design and has the sense to prevent anomalies and redundancies. We will start with the zeroth normal form and for this purpose we will we will look at the database of the computer company hyperEDV, which has a database, but it is still renormalized. In a database of the zeroth normal form, all the data is unnormalized in a single table. And this is exactly the case here. Our database consists of a single table with order number, name, date of birth, etc.. And as you can see, this doesn't look very good and shouldn't stay that way, which is why we convert the tables to the first into the first normal form. I already said, there are always quite rigid rules for normalization that have to be followed and the same applies here. The rule to convert tables into the first normal form is: "All attribute values must be atomic". By an attribute itself we mean a column like OrderNo or the name. And by atomic we mean that we cannot break down these attributes further. This is not the case here at the moment, because our table is still in the zeroth normal form and we are now looking at the name, for example. Our customers have a first and a surname and this means that we have to break down these columns as far as possible. possible. We make two new columns out of the name column, namely first name and last name. Because only in this way we can sort by last name at the end and display all customers who begin with an S in the last name, because our computer would not know if Ludger Sievert is in a cell, how it should divide the whole thing. where does the last name begin here, or what is the first name or what is the last name. The same still applies to the address. also the address we would have to divide still into the attributes street, place or postal zip code, because also the computer does not know, how can one distinguish here between street, place and postal zip code. For example, if we display all customers from Münster and they are in one field, this is not very convenient. Means; we must divide everything so WIDE that everything in a single column is atomically in a single column. And then I would say, let's do that and take a look at the result. And yes the attentive ones among you will notice now the house number one could divide nevertheless also still. Of course you could and that would be absolutely correct according to the rule. But we neither want to sort by the house number nor do anything with the house number. That's why it doesn't give us a big advantage. Here it is of course always a question of weighing up whether it makes sense or not, but if we were to apply the rule in this way. Then we would have to transfer the house number of course also in an own column. So far clear? The rule is relatively easy to apply. And we have, if we don't see it strictly, a table in the first normal form. We continue this with the second normal form and for this we look as usual first at the definition which is needed for the second normal form. The rule is as follows: A table is in second normal form if it is in first normal form and every non-key attribute is fully functionally dependent on each key candidate. This sounds a bit more complicated than it actually is. Because it is not that complicated at all. We will notice that now. As first information is important that each normal form requires the previous normal form, as with the second one. Means, if we want to have a table in the second normal form or transfer it to the second the table must first be in the first first normal form. And then we can we can bring the table into the second normal form. This rule is also valid for the third normal form. Means, there the table must be first in the in the second normal form. Now let's look further, because for the second normal form we make a small excursion in relational databases. In relational databases we work with primary keys. The purpose of a key in a relational database, a key is used to identify a data set without identified without any doubt and this value may therefore only occur once. Our primary keys can be recognized by the fact that the columns above are underlined like customer number and order number are underlined. For this one or maybe more columns are defined, which take over this task. For an order for example we can we can well imagine the order number for this. An order has a number and this number is unique. It does not exist more than once. There is nothing to do with it. The same applies to the customer and to all articles. Here, too, the customer number or article number the article number is unique. Each customer has its own number and each article has its own number. And now we turn our attention to the to the story of the fully functional dependencies. We now get a total of four tables as you can see, which have been created one table at the beginning. These are Order, Customer, Article and OrdersArticle. We will now look step by step at why these tables have been created in this form and how we deal with them. Each of these tables in a relational database has a primary key and non-key attributes which in the second normal form are fully functionally depend on primary keys. We look at at the customer, for example. Since the first name is fully functionally dependent to the customer ID. We can see this even better in the table OrderArticle. Here we use we use a composite primary key of order number and article number. So both columns form the primary key here. Order and article must not appear together in this must not occur again in this table. In addition there is the column number, which and there one can you can see it relatively well, equally from the order number as well as from the article number as well. So we always need both information. So the order number and the article number together so that the number is meaningful. That means, the number is fully functionally dependent to the whole primary key. So we have each column that was not fully functionally dependent on a key kanidata from a key channel data in the previous table into a sub-table. We remember, we had our own key candidates, respectively the article number, customer number and order number in one table and now we have moved them to into their own tables, which is normal in the second normal form. normal form. Now you ask yourselves how do we get the second customer number and the second we get the second customer number and the second article number and the second order number. We will now look at this in detail. Let's have a look at the relationship between the Customer table and the Order table and the relationship between the table Order, OrderArticle and Article and the table OrderArticle table. Let's start first with the order and the customer and we can see that the customer number appears in the Order table. Here is a relationship between the the Customer table and the Purchase Order table and it is a 1 to n relationship. The 1 means in this case, that each order can have one or no customer customer. And the n means that each customer has not made one or more orders. orders. This is resolved in relational databases so that the primary key on the on the customer side acts as a foreign key in the Order table. Thus each order number can only appear once, since the Order number column is the primary key and the customer can appear multiple times and can have multiple purchase orders. Thus we solve only one to n relationship in relational databases. databases. now here we have a second relationship and this one is a bit more complicated. Here we are dealing with an n to n relationship. If we have a 1 to n relationship, as we have just seen, the primary key as a foreign key in the other table we need a new table for an n to n relationship we need a new table in any case. The n to n relationship says that an item can have can have several orders and an order can have several articles