Definition: Normalization is the process of structuring a database in such a way that it reduces redundancy and dependency by organizing fields and table relationships.
Purpose: Protect against bad data, ensure data integrity, and prevent anomalies (insertion, update, deletion).
Focus: Understanding normal forms from First Normal Form (1NF) to Fifth Normal Form (5NF).
Importance of Data Integrity
Good database design can protect against certain types of bad data.
Example of bad data: A customer with two dates of birth indicates a failure of data integrity.
Proper normalization prevents contradictory data and makes databases easier to understand and extend.
Normal Forms Overview
Think of normal forms as safety assessments for database design.
First Normal Form (1NF): Basic safety level, ensures no redundancy or ambiguity.
Second Normal Form (2NF): Ensures non-key attributes are fully dependent on the primary key.
Third Normal Form (3NF): Eliminates transitive dependencies among non-key attributes.
Fourth Normal Form (4NF): Addresses multi-valued dependencies.
Fifth Normal Form (5NF): Ensures that a table cannot be derived from joining other tables.
First Normal Form (1NF)
Criteria for 1NF:
No row order conveys information.
No mixing of data types within a single column.
Every table must have a primary key.
No repeating groups of data (e.g., inventory items).
Example Violations:
Using row order to convey height of individuals.
Mixing data types in the same column.
Not having a primary key defined.
Storing repeating groups of data in a single row.
Second Normal Form (2NF)
Definition: Every non-key attribute must be fully dependent on the entire primary key.
Example: Player inventory table where player ratings are part of the primary key. This design can lead to:
Deletion Anomaly: Losing player rating when all inventory is deleted.
Update Anomaly: Inconsistent player ratings after an update.
Insertion Anomaly: Inability to insert a new player without inventory.
Correction: Separate player data into its own table to eliminate dependency issues.
Third Normal Form (3NF)
Definition: Every non-key attribute must depend only on the primary key, with no transitive dependencies.
Example: Adding player skill level introduces a dependency between player rating and player skill level, leading to inconsistencies.
Correction: Remove player rating from the player table and create a new table for player skill levels.
Fourth Normal Form (4NF)
Definition: Addresses multi-valued dependencies in tables.
Example: A table that combines birdhouse models, colors, and styles can lead to inconsistencies when managing new colors or styles.
Correction: Split into separate tables for models, colors, and styles to maintain independence.
Fifth Normal Form (5NF)
Definition: Ensures a table cannot be derived from joining other tables.
Example: Combining preferences for ice cream flavors and brands in a single table can lead to inconsistencies.
Correction: Create three separate tables for flavors, brands, and preferences to accurately represent the relationships.
Summary of Normal Forms
First Normal Form (1NF): No row order, no mixed data types, primary key required, no repeating groups.
Second Normal Form (2NF): Non-key attributes must be dependent on the entire primary key.
Third Normal Form (3NF): Non-key attributes must depend on the key, the whole key, and nothing but the key.
Fourth Normal Form (4NF): Only multi-valued dependencies on the key are allowed.
Fifth Normal Form (5NF): Must not be possible to describe the table as the logical result of joining other tables.
Conclusion
Following normalization processes enhances data integrity and reduces redundancy.
Understanding these forms is crucial for efficient database design.