📊

Understanding Database Normalization Forms

Aug 27, 2024

Database Normalization Lecture Notes

Introduction to Normalization

  • Definition: Normalization is the process of structuring a database in such a way that it reduces redundancy and dependency by organizing fields and table relationships.
  • Purpose: Protect against bad data, ensure data integrity, and prevent anomalies (insertion, update, deletion).
  • Focus: Understanding normal forms from First Normal Form (1NF) to Fifth Normal Form (5NF).

Importance of Data Integrity

  • Good database design can protect against certain types of bad data.
  • Example of bad data: A customer with two dates of birth indicates a failure of data integrity.
  • Proper normalization prevents contradictory data and makes databases easier to understand and extend.

Normal Forms Overview

  • Think of normal forms as safety assessments for database design.
  • First Normal Form (1NF): Basic safety level, ensures no redundancy or ambiguity.
  • Second Normal Form (2NF): Ensures non-key attributes are fully dependent on the primary key.
  • Third Normal Form (3NF): Eliminates transitive dependencies among non-key attributes.
  • Fourth Normal Form (4NF): Addresses multi-valued dependencies.
  • Fifth Normal Form (5NF): Ensures that a table cannot be derived from joining other tables.

First Normal Form (1NF)

  • Criteria for 1NF:
    1. No row order conveys information.
    2. No mixing of data types within a single column.
    3. Every table must have a primary key.
    4. No repeating groups of data (e.g., inventory items).
  • Example Violations:
    • Using row order to convey height of individuals.
    • Mixing data types in the same column.
    • Not having a primary key defined.
    • Storing repeating groups of data in a single row.

Second Normal Form (2NF)

  • Definition: Every non-key attribute must be fully dependent on the entire primary key.
  • Example: Player inventory table where player ratings are part of the primary key. This design can lead to:
    • Deletion Anomaly: Losing player rating when all inventory is deleted.
    • Update Anomaly: Inconsistent player ratings after an update.
    • Insertion Anomaly: Inability to insert a new player without inventory.
  • Correction: Separate player data into its own table to eliminate dependency issues.

Third Normal Form (3NF)

  • Definition: Every non-key attribute must depend only on the primary key, with no transitive dependencies.
  • Example: Adding player skill level introduces a dependency between player rating and player skill level, leading to inconsistencies.
  • Correction: Remove player rating from the player table and create a new table for player skill levels.

Fourth Normal Form (4NF)

  • Definition: Addresses multi-valued dependencies in tables.
  • Example: A table that combines birdhouse models, colors, and styles can lead to inconsistencies when managing new colors or styles.
  • Correction: Split into separate tables for models, colors, and styles to maintain independence.

Fifth Normal Form (5NF)

  • Definition: Ensures a table cannot be derived from joining other tables.
  • Example: Combining preferences for ice cream flavors and brands in a single table can lead to inconsistencies.
  • Correction: Create three separate tables for flavors, brands, and preferences to accurately represent the relationships.

Summary of Normal Forms

  1. First Normal Form (1NF): No row order, no mixed data types, primary key required, no repeating groups.
  2. Second Normal Form (2NF): Non-key attributes must be dependent on the entire primary key.
  3. Third Normal Form (3NF): Non-key attributes must depend on the key, the whole key, and nothing but the key.
  4. Fourth Normal Form (4NF): Only multi-valued dependencies on the key are allowed.
  5. Fifth Normal Form (5NF): Must not be possible to describe the table as the logical result of joining other tables.

Conclusion

  • Following normalization processes enhances data integrity and reduces redundancy.
  • Understanding these forms is crucial for efficient database design.