πŸ“Š

Understanding Database Normalization Techniques

Oct 23, 2024

Lecture Notes on Database Normalization

Introduction to Normalization

  • Definition: Normalization is the process of structuring a database to reduce redundancy and improve data integrity.
  • Importance: Helps in avoiding bad data and logical inconsistencies.
  • Outcome: Understanding the five normal forms and the implications of normalization.

Issues with Bad Database Design

  • Bad data can arise from poor database design, leading to data integrity failures.
  • Example: A customer having two dates of birth is logically impossible.

What is Normalization?

  • Goal: Structure database tables to eliminate redundant information.
  • Benefits:
    • Protects against contradictory data.
    • Easier to understand, enhance, and extend.
    • Protects from insertion, update, and deletion anomalies.

Normal Forms Overview

  • Normal forms (1NF, 2NF, 3NF, 4NF, 5NF) provide criteria to assess database design.
  • Analogy: Normal forms are like safety assessments for structures.
    • 1NF: Basic safety guarantee.
    • 2NF: Higher safety guarantee.
    • 3NF: Further refinements, ensuring integrity.
    • 4NF & 5NF: Address more complex dependencies.

First Normal Form (1NF)

Requirements for 1NF:

  1. No Row Order: Data must not convey meaning through row order.
    • Example: Names of Beatles can be listed in any order; use additional columns for related info.
  2. No Mixing Data Types: Each column must have a consistent data type.
  3. Primary Key: Each table must have a primary key to uniquely identify records.
  4. No Repeating Groups: Avoid storing multiple values in a single column.
    • Example: Instead of combining item types in one field, create separate rows.

Second Normal Form (2NF)

Requirements for 2NF:

  • Full Dependency: Every non-key attribute must depend on the entire primary key.
    • Example: In a player inventory table, item quantity should depend on both player and item type.
  • Deletion, Update, Insertion Anomalies: Issues can arise if non-key attributes don’t fully depend on the key.
    • Solution: Separate attributes into different tables if necessary.

Third Normal Form (3NF)

Requirements for 3NF:

  • No Transitive Dependencies: A non-key attribute should not depend on another non-key attribute.
    • Example: Player rating should be directly related to the player, not through skill level.
  • Guideline: Every attribute in a table should depend on the key, the whole key, and nothing but the key.
  • Boyce-Codd Normal Form (BCNF): A stricter version of 3NF.

Fourth Normal Form (4NF)

Requirements for 4NF:

  • Multi-Valued Dependencies: Only acceptable dependencies are those directly on the key.
    • Example: In a birdhouse model table, colors and styles should be in separate tables to avoid inconsistencies.

Fifth Normal Form (5NF)

Requirements for 5NF:

  • Joining Tables: A table should not represent data that can be logically derived from joining other tables.
    • Example: Ice cream preferences should be represented across multiple related tables to ensure non-duplication of data.

Summary of Normal Forms

  1. First Normal Form (1NF): No row order, no mixed data types, must have a primary key, no repeating groups.
  2. Second Normal Form (2NF): Non-key attributes fully depend on the entire primary key.
  3. Third Normal Form (3NF): No transitive dependencies among non-key attributes.
  4. Fourth Normal Form (4NF): Only multi-valued dependencies on the key are allowed.
  5. Fifth Normal Form (5NF): Must not be describable as a logical result of joining other tables.

Conclusion

  • Normalization is crucial for maintaining the integrity and usability of databases.
  • Proper design following the normal forms can significantly enhance data structure and management.