Lecture on Normalization

Jun 22, 2024

Lecture on Normalization

What is Normalization?

  • Technique to remove or reduce redundancy in a database table.
  • Redundancy refers to duplicacy of data.
  • If not completely removable, the goal is to reduce it.

Types of Duplicacy

  1. Row Level Duplicacy
  2. Column Level Duplicacy

Row Level Duplicacy

  • Example: Student table with columns SID, S. Name, Age.
    • Values: (1, John, 20), (2, Mike, 21), (1, John, 20)
    • Rows 1 and 3 have exact same values, this is row-level duplicacy.
  • Solution: Use of Primary Key
    • Set SID as the primary key.
    • Primary key properties: Unique + Not Null.
    • Ensures no duplicate or empty SID values, preventing row-level duplicacy.

Column Level Duplicacy

  • Columns: Student ID, Student Name, Course ID, Course Name, Faculty ID, Faculty Name, Salary.
  • No two rows are exactly same (row-level duplicacy removed with primary key).
  • Example: Repeated values in Course ID, Course Name, Faculty ID, Faculty Name, Salary columns.

Problems Caused by Column Level Duplicacy

  1. Insertion Anomaly
    • Difficulty in inserting new courses or faculty data without related student data (due to primary key constraints).
  2. Deletion Anomaly
    • Deleting a student’s data might remove important course and faculty information linked to that student.
  3. Updation Anomaly
    • Updating repeated column data (e.g., faculty salary) is inefficient and inconsistent.

Anomalies Explained

Insertion Anomaly

  • Example: New course added without student data leads to primary key conflict (cannot leave SID empty or insert dummy values).
  • Cannot directly insert new course/faculty data due to dependency on student data.

Deletion Anomaly

  • Example: Deleting student with SID=2 removes all related course and faculty data.
  • Important information could be lost if the deleted student was the only one in that course.

Updation Anomaly

  • Example: Changing salary of Faculty ID F1 repeatedly wherever it appears.
  • If F1’s salary appears 100 times, the update has to execute 100 times.
  • Leads to inefficiency; single change should ideally reflect across database.

Solution: Normalization

  • Dividing table into multiple related tables to eliminate redundancy.
    • Table 1: SID, S.Name
    • Table 2: CID, C.Name
    • Table 3: FID, F.Name, Salary
  • Each table has its own primary key.
  • Solves insertion, deletion, and updation anomalies by isolating redundant data into separate tables.

Conclusion

  • Proper table division according to normalization rules can efficiently remove redundancy.
  • Ensures database integrity and optimizes operations like insertion, deletion, and updation.

Thank you!