Lecture on Normalization

Jun 22, 2024

Lecture on Normalization

What is Normalization?

Technique to remove or reduce redundancy in a database table.
Redundancy refers to duplicacy of data.
If not completely removable, the goal is to reduce it.

Types of Duplicacy

Row Level Duplicacy
Column Level Duplicacy

Row Level Duplicacy

Example: Student table with columns SID, S. Name, Age.
- Values: (1, John, 20), (2, Mike, 21), (1, John, 20)
- Rows 1 and 3 have exact same values, this is row-level duplicacy.
Solution: Use of Primary Key
- Set SID as the primary key.
- Primary key properties: Unique + Not Null.
- Ensures no duplicate or empty SID values, preventing row-level duplicacy.

Column Level Duplicacy

Columns: Student ID, Student Name, Course ID, Course Name, Faculty ID, Faculty Name, Salary.
No two rows are exactly same (row-level duplicacy removed with primary key).
Example: Repeated values in Course ID, Course Name, Faculty ID, Faculty Name, Salary columns.

Problems Caused by Column Level Duplicacy

Insertion Anomaly
- Difficulty in inserting new courses or faculty data without related student data (due to primary key constraints).
Deletion Anomaly
- Deleting a student’s data might remove important course and faculty information linked to that student.
Updation Anomaly
- Updating repeated column data (e.g., faculty salary) is inefficient and inconsistent.

Anomalies Explained

Insertion Anomaly

Example: New course added without student data leads to primary key conflict (cannot leave SID empty or insert dummy values).
Cannot directly insert new course/faculty data due to dependency on student data.

Deletion Anomaly

Example: Deleting student with SID=2 removes all related course and faculty data.
Important information could be lost if the deleted student was the only one in that course.

Updation Anomaly

Example: Changing salary of Faculty ID F1 repeatedly wherever it appears.
If F1’s salary appears 100 times, the update has to execute 100 times.
Leads to inefficiency; single change should ideally reflect across database.

Solution: Normalization

Dividing table into multiple related tables to eliminate redundancy.
- Table 1: SID, S.Name
- Table 2: CID, C.Name
- Table 3: FID, F.Name, Salary
Each table has its own primary key.
Solves insertion, deletion, and updation anomalies by isolating redundant data into separate tables.

Conclusion

Proper table division according to normalization rules can efficiently remove redundancy.
Ensures database integrity and optimizes operations like insertion, deletion, and updation.

Thank you!

Full transcript