Saturday, June 9, 2012

Re-Learning the Basic of Relational Database - Normalization, Primary Key and Foreign Key

The last time that I worked on SQL-based relational database was more than 10 years back when I first started my first job as a C/C++ and Java programmer. It's time for me to revise that knowledge again. Recently, I've picked up a book called "Microsoft SQL Server 2012: A Beginner's Guide" by a German professor named Dusan Petkovic. I've started chapter 1 and it's all about the basic of relational database on Normalization process and various key concepts. How I wished the author could explain in a even more simpler language.

Normal Forms
Normalization is the process of efficiently organizing data in a database by reducing data redundancy while ensuring the integrity of data dependency. Normal forms are used in this process to describe the stages of normalization. In theory, it started from stage one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see the first three NFs (1NF, 2NF, and 3NF) and the last two NFs (4NF and 5NF) are seldom used.

First Normal Form (1NF)
1NF means that the value of any column in a row must be single valued (i.e. atomic). Imagine a table with following fields: Employee No (emp_no) and Project No (project_no) where the relationship is one-to-many i.e. one employee may take up multiple projects. The table may look like this:
emp_no         project_no
10102           (p1, p3)    


The table is not in 1NF, as project_no contains more than 1 value. Once we've ensured the rule of atomic, the table will be in 1NF as follows:
emp_no         project_no
10102               p1         
10102               p3         


Second Normal Form (2NF)
Primary key refers to the column of a table that is able to identify each row uniquely. In the earlier table, both emp_on and project_no form a composite primary key (i.e. having more than 1 column as primary key).  Expanding the same example with more columns that entails which department  the employees belong to, such as  department id (dept_id), department name (dept_name) and department location (dept_loc). A table may look like this:
emp_no         project_no       dept_id      dept_name       dept_loc
10102               p1                 d1             Sales                  L1    
10102               p3                 d1             Sales                  L1    
25348               p1                 d2             Marketing           L2     


Here, there is some redundancy on the dept_name and dept_loc. Not only the redundant info would take up more storage space, there is a chance of update error whenever the employee changes department or the department relocates. To be in 2NF, remove subsets of data that apply to multiple rows of a table and place them in separate tables. Create relationships between these new tables and their predecessors through the use of foreign keys (i.e. primary keys in other table). The resultant tables may look like this:
emp_no         project_no  
10102               p1           
10102               p3           
25348               p1           


emp_no           dept_id        dept_name       dept_loc
10102                  d1                Sales                  L1  
25348                  d2                Marketing           L2  

In the newly separated table, the emp_no is both the primary key and foreign key (i.e. reference key to a primary key of another table). Note: a table with a one-column primary key is always in 2NF.


Third Normal Form (3NF)
To be in 3NF, the table must first satisfy the requirements of 1NF and 2NF. Next, there must be no functional dependencies between the non-key columns. In the earlier separated table, it is not in 3NF because the dept_name is dependent on the dept_id, which is another non-key. To be in 3NF, another  new table is separated from it and the final resultant tables may look like this:
emp_no         project_no  
10102               p1           
10102               p3           
25348               p1           


emp_no           dept_id 
10102                  d1   
25348                  d2   


dept_id        dept_name       dept_loc
d1                   Sales                  L1   
d2                   Marketing           L2   


In the last table, dept_id becomes the primary key and all the three tables are now in 3NF.

1 comment: