Normalization is the process of efficiently organizing data in a database by reducing data redundancy while ensuring the integrity of data dependency. Normal forms are used in this process to describe the stages of normalization. In theory, it started from stage one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see the first three NFs (1NF, 2NF, and 3NF) and the last two NFs (4NF and 5NF) are seldom used.
First Normal Form (1NF)
1NF means that the value of any column in a row must be single valued (i.e. atomic). Imagine a table with following fields: Employee No (emp_no) and Project No (project_no) where the relationship is one-to-many i.e. one employee may take up multiple projects. The table may look like this:
10102 (p1, p3)
The table is not in 1NF, as project_no contains more than 1 value. Once we've ensured the rule of atomic, the table will be in 1NF as follows:
Second Normal Form (2NF)
Primary key refers to the column of a table that is able to identify each row uniquely. In the earlier table, both emp_on and project_no form a composite primary key (i.e. having more than 1 column as primary key). Expanding the same example with more columns that entails which department the employees belong to, such as department id (dept_id), department name (dept_name) and department location (dept_loc). A table may look like this:
emp_no project_no dept_id dept_name dept_loc
10102 p1 d1 Sales L1
10102 p3 d1 Sales L1
25348 p1 d2 Marketing L2
Here, there is some redundancy on the dept_name and dept_loc. Not only the redundant info would take up more storage space, there is a chance of update error whenever the employee changes department or the department relocates. To be in 2NF, remove subsets of data that apply to multiple rows of a table and place them in separate tables. Create relationships between these new tables and their predecessors through the use of foreign keys (i.e. primary keys in other table). The resultant tables may look like this:
emp_no dept_id dept_name dept_loc
10102 d1 Sales L1
25348 d2 Marketing L2
In the newly separated table, the emp_no is both the primary key and foreign key (i.e. reference key to a primary key of another table). Note: a table with a one-column primary key is always in 2NF.
Third Normal Form (3NF)
To be in 3NF, the table must first satisfy the requirements of 1NF and 2NF. Next, there must be no functional dependencies between the non-key columns. In the earlier separated table, it is not in 3NF because the dept_name is dependent on the dept_id, which is another non-key. To be in 3NF, another new table is separated from it and the final resultant tables may look like this:
dept_id dept_name dept_loc
d1 Sales L1
d2 Marketing L2
In the last table, dept_id becomes the primary key and all the three tables are now in 3NF.