Database Normalisation

When delving into the realm of Database Normalisation you need to keep in mind that when it was first suggested Relational Database Management Systems such as Postgres and SQLServer didn't exist yet. As such much of this was theoretical and so the formal definitions are maths based, use technical terms and are often written as first-order predicate logic. Don't worry if that sounded like a foreign language, I will give the formal definitions but also try to explain them in simple terms as well.

This is one of those topics that is going to need a lot of diagrams to help the explanation so apologies in advance if you're on your phone :)

I am acutely aware, there is a lot of opinion on how far you need to go. I will write up all phases but add notes when I believe a step is far enough for given projects.

WHAT?

Database Normalisation is an iterative design process used to organise and structure data. A well structured database reduces redundancy, maintains integrity and prevents inconsistencies and anomalies. It will also ensure efficient use of storage space. There are several rules or stages referred to as Normal Forms and each one builds on the work of their predecessor.

WHO & WHEN?

British computer scientist Edgar F. Codd first described the process as part of his theory called the Relational Model in 1969. Since then many others have expanded on his work:

WHY?

The result of Database Normalisation provides many advantages:

HOW?

The process involves splitting often large and complex tables of data into several smaller ones and creating relationships between those tables. These relationships define how the data in the two tables are linked together.

The Normal Forms

As previously explained the process of Normalising data is an iterative one. Each step builds on the work done in the previous step. These steps are called Normal Forms and data is only in a Normal Form if it satisfies the criteria for the current Normal Form AND ALL the previous ones.

Below is the list of Normal Forms in the order they should be completed. I have given Unnormalised Form the order value of zero because this is the point at which you start with your original data in the format you get it.

Order Name Shortened Name
0 UnNormalised Form UNF
1 First Normal Form 1NF
2 Second Normal Form 2NF
3 Third Normal Form 3NF
4 Elementary Key Normal Form EKNF
5 Boyce-Codd Normal Form BCNF
6 Fourth Normal Form 4NF
7 Essential Tuple Normal Form ETNF
8 Fifth Normal Form 5NF
9 Domain Key Normal Form DKNF
10 Sixth Normal Form 6NF