This page looks best with JavaScript enabled

Database Normalization: Scratching the Database Surface

 ·   ·  ☕ 7 min read


Database normalization is the process of structuring a database, in accordance with a series of normal forms in order to reduce data redundancy and improve integrity.

In other words, database normalization is basically breaking a table into two to prevent repetitive columns aka data redundancy. It entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints. It is accomplished by applying some formal rules either by a process of synthesis (creating a new database design) or decomposition (improving an existing database design).

That still does not makes sense, right? For this to make sense I would like to define some prerequisites.


  1. Primary Key: Primary key is the key or column selected by database administrator to uniquely identify tuples (aka records) in a table. There can be only 1 primary key in a table.
  2. Foreign Key: Foreign key are primary key of a different table and is used to create relationship between two tables.
  3. Composite Key: If there is no single primary key, we select more than one columns whose combintion would act as a primary key. This is called composite key.

What are normal forms by the way?

We already defined what normalization is. Now there are a few ‘passes’ in which data can be normalized. One pass is known as a normal form.

So what are different normal forms?

1. First Normal Form

This is step 1 of the normalization process. At least follow 1st normal form or do not use databases. Scalable Table design which can be extended.

  1. There should be a primary key in the table.
  2. Each column should contain atomic values. Meaning only one value should be there, not a set or list of values.

A column should contain values that are of the same type. Do not inter-mix different types of values in any column. Most RDBMS will throw you an error if you pass another datatype than schema. Each column should have a unique name. Same names lead to confusion at the time of data retrieval.

Order in which data is saved doesn’t matter. Using SQL query, you can easily fetch data in any order from a table.

Break the list of two items into 2 rows. This will create redundancy, but let’s wait for the Second Normal Form. But first, let’s look at an example. Below is a table without 1st normal form applied. This dataset is of a car rental company where car owners can exibit their car for rent. And other party can book their car for a certain period of time.

Customer IDCustomer NameCar Number PlateCar NameDate of TransactionOwner IDOwner Name
C12SachinCarQ1234 CarQ5436Swift Thar12/01/2020 18/01/2020076 078Dev Irfan
C46RahulCarQ3421 CarQ6534 CarQ3789Baleno Honda City Swift12/01/2020 14/01/2020 15/01/2020054 065 086Rohit Shikhar Irfan

Above data is not in 1st normal form because each customer has multiple car number plates in a single row.

Customer IDCustomer NameCar Number PlateCar NameDate of TransactionOwner IDOwner Name
C46RahulCarQ6534Honda City14/01/2020065Shikhar

Is it in first normal form now? Definately not. It does not have a primary key.

If you look closely, none of the field is unique and every entry is prone to repeat, even Car Number Plate; which will repeat as the car is booked by a different person.

In a case where there is no single primary key, we have a select a composite key, which as we know is combination of two column which together can act unique among the row.

We can select Car Number Plate + Date of Transation as a composite key. Here we assume that one car can be booked by only one person in a day. By taking these two fields, we can say our table is in first normal form now.

2. Second Normal Form

To have a table in 2nd normal form:

  1. The table to be in 1st Normal Form.
  2. There should be no Partial Functional Dependencies.

What is Partial Functional Dependency?

Let us understand this by an example.

We have our composite key. By keeping that aside and looking at rest of the table, we can say that Car Name is dependent on one of the field from our composite key; Car Plate Number.

When field is only dependent on one of the composite fields, it is called that a particular field is having partial functional dependency. PFD only happens in the case of composite key.

Owner ID is also PFD on Car Number Plate. Same with Owner name.

We’ll have to break our dataset into two tables.

Car Number PlateCustomer IDCustomer NameDate of Transaction

The Car Number Plate in this table acts as a foreign key in the other table.

And this is our another table:

Car Number PlateCar NameOwner IDOwner Name
CarQ6534Honda City065Shikhar

3. Third Normal Form

For a table to be in 3rd Normal Form:

  • It should be in 2nd Normal Form
  • There should be no transitive dependencies.

What is a Transitive Dependency?

Transitive Dependency can also be said as Indirect dependencies. Let us take an example again.

Look at table 2 from previous section. Owner Name depends on Owner ID and Owner ID dependens on Car Number Plate. You can’t find Owner Name directly from the Car Number Plate (there can be repeating name). The other two fields, Car Name and Owner ID directly depends on the Car Number Plate.

We now have to further split two tables into other 2. First table from previous section is now split into two.

Car Number PlateCustomer IDDate of Transaction

And this…

Customer IDCustomer Name

Now the second table is also split into two:

Car Number PlateCar NameOwner ID
CarQ6534Honda City065

An this…

Owner IDOwner Name

Why we normalize tables?

Anamolies. There are different type of anamolies.

What are different type of anamolies?

  1. Updation Anamolies

  2. Deletion Anamolies

  3. Insertion Anamolies

We’ll get to know more about this with examples.

Updation Anamolies

So for setting the stage, let’s assume that our table is not normalized. Look at the table from 1st normal form.

Now assume that Dev has sold his car to Roushan. If you go ahead and replace the Owner Name in first row, you’ll still have N number of rows remaining to update.

We have to update fields at mulpitle place.

Deletion Anamolies

Although it’s not practicle in the table above. Assume that you have to remove a customer. If you remove a customer Sachin, you’ll be deleting all the data with it including Car Number Plate, Car Name, Owner ID/Name.

This is known as Deletion Anamolies.

Insertion Anamolies

In Insertion Anamolies, you can’t insert a new row without you have all the other data associated with that row. For example, you want to add a new car to the database. You also need a customer, date of transaction and so on.

This is known as Insertion Anamolies.

Why not to normalize database

If your system uses non-relational database, you might be against normalization. Although relation database can scale horizontally too, it’s not by design efficient at that.

If you have multiple table on multiple machines, you will have a network overload with each join. It does not scales well.

If you are using databases such as document based database, you’d want to have all related data on same document. So that when you shard, you don’t have to deal with network overload making the application slower.


This was my first post realted to databases in my blogging history. I have learn from it more than teaching from it. I hope this post was worth of your time too.

Cover image is (c)

Share on

Santosh Kumar
Santosh Kumar
Santosh is a Software Developer currently working with NuNet as a Full Stack Developer.