This page looks best with JavaScript enabled

Database Normalization: Scratching the Database Surface

 ·   ·  ☕ 7 min read

Introduction

Database normalization is the process of structuring a database, in accordance with a series of normal forms in order to reduce data redundancy and improve integrity.

In other words, database normalization is basically breaking a table into two to prevent repetitive columns aka data redundancy. It entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints. It is accomplished by applying some formal rules either by a process of synthesis (creating a new database design) or decomposition (improving an existing database design).

That still does not makes sense, right? For this to make sense I would like to define some prerequisites.

Definitions

  1. Primary Key: Primary key is the key or column selected by database administrator to uniquely identify tuples (aka records) in a table. There can be only 1 primary key in a table.
  2. Foreign Key: Foreign key are primary key of a different table and is used to create relationship between two tables.
  3. Composite Key: If there is no single primary key, we select more than one columns whose combintion would act as a primary key. This is called composite key.

What are normal forms by the way?

We already defined what normalization is. Now there are a few ‘passes’ in which data can be normalized. One pass is known as a normal form.

So what are different normal forms?

1. First Normal Form

This is step 1 of the normalization process. At least follow 1st normal form or do not use databases. Scalable Table design which can be extended.

  1. There should be a primary key in the table.
  2. Each column should contain atomic values. Meaning only one value should be there, not a set or list of values.

A column should contain values that are of the same type. Do not inter-mix different types of values in any column. Most RDBMS will throw you an error if you pass another datatype than schema. Each column should have a unique name. Same names lead to confusion at the time of data retrieval.

Order in which data is saved doesn’t matter. Using SQL query, you can easily fetch data in any order from a table.

Break the list of two items into 2 rows. This will create redundancy, but let’s wait for the Second Normal Form. But first, let’s look at an example. Below is a table without 1st normal form applied. This dataset is of a car rental company where car owners can exibit their car for rent. And other party can book their car for a certain period of time.

Customer IDCustomer NameCar Number PlateCar NameDate of TransactionOwner IDOwner Name
C12SachinCarQ1234 CarQ5436Swift Thar12/01/2020 18/01/2020076 078Dev Irfan
C46RahulCarQ3421 CarQ6534 CarQ3789Baleno Honda City Swift12/01/2020 14/01/2020 15/01/2020054 065 086Rohit Shikhar Irfan

Above data is not in 1st normal form because each customer has multiple car number plates in a single row.

Customer IDCustomer NameCar Number PlateCar NameDate of TransactionOwner IDOwner Name
C12SachinCarQ1234Swift12/01/2020076Dev
C12SachinCarQ5436Thar18/01/2020078Irfan
C46RahulCarQ3421Baleno12/01/2020054Rohit
C46RahulCarQ6534Honda City14/01/2020065Shikhar
C46RahulCarQ3789Swift15/01/2020086Irfan

Is it in first normal form now? Definately not. It does not have a primary key.

If you look closely, none of the field is unique and every entry is prone to repeat, even Car Number Plate; which will repeat as the car is booked by a different person.

In a case where there is no single primary key, we have a select a composite key, which as we know is combination of two column which together can act unique among the row.

We can select Car Number Plate + Date of Transation as a composite key. Here we assume that one car can be booked by only one person in a day. By taking these two fields, we can say our table is in first normal form now.

2. Second Normal Form

To have a table in 2nd normal form:

  1. The table to be in 1st Normal Form.
  2. There should be no Partial Functional Dependencies.

What is Partial Functional Dependency?

Let us understand this by an example.

We have our composite key. By keeping that aside and looking at rest of the table, we can say that Car Name is dependent on one of the field from our composite key; Car Plate Number.

When field is only dependent on one of the composite fields, it is called that a particular field is having partial functional dependency. PFD only happens in the case of composite key.

Owner ID is also PFD on Car Number Plate. Same with Owner name.

We’ll have to break our dataset into two tables.

Car Number PlateCustomer IDCustomer NameDate of Transaction
CarQ1234C12Sachin12/01/2020
CarQ5436C12Sachin18/01/2020
CarQ3421C46Rahul12/01/2020
CarQ6534C46Rahul14/01/2020
CarQ3789C46Rahul15/01/2020

The Car Number Plate in this table acts as a foreign key in the other table.

And this is our another table:

Car Number PlateCar NameOwner IDOwner Name
CarQ1234Swift076Dev
CarQ5436Thar078Irfan
CarQ3421Baleno054Rohit
CarQ6534Honda City065Shikhar
CarQ3789Swift086Irfan

3. Third Normal Form

For a table to be in 3rd Normal Form:

  • It should be in 2nd Normal Form
  • There should be no transitive dependencies.

What is a Transitive Dependency?

Transitive Dependency can also be said as Indirect dependencies. Let us take an example again.

Look at table 2 from previous section. Owner Name depends on Owner ID and Owner ID dependens on Car Number Plate. You can’t find Owner Name directly from the Car Number Plate (there can be repeating name). The other two fields, Car Name and Owner ID directly depends on the Car Number Plate.

We now have to further split two tables into other 2. First table from previous section is now split into two.

Car Number PlateCustomer IDDate of Transaction
CarQ1234C1212/01/2020
CarQ5436C1218/01/2020
CarQ3421C4612/01/2020
CarQ6534C4614/01/2020
CarQ3789C4615/01/2020

And this…

Customer IDCustomer Name
C12Sachin
C46Rahul

Now the second table is also split into two:

Car Number PlateCar NameOwner ID
CarQ1234Swift076
CarQ5436Thar078
CarQ3421Baleno054
CarQ6534Honda City065
CarQ3789Swift086

An this…

Owner IDOwner Name
076Dev
078Irfan
054Rohit
065Shikhar
086Irfan

Why we normalize tables?

Anamolies. There are different type of anamolies.

What are different type of anamolies?

  1. Updation Anamolies

  2. Deletion Anamolies

  3. Insertion Anamolies

We’ll get to know more about this with examples.

Updation Anamolies

So for setting the stage, let’s assume that our table is not normalized. Look at the table from 1st normal form.

Now assume that Dev has sold his car to Roushan. If you go ahead and replace the Owner Name in first row, you’ll still have N number of rows remaining to update.

We have to update fields at mulpitle place.

Deletion Anamolies

Although it’s not practicle in the table above. Assume that you have to remove a customer. If you remove a customer Sachin, you’ll be deleting all the data with it including Car Number Plate, Car Name, Owner ID/Name.

This is known as Deletion Anamolies.

Insertion Anamolies

In Insertion Anamolies, you can’t insert a new row without you have all the other data associated with that row. For example, you want to add a new car to the database. You also need a customer, date of transaction and so on.

This is known as Insertion Anamolies.

Why not to normalize database

If your system uses non-relational database, you might be against normalization. Although relation database can scale horizontally too, it’s not by design efficient at that.

If you have multiple table on multiple machines, you will have a network overload with each join. It does not scales well.

If you are using databases such as document based database, you’d want to have all related data on same document. So that when you shard, you don’t have to deal with network overload making the application slower.

Conclusion

This was my first post realted to databases in my blogging history. I have learn from it more than teaching from it. I hope this post was worth of your time too.

Cover image is (c) Integrify.com

Share on

Santosh Kumar
WRITTEN BY
Santosh Kumar
Santosh is a Software Developer currently working with NuNet as a Full Stack Developer.