Database Normalization: Scratching the Database Surface

Introduction

Database normalization is the process of structuring a database, in accordance with a series of normal forms in order to reduce data redundancy and improve integrity.

In other words, database normalization is basically breaking a table into two to prevent repetitive columns aka data redundancy. It entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints. It is accomplished by applying some formal rules either by a process of synthesis (creating a new database design) or decomposition (improving an existing database design).

That still does not makes sense, right? For this to make sense I would like to define some prerequisites.

Definitions

Primary Key: Primary key is the key or column selected by database administrator to uniquely identify tuples (aka records) in a table. There can be only 1 primary key in a table.
Foreign Key: Foreign key are primary key of a different table and is used to create relationship between two tables.
Composite Key: If there is no single primary key, we select more than one columns whose combintion would act as a primary key. This is called composite key.

What are normal forms by the way?

We already defined what normalization is. Now there are a few ‘passes’ in which data can be normalized. One pass is known as a normal form.

So what are different normal forms?

1. First Normal Form

This is step 1 of the normalization process. At least follow 1st normal form or do not use databases. Scalable Table design which can be extended.

There should be a primary key in the table.
Each column should contain atomic values. Meaning only one value should be there, not a set or list of values.

A column should contain values that are of the same type. Do not inter-mix different types of values in any column. Most RDBMS will throw you an error if you pass another datatype than schema. Each column should have a unique name. Same names lead to confusion at the time of data retrieval.

Order in which data is saved doesn’t matter. Using SQL query, you can easily fetch data in any order from a table.

Break the list of two items into 2 rows. This will create redundancy, but let’s wait for the Second Normal Form. But first, let’s look at an example. Below is a table without 1st normal form applied. This dataset is of a car rental company where car owners can exibit their car for rent. And other party can book their car for a certain period of time.

Customer ID	Customer Name	Car Number Plate	Car Name	Date of Transaction	Owner ID	Owner Name
C12	Sachin	CarQ1234 CarQ5436	Swift Thar	12/01/2020 18/01/2020	076 078	Dev Irfan
C46	Rahul	CarQ3421 CarQ6534 CarQ3789	Baleno Honda City Swift	12/01/2020 14/01/2020 15/01/2020	054 065 086	Rohit Shikhar Irfan

Above data is not in 1st normal form because each customer has multiple car number plates in a single row.

Customer ID	Customer Name	Car Number Plate	Car Name	Date of Transaction	Owner ID	Owner Name
C12	Sachin	CarQ1234	Swift	12/01/2020	076	Dev
C12	Sachin	CarQ5436	Thar	18/01/2020	078	Irfan
C46	Rahul	CarQ3421	Baleno	12/01/2020	054	Rohit
C46	Rahul	CarQ6534	Honda City	14/01/2020	065	Shikhar
C46	Rahul	CarQ3789	Swift	15/01/2020	086	Irfan

Is it in first normal form now? Definately not. It does not have a primary key.

If you look closely, none of the field is unique and every entry is prone to repeat, even Car Number Plate; which will repeat as the car is booked by a different person.

In a case where there is no single primary key, we have a select a composite key, which as we know is combination of two column which together can act unique among the row.

We can select Car Number Plate + Date of Transation as a composite key. Here we assume that one car can be booked by only one person in a day. By taking these two fields, we can say our table is in first normal form now.

2. Second Normal Form

To have a table in 2nd normal form:

The table to be in 1st Normal Form.
There should be no Partial Functional Dependencies.

What is Partial Functional Dependency?

Let us understand this by an example.

We have our composite key. By keeping that aside and looking at rest of the table, we can say that Car Name is dependent on one of the field from our composite key; Car Plate Number.

When field is only dependent on one of the composite fields, it is called that a particular field is having partial functional dependency. PFD only happens in the case of composite key.

Owner ID is also PFD on Car Number Plate. Same with Owner name.

We’ll have to break our dataset into two tables.

Car Number Plate	Customer ID	Customer Name	Date of Transaction
CarQ1234	C12	Sachin	12/01/2020
CarQ5436	C12	Sachin	18/01/2020
CarQ3421	C46	Rahul	12/01/2020
CarQ6534	C46	Rahul	14/01/2020
CarQ3789	C46	Rahul	15/01/2020

The Car Number Plate in this table acts as a foreign key in the other table.

And this is our another table:

Car Number Plate	Car Name	Owner ID	Owner Name
CarQ1234	Swift	076	Dev
CarQ5436	Thar	078	Irfan
CarQ3421	Baleno	054	Rohit
CarQ6534	Honda City	065	Shikhar
CarQ3789	Swift	086	Irfan

3. Third Normal Form

For a table to be in 3rd Normal Form:

It should be in 2nd Normal Form
There should be no transitive dependencies.

What is a Transitive Dependency?

Transitive Dependency can also be said as Indirect dependencies. Let us take an example again.

Look at table 2 from previous section. Owner Name depends on Owner ID and Owner ID dependens on Car Number Plate. You can’t find Owner Name directly from the Car Number Plate (there can be repeating name). The other two fields, Car Name and Owner ID directly depends on the Car Number Plate.

We now have to further split two tables into other 2. First table from previous section is now split into two.

Car Number Plate	Customer ID	Date of Transaction
CarQ1234	C12	12/01/2020
CarQ5436	C12	18/01/2020
CarQ3421	C46	12/01/2020
CarQ6534	C46	14/01/2020
CarQ3789	C46	15/01/2020

And this…

Customer ID	Customer Name
C12	Sachin
C46	Rahul

Now the second table is also split into two:

Car Number Plate	Car Name	Owner ID
CarQ1234	Swift	076
CarQ5436	Thar	078
CarQ3421	Baleno	054
CarQ6534	Honda City	065
CarQ3789	Swift	086

An this…

Owner ID	Owner Name
076	Dev
078	Irfan
054	Rohit
065	Shikhar
086	Irfan

Why we normalize tables?

Anamolies. There are different type of anamolies.

What are different type of anamolies?

Updation Anamolies
Deletion Anamolies
Insertion Anamolies

We’ll get to know more about this with examples.

Updation Anamolies

So for setting the stage, let’s assume that our table is not normalized. Look at the table from 1st normal form.

Now assume that Dev has sold his car to Roushan. If you go ahead and replace the Owner Name in first row, you’ll still have N number of rows remaining to update.

We have to update fields at mulpitle place.

Deletion Anamolies

Although it’s not practicle in the table above. Assume that you have to remove a customer. If you remove a customer Sachin, you’ll be deleting all the data with it including Car Number Plate, Car Name, Owner ID/Name.

This is known as Deletion Anamolies.

Insertion Anamolies

In Insertion Anamolies, you can’t insert a new row without you have all the other data associated with that row. For example, you want to add a new car to the database. You also need a customer, date of transaction and so on.

This is known as Insertion Anamolies.

Why not to normalize database

If your system uses non-relational database, you might be against normalization. Although relation database can scale horizontally too, it’s not by design efficient at that.

If you have multiple table on multiple machines, you will have a network overload with each join. It does not scales well.

If you are using databases such as document based database, you’d want to have all related data on same document. So that when you shard, you don’t have to deal with network overload making the application slower.

Conclusion

This was my first post realted to databases in my blogging history. I have learn from it more than teaching from it. I hope this post was worth of your time too.

Cover image is (c) Integrify.com