As data people, we are constantly hearing about normalized and denormalized data. Many of us understand the high-level concept of it, but have we taken the time to understand the different forms?
Well, coming from an untraditional tech background, where I had no formal engineering education, I never did. Until now.
Understanding the different forms of normalization will help you build better data pipelines, especially when deciding on how to ingest your data using a tool like Airbyte.
When you understand the configurations you are setting up, and why you would want to do them one way versus another, you produce high-quality data that you can properly assert your expectations on. You also become better at modeling the data because you know exactly how it’s going to behave, or how it should behave.
In this article, I’ll walk through the 5 forms of normalization and the differences between them. We will then look at some examples of normalization in the wild!
Types of Anomalies
There are a few types of problems that can occur in your database when your data is not properly normalized. Let’s review what these are.
Update anomaly
This occurs when you update a record and unintentionally affect the validity of other records in your table.
Let’s say you have a table with order_id
, product_id
, and product_name
. You have two records that have a product_id
of 11 and a product_name
of “serenity crayons”. The company decides to rename the product to “calm crayons”. You only update the one record’s product_name
but not the other. Now you have two records with conflicting data, making it unclear which one has the correct value.
Deletion anomaly
This occurs when you delete a record in one place but not another. This causes inconsistencies across tables, making it hard to know which records are accurate.
If we think of the table from the previous example and instead want to delete product_id
11, it would be easy to delete one of these records but not the other. Then, we do not know if this product was truly deleted or not.
Insert anomaly
Insert anomalies prevent you from adding important data to your tables because of the lack of that data in another spot. This may occur if you need to add a product to an order record but can’t add the product in the system until someone orders it.
The 5 Forms of Normalization
They get more and more “normal” as they go on…
Keep reading with a 7-day free trial
Subscribe to Learn Analytics Engineering to keep reading this post and get 7 days of free access to the full post archives.