Your Guide to Slowly Changing Dimensions (SCDs)
What they are, why they're important, and the 4 different types
Understanding the purpose of slowly changing dimensions, and the different types that exist, is foundational knowledge for any analytics engineer.
In this article, I’ll review what they are, why they matter, and the different types that you’ll see in your work.
What are slowly changing dimensions?
A slowly changing dimension is a data record whose values change slowly over time.
A customer’s account data like email, address, and phone number is an example of a slowly changing dimension. The customer’s address won’t change every day, but it may change every two years.
The type of slowly changing dimension depends on how the changes are tracked in your database. There are four main types, with some additional types being a combination of these four.
Why do slowly changing dimensions matter?
To properly model your data, you need to understand how your database tracks changes.
Can records be updated and overwritten in the source data? You may need to create a snapshot of the source table using a tool like dbt, to ensure you can measure how the data changes over time.
Are new records inserted when a column value changes? You’ll need to apply a window function to find the most recent value of a record when the data source is used downstream.
Knowing how changes are handled in your database also allows you to have proper expectations about the work that can be done and set clear expectations with stakeholders on what’s possible. For example, you won’t be able to track historical changes to a product over time if you just now realize the product data is a Type 1 Slowly Changing Dimension!
Types of Slowly Changing Dimensions
There are four main types of slowly changing dimensions that you will encounter in your work as an analytics engineer. I’ve run into all of them while modeling data and each needs to be handled a certain way.
Type 1 Slowly Changing Dimensions
Type 1 Slowly Changing Dimensions overwrite old values in your data. The new value will replace the old value, without any record of what the old value was.
I find this method used in most databases and the one that causes the most issues for analytics engineers. Type 1 SCDs make it impossible to see how values in a record change over time. This can prove to be problematic for use cases such as tracking price changes.
In my last role, I discovered an issue with the way product packs were stored in our database. Product teams would reuse the same name, and therefore the same IDs, for product packs, but change the individual products included in them.
This made it easier to change what products were in a pack but made it impossible to know which products used to be in the pack at the time they were sold to a customer. We had no way to see which customers ordered which products at a specific point in time.
A possible solution for handling Type 1 SCDs is to create snapshot tables.
Type 2 Slowly Changing Dimensions
Type 2 Slowly Changing Dimensions insert changed values as new records in your database. They maintain the old records and the new records, meaning you will find duplicate “unique” IDs.
Unlike Type 1 SCDs, Type 2 has additional fields on the records that indicate a version number, or the timestamps in which the value is effective.
When using Type 2 SCDs, it’s important to consider both the unique ID and the timestamps when using the data in any joins. For example, if you are joining orders to customers, you will want to join on both of these fields to find the customer record valid at the time the customer placed their order.
Type 3 Slowly Changing Dimensions
Type 3 slowly changing dimensions maintain one record for the original primary key but utilize an original value field and a current value field. Unlike Type 2 SCDs, this tracks the previous and current values of a field in one record rather than separate records.
The major downfall of Type 3 SCDs is that they can only track one change in value rather than multiple changes. This works ok if a customer can only change their email once, for example, but it’s not ideal for attributes that can change multiple times.
Keep reading with a 7-day free trial
Subscribe to Learn Analytics Engineering to keep reading this post and get 7 days of free access to the full post archives.