It’s your first day as an analytics engineer on a new team. You’ve been hired to refactor legacy code and introduce dbt to the team’s data stack.
During your first week, you find that all of the company's business logic is stored in Tableau, making it difficult to reuse that logic and keep track of metric calculations.
For example, sessions are calculated in multiple dashboards, each with a slightly different definition. You have no idea which (if any) is the correct one!
Because of the conflicting definitions, you decide that the existing logic is unreliable and that it is best to build each data model from scratch, clarifying each piece of business logic with your stakeholders.
You start by creating a sessions model in dbt so that it can be reused in all of the different dashboards that reference sessions. Having it in one place, within dbt, creates a single source of truth.
This process is called shifting the logic left.
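As a concrete sketch of what that sessions model could look like (the `raw_events` source, its columns, and the 30-minute inactivity window are all illustrative assumptions, the datediff syntax is Snowflake-flavored, and `source()` assumes the raw table has been declared as a dbt source):

```sql
-- models/marts/sessions.sql
-- A minimal sessions model: one row per session, built by sessionizing
-- raw events with a 30-minute inactivity window. The `raw_events` source,
-- its columns, and the 30-minute cutoff are illustrative assumptions.

with events as (

    select
        user_id,
        event_timestamp,
        lag(event_timestamp) over (
            partition by user_id
            order by event_timestamp
        ) as previous_event_timestamp
    from {{ source('web', 'raw_events') }}

),

flagged as (

    select
        *,
        -- a gap of more than 30 minutes (or the user's first event)
        -- marks the start of a new session
        case
            when previous_event_timestamp is null
                or datediff('minute', previous_event_timestamp, event_timestamp) > 30
            then 1
            else 0
        end as is_new_session
    from events

),

numbered as (

    select
        user_id,
        event_timestamp,
        -- a running count of session starts assigns each event a session number
        sum(is_new_session) over (
            partition by user_id
            order by event_timestamp
            rows between unbounded preceding and current row
        ) as session_number
    from flagged

)

select
    user_id,
    session_number,
    min(event_timestamp) as session_start_at,
    max(event_timestamp) as session_end_at
from numbered
group by 1, 2
```

Every dashboard that needs sessions can now select from this one model instead of re-deriving its own definition.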
The Problem
Core business logic is written within BI tools to die: once it is defined in this layer, it will likely never make it out. It's hard to access and even more difficult to update.
When business logic is written within BI tools, it:
- prevents a single source of truth
- creates “lagging” logic (some metric calculations contain old logic while others contain the new)
- lacks the documentation needed to understand what metrics and fields mean
- lacks standardization in naming conventions, coding styles, etc.
Storing logic within BI isn’t scalable or sustainable. Once you reach a certain size, you have no choice but to adopt a better way of doing things. You need to be more strategic in how you choose to organize and write your data models.
The Solution
“Shifting left” refers to the idea that you should push your business logic and calculations as far upstream as possible.
Rather than living in your BI tool, your core logic would live in a tool like dbt. This way, it is no longer trapped at the very end of the data pipeline but sits in the middle, ready to be used by all kinds of tools and platforms.
When done right, shifting your logic left allows you to solve all the problems I mentioned earlier.
Creates a source of truth
You should always have one location that you look to for the truth in the numbers. This can be directly within an application, the raw data, or a data model. It is the spot where you have the most confidence in the data.
Defining your logic in BI tools prevents you from having a central location for that logic. Instead of being stored in one place, it is scattered across many. This makes it difficult to know which definition to depend on, especially when the definitions vary.
Shifting this logic left into a transformation tool allows you to define core logic once and only once. It becomes the go-to place to look when you experience discrepancies or data quality issues.
However, you do need to do things the right way so that you don’t create the same problem in your transformation tool that you created in the BI tool. This is the idea of modularity that I talk about a lot in my course, Transform Your Data Stack with dbt.
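One common way to keep the transformation layer itself modular is to split models into layers: a staging model cleans each raw source exactly once, and everything downstream builds on it. A minimal sketch, where `stg_events` and the column names are assumptions:

```sql
-- models/staging/stg_events.sql
-- Cleans and renames the raw source in exactly one place; every
-- downstream model builds on this instead of the raw table.

select
    event_id,
    user_id,
    cast(event_time as timestamp) as event_timestamp,
    lower(event_type) as event_type
from {{ source('web', 'raw_events') }}
where event_id is not null
```

Under this convention, the sessions model would select from {{ ref('stg_events') }} rather than from the raw source, so the cleaning rules are never duplicated either.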
Prevents lagging logic
When logic lives in BI tools, there is no easy way to find every copy of the same metric calculation across different dashboards and reports. You have to remember where to look for each metric.
This makes updating calculations a pain. When you forget a copy exists, it doesn't get updated and therefore produces inaccurate data.
If the discrepancy is caught at all, it's typically by a stakeholder who suspects something is wrong with the numbers.
Shifting the logic left, out of the BI tool and into a data transformation tool, centralizes the logic. Now, instead of defining the same thing in multiple different places, you can define it once and then reference it in multiple reports.
This way, when a calculation needs to be updated, it’s updated directly within the transformation layer, in one spot. Now, that update can trickle down into the multiple places downstream where it is referenced.
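In dbt, that referencing happens through the ref() macro. A sketch of one downstream model, where the model name and aggregations are assumptions:

```sql
-- models/marts/daily_sessions.sql
-- One of any number of downstream models that reuse the same
-- sessions definition via ref(). If the session logic changes,
-- only sessions.sql changes; this model picks it up automatically.

select
    date_trunc('day', session_start_at) as session_date,
    count(*) as total_sessions,
    count(distinct user_id) as unique_users
from {{ ref('sessions') }}
group by 1
```

A marketing dashboard model, a weekly KPI model, and so on can all ref('sessions') the same way, and dbt uses those references to build its dependency graph.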
Makes the code more reusable
When there is a central, upstream place where your business logic lives, your data models become reusable. Instead of copying and pasting the same code over and over again, you can reference an upstream data source that you’ve crafted.
Not only does this make things easier for data analysts and analytics engineers alike, but it also improves the performance of your models and reduces clutter in your stack.
The goal of any transformation tool, but particularly dbt, is to make your data models more modular so that they can be used for multiple different purposes. The days of one-off coding are done!
Increases code standards
BI tools are great for visualizing data, but they lack many of the features needed to produce high-quality data. This is exactly why you shouldn't be producing your data within them.
Transformation tools like dbt prioritize testing, documentation, and high-quality code. They enable you to make your data discoverable, reusable, and well-tested. All of these features will help improve your data models immensely and give analysts what they need to produce their best work within BI tools.
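In dbt, those tests and that documentation live in a YAML file next to the model. A minimal sketch for the sessions model, where the descriptions and the choice of tests are assumptions:

```yaml
# models/marts/sessions.yml
version: 2

models:
  - name: sessions
    description: >
      One row per user session. A new session starts after 30 minutes
      of inactivity (assumed definition; confirm with stakeholders).
    columns:
      - name: user_id
        description: The user who generated the session.
        tests:
          - not_null
      - name: session_start_at
        description: Timestamp of the first event in the session.
        tests:
          - not_null
```

Running dbt test executes these checks, and dbt docs generate turns the descriptions into a browsable documentation site.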
Companies helping solve this
While data transformation living in BI tools is still a huge problem, many companies are making it their goal to solve it.