4 dbt Mistakes to Avoid
What everyone does wrong when they decide they want to use dbt in their data stack
dbt is a great tool (even after being bought by Fivetran) for creating a consistent, reusable, and well-documented codebase of data transformations. After all, there’s a reason no other data transformation tool has come close to their success.
They understand the analytics engineer better than any other company.
However, a lot of people, especially when first thinking about a modern data strategy, see dbt as the answer to all their problems. It’s only one step in the right direction.
The tool only works well if you use it properly. Don’t make the same mistakes that so many people make when first adding dbt to their data stack.
Here are the most common ones to avoid:
Using staging/intermediate/core models as your sole “data modeling” strategy.
When I first learned about dbt, I thought THIS was a data modeling strategy. I was just starting my career as an analytics engineer and took the dbt documentation as a bible of all things data.
A year in, I began to learn more about other data modeling strategies like dimensional data modeling. This is when I realized that staging, intermediate, and core models were simply a way to organize your dbt projects. They don’t address any of the larger issues like accessibility, performance, grain, and business use cases.
Dimensional data modeling, for example, is the modeling technique your core data models should follow. Staging and intermediate layers are really just about cleaning and shaping the data so that you can then form it into facts and dimensions.
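As a minimal sketch of what that split can look like (all table, model, and column names here are made up for illustration): the staging model only renames, casts, and lightly cleans the raw source, while the core model built on top of it declares an explicit grain and applies business logic in the dimensional style.

```sql
-- models/staging/stg_orders.sql
-- Staging layer: rename, cast, and lightly clean the raw source. No business logic yet.
select
    id as order_id,
    customer_id,
    cast(ordered_at as timestamp) as ordered_at,
    lower(status) as order_status
from {{ source('shop', 'orders') }}

-- models/marts/fct_orders.sql (a separate file in a real project)
-- Core layer: a fact table with an explicit grain of one row per order,
-- following dimensional modeling rather than just "the third folder".
select
    order_id,
    customer_id,
    ordered_at,
    order_status
from {{ ref('stg_orders') }}
where order_status != 'cancelled'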
Not planning your DAG and creating spaghetti lineage.
Most people treat adopting dbt as the data modeling strategy itself. Great! We have a transformation tool! Now everything will be perfect, clear, and easier to use!
Except that’s not the case. Adding dbt to your data stack is only helpful when you use the tool with intention. You can’t just lift and shift your old reporting queries to dbt and expect it to work some kind of miracle.
When you lift and shift queries from your BI tool, for example, you are just moving the same problems (untracked dependencies, inconsistent code changes) further upstream.
Instead, you need to identify similarities between your queries so you can isolate what should be its own upstream data model. This is just the first step in planning your DAG.
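Here’s a rough sketch of what that isolation can look like. Imagine two reports that each repeated the same orders-to-customers join in their own BI queries; in dbt, that shared logic becomes one upstream model that every downstream model refs. All model and column names below are hypothetical:

```sql
-- models/intermediate/int_orders_enriched.sql
-- The join that used to be copy-pasted into several BI queries
-- now lives in one upstream model.
select
    o.order_id,
    o.ordered_at,
    c.customer_id,
    c.customer_region
from {{ ref('stg_orders') }} as o
left join {{ ref('stg_customers') }} as c
    on o.customer_id = c.customer_id

-- models/marts/fct_daily_orders.sql (one of several downstream consumers,
-- a separate file in a real project)
-- Each report refs the shared model, so the dependency shows up in the DAG
-- instead of hiding in duplicated SQL.
select
    cast(ordered_at as date) as order_date,
    customer_region,
    count(*) as total_orders
from {{ ref('int_orders_enriched') }}
group by 1, 2
```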
Planning your DAG requires you to:
Consider the grain of data you will need for BI dashboards and reporting
Consider which pieces of business logic overlap or depend on one another
Think through when aggregations are appropriate and where they should live
Set guidelines around how intermediate models are used and what constitutes a new intermediate model
To be honest, it’s hard to find a data team that doesn’t have a messy DAG. They can get out of hand quickly, even when you are intentional about things. I’ve seen projects with more intermediate models under the hood of one core model than any person would deem necessary. It made it difficult to know which model was actually clean and ready to use. Once this happens, it’s really difficult to refactor without breaking something.
Not documenting, testing, and following specific style guidelines.
Do not underestimate a dbt style guide. I’ve shouted this from the rooftops so many times, but it’s for a reason. When you set coding and naming standards from the very beginning, you save yourself a whole headache of work in the future.
Style and naming conventions create a unified codebase that is easy for everyone to work with. Column names and tests, for example, are predictable. A co-worker won’t be left wondering where to find something or why it’s named the way it is.
This is the small stuff that has the ability to eat away at your time in the future. Don’t make the mistake of letting the small stuff compound into a huge problem.
You can read more about the types of things you want to address in your style guide here. You’ll want this to live in the README of your GitHub repo!
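As a tiny illustration, a style guide might say that every mart model gets a description and that every primary key is named <entity>_id with unique and not_null tests. In a schema file, that convention looks something like this (file and model names are hypothetical):

```yaml
# models/marts/_marts__models.yml (hypothetical file and model names)
# Example convention: every model is documented and every primary key is named
# <entity>_id with unique and not_null tests, so columns and tests stay predictable.
version: 2

models:
  - name: fct_orders
    description: One row per order, built from stg_orders.
    columns:
      - name: order_id
        description: Primary key.
        tests:
          - unique
          - not_null
```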
Only thinking about dbt in isolation.
dbt isn’t your entire data stack. It’s one tool in a multi-step data pipeline, and it needs to be treated as such.
If we compare your data stack to the cast iron pipes that carry water to and from your house, dbt is just one 4-foot piece of pipe. There are other pieces before and after it that are still needed for the water to get where it’s going.
Think about what comes before and after dbt, and let that influence how you test and document your models. This means doing things like the following (see the sketch after this list):
Documenting your sources and where the data is coming from (Airbyte, Fivetran, S3 bucket, etc.)
Adding freshness and data volume anomaly tests to all data sources (things can break outside of dbt!)
Documenting exposures in your BI or reverse ETL tools (so you know what depends on your dbt models when you make changes to them)
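Here’s a rough sketch of what the source documentation, freshness checks, and exposures can look like in a single dbt properties file. The source, column, and dashboard names are made up for illustration, and the loaded_at_field assumes a Fivetran-style load timestamp; volume anomaly tests typically come from a package (elementary, for example) rather than dbt itself, so they aren’t shown here.

```yaml
# models/_pipeline_context.yml (hypothetical file name)
version: 2

sources:
  - name: shop
    description: Raw tables loaded by Fivetran into the warehouse.
    schema: shop
    tables:
      - name: orders
        loaded_at_field: _fivetran_synced  # assumed Fivetran-style load timestamp column
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}

exposures:
  - name: revenue_dashboard
    type: dashboard
    url: https://bi.example.com/dashboards/revenue  # placeholder URL
    owner:
      name: Analytics team
      email: data@example.com
    depends_on:
      - ref('fct_orders')
```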
By doing this, you make dbt aware of what’s going on outside of the tool. This then determines whether your models should even run, if changes to your models are appropriate, or if they will break other resources outside dbt. Data never exists in isolation!
When you find ways for tools to know about one another, you ensure your data flows from start to end without any leaks. Just as you lose water with any cracks in the connecting joints of your pipes, you lose data when tools don’t properly connect.
Looking to add dbt to your data stack? Let’s chat. I help companies create AI-ready data strategies to centralize and secure their data.
Have a great week!
Madison Schott

