Data Governance Killed by the Modern Data Stack

2 signs you are lacking strong practices and the problems that come with it

May 02, 2024


Video killed the radio star.

The modern data stack killed data governance.

If you’re my age, you probably think of Just Dance every time you hear this song. I spent too many long nights dancing to it with my friends in our basements.

Except this time, we are talking about the modern data stack coming in and killing data governance. What once seemed inaccessible to the average stakeholder or data team member is now very accessible. Almost too much so.

With BI tools like Mode, if you don’t limit access from the very start, anyone who uses the tool can access all of the raw data in your data warehouse. With reverse ETL tools like Hightouch and Census, if you don’t create specific roles and users with fine-grained permissions, these tools have read-and-write access to your entire warehouse.
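To make "fine-grained permissions" a little more concrete, here is a minimal sketch of what scoping a reverse ETL tool's role could look like. It only builds the GRANT statements as strings and does not connect to anything. The SQL is Snowflake-style, so the exact syntax will differ in other warehouses, and the role, database, schema, and table names are all hypothetical.

```python
def least_privilege_grants(role, database, schema, writable_tables=()):
    """Build GRANT statements that scope a role to a single schema.

    Snowflake-style syntax is assumed here; adjust for your warehouse.
    All names passed in are placeholders, not a recommended layout.
    """
    qualified_schema = f"{database}.{schema}"
    statements = [
        # The role can see the database and the one schema it needs...
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {qualified_schema} TO ROLE {role};",
        # ...and read the tables in that schema, nothing else.
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {qualified_schema} TO ROLE {role};",
    ]
    # Write access is opt-in, one table at a time.
    for table in writable_tables:
        statements.append(
            f"GRANT INSERT, UPDATE ON TABLE {qualified_schema}.{table} TO ROLE {role};"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical role for a reverse ETL tool that reads from marts
    # and writes back to a single sync-status table.
    for statement in least_privilege_grants(
        role="reverse_etl_role",
        database="analytics",
        schema="marts",
        writable_tables=["sync_status"],
    ):
        print(statement)
```

The point of the design is that read access stops at one schema and write access has to be listed table by table, so the tool never ends up with read-and-write access to the whole warehouse by default.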

The days of radio are long gone, but let’s hope data governance doesn’t face the same fate.

Signs you are lacking in data governance

Unfortunately, data governance often isn’t a problem until it’s a HUGE problem. It saves the data team time to give everyone access, assuming nothing will go wrong. However, when something goes wrong, it often goes VERY wrong.

If stakeholders have access to raw data, whether in a data warehouse or analytics tool like Mode, it’s typically a sign you need to implement tighter access control.

If data models other than ones created by the data team live in dbt, you probably need to tighten your standards of what becomes a data model and who writes them.

If your data team lacks a version control tool like GitHub, or even protected branches within a version control tool, it is too easy to push mistakes to production.

Even smaller details, like lacking a dbt style guide in your GitHub repo or guidance on organizing your data models, can show a lack of governance practices. Data transformation is included in governance, too!
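A style guide is easier to follow when part of it can be checked automatically. Here is a rough sketch of a naming check for dbt model files. The folder names and prefixes (staging with stg_, intermediate with int_, marts with fct_ or dim_) are an assumed convention for illustration, so swap in whatever your own style guide says.

```python
from pathlib import PurePosixPath

# Assumed convention: each layer folder maps to the prefixes allowed in it.
ALLOWED_PREFIXES = {
    "staging": ("stg_",),
    "intermediate": ("int_",),
    "marts": ("fct_", "dim_"),
}


def naming_violations(model_paths):
    """Return (path, reason) pairs for .sql models that break the convention."""
    violations = []
    for raw_path in model_paths:
        path = PurePosixPath(raw_path)
        if path.suffix != ".sql":
            continue  # skip YAML, docs, and anything else that is not a model
        # Find the first folder in the path that is a known layer.
        layer = next((part for part in path.parts if part in ALLOWED_PREFIXES), None)
        if layer is None:
            violations.append((raw_path, "model is not inside a known layer folder"))
        elif not path.stem.startswith(ALLOWED_PREFIXES[layer]):
            expected = " or ".join(ALLOWED_PREFIXES[layer])
            violations.append((raw_path, f"expected prefix {expected} in {layer}"))
    return violations


if __name__ == "__main__":
    # Hypothetical file list; in practice this could come from your repo.
    example_paths = [
        "models/staging/stg_orders.sql",
        "models/marts/orders.sql",
        "models/adhoc/quick_test.sql",
        "models/staging/schema.yml",
    ]
    for path, reason in naming_violations(example_paths):
        print(f"{path}: {reason}")
```

A check like this could run in CI on every pull request, which pairs nicely with protected branches: models that don't follow the guide get flagged before they reach production.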

A few problems I’ve seen…

There are two main problems I see with the lack of data governance, besides the obvious catastrophes that can occur. The first is the result of the lack of data governance and the second is one of the reasons we neglect it.

Everyone has access and thinks they are a data expert.

Having access to all of a company’s data enables stakeholders to bypass the data team and dig for insights themselves. While sometimes this is necessary due to the bandwidth of the data team, it can cause a lot of issues.

I’ve seen stakeholders writing queries in tools like Mode and then wondering why their results don’t match the data team’s. This causes trust issues between business and data teams, and that trust is very difficult to rebuild once lost.

Access to the data is one thing, but thinking you know exactly how to work with the data is another. There is a data team for a reason. Unless self-service on a large scale is celebrated, your data may not be ready for everyone to have their hands on it.

Speed is prioritized over building a foundation.

When you are a small, scrappy company trying to move as fast as possible to deliver results with a limited team, speed becomes king. Quality begins to slip as the team tries to keep up with all of the demands. Unfortunately, you can’t have your cake and eat it too.

To focus on quality, you need to reduce speed.

When speed is prioritized, documentation, testing, and other best practices are neglected. Before you know it, your data environment gets out of hand and you have to spend even more time cleaning it up than you would have spent to get it right in the first place.

You need to see the end game and learn how to balance speed with high-quality work that won’t cost you in the future.

Speaking of building a solid foundation and producing high-quality work, I launched my first course, Transform Your Data Stack with dbt! This course is for all types of data professionals, beginner to intermediate, who wish to introduce (or refactor) dbt to help scale their team's data.

You will learn how to:

Build a dbt project
Document your data following best practices
Define data quality tests using dbt packages
Write reusable data models and macros

Join me in May for 4 live sessions over 2 weeks where you will learn with me and interact with the greater data community!

Now back to the newsletter :)

A good first step…
