Will AI Replace Data Modeling?
3 reasons why this is the hardest data skill to hand over to AI
I’ve been experimenting a lot lately with Cursor for all kinds of analytics tasks. I’ve been using it to debug, make small changes, generate documentation, write complex SQL, and even plan a potential data model.
Some of these tasks it excels at; at others, it falls short. Planning a data model is one of the tasks where it falls short. I find that I'm giving it all of my notes, only for it to regurgitate my thought process entirely. It doesn't seem to add any value beyond the information I've given it.
Trying to use AI to plan a data model has made me think a lot about why it's so hard for AI to model data the right way.
Data modeling began in the 1960s with the first database management systems. The relational model followed in the 1970s, and the first object-oriented databases appeared in the mid-80s. Ralph Kimball popularized dimensional data modeling, with its dimension and fact tables, in the mid-1990s.
A data model organizes data elements and standardizes how the data elements relate to one another.
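To make that concrete, here's a minimal sketch of a Kimball-style dimensional model in SQL. The table and column names are hypothetical, not taken from any real system: one fact table of sales events keyed to two dimension tables.

```sql
-- Minimal star-schema sketch (hypothetical names):
-- dimension tables describe the "who" and "when"; the fact table records events.
CREATE TABLE dim_customer (
    customer_key  INT PRIMARY KEY,
    customer_name VARCHAR(100),
    region        VARCHAR(50)
);

CREATE TABLE dim_date (
    date_key   INT PRIMARY KEY,      -- e.g. 20240115
    full_date  DATE,
    month_name VARCHAR(20)
);

CREATE TABLE fact_sales (
    sale_id      INT PRIMARY KEY,
    customer_key INT REFERENCES dim_customer (customer_key),
    date_key     INT REFERENCES dim_date (date_key),
    sale_amount  DECIMAL(12, 2)
);
```

The standardization is the point: every report that joins fact_sales to dim_customer answers "which region?" in exactly the same way.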
In other words, data modeling has been around for quite some time, and it's not going anywhere, even with the rise of AI. With more and more complex data being generated every day, it becomes even more important to have the right people making sense of it all.
Data has a human component to it that machines will never be able to understand.
There’s a reason why data quality is constantly stressed. It’s something data teams continue to struggle with. Why? Because no matter how many times you try to safeguard against data quality issues, data is constantly changing.
Stakeholders are adding new processes and changing old ones, and engineers are changing source code. A thousand different things happen each day that can impact the quality of your data.
Data has a human component to it that no machine will ever be able to understand. Humans have industry and business knowledge from the natural flow of conversations that machines can’t get by looking at patterns in data.
This is the same reason why a long-time team member leaving a company is so painful. When they go, you lose years of domain knowledge that only they carried.
Humans will always be needed to connect the dots and ask the right questions, something AI will never be able to do.
As part of the budget model I’m building, I needed to understand the process of the person who's been manually creating budgets every month. I had to observe her process, ask questions, and understand why this process worked.
Every time I had a question on why something was done a certain way, I asked her. A lot of the time, what I was asking about was something that hadn’t been formally documented. There was no process yet. I was the one who had to help define the process that was then being modeled.
AI would most likely never know that the source data was generated by a broken process. Instead, it would try to deduce patterns from something that had no patterns. Sometimes data is generated based on user error or a lack of process. This understanding is a key aspect of the information-gathering stage of designing a data model.
Prompting edge cases to AI would take longer than coding the solution to handle the edge cases.
How many courses have you seen on how to write AI prompts? AI is virtually useless if you can’t tell it exactly what to do for you. That’s why there is a ton of training required if you want to properly use it! Especially for something as complicated as analytics engineering.
If you’ve ever asked AI to write you a line of complex code, you know how frustrating it can be to explain exactly what you need and then still have to translate that into the correct technical solution.
I don’t know about you, but I’d rather put all that effort into coding a solution that is guaranteed to work rather than spend time crafting the perfect ChatGPT prompt. For AI to understand even a smidge of what you’re solving, you have to explain how the entire business works and feed it the industry knowledge you’ve accumulated.
When you write the code to handle the edge cases, you are in control of the solution.
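As a rough sketch (the table and column names are invented for illustration), handling a known quirk of a manual process directly in the transformation keeps the business rule explicit and in your hands:

```sql
-- Hypothetical cleanup of a manually maintained budget table:
-- blank department codes mean the row belongs to a default cost center,
-- and negative amounts were sometimes keyed in by mistake.
SELECT
    budget_month,
    COALESCE(NULLIF(TRIM(department_code), ''), 'UNASSIGNED') AS department_code,
    CASE
        WHEN budget_amount < 0 THEN 0   -- treat keyed-in negatives as no budget
        ELSE budget_amount
    END AS budget_amount
FROM raw_budget_entries;
```

Every one of those rules comes from a conversation with the person doing the work, not from a pattern in the data.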
Don’t get me wrong, this gets more complicated when you hand over access to your data warehouse to an AI tool, but that comes with a whole other slew of problems.
Even if AI were to write an effective model, you lose all of the learnings that come with building something.
Data models aren’t only valuable for the metrics they produce. The process of data modeling requires learning, cleaning, understanding, and translating all of the business’s data. It gives you invaluable insights into the business and how it operates.
Data modeling helps you discover engineering bugs, flaws in processes, and even bugs in other data models. During the process, you not only build something new, but you make everything else in the ecosystem better.
A data model that you didn't build becomes a black box. Probably a black box similar to the one that caused you to rewrite the data model in the first place… You can't answer why code was written a certain way or what a certain metric means, because you lost all of that context when you shipped the work off to AI.
A human can at least document every field and transformation in the code and why they did it. Transparency can be built into every layer of the model. AI can’t explain the human reasons for the things it does. It’s mechanical.
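For example (a hypothetical snippet, with invented rules and column names), the "why" can live right next to the transformation:

```sql
-- Hypothetical documented transformation: each rule records the business
-- reason behind it, not just the mechanics.
SELECT
    order_id,
    -- Orders under $1 are test transactions created by the payments team,
    -- so they are excluded from recognized revenue.
    CASE
        WHEN order_total < 1.00 THEN 0
        ELSE order_total
    END AS recognized_revenue,
    -- The legacy system leaves region blank for online orders; map those to 'WEB'.
    COALESCE(NULLIF(region, ''), 'WEB') AS sales_region
FROM raw_orders;
```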
The future of AI
Don’t get me wrong, AI has a place in the data world. If you aren’t using it to improve your workflow, the best time to start is today.
Do I think it should be given free rein to model complex data? No. However, it can be a great tool to help you plan a data model, think through edge cases you haven’t considered, and help simplify complex code.
AI has a place in the data world when it's done right. It needs to be used to solve the types of problems that it's best at solving. It needs to help us humans do more of what we do best: understand the business and translate that into code.
In two weeks, I’ll be showing paid subscribers how I use Cursor to help model data, make small code changes, and find bugs. I’ll share my experiences on where it thrives and where it fails.
Have a great week!
Madison


The way I see it, people are getting worse at data modeling. The more powerful compute becomes, the easier it is to ship without modeling. AI will make this skill even rarer and better paid.
Agree with everything here! This helped me think through my own workflows. I think the overlap between development and constraints helps me identify effective AI use cases. The more constraints you can feed the AI, the more precise and accurate its output. So one of the best times to employ AI is likely the point where you've identified all the constraints and now need a "first draft" of sorts.