3 AI-Proof Analytics Engineering Skills
Because discernment is necessary and you can't have it without a solid foundation
If you’ve taken an AI fundamentals course, you’re probably familiar with the terms discernment and hallucination. Those courses stress both concepts heavily.
Why?
AI is often wrong. It could have been trained on poor data. It could have been trained on outdated data. Despite what many people want to believe, it’s not the truth (yet).
AI hallucinates, which is why discernment is so important.
Analytical use cases are no exception. Today, when I used Claude Code to “dbt-ify” my SQL notebook, it used the source function to reference models within the project. Any dbt user, even a complete beginner, knows that models are referenced with the {{ ref() }} function, while {{ source() }} is reserved for raw tables declared as sources.
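To make that mistake concrete, here is a minimal sketch in Python of the kind of check that would flag it. The model names, the 'analytics' source name, and the misused_sources helper are all hypothetical and exist only for illustration. It is a plain regex over text, not something dbt provides.

```python
import re

# Hypothetical names of models that live in the dbt project (assumed for illustration).
PROJECT_MODELS = {"stg_orders", "stg_customers"}

# Roughly what the AI produced: source() pointing at something that is really a model.
ai_generated_sql = """
select *
from {{ source('analytics', 'stg_orders') }}
left join {{ ref('stg_customers') }} using (customer_id)
"""

# Matches {{ source('some_source', 'some_table') }} and captures the table name.
SOURCE_CALL = re.compile(
    r"\{\{\s*source\(\s*['\"][^'\"]+['\"]\s*,\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}"
)


def misused_sources(sql: str, model_names: set[str]) -> list[str]:
    """Return table names passed to source() that are actually project models."""
    return [name for name in SOURCE_CALL.findall(sql) if name in model_names]


flagged = misused_sources(ai_generated_sql, PROJECT_MODELS)
print(flagged)  # the model that should have been referenced with ref()
```

The fix is to swap the flagged call for {{ ref('stg_orders') }}. A check like this only works if you already know the rule it encodes.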
Of course, this is a simple example, but this is why it’s so important to have a strong analytics engineering foundation when it comes to AI. Or you might just end up with a repo full of sloppy AI-written code…
In today’s article, we’ll talk about the core foundational skills that you must master as an analytics engineer aside from AI. These are the skills that will allow you to have discernment when using tools like Cursor and Claude Code. If you don’t understand these concepts at their core, you won’t be able to use AI effectively as an analytics engineer.
I’m officially dedicating the Learn Analytics Engineering chat to all things AI. I’ll be sharing my use cases, wins, and fails using AI for analytics engineering in real time in the chat.
This will become a place for all of us to connect and learn from one another how to apply AI to our existing workflows. Paid subscribers will get to learn with me in real time and share ideas with one another.
Data Modeling
You can give Claude Code all of the business context it needs and ask it to code a solution in SQL, but you won’t get something that can scale with the business. You also won’t understand what the heck you just built.
Every analytics engineer needs to understand the basic principles of data modeling so they can build scalable models with AI. You can start by reading The Data Warehouse Toolkit and understanding the why behind the ways data models should be built. You don’t often understand the why until you’ve encountered multiple issues with data models built the wrong way.
Taking the time to study atomic grain, facts, dimensions, and additivity allows you to spot when AI gets them wrong. It also helps you give AI clear context on how you need your data models to be built and used.
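As a rough illustration of grain and additivity, here is a small sketch using Python’s built-in sqlite3 module. The fct_order_lines table and its columns are made up for this example. It shows an additive measure, a non-additive one, and a simple check that the table really holds one row per order line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fact table. Declared grain: one row per product per order.
    create table fct_order_lines (
        order_id   integer,
        product_id integer,
        quantity   integer,  -- additive: safe to sum across any dimension
        unit_price real      -- non-additive: summing it has no business meaning
    );
    insert into fct_order_lines values
        (1, 10, 2, 5.0),
        (1, 11, 1, 20.0),
        (2, 10, 4, 5.0);
""")

# Additive: extended revenue can be summed across orders and products.
total_revenue = conn.execute(
    "select sum(quantity * unit_price) from fct_order_lines"
).fetchone()[0]

# Non-additive: this runs without error, but the number means nothing.
summed_price = conn.execute(
    "select sum(unit_price) from fct_order_lines"
).fetchone()[0]

# Grain check: count key combinations that appear more than once.
duplicate_keys = conn.execute("""
    select count(*)
    from (
        select order_id, product_id
        from fct_order_lines
        group by order_id, product_id
        having count(*) > 1
    )
""").fetchone()[0]

print(total_revenue, summed_price, duplicate_keys)  # 50.0 30.0 0
```

SQL will happily return a number for the second query. Only someone who knows unit_price is non-additive will recognize that it should never land in a report.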
If you don’t understand these things yourself, you can’t guide agents to build them for you. This is why junior developers are struggling so much. They haven’t gained hands-on experience to discern data modeling issues with AI.
Take a step back and learn the principles first before allowing AI to assist you in building data models.
Here are some resources you can start with:
SQL
AI is pretty decent at writing SQL, I’ll admit. It’s never been easier to write SQL. However, poorly written SQL, just like poorly written code, is still a thing. AI agents may add 100 lines of complex code for something that could have been written in 5 lines with a QUALIFY clause.
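Here is a small sketch of what I mean, using a made-up customer_updates table and Python’s sqlite3 module so it can run anywhere. SQLite does not support QUALIFY, so the QUALIFY version is shown as text only, written the way warehouses such as Snowflake, BigQuery, or DuckDB accept it. The runnable version uses the longer subquery form and assumes SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table with several rows per customer.
    create table customer_updates (customer_id integer, email text, updated_at text);
    insert into customer_updates values
        (1, 'old@example.com',  '2024-01-01'),
        (1, 'new@example.com',  '2024-03-01'),
        (2, 'only@example.com', '2024-02-01');
""")

# Not executed here: the compact form for warehouses that support QUALIFY.
QUALIFY_VERSION = """
select customer_id, email, updated_at
from customer_updates
qualify row_number() over (partition by customer_id order by updated_at desc) = 1
"""

# Same logic without QUALIFY: compute the row number in a subquery, then filter.
SUBQUERY_VERSION = """
select customer_id, email, updated_at
from (
    select
        *,
        row_number() over (partition by customer_id order by updated_at desc) as rn
    from customer_updates
)
where rn = 1
order by customer_id
"""

latest = conn.execute(SUBQUERY_VERSION).fetchall()
print(latest)  # one row per customer, keeping the most recent update
```

Both forms keep the latest row per customer. If an agent hands you something far longer than either, that is usually a sign to ask for a rewrite.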
Make sure you understand:
difference between joins
window function behavior
order of execution
behavior of NULL values
These are all things that can easily slip past your gut check of AI-generated SQL, but make a huge difference in the quality of your data models and reporting.
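A quick sketch of two of these, NULL behavior and order of execution, using Python’s sqlite3 module and a made-up orders table. Nothing here is specific to SQLite except the assumption of version 3.25 or newer for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: order 2 has no coupon.
    create table orders (order_id integer, coupon_code text);
    insert into orders values (1, 'SPRING'), (2, null), (3, 'FALL');
""")

# count(*) counts rows, count(column) skips NULLs.
count_all, count_coupon = conn.execute(
    "select count(*), count(coupon_code) from orders"
).fetchone()

# NULL <> 'SPRING' evaluates to NULL, not true, so order 2 silently disappears.
not_spring = conn.execute(
    "select order_id from orders where coupon_code <> 'SPRING' order by order_id"
).fetchall()

# Handling NULL explicitly keeps the order without a coupon.
not_spring_safe = conn.execute("""
    select order_id
    from orders
    where coupon_code <> 'SPRING' or coupon_code is null
    order by order_id
""").fetchall()

# Order of execution: WHERE filters rows before the window function runs,
# so the numbering restarts at 1 on the rows that survive the filter.
numbered = conn.execute("""
    select order_id, row_number() over (order by order_id) as rn
    from orders
    where order_id > 1
    order by order_id
""").fetchall()

print(count_all, count_coupon)  # 3 2
print(not_spring)               # [(3,)]
print(not_spring_safe)          # [(2,), (3,)]
print(numbered)                 # [(2, 1), (3, 2)]
```

Every one of these queries runs cleanly, which is exactly why the wrong versions are easy to miss in review.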
If you make mistakes with the above, your analytics work will be wrong. Using Cursor, I’ve found unnecessary cross joins that fanned out the data, NULL values that should have been handled upstream, and weird uses of window functions that made the code impossible to read.
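Fan-out is the easiest of these to show. The sketch below uses a one-to-many join rather than a literal cross join, with made-up orders and shipments tables, but the symptom is the same: rows multiply and totals inflate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: order 1 shipped in two parcels.
    create table orders (order_id integer, amount real);
    create table shipments (shipment_id integer, order_id integer);
    insert into orders values (1, 100.0), (2, 50.0);
    insert into shipments values (10, 1), (11, 1), (12, 2);
""")

# The correct total, straight from the table at order grain.
true_total = conn.execute("select sum(amount) from orders").fetchone()[0]

# Joining to shipments repeats order 1, so its amount is counted twice.
fanned_total = conn.execute("""
    select sum(o.amount)
    from orders o
    join shipments s on s.order_id = o.order_id
""").fetchone()[0]

# One fix: aggregate shipments to order grain before joining.
fixed_total = conn.execute("""
    select sum(o.amount)
    from orders o
    join (
        select order_id, count(*) as shipment_count
        from shipments
        group by order_id
    ) s on s.order_id = o.order_id
""").fetchone()[0]

print(true_total, fanned_total, fixed_total)  # 150.0 250.0 150.0
```

Comparing row counts or totals before and after a join is a cheap habit that catches this, whether you or an agent wrote the join.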
None of these issues would have been discovered if I didn’t already have a solid foundation of SQL skills.
Read up on these resources to master the SQL basics:
Embedding business context into data
Stakeholders are adding new processes and changing old ones, and engineers are changing source code. A thousand different things happen each day that can impact the quality of your code.
Data has a human component to it that no machine will ever be able to understand. Humans have industry and business knowledge from the natural flow of conversations that machines can’t get by looking at patterns in data.
This is the same reason why a long-time team member leaving a company is so painful. When they go, you lose years of domain knowledge that only they had.
Humans will always be needed to connect all the dots and ask the right questions, the ones AI will never be able to ask.
As part of the budget model I’m building, I needed to understand the process of the person who’s been manually creating budgets every month. I had to observe her process, ask questions, and understand why this process worked.
Every time I had a question on why something was done a certain way, I asked her. A lot of the time, what I was asking about was something that hadn’t been formally documented. There was no process yet. I was the one who had to help define the process that was then being modeled.
AI would most likely never know that the source data was generated by a broken process. Instead, it would try to deduce patterns from something that had no patterns. Sometimes data is generated based on user error or a lack of process. This understanding is a key aspect in the information-gathering stage of designing a data model.
Don’t forget to join the chat to get real-time updates on how I’m using AI in my analytics engineering workflows! Yesterday I shared how I used Claude Code to plan a model refactor. Hint: not everything went well, but a lot of it did due to how I planned and provided context.
Have a great week!
Madison








Totally makes sense. Data is so messy in real life and constantly changing like you mentioned in that post. AI will have a tough time with it. I've also noticed that, unlike other problems like SQL, there isn't a huge knowledge base around data modeling out there. So I wonder if it struggles even more because of that as well.
Everything NLP is a superpower right now while the industry is mesmerized by mediocrity.