How NOT to Use Medallion Architecture
The differences between medallion architecture and dimensional modeling, and how to use them together while avoiding the most common mistakes and anti-patterns
Medallion architecture can quickly become everything that it shouldn’t. It’s easy to think of it as a replacement for data modeling, and it’s easy to view it as a filing cabinet for your various models.
Both of these things are what you shouldn’t be doing.
If you didn’t read last week’s newsletter with Riki Miko, I recommend starting there. It covers the basics of Medallion Architecture, including why you would want to use it, how its layers differ, and what each layer is intended for.
As analytics engineers and data architects, we understand these principles at heart, but when it comes to putting them into practice, it can be easy to abandon the fundamentals. Sometimes things just need to get done, and you fall into patterns. Let’s try to understand why that is and how that can be avoided when it comes to Medallion Architecture.
Special shoutout to Riki Miko once again for helping me out with today’s article on Medallion Architecture!
Medallion Architecture vs. Dimensional Data Modeling
Let’s get this straight: medallion architecture is not a replacement for dimensional modeling! It’s a lifecycle and pipeline management framework, not a specific modeling technique. Medallion focuses on how data flows, matures, and is curated, while dimensional modeling focuses on how data is structured specifically for consumption.
Dimensional modeling, as per Kimball, is consumer-first. It’s optimized for reporting and querying using objects like facts, dimensions, and star schemas. It requires heavy upfront business modeling and works best when domains are stable and well-understood. In other words, dimensional models are a packaged product designed for end-user analytics.
Medallion architecture, on the other hand, is process-first. It’s optimized for ingesting raw, semi-structured, or streaming data, integrating it, and progressively curating it into usable forms. Dimensional models can exist as Gold-layer assets in a medallion design, but it’s important to remember that they’re just one way to package curated data for consumption. It’s extremely important to keep this at the forefront of pipeline design!
If you recall that medallion is a progression from raw → clean → curated, you can think of dimensional modeling as one possible output at the end of that chain. The Gold layer of medallion architecture and dimensional modeling aren’t competitors; they’re complementary.
You need the pipeline to reliably produce clean, integrated data (Silver) before you can structure it into star schemas or other analytics-ready forms (dimensional modeling in Gold).
Beyond structuring data, medallion architecture also directly enables analytics. By enforcing layered pipelines:
Analysts have trustworthy, reproducible data. No more pulling from random extracts or guessing business rules.
Each layer communicates semantic meaning and reliability, so teams know exactly what to use depending on their use case.
Different latency requirements are supported. Not every organization needs near-real-time assets, so understanding requirements and educating consumers is critical.
In short, medallion architecture sets the stage and provides the backbone for reliable analytics, while dimensional modeling is one way to package that curated data for business consumption. They work hand-in-hand: one ensures the data is trustworthy and integrated, the other makes it consumable and useful.
How to solve a problem using both solutions
Let’s pretend you’re modeling sales for an online retailer using customer, sales, and product data. The business wants to understand sales by different dimensions.
You start by collecting raw data from the source systems that track this data. All of this raw data gets dumped into the Bronze layer to be transformed. A few of these data sources are:
segment__events
hubspot__customers
customers
products
orders
Next you need to make sure this raw data in the Bronze layer is actually usable. In the Silver layer, you will do basic transformations like casting, deduplication, and validation. This could look like:
validating that all orders are valid orders using specific business logic like comparing ship dates and return dates
unifying customer profiles by mapping emails, addresses, and other identifying data
creating Type 2 Slowly Changing Dimensions to track how products change over time in the raw products data
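To make the Silver step concrete, here is a minimal Python sketch of casting, deduplicating, and validating orders. The column names, the sample rows, and the rule that a return can’t come before a shipment are my own assumptions for illustration, not a prescribed implementation.

```python
from datetime import date
from decimal import Decimal

# Hypothetical Bronze rows: everything arrives as strings, duplicates included.
bronze_orders = [
    {"order_id": "1001", "customer_email": " Ann@Example.com ", "amount": "59.90",
     "ship_date": "2024-03-02", "return_date": ""},
    {"order_id": "1001", "customer_email": " Ann@Example.com ", "amount": "59.90",
     "ship_date": "2024-03-02", "return_date": ""},  # exact duplicate
    {"order_id": "1002", "customer_email": "bob@example.com", "amount": "20.00",
     "ship_date": "2024-03-05", "return_date": "2024-03-01"},  # returned before shipping
]


def parse_date(value):
    """Cast an ISO date string to a date; empty strings become None."""
    return date.fromisoformat(value) if value else None


def to_silver(rows):
    """Cast, deduplicate on order_id, and validate orders."""
    seen, valid, rejected = set(), [], []
    for row in rows:
        if row["order_id"] in seen:
            continue  # deduplication: keep the first row per order_id
        seen.add(row["order_id"])
        order = {
            "order_id": int(row["order_id"]),
            "customer_email": row["customer_email"].strip().lower(),
            "amount": Decimal(row["amount"]),
            "ship_date": parse_date(row["ship_date"]),
            "return_date": parse_date(row["return_date"]),
        }
        # Assumed business rule: a return cannot predate the shipment.
        dates_known = order["ship_date"] and order["return_date"]
        if dates_known and order["return_date"] < order["ship_date"]:
            rejected.append(order)
        else:
            valid.append(order)
    return valid, rejected


silver_orders, rejected_orders = to_silver(bronze_orders)
```

Keeping the rejected rows instead of silently dropping them is a design choice: it gives you something to inspect when a validation rule turns out to be wrong.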
The Gold layer is where you can apply your dimensional data modeling techniques. Again, this layer doesn’t have to follow dimensional data modeling, but it is where it would occur if you choose to use both frameworks.
Using the data that we cleaned in the Silver layer, we can form a fact model and supporting dimension models. fact_sales would exist as a model with one row per order line item. It would use sales data available in the Silver layer and never raw data directly from the Bronze layer.
Dimension tables such as dim_customers and dim_products would also be created using data in the Silver layer. Each of these tables would have a primary key that exists as a foreign key in fact_sales, allowing the tables to be easily joined.
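Here is a rough sketch of that Gold step in Python. The Silver rows, the surrogate key names (customer_key, product_key), and the build_dimension helper are hypothetical; the point is only that the fact and dimension tables read from Silver and join on keys.

```python
# Hypothetical Silver inputs, already cleaned and conformed.
silver_customers = [
    {"customer_id": "C1", "email": "ann@example.com"},
    {"customer_id": "C2", "email": "bob@example.com"},
]
silver_products = [
    {"product_id": "P1", "name": "Mug", "category": "Kitchen"},
    {"product_id": "P2", "name": "Lamp", "category": "Home"},
]
silver_order_lines = [
    {"order_id": 1001, "line_no": 1, "customer_id": "C1", "product_id": "P1",
     "quantity": 2, "unit_price": 10.0},
    {"order_id": 1001, "line_no": 2, "customer_id": "C1", "product_id": "P2",
     "quantity": 1, "unit_price": 25.0},
    {"order_id": 1002, "line_no": 1, "customer_id": "C2", "product_id": "P1",
     "quantity": 1, "unit_price": 10.0},
]


def build_dimension(rows, natural_key, key_name):
    """Add a surrogate primary key to each row.

    Returns the dimension rows and a natural-key -> surrogate-key lookup.
    """
    dimension, lookup = [], {}
    for surrogate, row in enumerate(rows, start=1):
        dimension.append({key_name: surrogate, **row})
        lookup[row[natural_key]] = surrogate
    return dimension, lookup


dim_customers, customer_keys = build_dimension(
    silver_customers, "customer_id", "customer_key")
dim_products, product_keys = build_dimension(
    silver_products, "product_id", "product_key")

# fact_sales: one row per order line item, with foreign keys to the dimensions.
fact_sales = [
    {
        "order_id": line["order_id"],
        "line_no": line["line_no"],
        "customer_key": customer_keys[line["customer_id"]],
        "product_key": product_keys[line["product_id"]],
        "quantity": line["quantity"],
        "sales_amount": line["quantity"] * line["unit_price"],
    }
    for line in silver_order_lines
]

# Example consumption: sales by product category, joined on product_key.
category_by_key = {p["product_key"]: p["category"] for p in dim_products}
sales_by_category = {}
for row in fact_sales:
    category = category_by_key[row["product_key"]]
    sales_by_category[category] = sales_by_category.get(category, 0) + row["sales_amount"]
```

Swapping the category lookup for a customer attribute answers a different "sales by" question without touching fact_sales, which is the reuse a star schema is meant to give you.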
You can read more about dimensional data modeling in the post Master the 4-Step Dimensional Modeling Process.
Common Mistakes and Anti-Patterns
Even experienced teams stumble when implementing medallion architecture, so make sure you avoid falling into these traps!
Landing data in Bronze and then skipping Silver and moving it straight to Gold for one-off use cases.
This creates messy, redundant Gold datasets and inconsistent reporting, defeating the purpose of having a structured pipeline in the first place. Be sure to always use the Silver layer for your data cleaning rather than pushing this too far downstream into Gold. When the cleaning is in Gold, it prevents other models from using the clean datasets as well.
Building Silver from Gold.
Silver should never be built from Gold. The natural progression is raw, clean, curated. Breaking that chain can create circular dependencies, confuse lineage, and undermine trust in the system.
This is another issue I’ve been seeing a lot lately in my work. Once you start referencing Gold in your Silver layer, it becomes really difficult to untangle and clean up. I’ve tried to untangle it, but it feels impossible when you’re in the trenches and have hundreds of models built this way. Use this layer with intention from the start.
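One way to catch this early is an automated lineage check. The sketch below assumes you can export your models as a mapping of name to layer and upstream dependencies; the models dictionary and the find_layer_violations function are hypothetical. It flags both Silver reading from Gold and Gold reading straight from Bronze.

```python
LAYER_RANK = {"bronze": 0, "silver": 1, "gold": 2}

# Hypothetical model graph: model name -> (layer, upstream model names).
models = {
    "bronze_orders": ("bronze", []),
    "bronze_customers": ("bronze", []),
    "silver_orders": ("silver", ["bronze_orders"]),
    # Anti-pattern: a Silver model that reads from Gold.
    "silver_customers": ("silver", ["bronze_customers", "fact_sales"]),
    "fact_sales": ("gold", ["silver_orders", "silver_customers"]),
    # Anti-pattern: a one-off Gold model that skips Silver.
    "one_off_report": ("gold", ["bronze_orders"]),
}


def find_layer_violations(models):
    """Return (model, upstream, reason) for every dependency that breaks
    the raw -> clean -> curated progression."""
    violations = []
    for name, (layer, upstreams) in models.items():
        for upstream in upstreams:
            upstream_layer = models[upstream][0]
            if LAYER_RANK[upstream_layer] > LAYER_RANK[layer]:
                violations.append((name, upstream, "reads from a later layer"))
            elif layer == "gold" and upstream_layer == "bronze":
                violations.append((name, upstream, "skips Silver"))
    return violations


violations = find_layer_violations(models)
for model, upstream, reason in violations:
    print(f"{model} -> {upstream}: {reason}")
```

Run as part of CI, a check like this can stop a backwards reference before it becomes one of hundreds.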
Not handling Slowly Changing Dimensions (SCD) properly across layers.
SCDs should be handled upstream in the Silver layer so that they can be used by multiple assets downstream in the Gold layer. Otherwise, you’d be duplicating the same logic in multiple Gold assets. Not building SCDs at all would prevent you from reconstructing history accurately.
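As a rough illustration of what building the SCD once in Silver can look like, here is a minimal Type 2 sketch in Python. The tracked columns, the valid_from and valid_to field names, and the apply_scd2 and price_on helpers are assumptions of mine, and the sketch ignores deleted products.

```python
from datetime import date

TRACKED = ("name", "price")  # assumed attributes whose changes we want to keep


def apply_scd2(history, snapshot, as_of):
    """Apply a snapshot of current source rows to an SCD Type 2 history.

    Each history row has valid_from and valid_to; valid_to is None for the
    current version. Returns a new list and leaves the inputs untouched.
    """
    result = [dict(row) for row in history]
    current = {row["product_id"]: row for row in result if row["valid_to"] is None}
    for source in snapshot:
        existing = current.get(source["product_id"])
        if existing and all(existing[col] == source[col] for col in TRACKED):
            continue  # nothing changed: keep the current version open
        if existing:
            existing["valid_to"] = as_of  # close the old version
        result.append({**source, "valid_from": as_of, "valid_to": None})
    return result


def price_on(history, product_id, day):
    """Look up the price that was valid for a product on a given day."""
    for row in history:
        if (row["product_id"] == product_id
                and row["valid_from"] <= day
                and (row["valid_to"] is None or day < row["valid_to"])):
            return row["price"]
    return None


# Hypothetical data: the mug's price changes and a lamp is added.
products_history = [
    {"product_id": "P1", "name": "Mug", "price": 10,
     "valid_from": date(2024, 1, 1), "valid_to": None},
]
june_snapshot = [
    {"product_id": "P1", "name": "Mug", "price": 12},
    {"product_id": "P2", "name": "Lamp", "price": 25},
]
products_history = apply_scd2(products_history, june_snapshot, date(2024, 6, 1))
```

Any Gold asset that needs the price at the time of a sale can call something like price_on against this one history instead of re-deriving it.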
Dimensional modeling too early.
Dimensional modeling should always happen in the Gold layer. When dimensional models are built in Silver, you lose the flexibility to support multiple downstream dimensional models. This also makes it harder to adapt to changing business requirements.
Silver is about cleansing and conforming your data to make it usable for analytics in the Gold layer. It’s not about creating models to be consumed by the business.
Creating too many consumer-specific tables with no reuse.
Without governance, duplication proliferates, compute costs balloon, and maintenance becomes a headache.
Someone recently asked me about the number one mistake data analysts make when transitioning to analytics engineering, and this is it. When building data products for your warehouse, rather than for one-time use or for a dashboard, reusability needs to be top of mind. You can’t keep writing queries the way you always have. Everything should be intentionally designed for as much reuse as possible.
Next week paid subscribers can look forward to a deep-dive on How to Decide on a Data Model Design. This will cover business discovery, designing a fact table, organizing your model layers, key design principles, and common pitfalls to avoid.
If you have any questions related to this, don’t forget to drop them in the subscriber chat so I can address them in next week’s newsletter.
Until then, have an awesome week!
Madison




Keeping reuse as a priority is paramount. We never followed that in my team and now we're facing duplicated logic and spaghetti code. Absolute nightmare.
Brilliant breakdown on the complementary relationship here. The calling out of Silver-to-Gold shortcuts is especially timely since I've seen teams basically treating Bronze-to-Gold as acceptable because "it's just one use case." The point about Silver being where clean, reusable datasets live makes so much sense once you see downstream chaos from skipping it.