Can you believe it’s already September? While summer isn’t officially over yet, the season is beginning to wrap up. Kids are going back to school, temperatures are cooling down (except in Phoenix 😅), and fewer ice cream cones are being eaten.
You’ve worked so hard over the last few weeks to build your data pipeline. I’m so proud of you! Now we’re in the final stage, where the magic all comes together right before your eyes.
We will orchestrate Airbyte and dbt so that new data is ingested into your data warehouse and your data models run immediately afterward. Throughout the newsletter, I’ll point out pipeline best practices that apply to any orchestration tool.
If you’ve missed any of the Data Pipeline Summer series, I highly recommend following along from the beginning:
1️⃣ Building Blocks of a Data Pipeline
2️⃣ How to Set Up a Data Warehouse for Analytics
3️⃣ How to Ingest Data with Airbyte
4️⃣ How to Write an Incremental Data Model with dbt
5️⃣ How to Test Your Data Models with dbt
Each of these articles focuses on an individual component of the pipeline. Now that each of them is set up, we need to make sure they all play nice together.
Orchestrators let you schedule the pieces of your pipeline so that data flows seamlessly from one part to the next. Once the pipeline’s schedule is defined, the pieces run one after another, starting with data ingestion.
Today we’ll use Prefect to define and schedule tasks using Python, orchestrating our data pipeline to run hands-off.
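To make that concrete, here’s a minimal sketch of what such a Prefect flow could look like. It assumes a locally running Airbyte instance reachable over its open-source API; the URL, the connection ID, and the `dbt build` invocation are placeholders to adapt to your own setup.

```python
import subprocess

import requests
from prefect import flow, task

AIRBYTE_URL = "http://localhost:8000/api/v1"    # placeholder: your Airbyte host
CONNECTION_ID = "<your-airbyte-connection-id>"  # placeholder: copy from the Airbyte UI


@task(retries=2, retry_delay_seconds=60)
def trigger_airbyte_sync():
    # Kick off a sync through the Airbyte API. A production flow would
    # also poll the job status until the sync actually completes.
    resp = requests.post(
        f"{AIRBYTE_URL}/connections/sync",
        json={"connectionId": CONNECTION_ID},
    )
    resp.raise_for_status()


@task
def run_dbt_models():
    # Run and test the dbt project; check=True fails the task
    # (and therefore the flow) if any model or test fails.
    subprocess.run(["dbt", "build"], check=True)


@flow
def elt_pipeline():
    sync = trigger_airbyte_sync()
    # wait_for enforces ordering: dbt runs only after ingestion succeeds
    run_dbt_models(wait_for=[sync])


if __name__ == "__main__":
    elt_pipeline()
```

Notice the retries on the ingestion task; that’s one of those best practices worth carrying to any orchestrator, since a transient API hiccup shouldn’t fail the whole pipeline. To run this hands-off on a schedule, recent Prefect versions let you serve the flow with a cron expression, e.g. `elt_pipeline.serve(name="daily", cron="0 6 * * *")`.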