SQL JOINs with Comparison Operators (>, <, >=, <=)
How to use SQL to find MRR and session events
A few days ago, I found myself reading through hundreds of lines of Ruby code written by our software engineering team. I don’t know how to code in Ruby. Nor have I ever read Ruby code before.
Luckily, if you’re familiar with one object-oriented language, you’re kinda familiar with them all. This is one of the benefits of having a diverse set of skills- you can finesse just about anything.
One condition in the code compared the subscriber limits of different plans, determining how many plan tiers an account could jump. While all of this data was available in the Ruby code, this data lives across different tables in the data warehouse.
If I wanted to translate this to SQL, I needed to JOIN multiple tables using comparison operators. This brings me to today’s article- how to join tables using comparison operators like less than, greater than, and greater than or equal to.
Joining data in SQL using comparison operators is foundational SQL knowledge and something that’s often needed when data modeling and calculating metrics.
It’s almost time for Data Pipeline Summer!
This is a 6-week newsletter series where I walk you through how to create your own data pipeline using open-source, modern data stack tools!
The challenge will be available to paid subscribers only, so be sure to sign up if you’re interested in the hands-on experience of building a data pipeline.
Throughout the 6 weeks, you will learn to use the most popular modern data stack tools including Airbyte, Snowflake, and dbt!
By the end of the summer, you will have built a data pipeline that you can add to your GitHub, contribute to your resume experience, or help with your current role.
Gather up your sunscreen, flip-flops, and swimming trunks- it’s almost time to start swimming!
Why use comparison operators when joining in SQL?
Comparison joins allow you to fan out your data by comparing dates or numerical fields. What do I mean by this? One record in table A can be joined to multiple records in table B.
This isn’t always ideal, like when you unknowingly have duplicate records in one table and can’t figure out why. However, when you know every record in each table is unique, and purposely join one record in table A to multiple in table B, it can be quite powerful.
An example of this is creating date ranges. If you have a subscription with a start and end date but want a record for every month the subscription is active, you can join subscriptions to a dim_dates
table using a comparison operator. This will create multiple records for a subscription, one for each month the subscription is active.
-- active subscription months
SELECT
dim_dates.month,
subscriptions.subscription_id
FROM subscriptions
JOIN dim_dates
ON subscriptions.start_date >= dim_dates.month
AND subscriptions.end_date <= dim_dates.month
Not to mention, conditional joins are also more efficient than joining on a key and filtering the results using WHERE. This comes down to SQL’s order of execution.
-- ineffieicent query using WHERE instead of conditional operators
SELECT
dim_dates.month,
subscriptions.subscription_id
FROM subscriptions
JOIN dim_dates
WHERE subscriptions.start_date >= dim_dates.month
AND subscriptions.end_date <= dim_dates.month
JOINs execute before WHERE, so in the example above, every subscription_id
would need to first be joined to every month. In the next step, the records that don’t fit the specified date range will be filtered away. However, because of the initial JOIN on every record, this query would be extremely slow.
When joining using conditional operators, you aren’t joining records that don’t return what you are looking for, meaning the query scans less data.
How conditional joins work
Keep reading with a 7-day free trial
Subscribe to Learn Analytics Engineering to keep reading this post and get 7 days of free access to the full post archives.