Let's Learn SQL Window Functions
The top window functions and how to use them (with code examples!)
Whether you are solving a complex problem or are in the process of interviewing for a technical role that requires advanced SQL knowledge, it is important to understand how to use SQL window functions.
While it is not always necessary in the job interview process, knowing how to use them will be sure to impress interviewers and save you time when solving coding problems. These functions often simplify a very complex solution into one that is faster and easier to understand.
And, I’m not alone in my thoughts on this. Here’s what some of the data community had to say:
What is a window function?
A window function is a function that is executed across various table rows. Rather than looking at each row individually like a “normal” function, these usually look at the rows before and/or after in order to calculate the target row.
All window functions contain two key parts: PARTITION BY and ORDER BY clauses. While you don’t always have to use both of these, they are what makes a window function a window function.
PARTITION BY
If you aren’t familiar, PARTITION BY helps group the data so the count is restarted for each partition.
It helps to separate the rows into groups and then, within those groups, apply the particular function.
If you don’t include the PARTITION BY clause, the function will treat the entire dataset as a single group.
ORDER BY
ORDER BY then helps to order the values within each group. It orders them depending on the column you specify.
You must specify whether you want to order by the column ASC (values growing larger) or DESC (values getting lower). Make sure you pay close attention to this. If you order it in the wrong direction, some functions may work opposite of how you intended them to.
Most useful window functions
ROW_NUMBER()
RANK()
LEAD() and LAG()
FIRST_VALUE()
Let’s look at each of these individually to see what they do and the best way to use them in your SQL code. But first, know someone who would benefit from learning these functions? Share this with them!
ROW_NUMBER()
ROW_NUMBER() and RANK() are very similar and often used interchangeably, depending on your specific use case. ROW_NUMBER() is used to assign a number to a row based on its order in the table. You don’t have to use a column name in the parentheses since you are simply returning the order.
Here we are ordering the employees by their employee_id
and assigning each row a row number based on the employee’s order in the table.
SELECT
ROW_NUMBER() OVER(ORDER BY employee_id) AS row_number,
employee_id,
employee_name
FROM org_chart
This will result in an output that looks like this:
See how the employees are ordered from lowest employee_id
to highest with the corresponding row number.
RANK()
RANK() is used to order rows based on a certain column value. This is great for any type of ordering problem and super helpful due to the ORDER BY and PARTITION BY clauses.
Here we can use it to find the top spenders in the company:
SELECT
employee_name,
employee_id,
amount_spent,
department_id
RANK(amount_spent) OVER(ORDER BY amount_spent DESC) AS spend_rank
FROM employees
Notice that I include DESC after the ORDER BY clause. This is extremely important when using rank. If you don’t specify this the query might return the opposite of what you intended.
This query results in a table that looks like this:
Now, let’s look at the same example but using PARTITION BY.
SELECT
employee_name,
employee_id,
amount_spent,
department_id,
RANK(amount_spent) OVER(ORDER BY amount_spent DESC PARTITION BY department_id) AS spend_rank
FROM employees
Now we are ranking the top spenders in each department. Each time there is a new department_id
, the ranking will start from one or continue with the next number from where it left off.
Here you can see that each department is treated as a different group. Within each department, employees are ranked according to their spending. However, the query still returns the entire dataset.
Keep reading with a 7-day free trial
Subscribe to Learn Analytics Engineering to keep reading this post and get 7 days of free access to the full post archives.