SQL 101 — CTE (Common Table Expression)
Have you found yourself in a situation where you needed several steps to process data until you get the desired results, but creating a bunch of views and tables was not an option? In situations like this, I believe most of us will use a subquery.
During my early days in data, I would abuse subquery in from
clause and then become confused when re-read the query later because my query was chaotic, and in the end, I would re-do my query just to modify things. Just think about it, even the future me cannot understand the past me, who made the query, let alone other people. This would be a serious problem if I ever resign and need to hand over my job.
After reading lots of articles and hours on YouTube, I finally found a better way to do subqueries, which was using a CTE. This might save you and your teammates hours of confusion in understanding a legacy query.
CTE (Common Table Expression)
As cited from sqlshack.com, CTE or Common Table Expression is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. In short, it allows you to create an imaginary table within a query that you can select and process later on in the same query. There are two types of CTE: recursive CTE and non-recursive CTE. In this article, I will focus on non-recursive CTE.
So, what are the advantages of using CTE rather than normal subquery in from clause?
- Better readability
- Reusability of the temporary tables
Performance-wise, I think it is on par with using a normal subquery because CTE basically is also a subquery.
Please check compatibility with your database before using the window function. For MySQL, the window function is available in MySQL 8.
CTE Syntax
CTE is created using a with
statement. It is possible to create multiple CTE in one query. I will give an example on using CTE.
I have a student_score table:
I need to get the names of students who ranked 1 & 2 in each class.
Our plans:
- Rank each student within their class
- Create a filter so only students with ranks 1 &2 appear.
The basic query for CTE is
with [CTE_name] as(your query for CTE)SELECT…from [CTE_name]
For our case, I will write query as:
with student_rank as (
SELECT
rank() OVER (PARTITION BY class ORDER BY score DESC) rank,
class,
name,
score
FROM student_score
ORDER BY class asc, score desc
)SELECT
*
from student_rank
where rank in (1,2)
Plan #1:
Within with
statement, we created an imaginary table named student_rank
to add rank
field from student_score
table that we had
Plan #2:
Now that we had ranked all students within their class, all we need is to filter the table to get show only #1 and #2. To do this, we select fields from our preprocessed imaginary table student_rank
and apply filter where rank in (1,2)
.
This would be our results:
If your query gets more complex, it is possible to create more than one CTE and use CTE #1 results in CTE#2’s query.
Use Cases
CTEs are helpful in many ways especially when subqueries are needed. In my experience, I used CTE to:
- Create cohort tables (appears in many technical tests)
- Determine rank (lowest, second-lowest, etc) — (appears in many technical tests)
- Category grouping before another grouping
- I will add more.. (or if you have another common use case please write in comment section below)
Write queries in CTE form and you (and your teammates) will save a lot of time! (learned this the hard way T_T)
P.S: If you have any questions or suggestions, do not hesitate to comment!