SQL 101 — Mastering Data Filtering: WHERE, HAVING, and QUALIFY Clause

Nadia Selas Noversega
4 min readFeb 1, 2023

--

As a BI analyst, I have spent countless hours working with SQL to extract meaningful insights from large datasets. During my journey, I have learned that one of the most important techniques in SQL is filtering, which allows you to extract relevant information from a dataset.

Focusing on important data

There are three main clauses that you can use to filter SQL results: the WHERE clause, the HAVING clause, and the QUALIFY clause. While the three of them are used to filter data, each clause work differently.

Where

The WHERE clause in SQL is used to filter data based on specified conditions. It is added to a SELECT, UPDATE, or DELETE statement and is used to retrieve or modify only those records that meet specific criteria. The WHERE clause is used to specify the conditions that must be met for a record to be returned or affected by the statement.

For example, the following SQL statement uses the WHERE clause to retrieve all records from the customers table where the city column is equal to “London”:

SELECT * FROM customers 
WHERE city = 'London';

In this example, the WHERE clause is used to specify the condition “city = ‘London’”, which means that only those records where the “city” column contains the value “London” will be returned.

Though WHERE clause is powerful enough to filter data, it has some restriction that you have to put in mind:

  1. The use of indexes can affect the performance of the WHERE clause. If the columns referenced in the WHERE clause are not indexed, the query can take longer to run, as the database has to scan the entire table for the relevant records.
  2. NULL values can cause issues when using the WHERE clause, as NULL is not equal to any other value, including itself. To compare NULL values in a WHERE clause, you must use the IS NULL or IS NOT NULL operators.

Having

The HAVING clause in SQL is used to filter groups of rows based on aggregate values. It is used in conjunction with the GROUP BY clause, which groups rows from a table into aggregate values. The HAVING clause is used to specify conditions that must be met by the groups, rather than individual rows.

For example, consider a table of sales data, with columns for the date, product, and sales amount. To find the total sales for each product, you would group the data by product and use the SUM function to calculate the total sales for each group:

SELECT product, SUM(sales_amount) as total_sales
FROM sales_data
GROUP BY product
HAVING SUM(sales_amount) > 10000;

In this example, the HAVING clause is used to filter the groups based on the condition SUM(sales_amount) > 10000, which means that only those groups with a total sales greater than 10,000 will be returned.

Few things that you need to put in mind when using HAVING clause:

  1. NULL values can cause issues when using the HAVING clause, as aggregate functions may ignore or handle NULL values differently. It’s important to understand how aggregate functions treat NULL values, and to specify the appropriate conditions in the HAVING clause to account for them.
  2. The use of the HAVING clause can affect the performance of a query, as the database must perform the grouping and aggregation before applying the conditions specified in the HAVING clause. Indexes, indexes on aggregate columns, and proper use of the WHERE clause can improve query performance.

Qualify

The QUALIFY clause in SQL is used to filter the results of window functions. It allows you to apply conditions to the results of aggregate and window functions, making it a powerful tool for summarizing and filtering data.

Please note that not all databases support the QUALIFY clause. It is important to check the database documentation for more information.

For example, consider a table of sales data, with columns for the date, product, and sales amount. To find the top 5 products by sales, you could group the data by product and use the SUM function to calculate the total sales for each group:

SELECT product, SUM(sales_amount) as total_sales
FROM sales_data
GROUP BY product
QUALIFY RANK() OVER (ORDER BY SUM(sales_amount) DESC) <= 5;

To get the top 5 products, the query will process as following:

  1. Get all product and sales_amount from sales_data
  2. Grouping the data by product
  3. Summarize sales_amount per product as total_sales
  4. Sorting total_sales from the biggest to the smallest amount
  5. Assign rank for each row
  6. Filtering rank ≤ 5

All these filtering process (4–5) could be written into only single line QUALIFY RANK() OVER (ORDER BY SUM(sales_amount) DESC) <= 5;

Alternatively, if your database does not support QUALIFY , you can use Common Table Expression (CTE) to achieve the same result:

with sales as (
SELECT
product,
SUM(sales_amount) as total_sales,
ROW_NUMBER() OVER (ORDER BY SUM(sales_amount) desc) rownum
FROM sales_data
GROUP BY product
)

SELECT *
FROM sales
where rownum <= 5;

or if you also don’t have window function:

with sales as (
SELECT
product,
SUM(sales_amount) as total_sales,
FROM sales_data
GROUP BY product
)

SELECT *
FROM sales
ORDER BY total_sales desc
LIMIT 5

Don’t worry if your database doesn’t support a function. There’s always a way to get the same result, just be creative!

In conclusion, filtering data in SQL is a crucial technique for extracting meaningful insights from large datasets. The WHERE, HAVING, and QUALIFY clauses are the primary methods used for filtering data in SQL, each with its own unique purpose and use case. Understanding the difference between these clauses is important for making efficient and accurate data filtering decisions.

P.S: If you have any questions or suggestions to make this article better, do not hesitate to comment!

Fun fact: This article is 80% written with the help of ChatGPT. I only gave the main idea of each paragraph and the AI turns it into paragraph.

--

--

Nadia Selas Noversega
Nadia Selas Noversega

Written by Nadia Selas Noversega

Swimming in the sea of data 🏊 — Sharing my data journey. Visit my profile https://www.linkedin.com/in/nadselnov/

No responses yet