Aggregations & GROUP BY - DBMS

Aggregate functions take a bunch of rows and squash them down into a single value. Think of it like summarizing — instead of looking at 1,000 individual salaries, we want one number: the average.

The Five Core Aggregates

-- Sample data: employees table
-- name    | department | salary
-- Alice   | Eng        | 90000
-- Bob     | Eng        | 80000
-- Charlie | Marketing  | 70000
-- Diana   | Marketing  | 75000
-- Eve     | Eng        | 85000

SELECT
    COUNT(*) AS total_employees,     -- 5
    SUM(salary) AS total_payroll,    -- 400000
    AVG(salary) AS avg_salary,       -- 80000
    MIN(salary) AS lowest_salary,    -- 70000
    MAX(salary) AS highest_salary    -- 90000
FROM employees;

Quick Notes on COUNT

-- COUNT(*) counts ALL rows (including NULLs)
SELECT COUNT(*) FROM employees;  -- 5

-- COUNT(column) counts non-NULL values in that column
SELECT COUNT(manager_id) FROM employees;  -- might be 4 if one is NULL

-- COUNT(DISTINCT column) counts unique non-NULL values
SELECT COUNT(DISTINCT department) FROM employees;  -- 2 (Eng, Marketing)

GROUP BY — Splitting Into Buckets

Without GROUP BY, aggregates work on the entire table. With GROUP BY, we split rows into groups and aggregate each group separately.

Think of it like sorting physical papers into piles by department, then counting each pile.

-- Average salary PER department
SELECT
    department,
    COUNT(*) AS headcount,
    AVG(salary) AS avg_salary,
    MAX(salary) AS top_salary
FROM employees
GROUP BY department;

-- Result:
-- department | headcount | avg_salary | top_salary
-- Eng        | 3         | 85000      | 90000
-- Marketing  | 2         | 72500      | 75000

The Golden Rule of GROUP BY

Every column in SELECT must either be:

In the GROUP BY clause, OR
Inside an aggregate function

This is the rule that trips up beginners:

-- This FAILS (name is not grouped or aggregated)
SELECT name, department, AVG(salary)
FROM employees
GROUP BY department;
-- ERROR: column "name" must appear in GROUP BY or be in an aggregate

-- This WORKS
SELECT department, AVG(salary)
FROM employees
GROUP BY department;

Why? Because if we’re grouping by department, there are multiple names per group. The database doesn’t know which name to show.

HAVING vs WHERE

This is probably the most asked question about GROUP BY. Here’s the simple answer:

WHERE filters rows before grouping
HAVING filters groups after grouping

All Rows

→

WHERE

Filter rows

→

GROUP BY

Make groups

→

HAVING

Filter groups

→

Result

-- WHERE: filter individual rows BEFORE grouping
-- "Only look at employees hired after 2020"
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE hired_at > '2020-01-01'    -- filters ROWS first
GROUP BY department;

-- HAVING: filter groups AFTER grouping
-- "Only show departments where the average salary is above 80K"
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 80000;      -- filters GROUPS

-- Both together:
-- "Among employees hired after 2020, show departments with avg salary > 80K"
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE hired_at > '2020-01-01'
GROUP BY department
HAVING AVG(salary) > 80000;

The key thing: WHERE cannot use aggregate functions because it runs before grouping. HAVING can.

-- This FAILS
SELECT department, COUNT(*) FROM employees
WHERE COUNT(*) > 2 GROUP BY department;
-- ERROR: aggregate functions not allowed in WHERE

-- This WORKS
SELECT department, COUNT(*) FROM employees
GROUP BY department
HAVING COUNT(*) > 2;

SQL Execution Order

Understanding the order SQL actually executes helps a lot:

FROM — pick the table(s)
WHERE — filter individual rows
GROUP BY — form groups
HAVING — filter groups
SELECT — pick columns and compute aggregates
ORDER BY — sort the results
LIMIT — cap the output

This is why we can’t use a column alias from SELECT in a WHERE clause — SELECT hasn’t run yet!

-- This FAILS in most databases
SELECT department, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
HAVING avg_sal > 80000;  -- Can't use alias in HAVING (in strict SQL)

-- This WORKS
SELECT department, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
HAVING AVG(salary) > 80000;  -- Repeat the expression
-- (PostgreSQL and MySQL are more lenient here, but standard SQL requires this)

Practical Examples

-- Top 3 departments by headcount
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
ORDER BY headcount DESC
LIMIT 3;

-- Departments where everyone earns at least 60K
SELECT department, MIN(salary) AS min_salary
FROM employees
GROUP BY department
HAVING MIN(salary) >= 60000;

-- Monthly revenue breakdown
SELECT
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS total_orders,
    SUM(amount) AS revenue,
    AVG(amount) AS avg_order_value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Grouping by Multiple Columns

We can group by more than one column. Each unique combination of values becomes a group.

-- Average salary by department AND job level
SELECT department, job_level, AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_level
ORDER BY department, job_level;

-- Result:
-- department | job_level | avg_salary
-- Eng        | Junior    | 70000
-- Eng        | Senior    | 95000
-- Marketing  | Junior    | 60000
-- Marketing  | Senior    | 85000

Interview Tip

The WHERE vs HAVING question is guaranteed in DBMS interviews. The one-line answer: “WHERE filters rows before grouping, HAVING filters groups after grouping.” But go further — explain that WHERE can’t use aggregates because it runs before GROUP BY in the execution order. That shows deep understanding.