Window Functions - DBMS

Window functions are one of the most powerful features in SQL, and once we learn them, we wonder how we ever lived without them. They let us perform calculations across a set of rows without collapsing them into a single row like GROUP BY does.

In simple language: GROUP BY squashes rows together. Window functions compute values across rows but keep every row in the result.

The Syntax

Every window function uses the OVER() clause:

function_name() OVER (
    PARTITION BY column    -- divide rows into groups (optional)
    ORDER BY column        -- order within each group (optional)
)

PARTITION BY vs GROUP BY

This is the fundamental difference to understand:

-- GROUP BY: collapses rows (one row per department)
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
-- Result: 2 rows (one per department)

-- PARTITION BY: keeps all rows, adds the aggregate as a new column
SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
-- Result: all 5 rows, each with their department's average salary
-- name    | department | salary | dept_avg
-- Alice   | Eng        | 90000  | 85000
-- Bob     | Eng        | 80000  | 85000
-- Eve     | Eng        | 85000  | 85000
-- Charlie | Marketing  | 70000  | 72500
-- Diana   | Marketing  | 75000  | 72500

See how every row is preserved? That’s the magic.

Ranking Functions

ROW_NUMBER()

Assigns a unique sequential number to each row within a partition. No ties — if two rows have the same value, they still get different numbers.

-- Number employees by salary within each department
SELECT
    name, department, salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees;

-- Result:
-- name    | department | salary | rn
-- Alice   | Eng        | 90000  | 1
-- Eve     | Eng        | 85000  | 2
-- Bob     | Eng        | 80000  | 3
-- Diana   | Marketing  | 75000  | 1
-- Charlie | Marketing  | 70000  | 2

Super useful for “top N per group” queries:

-- Top 2 highest paid employees per department
WITH ranked AS (
    SELECT
        name, department, salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn <= 2;

RANK()

Like ROW_NUMBER, but ties get the same rank. After a tie, it skips numbers.

-- If Alice and Eve both earn 90000:
-- RANK: 1, 1, 3 (skips 2)
SELECT
    name, salary,
    RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees;
-- name    | salary | rnk
-- Alice   | 90000  | 1
-- Eve     | 90000  | 1   ← same rank (tie)
-- Bob     | 80000  | 3   ← skips 2!

DENSE_RANK()

Same as RANK, but doesn’t skip numbers after ties.

-- DENSE_RANK: 1, 1, 2 (no gaps)
SELECT
    name, salary,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
FROM employees;
-- name    | salary | drnk
-- Alice   | 90000  | 1
-- Eve     | 90000  | 1   ← same rank (tie)
-- Bob     | 80000  | 2   ← no gap!

Salary ROW_NUMBER RANK DENSE_RANK

90000111

90000211

8000033 ← skip2 ← no skip

70000443

LAG and LEAD — Looking at Neighboring Rows

LAG looks at the previous row. LEAD looks at the next row. These are incredibly useful for comparisons over time.

-- Compare each month's revenue with the previous month
SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month,
    revenue - LAG(revenue) OVER (ORDER BY month) AS change
FROM monthly_revenue;

-- Result:
-- month   | revenue | prev_month | change
-- Jan     | 10000   | NULL       | NULL
-- Feb     | 12000   | 10000      | 2000
-- Mar     | 11000   | 12000      | -1000
-- Apr     | 15000   | 11000      | 4000

-- LAG with a default value (avoid NULLs for the first row)
LAG(revenue, 1, 0) OVER (ORDER BY month)
--            ^  ^
--            |  └── default value if no previous row
--            └── how many rows back to look

Running Totals with SUM() OVER()

This is a classic use case — compute a cumulative sum as we move through rows.

-- Running total of daily sales
SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

-- Result:
-- order_date | amount | running_total
-- Jan 1      | 100    | 100
-- Jan 2      | 250    | 350
-- Jan 3      | 175    | 525
-- Jan 4      | 300    | 825

We can also do running totals per group:

-- Running total per department
SELECT
    department,
    order_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY department
        ORDER BY order_date
    ) AS dept_running_total
FROM orders;

Moving Averages with Frame Clauses

We can define a “window frame” — a sliding window of rows:

-- 3-day moving average
SELECT
    order_date,
    amount,
    AVG(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW  -- current + 2 previous
    ) AS moving_avg_3day
FROM orders;

Common Real-World Patterns

-- Percentage of total
SELECT
    department,
    salary,
    salary * 100.0 / SUM(salary) OVER () AS pct_of_total
FROM employees;

-- Find duplicates (keep one, mark the rest)
WITH numbered AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM users
)
DELETE FROM users WHERE id IN (
    SELECT id FROM numbered WHERE rn > 1
);

Interview Tip

Window functions are a favorite topic in SQL interviews because they separate people who know basic SQL from those who know it well. Practice the “top N per group” pattern (ROW_NUMBER + CTE + WHERE rn <= N) — it comes up constantly. Also, be ready to explain the difference between ROW_NUMBER, RANK, and DENSE_RANK with a tie scenario.