Understanding SQL Window Functions

Aug 22, 2024

SQL Queries Using Window Functions

Introduction to Window Functions

  • Window functions are also referred to as analytic functions in some RDBMS.
  • Important for SQL interviews - expect at least one question on window functions.
  • Allows solving complex SQL queries easily.

Example Use Cases

  • Fetching top three employees from each department based on salary.
  • Displaying minimum and maximum salary in each department for each employee record.

Syntax of Window Functions

  • Slightly different from regular SQL queries.
  • Includes functions like:
    • RANK
    • DENSE_RANK
    • ROW_NUMBER
    • LEAD
    • LAG
    • Using aggregate functions as analytic functions.

Aggregate Function Example

  • Created an employee table with columns: employee ID, name, department, salary.
  • Example query to find max salary:
    SELECT MAX(salary) AS max_salary FROM employee;
    
  • To find max salary by department, use:
    SELECT MAX(salary) AS max_salary FROM employee GROUP BY department_name;
    

Using Window Functions for More Detail

  • To display max salary alongside all employee details:
    SELECT e.*, MAX(salary) OVER() AS max_salary FROM employee AS e;
    
  • To extract max salary by department:
    SELECT e.*, MAX(salary) OVER(PARTITION BY department_name) AS max_salary FROM employee AS e;
    

Common Window Functions

ROW_NUMBER

  • Assigns a unique value to each record:
    SELECT e.*, ROW_NUMBER() OVER() AS rn FROM employee AS e;
    
  • To assign based on department:
    SELECT e.*, ROW_NUMBER() OVER(PARTITION BY department_name ORDER BY employee_id) AS rn FROM employee AS e;
    

RANK

  • Used to fetch top N employees based on salary:
    SELECT e.*, RANK() OVER(PARTITION BY department_name ORDER BY salary DESC) AS rnk FROM employee AS e;
    
  • To get top 3 employees:
    SELECT * FROM (SELECT e.*, RANK() OVER(PARTITION BY department_name ORDER BY salary DESC) AS rnk FROM employee AS e) AS x WHERE rnk < 4;
    

DENSE_RANK

  • Similar to RANK but does not skip ranks:
    SELECT e.*, DENSE_RANK() OVER(PARTITION BY department_name ORDER BY salary DESC) AS d_rnk FROM employee AS e;
    

LEAD and LAG

  • LAG: fetches previous record's value:
    SELECT e.*, LAG(salary) OVER(PARTITION BY department_name ORDER BY employee_id) AS previous_salary FROM employee AS e;
    
  • LEAD: fetches next record's value:
    SELECT e.*, LEAD(salary) OVER(PARTITION BY department_name ORDER BY employee_id) AS next_salary FROM employee AS e;
    

Practical Use Case for LEAD and LAG

  • Determine if current employee salary is higher/lower than previous employee:
    SELECT e.*, CASE WHEN salary > LAG(salary) OVER(...) THEN 'Higher' WHEN salary < LAG(salary) OVER(...) THEN 'Lower' ELSE 'Same' END AS salary_comparison FROM employee AS e;
    

Conclusion

  • Many other window functions exist (e.g., nth_value, first_value, etc.).
  • Encouragement to like and subscribe for more SQL, Python, and data science content.