MySQL窗口函数与实践指南

一、窗口函数基础概念

1.1 什么是窗口函数

窗口函数(Window Function)是SQL标准中定义的高级分析功能,允许在查询结果集的特定"窗口"(即数据子集)上执行计算。与传统聚合函数不同,窗口函数不会将多行合并为单行结果,而是为每一行返回基于窗口内数据的计算结果。

1.2 MySQL版本支持

MySQL从8.0版本开始正式支持窗口函数,主要包含以下类型:

  • 排名函数:ROW_NUMBER(), RANK(), DENSE_RANK()
  • 分布函数:PERCENT_RANK(), CUME_DIST()
  • 前后函数:LAG(), LEAD()
  • 首尾函数:FIRST_VALUE(), LAST_VALUE()
  • 聚合函数窗口化:SUM(), AVG(), COUNT()等

1.3 与聚合函数的区别

通过对比表理解本质差异:

特性 窗口函数 聚合函数
结果行数 保持原表行数 合并为分组行数
计算范围 可定义滑动窗口 固定整个分组
使用场景 复杂分析(如排名、累计值) 简单统计(总和、平均)
性能影响 较高 较低
语法结构 必须包含OVER子句 直接使用
-- 聚合函数示例
SELECT department, AVG(salary) 
FROM employees 
GROUP BY department;

-- 窗口函数示例
SELECT name, salary, 
       AVG(salary) OVER(PARTITION BY department) 
FROM employees;

二、窗口函数核心语法

2.1 基本语法结构

函数名([参数]) OVER (
    [PARTITION BY 分区字段]
    [ORDER BY 排序字段 [ASC|DESC]]
    [frame_clause]
)

2.2 OVER子句详解

2.2.1 PARTITION BY

定义数据分区的依据,类似GROUP BY但不聚合:

SELECT 
    employee_id,
    department,
    salary,
    RANK() OVER(PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
2.2.2 ORDER BY

确定窗口内的数据排序方式:

SELECT 
    order_date,
    amount,
    SUM(amount) OVER(ORDER BY order_date) AS running_total
FROM sales;
2.2.3 窗口框架(Frame Clause)

精确控制窗口范围,语法:

ROWS | RANGE BETWEEN frame_start AND frame_end

常用框架范围:

  • UNBOUNDED PRECEDING:窗口开始
  • n PRECEDING:当前行前n行
  • CURRENT ROW:当前行
  • n FOLLOWING:当前行后n行
  • UNBOUNDED FOLLOWING:窗口结束

示例:计算3日移动平均

SELECT 
    date,
    sales,
    AVG(sales) OVER(
        ORDER BY date 
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg
FROM daily_sales;

三、常用窗口函数详解

3.1 排名函数

3.1.1 ROW_NUMBER()

连续唯一序号生成:

SELECT 
    product_id,
    category,
    price,
    ROW_NUMBER() OVER(PARTITION BY category ORDER BY price DESC) AS row_num
FROM products;
3.1.2 RANK()

允许并列排名并跳过后续序号:

SELECT 
    student_id,
    score,
    RANK() OVER(ORDER BY score DESC) AS rank
FROM exam_results;
3.1.3 DENSE_RANK()

允许并列但不跳过序号:

SELECT 
    employee_id,
    sales,
    DENSE_RANK() OVER(ORDER BY sales DESC) AS dense_rank
FROM sales_records;

3.2 分布函数

3.2.1 PERCENT_RANK()

计算百分比排名:

SELECT 
    student_id,
    score,
    PERCENT_RANK() OVER(ORDER BY score) AS pct_rank
FROM exam_scores;
3.2.2 CUME_DIST()

计算累计分布:

SELECT 
    salary,
    CUME_DIST() OVER(ORDER BY salary) AS cumulative_dist
FROM employees;

3.3 前后函数

3.3.1 LAG()

获取前N行数据:

SELECT 
    date,
    revenue,
    LAG(revenue, 1) OVER(ORDER BY date) AS prev_day_revenue
FROM daily_sales;
3.3.2 LEAD()

获取后N行数据:

SELECT 
    month,
    users,
    LEAD(users, 1) OVER(ORDER BY month) AS next_month_users
FROM user_growth;

3.4 首尾函数

3.4.1 FIRST_VALUE()

获取窗口首行值:

SELECT 
    department,
    employee,
    salary,
    FIRST_VALUE(salary) OVER(PARTITION BY department ORDER BY hire_date) AS first_salary
FROM employees;
3.4.2 LAST_VALUE()

获取窗口尾行值(注意框架范围):

SELECT 
    product_id,
    date,
    price,
    LAST_VALUE(price) OVER(
        PARTITION BY product_id 
        ORDER BY date 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_price
FROM price_history;

3.5 聚合函数窗口化

SELECT 
    order_date,
    amount,
    SUM(amount) OVER(ORDER BY order_date) AS running_total,
    AVG(amount) OVER() AS overall_avg,
    COUNT(*) OVER(PARTITION BY customer_id) AS customer_orders
FROM orders;

四、高级应用场景

4.1 分组排名与筛选

获取各部门前3名高薪员工:

WITH ranked_employees AS (
    SELECT 
        department,
        employee,
        salary,
        DENSE_RANK() OVER(PARTITION BY department ORDER BY salary DESC) AS rank
    FROM employees
)
SELECT * 
FROM ranked_employees 
WHERE rank <= 3;

4.2 累计计算

计算累计销售额和移动平均:

SELECT 
    date,
    daily_sales,
    SUM(daily_sales) OVER(ORDER BY date) AS cumulative_sales,
    AVG(daily_sales) OVER(
        ORDER BY date 
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS weekly_avg
FROM sales_data;

4.3 数据对比分析

计算同比/环比增长率:

SELECT 
    month,
    revenue,
    LAG(revenue, 12) OVER(ORDER BY month) AS prev_year,
    (revenue - LAG(revenue, 1) OVER(ORDER BY month)) / LAG(revenue, 1) OVER(ORDER BY month) AS mom_growth
FROM monthly_revenue;

4.4 分位数分析

计算工资四分位数:

SELECT 
    employee,
    salary,
    NTILE(4) OVER(ORDER BY salary) AS quartile
FROM employees;

五、性能优化策略

5.1 执行计划分析

使用EXPLAIN查看执行计划:

EXPLAIN 
SELECT 
    department,
    AVG(salary) OVER(PARTITION BY department)
FROM employees;

关键指标解读:

  • Using temporary:表示使用临时表
  • Using filesort:需要额外排序操作
  • window functions:存在窗口计算

5.2 索引优化技巧

为排序和分区字段创建复合索引:

CREATE INDEX idx_department_salary ON employees(department, salary);

5.3 分区策略优化

合理设置PARTITION BY字段:

-- 优化前
SELECT 
    ROW_NUMBER() OVER(ORDER BY sales DESC) 
FROM sales_records;

-- 优化后
SELECT 
    ROW_NUMBER() OVER(PARTITION BY region ORDER BY sales DESC) 
FROM sales_records;

5.4 框架范围控制

精确限定窗口范围减少计算量:

-- 计算最近3个月的移动平均
SELECT 
    month,
    revenue,
    AVG(revenue) OVER(
        ORDER BY month 
        RANGE BETWEEN INTERVAL 2 MONTH PRECEDING AND CURRENT ROW
    ) AS moving_avg
FROM monthly_sales;

六、注意事项与限制

  1. 版本兼容性:确认MySQL版本≥8.0
  2. 框架子句限制:RANGE模式不支持所有数据类型
  3. 内存消耗:大窗口可能导致内存压力
  4. 并行处理:窗口函数单线程执行
  5. 索引利用:ORDER BY子句走索引的条件
  6. 与GROUP BY配合:需要子查询处理
-- 错误示例
SELECT 
    department,
    AVG(salary),
    RANK() OVER(ORDER BY AVG(salary))
FROM employees
GROUP BY department;

-- 正确写法
SELECT 
    department,
    avg_salary,
    RANK() OVER(ORDER BY avg_salary) 
FROM (
    SELECT 
        department,
        AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
) t;
正文到此结束
评论插件初始化中...
Loading...