MySQL窗口函数与实践指南
一、窗口函数基础概念
1.1 什么是窗口函数
窗口函数(Window Function)是SQL标准中定义的高级分析功能,允许在查询结果集的特定"窗口"(即数据子集)上执行计算。与传统聚合函数不同,窗口函数不会将多行合并为单行结果,而是为每一行返回基于窗口内数据的计算结果。
1.2 MySQL版本支持
MySQL从8.0版本开始正式支持窗口函数,主要包含以下类型:
- 排名函数:ROW_NUMBER(), RANK(), DENSE_RANK()
- 分布函数:PERCENT_RANK(), CUME_DIST()
- 前后函数:LAG(), LEAD()
- 首尾函数:FIRST_VALUE(), LAST_VALUE()
- 聚合函数窗口化:SUM(), AVG(), COUNT()等
1.3 与聚合函数的区别
通过对比表理解本质差异:
特性 | 窗口函数 | 聚合函数 |
---|---|---|
结果行数 | 保持原表行数 | 合并为分组行数 |
计算范围 | 可定义滑动窗口 | 固定整个分组 |
使用场景 | 复杂分析(如排名、累计值) | 简单统计(总和、平均) |
性能影响 | 较高 | 较低 |
语法结构 | 必须包含OVER子句 | 直接使用 |
-- 聚合函数示例
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
-- 窗口函数示例
SELECT name, salary,
AVG(salary) OVER(PARTITION BY department)
FROM employees;
二、窗口函数核心语法
2.1 基本语法结构
函数名([参数]) OVER (
[PARTITION BY 分区字段]
[ORDER BY 排序字段 [ASC|DESC]]
[frame_clause]
)
2.2 OVER子句详解
2.2.1 PARTITION BY
定义数据分区的依据,类似GROUP BY但不聚合:
SELECT
employee_id,
department,
salary,
RANK() OVER(PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
2.2.2 ORDER BY
确定窗口内的数据排序方式:
SELECT
order_date,
amount,
SUM(amount) OVER(ORDER BY order_date) AS running_total
FROM sales;
2.2.3 窗口框架(Frame Clause)
精确控制窗口范围,语法:
ROWS | RANGE BETWEEN frame_start AND frame_end
常用框架范围:
UNBOUNDED PRECEDING
:窗口开始n PRECEDING
:当前行前n行CURRENT ROW
:当前行n FOLLOWING
:当前行后n行UNBOUNDED FOLLOWING
:窗口结束
示例:计算3日移动平均
SELECT
date,
sales,
AVG(sales) OVER(
ORDER BY date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg
FROM daily_sales;
三、常用窗口函数详解
3.1 排名函数
3.1.1 ROW_NUMBER()
连续唯一序号生成:
SELECT
product_id,
category,
price,
ROW_NUMBER() OVER(PARTITION BY category ORDER BY price DESC) AS row_num
FROM products;
3.1.2 RANK()
允许并列排名并跳过后续序号:
SELECT
student_id,
score,
RANK() OVER(ORDER BY score DESC) AS rank
FROM exam_results;
3.1.3 DENSE_RANK()
允许并列但不跳过序号:
SELECT
employee_id,
sales,
DENSE_RANK() OVER(ORDER BY sales DESC) AS dense_rank
FROM sales_records;
3.2 分布函数
3.2.1 PERCENT_RANK()
计算百分比排名:
SELECT
student_id,
score,
PERCENT_RANK() OVER(ORDER BY score) AS pct_rank
FROM exam_scores;
3.2.2 CUME_DIST()
计算累计分布:
SELECT
salary,
CUME_DIST() OVER(ORDER BY salary) AS cumulative_dist
FROM employees;
3.3 前后函数
3.3.1 LAG()
获取前N行数据:
SELECT
date,
revenue,
LAG(revenue, 1) OVER(ORDER BY date) AS prev_day_revenue
FROM daily_sales;
3.3.2 LEAD()
获取后N行数据:
SELECT
month,
users,
LEAD(users, 1) OVER(ORDER BY month) AS next_month_users
FROM user_growth;
3.4 首尾函数
3.4.1 FIRST_VALUE()
获取窗口首行值:
SELECT
department,
employee,
salary,
FIRST_VALUE(salary) OVER(PARTITION BY department ORDER BY hire_date) AS first_salary
FROM employees;
3.4.2 LAST_VALUE()
获取窗口尾行值(注意框架范围):
SELECT
product_id,
date,
price,
LAST_VALUE(price) OVER(
PARTITION BY product_id
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_price
FROM price_history;
3.5 聚合函数窗口化
SELECT
order_date,
amount,
SUM(amount) OVER(ORDER BY order_date) AS running_total,
AVG(amount) OVER() AS overall_avg,
COUNT(*) OVER(PARTITION BY customer_id) AS customer_orders
FROM orders;
四、高级应用场景
4.1 分组排名与筛选
获取各部门前3名高薪员工:
WITH ranked_employees AS (
SELECT
department,
employee,
salary,
DENSE_RANK() OVER(PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees
)
SELECT *
FROM ranked_employees
WHERE rank <= 3;
4.2 累计计算
计算累计销售额和移动平均:
SELECT
date,
daily_sales,
SUM(daily_sales) OVER(ORDER BY date) AS cumulative_sales,
AVG(daily_sales) OVER(
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS weekly_avg
FROM sales_data;
4.3 数据对比分析
计算同比/环比增长率:
SELECT
month,
revenue,
LAG(revenue, 12) OVER(ORDER BY month) AS prev_year,
(revenue - LAG(revenue, 1) OVER(ORDER BY month)) / LAG(revenue, 1) OVER(ORDER BY month) AS mom_growth
FROM monthly_revenue;
4.4 分位数分析
计算工资四分位数:
SELECT
employee,
salary,
NTILE(4) OVER(ORDER BY salary) AS quartile
FROM employees;
五、性能优化策略
5.1 执行计划分析
使用EXPLAIN查看执行计划:
EXPLAIN
SELECT
department,
AVG(salary) OVER(PARTITION BY department)
FROM employees;
关键指标解读:
Using temporary
:表示使用临时表Using filesort
:需要额外排序操作window functions
:存在窗口计算
5.2 索引优化技巧
为排序和分区字段创建复合索引:
CREATE INDEX idx_department_salary ON employees(department, salary);
5.3 分区策略优化
合理设置PARTITION BY字段:
-- 优化前
SELECT
ROW_NUMBER() OVER(ORDER BY sales DESC)
FROM sales_records;
-- 优化后
SELECT
ROW_NUMBER() OVER(PARTITION BY region ORDER BY sales DESC)
FROM sales_records;
5.4 框架范围控制
精确限定窗口范围减少计算量:
-- 计算最近3个月的移动平均
SELECT
month,
revenue,
AVG(revenue) OVER(
ORDER BY month
RANGE BETWEEN INTERVAL 2 MONTH PRECEDING AND CURRENT ROW
) AS moving_avg
FROM monthly_sales;
六、注意事项与限制
- 版本兼容性:确认MySQL版本≥8.0
- 框架子句限制:RANGE模式不支持所有数据类型
- 内存消耗:大窗口可能导致内存压力
- 并行处理:窗口函数单线程执行
- 索引利用:ORDER BY子句走索引的条件
- 与GROUP BY配合:需要子查询处理
-- 错误示例
SELECT
department,
AVG(salary),
RANK() OVER(ORDER BY AVG(salary))
FROM employees
GROUP BY department;
-- 正确写法
SELECT
department,
avg_salary,
RANK() OVER(ORDER BY avg_salary)
FROM (
SELECT
department,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department
) t;
正文到此结束
相关文章
热门推荐
评论插件初始化中...