MySQL聚合查询与联合查询及优化实践
聚合查询的核心技术与应用场景
1. 基础聚合函数实践
五个核心函数构成数据处理基石:
-- 统计注册用户总数
SELECT COUNT(*) AS total_users FROM users WHERE status=1;
-- 计算商品销售总额
SELECT SUM(price * quantity) AS total_sales FROM order_details;
-- 获取最高客单价记录
SELECT MAX(total_amount) AS max_order FROM orders;
-- 分析用户年龄分布
SELECT AVG(age) AS avg_age, MIN(age) AS youngest, MAX(age) AS oldest FROM members;
分组统计时常见的陷阱:
-- 错误示例:未聚合字段直接查询
SELECT department, employee_name, AVG(salary)
FROM employees
GROUP BY department;
-- 正确做法(MySQL 8.0+)
SELECT department,
ANY_VALUE(employee_name) AS sample_name,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
2. 多维度分组技巧
WITH ROLLUP实现数据透视:
SELECT YEAR(order_date) AS order_year,
MONTH(order_date) AS order_month,
product_category,
SUM(amount) AS total_sales
FROM sales_records
GROUP BY order_year, order_month, product_category
WITH ROLLUP;
分级统计结果示例:
+------------+-------------+------------------+-------------+
| order_year | order_month | product_category | total_sales |
+------------+-------------+------------------+-------------+
| 2023 | 01 | Electronics | 150000 |
| 2023 | 01 | Furniture | 89000 |
| 2023 | 01 | NULL | 239000 | -- 月度小计
| 2023 | NULL | NULL | 750000 | -- 年度合计
| NULL | NULL | NULL | 1850000 | -- 全局总计
+------------+-------------+------------------+-------------+
3. 条件筛选的进阶用法
HAVING与WHERE的差异对比:
-- 筛选总销售额超过10万的部门
SELECT department_id, SUM(sales) AS total
FROM monthly_sales
WHERE quarter = 2 -- 先过滤季度
GROUP BY department_id
HAVING total > 100000 -- 后筛选分组结果
ORDER BY total DESC;
复杂条件组合示例:
SELECT product_id,
COUNT(*) AS order_count,
AVG(quantity) AS avg_qty
FROM order_items
WHERE discount_rate < 0.2 -- 基础筛选
GROUP BY product_id
HAVING order_count > 50 AND avg_qty >= 3 -- 聚合后筛选
AND MAX(quantity) < 10; -- 混合非聚合条件
联合查询的深度解析
1. 表连接类型全解析
可视化连接示意图:
[INNER JOIN] [LEFT JOIN] [RIGHT JOIN] [FULL JOIN]
A B A B A B A B
●-∩-● ●--∩-● ●-∩--● ●---∩---●
实际业务场景应用:
-- 获取客户完整订单信息
SELECT c.customer_id, c.name, o.order_date, o.total_amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date BETWEEN '2023-01-01' AND '2023-12-31'
OR o.order_id IS NULL; -- 包含未下单客户
-- 多表级联查询
SELECT e.emp_name, d.dept_name, p.project_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
LEFT JOIN projects p ON e.emp_id = p.leader_id;
2. 集合操作的特殊场景
UNION去重机制示例:
-- 合并不同来源的客户数据
(SELECT name, phone FROM web_contacts WHERE reg_date > '2023-01-01')
UNION
(SELECT contact_name, mobile FROM offline_clients WHERE status=1)
ORDER BY name;
UNION ALL性能优化案例:
-- 合并日分区表数据(已知无重复)
SELECT * FROM sales_2023_q1
UNION ALL
SELECT * FROM sales_2023_q2
UNION ALL
SELECT * FROM sales_2023_q3;
3. 子查询的进阶模式
相关子查询示例:
-- 查找各部门薪资高于平均的员工
SELECT e1.emp_name, e1.salary, e1.department
FROM employees e1
WHERE salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e1.department
);
FROM子句中的派生表:
-- 计算产品月销售额排名
SELECT month, product_id, monthly_sales,
RANK() OVER (PARTITION BY month ORDER BY monthly_sales DESC) AS sales_rank
FROM (
SELECT DATE_FORMAT(order_date, '%Y-%m') AS month,
product_id,
SUM(quantity * price) AS monthly_sales
FROM orders
GROUP BY month, product_id
) AS sales_data;
性能优化关键策略
1. 索引优化实战
复合索引设计示例:
-- 为分组查询优化索引
ALTER TABLE sales_records
ADD INDEX idx_region_category (sales_region, product_category);
-- 排序优化索引
CREATE INDEX idx_order_date_amount ON orders(order_date DESC, total_amount);
索引失效的典型场景:
-- 函数处理导致索引失效
SELECT * FROM users WHERE YEAR(created_at) = 2023; -- 坏查询
-- 优化后版本
SELECT * FROM users
WHERE created_at BETWEEN '2023-01-01' AND '2023-12-31';
2. 执行计划分析方法
EXPLAIN输出解读要点:
+----+-------------+-------+------------+------+---------------+-----+---------+------+------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----+---------+------+------+----------+-----------------+
| 1 | SIMPLE | orders| NULL | ref | idx_customer | idx_customer | 5 | const| 100 | 100.00 | Using index |
+----+-------------+-------+------------+------+---------------+-----+---------+------+------+----------+-----------------+
关键指标解析:
- type列:system > const > eq_ref > ref > range > index > ALL
- Extra列:Using filesort(需要优化), Using temporary(可能创建临时表)
- rows列:预估扫描行数
3. 查询重写技巧
优化分组查询示例:
-- 原始低效查询
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 100;
-- 优化版本(提前过滤)
SELECT department, emp_count
FROM (
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
) AS dept_stats
WHERE emp_count > 100;
连接顺序优化原则:
-- 小表驱动大表原则
SELECT /*+ STRAIGHT_JOIN */ s.*, p.product_name
FROM (SELECT * FROM sales WHERE date > '2023-01-01') s -- 小结果集
JOIN products p ON s.product_id = p.id; -- 大表
典型业务场景解决方案
1. 分层统计报表
多级汇总实现:
SELECT
IF(GROUPING(region), '所有大区', region) AS region,
IF(GROUPING(city), '全区汇总', city) AS city,
SUM(sales) AS total_sales
FROM sales_data
GROUP BY region, city WITH ROLLUP;
2. 用户行为分析
留存率计算示例:
SELECT
a.register_date,
COUNT(DISTINCT a.user_id) AS new_users,
COUNT(DISTINCT b.user_id) / COUNT(DISTINCT a.user_id) AS day1_retention
FROM registrations a
LEFT JOIN login_events b
ON a.user_id = b.user_id
AND b.login_date = a.register_date + INTERVAL 1 DAY
GROUP BY a.register_date;
3. 库存管理系统
实时库存查询:
SELECT
p.product_id,
p.product_name,
p.initial_stock + IFNULL(SUM(s.stock_change), 0) AS current_stock
FROM products p
LEFT JOIN stock_logs s ON p.product_id = s.product_id
GROUP BY p.product_id
HAVING current_stock < p.minimum_stock;
常见问题排查指南
1. 分组字段选择错误
典型错误示例:
-- 错误:选择了非聚合字段
SELECT department, employee_name, AVG(salary)
FROM employees
GROUP BY department;
-- 修正方案
SELECT
department,
GROUP_CONCAT(employee_name) AS members,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
2. NULL值处理
聚合函数中的NULL行为:
-- COUNT的不同表现
SELECT
COUNT(*) AS total_rows, -- 包含NULL
COUNT(email) AS valid_emails, -- 忽略NULL
COUNT(DISTINCT phone) AS unique_phones
FROM contacts;
3. 性能断崖式下降
大数据量处理方案:
-- 分页优化技巧
SELECT product_id, SUM(sales)
FROM big_sales_table
WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY product_id
ORDER BY SUM(sales) DESC
LIMIT 10000, 20;
-- 优化为基于游标的分页
SELECT product_id, SUM(sales) AS total
FROM big_sales_table
WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31'
AND product_id > ?last_product_id
GROUP BY product_id
ORDER BY product_id
LIMIT 20;
正文到此结束
相关文章
热门推荐
评论插件初始化中...