IN、EXISTS 使用方式以及区别
数据背景
协议表数据总数为277811,协议签署表总数为541621 一看数据 协议签署表是协议表的两倍 就误以为使用EXISTS 性能会好一点但结果如何呢?下面的生产环境在跑的结果查询耗时20s, 不对呀 !为啥才几十万的数据为啥能查询时间这么长!难道是因为join 的表太多了导致时间太久?
进一步查看改sql 的执行计划 居然执行了全表扫描,明明ps.signer_ent_id = '979425854629085184' 使用了索引字段查询按道理会走索引,但真实却没有。所以查询时间才这么久!
原始sql 使用EXISTS
-- 原始sql 使用EXISTS
EXPLAIN SELECT
ep.id,
ep.expired_date AS expireDate,
ep.create_id AS createId,
ep.create_time AS createTime,
ep.finish_time AS finishTime,
ep.dead_time AS deadTime,
ep.page_no AS pageNo,
ep.protocol_code AS protocolCode,
ep.buss_type AS bussType,
ep.protocol_type AS protocolType,
ep.contract_name AS contractName,
ep.contract_no AS contractNo,
ep.biz_id AS bizId,
ep.biz_code AS bizCode,
ep.sign_state AS signState,
ep.remark AS remark,
ep.is_sign AS isSign,
ep.system_channel AS systemChannel,
ep.fail_reason AS failReason,
ep.download_state AS downloadState,
ep.can_download AS canDownload,
ep.file_temp_name AS fileTempName,
ep.file_related_id AS fileRelatedId,
ep.file_save_type AS fileSaveType,
IF (ep.file_save_type = 2, 1, 0) AS isBackup,
ps1.signer_id AS signerAId,
ps1.signer_ent_id AS signerAEntId,
ps1.signer_name AS signerAName,
ps1.sign_time AS signATime,
ps1.signer_state AS signerAState,
ps1.remark AS aRemark,
ps2.signer_id AS signerBId,
ps2.signer_ent_id AS signerBEntId,
ps2.signer_name AS signerBName,
ps2.sign_time AS signBTime,
ps2.signer_state AS signerBState,
ps2.remark AS bRemark,
ps3.signer_id AS signerCId,
ps3.signer_ent_id AS signerCEntId,
ps3.signer_name AS signerCName,
ps3.sign_time AS signCTime,
ps3.signer_state AS signerCState,
ps3.remark AS cRemark
FROM
pub_enterprise_protocol ep
LEFT JOIN pub_enterprise_protocol_signer ps1 ON ps1.contract_no = ep.contract_no
AND ps1.signer_type = 'A'
LEFT JOIN pub_enterprise_protocol_signer ps2 ON ps2.contract_no = ep.contract_no
AND ps2.signer_type = 'B'
LEFT JOIN pub_enterprise_protocol_signer ps3 ON ps3.contract_no = ep.contract_no
AND ps3.signer_type = 'C'
WHERE
1 = 1
AND EXISTS (
SELECT
1
FROM
pub_enterprise_protocol_signer ps
WHERE
ps.contract_no = ep.contract_no
AND ps.signer_ent_id = '979425854629085184'
)
AND ep.protocol_type = '1'
LIMIT 0,
100;
优化分析
当子查询结果集很大,而外部表较小的时候,查询效率会优于IN。
当子查询结果集较小,而外部表很大的时候,IN的查询效率会优于Exists。
改成in 看一下结果
优化后sql 使用in
-- 优化后sql 使用in
EXPLAIN SELECT
ep.id,
ep.expired_date AS expireDate,
ep.create_id AS createId,
ep.create_time AS createTime,
ep.finish_time AS finishTime,
ep.dead_time AS deadTime,
ep.page_no AS pageNo,
ep.protocol_code AS protocolCode,
ep.buss_type AS bussType,
ep.protocol_type AS protocolType,
ep.contract_name AS contractName,
ep.contract_no AS contractNo,
ep.biz_id AS bizId,
ep.biz_code AS bizCode,
ep.sign_state AS signState,
ep.remark AS remark,
ep.is_sign AS isSign,
ep.system_channel AS systemChannel,
ep.fail_reason AS failReason,
ep.download_state AS downloadState,
ep.can_download AS canDownload,
ep.file_temp_name AS fileTempName,
ep.file_related_id AS fileRelatedId,
ep.file_save_type AS fileSaveType,
IF (ep.file_save_type = 2, 1, 0) AS isBackup,
ps1.signer_id AS signerAId,
ps1.signer_ent_id AS signerAEntId,
ps1.signer_name AS signerAName,
ps1.sign_time AS signATime,
ps1.signer_state AS signerAState,
ps1.remark AS aRemark,
ps2.signer_id AS signerBId,
ps2.signer_ent_id AS signerBEntId,
ps2.signer_name AS signerBName,
ps2.sign_time AS signBTime,
ps2.signer_state AS signerBState,
ps2.remark AS bRemark,
ps3.signer_id AS signerCId,
ps3.signer_ent_id AS signerCEntId,
ps3.signer_name AS signerCName,
ps3.sign_time AS signCTime,
ps3.signer_state AS signerCState,
ps3.remark AS cRemark
FROM
pub_enterprise_protocol ep
LEFT JOIN pub_enterprise_protocol_signer ps1 ON ps1.contract_no = ep.contract_no
AND ps1.signer_type = 'A'
LEFT JOIN pub_enterprise_protocol_signer ps2 ON ps2.contract_no = ep.contract_no
AND ps2.signer_type = 'B'
LEFT JOIN pub_enterprise_protocol_signer ps3 ON ps3.contract_no = ep.contract_no
AND ps3.signer_type = 'C'
WHERE
ep.contract_no IN (
SELECT
ps.contract_no
FROM
pub_enterprise_protocol_signer ps
WHERE
ps.signer_ent_id = '979425854629085184'
)
AND ep.protocol_type = '1'
LIMIT 0,
100;
结果证明 改成in后查询时间只需0.019s 相比优化前20.57s 提升1085倍,并且查看了优化后的执行计划 条件命中索引 扫描的行数也只有32 行 相比原来277811 行全表扫描,可谓提升巨大
问题分析 EXISTS变慢了?
由EXISTS执行计划看到 子查询并没有走索引走的是where 接下来做个实验 去掉查询条件
发现还是会全表扫描 但是子查询走了索引 索引字段是合同编号 查询时间也到了毫秒级别 这才是一个正常的查询时间(虽然和in 还有一倍的差距 但是也提升巨大)
子查询中 两个条件 (A)ps.contract_no = ep.contract_no(B)AND ps.signer_ent_id = '979425854629085184' 都是索引列为啥使用B 条件 反而没走索引了?
有执行计划中的子查询可知:
DEPENDENT SUBQUERY:子查询中的第一个SELECT,取决于外面的查询 ,DEPENDENT SUBQUERY , 那么就会先执行外部的查询 , 然后再循环执行内部的查询
MATERIALIZED: 子查询视图
EXISTS 的子查询出现 DEPENDENT SUBQUERY 先查询了外部表277811 行 在根据查询子查询条件 当只有A时走了contract_no索引字段 当同时出现B时 虽然执行计划上显示可能用到的索引字段是contract_no 但最终走了where 条件,这就说明了EXISTS 慢的原因