SQL Optimization: A Case Study in Rewriting a Complex Scalar Subquery

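This is an example of rewriting a complex scalar subquery as a LEFT JOIN. The subquery sits inside a CASE WHEN, joins several tables, is correlated with the outer table, and contains an aggregate. The scalar-subquery SQL is shown below and takes 600 seconds to run: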

SELECT 
    AAZ661 ZFLSH, 
    '1' ZFXH, 
    AAE002 TIMESTAMP, 
    AAE924 JSXM,
    SUM(
        CASE 
            WHEN (
                SELECT COUNT(DISTINCT B.BAA526) 
                FROM t1 A, t2 B 
                WHERE A.AAA345 = B.AAA345 
                  AND A.AAZ661 = Q.AAZ661 
                  AND (B.BAA531 = Q.LTXBZ OR B.BAA531 = '9' OR Q.LTXBZ = '9') 
                  AND B.BAA526 IN ('3', '4')
            ) = 2 
            THEN 
                CASE 
                    WHEN BAA526 = '4' THEN AAE774 
                    WHEN AAE924 NOT IN ('7004', '7008', '7009', '7010', '7011', '7029') AND BAA526 = '5' THEN AAE019 - ZZZHBF 
                    ELSE AAE019 
                END 
            ELSE AAE019 
        END
    ) JE 
FROM Q 
GROUP BY AAZ661, AAE002, AAE924;

First rewrite:

SELECT 
    Q.AAZ661 ZFLSH, 
    '1' ZFXH, 
    Q.AAE002 TIMESTAMP, 
    Q.AAE924 JSXM,
    SUM(
        CASE 
            WHEN V.COUNT_BAA526 = 2 
            THEN 
                CASE 
                    WHEN Q.BAA526 = '4' THEN Q.AAE774 
                    WHEN Q.AAE924 NOT IN ('7004', '7008', '7009', '7010', '7011', '7029') AND Q.BAA526 = '5' THEN Q.AAE019 - Q.ZZZHBF 
                    ELSE Q.AAE019 
                END 
            ELSE Q.AAE019 
        END
    ) JE 
FROM Q 
LEFT JOIN (
    SELECT 
        A.AAZ661, B.BAA531,
        COUNT(DISTINCT B.BAA526) AS COUNT_BAA526
    FROM t1 A
    JOIN t2 B ON A.AAA345 = B.AAA345
    WHERE  B.BAA526 IN ('3', '4')
    GROUP BY A.AAZ661,B.BAA531
) V ON Q.AAZ661 = V.AAZ661 
   AND (Q.LTXBZ = V.BAA531 OR V.BAA531 = '9' OR Q.LTXBZ = '9')
GROUP BY Q.AAZ661, Q.AAE002, Q.AAE924;

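After the rewrite it takes only 50 seconds, but the result is wrong and needs to be fixed. (ChatGPT helped with the fix. The way I asked is worth mentioning: at first I asked how the original SQL should be rewritten, and it kept giving SQL with obvious errors. Later, when I supplied a rewritten SQL first and asked whether the rewrite was correct, it was able to give the answer below.)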

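  • The GROUP BY in the subquery only needs to group by A.AAZ661, because we are computing COUNT(DISTINCT B.BAA526) and taking the maximum value of BAA531.
  • Use MAX(B.BAA531) in the subquery to get the maximum value of BAA531, so that the condition can be evaluated correctly in the LEFT JOIN.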

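To be honest, I don't really understand why MAX(B.BAA531) is needed, even though it gives the correct result. In principle, the original SQL takes Q.AAZ661 and Q.LTXBZ from each row of Q, substitutes them into the subquery, and computes COUNT(DISTINCT B.BAA526) over the records that satisfy the conditions. So after converting to a LEFT JOIN, the subquery should group by AAZ661 and BAA531. The first rewrite follows exactly this logic, yet its result is wrong. I would be very grateful if someone more experienced could explain this.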
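One possible explanation for the wrong result of the first rewrite can be sketched on toy data. The example below uses Python's built-in sqlite3 module. The table contents and the trimmed-down column set are assumptions made for illustration, not the real data, and only the count is selected. It suggests that grouping by BAA531 and then joining with an OR condition lets one Q row match several groups: the row is duplicated, and each match carries a per-group count instead of one count across all matching BAA531 values.

```python
import sqlite3

# Hypothetical toy data: one Q row with LTXBZ = '1', and two t2 rows for the
# same AAZ661, one with BAA531 = '1' and one with BAA531 = '9'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Q  (AAZ661 TEXT, LTXBZ TEXT);
CREATE TABLE t1 (AAA345 TEXT, AAZ661 TEXT);
CREATE TABLE t2 (AAA345 TEXT, BAA531 TEXT, BAA526 TEXT);
INSERT INTO Q  VALUES ('S1', '1');
INSERT INTO t1 VALUES ('K1', 'S1');
INSERT INTO t2 VALUES ('K1', '1', '3'), ('K1', '9', '4');
""")

# Original form: the correlated scalar subquery yields exactly one value per
# Q row, counted across every t2 row that satisfies the OR condition.
scalar_rows = conn.execute("""
SELECT Q.AAZ661,
       (SELECT COUNT(DISTINCT B.BAA526)
          FROM t1 A JOIN t2 B ON A.AAA345 = B.AAA345
         WHERE A.AAZ661 = Q.AAZ661
           AND (B.BAA531 = Q.LTXBZ OR B.BAA531 = '9' OR Q.LTXBZ = '9')
           AND B.BAA526 IN ('3', '4')) AS cnt
  FROM Q
""").fetchall()

# First rewrite: counts are computed per (AAZ661, BAA531) group, and the OR
# condition lets the single Q row join to both groups.
join_rows = conn.execute("""
SELECT Q.AAZ661, V.cnt
  FROM Q
  LEFT JOIN (SELECT A.AAZ661, B.BAA531,
                    COUNT(DISTINCT B.BAA526) AS cnt
               FROM t1 A JOIN t2 B ON A.AAA345 = B.AAA345
              WHERE B.BAA526 IN ('3', '4')
              GROUP BY A.AAZ661, B.BAA531) V
    ON Q.AAZ661 = V.AAZ661
   AND (Q.LTXBZ = V.BAA531 OR V.BAA531 = '9' OR Q.LTXBZ = '9')
""").fetchall()

print(scalar_rows)  # [('S1', 2)]
print(join_rows)    # [('S1', 1), ('S1', 1)]
```

On this data the scalar subquery sees a count of 2, while the join produces two rows that each see a count of 1. In the full query that would both skip the `= 2` branch and add the row's amount to the SUM twice.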

SELECT 
    Q.AAZ661 AS ZFLSH, 
    '1' AS ZFXH, 
    Q.AAE002 AS TIMESTAMP, 
    Q.AAE924 AS JSXM,
    SUM(
        CASE 
            WHEN V.COUNT_BAA526 = 2 
            THEN 
                CASE 
                    WHEN Q.BAA526 = '4' THEN Q.AAE774 
                    WHEN Q.AAE924 NOT IN ('7004', '7008', '7009', '7010', '7011', '7029') AND Q.BAA526 = '5' THEN Q.AAE019 - Q.ZZZHBF 
                    ELSE Q.AAE019 
                END 
            ELSE Q.AAE019 
        END
    ) AS JE 
FROM Q 
LEFT JOIN (
    SELECT 
        A.AAZ661,
        COUNT(DISTINCT B.BAA526) AS COUNT_BAA526,
        MAX(B.BAA531) AS BAA531
    FROM t1 A
    JOIN t2 B ON A.AAA345 = B.AAA345
    WHERE B.BAA526 IN ('3', '4')
    GROUP BY A.AAZ661
) V ON Q.AAZ661 = V.AAZ661 
   AND (V.BAA531 = Q.LTXBZ OR V.BAA531 = '9' OR Q.LTXBZ = '9')
GROUP BY Q.AAZ661, Q.AAE002, Q.AAE924;
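
As a sanity check on the MAX(B.BAA531) version, the sketch below again uses Python's sqlite3 module with hypothetical toy data (an assumption, not the real tables). It suggests that this version may not be equivalent to the scalar subquery in general: when one AAZ661 has several different BAA531 values, the count covers all of them, while the join condition looks only at the largest one. It may match the original only when the real data never contains such cases. The third query, labelled KEYED_JOIN here, is one possible equivalent form and not a tested replacement: it evaluates the correlated condition once per distinct (AAZ661, LTXBZ) pair and joins back with plain equality. It assumes Q.LTXBZ is never NULL, and its performance on the real tables is unknown.

```python
import sqlite3

# Hypothetical toy data: the same AAZ661 has BAA531 values '1' and '2'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Q  (AAZ661 TEXT, LTXBZ TEXT);
CREATE TABLE t1 (AAA345 TEXT, AAZ661 TEXT);
CREATE TABLE t2 (AAA345 TEXT, BAA531 TEXT, BAA526 TEXT);
INSERT INTO Q  VALUES ('S1', '2'), ('S1', '9');
INSERT INTO t1 VALUES ('K1', 'S1');
INSERT INTO t2 VALUES ('K1', '1', '3'), ('K1', '2', '4');
""")

# Original correlated scalar subquery (count only).
SCALAR = """
SELECT Q.LTXBZ,
       (SELECT COUNT(DISTINCT B.BAA526)
          FROM t1 A JOIN t2 B ON A.AAA345 = B.AAA345
         WHERE A.AAZ661 = Q.AAZ661
           AND (B.BAA531 = Q.LTXBZ OR B.BAA531 = '9' OR Q.LTXBZ = '9')
           AND B.BAA526 IN ('3', '4'))
  FROM Q ORDER BY Q.LTXBZ
"""

# MAX(B.BAA531) version: one group per AAZ661, filtered only at join time.
MAX_JOIN = """
SELECT Q.LTXBZ, V.cnt
  FROM Q
  LEFT JOIN (SELECT A.AAZ661,
                    COUNT(DISTINCT B.BAA526) AS cnt,
                    MAX(B.BAA531) AS BAA531
               FROM t1 A JOIN t2 B ON A.AAA345 = B.AAA345
              WHERE B.BAA526 IN ('3', '4')
              GROUP BY A.AAZ661) V
    ON Q.AAZ661 = V.AAZ661
   AND (V.BAA531 = Q.LTXBZ OR V.BAA531 = '9' OR Q.LTXBZ = '9')
 ORDER BY Q.LTXBZ
"""

# Sketch of an equivalent form: apply the OR condition before aggregating,
# once per distinct (AAZ661, LTXBZ), then join back on equality.
# V.cnt is NULL instead of 0 when nothing matches; "= 2" is false either way.
KEYED_JOIN = """
SELECT Q.LTXBZ, V.cnt
  FROM Q
  LEFT JOIN (SELECT K.AAZ661, K.LTXBZ,
                    COUNT(DISTINCT B.BAA526) AS cnt
               FROM (SELECT DISTINCT AAZ661, LTXBZ FROM Q) K
               JOIN t1 A ON A.AAZ661 = K.AAZ661
               JOIN t2 B ON B.AAA345 = A.AAA345
              WHERE B.BAA526 IN ('3', '4')
                AND (B.BAA531 = K.LTXBZ OR B.BAA531 = '9' OR K.LTXBZ = '9')
              GROUP BY K.AAZ661, K.LTXBZ) V
    ON V.AAZ661 = Q.AAZ661 AND V.LTXBZ = Q.LTXBZ
 ORDER BY Q.LTXBZ
"""

scalar_rows = conn.execute(SCALAR).fetchall()
max_rows = conn.execute(MAX_JOIN).fetchall()
keyed_rows = conn.execute(KEYED_JOIN).fetchall()

print(scalar_rows)  # [('2', 1), ('9', 2)]
print(max_rows)     # [('2', 2), ('9', 2)]
print(keyed_rows)   # [('2', 1), ('9', 2)]
```

For the Q row with LTXBZ = '2', the scalar subquery counts only the t2 row whose BAA531 is '2', while the MAX version counts both t2 rows because the largest BAA531 happens to equal '2'. Because the keyed form has at most one V row per (AAZ661, LTXBZ), it cannot duplicate Q rows.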