分享一次公司晋级考试的SQL题目，非常有趣的案例（postgresql 标量子查询 where like 谓词过滤条件） - 服务器托管|北京服务器租用|机房托管租用|IDC托管租用|机房机柜带宽租用-价格及费用咨询

同事今天晋级高级工程师考试，发来一道公司出题目让我帮忙进行优化，其中场景二的案例非常有意思。

题目内容如下：

原始SQL：

scott=> explain analyze 
scott-> select 
scott-> a.id,
scott-> a.col2,
scott-> (select sum(b.id) from table02 b where a.col2 like b.col2||'%' )
scott-> from table01 a;


                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on table01 a  (cost=0.00..3905341.00 rows=100000 width=45) (actual time=0.579..50568.090 rows=100000 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=39.02..39.03 rows=1 width=8) (actual time服务器托管网=0.504..0.504 rows=1 loops=100000)
           ->  Seq Scan on table02 b  (cost=0.00..39.00 rows=10 width=4) (actual time=0.063..0.499 rows=8 loops=100000)
                 Filter: (a.col2 ~~ (col2 || '%'::text))
                 Rows Removed by Filter: 1992
 Planning Time: 0.097 ms
 Execution Time: 50590.882 ms
(8 行记录)

时间：50591.756 ms (00:50.592)

table01、table02 这两张表没有创建任何索引，全表扫描+标量子查询SQL执行需要50s才能出结果，速度非常慢。

考题要求要优化这条SQL，意思既是无论是调整 postgresql数据库的参数，对SQL加索引，等价改写SQL，这些手段都没问题，只要能让执行速度变快就行。

由于当时我在忙其他的事情，大致看了一眼后给出了以下的改写方案（我没加索引，感觉加索引的用处不大）：

改写1：

scott=> explain analyze select 
scott-> a.id,
scott-> a.col2,
scott-> b.sum_b_id
scott-> from table01 a 
scott-> left join (select sum(b.id) sum_b_id,b.col2 from table02 b group by b.col2) b 
scott-> ON a.col2 like b.col2||'%'
scott-> ;

                                                    QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=39.00..448135.74 rows=127500 width=45) (actual time=1.283..8674.517 rows=100000 loops=1)
   Join Filter: (a.col2 ~~ (b.col2 || '%'::text))
   Rows Removed by Join Filter: 25400387
   ->  Seq Scan on table01 a  (cost=0.00..1841.00 rows=100000 width=37) (actual time=0.018..19.620 rows=100000 loops=1)
   ->  Materialize  (cost=39.00..45.37 rows=255 width=11) (actual time=0.000..0.024 rows=255 loops=100000)
         ->  HashAggregate  (cost=39.00..41.55 rows=255 width=11) (actual time=1.241..1.316 rows=255 loops=1)
               Group Key: b.col2
               ->  Seq Scan on table02 b  (cost=0.00..29.00 rows=2000 width=7) (actual time=0.007..0.342 rows=2000 loops=1)
 Planning Time: 0.181 ms
 Execution Time: 8682.974 ms
(10 行记录)

时间：8684.338 ms (00:08.684)

可以看到等价改写以后，SQL从原来执行 50s 的时间已经降低到 8.8s 左右，提升还是挺大的。

把答案给了同事，我服务器托管网也去忙其他的事情了。

晚上我闲下来没事做，贼无聊，仔细看了下改写1 SQL的执行计划，感觉这种计划可能不是最优的执行计划。

因为我始终感觉走HASH可能才是最佳的执行计划，如果这条SQL在ORACLE 上执行，CBO很大可能会让计划走HASH，但是在PG就是走NL（脑残优化器）。

吃完饭后一直在尝试改写，搞了哥很长时间，最终还是把HASH版本的SQL给改出来了，泪目。

改写2：

scott=> explain analyze 
scott-> with a as (select id,col2,substr(t1.col2,1,x.rn) rn1 from table01 t1,
scott(> 

scott(>   (select min(length(col2)) rn from table02) x),
t2.col2,1,x.rn) rn2 from (select sum(b.id) sum_b_id,b.col2 from table02 b group by b.col2) t2,                                
scott(>       (select min(length(col2)) rn from table02) x)
scott-> select a.id,a.col2,b.sum_b_id from a 
scott-> left join b on a.rn1 = b.rn2 and a.col2 like b.col2||'%';
                                                                QUERY PLAN                                                        
        
----------------------------------------------------------------------------------------------------------------------------------
--------
 Hash Left Join  (cost=127.86..14881.38 rows=100000 width=45) (actual time=2.322..215.695 rows=100000 loops=1)
   Hash Cond: (substr(t1.col2, 1, (min(length(table02.col2)))) = substr(b.col2, 1, (min(length(table02_1.col2)))))
   Join Filter: (t1.col2 ~~ (b.col2 || '%'::text))
   ->  Nested Loop  (cost=39.00..2880.02 rows=100000 width=41) (actual time=0.588..36.635 rows=100000 loops=1)
         ->  Aggregate  (cost=39.00..39.01 rows=1 width=4) (actual time=0.580..0.582 rows=1 loops=1)
               ->  Seq Scan on table02  (cost=0.00..29.00 rows=2000 width=3) (actual time=0.015..0.252 rows=2000 loops=1)
         ->  Seq Scan on table01 t1  (cost=0.00..1841.00 rows=100000 width=37) (actual time=0.005..15.073 rows=100000 loops=1)
   ->  Hash  (cost=85.67..85.67 rows=255 width=15) (actual time=1.721..1.724 rows=255 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 20kB
         ->  Nested Loop  (cost=78.00..85.67 rows=255 width=15) (actual time=1.500..1.602 rows=255 loops=1)
               ->  Aggregate  (cost=39.00..39.01 rows=1 width=4) (actual time=0.554..0.555 rows=1 loops=1)
                     ->  Seq Scan on table02 table02_1  (cost=0.00..29.00 rows=2000 width=3) (actual time=0.004..0.233 rows=2000 l
oops=1)
               ->  HashAggregate  (cost=39.00..41.55 rows=255 width=11) (actual time=0.945..1.002 rows=255 loops=1)
                     Group Key: b.col2
                     ->  Seq Scan on table02 b  (cost=0.00..29.00 rows=2000 width=7) (actual time=0.005..0.250 rows=2000 loops=1)
 Planning Time: 0.351 ms
 Execution Time: 224.017 ms
(17 行记录)

时间：225.488 ms

这个案例从最早的 50秒改写到 8秒，到最后的 225毫秒出结果，花了不少时间研究各种改写方式。

只能说PG的优化器确实太拉跨了，要用CTE才能让优化器走HASH连接，浪费开发者不少时间，换成ORACLE数据库不会走这种SB执行计划。

以后估计会很少发博客，正在考虑转行卖炒粉，现在的市场真的是一言难尽。

服务器托管，北京服务器托管，服务器租用 http://www.fwqtg.net
机房租用，北京机房租用，IDC机房托管， http://www.fwqtg.net

相关推荐: Vue组件懒加载

在当今快节奏的数字世界中，网站性能对于吸引用户和取得成功至关重要。然而，对于像首页这样的页面，在不影响功能的前提下优化性能就成了一项挑战。这就是 Vue 组件懒加载的用武之地。通过将非必要元素的加载推迟到可见时进行，开发人员可以增强用户体验，同时确保登陆页面…

服务器托管，北京服务器托管，服务器租用，机房机柜带宽租用