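Background

While developing a project, the customer required us to use the GBase 8s database (based on Informix), and a simple paginated page responded very slowly. Investigation showed that the paging SQL first ran the full query, wrapped it in an outer query, and only then took the requested number of rows. Removing the wrapper and applying the row limit directly made the query fast. In day-to-day use PostgreSQL is fast even without this manual change. Why?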

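Overview

In early database implementations, the query optimizer generally executed subqueries in a nested fashion: the subquery was executed once for every row of the parent query. The subquery therefore ran many times, which was very inefficient.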
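The cost difference between nested execution and a join-style strategy can be sketched in a few lines of Python. This is only an illustrative model, not how any particular database is implemented: the function names and the integer "rows" are assumptions made up for the example, and the work counters simply count comparisons or probes.

```python
def nested_in(parent_rows, child_rows):
    """Naive strategy: rescan the child rows for every parent row."""
    comparisons = 0
    result = []
    for p in parent_rows:
        for c in child_rows:
            comparisons += 1
            if p == c:
                result.append(p)
                break  # semi-join semantics: one match is enough
    return result, comparisons


def hash_semi_join(parent_rows, child_rows):
    """Build a hash set over the child once, then probe it once per parent row."""
    lookup = set(child_rows)
    probes = 0
    result = []
    for p in parent_rows:
        probes += 1
        if p in lookup:
            result.append(p)
    return result, probes


parent = list(range(1000))        # hypothetical parent keys
child = list(range(500, 1500))    # hypothetical subquery keys

rows_nested, work_nested = nested_in(parent, child)
rows_hash, work_hash = hash_semi_join(parent, child)

print(rows_nested == rows_hash)   # both strategies return the same rows
print(work_nested, work_hash)     # but do very different amounts of work
```

Both functions return the same rows; only the amount of work differs, which is the effect the execution plans in the rest of this article make visible.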

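This article focuses on how PostgreSQL optimizes subqueries.

Subqueries are used in a great many places in the project, so understanding subquery optimization is essential for writing efficient SQL.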

Execution plan comparison (GBase 8s vs PostgreSQL):

GBase 8s execution plan for the slow SQL:

-- GBase 8s execution plan
SET EXPLAIN ON ; 
SET EXPLAIN FILE TO '/home/gbasedbt/sqexplain.out' ;
select skip 0 first 15 * from (
select  * from T_SZGL_JDRY order by T_SZGL_JDRY.updatetime desc
)
Estimated Cost: 3207
Estimated # of Rows Returned: 6172

  1) gbasedbt.t_szgl_jdry: INDEX PATH
    (1) Index Name: gbasedbt.i_t_szgl_jdry_updatetime
        Index Keys: updatetime  (Reverse)  (Serial, fragments: ALL)
QUERY: (OPTIMIZATION TIMESTAMP: 12-21-2017 03:20:43)
------
select skip 0 first 15 * from (
select  * from T_SZGL_JDRY order by T_SZGL_JDRY.updatetime desc
)
Estimated Cost: 232
Estimated # of Rows Returned: 6172
  1) (Temp Table For Collection Subquery): SEQUENTIAL SCAN
Query statistics:
-----------------
The final cost of the plan is reduced because of the FIRST n specification in
 the query.

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                t_szgl_jdry
  t2                (Temp Table For Collection Subquery)
  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     6173       6172      6173       00:00.05   3207    
-- Query took 222 ms, 15 rows affected

GBase 8s execution plan after the change

select  skip 0 first 15 * from T_SZGL_JDRY order by T_SZGL_JDRY.updatetime desc

Estimated Cost: 7
Estimated # of Rows Returned: 6172

  1) gbasedbt.t_szgl_jdry: INDEX PATH
    (1) Index Name: gbasedbt.i_t_szgl_jdry_updatetime
        Index Keys: updatetime  (Reverse)  (Serial, fragments: ALL)
Query statistics:
-----------------
The final cost of the plan is reduced because of the FIRST n specification in
 the query.

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                t_szgl_jdry

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     15         6172      15         00:00.00   8       

QUERY: (OPTIMIZATION TIMESTAMP: 12-21-2017 03:23:25)
------
select 1 from sysusers
Estimated Cost: 2
Estimated # of Rows Returned: 1
  1) gbasedbt.sysusers: SEQUENTIAL SCAN
...
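-- Query took 18 ms, 15 rows affected

In the first execution plan, the line "1) (Temp Table For Collection Subquery): SEQUENTIAL SCAN" shows that the subquery result is produced in full first (all 6173 rows are scanned), and the 15 rows are then taken from that intermediate result.

For comparison, the PostgreSQL execution plans

-- Paging plan, no nesting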
db_jcxxzypt=# explain select * from db_jcxx.t_jcxxzy_tjaj order by d_slrq limit 15 offset 0;
                                                QUERY PLAN                                                 
-------------------------------------------------------------------------
 Limit  (cost=0.44..28.17 rows=15 width=879)
   ->  Index Scan using idx_ttjaj_dslrq on t_jcxxzy_tjaj  (cost=0.44..32374439.85 rows=17507700 width=879)
(2 rows)
-- Subquery plan, one level of nesting
db_jcxxzypt=# explain 
db_jcxxzypt-# select * from (
db_jcxxzypt(# select * from db_jcxx.t_jcxxzy_tjaj order by d_slrq
db_jcxxzypt(# )tab1 limit 15 offset 0;
                                                QUERY PLAN                                                 
-------------------------------------------------------------------------
 Limit  (cost=0.44..28.32 rows=15 width=879)
   ->  Index Scan using idx_ttjaj_dslrq on t_jcxxzy_tjaj  (cost=0.44..32374439.85 rows=17507700 width=879)
(2 rows)

-- Subquery plan, two levels of nesting
db_jcxxzypt=# explain 
db_jcxxzypt-# select * from (
db_jcxxzypt(# select * from (
db_jcxxzypt(# select * from db_jcxx.t_jcxxzy_tjaj order by d_slrq
db_jcxxzypt(# )tab1 )tab2 limit 15 offset 0;
                                                QUERY PLAN                                                 
-------------------------------------------------------------------------
 Limit  (cost=0.44..28.32 rows=15 width=879)
   ->  Index Scan using idx_ttjaj_dslrq on t_jcxxzy_tjaj  (cost=0.44..32374439.85 rows=17507700 width=879)
(2 rows)

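In PostgreSQL the execution plan stays the same as the un-nested one even when the subquery is nested several levels deep. The reason is that PostgreSQL pulls subqueries up (subquery pull-up) while logically rewriting the query before plan generation, merging each subquery into its parent query.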
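The idea of pull-up can be illustrated with a toy model in Python. This is a deliberately simplified sketch and not PostgreSQL's actual algorithm: the Query class, its three fields, and the single flattening rule (an inner LIMIT blocks the merge) are all assumptions invented for the example.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Query:
    """Toy query: SELECT * FROM source [ORDER BY order_by] [LIMIT limit]."""
    source: Union[str, "Query"]      # a table name or a nested subquery
    order_by: Optional[str] = None
    limit: Optional[int] = None


def pull_up(q: Query) -> Query:
    """Merge 'SELECT * FROM (subquery)' wrappers into their parent."""
    if isinstance(q.source, str):
        return q                     # already reads from a base table
    inner = pull_up(q.source)
    if inner.limit is not None:
        # Toy rule: a LIMIT inside the subquery changes the result,
        # so the subquery must stay a separate step.
        return Query(inner, q.order_by, q.limit)
    # Outer ORDER BY (if any) wins; otherwise keep the inner ordering.
    merged_order = q.order_by if q.order_by is not None else inner.order_by
    return Query(inner.source, merged_order, q.limit)


# Two levels of nesting around an ordered scan, with the LIMIT outside.
nested = Query(Query(Query("t_jcxxzy_tjaj", order_by="d_slrq")), limit=15)
flat = pull_up(nested)
print(flat)

# A LIMIT inside the subquery prevents flattening in this toy model.
blocked = Query(Query("t_jcxxzy_ajdsr", limit=10), limit=5)
print(pull_up(blocked) == blocked)
```

After flattening, the nested query has the same shape as the un-nested one, which mirrors why the three PostgreSQL plans above are identical.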

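PostgreSQL subquery optimization

Subqueries fall into three classes: (1) [NOT] IN / ALL / ANY / SOME, (2) [NOT] EXISTS, and (3) other subqueries (SPJ subqueries: select, project, join).

A subquery can appear in the target list, the FROM clause, the WHERE clause, the JOIN/ON clause, the GROUP BY clause, the HAVING clause, the ORDER BY clause, and so on.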

db_jcxxzypt=#  explain select * from t_jcxxzy_tjaj aj ,(select * from t_jcxxzy_ajdsr) dsr where  dsr.c_ajbm = '1301020400000120090101';
                                      QUERY PLAN                                       
-------------------------------------------------------------------------
 Nested Loop  (cost=0.56..1252119.58 rows=17507700 width=1098)
   ->  Index Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr  (cost=0.56..8.57 rows=1 width=219)
         Index Cond: (c_ajbm = '1301020400000120090101'::bpchar)
   ->  Seq Scan on t_jcxxzy_tjaj aj  (cost=0.00..1077034.00 rows=17507700 width=879)
(4 rows)

Time: 1.101 ms

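PostgreSQL sublinks ([NOT] IN, [NOT] EXISTS, ALL, SOME, ANY)

Difference between a subquery and a sublink: a subquery is a nested query that does not appear inside an expression (for example, one in the FROM clause), whereas a sublink is a nested query that appears inside an expression.

-- IN sublink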
(1).
db_jcxxzypt=# explain select * from t_jcxxzy_tjaj aj where aj.c_ajbm in (select dsr.c_ajbm from t_jcxxzy_ajdsr dsr);
Conceptually rewritten as a semi join (pseudo-syntax, PostgreSQL has no SEMI JOIN keyword): select aj.* from t_jcxxzy_tjaj aj semi join t_jcxxzy_ajdsr dsr on aj.c_ajbm = dsr.c_ajbm;
                                                     QUERY PLAN                                             
-------------------------------------------------------------------------
 Hash Semi Join  (cost=362618.61..5537768.07 rows=7957409 width=879)
   Hash Cond: (t_jcxxzy_tjaj.c_ajbm = t_jcxxzy_ajdsr.c_ajbm)
   ->  Seq Scan on t_jcxxzy_tjaj  (cost=0.00..1077034.00 rows=17507700 width=879)
   ->  Hash  (cost=237458.59..237458.59 rows=6817202 width=23)
         ->  Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr  (cost=0.56..237458.59 rows=6817202 width=23)
(5 rows)
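-- IN is equivalent to = ANY
Hash Semi Join means the two tables are joined with a hash semi join.
The original SQL contains no condition (t_jcxxzy_tjaj.c_ajbm = t_jcxxzy_ajdsr.c_ajbm). Its appearance as the Hash Cond shows that the IN subquery was optimized into a join, executed with the hash semi join algorithm.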
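(2). Correlated subquery
-- Once the condition where aj.d_slrq='2001-06-14' is added inside the subquery, the sublink can no longer be pulled up. If the same condition is placed in the parent query instead, the sublink optimization is supported.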
db_jcxxzypt=# explain
db_jcxxzypt-# select * from t_jcxxzy_tjaj aj where c_ajbm in (select c_ajbm from t_jcxxzy_ajdsr dsr  where aj.d_slrq='2001-06-14') ;
                                                        QUERY PLAN                                                      
-------------------------------------------------------------------------
 Seq Scan on t_jcxxzy_tjaj aj  (cost=0.00..2227874766580.75 rows=8753850 width=879)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Result  (cost=0.56..237458.59 rows=6817202 width=23)
           One-Time Filter: (aj.d_slrq = '2001-06-14'::date)
           ->  Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr dsr  (cost=0.56..237458.59 rows=6817202 width=23)
(6 rows)
 (3).
 -- A NOT IN sublink cannot be pulled up
 db_jcxxzypt=#  explain select * from db_jcxx.t_jcxxzy_tjaj where c_ajbm not in (select c_ajbm from db_jcxx.t_jcxxzy_ajdsr);
                                                      QUERY PLAN                                                     
-------------------------------------------------------------------------
 Seq Scan on t_jcxxzy_tjaj  (cost=0.56..2875921362927.06 rows=8753850 width=879)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=0.56..311489.60 rows=6817202 width=23)
           ->  Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr  (cost=0.56..237458.59 rows=6817202 width=23)
(5 rows)
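 -- NOT IN has the same meaning as <> ALL

An IN sublink is not always optimized: when the IN subquery references a column of the outer query's table, that is, when it is correlated with the outer query, the sublink cannot be pulled up.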
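Two points from the IN examples can be checked with a small runnable sketch: an IN sublink behaves like a semi join rather than a plain join, and NOT IN has NULL semantics that are the reason usually given for why it cannot simply be turned into a join. The sketch uses Python's built-in sqlite3 module as a stand-in engine (an assumption for portability, it is not PostgreSQL), and the two tiny tables aj and dsr contain made-up data that only borrows the column name c_ajbm. It demonstrates result semantics only, not planner behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aj  (c_ajbm TEXT);
    CREATE TABLE dsr (c_ajbm TEXT);
    INSERT INTO aj  VALUES ('A1'), ('A2'), ('A3');
    -- 'A1' appears twice in dsr (hypothetical: one case, two parties)
    INSERT INTO dsr VALUES ('A1'), ('A1'), ('A2');
""")


def rows(sql):
    """Run a query and return the first column of every row."""
    return [r[0] for r in conn.execute(sql)]


# IN keeps each parent row at most once (semi-join semantics) ...
in_rows = rows(
    "SELECT aj.c_ajbm FROM aj "
    "WHERE aj.c_ajbm IN (SELECT c_ajbm FROM dsr) ORDER BY 1"
)
# ... while a plain inner join repeats a parent row once per match.
join_rows = rows(
    "SELECT aj.c_ajbm FROM aj "
    "JOIN dsr ON aj.c_ajbm = dsr.c_ajbm ORDER BY 1"
)
print(in_rows, join_rows)

# NOT IN behaves as expected while the subquery returns no NULL ...
not_in_before = rows(
    "SELECT c_ajbm FROM aj WHERE c_ajbm NOT IN (SELECT c_ajbm FROM dsr)"
)
# ... but a single NULL in the subquery makes NOT IN return no rows at all.
conn.execute("INSERT INTO dsr VALUES (NULL)")
not_in_after = rows(
    "SELECT c_ajbm FROM aj WHERE c_ajbm NOT IN (SELECT c_ajbm FROM dsr)"
)
# A correlated NOT EXISTS is unaffected by the NULL.
not_exists_after = rows(
    "SELECT c_ajbm FROM aj WHERE NOT EXISTS "
    "(SELECT 1 FROM dsr WHERE dsr.c_ajbm = aj.c_ajbm)"
)
print(not_in_before, not_in_after, not_exists_after)
```

Because NOT IN and NOT EXISTS can return different rows when NULLs are present, a rewrite of NOT IN into an ordinary join form is only safe when the compared columns are known to be non-NULL.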

EXISTS sublinks

-- EXISTS sublink
db_jcxxzypt=# explain
db_jcxxzypt-# select * from t_jcxxzy_tjaj aj where  exists (select c_ajbm from t_jcxxzy_ajdsr dsr where aj.c_ajbm = dsr.c_ajbm);
                                                       QUERY PLAN
-------------------------------------------------------------------------
 Hash Semi Join  (cost=362618.61..5537768.07 rows=7957409 width=879)
   Hash Cond: (aj.c_ajbm = dsr.c_ajbm)
   ->  Seq Scan on t_jcxxzy_tjaj aj  (cost=0.00..1077034.00 rows=17507700 width=879)
   ->  Hash  (cost=237458.59..237458.59 rows=6817202 width=23)
         ->  Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr dsr  (cost=0.56..237458.59 rows=6817202 width=23)
(5 rows)
-- When the condition where aj.c_xzdm = '150622' is added inside the sublink, pull-up is still supported
db_jcxxzypt=# explain
db_jcxxzypt-# select * from t_jcxxzy_tjaj aj where  exists (select c_ajbm from t_jcxxzy_ajdsr dsr where aj.c_xzdm = '150622');
                                                   QUERY PLAN
-------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.56..1361779.20 rows=5436 width=879)
   ->  Seq Scan on t_jcxxzy_tjaj aj  (cost=0.00..1120803.25 rows=5436 width=879)
         Filter: ((c_xzdm)::text = '150622'::text)
   ->  Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr dsr  (cost=0.56..237458.59 rows=6817202 width=0)
(4 rows)
-- EXISTS sublink (subquery does not reference the outer query)
db_jcxxzypt=# explain
db_jcxxzypt-# select * from t_jcxxzy_tjaj aj where  exists (select c_ajbm from t_jcxxzy_ajdsr dsr where dsr.c_ajbm='1101120300000120030101')
db_jcxxzypt-# ;
                                               QUERY PLAN                                               
-------------------------------------------------------------------------
 Result  (cost=4.58..1077038.57 rows=17507700 width=879)
   One-Time Filter: $0
   InitPlan 1 (returns $0)
     ->  Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr dsr  (cost=0.56..4.58 rows=1 width=0)
           Index Cond: (c_ajbm = '1101120300000120030101'::bpchar)
   ->  Seq Scan on t_jcxxzy_tjaj aj  (cost=0.00..1077034.00 rows=17507700 width=879)
(6 rows)
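The subquery is executed only once (as InitPlan 1), and its result is passed as the parameter $0 to the scan of aj.
-- NOT EXISTS sublink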
db_jcxxzypt=# explain
db_jcxxzypt-# select * from t_jcxxzy_tjaj aj where not exists (select c_ajbm from t_jcxxzy_ajdsr dsr);
                                     QUERY PLAN                                      
-------------------------------------------------------------------------
 Result  (cost=0.04..1077034.04 rows=17507700 width=879)
   One-Time Filter: (NOT $0)
   InitPlan 1 (returns $0)
     ->  Seq Scan on t_jcxxzy_ajdsr dsr  (cost=0.00..281210.02 rows=6817202 width=0)
   ->  Seq Scan on t_jcxxzy_tjaj aj  (cost=0.00..1077034.00 rows=17507700 width=879)
(5 rows)
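The plan shows that the NOT EXISTS subquery was not eliminated. It is executed just once, and its result is used as a parameter for the scan of aj. Note that the subquery in this example does not reference aj, which is what allows it to run once as an InitPlan.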

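Both IN and EXISTS may fail to be optimized. As a rule of thumb for choosing between them: prefer EXISTS when the parent query's result set is smaller than the subquery's, and prefer IN when the parent query's result set is larger.

No ALL sublink supports pull-up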

db_jcxxzypt=#  explain select * from db_jcxx.t_jcxxzy_tjaj where c_ajbm >all(select c_ajbm from db_jcxx.t_jcxxzy_ajdsr);
                                                      QUERY PLAN                                            
-------------------------------------------------------------------------
 Seq Scan on t_jcxxzy_tjaj  (cost=0.56..2875921362927.06 rows=8753850 width=879)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Materialize  (cost=0.56..311489.60 rows=6817202 width=23)
           ->  Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr  (cost=0.56..237458.59 rows=6817202 width=23)
(5 rows)

db_jcxxzypt=#  explain select * from db_jcxx.t_jcxxzy_tjaj where c_ajbm =all(select c_ajbm from db_jcxx.t_jcxxzy_ajdsr);
                                                      QUERY PLAN
-------------------------------------------------------------------------
 Seq Scan on t_jcxxzy_tjaj  (cost=0.56..2875921362927.06 rows=8753850 width=879)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Materialize  (cost=0.56..311489.60 rows=6817202 width=23)
           ->  Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr  (cost=0.56..237458.59 rows=6817202 width=23)
(5 rows)

db_jcxxzypt=#  explain select * from db_jcxx.t_jcxxzy_tjaj where c_ajbm <all(select c_ajbm from db_jcxx.t_jcxxzy_ajdsr);
                                                      QUERY PLAN                                                   
-------------------------------------------------------------------------
 Seq Scan on t_jcxxzy_tjaj  (cost=0.56..2875921362927.06 rows=8753850 width=879)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Materialize  (cost=0.56..311489.60 rows=6817202 width=23)
           ->  Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr  (cost=0.56..237458.59 rows=6817202 width=23)
(5 rows)
Every ALL query above is executed as a subquery (SubPlan) and is not pulled up.

SOME / ANY

-- SOME and ANY are equivalent
db_jcxxzypt=#explain select * from db_jcxx.t_jcxxzy_tjaj where c_ajbm  >some(select c_ajbm from db_jcxx.t_jcxxzy_ajdsr);
                                                 QUERY PLAN
-------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.56..11316607.35 rows=5835900 width=879)
   ->  Seq Scan on t_jcxxzy_tjaj  (cost=0.00..1077034.00 rows=17507700 width=879)
   ->  Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr  (cost=0.56..64266.97 rows=2272401 width=23)
         Index Cond: (c_ajbm < t_jcxxzy_tjaj.c_ajbm)
(4 rows)

db_jcxxzypt=#explain select * from db_jcxx.t_jcxxzy_tjaj where c_ajbm  =some(select c_ajbm from db_jcxx.t_jcxxzy_ajdsr);
                                                     QUERY PLAN
-------------------------------------------------------------------------
 Hash Semi Join  (cost=362618.61..5537768.07 rows=7957409 width=879)
   Hash Cond: (t_jcxxzy_tjaj.c_ajbm = t_jcxxzy_ajdsr.c_ajbm)
   ->  Seq Scan on t_jcxxzy_tjaj  (cost=0.00..1077034.00 rows=17507700 width=879)
   ->  Hash  (cost=237458.59..237458.59 rows=6817202 width=23)
         ->  Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr  (cost=0.56..237458.59 rows=6817202 width=23)
(5 rows)

db_jcxxzypt=#explain select * from db_jcxx.t_jcxxzy_tjaj where c_ajbm  <some(select c_ajbm from db_jcxx.t_jcxxzy_ajdsr);
                                                 QUERY PLAN                                                 
-------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.56..11316607.35 rows=5835900 width=879)
   ->  Seq Scan on t_jcxxzy_tjaj  (cost=0.00..1077034.00 rows=17507700 width=879)
   ->  Index Only Scan using idx_tajdsr_cajbm on t_jcxxzy_ajdsr  (cost=0.56..64266.97 rows=2272401 width=23)
         Index Cond: (c_ajbm > t_jcxxzy_tjaj.c_ajbm)
(4 rows)
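-- No SubPlan appears in the SOME plans: the dsr table is pulled up into the parent query and joined with aj using a nested loop semi join or a hash semi join

Among these queries, ALL sublinks are never pulled up, while IN and EXISTS sublinks may or may not be pulled up.

Subqueries that cannot be pulled up

Pull-up is not supported for subqueries that contain a WITH clause (CTE), set operations, aggregation (aggregate functions, GROUP BY, DISTINCT), HAVING, LIMIT/OFFSET, and similar clauses.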

db_jcxxzypt=# explain select * from t_jcxxzy_tjaj aj ,(select * from t_jcxxzy_ajdsr limit 10) dsr where  dsr.c_ajbm = '1301020400000120090101';
                                         QUERY PLAN                                          
-------------------------------------------------------------------------
 Nested Loop  (cost=0.00..1252111.54 rows=17507700 width=1098)
   ->  Subquery Scan on dsr  (cost=0.00..0.54 rows=1 width=219)
         Filter: (dsr.c_ajbm = '1301020400000120090101'::bpchar)
         ->  Limit  (cost=0.00..0.41 rows=10 width=219)
               ->  Seq Scan on t_jcxxzy_ajdsr  (cost=0.00..281210.02 rows=6817202 width=219)
   ->  Seq Scan on t_jcxxzy_tjaj aj  (cost=0.00..1077034.00 rows=17507700 width=879)
(6 rows)

Time: 0.958 ms

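After a subquery is pulled up, what happens to the join order among the parent query's tables? Does anything change?

For a subquery that is pulled up, abase merges the subquery's relations into the main from-list. The number of relations therefore grows, and the multi-table join-order algorithm generates more join paths. For example, with three tables A, B and C there are six ways to pick the first two tables to join: {A,B}, {A,C}, {B,A}, {B,C}, {C,A}, {C,B}.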
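The growth in candidate join orders can be counted with a short Python sketch. It only enumerates orderings to show how quickly the search space grows as relations are added to the from-list. It is not how a real planner searches (planners prune the space rather than list every permutation), and the table names are the placeholders from the text.

```python
from itertools import permutations
from math import factorial

tables = ["A", "B", "C"]

# Ordered choices for the first two relations to join, as listed in the text.
first_pairs = list(permutations(tables, 2))
print(first_pairs)

# Complete left-deep join orders for three relations.
full_orders = list(permutations(tables))
print(len(full_orders))

# The number of left-deep orders for n relations is n!, so every relation
# added to the from-list by a pull-up multiplies the candidates.
for n in range(2, 9):
    print(n, factorial(n))
```

This growth is the reason a limit on how many relations are merged into one from-list is useful.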

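Pinning or rewriting joins and subqueries

Optimizing joins and subqueries falls under the optimizer's join optimization.

When a query involves several joined objects or several subqueries, the optimizer can choose whether to restructure the SQL, so that more candidate plans are generated and a better execution plan can be selected.
In postgresql.conf: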
#from_collapse_limit = 8
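When the resulting FROM list would have no more than from_collapse_limit items, the planner merges subqueries into the upper query so that they take part in join planning, which may lead to a better execution plan.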
#join_collapse_limit = 8                # 1 disables collapsing of explicit
                                        # JOIN clauses
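When explicit JOIN syntax is used (except FULL JOIN), for example a join b join c join d, the planner can reorder the joins to generate more candidate plans and select a better one.
If join_collapse_limit = 1, no reordering takes place and the joins follow the order written in the SQL.

To pin the join order, use explicit JOIN syntax and set join_collapse_limit to 1.
Likewise, to keep subqueries from being pulled up, set from_collapse_limit to 1.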

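Equivalent rewrites

If the subquery has neither a GROUP BY clause nor an aggregate function, the following equivalent transformations can be used: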
val>all(select...)  to val>max(select...)
val<all(select...) to val<min(select...)
val>any(select...) to val>min(select...)
val<any(select...) to val<max(select...)
val>=all(select...) to val>=max(select...)
val<=all(select...) to val<=min(select...)
val>=any(select...) to val>=min(select...)
val<=any(select...) to val<=max(select...)
In general, the aggregate functions min() and max() execute more efficiently than ANY and ALL.
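The eight rewrites can be sanity-checked with a small Python model, where all() and any() over a list stand in for ALL and ANY over a subquery result. This is a sketch under two stated assumptions: the subquery returns at least one row, and none of its values is NULL. The helper name check_equivalences and the integer test domain are invented for the example.

```python
from itertools import product


def check_equivalences(values, candidates):
    """Return True if all eight rewrites agree for every val in candidates."""
    hi, lo = max(values), min(values)
    for val in candidates:
        pairs = [
            (all(val > v for v in values), val > hi),    # > ALL   vs > max
            (all(val < v for v in values), val < lo),    # < ALL   vs < min
            (any(val > v for v in values), val > lo),    # > ANY   vs > min
            (any(val < v for v in values), val < hi),    # < ANY   vs < max
            (all(val >= v for v in values), val >= hi),  # >= ALL  vs >= max
            (all(val <= v for v in values), val <= lo),  # <= ALL  vs <= min
            (any(val >= v for v in values), val >= lo),  # >= ANY  vs >= min
            (any(val <= v for v in values), val <= hi),  # <= ANY  vs <= max
        ]
        if any(a != b for a, b in pairs):
            return False
    return True


# Exhaustively test every non-empty list of length 1..3 over a small domain.
ok = all(
    check_equivalences(list(vals), range(-1, 5))
    for n in (1, 2, 3)
    for vals in product(range(4), repeat=n)
)
print(ok)

# Edge case: over an empty set ALL is vacuously true, but max() has no value.
empty_all = all(5 > v for v in [])
empty_max = max([], default=None)
print(empty_all, empty_max)
```

The edge case matters in SQL as well: val > ALL over an empty subquery is true, whereas max() over no rows yields NULL and the comparison is not true, so the rewrites are only safe when the subquery is known to return rows without NULLs.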

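Conclusion

1. The idea behind PostgreSQL's subquery optimization is that a subquery should not have to be executed many times.

2. The optimizer can use statistics to choose among different join methods and join orders.

3. The subquery's join conditions and filter conditions become join conditions and filter conditions of the parent query, so the optimizer can push them down and improve execution efficiency.

4. Once a subquery has been turned into a table join, it only needs to be executed once, and the benefits in points 2 and 3 apply to it as well.

5. Among the queries tested, ALL sublinks are never pulled up, while IN and EXISTS sublinks may fail to be optimized.

6. In the NOT EXISTS example above the subquery was not pulled up, but it was optimized to execute only once, which is somewhat better than NOT IN.

7. Equivalent rewrites can be used as an optimization.

8. Configuration parameters can be used to pin subqueries and the table join order.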

PostgreSQL subquery optimization (subquery pull-up)