FQS: A Remarkable Query Optimization Technique for Data Warehouses


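This article is shared from the Huawei Cloud community post "Optimizing SQL Based on Execution Plans [Bloom! GaussDB(DWS) Cloud-Native Data Warehouse]", author: 西嶺雪山.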

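Introduction

If you are new to DWS, you will probably wonder what "REMOTE_FQS_QUERY" actually means. According to the official documentation, it indicates that the coordinator node (CN) sends the original statement directly to the datanodes (DNs), each DN executes it independently, and the results are merged on the CN. Supposedly such a plan needs no further tuning. Is that really so?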

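FQS plans: full pushdown

When two tables are joined and the join condition uses the distribution column of each table, then with the stream operator disabled the CN sends the statement directly to every DN for execution and merges the results on the CN.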
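The article does not show the definitions of tt01 and tt02. The following is a minimal sketch of tables that would satisfy the condition above. The column types and the choice of distribution columns are assumptions; only the DISTRIBUTE BY HASH syntax is taken from the statements used elsewhere in this article.

```sql
-- Hypothetical DDL: column types and distribution columns are assumptions,
-- chosen so that the join condition tt01.c1 = tt02.c2 uses the
-- distribution column of each table.
CREATE TABLE tt01 (c1 int, c2 int) DISTRIBUTE BY HASH(c1);
CREATE TABLE tt02 (c1 int, c2 int) DISTRIBUTE BY HASH(c2);
```

Under this assumption, rows with equal join values are stored on the same DN, which is why each DN can run the join on its own data.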

SET enable_stream_operator=off;

SET explain_perf_mode=normal;

EXPLAIN (VERBOSE on,COSTS off) SELECT * FROM tt01,tt02 WHERE tt01.c1=tt02.c2;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------

Data Node Scan on "__REMOTE_FQS_QUERY__"

Output: tt01.c1, tt01.c2, tt02.c1, tt02.c2

Node/s: All datanodes

Remote query: SELECT tt01.c1, tt01.c2, tt02.c1, tt02.c2 FROM dbadmin.tt01, dbadmin.tt02 WHERE tt01.c1 = tt02.c2

(4 rows)

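A plan like the one above shows only Data Node Scan on "__REMOTE_FQS_QUERY__". That is too coarse: it does not reveal how the statement is executed internally, for example whether an index is used.

Let's create a table to verify this.

create table t4 (bh varchar(300),bh2 varchar(300),c_name varchar(300),c_info varchar(300)) distribute by hash(bh);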

insert into t4 select uuid_generate_v1(), uuid_generate_v1(),'測試','sdfffffffffffffffsdf' from generate_series(1,50000);

insert into t4 select * from t4;

--1. Without an index:

postgres=# explain analyze select * from t4 where bh2 = '652e4e0e-ba60-0400-25b5-4ee5e490fffe';

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------

id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs

----+----------------------------------------------+---------+--------+--------+-------------+---------+---------+---------

1 | -> Data Node Scan on "__REMOTE_FQS_QUERY__" | 256.364 | 32 | 0 | 56KB | | 0 | 0.00



====== Query Summary =====

-----------------------------------------

Coordinator executor start time: 0.055 ms

Coordinator executor run time: 256.410 ms

Coordinator executor end time: 0.010 ms

Planner runtime: 0.145 ms

Query Id: 73746443917091633

Total runtime: 256.557 ms

(12 rows)

Time: 259.051 ms

--2. Create an index and add an indexscan hint

postgres=# create index i_t4 on t4(bh2);

CREATE INDEX

Time: 3328.258 ms

postgres=# explain analyze select /*+ indexscan(t4 i_t4) */ * from t4 where bh2 = '652e4e0e-ba60-0400-25b5-4ee5e490fffe';

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------

id | operation | A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs

----+----------------------------------------------+--------+--------+--------+-------------+---------+---------+---------

1 | -> Data Node Scan on "__REMOTE_FQS_QUERY__" | 2.269 | 32 | 0 | 56KB | | 0 | 0.00



====== Query Summary =====

-----------------------------------------

Coordinator executor start time: 0.027 ms

Coordinator executor run time: 2.298 ms

Coordinator executor end time: 0.009 ms

Planner runtime: 0.074 ms

Query Id: 73746443917091930

Total runtime: 2.401 ms

(12 rows)

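The plan before creating the index and the plan after are exactly the same, yet the elapsed times are 259.051 ms and 2.401 ms, a very noticeable difference. The second run very likely used the index, but because the two plans are identical this is not obvious to whoever is tuning the query.

Even with the hint /*+ indexscan(t4 i_t4) */ in the statement, the plan does not print whether the index was used. The plan is too terse, and every statistic for the table in pg_stat_all_indexes is 0, so that view cannot settle the question either.

CPU time

The time difference above can also be compared in terms of CPU cost by adding CPU information to the execution plan: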

-- Plan without the index

postgres=# explain (analyze,buffers,verbose,cpu,nodes )select * from t4 where bh2 = '652e4e0e-ba60-0400-25b5-4ee5e490fffe';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------

Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) (actual time=244.096..244.108 rows=32 loops=1)

Output: t4.bh, t4.bh2, t4.c_name, t4.c_info

Node/s: All datanodes

Remote query: SELECT bh, bh2, c_name, c_info FROM sa.t4 WHERE bh2::text = '652e4e0e-ba60-0400-25b5-4ee5e490fffe'::text

(CPU: ex c/r=762829, ex row=32, ex cyc=24410534, inc cyc=24410534)

Total runtime: 244.306 ms

(6 rows)

-- Plan after creating the index

postgres=# explain (analyze,buffers,verbose,cpu,nodes )select * from t4 where bh2 = '652e4e0e-ba60-0400-25b5-4ee5e490fffe';

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------

Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) (actual time=1.035..2.148 rows=32 loops=1)

Output: t4.bh, t4.bh2, t4.c_name, t4.c_info

Node/s: All datanodes

Remote query: SELECT bh, bh2, c_name, c_info FROM sa.t4 WHERE bh2::text = '652e4e0e-ba60-0400-25b5-4ee5e490fffe'::text

(CPU: ex c/r=6698, ex row=32, ex cyc=214354, inc cyc=214354)

Total runtime: 2.242 ms

(6 rows)

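Comparing the two plans shows that they are the same.

Here cyc is the number of CPU cycles. ex cyc is the cycle count of the current operator, excluding its child nodes; inc cyc includes the child nodes; ex row is the number of rows output by the current operator; and ex c/r is ex cyc / ex row, the average number of cycles spent per row.

Average CPU cycles per row: 762829 without the index versus 6698 with it, a difference of roughly 114 times.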
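The figures above can be checked with plain arithmetic. The query below is only a worked calculation using the numbers printed in the two plans; it does not touch any table.

```sql
-- ex c/r = ex cyc / ex row, using the values from the two plans above
SELECT 24410534 / 32.0           AS cycles_per_row_no_index,    -- 762829.1875, shown as 762829
       214354 / 32.0             AS cycles_per_row_with_index,  -- 6698.5625, shown as 6698
       round(762829 / 6698.0, 1) AS ratio;                      -- 113.9
```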

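Viewing the detailed plan

__REMOTE_FQS_QUERY__ means the statement is sent straight to the datanodes, so the CN does not generate an execution plan and we cannot see whether an index is used. If we turn off enable_fast_query_shipping, the plan is generated on the CN and shows whether the index is used.

-- Turn off fast query shipping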

postgres=# set enable_fast_query_shipping to off;

postgres=# set explain_perf_mode=normal;

-- Plan that uses the index

postgres=# explain analyze select * from t4 where bh2 = '652e4e0e-ba60-0400-25b5-4ee5e490fffe';

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------

Streaming (type: GATHER) (cost=4.95..51.75 rows=31 width=102) (actual time=1.695..2.263 rows=32 loops=1)

Node/s: All datanodes

-> Bitmap Heap Scan on t4 (cost=4.33..43.75 rows=31 width=102) (actual time=[0.040,0.040]..[0.057,0.153], rows=32)

Recheck Cond: ((bh2)::text = '652e4e0e-ba60-0400-25b5-4ee5e490fffe'::text)

-> Bitmap Index Scan on i_t4 (cost=0.00..4.33 rows=31 width=0) (actual time=[0.035,0.035]..[0.042,0.042], rows=32)

Index Cond: ((bh2)::text = '652e4e0e-ba60-0400-25b5-4ee5e490fffe'::text)

Total runtime: 2.569 ms

(7 rows)

Time: 5.226 ms

-- Full table scan after dropping the index

postgres=# explain analyze select * from t4 where bh2 = '652e4e0e-ba60-0400-25b5-4ee5e490fffe';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------

Streaming (type: GATHER) (cost=0.62..31755.34 rows=31 width=102) (actual time=294.661..294.814 rows=32 loops=1)

Node/s: All datanodes

-> Seq Scan on t4 (cost=0.00..31747.34 rows=31 width=102) (actual time=[0.084,258.294]..[280.141,293.190], rows=32)

Filter: ((bh2)::text = '652e4e0e-ba60-0400-25b5-4ee5e490fffe'::text)

Rows Removed by Filter: 3199968

Total runtime: 295.154 ms

(6 rows)

Time: 297.348 ms

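Using enable_fast_query_shipping to control whether the distributed framework is used lets us see the concrete execution plan, which helps when optimizing SQL.

"REMOTE_FQS_QUERY" alone cannot tell us whether an index was used; further verification is needed.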
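The verification workflow can be sketched as follows. This is a minimal session-level example that reuses the table and predicate from this article. SHOW and RESET are standard session commands and are assumed to behave here as they do in PostgreSQL.

```sql
-- Check the current setting first.
SHOW enable_fast_query_shipping;

-- Disable FQS for this session only, so that the CN builds a full plan.
SET enable_fast_query_shipping = off;

-- The plan now lists the scan operators (Seq Scan, Bitmap Index Scan, ...).
EXPLAIN SELECT * FROM t4 WHERE bh2 = '652e4e0e-ba60-0400-25b5-4ee5e490fffe';

-- Restore the default so that later statements can use FQS again.
RESET enable_fast_query_shipping;
```

Resetting the parameter afterwards matters because the setting is only a diagnostic aid here; the article's recommendation is still to run the query as FQS.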

A small flaw: even when the SQL uses an index, the index scan count (idx_scan) in the statistics views pg_stat_all_indexes and pg_stat_all_tables stays at 0.
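To see this for yourself, the counter can be queried directly. The sketch below assumes the view exposes the same columns as the PostgreSQL view of the same name (schemaname, relname, indexrelname, idx_scan).

```sql
-- Index usage counters for the test table, as seen from the current session.
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_all_indexes
WHERE relname = 't4';
```

According to the observation above, idx_scan remains 0 here even after the indexed query has run, so this view should not be used as evidence that an index is unused.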

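Effect of the distribution type

FQS plans typically appear for simple single-table queries and for multi-table joins whose join keys are distribution keys of the same type.

Functions in the query, join keys with different column types, different distribution types, and non-equality join conditions can all prevent pushdown.

The following example uses different distribution types.

-- t1 and t2 have exactly the same structure; both are distributed by hash(id)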

postgres=# \d+ t1

Table "sa.t1"

Column | Type | Modifiers | Storage | Stats target | Description

--------+------------------------+-----------+----------+--------------+-------------

id | character varying(300) | | extended | |

c_name | character varying(300) | | extended | |

c_info | character varying(300) | | extended | |

Indexes:

"i_t1" btree (id) TABLESPACE pg_default

"i_t1_id" btree (id) TABLESPACE pg_default

Has OIDs: no

Distribute By: HASH(id)

Location Nodes: ALL DATANODES

Options: orientation=row, compression=no

-- Can be pushed down; the plan shows FQS

postgres=# explain select * from t1,t2 where t1.id=t2.id;

QUERY PLAN

----------------------------------------------------------------------------------

id | operation | E-rows | E-width | E-costs

----+----------------------------------------------+--------+---------+---------

1 | -> Data Node Scan on "__REMOTE_FQS_QUERY__" | 0 | 0 | 0.00

(3 rows)

-- Change the distribution of one table to round-robin

postgres=# alter table t1 distribute by roundrobin;

ALTER TABLE

postgres=# explain select * from t1,t2 where t1.id=t2.id;

QUERY PLAN

------------------------------------------------------------------------------------------------

id | operation | E-rows | E-memory | E-width | E-costs

----+-----------------------------------------+----------+--------------+---------+-----------

1 | -> Streaming (type: GATHER) | 13021186 | | 60 | 159866.51

2 | -> Hash Join (3,5) | 13021186 | 1MB | 60 | 159449.88

3 | -> Streaming(type: REDISTRIBUTE) | 1600000 | 2MB | 30 | 53357.30

4 | -> Seq Scan on t1 | 1600000 | 1MB | 30 | 9357.33

5 | -> Hash | 1599999 | 48MB(4435MB) | 30 | 9355.33

6 | -> Seq Scan on t2 | 1600000 | 1MB | 30 | 9355.33



RunTime Analyze Information

----------------------------------

"sa.t1" runtime: 219.368ms

"sa.t2" runtime: 184.141ms



Predicate Information (identified by plan id)

--------------------------------------------------

2 --Hash Join (3,5)

Hash Cond: ((t1.id)::text = (t2.id)::text)



====== Query Summary =====

-------------------------------

System available mem: 4546560KB

Query Max mem: 4546560KB

Query estimated mem: 131072KB

(24 rows)

-- Change t2 to round-robin as well; now both tables must be redistributed at query time

postgres=# alter table t2 distribute by roundrobin;

ALTER TABLE

postgres=# explain select * from t1,t2 where t1.id=t2.id;

QUERY PLAN

---------------------------------------------------------------------------------------------------

id | operation | E-rows | E-memory | E-width | E-costs

----+--------------------------------------------+----------+--------------+---------+-----------

1 | -> Streaming (type: GATHER) | 12804286 | | 60 | 203041.85

2 | -> Hash Join (3,5) | 12804286 | 1MB | 60 | 202625.22

3 | -> Streaming(type: REDISTRIBUTE) | 1600000 | 2MB | 30 | 53357.30

4 | -> Seq Scan on t2 | 1600000 | 1MB | 30 | 9357.33

5 | -> Hash | 1599999 | 68MB(4433MB) | 30 | 53357.30

6 | -> Streaming(type: REDISTRIBUTE) | 1600000 | 2MB | 30 | 53357.30

7 | -> Seq Scan on t1 | 1600000 | 1MB | 30 | 9357.33



RunTime Analyze Information

----------------------------------

"sa.t2" runtime: 203.933ms



Predicate Information (identified by plan id)

--------------------------------------------------

2 --Hash Join (3,5)

Hash Cond: ((t2.id)::text = (t1.id)::text)



====== Query Summary =====

-------------------------------

System available mem: 4546560KB

Query Max mem: 4546560KB

Query estimated mem: 131072KB

(24 rows)

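When t1 uses round-robin distribution, the join has to redistribute t1; when t2 is round-robin as well, it has to be redistributed too. With round-robin distribution the query cannot be fully pushed down.

Replication mode is not demonstrated separately at this point. In replication mode every DN holds a full copy of the data, so the total data volume is the number of DNs times the table size, and the query can certainly be pushed down.

Change both t1 and t2 back to hash distribution and then join on a non-distribution column. Clearly this cannot be fully pushed down: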

postgres=# alter table t1 distribute by hash(id);

ALTER TABLE

postgres=# alter table t2 distribute by hash(id);

ALTER TABLE

-- Use c_name as the join key

postgres=# explain select * from t1,t2 where t1.id=t2.c_name;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

id | operation | E-rows | E-memory | E-width | E-costs

----+--------------------------------------------------------------+----------+--------------+---------+-----------

1 | -> Streaming (type: GATHER) | 12621020 | | 61 | 182863.95

2 | -> Hash Join (3,5) | 12621020 | 1MB | 61 | 182447.32

3 | -> Streaming(type: PART REDISTRIBUTE PART ROUNDROBIN) | 1600000 | 2MB | 30 | 54688.64

4 | -> Seq Scan on t2 | 1600000 | 1MB | 30 | 9355.33

5 | -> Hash | 1599999 | 48MB(4433MB) | 31 | 32355.32

6 | -> Streaming(type: PART LOCAL PART BROADCAST) | 1600000 | 2MB | 31 | 32355.32

7 | -> Seq Scan on t1 | 1600000 | 1MB | 31 | 9355.33

-- Now change t1 to replication

postgres=# alter table t1 distribute by replication ;

ALTER TABLE

postgres=# explain select * from t1,t2 where t1.id=t2.id;

QUERY PLAN

----------------------------------------------------------------------------------

id | operation | E-rows | E-width | E-costs

----+----------------------------------------------+--------+---------+---------

1 | -> Data Node Scan on "__REMOTE_FQS_QUERY__" | 0 | 0 | 0.00

(3 rows)

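-- t1 is a replicated table and t2 is a hash table, and the join can still be fully pushed down

-- Now change t2's distribution too (to replication in the command below). What happens to the join?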

postgres=# alter table t2 distribute by replication;

ALTER TABLE

postgres=# explain select * from t1,t2 where t1.id=t2.id;

QUERY PLAN

----------------------------------------------------------------------------------

id | operation | E-rows | E-width | E-costs

----+----------------------------------------------+--------+---------+---------

1 | -> Data Node Scan on "__REMOTE_FQS_QUERY__" | 0 | 0 | 0.00

(3 rows)

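When the join key includes a non-distribution column, the query cannot be fully pushed down. If one of the tables is changed to a replicated table (every DN holds the data), the join can be fully pushed down no matter how the other table is distributed. Replicated tables are only suitable for small tables, though.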
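As a design illustration, a large table can be hash-distributed while a small lookup table is replicated. The table names, columns, and types below are hypothetical; only the DISTRIBUTE BY HASH and DISTRIBUTE BY REPLICATION clauses come from the statements used in this article.

```sql
-- Hypothetical tables for illustration; names, columns, and types are assumptions.
CREATE TABLE fact_orders (
    order_id  varchar(300),
    region_id varchar(300),
    amount    numeric
) DISTRIBUTE BY HASH(order_id);

CREATE TABLE dim_region (
    region_id   varchar(300),
    region_name varchar(300)
) DISTRIBUTE BY REPLICATION;

-- The join column region_id is not the distribution column of fact_orders,
-- but dim_region is available in full on every DN.
EXPLAIN SELECT * FROM fact_orders f, dim_region d WHERE f.region_id = d.region_id;
```

Based on the behaviour observed above, a plan of the "__REMOTE_FQS_QUERY__" kind would be expected here, but this should be confirmed with EXPLAIN on the actual cluster.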

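Common non-FQS cases

Aggregation and sorting: when a query needs complex aggregation or sorting, that work usually has to happen on the coordinator node. FQS does not suit these cases, because performing these operations on the datanodes may degrade performance.
Joins across different distribution keys: if a query joins multiple tables and the join conditions involve different distribution keys, FQS may not be the best choice. Such queries may need to be handled on the coordinator node so that joins spanning multiple datanodes are processed correctly.
Subqueries and complex logic: queries containing complex subqueries or logic usually need to run on the coordinator node, because they require coordinating multiple steps to produce the result.
External data sources or functions: if a query communicates with an external data source or uses functions from outside the database, FQS may not apply, because these operations usually need to run on the coordinator node. Functions come in three forms, so each case has to be judged individually.

In short, FQS is a performance optimization that suits many queries, but not all of them. Query optimization usually involves trade-offs, and the right execution method depends on the specific query and its performance requirements. Inspecting execution plans and running performance tests shows whether FQS should be used.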

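Summary

1. In DWS, FQS (Fast Query Shipping) is a query optimization technique that forwards a query to the datanodes and executes it there, reducing data transfer and improving query performance.

2. DWS currently has three main kinds of plan:

FQS: the CN sends the original statement directly to the DNs, each DN executes it independently, and the results are merged on the CN.
Stream: the CN generates a plan from the original statement and sends the plan to the DNs for execution; during execution the DNs exchange data through Stream operators.
Remote-Query: after generating the plan, the CN sends part of the original statement to the DNs, each DN executes it independently and returns its result to the CN, and the CN executes the rest of the plan.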

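3. "REMOTE_FQS_QUERY" alone cannot tell us whether an index was used; further verification is needed. Using enable_fast_query_shipping to control whether the distributed framework is used lets us see the concrete execution plan, which helps when optimizing SQL.

4. With round-robin distribution the rows are placed without regard to any key, so a join on such a table almost always needs redistribution, which is costly.

5. In replication mode every node holds a full copy of the data, so queries can be fully pushed down. Replication mode suits small tables that are queried frequently.

6. A join between a distribution key and a non-distribution key cannot be fully pushed down either, and this is a fairly common situation. When designing tables, it is best to make the join columns the distribution keys and to keep their column types consistent.

7. A small flaw: even when the SQL uses an index, the index scan count (idx_scan) in the statistics views pg_stat_all_indexes and pg_stat_all_tables stays at 0.

8. Try to keep execution plans in FQS form. If further optimization is still needed on top of FQS, turn off full pushdown with enable_fast_query_shipping and inspect the execution plan to optimize in a targeted way.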
