Execution plan path selection
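During PostgreSQL query planning, each possible way of executing a query is represented by a path. After generating the candidate paths, the planner picks the one with the lowest cost, turns it into a plan, and passes that plan to the executor. The planner's core job is therefore to generate multiple paths and find the best one among them.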
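The final selection step can be sketched in C, the language of the planner source quoted later in this article. This is a toy model that assumes every candidate has already been reduced to a single total cost; SimplePath and cheapest_path are hypothetical names, and the real planner also weighs startup cost and sort order (pathkeys) when deciding which paths to keep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a planner path:
 * just a label and its estimated total cost. */
typedef struct {
    const char *name;
    double total_cost;
} SimplePath;

/* Return the index of the candidate with the lowest total cost,
 * or -1 if there are no candidates. */
int cheapest_path(const SimplePath *paths, size_t n)
{
    assert(paths != NULL || n == 0);
    if (n == 0)
        return -1;

    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (paths[i].total_cost < paths[best].total_cost)
            best = i;
    }
    return (int) best;
}
```

Fed the two plans from the example further down (Seq Scan at 9.76, Index Scan at 13.90), it returns 0, the sequential scan.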
Cost estimation
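Paths are compared by their estimated cost, which is derived from the statistics stored in the system catalog pg_statistic. PostgreSQL estimates the cost of a plan node by node, based on these statistics. It analyzes each table to obtain a statistical sample (normally the autovacuum daemon periodically runs ANALYZE to collect these statistics and stores them in pg_statistic and pg_class).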
Cost parameters in postgresql.conf used for the estimates:
# - Planner Cost Constants -
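#seq_page_cost = 1.0 # measured on an arbitrary scale; cost of reading one page during a sequential disk scan
#random_page_cost = 4.0 # same scale as above; cost of reading one page with random (non-sequential) disk access
#cpu_tuple_cost = 0.01 # same scale as above; CPU cost of processing each row
#cpu_index_tuple_cost = 0.005 # same scale as above; CPU cost of processing each index entry
#cpu_operator_cost = 0.0025 # same scale as above; CPU cost of each operator or function call
#parallel_tuple_cost = 0.1 # same scale as above; cost of passing each tuple from a parallel worker to the leader. Used when costing parallel plans: if the parallel plan costs more than the serial one, parallelism is not used.
#parallel_setup_cost = 1000.0 # same scale as above; cost of launching parallel workers
#min_parallel_relation_size = 8MB
#effective_cache_size = 4GB # planner's assumption about the effective size of the file-system (kernel) cache available to a single index scan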
They can also be viewed with SHOW ALL.
Path selection
-- View the table definition
db_jcxxglpt=# \d t_jcxxgl_tjaj
Table "db_jcxx.t_jcxxgl_tjaj"
Column | Type | Modifiers
--------------+--------------------------------+-----------
c_bh | character(32) | not null
c_xzdm | character varying(300) |
c_jgid | character(32) |
c_ajbm | character(22) |
...
Indexes:
"t_jcxxgl_tjaj_pkey" PRIMARY KEY, btree (c_bh)
"idx_ttjaj_cah" btree (c_ah)
"idx_ttjaj_dslrq" btree (d_slrq)
First update the statistics with vacuum analyze t_jcxxgl_tjaj: an abnormal execution plan is often caused by inaccurate statistics.
-- Execution plan: sequential scan
db_jcxxglpt=# explain (analyze,verbose,costs,buffers,timing)select c_bh,c_xzdm,c_jgid,c_ajbm from t_jcxxgl_tjaj where d_slrq >='2018-03-18';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on db_jcxx.t_jcxxgl_tjaj (cost=0.00..9.76 rows=3 width=96) (actual time=1.031..1.055 rows=3 loops=1)
Output: c_bh, c_xzdm, c_jgid, c_ajbm
Filter: (t_jcxxgl_tjaj.d_slrq >= '2018-03-18'::date)
Rows Removed by Filter: 138
Buffers: shared hit=8
Planning time: 6.579 ms
Execution time: 1.163 ms
(7 rows)
-- Execution plan with sequential scans disabled
db_jcxxglpt=# set session enable_seqscan = off;
SET
db_jcxxglpt=# explain (analyze,verbose,costs,buffers,timing)select c_bh,c_xzdm,c_jgid,c_ajbm from t_jcxxgl_tjaj where d_slrq >='2018-03-18';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Index Scan using idx_ttjaj_dslrq on db_jcxx.t_jcxxgl_tjaj (cost=0.14..13.90 rows=3 width=96) (actual time=0.012..0.026 rows=3 loops=1)
Output: c_bh, c_xzdm, c_jgid, c_ajbm
Index Cond: (t_jcxxgl_tjaj.d_slrq >= '2018-03-18'::date)
Buffers: shared hit=4
Planning time: 0.309 ms
Execution time: 0.063 ms
(6 rows)
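Column d_slrq has a btree index, yet the first plan does not use it. Why?
Cost calculation:
A path's estimate consists of three parts: the startup cost, the total cost, and the sort order of its output (pathkeys).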
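Keeping the startup cost separate from the total cost matters when only part of the result is needed, for example with LIMIT. A minimal sketch, assuming cost grows linearly from the startup figure to the total figure (the interpolation PostgreSQL's compare_fractional_path_costs uses); PathCost, fractional_cost and cheaper_for_fraction are hypothetical names, and the numbers in the note below are made up.

```c
#include <assert.h>

/* Simplified path estimate: cost before the first row is returned
 * (startup) and cost to return every row (total). */
typedef struct {
    double startup_cost;
    double total_cost;
} PathCost;

/* Estimated cost of fetching only `fraction` (0, 1] of a path's rows,
 * interpolating linearly between startup_cost and total_cost. */
double fractional_cost(PathCost p, double fraction)
{
    assert(fraction > 0.0 && fraction <= 1.0);
    return p.startup_cost + fraction * (p.total_cost - p.startup_cost);
}

/* Returns 1 if path a is strictly cheaper than path b when only
 * `fraction` of the rows will be fetched, otherwise 0. */
int cheaper_for_fraction(PathCost a, PathCost b, double fraction)
{
    return fractional_cost(a, fraction) < fractional_cost(b, fraction);
}
```

With a = {0, 1000} (rows flow immediately) and b = {500, 600} (must sort first), fetching 1% of the rows favours a (10 vs 501), while fetching all of them favours b (1000 vs 600).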
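Cost formula: total cost = startup cost + I/O cost + CPU cost, i.e. cost = S + P + W*T, where
S: the startup cost
P: the number of pages accessed during execution, reflecting the number of disk I/Os
T: the number of tuples accessed during execution, reflecting CPU overhead
W: the weighting factor between disk I/O cost and CPU cost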
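Read against a sequential scan with a single filter operator, which is the case computed from pg_class in this section, the symbols map roughly as follows. This mapping is an interpretation rather than something taken from the PostgreSQL source: P is scaled by seq_page_cost (1.0 by default, so it equals the page count), and W bundles the two per-row CPU constants.

```latex
\mathrm{cost}_{\mathrm{seqscan}}
  = \underbrace{0}_{S}
  + \underbrace{\mathtt{seq\_page\_cost} \cdot \mathtt{relpages}}_{P}
  + \underbrace{(\mathtt{cpu\_tuple\_cost} + \mathtt{cpu\_operator\_cost})}_{W}
    \cdot \underbrace{\mathtt{reltuples}}_{T}
```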
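Statistics: part of the statistics is the total number of entries in each table and index, and the number of disk blocks each one occupies. This information is kept in the reltuples and relpages columns of pg_class, and can be queried like this:
-- View the statistics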
db_jcxxglpt=# select relpages,reltuples from pg_class where relname ='t_jcxxgl_tjaj';
relpages | reltuples
----------+-----------
8 | 141
(1 row)
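total_cost = 1.0 (seq_page_cost) * 8 (total pages on disk) + 0.01 (cpu_tuple_cost) * 141 (total rows in the table) + 0.0025 (cpu_operator_cost) * 141 (total rows in the table) = 8 + 1.41 + 0.3525 = 9.7625
So the index scan's cost (13.90) is higher than the sequential scan's (9.76). On a small table a sequential scan is more efficient than an index scan, because an index scan needs at least two I/Os: one to read an index block and one to read a data block.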
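The same arithmetic as a small C function. It is a sketch assuming the default cost constants and a WHERE clause that evaluates one operator per row (d_slrq >= '2018-03-18'); seqscan_total_cost and n_quals are hypothetical names, and the function ignores the tablespace overrides, target-list costs and parallelism that the real cost_seqscan handles.

```c
#include <assert.h>

/* Default planner cost constants from postgresql.conf. */
static const double SEQ_PAGE_COST     = 1.0;
static const double CPU_TUPLE_COST    = 0.01;
static const double CPU_OPERATOR_COST = 0.0025;

/* Estimated total cost of a sequential scan whose filter evaluates
 * `n_quals` operators on every row. */
double seqscan_total_cost(double relpages, double reltuples, int n_quals)
{
    assert(relpages >= 0.0 && reltuples >= 0.0 && n_quals >= 0);

    double disk_run_cost = SEQ_PAGE_COST * relpages;        /* read every page */
    double cpu_per_tuple = CPU_TUPLE_COST + CPU_OPERATOR_COST * n_quals;
    double cpu_run_cost  = cpu_per_tuple * reltuples;       /* process every row */

    return disk_run_cost + cpu_run_cost;  /* startup cost is 0 here */
}
```

Calling seqscan_total_cost(8, 141, 1) with relpages and reltuples from pg_class reproduces the 9.76 shown in the Seq Scan plan.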
cost_seqscan source code (src/backend/optimizer/path/costsize.c):
/*
 * cost_seqscan
 *    Determines and returns the cost of scanning a relation sequentially.
 *
 * 'baserel' is the relation to be scanned
 * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
 */
void
cost_seqscan(Path *path, PlannerInfo *root,
             RelOptInfo *baserel, ParamPathInfo *param_info)
{
    Cost        startup_cost = 0;
    Cost        cpu_run_cost;
    Cost        disk_run_cost;
    double      spc_seq_page_cost;
    QualCost    qpqual_cost;
    Cost        cpu_per_tuple;

    /* Should only be applied to base relations */
    Assert(baserel->relid > 0);
    Assert(baserel->rtekind == RTE_RELATION);

    /* Mark the path with the correct row estimate */
    if (param_info)
        path->rows = param_info->ppi_rows;
    else
        path->rows = baserel->rows;

    if (!enable_seqscan)
        startup_cost += disable_cost;

    /* fetch estimated page cost for tablespace containing table */
    get_tablespace_page_costs(baserel->reltablespace, NULL, &spc_seq_page_cost);

    /*
     * disk costs
     */
    disk_run_cost = spc_seq_page_cost * baserel->pages;

    /* CPU costs */
    get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);

    startup_cost += qpqual_cost.startup;
    cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
    cpu_run_cost = cpu_per_tuple * baserel->tuples;
    /* tlist eval costs are paid per output row, not per tuple scanned */
    startup_cost += path->pathtarget->cost.startup;
    cpu_run_cost += path->pathtarget->cost.per_tuple * path->rows;

    /* Adjust costing for parallelism, if used. */
    if (path->parallel_workers > 0)
    {
        double      parallel_divisor = get_parallel_divisor(path);

        /* The CPU cost is divided among all the workers. */
        cpu_run_cost /= parallel_divisor;

        /*
         * It may be possible to amortize some of the I/O cost, but probably
         * not very much, because most operating systems already do aggressive
         * prefetching. For now, we assume that the disk run cost can't be
         * amortized at all.
         */

        /*
         * In the case of a parallel plan, the row count needs to represent
         * the number of tuples processed per worker.
         */
        path->rows = clamp_row_est(path->rows / parallel_divisor);
    }

    path->startup_cost = startup_cost;
    path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
}
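Two details of cost_seqscan explain the earlier experiment. With enable_seqscan = off the sequential scan is not forbidden: disable_cost (1.0e10 in costsize.c of this era) is added to its startup cost, so any other path, such as the index scan at 13.90, wins. And with parallel workers only the CPU part is divided, never the disk part. A simplified sketch, assuming cpu_per_tuple already includes cpu_tuple_cost plus the per-row qual cost, and that the caller supplies the parallel divisor (1.0 for a serial scan); SeqScanCost and sketch_cost_seqscan are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Penalty PostgreSQL adds to disabled plan types (disable_cost). */
static const double DISABLE_COST = 1.0e10;

typedef struct {
    double startup_cost;
    double total_cost;
} SeqScanCost;

SeqScanCost sketch_cost_seqscan(double pages, double tuples,
                                double seq_page_cost,
                                double cpu_per_tuple,
                                bool enable_seqscan,
                                double parallel_divisor)
{
    assert(parallel_divisor >= 1.0);
    SeqScanCost c = {0.0, 0.0};

    /* enable_seqscan = off only penalizes the path; it is still costed. */
    if (!enable_seqscan)
        c.startup_cost += DISABLE_COST;

    double disk_run_cost = seq_page_cost * pages;
    double cpu_run_cost  = cpu_per_tuple * tuples;

    /* Only the CPU cost is shared among parallel workers. */
    cpu_run_cost /= parallel_divisor;

    c.total_cost = c.startup_cost + cpu_run_cost + disk_run_cost;
    return c;
}
```

With the table's numbers (8 pages, 141 rows, cpu_per_tuple = 0.0125) and a divisor of 2.0, the total drops from 9.7625 to 8.88125: the 8 units of disk cost stay intact.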
A SQL optimization example
Slow SQL:
select c_ajbh, c_ah, c_cbfy, c_cbrxm, d_larq, d_jarq, n_dbjg, c_yqly from db_zxzhld.t_zhld_db dbxx join db_zxzhld.t_zhld_ajdbxx dbaj
on dbxx.c_bh = dbaj.c_dbbh where dbxx.n_valid=1 and dbxx.n_state in (1,2,3) and dbxx.c_dbztbh='1003'
and dbaj.c_zblx='1003' and dbaj.c_dbfy='0' and dbaj.c_gy = '2550'
and c_ajbh in (select distinct c_ajbh from db_zxzhld.t_zhld_zbajxx where n_dbzt = 1 and c_zblx = '1003' and c_gy = '2550' )
order by d_larq asc, c_ajbh asc limit 15 offset 0
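Runtime of the slow SQL: 7 s
Let's first go over what this SQL does: it joins dbxx with dbaj, requires dbaj.c_ajbh to appear in the zbaj table (t_zhld_zbajxx), sorts the result and takes 15 rows. That is roughly all.
One drawback of this SQL is that you cannot tell which table each selected column comes from; qualify every column as alias.column.
Row counts of the tables involved: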
db_zxzhld.t_zhld_db :1311
db_zxzhld.t_zhld_ajdbxx :341296
db_zxzhld.t_zhld_zbajxx :1027619
Execution plan:
01 Limit (cost=36328.67..36328.68 rows=1 width=107) (actual time=88957.677..88957.729 rows=15 loops=1)
02   -> Sort (cost=36328.67..36328.68 rows=1 width=107) (actual time=88957.653..88957.672 rows=15 loops=1)
03         Sort Key: dbaj.d_larq, dbaj.c_ajbh
04         Sort Method: top-N heapsort Memory: 27kB
05         -> Nested Loop Semi Join (cost=17099.76..36328.66 rows=1 width=107) (actual time=277.794..88932.662 rows=8605 loops=1)
06               Join Filter: ((dbaj.c_ajbh)::text = (t_zhld_zbajxx.c_ajbh)::text)
07               Rows Removed by Join Filter: 37018710
08               -> Nested Loop (cost=0.00..19200.59 rows=1 width=107) (actual time=199.141..601.845 rows=8605 loops=1)
09                     Join Filter: (dbxx.c_bh = dbaj.c_dbbh)
10                     Rows Removed by Join Filter: 111865
11                     -> Seq Scan on t_zhld_ajdbxx dbaj (cost=0.00..19117.70 rows=219 width=140) (actual time=198.871..266.182 rows=8605 loops=1)
12                           Filter: ((n_valid = 1) AND ((c_zblx)::text = '1003'::text) AND ((c_dbfy)::text = '0'::text) AND ((c_gy)::text = '2550'::text))
13                           Rows Removed by Filter: 332691
14                     -> Materialize (cost=0.00..66.48 rows=5 width=33) (actual time=0.001..0.017 rows=14 loops=8605)
15                           -> Seq Scan on t_zhld_db dbxx (cost=0.00..66.45 rows=5 width=33) (actual time=0.044..0.722 rows=14 loops=1)
16                                 Filter: ((n_valid = 1) AND ((c_dbztbh)::text = '1003'::text) AND (n_state = ANY ('{1,2,3}'::integer[])))
17                                 Rows Removed by Filter: 1297
18               -> Materialize (cost=17099.76..17117.46 rows=708 width=32) (actual time=0.006..4.890 rows=4303 loops=8605)
19                     -> HashAggregate (cost=17099.76..17106.84 rows=708 width=32) (actual time=44.011..54.924 rows=8605 loops=1)
20                           Group Key: t_zhld_zbajxx.c_ajbh
21                           -> Bitmap Heap Scan on t_zhld_zbajxx (cost=163.36..17097.99 rows=708 width=32) (actual time=5.218..30.278 rows=8605 loops=1)
22                                 Recheck Cond: ((n_dbzt = 1) AND ((c_zblx)::text = '1003'::text))
23                                 Filter: ((c_gy)::text = '2550'::text)
24                                 Rows Removed by Filter: 21849
25                                 Heap Blocks: exact=960
26                                 -> Bitmap Index Scan on i_tzhldzbajxx_zblx_dbzt (cost=0.00..163.19 rows=5876 width=0) (actual time=5.011..5.011 rows=30458 loops=1)
27                                       Index Cond: ((n_dbzt = 1) AND ((c_zblx)::text = '1003'::text))
28 Planning time: 1.258 ms
29 Execution time: 88958.029 ms
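Reading the execution plan:
1: Lines 27->21: the index i_tzhldzbajxx_zblx_dbzt filters the rows of t_zhld_zbajxx, then the filter (c_gy)::text = '2550'::text is applied, finally returning 8605 rows.
2: Lines 17->15: the rows of t_zhld_db are filtered by their conditions, finally returning 14 rows.
3: Lines 20->19: a GROUP BY (HashAggregate on c_ajbh) is performed on t_zhld_zbajxx.
4: Lines 13->11: a sequential scan of t_zhld_ajdbxx finally returns 8605 rows.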
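The "Rows Removed by Join Filter: 37018710" on line 07 follows from how a nested-loop semi join works: for each outer row it rescans the materialized inner rows until the first match, rejecting every row it passes on the way. A simplified C sketch over integer keys, assuming both inputs are already in memory; nestloop_semijoin is a hypothetical name, not PostgreSQL's executor code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified nested-loop semi join on integer keys.
 * For each outer row, scan the inner rows until the first match.
 * Returns the number of outer rows emitted; *removed receives the number
 * of inner rows rejected by the join filter, analogous to EXPLAIN's
 * "Rows Removed by Join Filter". */
size_t nestloop_semijoin(const int *outer, size_t n_outer,
                         const int *inner, size_t n_inner,
                         size_t *removed)
{
    assert(removed != NULL);
    size_t emitted = 0;
    *removed = 0;

    for (size_t i = 0; i < n_outer; i++) {
        for (size_t j = 0; j < n_inner; j++) {
            if (outer[i] == inner[j]) {
                emitted++;      /* semi join: the first match is enough */
                break;
            }
            (*removed)++;       /* inner row failed the join filter */
        }
    }
    return emitted;
}
```

Applied to the plan: line 18 shows the Materialize node returning about 4303 rows per loop over 8605 loops, and all 8605 outer rows found a match, so the rejected comparisons come to 8605 * 4303 - 8605 = 37,018,710, exactly the figure on line 07.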