恢復內容開始 [20170603]12c Top Frequency histogram.txt--//個人對直方圖瞭解很少,以前2種直方圖類型對於目前的許多應用來講已經足夠,或者講遇到的問題很少.--//抽一點點時間,簡單探究12c Top Frequency histogram.--//以前的頻 ...
---恢復內容開始---
[20170603]12c Top Frequency histogram.txt
--//個人對直方圖瞭解很少,以前2種直方圖類型對於目前的許多應用來講已經足夠,或者講遇到的問題很少.
--//抽一點點時間,簡單探究12c Top Frequency histogram.
--//以前的頻率直方圖Frequency histogram,受限bucket(桶的大小),如果有255個不同值,oracle分析後不會建立頻率直方圖,而是建立高
--//度直方圖.這樣的情況會導致一些流行值的統計在顯示執行計劃時差距很大.而12c引入了Top Frequency histogram,註意這裡的top,
--//我的理解就是流行值(popular),也就是這樣建立的直方圖僅僅包括popular,其他non-popular不考慮,這樣在sql語句的查詢這些
--//popular時,顯示的統計信息相對準確,從而有利於oracle選擇正確的執行計劃.
--//以下是我的學習筆記,也許會存在許多錯誤,僅僅做一個記錄.我也看了許多別人的blog.^_^.而且我目前的環境只有12.0.0.1(版本太
--//低).
1.環境:
SCOTT@test01p> @ ver1
PORT_STRING VERSION BANNER CON_ID
------------------------------ -------------- -------------------------------------------------------------------------------- ----------
IBMPC/WIN_NT64-9.1.0 12.1.0.1.0 Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production 0
--//如果要建立Top Frequency histogram必須要滿足幾個條件:
--//鏈接 raajeshwaran.blogspot.co.id/2016/06/top-frequency-histogram-in-12c.html
The database creates a Top frequency histogram, when the following criteria are met.
NDV is greater than n, where n is the requested number of buckets (default 254)
The percentage of rows occupied by Top-frequent values is greater than or equal to the threshold p where p is (1-(1/n)*100).
The estimate_percent parameter in dbms_stats gathering procedure should be auto_sample_size (set to default)
--//翻譯過來NDV(也就是欄位的不同值)大於N(指bucket的數量).
--//流行值(popular)在Top-frequent中合計數量/總計數量之比要大於(1-(1/n)*100).如果建立10個桶,這樣流行值的總計必須在90%以上
2.首先驗證(1-(1/n)*100)比值是否正確:
SCOTT@test01p> create table t as select * from dba_objects;
Table created.
select column_name,num_distinct,density,histogram,SAMPLE_SIZE
from user_tab_col_statistics
where table_name ='T'
and column_name ='OWNER';
COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM SAMPLE_SIZE
-------------------- ------------ ---------- --------------- -----------
OWNER 32 .03125 NONE 91695
--//12c ctas 建立統計信息,但是不會建立直方圖.density 1/32=.03125.
SCOTT@test01p> select count(*) from t;
COUNT(*)
----------
91695
--//隨手寫的sql語句:
with a as (select distinct owner,count(*) over(partition by owner) n1 ,count(*) over () n2 from t order by 2 desc ),
b as (select owner,n1,n2,sum(n1) over (order by n1 desc) n3 from a order by n1 desc)
select rownum,owner,n1,n2,n3,round(n3/n2,5) x1,round(1-1/rownum,5) x2 from b;
ROWNUM OWNER N1 N2 N3 X1 X2
------ ----------------- ----- ---------- ---------- ---------- ----------
1 SYS 41942 91695 41942 .45741 0
2 PUBLIC 37142 91695 79084 .86247 .5
3 APEX_040200 3405 91695 82489 .8996 .66667
4 ORDSYS 3157 91695 85646 .93403 .75
5 MDSYS 1819 91695 87465 .95387 .8
6 XDB 985 91695 88450 .96461 .83333
7 SYSTEM 641 91695 89091 .9716 .85714
8 CTXSYS 405 91695 89496 .97602 .875
9 WMSYS 387 91695 89883 .98024 .88889
10 DVSYS 352 91695 90235 .98408 .9
11 SH 309 91695 90544 .98745 .90909
12 ORDDATA 292 91695 90836 .99063 .91667
13 LBACSYS 209 91695 91045 .99291 .92308
14 OE 142 91695 91187 .99446 .92857
15 SCOTT 96 91695 91283 .99551 .93333
16 GSMADMIN_INTERNAL 77 91695 91360 .99635 .9375
17 IX 58 91695 91418 .99698 .94118
18 DBSNMP 55 91695 91473 .99758 .94444
19 PM 44 91695 91517 .99806 .94737
20 HR 35 91695 91552 .99844 .95
21 OLAPSYS 25 91695 91577 .99871 .95238
22 OJVMSYS 23 91695 91600 .99896 .95455
23 DVF 19 91695 91619 .99917 .95652
24 FLOWS_FILES 13 91695 91632 .99931 .95833
25 AUDSYS 12 91695 91644 .99944 .96
26 ORDPLUGINS 10 91695 91664 .99966 .96154
27 OUTLN 10 91695 91664 .99966 .96296
28 BI 8 91695 91688 .99992 .96429
29 ORACLE_OCM 8 91695 91688 .99992 .96552
30 SI_INFORMTN_SCHEM 8 91695 91688 .99992 .96667
31 APPQOSSYS 5 91695 91693 .99998 .96774
32 TEST 2 91695 91695 1 .96875
--//如果加入條件where round(n3/n2,5) >round(1-1/rownum,5),全部輸出.也就是這樣如果桶小於32,大於1.建立的都是Top Frequency.
3.繼續測試:
D:\temp>cat a1.sql
cat a1.sql
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'T',method_opt=>'for columns owner size &1');
select column_name,num_distinct,density,histogram,SAMPLE_SIZE from user_tab_col_statistics where table_name ='T' and column_name ='OWNER';
SCOTT@test01p> @ a1.sql 2
PL/SQL procedure successfully completed.
COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM SAMPLE_SIZE
-------------------- ------------ ---------- --------------- -----------
OWNER 32 .03125 HYBRID 5500
SCOTT@test01p> @ a1.sql 3
PL/SQL procedure successfully completed.
COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM SAMPLE_SIZE
-------------------- ------------ ---------- --------------- -----------
OWNER 32 5.4529E-06 TOP-FREQUENCY 91695
SCOTT@test01p> @ a1.sql 4
PL/SQL procedure successfully completed.
COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM SAMPLE_SIZE
-------------------- ------------ ---------- --------------- -----------
OWNER 32 5.4529E-06 TOP-FREQUENCY 91695
SCOTT@test01p> @ a1.sql 31
PL/SQL procedure successfully completed.
COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM SAMPLE_SIZE
-------------------- ------------ ---------- --------------- -----------
OWNER 32 5.4529E-06 TOP-FREQUENCY 91695
SCOTT@test01p> @ a1.sql 32
PL/SQL procedure successfully completed.
COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM SAMPLE_SIZE
-------------------- ------------ ---------- --------------- -----------
OWNER 32 5.4529E-06 FREQUENCY 91695
--//除了bucket=2,32建立的直方圖HYBRID,FREQUENCY外,建立的都是TOP-FREQUENCY.
--//以10個bucket為例.解方程式(90235-x)/(91695-x)=0.9 ,得到x=77095.也就是要減少77095.
--//delete t where owner='SYS' and rownum<=41000;
--//delete t where owner='PUBLIC' and rownum<=36095;
SCOTT@test01p> delete t where owner='SYS' and rownum<=41000;
41000 rows deleted.
SCOTT@test01p> delete t where owner='PUBLIC' and rownum<=36095;
36095 rows deleted.
SCOTT@test01p> commit ;
Commit complete.
with a as (select distinct owner,count(*) over(partition by owner) n1 ,count(*) over () n2 from t order by 2 desc ),
b as (select owner,n1,n2,sum(n1) over (order by n1 desc) n3 from a order by n1 desc)
select rownum,owner,n1,n2,n3,round(n3/n2,5) x1,round(1-1/rownum,5) x2 from b where rownum<=11;
ROWNUM OWNER N1 N2 N3 X1 X2
------ ----------- ---- ---------- ---------- ---------- ----------
1 APEX_040200 3405 14600 3405 .23322 0
2 ORDSYS 3157 14600 6562 .44945 .5
3 MDSYS 1819 14600 8381 .57404 .66667
4 PUBLIC 1047 14600 9428 .64575 .75
5 XDB 985 14600 10413 .71322 .8
6 SYS 942 14600 11355 .77774 .83333
7 SYSTEM 641 14600 11996 .82164 .85714
8 CTXSYS 405 14600 12401 .84938 .875
9 WMSYS 387 14600 12788 .87589 .88889
10 DVSYS 352 14600 13140 .9 .9
11 SH 309 14600 13449 .92116 .90909
11 rows selected.
--//backet=10,前面10個值占90%.
SCOTT@test01p> @ a1 10
PL/SQL procedure successfully completed.
COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM SAMPLE_SIZE
-------------------- ------------ ---------- --------------- -----------
OWNER 32 .000034247 TOP-FREQUENCY 14600
--//再減少1條記錄.
SCOTT@test01p> delete t where owner='SYS' and rownum<=1;
1 row deleted.
SCOTT@test01p> commit ;
Commit complete.
ROWNUM OWNER N1 N2 N3 X1 X2
------ ----------- ---- ---------- ---------- ---------- ----------
1 APEX_040200 3405 14599 3405 .23324 0
2 ORDSYS 3157 14599 6562 .44948 .5
3 MDSYS 1819 14599 8381 .57408 .66667
4 PUBLIC 1047 14599 9428 .6458 .75
5 XDB 985 14599 10413 .71327 .8
6 SYS 941 14599 11354 .77772 .83333
7 SYSTEM 641 14599 11995 .82163 .85714
8 CTXSYS 405 14599 12400 .84937 .875
9 WMSYS 387 14599 12787 .87588 .88889
10 DVSYS 352 14599 13139 .89999 .9
11 SH 309 14599 13448 .92116 .90909
11 rows selected.
--//現在前10占.89999.
SCOTT@test01p> @ a1 10
PL/SQL procedure successfully completed.
COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM SAMPLE_SIZE
-------------------- ------------ ---------- --------------- -----------
OWNER 32 .018378 HYBRID 14599
--//可以發現建立的直方圖不是TOP-FREQUENCY,而是HYBRID(混合型直方圖).
4.轉化成TOP-FREQUENCY.
SCOTT@test01p> insert into t select * from dba_objects where owner='SYS' and rownum=1;
1 row created.
SCOTT@test01p> commit ;
Commit complete.
SCOTT@test01p> @ a1 10
PL/SQL procedure successfully completed.
COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM SAMPLE_SIZE
-------------------- ------------ ---------- --------------- -----------
OWNER 32 .000034247 TOP-FREQUENCY 14600
--//DENSITY=1/SAMPLE_SIZE/2, 1/14600/2=.00003424657534246575正好符合.
5.現在看看執行計劃:
SCOTT@test01p> select count(*) from t where owner='DVSYS';
COUNT(*)
----------
352
SCOTT@test01p> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 2at34f0zaqhzj, child number 0
-------------------------------------
select count(*) from t where owner='DVSYS'
Plan hash value: 2966233522
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 428 (100)| | 1 |00:00:00.01 | 1544 |
| 1 | SORT AGGREGATE | | 1 | 1 | 8 | | | 1 |00:00:00.01 | 1544 |
|* 2 | TABLE ACCESS FULL| T | 1 | 352 | 2816 | 428 (0)| 00:00:01 | 352 |00:00:00.01 | 1544 |
---------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / T@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OWNER"='DVSYS')
--//可以發現e_rows=A-Rows.可以發現非常準確.
--//查看直方圖信息.
SCOTT@test01p> select endpoint_number,endpoint_actual_value from user_tab_histograms where table_name ='T' and column_name ='OWNER' order by 1;
ENDPOINT_NUMBER ENDPOINT_ACTUAL_VALU
--------------- --------------------
3405 APEX_040200
3810 CTXSYS
4162 DVSYS
5981 MDSYS
9138 ORDSYS
10185 PUBLIC
11127 SYS
11768 SYSTEM
12155 WMSYS
13140 XDB
10 rows selected.
--//4162-3810=352,可以發現正好符合.也就是popular值統計很正確.看看非popular值.
SCOTT@test01p> select count(*) from t where owner='DVSYS1';
COUNT(*)
----------
0
Plan hash value: 2966233522
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 428 (100)| | 1 |00:00:00.01 | 1544 |
| 1 | SORT AGGREGATE | | 1 | 1 | 8 | | | 1 |00:00:00.01 | 1544 |
|* 2 | TABLE ACCESS FULL| T | 1 | 66 | 528 | 428 (0)| 00:00:01 | 0 |00:00:00.01 | 1544 |
---------------------------------------------------------------------------------------------------------------------
SCOTT@test01p> select count(*) from t where owner='SCOTT';
COUNT(*)
----------
96
Plan hash value: 2966233522
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 428 (100)| | 1 |00:00:00.01 | 1544 |
| 1 | SORT AGGREGATE | | 1 | 1 | 8 | | | 1 |00:00:00.01 | 1544 |
|* 2 | TABLE ACCESS FULL| T | 1 | 66 | 528 | 428 (0)| 00:00:01 | 96 |00:00:00.01 | 1544 |
---------------------------------------------------------------------------------------------------------------------
--//可以發現估計E-Rows如何計算的呢?
6.做10053跟蹤:
D:\tools\sqllaji>cat 10053x.sql
execute dbms_sqldiag.dump_trace(p_sql_id=>'&1',p_child_number=>&2,p_component=>'Compiler',p_file_id=>'&&1');
SCOTT@test01p> @ 10053x f31kz63ksu1tc 0
PL/SQL procedure successfully completed.
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for T[T]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
Column (#1):
NewDensity:0.004545, OldDensity:0.000034 BktCnt:13140.000000, PopBktCnt:13140.000000, PopValCnt:10, NDV:32
Column (#1): OWNER(VARCHAR2)
AvgLen: 8 NDV: 32 Nulls: 0 Density: 0.000000
Histogram: Top-Freq #Bkts: 13140 UncompBkts: 13140 EndPtVals: 10 ActualVal: yes
Table: T Alias: T
Card: Original: 14600.000000 Rounded: 66 Computed: 66.36 Non Adjusted: 66.36
Access Path: TableScan
Cost: 428.36 Resp: 428.36 Degree: 0
Cost_io: 428.00 Cost_cpu: 14122025
Resp_io: 428.00 Resp_cpu: 14122025
Best:: AccessPath: TableScan
Cost: 428.36 Degree: 1 Resp: 428.36 Card: 66.36 Bytes: 0
check parallelism for statement[<unnamed>]
kkfdtParallel: parallel is possible (no statement type restrictions)
kkfdPaForcePrm: dop:1 ()
use dictionary DOP(1) on table
kkfdPaPrm:- The table : 106380
kkfdPaPrm:DOP = 1 (computed from hint/dictionary/autodop)
kkfdiPaPrm: dop:1 serial(?)
***************************************
--//使用 NewDensity:0.004545.
BktCnt:13140.000000, PopBktCnt:13140.000000 => 對應就是前10個流行值的總和.
--//非流行值的數量: 14600-13140=1460
--//非流行值的桶數量: 32-10=22
--//非流行值的數量/非流行值的桶數量 1460/22=66.36363636363636363636,四捨五入66,正好符合執行計劃的推斷.
--//NewDensity的計算 =1460/14600/22=.00454545454545454545,非常接近.
---恢復內容結束---