[20200401]優化的困惑5.txt--//春節前對一個生產系統做優化,完成後使用ash_wait_chains.sql檢查,發現control file parallel write有點多。--//當時並沒有在意,總感覺哪裡不對,感覺這套系統磁碟IO有問題,現在有空分析看看。1.環境:> @ ...
[20200401]優化的困惑5.txt
--//春節前對一個生產系統做優化,完成後使用ash_wait_chains.sql檢查,發現control file parallel write有點多。
--//當時並沒有在意,總感覺哪裡不對,感覺這套系統磁碟IO有問題,現在有空分析看看。
1.環境:
> @ ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.3.0 Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
2.問題提出:
> @ tpt/ash/dash_wait_chains event2 1=1 trunc(sysdate-2) trunc(sysdate-1)
%This SECONDS AAS WAIT_CHAIN
------ ---------- ------- ---------------------------------------------
59% 2790 .0 -> control file parallel write
21% 980 .0 -> ON CPU
13% 640 .0 -> log file parallel write
4% 170 .0 -> db file async I/O submit
1% 70 .0 -> log file sync -> log file parallel write
1% 60 .0 -> db file sequential read
0% 20 .0 -> LNS wait on SENDREQ
0% 10 .0 -> os thread startup
0% 10 .0 -> ADR block file read
9 rows selected.
--//trunc(sysdate-2) trunc(sysdate-1) 範圍在2020/3/29 2020/3/30 之間,是星期天.
--//很明顯這是一套根本不忙的系統(我已經做了許多優化),當然的時間區間也很大,不過還是可以看出伺服器不忙。
--//同時很不理解為什麼control file parallel write能達到2790秒。
3.分析:
# lsof /u01/app/oracle/oradata/xxxyyy/control01.ctl
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
oracle 3827 oracle 256u REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
oracle 3829 oracle 256u REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
oracle 3831 oracle 256uW REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
oracle 3837 oracle 257u REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
oracle 3861 oracle 256u REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
oracle 3995 oracle 257u REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
oracle 4112 oracle 257u REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
oracle 10221 oracle 256u REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
oracle 18633 oracle 256u REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
oracle 18688 oracle 256u REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
oracle 18732 oracle 256u REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
oracle 23969 oracle 256u REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
oracle 24082 oracle 256u REG 8,3 11812864 35061901 /u01/app/oracle/oradata/xxxyyy/control01.ctl
--//僅僅進程3831有寫。
# ps -ef | grep 383[1]
oracle 3831 1 0 2016 ? 04:13:39 ora_ckpt_xxxyyy
--//正是ckpt進程。
SYS@xxxyyy> show parameter filesystem
NAME TYPE VALUE
-------------------- ------ -------
filesystemio_options string ASYNCH
--//打開了磁碟非同步IO。
$ ls -l /proc/3831/fd | grep control0
lrwx------ 1 oracle oinstall 64 2020-03-31 09:18:42 256 -> /u01/app/oracle/oradata/xxxyyy/control01.ctl
lrwx------ 1 oracle oinstall 64 2020-03-31 09:18:42 257 -> /u01/app/oracle/oradata/xxxyyy/control02.ctl
--//控制文件對應文件句柄是256,257。
$ strace -f -e io_submit -Ttt -p 3831
Process 3831 attached - interrupt to quit
09:24:52.522886 io_submit(47787382317056, 2, {{0x2b765df0c210, 0, 1, 0, 256}, {0x2b765df0c460, 0, 1, 0, 257}}) = 2 <0.022249>
09:24:53.545944 io_submit(47787382317056, 2, {{0x2b765df0c210, 0, 1, 0, 256}, {0x2b765df0c460, 0, 1, 0, 257}}) = 2 <0.015565>
09:24:54.561196 io_submit(47787382317056, 2, {{0x2b765df0c210, 0, 1, 0, 256}, {0x2b765df0c460, 0, 1, 0, 257}}) = 2 <0.016591>
09:24:55.579054 io_submit(47787382317056, 2, {{0x2b765df0c210, 0, 1, 0, 256}, {0x2b765df0c460, 0, 1, 0, 257}}) = 2 <0.015069>
09:24:56.594928 io_submit(47787382317056, 2, {{0x2b765df0c210, 0, 1, 0, 256}, {0x2b765df0c460, 0, 1, 0, 257}}) = 2 <0.015453>
09:24:57.612264 io_submit(47787382317056, 2, {{0x2b765df0c210, 0, 1, 0, 256}, {0x2b765df0c460, 0, 1, 0, 257}}) = 2 <0.014407>
09:24:58.627982 io_submit(47787382317056, 2, {{0x2b765df0c210, 0, 1, 0, 256}, {0x2b765df0c460, 0, 1, 0, 257}}) = 2 <0.023324>
09:24:59.652000 io_submit(47787382317056, 2, {{0x2b765df0c210, 0, 1, 0, 256}, {0x2b765df0c460, 0, 1, 0, 257}}) = 2 <0.032261>
09:25:00.685230 io_submit(47787382317056, 2, {{0x2b765df0c210, 0, 1, 0, 256}, {0x2b765df0c460, 0, 1, 0, 257}}) = 2 <0.081960>
Process 3831 detached
--//你可以發現每秒都調用一次io_submit寫入,時間間隔大約1秒1次,實際上就是寫增量檢查點,資料庫很空閑。
0.022249+0.015565+0.016591+0.015069+0.015453+0.014407+0.023324+0.032261+0.081960 = .236879
.236879/9 = .02632,平均占.02632秒。
--//如果換算成毫秒的話,相當於26ms.這是一個相當差勁的IO,我嚴重懷疑BIOS沒有打開寫回功能(write back),而是設置在寫通模式
--//(write through)上.
--//相當於一天調用 86400 /(1+.02632) = 84184次。需要84184*.02632 = 2215.72288秒,當然這是io_submit的時間,並非control file
--//parallel write.非常接近上面看到control file parallel write=2790秒。
--//再加大取樣量看看。
# strace -f -e io_submit -Ttt -p 3831 -o/tmp/aaa
Process 3831 attached - interrupt to quit
Process 3831 detached
# cat /tmp/aaa | awk '{print $17}' | tr -d "<>" | awk '{j++;i=i+$NF}END{print j,i,i/j}'
42 1.04516 0.0248849
--//取樣42次,0.0248849秒相當於25ms。相當慢的磁碟IO。
4.如何確定磁碟在write back還是write Through狀態呢?
--//方法一:首先想到的進入BIOS檢查,當然目前不行。
--//方法二:dmesg,沒有發現相關信息。
--//方法三:我知道dell隨機光碟裡面有一個使用程式可以通過瀏覽器查看硬體配置的程式。這個比較麻煩,機器已經太老了。相關資料
--//放在哪裡也不知道。
--//方法四:hdparm,看了一下man文檔也沒有發現相關設置與檢查。
# hdparm 2>&1 | grep -i write
--Istdout write identify data to stdout as ASCII hex
-n get/set ignore-write-errors flag (0/1)
-W set drive write-caching flag (0/1) (DANGEROUS)
--//看man hdparm,感覺這個命令很危險DANGEROUS,放棄。
--//方法五:想想linux還有什麼命令瞭解硬體配置呢,馬上想到dmidecode。
# dmidecode | grep -i write
Operational Mode: Write Through
Operational Mode: Write Through
Operational Mode: Write Back
Operational Mode: Write Through
Operational Mode: Write Through
Operational Mode: Write Back
--//很明顯了安裝人員沒有很好的設置磁碟操作模式。
# dmidecode > /tmp/aaa1
# dmidecode -t 7
# dmidecode 2.11
SMBIOS 2.7 present.
Handle 0x0700, DMI type 7, 19 bytes
Cache Information
Socket Designation: Not Specified
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Through
Location: Internal
Installed Size: 128 kB
Maximum Size: 128 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Data
Associativity: 8-way Set-associative
Handle 0x0701, DMI type 7, 19 bytes
Cache Information
Socket Designation: Not Specified
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Through
Location: Internal
Installed Size: 1024 kB
Maximum Size: 1024 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Unified
Associativity: 8-way Set-associative
Handle 0x0702, DMI type 7, 19 bytes
Cache Information
Socket Designation: Not Specified
Configuration: Enabled, Not Socketed, Level 3
Operational Mode: Write Back
Location: Internal
Installed Size: 10240 kB
Maximum Size: 10240 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Unified
Associativity: <OUT OF SPEC>
Handle 0x0703, DMI type 7, 19 bytes
Cache Information
Socket Designation: Not Specified
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Through
Location: Internal
Installed Size: 0 kB
Maximum Size: 0 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Unknown
System Type: Data
Associativity: Unknown
Handle 0x0704, DMI type 7, 19 bytes
Cache Information
Socket Designation: Not Specified
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Through
Location: Internal
Installed Size: 0 kB
Maximum Size: 0 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Unknown
System Type: Unified
Associativity: Unknown
Handle 0x0705, DMI type 7, 19 bytes
Cache Information
Socket Designation: Not Specified
Configuration: Enabled, Not Socketed, Level 3
Operational Mode: Write Back
Location: Internal
Installed Size: 0 kB
Maximum Size: 0 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Unknown
System Type: Unified
Associativity: Unknown
# dmidecode -t 7 | egrep -i "write|Installed Size"
Operational Mode: Write Through
Installed Size: 128 kB
Operational Mode: Write Through
Installed Size: 1024 kB
Operational Mode: Write Back
Installed Size: 10240 kB
Operational Mode: Write Through
Installed Size: 0 kB
Operational Mode: Write Through
Installed Size: 0 kB
Operational Mode: Write Back
Installed Size: 0 kB
--//找了一臺相似的機器檢查發現:
# dmidecode | grep -i write
Operational Mode: Write Back
Operational Mode: Write Back
Operational Mode: Write Back
Operational Mode: Write Back
Operational Mode: Write Back
Operational Mode: Write Back
--//基本可以驗證我的推斷,安裝OS的人員沒有很好的設置BIOS,設置在Write Through模式導致寫入磁碟IO有點慢。
--//要麼還有一種可能就是板上可能有電池,已經沒電了,無法設置在write back模式。僅僅是我的推斷。
5.順便測試read看看:
--//session 1:
> @ spid
SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50
------------ ------------ ------------------------ --------- ------ ------- ------------ --------------------------------------------------
843 60253 40936 DEDICATED 4912 126 56 alter system kill session '843,60253' immediate;
> @ check
檢查點隊列
當前時間 臟塊數量
low_rba on_disk_rba on_disk_rba_scn( on_disk_rba_time(CP full checkpoint_rba full_checkpoint( full_checkpoint_tim diff_scn(on_disk_rdb-ch_scn) current_group SYSDATE CPDRT
-------------------- -------------------- ---------------- ------------------- --------------------- ---------------- ------------------- ---------------------------- ------------- ------------------- ------------
12898.27952.0 12898.31085.0 30787894340 2020-04-02 09:45:35 12898.2.16 30787875482 2020-04-02 08:53:27 18858 1 2020-04-02 09:45:36 486
--//session 2:
# strace -x -f -e pread,io_getevents,io_submit -Ttt -p 4912
Process 4912 attached - interrupt to quit
09:45:36.707530 pread(257, "\x15\xc2\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x04\x15\x4c\x00\x00\x00\x00\x00\x00\x00\x00\x20\x0b\x56\x37\x09\x0f"..., 16384, 16384) = 16384 <0.000024>
09:45:36.707785 pread(257, "\x15\xc2\x00\x00\x0f\x00\x00\x00\xd7\x59\x19\x00\xff\xff\x01\x04\xa8\x3c\x00\x00\x00\x47\x00\x00\x00\x00\x00\x00\x00\x00\x00\x45"..., 16384, 245760) = 16384 <0.000018>
09:45:36.707888 pread(257, "\x15\xc2\x00\x00\x11\x00\x00\x00\xd7\x59\x19\x00\xff\xff\x01\x04\xc4\x34\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xd6\x6a\xdc\x33"..., 16384, 278528) = 16384 <0.000018>
09:45:36.707985 pread(257, "\x15\xc2\x00\x00\x14\x00\x00\x00\xc8\x59\x19\x00\xff\xff\x01\x04\x2e\x77\x00\x00\x0f\x00\x00\x00\x9a\xb2\x19\x2b\x07\x00\x1c\x0a"..., 16384, 327680) = 16384 <0.000017>
09:45:36.708108 pread(257, "\x15\xc2\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x04\x5b\x9a\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\xe6\x01\x00\x00"..., 16384, 49152) = 16384 <0.000019>
Process 4912 detached
--//非同步IO讀取使用pread函數嗎,資料庫沒有使用asm原因?
--//參考鏈接http://blog.itpub.net/267265/viewspace-2222208/,執行如下:
> @ viewsess 'physical read total'
NAME STATISTIC# VALUE SID
---------------------------------------- ---------- ------------ ------------
physical read total IO requests 48 273 843
physical read total multi block requests 49 6 843
physical read total bytes 52 6833152 843
> @ check
檢查點隊列
當前時間 臟塊數量
low_rba on_disk_rba on_disk_rba_scn( on_disk_rba_time(CP full checkpoint_rba full_checkpoint( full_checkpoint_tim diff_scn(on_disk_rdb-ch_scn) current_group SYSDATE CPDRT
-------------------- -------------------- ---------------- ------------------- --------------------- ---------------- ------------------- ---------------------------- ------------- ------------------- ------------
12898.34936.0 12898.36928.0 30787898313 2020-04-02 09:58:32 12898.2.16 30787875482 2020-04-02 08:53:27 22831 1 2020-04-02 09:58:33 332
> @ viewsess 'physical read total'
NAME STATISTIC# VALUE SID
---------------------------------------- ---------- ------------ ------------
physical read total IO requests 48 278 843
physical read total multi block requests 49 6 843
physical read total bytes 52 6915072 843
--//上下比較,可以發現physical read total IO requests增加5次(278-273=5),physical read total bytes增加 6915072-6833152=81920,正好等於16384*5= 81920.
--//你可以發現讀16384位元組 每次pread的時間很小。
總結:
--//再次提醒自己註意一些細節。資料庫上線前給仔細檢查,實際上還是我提到的分工問題,如果節點上的人做好自己的工作,
--//鏈接:http://blog.itpub.net/267265/viewspace-2102914/ => [20160519]淺談行業分工.txt ,這台機器就是當年的這台伺服器。
--//實際上如果當時我沒有再仔細看,這個問題就給划過了。
--//還有一點要說明的是通過awr報表很容易划過這個問題。
Top 5 Timed Foreground Events
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
DB CPU 265 73.26
log file sync 8,377 98 12 27.09 Commit
db file sequential read 6,670 15 2 4.28 User I/O
enq: KO - fast object checkpoint 151 7 47 1.95 Application
SQL*Net more data to client 176,457 3 0 0.82 Network
--//負載太輕了。不在前臺等待事件。
Background Wait Events
ordered by wait time desc, waits desc (idle events last)
Only events with Total Wait Time (s) >= .001 are shown
%Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0
Event Waits %Time -outs Total Wait Time (s) Avg wait (ms) Waits /txn % bg time
log file parallel write 9,963 0 112 11 1.16 37.05
control file parallel write 3,610 0 106 29 0.42 35.23
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
db file async I/O submit 877 0 51 59 0.10 17.08
os thread startup 37 0 1 25 0.00 0.30
LNS wait on SENDREQ 9,935 0 1 0 1.16 0.17
--//註意看 Avg wait (ms) ,只能講自己功力還是不夠,沒有仔細看這些細節,主要精力放在前臺事件的優化上了,忽略後後臺事件。
--//磁碟IO存在問題。
--//附上測試腳本:
$ cat check.sql
column "full checkpoint_rba" format a21
column low_rba format a20
column low_rba16 format a20
column on_disk_rba format a20
column on_disk_rba16 format a20
column rtckp_rba format a20
column diff_date format 9999999.99
rem column CPOSD_ono_disk_rba_scn format 99999999999999999999999999999999
column cpdrt heading "檢查點隊列|臟塊數量|CPDRT"
column cpodt_on_disk_rba heading "檢查點隊列|on disk rba|時間戳|CPODT"
column cpods heading "檢查點隊列|on disk rba scn|CPODS"
column cphbt heading "檢查點心跳|CPHBT"
column current_sysdate heading "當前時間|SYSDATE"
set num 12
SELECT b.cplrba_seq || '.' || b.cplrba_bno || '.' || b.cplrba_bof "low_rba"
,b.cpodr_seq || '.' || b.cpodr_bno || '.' || b.cpodr_bof "on_disk_rba"
,b.CPODS "on_disk_rba_scn(CPODS)"
,TO_DATE (b.CPODT, 'MM-DD-YYYY HH24:MI:SS') "on_disk_rba_time(CPODT)"
,a.rtckp_rba_seq || '.' || a.rtckp_rba_bno || '.' || a.rtckp_rba_bof
"full checkpoint_rba"
,a.rtckp_scn "full_checkpoint(rtckp_scn)"
,TO_DATE (a.rtckp_tim, 'MM-DD-YYYY HH24:MI:SS')
"full_checkpoint_time_rtckp_tim"
,b.CPODS - a.rtckp_scn "diff_scn(on_disk_rdb-ch_scn)"
,a.rtcln "current_group"
,sysdate current_sysdate
,CPDRT
FROM x$kccrt a, x$kcccp b
WHERE a.rtnum = b.cptno AND A.INST_ID = b.inst_id;
$ cat viewsess.sql
set verify off
column name format a70
SELECT b.NAME, a.statistic#, a.VALUE,a.sid
FROM v$mystat a, v$statname b
WHERE lower(b.NAME) like lower('%&1%') AND a.statistic# = b.statistic#
and a.value>0;