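I: Background
1. The story
A few days ago a friend reached out to me on WeChat. He said the memory usage of their school's web system stayed high and never came back down, and asked me to find out what was going on. As usual, I had him generate a dump file and send it over so I could take a look.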
II: WinDbg analysis
1. Managed or unmanaged
To tell whether the memory is managed or unmanaged, use !address -summary to look at the memory regions.
0:000> !address -summary
--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free 384 7dbe`3d662000 ( 125.743 TB) 98.24%
<unknown> 9653 241`9e9fd000 ( 2.256 TB) 99.98% 1.76%
Stack 153 0`125d0000 ( 293.812 MB) 0.01% 0.00%
Image 1912 0`0b0f2000 ( 176.945 MB) 0.01% 0.00%
Heap 110 0`0669a000 ( 102.602 MB) 0.00% 0.00%
Other 12 0`001ce000 ( 1.805 MB) 0.00% 0.00%
TEB 51 0`00066000 ( 408.000 kB) 0.00% 0.00%
PEB 1 0`00001000 ( 4.000 kB) 0.00% 0.00%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE 384 7dbe`3d662000 ( 125.743 TB) 98.24%
MEM_RESERVE 2794 23f`666dd000 ( 2.248 TB) 99.59% 1.76%
MEM_COMMIT 9098 2`5c2b1000 ( 9.440 GB) 0.41% 0.01%
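Judging from MEM_COMMIT and Heap in the output, this should be a managed memory problem: 9.44 GB is committed, yet the native NT heap (Heap) accounts for only about 102 MB of it.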
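To make that comparison concrete, the sizes in the !address -summary output can be converted from WinDbg's high`low hex notation into bytes. The following is a minimal Python sketch; the helper name `parse_windbg_size` is an illustrative assumption (it is not a WinDbg feature), and the two values are copied from the output above.

```python
def parse_windbg_size(text: str) -> int:
    """Convert a WinDbg 64-bit hex value such as '2`5c2b1000' to an integer."""
    # WinDbg separates the high and low 32 bits with a backtick;
    # the low part is always 8 hex digits, so dropping the backtick is enough.
    return int(text.replace("`", ""), 16)


GIB = 1024 ** 3
MIB = 1024 ** 2

committed = parse_windbg_size("2`5c2b1000")  # MEM_COMMIT row
nt_heap = parse_windbg_size("0`0669a000")    # Heap row (native NT heap)

print(f"MEM_COMMIT: {committed / GIB:.3f} GiB")
print(f"NT heap:    {nt_heap / MIB:.3f} MiB")
print(f"NT heap share of committed memory: {nt_heap / committed:.2%}")
```

The native heap comes to roughly 1% of the committed memory, which is why the investigation moves on to the managed heap.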
2. Inspecting the managed heap
To look at the managed heap, use !eeheap -gc.
0:000> !eeheap -gc
========================================
Number of GC Heaps: 8
----------------------------------------
Heap 0 (00000273748727a0)
Small object heap
segment begin allocated committed allocated size committed size
generation 0:
02b388ff24f0 027615400020 0276157f76f8 027615800000 0x3f76d8 (4159192) 0x400000 (4194304)
...
generation 1:
02b388ff5e00 02762a000020 02762a01e460 02762a0b0000 0x1e440 (123968) 0xb0000 (720896)
generation 2:
02b388f81840 027385000020 0273853a2b20 027385400000 0x3a2b00 (3812096) 0x400000 (4194304)
02b388f818f0 027385400020 0273857ee220 027385800000 0x3ee200 (4121088) 0x400000 (4194304)
02b388f81e70 027387400020 0273877914e8 0273877b2000 0x3914c8 (3740872) 0x3b2000 (3874816)
02b388f823f0 027389400020 0273897bf580 0273897df000 0x3bf560 (3929440) 0x3df000 (4059136)
02b388f82600 02738a000020 02738a3644e0 02738a368000 0x3644c0 (3556544) 0x368000 (3571712)
...
02b388f85fc0 02739f000020 02739f253e10 02739f257000 0x253df0 (2440688) 0x257000 (2453504)
02b388f861d0 02739fc00020 02739fffc9f0 02739ffff000 0x3fc9d0 (4180432) 0x3ff000 (4190208)
02b388f86490 0273a0c00020 0273a0f829c0 0273a0fbd000 0x3829a0 (3680672) 0x3bd000 (3919872)
02b388f87040 0273a5000020 0273a5332f78 0273a5337000 0x332f58 (3354456) 0x337000 (3371008)
02b388f875c0 0273a7000020 0273a72e4710 0273a7305000 0x2e46f0 (3032816) 0x305000 (3166208)
...
Large object heap
segment begin allocated committed allocated size committed size
02b388f84f40 027399000020 02739908f520 027399090000 0x8f500 (587008) 0x90000 (589824)
Pinned object heap
segment begin allocated committed allocated size committed size
02b388f812c0 027383000020 02738300b080 027383011000 0xb060 (45152) 0x11000 (69632)
------------------------------
GC Allocated Heap Size: Size: 0x2212923b0 (9146278832) bytes.
GC Committed Heap Size: Size: 0x23b676000 (9586565120) bytes.
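The output shows that the managed heap currently stands at about 9.5 GB (9,586,565,120 bytes committed), and that almost all of it is eaten up by Gen2. So what objects are sitting in Gen2? Use !dumpheap -stat to find out.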
0:000> !dumpheap -stat
Statistics:
MT Count TotalSize Class Name
...
7ff954ecd918 91,030 9,467,120 System.Reflection.RuntimeMethodInfo
7ff95701c8f0 572,034 18,305,088 System.Globalization.DateTimeFormatInfo+TokenHashValue
7ff954c9fd00 1,010,024 62,198,216 System.String
7ff95502fc10 62,645 173,045,678 System.Byte[]
0273747f9610 28,313 8,761,969,520 Free
Total 3,305,047 objects, 9,143,856,580 bytes
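As a quick arithmetic check on these totals, here is a minimal Python sketch. The two numbers are copied from the !dumpheap -stat output above; the variable names are illustrative.

```python
free_bytes = 8_761_969_520   # TotalSize of the "Free" row
total_bytes = 9_143_856_580  # the "Total ... bytes" line

free_share = free_bytes / total_bytes
live_bytes = total_bytes - free_bytes

print(f"Free share of the GC heap: {free_share:.1%}")
print(f"Bytes held by real objects: {live_bytes:,}")
```

Less than 400 MB of the heap is held by actual objects; everything else is Free blocks.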
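The numbers are startling: of roughly 9 GB of managed memory, Free alone takes up about 8.7 GB. This looks like yet another classic case of memory fragmentation, so let's go straight to a Gen2 segment and look at the scene.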
0:000> !dumpheap 02739fc00020 02739fffc9f0
Address MT Size
02739fc00020 0273747f9610 717,136 Free
02739fcaf170 7ff95502fc10 8,216
02739fcb1188 7ff955478798 64
02739fcb11c8 0273747f9610 607,864 Free
02739fd45840 7ff95502fc10 8,216
02739fd47858 7ff955478798 64
02739fd47898 0273747f9610 2,205,336 Free
02739ff61f30 7ff95502fc10 8,216
02739ff63f48 7ff955478798 64
02739ff63f88 0273747f9610 99,736 Free
02739ff7c520 7ff95502fc10 8,216
02739ff7e538 7ff955478798 64
02739ff7e578 0273747f9610 76,504 Free
02739ff91050 7ff95502fc10 8,216
02739ff93068 7ff955478798 64
02739ff930a8 0273747f9610 355,728 Free
02739ffe9e38 7ff95502fc10 8,216
02739ffebe50 7ff955478798 64
02739ffebe90 0273747f9610 60,168 Free
02739fffa998 7ff95502fc10 8,216
02739fffc9b0 7ff955478798 64
Statistics:
MT Count TotalSize Class Name
7ff955478798 7 448 System.IO.FileSystemWatcher+AsyncReadState
7ff95502fc10 7 57,512 System.Byte[]
0273747f9610 7 4,122,472 Free
Total 21 objects, 4,180,432 bytes
...
The output shows obvious memory fragmentation: every Free block is followed by an 8,216-byte System.Byte[]. What is this array for, and why doesn't the GC reclaim it?
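The pattern can also be checked mechanically. The sketch below is a minimal Python example that parses the segment listing (copied verbatim from the !dumpheap output above) and verifies that the objects sit back to back in repeating groups of Free block, byte[], AsyncReadState. Two things in it are assumptions from general .NET knowledge rather than from the dump: that a byte[] carries 24 bytes of overhead on 64-bit .NET, and that 8192 bytes is the default FileSystemWatcher.InternalBufferSize.

```python
DUMP = """
02739fc00020 0273747f9610 717,136 Free
02739fcaf170 7ff95502fc10 8,216
02739fcb1188 7ff955478798 64
02739fcb11c8 0273747f9610 607,864 Free
02739fd45840 7ff95502fc10 8,216
02739fd47858 7ff955478798 64
02739fd47898 0273747f9610 2,205,336 Free
02739ff61f30 7ff95502fc10 8,216
02739ff63f48 7ff955478798 64
02739ff63f88 0273747f9610 99,736 Free
02739ff7c520 7ff95502fc10 8,216
02739ff7e538 7ff955478798 64
02739ff7e578 0273747f9610 76,504 Free
02739ff91050 7ff95502fc10 8,216
02739ff93068 7ff955478798 64
02739ff930a8 0273747f9610 355,728 Free
02739ffe9e38 7ff95502fc10 8,216
02739ffebe50 7ff955478798 64
02739ffebe90 0273747f9610 60,168 Free
02739fffa998 7ff95502fc10 8,216
02739fffc9b0 7ff955478798 64
"""

BYTE_ARRAY_MT = "7ff95502fc10"  # System.Byte[]
READ_STATE_MT = "7ff955478798"  # System.IO.FileSystemWatcher+AsyncReadState


def parse(dump):
    """Turn each listing line into (address, method table, size, is_free)."""
    rows = []
    for line in dump.strip().splitlines():
        parts = line.split()
        size = int(parts[2].replace(",", ""))
        rows.append((int(parts[0], 16), parts[1], size, parts[-1] == "Free"))
    return rows


rows = parse(DUMP)

# Objects are laid out back to back: address + size equals the next address.
contiguous = all(a[0] + a[2] == b[0] for a, b in zip(rows, rows[1:]))

# Each group of three rows: Free block, 8,216-byte byte[], AsyncReadState.
triples = [rows[i:i + 3] for i in range(0, len(rows), 3)]
pattern_holds = all(
    f[3] and b[1] == BYTE_ARRAY_MT and b[2] == 8216 and s[1] == READ_STATE_MT
    for f, b, s in triples
)

free_bytes = sum(r[2] for r in rows if r[3])
total_bytes = sum(r[2] for r in rows)
payload = 8216 - 24  # assumption: 24 bytes of array overhead on 64-bit .NET

print(f"contiguous layout: {contiguous}, repeating pattern: {pattern_holds}")
print(f"free: {free_bytes:,} of {total_bytes:,} bytes ({free_bytes / total_bytes:.1%})")
print(f"byte[] payload: {payload} bytes")
```

Under those assumptions each array is an 8192-byte buffer, and the small objects wedged between the Free blocks are what keep this segment from being reclaimed as one piece.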
3. Investigating the fragmentation
To find out why the array is not collected, use !gcroot.
0:000> !gcroot 02739fffa998
Caching GC roots, this may take a while.
Subsequent runs of this command will be faster.
HandleTable:
0000027374723fc0 (async pinned handle)
-> 02739dc758c8 System.Threading.OverlappedData
-> 02739fffa998 System.Byte[]
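The output shows that the array is held by System.Threading.OverlappedData, which is itself rooted by an async pinned handle, so the buffer is pinned and the GC can neither collect nor move it. Anyone familiar with asynchronous programming will know about I/O completion ports; in C#, OverlappedData also ties together the handle, the byte[], the IOThread and other information. Let's follow its _overlapped field and dig into the _userState.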
0:000> !do 02739dc758c8
Name: System.Threading.OverlappedData
Fields:
MT Field Offset Type VT Attr Value Name
00007ff95502bce0 4000b00 8 System.IAsyncResult 0 instance 0000000000000000 _asyncResult
00007ff954af93b0 4000b01 10 System.Object 0 instance 000002739dc75910 _callback
00007ff95547f208 4000b02 18 ...eading.Overlapped 0 instance 000002739dc75880 _overlapped
00007ff954af93b0 4000b03 20 System.Object 0 instance 000002739fffa998 _userObject
00007ff954c9ac30 4000b04 28 PTR 0 instance 000002b41023f130 _pNativeOverlapped
00007ff954c99250 4000b05 30 System.IntPtr 1 instance 0000000000000000 _eventHandle
00007ff954c1e8c0 4000b06 38 System.Int32 1 instance 0 _offsetLow
00007ff954c1e8c0 4000b07 3c System.Int32 1 instance 0 _offsetHigh
0:000> !do 000002739dc75880
Name: System.Threading.ThreadPoolBoundHandleOverlapped
Fields:
MT Field Offset Type VT Attr Value Name
00007ff95547f558 4000b08 8 ...ng.OverlappedData 0 instance 000002739dc758c8 _overlappedData
00007ff955478d48 4000d2f 10 ...ompletionCallback 0 instance 00000273869e27a8 _userCallback
00007ff954af93b0 4000d30 18 System.Object 0 instance 000002739fffc9b0 _userState
00007ff9554791a8 4000d31 20 ...locatedOverlapped 0 instance 000002739dc75860 _preAllocated
00007ff954c9ac30 4000d32 30 PTR 0 instance 000002b41023f130 _nativeOverlapped
00007ff955479790 4000d33 28 ...adPoolBoundHandle 0 instance 0000000000000000 _boundHandle
00007ff954c1b3c0 4000d34 38 System.Boolean 1 instance 0 _completed
00007ff955478d48 4000d2e 930 ...ompletionCallback 0 static 00000273869e2898 s_completionCallback
0:000> !do 000002739fffc9b0
Name: System.IO.FileSystemWatcher+AsyncReadState
Fields:
MT Field Offset Type VT Attr Value Name
00007ff954c1e8c0 400002b 30 System.Int32 1 instance 1 <Session>k__BackingField
00007ff95502fc10 400002c 8 System.Byte[] 0 instance 000002739fffa998 <Buffer>k__BackingField
00007ff955026688 400002d 10 ...es.SafeFileHandle 0 instance 000002739dc757d8 <DirectoryHandle>k__BackingField
00007ff955479790 400002e 18 ...adPoolBoundHandle 0 instance 000002739dc75828 <ThreadPoolBinding>k__BackingField
00007ff9554791a8 400002f 20 ...locatedOverlapped 0 instance 000002739dc75860 <PreAllocatedOverlapped>k__BackingField
00007ff955479248 4000030 28 ...eSystem.Watcher]] 0 instance 000002739dc75848 <WeakWatcher>k__BackingField
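The output shows that FileSystemWatcher is the culprit. Honestly, seeing it triggers a conditioned reflex in me: it must be the classic ReloadOnChange=true again. The hunch was right, and a search of the code turned it up.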
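III: Summary
Honestly, ReloadOnChange=true really is the root of all evil. Based on my earlier analyses, there are four major charges against it:
- A surge in file handles
- A surge in memory caused by byte[]
- Memory fragmentation caused by pinning
- A surge in thread pool threads that leaves the program unresponsive
Think twice, and then think again, before you use it!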
