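I recently spent several weeks tracking down a high-memory problem in a WPF application. The symptom was memory that kept growing and was never reclaimed; the root cause was that the GC's finalizer thread was blocked, which effectively stalled collection of every finalizable object. The analysis went through the following steps:
1. Use ANTS Memory Profiler to remove strong references
Since the problem is high memory, memory is the obvious place to start. ANTS deserves praise here: its visualizations are excellent and easy to read, and I personally find it easier to use than SciTech's .NET Memory Profiler. Take a snapshot at a baseline, open and close the window, take a second snapshot, then compare the two. It becomes clear which objects were added and which strong references keep the window object alive. Usually the culprit is your own code, but third-party components have pitfalls too, for example:
- DevExpress.Data.DelayedExecutionExtension contains a static Dictionary that holds strong references to many controls; call RemoveDelayedExecute() when the window closes.
- A group of TypeDescriptor-related classes (including DPCustomTypeDescriptor, DependencyPropertyDescriptor, and DependencyObjectProvider) hold strong references to windows and other controls through DependencyObject._effectiveValues; call TypeDescriptor.Refresh(object) when the window closes.
Note: it turns out TypeDescriptor is more important than I had realized. It not only manages the metadata of all .NET objects but can also modify that metadata dynamically, which should be useful when wrapping proxy classes or providing transparent behavior. For details, see this blog post: TypeDescriptor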
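As a rough illustration of the second cleanup item, the sketch below calls TypeDescriptor.Refresh from a window's close handler. The class name CleanupOnCloseWindow is hypothetical and not from the original project. The DevExpress call is left as a comment because its exact signature is not shown in this post.

```csharp
using System;
using System.ComponentModel;
using System.Windows;

// Hypothetical base window (name invented for this sketch) that drops
// TypeDescriptor's cached descriptors for itself once it has closed.
public class CleanupOnCloseWindow : Window
{
    protected override void OnClosed(EventArgs e)
    {
        base.OnClosed(e);

        // Clears the property and event descriptors cached for this instance,
        // so the cache no longer keeps the closed window reachable.
        TypeDescriptor.Refresh(this);

        // If DevExpress delayed execution was used, the matching
        // RemoveDelayedExecute() call mentioned above would also go here.
    }
}
```

Deriving the application's windows from one such base class keeps the cleanup in a single place, although handling the Closed event on each window would work as well.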
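2. With all strong references removed, large numbers of objects pile up on the finalization queue
ANTS showed that every object I expected to be collected was sitting on the finalization queue, and none were reclaimed even when memory climbed to 1 GB. Forcing a collection manually blocked in WaitForPendingFinalizers(), which never returned.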
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
At that point the only option was to capture a dump and use WinDbg to find out exactly where things were stuck.
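A blocked finalizer thread can also be detected from code without hanging the caller. The following is a minimal sketch, not part of the original investigation: the helper name FinalizerProbe is invented here, and it assumes .NET 4.5 or later for Task.Run. It runs the same three GC calls on a thread-pool thread and gives up after a timeout.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical diagnostic helper (not from the original post).
static class FinalizerProbe
{
    // Returns true if pending finalizers drain within the timeout.
    // A false result suggests the finalizer thread is stuck.
    public static bool FinalizersDrainWithin(TimeSpan timeout)
    {
        Task probe = Task.Run(() =>
        {
            GC.Collect();
            GC.WaitForPendingFinalizers(); // never returns if the finalizer thread is blocked
            GC.Collect();
        });

        // Task.Wait(TimeSpan) returns false when the timeout elapses first.
        return probe.Wait(timeout);
    }
}
```

If the probe times out, the thread-pool thread it used remains blocked, so this is best kept for diagnostics rather than called routinely.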
3. Capture a dump and analyze the thread stacks
Capture a dump with ProcDump -ma [ProcessName] and analyze it as follows:
0:000> .loadby sos clr
0:000> !threads
ThreadCount: 63
UnstartedThread: 0
BackgroundThread: 34
PendingThread: 0
DeadThread: 28
Hosted Runtime: no
Lock
ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
0 1 1b80 00000000008c9cf0 26020 Preemptive 0000000000000000:0000000000000000 000000000087ce90 0 STA
2 2 ba8 00000000008d4790 2b220 Preemptive 0000000000000000:0000000000000000 000000000087ce90 0 MTA (Finalizer)
......
40 63 1748 0000000021ecc3d0 1029220 Preemptive 0000000000000000:0000000000000000 000000000087ce90 0 MTA (Threadpool Worker)
0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner
-----------------------------
Total 173
CCW 5
RCW 5
ComClassFactory 0
Free 25
//No thread is holding a lock, so see what thread 2, the Finalizer thread, is doing
0:000> ~2kb
RetAddr : Args to Child : Call Site
000007fe`fcf210dc : 00000000`00000000 00000000`1b66f308 00000000`1b66ee60 00000000`1b66edd0 : ntdll!NtWaitForSingleObject+0xa
000007fe`fd2de68e : 00000000`1bf51bf0 00000000`00911fb0 00000000`00000000 00000000`0000044c : KERNELBASE!WaitForSingleObjectEx+0x79
000007fe`fd413700 : 00000000`008fd0b0 00000000`1bf51bf0 00000000`00000246 00000000`008fd0b0 : ole32!GetToSTA+0x8a
000007fe`fd41265b : 00000000`00000000 00000000`ffffffff 00000000`61a6d4cc 00000000`ffffffff : ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0x13b
......
000007fe`e53b383c : 00000000`1bfd2380 00000000`1b66f9a8 00000000`00000000 000007fe`e5417ad3 : clr!CtxEntry::EnterContext+0x232
000007fe`e53b37e6 : 00000000`1b66f9a8 000007fe`e524307c 00000000`008d4790 00000000`1b66f9f0 : clr!RCW::EnterContext+0x3d
000007fe`e544319f : 000007fe`e5b055b0 00000000`008d4790 00000000`008d4790 00000000`00000000 : clr!SyncBlockCache::CleanupSyncBlocks+0xc2
000007fe`e536ab47 : 00000000`00000001 00000000`00000001 00000000`008d4790 00000000`00000000 : clr!Thread::DoExtraWorkForFinalizer+0xdc
000007fe`e52b458c : 0030002e`00340076 00000000`1b66fcc0 00000000`00390031 00000000`00001000 : clr!WKS::GCHeap::FinalizerThreadWorker+0x109
000007fe`e52b451a : 00000000`1b66fcc0 00000000`00000000 0000cd2d`f0a20ef6 000007fe`e53ff57a : clr!Frame::Pop+0x50
...
000007fe`e5391d90 : 00000000`00000000 00000000`00000000 00000000`00000001 00000000`0000001e : clr!ManagedThreadBase_NoADTransition+0x3f
000007fe`e53133de : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : clr!WKS::GCHeap::FinalizerThreadStart+0xb4
00000000`76fa59ed : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : clr!Thread::intermediateThreadProc+0x7d
00000000`770dc541 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d
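//Pay close attention to GetToSTA: it is the typical signature of a COM component blocking the finalizer thread. An STA COM component must be released on the thread that created it, so the finalizer thread tries to switch to that STA thread (GetToSTA) to run the cleanup code. But which thread is it trying to get to, and why is that thread blocked?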
000007fe`fd413700 : 00000000`008fd0b0 00000000`1bf51bf0 00000000`00000246 00000000`008fd0b0 : ole32!GetToSTA+0x8a
//First use | to show the process ID
0:000> |
. 0 id: 3b8c examine name: E:\...\Process.exe
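//Then run dd on each argument of this frame in turn and look for the process ID, 3b8c. The thread ID it wants to reach sits right next to the process ID; I found it in the second argument.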
0:000> dd 1bf51bf0
00000000`1bf51bf0 fd4283e0 000007fe fd450628 000007fe
00000000`1bf51c00 00000000 00000000 00000001 0000102a
00000000`1bf51c10 00000000 00000000 0000044c 00000000
00000000`1bf51c20 00000000 00000000 00003400 1b803b8c
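//In the last line, 1b80 3b8c: the second half is the process ID and the first half is the STA thread it wants to reach. Looking back at the !threads output, that is thread 0, the main thread.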
0 1 1b80 00000000008c9cf0 26020 Preemptive 0000000000000000:0000000000000000 000000000087ce90 0 STA
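The way the value 1b803b8c is read above can be written out as a small calculation. This is only a sketch of that reading: the helper name StaTargetDecoder is invented for illustration, and it assumes, as in this dump, that both IDs fit in 16 bits.

```csharp
using System;

// Hypothetical helper mirroring the manual reading of the dd output.
static class StaTargetDecoder
{
    // High 16 bits: OS thread ID of the target STA thread.
    public static uint ThreadId(uint packed)
    {
        return packed >> 16;
    }

    // Low 16 bits: process ID, matching the value shown by the | command.
    public static uint ProcessId(uint packed)
    {
        return packed & 0xFFFF;
    }
}
```

For 0x1b803b8c this yields thread ID 0x1b80 and process ID 0x3b8c, which match the OSID column of thread 0 and the output of | respectively.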
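//Now see what thread 0, the main thread, is doing. It is stuck in ConnectNamedPipe, which corresponds to NamedPipeServerStream.WaitForConnection() in .NET. This is an unmanaged blocking call: until a connection arrives, it blocks indefinitely.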
0:000> kb
RetAddr : Args to Child : Call Site
000007fe`fcf33c2f : 000007fe`e5311ed4 00000000`0027e428 00000000`0027e480 00000000`05d5cac8 : ntdll!NtFsControlFile+0xa
*** WARNING: Unable to verify checksum for System.Core.ni.dll
000007fe`e1ab8017 : 000007fe`e169adb8 00000000`00000000 00000000`00000000 00000000`05d5c7d8 : KERNELBASE!ConnectNamedPipe+0x6f
......
000007fe`e53a2a7e : 00000000`00000000 00000000`00000004 00000000`00000000 00000000`00000004 : clr!MethodDescCallSite::CallTargetWorker+0x2e2
000007fe`e53a31d6 : 00000000`00000004 00000000`00000000 00000000`00000000 00000000`03162eb8 : clr!RunMain+0x1e7
000007fe`e53a30d0 : 00000000`008f13b0 00000000`00000200 00000000`008f13b0 00000000`00000200 : clr!Assembly::ExecuteMainMethod+0xb6
000007fe`e53a2c46 : 00000000`0027f8c8 00000000`00bf0000 00000000`00000000 00000000`00000000 : clr!SystemDomain::ExecuteMainMethod+0x45e
000007fe`e53a2b9e : 00000000`00bf0000 00000000`0027fa20 00000000`00000000 000007fe`f6f441c0 : clr!ExecuteEXE+0x3f
000007fe`e53a3574 : ffffffff`ffffffff 00000000`00000000 00000000`00000000 00000000`00000000 : clr!CorExeMainInternal+0xae
000007fe`f6ee77ad : 00000000`00000000 000007fe`00000091 00000000`00000000 00000000`0027f988 : clr!CorExeMain+0x14
000007fe`f6fa5b21 : 00000000`00000000 000007fe`e53a3560 00000000`00000000 00000000`00000000 : mscoreei!CorExeMain+0xe0
00000000`76fa59ed : 000007fe`f6ee0000 00000000`00000000 00000000`00000000 00000000`00000000 : mscoree!CorExeMain_Exported+0x57
00000000`770dc541 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d
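No wonder the finalizer thread hangs: to release the COM component it has to enter the STA thread that created it, and that STA thread is waiting indefinitely for a connection. With finalization stuck, those objects are never reclaimed, so memory naturally keeps growing. As an aside, the GetToSTA trick is quite arcane, and I would never have come up with it myself; see gcHang0, gcHang1, gcHang2, gcHang3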
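4. Solutions
- Create the COM object on an MTA thread. Because the main thread must be STA, this means starting a new MTA thread to do the work.
- If the work has to happen on an STA thread, avoid unmanaged blocking calls such as WaitForConnection and use managed blocking calls instead: WaitHandle.WaitOne, WaitAny, WaitAll, Monitor.Enter, Monitor.Wait, Thread.Join, and GC.WaitForPendingFinalizers all qualify. These block the thread while still pumping messages correctly. Combined with the corresponding Begin/End asynchronous methods, they let the thread keep responding to messages.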
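Both options can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the demo: the class and method names are invented, the pipe is assumed to have been created with PipeOptions.Asynchronous, and SetApartmentState is only meaningful on Windows.

```csharp
using System;
using System.IO.Pipes;
using System.Threading;

// Hypothetical helpers (names invented for this sketch).
static class StaFriendlyWaits
{
    // Option 1: run the work (for example, creating and using the COM
    // object) on a dedicated MTA thread instead of the STA main thread.
    public static Thread RunOnMtaThread(Action work)
    {
        var thread = new Thread(() => work());
        thread.SetApartmentState(ApartmentState.MTA); // must be set before Start; Windows only
        thread.IsBackground = true;
        thread.Start();
        return thread;
    }

    // Option 2: replace the blocking WaitForConnection() with the Begin/End
    // pair plus a managed wait. Returns false if no client connected in time.
    // The server must have been created with PipeOptions.Asynchronous.
    public static bool WaitForClient(NamedPipeServerStream server, TimeSpan timeout)
    {
        IAsyncResult pending = server.BeginWaitForConnection(null, null);

        // WaitHandle.WaitOne is a managed blocking call.
        if (!pending.AsyncWaitHandle.WaitOne(timeout))
            return false; // still pending; the caller may retry or dispose the server

        server.EndWaitForConnection(pending);
        return true;
    }
}
```

With the second helper, an STA thread can wait in a loop with a short timeout, so it never sits inside an unmanaged wait for an unbounded time.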
5. Demo
Finally, I reproduced the problem in a small demo and included the fix in it; anyone interested can give it a try.