One: Background
1. Tell a story
Last Thursday, a friend of mine pinged me on WeChat about a memory leak in his program: the memory could not be reclaimed by the GC, and the machine eventually ran out of memory, which was rather embarrassing.
He had already done a preliminary dump analysis and found 100,000+ byte[] arrays on the managed heap, taking up about 1.1 GB of memory. After extracting the gcroots of several of those byte[] instances he found no references, and the trail went cold. He knew the problem was probably the byte[], but he could not find the proof. 😪 😪 😪
Since he trusted me with it, I had to produce a reasonably thorough report; I could not let that trust down. As usual, let's talk to WinDbg.
Two: WinDbg Analysis
1. Find the source of the leak
Long-time readers of my articles will know that to troubleshoot this kind of memory leak, the first step is to split the problem in two: is it in managed or in unmanaged memory? That decides which follow-up measures to take.
Next, use !address -summary to look at the committed memory of the process.
||2:2:080> !address -summary

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE                             573          1`5c191000 (   5.439 GB)  95.19%    0.00%
MEM_IMAGE                              1115          0`0becf000 ( 190.809 MB)   3.26%    0.00%
MEM_MAPPED                               44          0`05a62000 (  90.383 MB)   1.54%    0.00%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                201       7ffe`9252e000 ( 127.994 TB)           100.00%
MEM_COMMIT                             1477          0`d439f000 (   3.316 GB)  58.04%    0.00%
MEM_RESERVE                             255          0`99723000 (   2.398 GB)  41.96%    0.00%
To be honest, I generally treat 5 GB+ as the comfortable starting point for memory leak analysis; after all, the bigger the memory, the easier it is to analyze. Next, look at the memory footprint of the managed heap.
||2:2:080> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x00000002b37c0c48
generation 1 starts at 0x00000002b3781000
generation 2 starts at 0x0000000000cc1000
------------------------------
GC Heap Size:            Size: 0xbd322bb0 (3174181808) bytes.
As you can see, the managed heap occupies 3174181808 / 1024 / 1024 / 1024 = 2.95 GB, which accounts for almost all of the 3.316 GB committed. Ha ha, seeing that number I was quietly delighted: the problem is on the managed heap, so this one is basically in the bag for me. After all, I haven't fumbled one of these yet, so let's check the managed heap quickly and see where the problem is.
2. Look at the managed heap
To inspect the managed heap, use the !dumpheap -stat command; here I only show the Top 10 entries by total size.
||2:2:080> !dumpheap -stat
Statistics:
              MT    Count    TotalSize Class Name
00007ffd7e130ab8   116201     13014512 Newtonsoft.Json.Linq.JProperty
00007ffdd775e560    66176     16411648 System.Data.SqlClient._SqlMetaData
00007ffddbcc9da8    68808     17814644 System.Int32[]
00007ffddbcaf788    14140     21568488 System.String[]
00007ffddac72958    50256     22916736 System.Net.Sockets.SocketAsyncEventArgs
00007ffd7deb64b0      369     62115984 System.Collections.Generic.Dictionary`2+Entry[[System.Reflection.ICustomAttributeProvider, mscorlib],[System.Type, mscorlib]][]
00007ffddbcc8610     8348    298313756 System.Char[]
00007ffddbcc74c0  1799807    489361500 System.String
000000000022e250   312151    855949918 Free
00007ffddbccc768   109156   1135674368 System.Byte[]
Byte[] currently sits at No. 1, Free at No. 2 and String at No. 3. Here is a rule of thumb: don't start from these low-level entries, because complex types contain strings and Byte[] internally; a MemoryStream, for example, wraps a Byte[], right? So set aside the top few entries and look at the remaining complex types instead, as the small sketch below illustrates.
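To make that rule of thumb concrete, here is a tiny, self-contained C# sketch of my own (not code from the dump): the complex objects you allocate are what end up reported under the primitive type names in !dumpheap -stat.

using System;
using System.IO;
using System.Text;

class RuleOfThumbDemo
{
    static void Main()
    {
        // MemoryStream(capacity) allocates an internal byte[capacity], so on the
        // managed heap it is counted as a System.Byte[], not as "MemoryStream data".
        var ms = new MemoryStream(1024 * 1024);
        byte[] payload = Encoding.UTF8.GetBytes("hello");
        ms.Write(payload, 0, payload.Length);

        // StringBuilder grows internal char[] buffers; its output is a System.String.
        var sb = new StringBuilder();
        sb.Append("some payload");

        Console.WriteLine(ms.Length);      // 5
        Console.WriteLine(sb.ToString());  // some payload
    }
}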
If your eyes are sharp, you will also notice that the number of Free blocks is 310,000+. What does that mean? It means there are currently 310,000+ free chunks on the managed heap; the technical term is fragmentation. In other words, the managed heap is quite fragmented right now. The next question is why. Most of the time this kind of fragmentation appears because many objects on the managed heap are pinned, so the GC cannot move them during compaction; over time this chops the heap into fragments. Spotting this phenomenon is a big help in solving the leak.
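The mechanics are easy to reproduce. Below is a minimal sketch of my own (names and sizes are arbitrary, not taken from the friend's program) showing how pinning ties the GC's hands.

using System;
using System.Runtime.InteropServices;

// While a pinned GCHandle is alive, the GC cannot relocate the pinned buffer,
// so compaction has to leave Free gaps around it.
class PinningDemo
{
    static void Main()
    {
        var handles = new GCHandle[1000];

        // Pin a lot of small buffers scattered across the heap.
        for (int i = 0; i < handles.Length; i++)
        {
            var buffer = new byte[4096];
            handles[i] = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        }

        // A full collection can reclaim unreferenced objects between the pins,
        // but it cannot slide the pinned buffers together, hence fragmentation.
        GC.Collect();

        // Only after Free() can the GC move or reclaim those buffers again.
        for (int i = 0; i < handles.Length; i++)
            handles[i].Free();
    }
}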
In addition, dotMemory can be used to visualize this: red marks the pinned objects, large stretches of red are visible to the naked eye, and the reported fragmentation rate is 85%.
The next task is to find these pinned objects, which are recorded in the CLR's GCHandle table.
3. Check the GCHandles
To find all the pinned objects, use the !gchandles -stat command. The simplified output is as follows:
||2:2:080> !gchandles -stat
Statistics:
              MT    Count    TotalSize Class Name
00007ffddbcc88a0      278        26688 System.Threading.Thread
00007ffddbcb47a8     1309       209440 System.RuntimeType+RuntimeTypeCache
00007ffddbcc7b38      100       348384 System.Object[]
00007ffddbc94b60     9359       673848 System.Reflection.Emit.DynamicResolver
00007ffddb5b7b98    25369      2841328 System.Threading.OverlappedData
Total 36566 objects

Handles:
    Strong Handles:       174
    Pinned Handles:       15
    Async Pinned Handles: 25369
    Ref Count Handles:    1
    Weak Long Handles:    10681
    Weak Short Handles:   326
The line Async Pinned Handles: 25369 means that roughly 25,000 asynchronous operations currently hold pinned objects, which is not a normal figure for this process, and it lines up with the 25,369 System.Threading.OverlappedData instances in the statistics above. With that train of thought, look back at the managed heap: is there a complex type that encapsulates asynchronous operations with a count of this order of magnitude? Here is the Top 10 by size again.
||2:2:080> !dumpheap -stat
Statistics:
              MT    Count    TotalSize Class Name
00007ffd7e130ab8   116201     13014512 Newtonsoft.Json.Linq.JProperty
00007ffdd775e560    66176     16411648 System.Data.SqlClient._SqlMetaData
00007ffddbcc9da8    68808     17814644 System.Int32[]
00007ffddbcaf788    14140     21568488 System.String[]
00007ffddac72958    50256     22916736 System.Net.Sockets.SocketAsyncEventArgs
00007ffd7deb64b0      369     62115984 System.Collections.Generic.Dictionary`2+Entry[[System.Reflection.ICustomAttributeProvider, mscorlib],[System.Type, mscorlib]][]
00007ffddbcc8610     8348    298313756 System.Char[]
00007ffddbcc74c0  1799807    489361500 System.String
000000000022e250   312151    855949918 Free
00007ffddbccc768   109156   1135674368 System.Byte[]
With that preconception, I'm sure you spotted the 50256 System.Net.Sockets.SocketAsyncEventArgs objects on the managed heap. It looks like this leak cannot be separated from sockets. The next step is to check who is referencing these SocketAsyncEventArgs instances; the sketch below shows why each of them tends to hold on to a byte[] buffer in the first place.
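The following is my own illustration, not the dumped program's code: each pending asynchronous receive keeps its SocketAsyncEventArgs and its byte[] buffer alive, and on the .NET Framework that buffer typically takes part in overlapped I/O, which is exactly what shows up in WinDbg as Async Pinned Handles and System.Threading.OverlappedData. The names StartReceive and ProcessReceive are hypothetical.

using System.Net.Sockets;

class SaeaSketch
{
    static void StartReceive(Socket socket)
    {
        var args = new SocketAsyncEventArgs();
        args.SetBuffer(new byte[8192], 0, 8192);       // one buffer per outstanding operation
        args.Completed += (sender, e) => ProcessReceive(e);

        // ReceiveAsync returns false when it completed synchronously; in that
        // case Completed will not fire, so handle the result inline.
        if (!socket.ReceiveAsync(args))
            ProcessReceive(args);
    }

    static void ProcessReceive(SocketAsyncEventArgs e)
    {
        // ... consume e.Buffer / e.BytesTransferred, then reuse or Dispose() e ...
        // If tens of thousands of these are parked somewhere and never disposed,
        // their buffers stay reachable (and pinned) indefinitely.
    }
}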
4. Check the SocketAsyncEventArgs reference root
To see the reference roots, first dump a few SocketAsyncEventArgs addresses from the heap.
||2:2:080> !dumpheap -mt 00007ffddac72958 0 0000000001000000
         Address               MT     Size
0000000000cc9dc0 00007ffddac72958      456
0000000000ccc0d8 00007ffddac72958      456
0000000000ccc358 00007ffddac72958      456
0000000000cce670 00007ffddac72958      456
0000000000cce8f0 00007ffddac72958      456
0000000000cd0c08 00007ffddac72958      456
0000000000cd0e88 00007ffddac72958      456
0000000000cd31a0 00007ffddac72958      456
0000000000cd3420 00007ffddac72958      456
0000000000cd5738 00007ffddac72958      456
0000000000cd59b8 00007ffddac72958      456
0000000000cd7cd0 00007ffddac72958      456
Then look at the reference roots of the first two addresses.
||2:2:080> !gcroot 0000000000cc9dc0
Thread 86e4:
    0000000018ecec20 00007ffd7dff06b4 xxxHttpServer.DaemonThread`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].DaemonThreadStart()
        rbp+10: 0000000018ececb0
            ->  000000000102e8c8 xxxHttpServer.DaemonThread`2[[xxx.xxx, xxx],[xxx.RequestInfo, xxx]]
            ->  00000000010313a8 xxxHttpServer.xxxHttpRequestServer`2[[xxx.xxx, xxx],[xxx.RequestInfo, xxx]]
            ->  000000000105b330 xxxHttpServer.HttpSocketTokenPool`2[[xxx.xxx, xxx],[xxx.RequestInfo, xxx]]
            ->  000000000105b348 System.Collections.Generic.Stack`1[[xxxHttpServer.HttpSocketToken`2[[xxx.xxx, xxx],[xxx.RequestInfo, xxx]], xxxHttpServer]]
            ->  0000000010d36178 xxxHttpServer.HttpSocketToken`2[[xxx.xxx, xxx],[xxx.RequestInfo, xxx]][]
            ->  0000000008c93588 xxxHttpServer.HttpSocketToken`2[[xxx.xxx, xxx],[xxx.RequestInfo, xxx]]
            ->  0000000000cc9dc0 System.Net.Sockets.SocketAsyncEventArgs

||2:2:080> !gcroot 0000000000ccc0d8
Thread 86e4:
    0000000018ecec20 00007ffd7dff06b4 xxxHttpServer.DaemonThread`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].DaemonThreadStart()
        rbp+10: 0000000018ececb0
            ->  000000000102e8c8 xxxHttpServer.DaemonThread`2[[xxx.xxx, xxx],[xxx.RequestInfo, xxx]]
            ->  00000000010313a8 xxxHttpServer.xxxHttpRequestServer`2[[xxx.xxx, xxx],[xxx.RequestInfo, xxx]]
            ->  000000000105b330 xxxHttpServer.HttpSocketTokenPool`2[[xxx.xxx, xxx],[xxx.RequestInfo, xxx]]
            ->  000000000105b348 System.Collections.Generic.Stack`1[[xxxHttpServer.HttpSocketToken`2[[xxx.xxx, xxx],[xxx.RequestInfo, xxx]], xxxHttpServer]]
            ->  0000000010d36178 xxxHttpServer.HttpSocketToken`2[[xxx.xxx, xxx],[xxx.RequestInfo, xxx]][]
            ->  0000000000ccc080 xxxHttpServer.HttpSocketToken`2[[xxx.xxx, xxx],[xxx.RequestInfo, xxx]]
            ->  0000000000ccc0d8 System.Net.Sockets.SocketAsyncEventArgs
It looks like the program implements its own HttpServer together with an HttpSocketTokenPool. Out of curiosity, let's export this class and see how it is written.
5. Look at the problem code
Same old way: use !savemodule to export the problem module, then decompile it with ILSpy.
Sure enough, the pool is wrapped rather crudely. Since there are 50,000+ SocketAsyncEventArgs instances, I guessed there were tens of thousands of tokens sitting in the pool, and digging into it with WinDbg confirms the guess: the pool holds about 25,000 HttpSocketTokens. That tells us the socket pool is simply not implemented properly; a rough reconstruction of what such a pool looks like follows.
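For readers who cannot see the decompiled source, here is my reconstruction, guessed only from the gcroot chain above (HttpSocketTokenPool`2 holding a Stack`1 of HttpSocketToken`2); the member names and bodies are mine, simplified and non-generic, not the actual decompiled code.

using System.Collections.Generic;
using System.Net.Sockets;

// Hypothetical token: the real HttpSocketToken`2 at least references a
// SocketAsyncEventArgs, as the gcroot chain shows.
public class HttpSocketToken
{
    public SocketAsyncEventArgs ReceiveArgs = new SocketAsyncEventArgs();
    public SocketAsyncEventArgs SendArgs = new SocketAsyncEventArgs();
}

public class HttpSocketTokenPool
{
    private readonly Stack<HttpSocketToken> _pool = new Stack<HttpSocketToken>();

    // The flaw: Push accepts everything with no upper bound, so under load the
    // stack only grows, and every retained token keeps its SocketAsyncEventArgs
    // (and the buffers they pin) alive forever.
    public void Push(HttpSocketToken token)
    {
        lock (_pool) _pool.Push(token);
    }

    public HttpSocketToken Pop()
    {
        lock (_pool)
        {
            return _pool.Count > 0 ? _pool.Pop() : new HttpSocketToken();
        }
    }

    public int Count
    {
        get { lock (_pool) return _pool.Count; }
    }
}

Incidentally, about 25,000 tokens with two SocketAsyncEventArgs each would roughly line up with the 50,256 instances seen on the heap, though that pairing is my guess rather than something read out of the dump.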
Three: Summary
Wrapping a pool yourself means implementing some non-trivial logic; it is not just a Push and a Pop... So the direction of the fix is also very clear: find a way to bound the pool so that it actually delivers the reuse a pool is supposed to provide, for example along the lines of the sketch below.
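As one possible illustration of that direction (my own sketch, not the actual fix that was shipped), reusing the hypothetical HttpSocketToken from the earlier sketch: cap the pool and dispose whatever does not fit, instead of letting the stack grow without limit.

using System.Collections.Generic;

public class BoundedHttpSocketTokenPool
{
    private readonly Stack<HttpSocketToken> _pool = new Stack<HttpSocketToken>();
    private readonly int _maxSize;

    public BoundedHttpSocketTokenPool(int maxSize)
    {
        _maxSize = maxSize;
    }

    public void Push(HttpSocketToken token)
    {
        lock (_pool)
        {
            if (_pool.Count < _maxSize)
            {
                _pool.Push(token);
                return;
            }
        }

        // Pool is full: release the SocketAsyncEventArgs (and the buffers they
        // pin) instead of hoarding the token forever.
        token.ReceiveArgs.Dispose();
        token.SendArgs.Dispose();
    }

    public HttpSocketToken Pop()
    {
        lock (_pool)
        {
            return _pool.Count > 0 ? _pool.Pop() : new HttpSocketToken();
        }
    }
}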
For more high-quality content, see my GitHub: dotnetfly