One: Background
1. Tell a story
A few days ago a friend reached out to me on WeChat asking for help: his program's CPU kept spiking to 100%, he couldn't find the cause, and he hoped I could take a look.
I've received several high-CPU dumps in a row these days and I'm honestly getting a bit tired of looking at them 😂, so I'm hoping the next dumps people send me cover some other kinds of problems. From our chat the friend's situation sounded pretty miserable, and the reality was probably even worse. Since he came to me, I'll do my best to find the real culprit behind the scenes. Without further ado, let's fire up WinDbg.
Two: WinDbg analysis
1. Look at managed threads
Since it is the threads that burn the CPU, the threads themselves are a good place to start. To list all the managed threads in the program, use the !t command.
0:000> !t
ThreadCount:      38
UnstartedThread:  0
BackgroundThread: 34
PendingThread:    0
DeadThread:       3
Hosted Runtime:   no
                                                                                   Lock
       ID OSID ThreadOBJ           State    GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   0    1 105c 000000000067f600    2a020    Preemptive  0000000000000000:0000000000000000 0000000000671ec0 0     MTA
   2    2 13d0 00000000192c4f40    2b220    Preemptive  0000000000000000:0000000000000000 0000000000671ec0 0     MTA (Finalizer)
...
XXXX   15    0 000000001bc64970  8039820    Preemptive  0000000000000000:0000000000000000 0000000000671ec0 0     MTA (Threadpool Completion Port)
  24   23 1380 000000001bc660e0  8029220    Preemptive  0000000000000000:0000000000000000 0000000000671ec0 0     MTA (Threadpool Completion Port)
XXXX   53    0 000000001bc63200  8039820    Preemptive  0000000000000000:0000000000000000 0000000000671ec0 0     MTA (Threadpool Completion Port)
XXXX   27 10dc 000000001bd0dbf0  1029220    Preemptive  0000000002CB40F8:0000000002CB4108 0000000000671ec0 1     MTA (GC) (Threadpool Worker)
From the output: the program has 38 threads, 3 of them dead. Interestingly, the last line carries the familiar MTA (GC) marker. What does that mean? It means this thread is the one currently triggering a GC, yet the thread that triggered the GC is already dead. You might ask: "how can you tell?" Search the thread stacks for WaitUntilGCComplete and try_allocate_more_space.
2. Look at the thread stack
To dump the stack of every thread, use the !dumpstack command (run against all threads, e.g. with ~*e !dumpstack).
- Search for the WaitUntilGCComplete keyword.
There are 18 threads waiting on it, and System.String.Concat shows up as well. Could this be the same CPU spike as in his previous post? 😘 😘 😘
- Search for the try_allocate_more_space keyword.
This time there is no try_allocate_more_space, so it is not the same problem as before 😩. Either the dump was captured at an unlucky moment, or the program is up to something strange.
It looks like a dead end, but again, it is the threads that keep the CPU busy, so let's take a hard look at what every thread is actually doing. To make the output clearer, switch to another command: !clrstack.
As the output shows, 25 threads are currently stuck in FindEntry(System.__Canon), and judging by the call stack the method lives in an Aliyun DLL. Why are so many threads stuck here? That is a big question mark, right? So let's export this Aliyun code and see what is going on.
3. Review the problem code
To export the problem code, use the classic !ip2md + !savemodule command combination.
0:000> !ip2md 000007fe9a1a0641
MethodDesc:   000007fe9a5678e0
Method Name:  Aliyun.Acs.Core.Utils.CacheTimeHelper.AddLastClearTimePerProduct(System.String, System.String, System.DateTime)
Class:        000007fe9a595a08
MethodTable:  000007fe9a567900
mdToken:      00000000060000a6
Module:       000007fe9a561f58
IsJitted:     yes
CodeAddr:     000007fe9a1a0610
Transparency: Critical

0:000> !savemodule 000007fe9a561f58 E:\dumps\AddLastClearTimePerProduct.dll
3 sections in file
section 0 - VA=2000, VASize=14148, FileAddr=200, FileSize=14200
section 1 - VA=18000, VASize=3fc, FileAddr=14400, FileSize=400
section 2 - VA=1a000, VASize=c, FileAddr=14800, FileSize=200
Then decompile the DLL with ILSpy. Since this is Alibaba Cloud's code, I can post it here with a clear conscience.
// Aliyun.Acs.Core.Utils.CacheTimeHelper
using System;
using System.Collections.Generic;

public class CacheTimeHelper
{
    private static Dictionary<string, DateTime> lastClearTimePerProduct = new Dictionary<string, DateTime>();

    private const int ENDPOINT_CACHE_TIME = 3600;

    public static bool CheckCacheIsExpire(string product, string regionId)
    {
        string key = product + "_" + regionId;
        DateTime dateTime;
        if (lastClearTimePerProduct.ContainsKey(key))
        {
            dateTime = lastClearTimePerProduct[key];
        }
        else
        {
            dateTime = DateTime.Now;
            lastClearTimePerProduct.Add(key, dateTime);
        }
        if (3600.0 < (DateTime.Now - dateTime).TotalSeconds)
        {
            return true;
        }
        return false;
    }

    public static void AddLastClearTimePerProduct(string product, string regionId, DateTime lastClearTime)
    {
        string key = product + "_" + regionId;
        if (lastClearTimePerProduct.ContainsKey(key))
        {
            lastClearTimePerProduct.Remove(key);
        }
        lastClearTimePerProduct.Add(key, lastClearTime);
    }
}
As you can see, the code gets stuck at the if (lastClearTimePerProduct.ContainsKey(key)) check above and cannot move past it. If we chase further, we can look at the Dictionary source code.
// System.Collections.Generic.Dictionary<TKey,TValue>
public class Dictionary<TKey, TValue>
{
    public bool ContainsKey(TKey key)
    {
        return FindEntry(key) >= 0;
    }
}
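If we chase one level deeper into FindEntry, the collision-chain walk explains why a corrupted dictionary can hang a thread. Below is a simplified, hedged sketch modeled on the .NET Framework reference source (field and type names abbreviated for illustration; this is not the exact shipped code):

```csharp
using System.Collections.Generic;

// Illustrative sketch only, loosely based on the .NET Framework reference source.
public class DictionarySketch<TKey, TValue>
{
    private struct Entry
    {
        public int hashCode;   // hash of the key for this slot
        public int next;       // index of the next entry in the collision chain, or -1
        public TKey key;
        public TValue value;
    }

    private int[] buckets;     // bucket -> index of the first entry in its chain
    private Entry[] entries;   // flat array holding all entries
    private IEqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;

    private int FindEntry(TKey key)
    {
        if (buckets != null)
        {
            int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;

            // Walk the collision chain via entries[i].next. If a racy Add/Remove/Resize
            // ever corrupts these links into a cycle, this loop never exits and the
            // thread spins at 100% CPU.
            for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next)
            {
                if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
                {
                    return i;
                }
            }
        }
        return -1;
    }
}
```

The key detail is the `i = entries[i].next` step: as long as the chain is intact the loop ends quickly, but a cycle in those links turns it into an infinite loop.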
So at this point, can you see what is wrong with this CacheTimeHelper? Yes: in a multi-threaded environment it uses a non-thread-safe Dictionary, and that is the problem.
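To make the hazard concrete, here is a minimal, purely hypothetical repro sketch (not taken from the friend's dump) that mimics the unsynchronized ContainsKey / Remove / Add pattern of CacheTimeHelper from many threads at once. Depending on timing it may throw, silently corrupt the dictionary, or (on .NET Framework in particular) leave threads looping forever:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class DictionaryRaceDemo
{
    // Shared, non-thread-safe dictionary, mirroring the static field in CacheTimeHelper.
    private static readonly Dictionary<string, DateTime> cache = new Dictionary<string, DateTime>();

    static void Main()
    {
        // 32 threads hammer the same keys with the unsynchronized check/remove/add pattern.
        Parallel.For(0, 32, t =>
        {
            var rnd = new Random(Guid.NewGuid().GetHashCode());
            for (int i = 0; i < 1_000_000; i++)
            {
                string key = "product_" + rnd.Next(0, 64);

                if (cache.ContainsKey(key))
                {
                    cache.Remove(key);
                }
                // May throw, corrupt internal state, or hang inside FindEntry.
                cache.Add(key, DateTime.Now);
            }
        });

        Console.WriteLine("Finished without hanging - this run got lucky.");
    }
}
```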
4. What’s wrong with using a Dictionary
Using a Dictionary from multiple threads will certainly corrupt its data, no doubt about that, and you will also run into the occasional iteration exception. But a CPU spike caused by this misuse was something I had never personally seen… Just to be sure, I went to Bing to search for anyone else with the same affliction, and found this:
High CPU in .NET App Using a Static Generic.Dictionary
😁 It turns out that when a Dictionary gets corrupted in a multi-threaded environment, threads can end up looping forever inside FindEntry(key), and here 25 such loops working together account for the fully saturated CPU we see…
0:000> !tp
CPU utilization: 100%
Worker Thread: Total: 27 Running: 27 Idle: 0 MaxLimit: 32767 MinLimit: 4
Work Request in Queue: 0
--------------------------------------
Number of Timers: 1
--------------------------------------
Completion Port Thread: Total: 4 Free: 3 MaxFree: 8 CurrentLimit: 3 MaxLimit: 1000 MinLimit: 4
Three: Summary
Since the bug lives inside the Alibaba Cloud SDK, the problem is awkward… We cannot patch it ourselves, so I told my friend to open a support ticket and get it resolved that way.
I thought that would be the end of it, but then I remembered that a few years ago another Alibaba Cloud SDK had a similar high-CPU problem, which was later fixed by upgrading the SDK. So let's gamble on the same outcome this time and first check the assembly information.
[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
[assembly: TargetFramework(".NETStandard,Version=v2.0", FrameworkDisplayName = "")]
[assembly: AssemblyCompany("Alibaba Cloud")]
[assembly: AssemblyConfiguration("Release")]
[assembly: AssemblyCopyright("©2009-2018 Alibaba Cloud")]
[assembly: AssemblyDescription("Alibaba Cloud SDK for C#")]
[assembly: AssemblyFileVersion("1.1.12.0")]
[assembly: AssemblyInformationalVersion("1.1.12")]
[assembly: AssemblyProduct("aliyun-net-sdk-core")]
[assembly: AssemblyTitle("aliyun-net-sdk-core")]
[assembly: AssemblyVersion("1.1.12.0")]
You can see that the friend is currently on version 1.1.12.0, so let's update aliyun-net-sdk-core to the latest version and see whether CacheTimeHelper has been fixed.
Sure enough, it has been fixed in the newer version. Experience tells me that when you use Alibaba Cloud SDKs, remember to upgrade them regularly, otherwise all kinds of pitfalls are waiting for you… 😔 😔 😔
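I did not diff the new SDK's source, but for reference, a thread-safe variant of CacheTimeHelper could look roughly like the sketch below, using ConcurrentDictionary (illustrative only; the names mirror the decompiled class, and this is not necessarily how the new aliyun-net-sdk-core implements the fix):

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical thread-safe rewrite of CacheTimeHelper, for illustration only.
public static class SafeCacheTimeHelper
{
    private const int ENDPOINT_CACHE_TIME = 3600;

    private static readonly ConcurrentDictionary<string, DateTime> lastClearTimePerProduct =
        new ConcurrentDictionary<string, DateTime>();

    public static bool CheckCacheIsExpire(string product, string regionId)
    {
        string key = product + "_" + regionId;

        // GetOrAdd is atomic per key, so the check-then-add race goes away.
        DateTime lastClearTime = lastClearTimePerProduct.GetOrAdd(key, _ => DateTime.Now);

        return (DateTime.Now - lastClearTime).TotalSeconds > ENDPOINT_CACHE_TIME;
    }

    public static void AddLastClearTimePerProduct(string product, string regionId, DateTime lastClearTime)
    {
        string key = product + "_" + regionId;

        // The indexer atomically adds or overwrites, replacing the old Remove + Add pair.
        lastClearTimePerProduct[key] = lastClearTime;
    }
}
```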
For more quality content, check out my GitHub: dotnetfly