I don't know when it started, but many programmers like to implement case-insensitive string equality with ToLower or ToUpper. The habit may have been carried over from other languages; my guess is JS. (No offense intended; I only mention JS as an example ~)
One: Background
1. A story
In our order aggregation system, every order is tagged with its source channel, such as JD, Taobao, Etao, Shopex and so on. The UI also offers an advanced option for entering a custom order source. Later, customers reported that searching for XXX found no orders. Take Shopex as an example: the user searched with lowercase "shopex", but the system had stored the source as "Shopex", so naturally nothing matched. To fix this, the developer switched to comparing uppercase versions of both sides, which looked like this in the code:
var orderFrom = "shopex".ToUpper();
customerIDList = MemoryOrders.Where(i => i.OrderFrom.ToUpper() == orderFrom)
                             .Select(i => i.CustomerId).ToList();
Once this change went live, it looked fine at first glance, but queries were clearly a few seconds slower than before. I clicked a few more times and, wow… monitoring showed CPU and memory swinging up and down abnormally. It was the junior developer's bug. When I checked the code and asked him why he wrote it this way, he said that's how comparisons are done in JS ~
2. Enter string.Compare
C# has a dedicated method for case-insensitive comparison that is fast and does not allocate any new strings: string.Compare. So the code above becomes:
var orderFrom = "shopex";
customerIDList = MemoryOrders.Where(i => string.Compare(i.OrderFrom, orderFrom,
                                                        StringComparison.OrdinalIgnoreCase) == 0)
                             .Select(i => i.CustomerId).ToList();
Here the StringComparison.OrdinalIgnoreCase enum value tells string.Compare to ignore case. After this version went live, apart from a little residual CPU fluctuation, everything was fine.
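For comparison, here is a minimal sketch of the query before and after the fix. It assumes a hypothetical Order class with OrderFrom and CustomerId properties, since the article does not show the element type of MemoryOrders; OrderQueries and its methods are likewise illustrative names. string.Equals with StringComparison.OrdinalIgnoreCase is included as well: it states the equality intent directly and, like string.Compare, creates no new strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical order type; the article does not show the real one.
public sealed class Order
{
    public string OrderFrom { get; set; } = "";
    public int CustomerId { get; set; }
}

public static class OrderQueries
{
    // Original approach: allocates an uppercase copy of OrderFrom for every order.
    public static List<int> ByToUpper(IEnumerable<Order> orders, string source)
    {
        string target = source.ToUpper();
        return orders.Where(o => o.OrderFrom.ToUpper() == target)
                     .Select(o => o.CustomerId)
                     .ToList();
    }

    // Fixed approach: compares the two strings in place, no new strings are created.
    public static List<int> ByCompare(IEnumerable<Order> orders, string source)
    {
        return orders.Where(o => string.Compare(o.OrderFrom, source, StringComparison.OrdinalIgnoreCase) == 0)
                     .Select(o => o.CustomerId)
                     .ToList();
    }

    // Equality-only form of the same ordinal, case-insensitive check.
    public static List<int> ByEquals(IEnumerable<Order> orders, string source)
    {
        return orders.Where(o => string.Equals(o.OrderFrom, source, StringComparison.OrdinalIgnoreCase))
                     .Select(o => o.CustomerId)
                     .ToList();
    }
}
```

One caveat: ToUpper() uses the current culture while OrdinalIgnoreCase does not, so for non-ASCII sources the two versions are not guaranteed to match exactly.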
Two: Why ToLower and ToUpper have such a big impact
To demonstrate, I took a short English story and searched it for a single word, to show why ToUpper hurts CPU, memory and query performance so much. The code is as follows:
using System;

class Program
{
    public static void Main(string[] args)
    {
        // Split the story into words; each word is a separate string on the heap.
        var strList = "Hooray! It's snowing! It's time to make a snowman.James runs out. He makes a big pile of snow. He puts a big snowball on top. He adds a scarf and a hat. He adds an orange for the nose. He adds coal for the eyes and buttons.In the evening, James opens the door. What does he see? The snowman is moving! James invites him in. The snowman has never been inside a house. He says hello to the cat. He plays with paper towels.A moment later, the snowman takes James's hand and goes out.They go up, up, up into the air! They are flying! What a wonderful night! The next morning, James jumps out of bed. He runs to the door.He wants to thank the snowman. But he's gone.".Split(' ');
        var query = "snowman".ToUpper();
        for (int i = 0; i < strList.Length; i++)
        {
            // ToUpper creates a new uppercase copy of each word.
            var str = strList[i].ToUpper();
            if (str == query) Console.WriteLine(str);
        }
        Console.ReadLine();
    }
}
1. Exploring the memory fluctuation
Since memory is fluctuating, let's dig into the heap. Anyone with basic C# knowledge knows that strings are immutable: any change produces a new string, so every ToUpper call leaves a new string behind. To back this up with data, let's look at it in WinDbg.
0:000> !dumpheap -type System.String -stat
Statistics:
              MT    Count    TotalSize Class Name
00007ff8e7a9a120        1           24 System.Collections.Generic.GenericEqualityComparer`1[[System.String, mscorlib]]
00007ff8e7a99e98        1           80 System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Globalization.CultureData, mscorlib]]
00007ff8e7a9a378        1           96 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Globalization.CultureData, mscorlib]][]
00007ff8e7a93200       19         2264 System.String[]
00007ff8e7a959c0      429        17894 System.String
Total 451 objects
The managed heap holds 429 System.String objects (Count = 429). Counting them up: 128 + 128 + 165 + 2 + 6 = 429.
Next, list the individual strings with !dumpheap -mt 00007ff8e7a959c0 and inspect some of them with !DumpObj (for example !DumpObj 000002244282a1f8):
0:000> !DumpObj /d 0000017800008010
Name:        System.String
MethodTable: 00007ff8e7a959c0
EEClass:     00007ff8e7a72ec0
Size:        38(0x26) bytes
File:        C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      HOUSE.
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e7a985a0  4000281        8         System.Int32  1 instance                6 m_stringLength
00007ff8e7a96838  4000282        c          System.Char  1 instance               48 m_firstChar
00007ff8e7a959c0  4000286       d8        System.String  0   shared           static Empty
                                 >> Domain:Value  0000017878943bb0:NotInit  <<
0:000> !DumpObj /d 0000017800008248
Name:        System.String
MethodTable: 00007ff8e7a959c0
EEClass:     00007ff8e7a72ec0
Size:        40(0x28) bytes
File:        C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      SNOWMAN
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8e7a985a0  4000281        8         System.Int32  1 instance                7 m_stringLength
00007ff8e7a96838  4000282        c          System.Char  1 instance               53 m_firstChar
00007ff8e7a959c0  4000286       d8        System.String  0   shared           static Empty
                                 >> Domain:Value  0000017878943bb0:NotInit  <<
Both strings I inspected, "HOUSE." and "SNOWMAN", are all uppercase: they are the copies that ToUpper produced. In my scenario with millions of orders, every query generates millions of small strings on the managed heap, and every extra click generates millions more. No wonder memory jumps around…
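The immutability point can also be checked directly in code. This is a minimal sketch (ImmutabilityDemo and UpperCopy are hypothetical names): ToUpper leaves the original word untouched and hands back a different string object, the same kind of copy that showed up on the heap above. It uses a word containing lowercase letters, so the uppercase result necessarily differs from the input.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns the original word, its uppercase version, and whether both are the same object.
    public static (string original, string upper, bool sameObject) UpperCopy(string word)
    {
        // The original string cannot be modified, so ToUpper has to build a new one.
        string upper = word.ToUpper();
        return (word, upper, ReferenceEquals(word, upper));
    }
}
```

Calling UpperCopy("snowman") gives back "snowman" unchanged, "SNOWMAN" as a separate object, and sameObject == false.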
2. Exploring CPU usage and query time
Now that we know each query can leave millions of strings on the heap, allocating and reclaiming them puts the CPU under heavy pressure, which is why queries slowed down after the ToUpper change. Worse, it keeps the GC busy: during a blocking collection all managed threads are suspended, so everything gets even slower…
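To put numbers on this allocation pressure without a debugger, here is a minimal sketch that counts the bytes allocated on the current thread by each matching style. It assumes a modern .NET runtime where GC.GetAllocatedBytesForCurrentThread is available; AllocationProbe and its methods are hypothetical names, and the exact byte counts will vary by runtime.

```csharp
using System;

// Hypothetical helper: measures how much each matching style allocates.
public static class AllocationProbe
{
    public static (long bytes, int hits) BytesWithToUpper(string[] words, string query)
    {
        string upperQuery = query.ToUpper();   // allocated once, before measuring
        int hits = 0;
        long before = GC.GetAllocatedBytesForCurrentThread();
        foreach (string w in words)
        {
            // Each call builds a new uppercase string that becomes garbage right away.
            if (w.ToUpper() == upperQuery) hits++;
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        return (after - before, hits);
    }

    public static (long bytes, int hits) BytesWithCompare(string[] words, string query)
    {
        int hits = 0;
        long before = GC.GetAllocatedBytesForCurrentThread();
        foreach (string w in words)
        {
            // Compares in place; no intermediate strings are created.
            if (string.Compare(w, query, StringComparison.OrdinalIgnoreCase) == 0) hits++;
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        return (after - before, hits);
    }
}
```

Both methods find the same matches, but only the ToUpper version allocates in proportion to the number of words. Reading GC.CollectionCount(0) before and after each call on a large input is a simple way to watch the extra garbage collections that this garbage triggers.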
Three: Inside string.Compare
To see why string.Compare performs so much better, I decompiled it with dnSpy. It contains this core function:
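// Token: 0x060004B8 RID: 1208 RVA: 0x00010C48 File Offset: 0x0000EE48
[SecuritySafeCritical]
private unsafe static int CompareOrdinalIgnoreCaseHelper(string strA, string strB)
{
    // Only the length of the shorter string has to be walked character by character.
    int num = Math.Min(strA.Length, strB.Length);
    fixed (char* ptr = &strA.m_firstChar)
    {
        fixed (char* ptr2 = &strB.m_firstChar)
        {
            char* ptr3 = ptr;
            char* ptr4 = ptr2;
            while (num != 0)
            {
                int num2 = (int)(*ptr3);
                int num3 = (int)(*ptr4);
                // 97 is 'a': a char in 'a'..'z' becomes uppercase by subtracting 32.
                // The unsigned cast (as in the reference source) makes chars below 'a' fail the test.
                if ((uint)(num2 - 97) <= 25u)
                {
                    num2 -= 32;
                }
                if ((uint)(num3 - 97) <= 25u)
                {
                    num3 -= 32;
                }
                // The first differing character decides the result.
                if (num2 != num3)
                {
                    return num2 - num3;
                }
                ptr3++;
                ptr4++;
                num--;
            }
            // All shared characters matched: the shorter string sorts first.
            return strA.Length - strB.Length;
        }
    }
}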
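This code is clever: 97 is the code of 'a', so every character in 'a'..'z' is turned into its uppercase ASCII form by subtracting 32, and the two strings are compared character by character. No new strings are created, which is far faster than piling up copies on the heap. Note that in the .NET Framework source this helper is only used when both strings are pure ASCII; otherwise string.Compare takes a slower path (TextInfo.CompareOrdinalIgnoreCase).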
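To make the trick concrete, here is a minimal, safe-code sketch of the same ASCII-only comparison. AsciiIgnoreCase is a hypothetical name, not a framework API. It assumes both inputs are pure ASCII, as the framework helper does, and uses the unsigned cast so that characters below 'a' (such as 'A' or '[') are not mistakenly shifted.

```csharp
using System;

public static class AsciiIgnoreCase
{
    // Uppercases 'a'..'z' on the fly and compares char by char; assumes ASCII-only input.
    public static int Compare(string a, string b)
    {
        int length = Math.Min(a.Length, b.Length);
        for (int i = 0; i < length; i++)
        {
            int ca = a[i];
            int cb = b[i];
            // Chars below 'a' wrap around to huge unsigned values, so one test covers 'a' <= c <= 'z'.
            if ((uint)(ca - 'a') <= (uint)('z' - 'a')) ca -= 32;
            if ((uint)(cb - 'a') <= (uint)('z' - 'a')) cb -= 32;
            if (ca != cb) return ca - cb;
        }
        // Equal prefix: the shorter string sorts first.
        return a.Length - b.Length;
    }
}
```

This is only an illustration of the idea; in real code, call string.Compare or string.Equals with StringComparison.OrdinalIgnoreCase, which also handles non-ASCII input.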
Now let me change the demo code and see what happens on the heap…
public static void Main(string[] args)
{
    ...
    var query = "snowman";
    for (int i = 0; i < strList.Length; i++)
    {
        if (string.Compare(strList[i], query, StringComparison.OrdinalIgnoreCase) == 0)
        {
            Console.WriteLine(strList[i]);
        }
    }
    Console.ReadLine();
}

0:000> !dumpheap -type System.String -stat
Statistics:
              MT    Count    TotalSize Class Name
00007ff8e7a9a120        1           24 System.Collections.Generic.GenericEqualityComparer`1[[System.String, mscorlib]]
00007ff8e7a99e98        1           80 System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Globalization.CultureData, mscorlib]]
00007ff8e7a9a378        1           96 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Globalization.CultureData, mscorlib]][]
00007ff8e7a93200       19         2264 System.String[]
00007ff8e7a959c0      300        13460 System.String
Total 322 objects
As the System.String row shows, the heap now holds 300 strings instead of 429: the 129 strings that ToUpper used to create are gone.
Four: Summary
Bad habits in everyday code are easily exposed by large volumes of data, but each one is also a good opportunity to grow ~