Yesterday a program in its online trial-run phase hung. It normally runs fine, so I checked the logs. It turned out that operations had run a top-tier beauty brand shop with an unusually large number of members, and with two tasks running in parallel it blew through 128 GB of memory in one go. As a stopgap we throttled the load; the code itself would be optimized afterwards.

One: background

1. Background

Because I wrote the code myself, I know where the problem is. If you have read my earlier articles, you know that I run many models fully in memory to label users. A model is a set of targeting filters. To speed up processing, I break the filter conditions down into atomic conditions and, while querying, cache the crowd matched by each atomic condition. If a later model hits an atomic condition already used by an earlier model, it can read that crowd directly from the cache. This is the same idea as dynamic programming. If that is unclear, let me draw a picture.

As can be seen from the above figure, when calculating model 2, the number of people in condition 1 can be directly obtained from condition 1 under model 1, and the number of people in condition 2 and 5 under model 3 can also be directly obtained from model 1 and 2, thus greatly accelerating the processing speed.
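The reuse described above can be sketched roughly as follows. This is only a minimal illustration, not the production code: `CrowdCache`, `GetCrowd`, `QueryCount`, the `loadCrowd` delegate, and the condition names are all hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class CrowdCache
{
    // key: atomic condition, value: customer IDs matching that condition
    private static readonly ConcurrentDictionary<string, List<long>> Cached =
        new ConcurrentDictionary<string, List<long>>();

    // Counts how many times the (hypothetical) expensive query really ran.
    public static int QueryCount;

    public static List<long> GetCrowd(string atomicCondition, Func<string, List<long>> loadCrowd)
    {
        // The factory only runs when the condition is not cached yet.
        return Cached.GetOrAdd(atomicCondition, key =>
        {
            Interlocked.Increment(ref QueryCount);
            return loadCrowd(key);
        });
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for the real query; returns a fixed crowd.
        Func<string, List<long>> fakeQuery = condition => new List<long> { 1, 2, 3 };

        // Model 1 uses conditions 1 and 2.
        CrowdCache.GetCrowd("condition1", fakeQuery);
        CrowdCache.GetCrowd("condition2", fakeQuery);

        // Model 2 hits condition 1 again and is served from the cache.
        CrowdCache.GetCrowd("condition1", fakeQuery);

        Console.WriteLine(CrowdCache.QueryCount); // prints 2
    }
}
```

One caveat with this sketch: under concurrent access `GetOrAdd` may invoke the factory more than once for the same key, although only one result is stored.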

2. Looking for a reason

The cache is declared as follows; I no longer know why I chose this type:

    /// <summary>
    /// Cached crowds
    /// key: atomic condition
    /// value: the crowd (customer IDs)
    /// </summary>
    public ConcurrentDictionary<string, List<long>> CachedCrowds { get; set; } = new ConcurrentDictionary<string, List<long>>();

I am talking about the List<long> inside. I actually used long to store the customerID, probably because the project's original authors defined it as long and I simply followed along 😄. But who has that many customers? The whole country has only 1.4 billion people, which fits comfortably in an int (maximum 2,147,483,647), while a long takes 8 bytes. That is obviously a waste.

Two: solutions

1. Convert long to int

People are lazy: the less code I change, the less blame I risk carrying later. An int is enough here and should save half the space. To demonstrate, I put 5 million customer IDs into a List<long> and a List<int> respectively. The code is as follows:

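    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Program
    {
        public static void Main(string[] args)
        {
            var rand = new Random();

            // 5 million customer IDs in shuffled order, stored as int.
            List<int> intCustomerIDList = Enumerable.Range(1, 5000000)
                .OrderBy(m => rand.Next(0, 100000))
                .Take(5000000)
                .ToList();

            // The same number of customer IDs, stored as long.
            List<long> longCustomerIDList = Enumerable.Range(1, 5000000)
                .OrderBy(m => rand.Next(0, 100000))
                .Take(5000000)
                .Select(m => (long)m)
                .ToList();

            Console.WriteLine("Processing completed...");

            // Keep the process alive so a dump can be taken.
            Console.Read();
        }
    }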

Next, use WinDbg to see how much memory each list occupies on the heap.

Use ~0s -> !clrstack -l -> !dumpobj to find the local variables for List<int> and List<long> on the main thread, then look at Size.

    0:000> ~0s
    ntdll!ZwReadFile+0x14:
    00007ff8`fea4aa64 c3              ret
    0:000> !clrstack -l
    OS Thread Id: 0x5b70 (0)
            Child SP               IP Call Site
    00000015c37feed0 00007ff889e60b9c ConsoleApp2.Program.Main(System.String[]) [C:\dream\Csharp\ConsoleApp1\ConsoleApp2\Program.cs @ 35]
        LOCALS:
            0x00000015c37fef90 = 0x0000014ad7c12d88
            0x00000015c37fef88 = 0x0000014ad7c13060
            0x00000015c37fef80 = 0x0000014ad7c33438

    00000015c37ff1a8 00007ff8e9396c93 [GCFrame: 00000015c37ff1a8]

    0:000> !do 0x0000014ad7c13060
    Name:        System.Collections.Generic.List`1[[System.Int32, mscorlib]]
    MethodTable: 00007ff8e7aaa068
    EEClass:     00007ff8e7c0b008
    Size:        40(0x28) bytes
    File:        C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    00007ff8e7a98538  400189e        8       System.Int32[]  0 instance 0000014af02d1020 _items
    00007ff8e7a985a0  400189f       18         System.Int32  1 instance          5000000 _size
    00007ff8e7a985a0  40018a0       1c         System.Int32  1 instance          5000000 _version
    00007ff8e7a95dd8  40018a1       10        System.Object  0 instance 0000000000000000 _syncRoot
    00007ff8e7a98538  40018a2        0       System.Int32[]  0   shared   static _emptyArray
                                     >> Domain:Value dynamic statics NYI 0000014ad61166c0:NotInit  <<

    0:000> !do 0000014af02d1020
    Name:        System.Int32[]
    MethodTable: 00007ff8e7a98538
    EEClass:     00007ff8e7c05918
    Size:        33554456(0x2000018) bytes
    Array:       Rank 1, Number of elements 8388608, Type Int32 (Print Array)
    Fields:
    None

    0:000> !do 0x0000014ad7c33438
    Name:        System.Collections.Generic.List`1[[System.Int64, mscorlib]]
    MethodTable: 00007ff8e7aad2a0
    EEClass:     00007ff8e7c0bd70
    Size:        40(0x28) bytes
    File:        C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    00007ff8e7aa6c08  400189e        8       System.Int64[]  0 instance 0000014a80001020 _items
    00007ff8e7a985a0  400189f       18         System.Int32  1 instance          5000000 _size
    00007ff8e7a985a0  40018a0       1c         System.Int32  1 instance          5000000 _version
    00007ff8e7a95dd8  40018a1       10        System.Object  0 instance 0000000000000000 _syncRoot
    00007ff8e7aa6c08  40018a2        0       System.Int64[]  0   shared   static _emptyArray
                                     >> Domain:Value dynamic statics NYI 0000014ad61166c0:NotInit  <<

    0:000> !do 0000014a80001020
    Name:        System.Int64[]
    MethodTable: 00007ff8e7aa6c08
    EEClass:     00007ff8e7c09e50
    Size:        67108888(0x4000018) bytes
    Array:       Rank 1, Number of elements 8388608, Type Int64 (Print Array)
    Fields:
    None

Looking closely at the output above, there are three local variables on the main thread's stack. The last two are our List<int> and List<long>, and their backing arrays have the following sizes:

List<int>: Size: 33554456(0x2000018) bytes => 33554456/1024/1024 ≈ 32 MB

List<long>: Size: 67108888(0x4000018) bytes => 67108888/1024/1024 ≈ 64 MB

So 5 million ints take 32 MB of memory. That halves the space, but it is not a fundamental improvement. I have to keep digging, otherwise running 4 tasks at the same time will blow up the memory again…
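The dump reports 8388608 array elements even though only 5 million IDs were stored. The sketch below reproduces the numbers under two assumptions: that the list grew by doubling its capacity starting from 4 (the behaviour of List<T> as I understand it on this runtime), and that an array carries a 24-byte header, which is simply the 0x18 read off the Size values in the dump. `ListSizeEstimate` and its members are hypothetical names for illustration.

```csharp
using System;

public static class ListSizeEstimate
{
    // Assumption: array overhead on this 64-bit runtime, read off the dump
    // (0x2000018 - 0x2000000 = 0x18 = 24 bytes).
    public const long ArrayHeaderBytes = 24;

    // Assumption: capacity starts at 4 and doubles until `count` items fit.
    public static long CapacityAfterAdds(long count)
    {
        long capacity = 4;
        while (capacity < count)
        {
            capacity *= 2;
        }
        return capacity;
    }

    // Bytes taken by an array of `elements` items of `elementSize` bytes each.
    public static long ArrayBytes(long elements, int elementSize)
    {
        return ArrayHeaderBytes + elements * elementSize;
    }

    public static void Main()
    {
        long capacity = CapacityAfterAdds(5000000);
        Console.WriteLine(capacity);                           // 8388608
        Console.WriteLine(ArrayBytes(capacity, sizeof(int)));  // 33554456
        Console.WriteLine(ArrayBytes(capacity, sizeof(long))); // 67108888

        // If the array held exactly 5 million ints and no spare capacity:
        Console.WriteLine(ArrayBytes(5000000, sizeof(int)));   // 20000024
    }
}
```

Under these assumptions, part of the 32 MB is spare capacity rather than data, so sizing the list up front would already help somewhat, though the saving is still a constant factor.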

2. Use BitArray

I believe many of us came across the bitmap when learning data structures. The crowds selected by atomic filter conditions are large, so a bitmap fits my business needs exactly. If you do not know bitmaps, here is a brief explanation.

<1> Principle

We all know that an int is 4 bytes, that is, 32 bits. Picture it as 32 cells, as shown below:

Using all 32 cells of an int to store a single number is wasteful, because those 32 cells can record 32 numbers (1-32): mark 1 in the first cell, 3 in the third cell… 32 in the 32nd cell. Two ints can then store the numbers 1-64. In other words, ideally the space shrinks by a factor of 32. The thinking has to be reversed: treat the number as an array index. Since each cell is a bit, its two states 0 and 1 say exactly whether that number is present: 1 means set, 0 means unset. Take a moment to let that sink in. If you still don't understand, refer to my article from eight years ago:

Classic Algorithm Daily Exercise, No. 11: the Bitmap Algorithm

C# already provides a BitArray class. Let's look at how BitArray sets and reads each cell. Underneath it still operates on m_array, which is simply an int[].

    public void Set(int index, bool value)
    {
        if (value)
        {
            // index / 32 picks the int, index % 32 picks the bit inside it.
            m_array[index / 32] |= 1 << index % 32;
        }
        else
        {
            m_array[index / 32] &= ~(1 << index % 32);
        }
        _version++;
    }

    public bool Get(int index)
    {
        return (m_array[index / 32] & (1 << index % 32)) != 0;
    }
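To make the index arithmetic in Set and Get concrete, here is a small hand-rolled sketch that uses the same `index / 32` and `index % 32` split on a plain int[]. `TinyBitmap` and its members are hypothetical names, and unlike the 1-32 picture above the indexes are zero-based, as they are in BitArray.

```csharp
using System;

public static class TinyBitmap
{
    // Which int of the array holds the bit for this index.
    public static int WordIndex(int index) { return index / 32; }

    // Mask with only the bit for this index set inside its int.
    public static int BitMask(int index) { return 1 << (index % 32); }

    public static void Set(int[] words, int index)
    {
        words[WordIndex(index)] |= BitMask(index);
    }

    public static bool Get(int[] words, int index)
    {
        return (words[WordIndex(index)] & BitMask(index)) != 0;
    }

    public static void Main()
    {
        int[] words = new int[2]; // 2 ints = 64 bits, enough for indexes 0..63

        Set(words, 1);
        Set(words, 3);
        Set(words, 35);

        Console.WriteLine(WordIndex(35));  // 1: the second int
        Console.WriteLine(35 % 32);        // 3: bit 3 inside that int
        Console.WriteLine(Get(words, 35)); // True
        Console.WriteLine(Get(words, 2));  // False
        Console.WriteLine(words[0]);       // 10: bits 1 and 3 (2 + 8)
        Console.WriteLine(words[1]);       // 8: bit 3
    }
}
```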

<2> Check the memory usage

Next, load the same List<int> into a BitArray and look at the memory usage again:

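    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Linq;

    public class Program
    {
        public static void Main(string[] args)
        {
            var rand = new Random();

            List<int> intCustomerIDList = Enumerable.Range(1, 5000000)
                .OrderBy(m => rand.Next(0, 100000))
                .Take(5000000)
                .ToList();

            // One bit per possible customer ID, from 0 up to the largest ID.
            BitArray bitArray = new BitArray(intCustomerIDList.Max() + 1);

            foreach (var customerID in intCustomerIDList)
            {
                bitArray[customerID] = true;
            }

            Console.WriteLine("Processing done...");

            // Keep the process alive so a dump can be taken.
            Console.Read();
        }
    }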

Then grab the dump file and use windbg to see the memory usage.

    0:000> !do 0x0000026e4d0332b8
    Name:        System.Collections.BitArray
    MethodTable: 00007ff8e7a89220
    EEClass:     00007ff8e7c01bc0
    Size:        40(0x28) bytes
    File:        C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    00007ff8e7a98538  4001810        8       System.Int32[]  0 instance 0000026e5dfd9bd8 m_array
    00007ff8e7a985a0  4001811       18         System.Int32  1 instance          5000001 m_length
    00007ff8e7a985a0  4001812       1c         System.Int32  1 instance          5000000 _version
    00007ff8e7a95dd8  4001813       10        System.Object  0 instance 0000000000000000 _syncRoot

    0:000> !DumpObj /d 0000026e5dfd9bd8
    Name:        System.Int32[]
    MethodTable: 00007ff8e7a98538
    EEClass:     00007ff8e7c05918
    Size:        625028(0x98984) bytes
    Array:       Rank 1, Number of elements 156251, Type Int32 (Print Array)
    Fields:
    None

As the output shows, the object is indeed a BitArray, and the Size of its m_array is:

Size: 625028(0x98984) bytes => 625028/1024/1024 ≈ 0.6 MB

See, this is 🐮👃: from the original 64 MB down to 0.6 MB. How nice is that? With such a small footprint it almost feels anticlimactic, ha ha; running dozens of these in parallel is nothing to be afraid of. One reminder: if the customers are few but their IDs are large, do not use BitArray, because it wastes space instead. With a large data volume, of course, it pays off.
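The caveat about few customers with large IDs can be checked with a little arithmetic. The sketch below assumes the same 24-byte array header seen in the dumps and one bit per ID from 0 to the largest ID; `BitmapSizeEstimate` and its members are hypothetical names, and the figure of 1,000 customers with IDs up to 1,000,000,000 is a made-up example.

```csharp
using System;

public static class BitmapSizeEstimate
{
    // Assumption: array overhead on this 64-bit runtime, as seen in the dumps.
    public const long ArrayHeaderBytes = 24;

    // Bytes of the int[] behind a bitmap covering IDs 0..maxId.
    public static long BitArrayBytes(long maxId)
    {
        long bits = maxId + 1;
        long words = (bits + 31) / 32; // round up to whole ints
        return ArrayHeaderBytes + words * 4;
    }

    // Bytes of an int[] holding exactly `count` IDs.
    public static long IntArrayBytes(long count)
    {
        return ArrayHeaderBytes + count * 4;
    }

    public static void Main()
    {
        // The case from the dump: largest ID 5000000, so 5000001 bits.
        Console.WriteLine(BitArrayBytes(5000000));    // 625028

        // Made-up sparse case: 1,000 customers, IDs up to 1,000,000,000.
        Console.WriteLine(IntArrayBytes(1000));       // 4024
        Console.WriteLine(BitArrayBytes(1000000000)); // 125000028
    }
}
```

The bitmap's size depends only on the largest ID, not on how many customers are present, so it wins when the IDs are dense and loses badly when they are sparse.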

Three: summary

When you only run small shops, the code works however you write it. Once the data volume gets large, there are pitfalls everywhere. Your own scenario surely has room for optimization too ~