Introduction
Today we are going to look at memory alignment in OC (Objective-C).
Let's start with a warm-up question:
- Why align memory at all?
A: To improve efficiency. Alignment simply trades space for time, and the reason lies mainly in chip design. CPU execution can be divided into four simple steps:
- The CPU fetches the instruction that the PC (program counter) points to and loads it into the instruction register;
- The CPU decodes the instruction in the instruction register to determine its type and operands;
- The instruction is dispatched to the appropriate unit: arithmetic instructions go to the arithmetic logic unit, store instructions go to the control unit;
- The PC is incremented and the CPU proceeds to the next instruction.
In assembly, memory accesses of different widths use different instructions. If a value were placed at an arbitrary (unaligned) address, the CPU might need several accesses or several different instructions to read it, and anyone familiar with instruction cycles will see that this greatly reduces efficiency.
Let's start with a picture reviewing the homework we handed in to our teacher (Cooci).
Conclusions first.
The following are the principles of memory alignment:
- The memory size of a structure is not simply the sum of its members' sizes: LGStruct1 below has members totalling 8 + 1 + 4 + 2 = 15 bytes, yet it occupies 24;
- Each member starts at an offset that is a multiple of its own size (or of the modulus, if that is smaller), and the starting address of a structure variable is divisible by the size of its largest basic member type (or the modulus);
- The total size of a structure is padded to a multiple of the size of its largest basic member type (or the modulus, if that is smaller);
- The modulus (the pack(n) value) varies from platform to platform and can be changed with the preprocessor directive #pragma pack(n). This list is a summary shared by an embedded-developer friend;
- Where the addresses allow, adjacent members are packed together into the same aligned slot;
- When a structure contains another structure as a member, that member is aligned to the size of its own largest basic member (or the modulus), rather than being flattened into its individual members first and then aligned.
Verify internal alignment of structures
First we create a project and declare three structs, LGStruct1, LGStruct2 and LGStruct3, with the following code:
struct LGStruct1 {
    double a;   // 8  [0, 7]
    char b;     // 1  [8]
    int c;      // 4  (9, 10, 11 padding) [12, 15]
    short d;    // 2  [16, 17] -> total padded to 24
} struct1;
struct LGStruct2 {
    double a;   // 8  [0, 7]
    int b;      // 4  [8, 11]
    char c;     // 1  [12]
    short d;    // 2  (13 padding) [14, 15] -> total 16
} struct2;
// Homework: structure internal alignment
struct LGStruct3 {
    double a;   // 8  [0, 7]
    int b;      // 4  [8, 11]
    char c;     // 1  [12]
    short d;    // 2  (13 padding) [14, 15]
    int e;      // 4  [16, 19]
    struct LGStruct1 str;  // 24, aligned to 8: (20-23 padding) [24, 47] -> total 48
} struct3;
Then in main.m we create and initialize these structures as variables a, b and c, and verify the rules by printing and debugging. Look at a and b first: apart from the order of their members they are identical, yet, strangely, their sizes differ: sizeof(a) = 24 while sizeof(b) = 16. The printed addresses are a = 0x7ffee7a5cc20, b = 0x7ffee7a5cc10 and c = 0x7ffee7a5cbe0.
Analyzing LGStruct1 a according to the alignment rules:
- double a, 8 bytes, starts at offset 0, which is a multiple of min(8, 8) = 8, so it occupies [0, 7];
- char b, 1 byte, starts at 8, which is a multiple of min(1, 8) = 1, so it occupies [8];
- int c, 4 bytes: the next free offset is 9, which is not a multiple of min(4, 8) = 4, so 9 to 11 become padding and c occupies [12, 15];
- short d, 2 bytes, starts at 16, which is a multiple of min(2, 8) = 2, so it occupies [16, 17]. That makes 18 bytes, padded to 24, the next multiple of the largest member size 8;
- double a = 1.1 (stored as the nearest double, 1.1000000000000001) is 0x3ff199999999999a in hex;
- char b = 'a' is ASCII 97 = 0x61;
- int c = 2 is 0x00000002;
- short d = 3 is 0x0003;
- Taking little-endian byte order and 8-byte alignment into account, memory should finally read
0x3ff199999999999a 0x0000000200000061 0x0000000000000003
that is, 8 + 8 + 8 = 24 bytes.
Now let's verify it: entering x/4gx 0x7ffee7a5cc20 in LLDB shows memory that matches our prediction.
The double shows up only as raw hex (0x3ff199999999999a), and po does not print it as a number directly, so we use e, which is short for expression. Typing help expression in LLDB gives the full documentation; we only pull out the parts used here. In e -f f -- 0x3ff199999999999a, the first -f is the short form of the --format option, the second f is the format itself, meaning float, and -- separates the command options from the raw expression, as described in the help text below.
-f <format> ( --format <format> )
Specify a format to be used for display.
'f' or "float"
Examples:
expr my_struct->a = my_array[3]
expr -f bin -- (index * 8) + 5
expr unsigned int $foo = 5
expr char c[] = \"foo\"; c[0]
Important Note: Because this command takes 'raw' input, if you use
any command options you must use ' -- ' between the end of the command options and the beginning of the raw input.
That match could be a coincidence, so let's analyze LGStruct2 b with the same rules:
- double a, 8 bytes, starts at offset 0, which is a multiple of min(8, 8) = 8, so it occupies [0, 7];
- int b, 4 bytes, starts at 8, which is a multiple of min(4, 8) = 4, so it occupies [8, 11];
- char c, 1 byte, starts at 12, which is a multiple of min(1, 8) = 1, so it occupies [12];
- short d, 2 bytes: the next free offset is 13, which is not a multiple of min(2, 8) = 2, so it is padded to 14 and d occupies [14, 15]. The total of 16 is already a multiple of 8;
- double a = 1.1 (stored as 1.1000000000000001) is 0x3ff199999999999a;
- int b = 2 is 0x00000002;
- char c = 'b' is ASCII 98 = 0x62;
- short d = 3 is 0x0003;
- Taking little-endian byte order and 8-byte alignment into account, memory should finally read
0x3ff199999999999a 0x0003006200000002
that is, 8 + 8 = 16 bytes.
Similarly, entering x/4gx 0x7FFEE27ACC10 in LLDB shows memory that also matches our prediction.
With that, conclusions 1 to 5 are essentially confirmed. Next we analyze LGStruct3 c, which also covers the nested-structure rule:
- double a, 8 bytes, starts at offset 0, which is a multiple of min(8, 8) = 8, so it occupies [0, 7];
- int b, 4 bytes, starts at 8, which is a multiple of min(4, 8) = 4, so it occupies [8, 11];
- char c, 1 byte, starts at 12, which is a multiple of min(1, 8) = 1, so it occupies [12];
- short d, 2 bytes: the next free offset is 13, which is not a multiple of min(2, 8) = 2, so it is padded to 14 and d occupies [14, 15];
- int e, 4 bytes, starts at 16, which is a multiple of min(4, 8) = 4, so it occupies [16, 19];
- struct LGStruct1 str: its largest member is 8 bytes (the double), so it is aligned to 8. The next free offset is 20, which is not a multiple of 8, so it is padded to 24. As analyzed above with the same rules, LGStruct1 is 24 bytes, so str occupies [24, 47];
- double a = 1.1 is 0x3ff199999999999a;
- int b = 2 is 0x00000002;
- char c = 'c' is ASCII 99 = 0x63;
- short d = 3 is 0x0003;
- int e = 4 is 0x00000004;
- struct LGStruct1 str: repeating the steps for LGStruct1 a above, it should read 0x3ff199999999999a 0x0000000200000061 0x0000000000000003;
- Taking little-endian byte order and 8-byte alignment into account, memory should finally read
0x3ff199999999999a 0x0003006300000002 0x0000000000000004 0x3ff199999999999a 0x0000000200000061 0x0000000000000003
that is, 8 * 6 = 48 bytes.
Verify iOS object memory alignment
That wraps up structs. As iOS developers we rarely use structs directly, but we use OC objects all the time, so OC objects need analyzing too. To eliminate unimportant factors, we create new classes FSStudent, FSStudent2 and FSStudent3 whose properties match the members of the structs:
@interface FSStudent : NSObject
@property (nonatomic) double a;
@property (nonatomic) char b;
@property (nonatomic) int c;
@property (nonatomic) short d;
@end
@interface FSStudent2 : NSObject
@property (nonatomic) double a;
@property (nonatomic) int b;
@property (nonatomic) char c;
@property (nonatomic) short d;
@end
@interface FSStudent3 : NSObject
@property (nonatomic) double a;
@property (nonatomic) int b;
@property (nonatomic) char c;
@property (nonatomic) short d;
@property (nonatomic) int e;
@property (nonatomic, strong) FSStudent *str;
@end
We can also print the results. sizeof(a) prints 8: a is an object pointer, so this is the size of the pointer type. class_getInstanceSize([FSStudent class]) and its counterparts for FSStudent2 and FSStudent3 print 24, 24 and 40: the memory the instance object needs, aligned to 8 bytes. malloc_size prints 32, 32 and 48: the memory the system actually allocates, aligned to 16 bytes.
For FSStudent a, the first word 0x00000001054faca0 is the isa pointer. The second word 0x0000000200030061 is clearly 0x00000002, 0x0003 and 0x61 packed together (c = 2, d = 3, b = 'a'), so Apple has rearranged the memory of the properties: b, d and c are char (1), short (2) and int (4), and 1 + 2 + 4 fits within one 8-byte slot, so these smaller members are stored in the same block of memory. The double a follows in the third word. The object therefore needs 8 * 3 = 24 bytes; with 16-byte alignment, 8 * 4 = 32 bytes are actually allocated, and the last word is 0. For FSStudent2 b, whose properties are declared in a different order, the instance size and the allocated size are the same as for a, and the memory contents are identical except for 0x61 versus 0x62. This confirms that Apple rearranges the properties of OC objects to optimize memory usage.
(lldb) po a
<FSStudent: 0x600000237ac0>
(lldb) x/8gx 0x600000237ac0
0x600000237ac0: 0x00000001054faca0 0x0000000200030061
0x600000237ad0: 0x3ff199999999999a 0x0000000000000000
(lldb) po b
<FSStudent2: 0x600000237b20>
(lldb) x/8gx 0x600000237b20
0x600000237b20: 0x00000001054facf0 0x0000000200030062
0x600000237b30: 0x3ff199999999999a 0x0000000000000000
Conclusion
- Declaring the same member types in a different order can give a struct a different size, so being aware of alignment and ordering members sensibly is a good habit that saves memory. I work in smart hardware and audio/video, including some embedded camera development, where memory is scarce and this matters a great deal.
- Apple rearranges the properties of iOS objects to optimize memory. The system has many more optimizations like this; for example, NSArray is also built on optimized C arrays, which interested readers can explore. It may not be directly useful in everyday development, but this kind of in-depth exploration never hurts.
- Related to point 2, one more thing I learned earlier: after looking at the generated assembly, I found that even simple constructs like if and switch-case are optimized differently for different scenarios, for example by reordering branches, to improve performance and save computation.