Introduction

Take a look at this example:

// 64-bit system
#include <stdio.h>

struct {
    int  a;
    char b;
} s;

int main(void)
{
    /* sizeof yields a size_t, so print it with %zu */
    printf("%zu\n", sizeof(s));
    return 0;
}

In theory, on a 64-bit system an int occupies 4 bytes and a char occupies 1 byte, so a structure holding both might be expected to occupy 4 + 1 = 5 bytes. In practice, running the program prints 8. The difference comes from memory alignment.

Note: The content discussed in this article is on 64-bit systems.

Outline

  • What is memory alignment

  • Why memory alignment

  • Memory alignment rules

What is memory alignment

Memory in a computer is divided into bytes. In theory, a variable of any type could start at any address. In practice, a variable of a given type is usually accessed at particular addresses, which restricts where data may be placed in memory. Data of each type is laid out according to certain rules rather than simply packed one item after another. This is called alignment.

Memory alignment is handled by the compiler, which places each "data unit" at a suitable position in the program.

Why memory alignment

To answer this, we first need to look at how the processor reads memory.

If we think of memory as a simple array of bytes (in C, for example, a char * can point to a block of memory), we might assume that memory is read one byte at a time, as shown in the following figure.

However, although memory is addressed in bytes, most processors do not access it one byte at a time. Depending on the data type and processor configuration, a processor typically accesses memory in 2-, 4-, 8-, 16-, or even 32-byte chunks. This chunk size is called the memory access granularity.

Given that the processor reads memory in chunks of a fixed size, we can explain why alignment is needed by looking at what goes wrong when data is misaligned.

Alignment concerns where data sits in memory. If the address of a variable is an exact integer multiple of its length, the variable is said to be naturally aligned. For example, a 4-byte integer variable at address 0x00000010 (decimal 16) is naturally aligned.

Now suppose a 4-byte integer variable is not naturally aligned and its starting address is 0x00000002 (the blue area in the figure). The processor reads in 4-byte chunks, so its first read starts at 0x0 and covers 0x0 through 0x3.

That read fetches only part of the integer we want, so the processor reads the next 4-byte chunk, from 0x4 through 0x7.

Only now does the processor hold all the bytes we need, and it still has to discard the unwanted bytes and merge the rest.

So when the integer starts at 0x2 (misaligned), the processor needs two reads to obtain the value.

What if it’s aligned?

If the integer is aligned, a single read fetches the whole value in this example.

As you can see, whether data is aligned affects read efficiency.

At the same time:

Hardware platforms vary greatly in how they treat storage space. Some platforms can only access certain types of data from certain addresses, rather than from any address in memory.

For example, on architectures where the CPU raises an error when it accesses an unaligned variable, the program must ensure that data is aligned. Other platforms tolerate unaligned access, but storing data without the alignment the platform expects usually costs access efficiency.

Because data can only be read at certain addresses, the processor may need several memory accesses to fetch unaligned data, whereas aligned data needs only one.

That’s why memory alignment is important.

Memory alignment speeds up CPU access, and understanding it also lets you save storage space by arranging data so that less padding is needed.

We can also compare the different effects of different memory access granularity on the same task.

Take the same task in each case: read 4 bytes into a processor register, once starting at Address0 and once starting at Address1.

  • Look at the single-byte granularity first

In both figures, the left side represents memory, the right side represents the register, and the arrow in the middle represents the read. With single-byte granularity, memory is read 1 byte at a time, so it takes 4 reads to fetch 4 bytes starting at Address0, and the same number starting at Address1. At this granularity, alignment makes no difference.

  • Now look at the two-byte granularity

Reading four bytes from Address0 takes half as many reads as on a processor with single-byte granularity. Because each memory access has a fixed overhead, reducing the number of accesses improves performance. Address0 is aligned (the data starts at position 0), so the first read (addresses 0 and 1) and the second read (addresses 2 and 3) fetch exactly the target data.

Reading from Address1 is different. The data (the black box in the figure) starts at address 1, which is not on a boundary of the processor's memory access, so it is not aligned. To fetch it, the processor first reads the 2-byte block at addresses 0 and 1 and discards the unwanted byte at address 0. It then reads the block at addresses 2 and 3, and finally the block at addresses 4 and 5, discarding the unwanted byte at address 5. After three reads, the three pieces are merged into the register to form the target data.

  • What about the four-byte granularity?

A processor with four-byte granularity reading from Address0 fetches all four bytes from the aligned address in a single read of addresses 0 to 3.

When reading from Address1, however, the data is not aligned: the processor reads addresses 0 to 3 and discards address 0, then reads addresses 4 to 7 and discards addresses 5, 6, and 7. After two reads, the two remaining pieces are merged into the register to form the target data.

As the two-byte and four-byte cases show, unaligned memory requires extra reads plus discarding and merging steps, which is clearly less efficient.

Note that even for aligned memory, the access granularity affects efficiency: a small granularity means more accesses, while a large one wastes space. For this reason, the compiler on each platform has its own default alignment setting.

Now that we know what memory alignment is and why it is needed, let's move on to the rules.

Memory alignment rules

For standard data types

The address only needs to be an integer multiple of the type's size.

For structures

In a structure, the compiler allocates space for each member of the structure in alignment with its natural boundaries. Members are stored in memory in the order they were declared, with the address of the first member being the same as the address of the entire structure. The specific rules are as follows:

  • 1. The first member starts at offset 0 from the address of the structure variable.

  • 2. For each subsequent member, the offset relative to the start of the structure is an integer multiple of the member's size. If necessary, the compiler adds padding bytes between members.

  • 3. The total size of the structure is an integer multiple of its maximum alignment number (each member has its own alignment number). If necessary, the compiler adds padding bytes after the last member.

  • 4. A nested structure is aligned to an integer multiple of its own maximum alignment number, and the total size of the enclosing structure is an integer multiple of the largest alignment number among all members, including those of the nested structure.

Let me give you an example

Example 1:

struct test_t {
  int   a;
  long  b;
  short c;
};

The first member is an int, which occupies 4 bytes at offsets 0x00 to 0x03 (shown in red).

The second member is a long, which is 8 bytes. The current offset, 4, is not a multiple of 8, so padding bytes (shown in green) fill offsets 0x04 to 0x07, and b is stored at offsets 8 to 15 (shown in yellow).

The third member is a short, which is 2 bytes. The current offset, 16, is an integer multiple of 2, so c is stored directly at offsets 16 and 17 (shown in blue).

At this point every member is aligned, but the structure is 18 bytes long, which does not satisfy rule 3. The compiler therefore adds 6 padding bytes after the last member, for a total size of 24.

So the size of this structure is 24.

If we swap the short and long members in the structure:

struct test_t {
  int   a;
  short b;
  long  c;
};

What is the result? We again use red for the int, blue for the short, yellow for the long, and green for padding bytes. Here a occupies offsets 0 to 3, b occupies offsets 4 and 5, two padding bytes follow, and c occupies offsets 8 to 15.

The size is 16. Changing the order of the members changes the size of the structure, so we should not only understand memory alignment but also put it to use when declaring structures.

Structures nested inside other structures follow rule 4. I will not draw the diagram here; applying the rules above step by step gives the right answer.

Let’s review the alignment rules above:

The starting address of each member variable must be a multiple of the number of bytes occupied by the type of the variable.

Space is allocated for member variables in the order in which they are declared in the structure. Each one is placed according to the alignment rule above, and any gaps are automatically filled with padding bytes.

To ensure that the size of the structure is a multiple of its byte boundary (that is, the size of the largest member type in the structure), padding bytes are added after the last member variable as needed.

This is the general rule for memory alignment, but it is important to note that:

If we run the same example on a processor with a different architecture, we may get different results, and even different compiler configurations may affect this result. What are the factors that affect memory alignment rules?

What factors affect memory alignment results?

1. #pragma pack(n)

Each compiler on a given platform has its own default "alignment coefficient" (also called the alignment modulus). Programmers can change it with the preprocessor directive #pragma pack(n), where n = 1, 2, 4, 8, or 16 is the alignment coefficient to use. The value is an upper bound: it affects only members whose own alignment is greater than n and leaves members whose alignment is n or less unchanged.

One way to think about it is that the processor reads or writes n bytes of memory at a time. Members smaller than n are aligned according to their own requirements, because they can then be fetched in a single read. For members whose alignment requirement is greater than n bytes, aligning to n bytes takes the same number of reads as aligning to their own requirement, but saves space.

To cancel the custom alignment, use the preprocessor directive #pragma pack().

It can also be written as:

#pragma pack(push,n)

#pragma pack(pop)

2. __attribute__((aligned(n)))

__attribute__((aligned(n))) asks the compiler to align the structure on an n-byte boundary. If the structure has a member whose length is greater than n, the structure is aligned according to its largest member instead.

__attribute__((packed)) disables the padding that the compiler would normally insert, so members are laid out according to the number of bytes they actually occupy.

Note that the effective alignment of a member is the smaller of the alignment coefficient and the member's own size.

For example:

struct test
{
char x1;
short x2;
float x3;
char x4;
};

By default, the first member of the structure, x1, has offset 0 and occupies the first byte. The second member, x2, is a short and must start on a 2-byte boundary, so the compiler inserts one padding byte between x1 and x2. The third and fourth members, x3 and x4, already fall on their natural boundaries, so no padding is needed before them. Member x3 requires 4-byte alignment, the largest requirement of any member, so the natural alignment of the test structure is 4 bytes and the compiler adds three padding bytes after x4. The entire structure occupies 12 bytes.

To make the compiler use 1-byte alignment for this structure, use #pragma pack(1):

#pragma pack(1) // Tell the compiler to use 1-byte alignment for this structure
struct test
{
    char x1;
    short x2;
    float x3;
    char x4;
};
#pragma pack() // Cancel the 1-byte alignment and restore the default alignment

sizeof(struct test) is now 8.

The same applies when using __attribute__((packed)):

#define PACKED __attribute__((packed))
struct PACKED test
{
char x1;
short x2;
float x3;
char x4;
}test;


Now sizeof(test) is still 8.

Conclusion

In essence, memory alignment is a set of rules for laying out data so that memory space is used sensibly and memory access is efficient. By adding some padding, the compiler lets each member be accessed in a single read instead of several reads whose results must be merged.

It is a trade of space for time.