Recently I read the design of Redis SDS very clever, just from the type of string design in my opinion, it is used: one element to express two kinds of information (Java thread pool, part of the lock also used); Different scenarios distinguish different types of strings, structures are compact, and so on.

Redis uses __attribute__ ((__packed__)) to cancel memory alignment in order to save memory and make it easier for BUf [-1] to get the FLAGS address, resulting in CPU performance degradation. Here is my understanding:

Why memory alignment?

The memory stores data for use by the CPU. When the CPU accesses memory data, it is limited by the width of the address bus. When the CPU obtains data from the memory, the start ADDRESS must be a multiple of the width of the address bus. Data is also read piecewise by the CPU. The block size is known as memory read granularity.

For example, if the CPU address bus is 64 bits (8 bytes), how does the CPU get an int when an int (4 bytes) is stored at the address 0x06 (0x06, 0x10)?

Step 1: read 8 bytes from 0x00 to 0x08, then save the next two bytes (0x06-0x08) to the first two bytes of int.

Step 2: Read 8 bytes from 0x08 to 0x0F, then save the first two bytes (0x09-0x10) to the last two bytes of int.

Therefore, reading an int from memory into the CPU requires two reads from memory, which takes extra clock cycles to process alignment, arithmetic, etc., which greatly reduces the efficiency of execution and leads to CPU performance degradation.

What is memory alignment

Because it seems to the CPU memory is distribution in a block, then the starting address of the read data is not arbitrary, want to reduce the performance of CPU, you will need to various types of data according to certain rules, the rules of the CPU reads the memory) in the memory space to store, not carried out in accordance with the order and then stored (space, in time, causing a certain waste of memory, But in exchange for CPU performance)

Memory alignment rule

The compiler makes optimized alignment by default during compilation.

Every platform compiler has a default “alignment factor” (also called the alignment modulus), which is changed by the precompilation command #pragma pack(n), n=1,2,4,8,16, where n is the “alignment factor” you want to specify, which defaults to 8. When the n value of the #pragma pack is equal to or exceeds the length of all data members, the size of this n value has no effect.

Data member alignment rules

Data member of struct (or union), the first data member is placed at offset 0, then each data member is aligned according to the value specified by #pragma pack and the length of the data member itself, The smaller one runs (min(#pragma pack())), and the offset from the first address of the structure is a multiple of min(#pragma pack()), which can also be called %N (min(#pragma pack())) = 0, The start address is a value greater than the original start address and the least multiple of N, denoted as OFFSET’ > OFFSET, OFFSET’ % N = 0.

Global alignment rules for structures (or unions)

After the data members have been individually aligned, the structure (or union) itself is also aligned according to the value specified by #pragma Pack and the smaller of the structure (or union) maximum data member length. Structure (or union) aligned size must be a multiple of min(#pragma pack()), also known as the structure size %N (min(#pragma pack())) = 0, the structure size is greater than the original structure size and is the smallest multiple of N, simply called SLEN ‘> SLEN, SLEN % N = 0.

Struct x1 {include <stdio.h> struct x1 {include <stdio.h> struct x1 {include <stdio.h> 0 =0; [0,3] int I; // Length 1 < 8 press 1; Start offset=4 4%1=0; [4] char c1; // Length 1 < 8 press 1; Start offset=5 5%1=0; [5] char c2; // Maximum data member length 4 < 8, struct x1 size 6, padding 2 bytes, is 8, 8 is greater than 6 (struct x1 size) and is the least multiple of 4}; Struct x2 {// length 1 < 8; 1=0; Char c1; // Align length 4 < 8 with 4; Start offset=1 1%4=1, does not match the start address %n=0, then the selected start address must be greater than 0, and %4=0; [4,7] int I; // Length 1 < 8 press 1; Start offset=8 8%1=0; [8] char c2; // maximum data member length 4 < 8, structure x1 size 9, fill 3 bytes, is 12,12 is greater than 9 and is the smallest multiple of 4}; Struct x3 {// length 1 < 8; 1=0; Char c1; // Length 1 < 8 press 1; Start offset=1 1%1=0; [1] char c2; // Align length 4 < 8 with 4; Start offset=2 2%4=2; Do not match the start address %n=0, then the selected start address must be greater than 1, and %4=0; [4,7] int I; // Maximum data member length 4 < 8, struct x1 size 6, padding 2 bytes (after two chars), is 8, 8 is greater than 6 (struct x1 size) and is the least multiple of 4}; int main() { printf("%d\\n", sizeof(struct x1)); Printf ("%d\ n", sizeof(struct x2)); Printf ("%d\ n", sizeof(struct x3)); Return 0; }Copy the code

Change alignment coefficient

#pragma pack(n) /* n = 1, 2, 4, 8, 16 */

#pragma pack(n) /* n = 1, 2, 4, 8, 16 */
struct x{
    int a;
    char b;
    short c;
    char d;
};
Copy the code

Compact arrangement of SDS

struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
Copy the code

Redis uses __attribute__ ((__packed__)) (compact) to save memory and make it easier for BUf [-1] to fetch the flags address.

Why is SDS compact

Save memory

For example, sdshdr32 defaults to 12 with memory alignment and 9 with __attribute__ ((__packed__)) (64-bit machines)

Compatible with C language String function and easy to get type

Because the String function is compatible with C language, the returned pointer is only a pointer to BUF, and different systems have different fill bits. Therefore, if you use the BUF pointer to obtain flags, you need to process for different machine platforms with different bits, which undoubtedly complicates the problem.

Structure elements are declared in reasonable order and self-aligned

__attribute__ ((__packed__)) is the only difference between an __attribute__ ((__packed__)) and not an __attribute__ ((__packed__)).

If there is any misunderstanding or writing wrong place, hope readers comment or private letter correction, very grateful.

Source: memory alignment section describes zhuanlan.zhihu.com/p/66441507