Why is Redis so fast?

  1. Redis is single-threaded, avoiding the context switching and concurrency control overhead of multithreading;

  2. Most Redis operations are memory-based and require no disk I/O to read and write data, so it is very fast.

  3. Redis uses I/O multiplexing to improve network I/O concurrency.

  4. Redis provides efficient data structures such as skip lists and hash tables.

Basic data structure

SDS

Redis’s simple dynamic string (SDS) is mutable. Like a C string, its buffer ends with a 1-byte null character, and its maximum length is 512 MB.

Why does SDS keep the 1-byte null terminator? So that SDS can reuse part of the C string library functions.

Structure definition

SDS uses a byte array to store the contents of the string. The length of the string can be obtained in O(1) time through the len attribute.

struct sdshdr{
	// The length of the string, i.e. the number of bytes used by buf[]
	int len;
	// number of unused bytes remaining in buf[]
	int free;
	// An array of bytes to hold the contents of the string
	char buf[];
};

Memory allocation policy

SDS uses space preallocation and lazy space release to reduce the number of memory reallocations (for n consecutive growth operations, from exactly n reallocations down to at most n).

  1. Space preallocation

Space preallocation optimizes string growth. When modifying an SDS requires expanding its memory, not only the required space is allocated, but extra unused space is allocated based on the len attribute (a sketch follows at the end of this subsection):

  • If len < 1MB after the modification, unused space equal to len is allocated;
  • If len ≥ 1MB after the modification, 1MB of unused space is allocated;
  2. Lazy space release

Lazy space release optimizes string shortening. When an SDS is shortened, the program does not immediately reclaim the unused space; instead it records the unused length in free for future use. (A function can also be called to release the space manually.)
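A minimal sketch of the preallocation rule above, using a simplified sdshdr layout (illustrative only, not the actual Redis source; sdsMakeRoomFor here is a stand-in name):

#include <stdlib.h>

#define SDS_MAX_PREALLOC (1024 * 1024)  /* 1 MB */

/* Same layout as the sdshdr shown above. */
struct sdshdr {
    int len;     /* bytes used in buf[] */
    int free;    /* bytes still unused in buf[] */
    char buf[];  /* string contents, null-terminated */
};

/* Ensure the string has room for `addlen` extra bytes,
 * applying the preallocation rule described above. */
struct sdshdr *sdsMakeRoomFor(struct sdshdr *sh, int addlen) {
    if (sh->free >= addlen) return sh;   /* enough spare space already */

    int newlen = sh->len + addlen;
    /* Preallocation: below 1 MB double the space, above it add 1 MB. */
    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;
    else
        newlen += SDS_MAX_PREALLOC;

    struct sdshdr *newsh = realloc(sh, sizeof(struct sdshdr) + newlen + 1);
    if (newsh == NULL) return NULL;
    newsh->free = newlen - newsh->len;
    return newsh;
}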

SDS and C strings

  1. Get the length of the string
  • A C string must traverse the whole string to count its length: O(n);
  • SDS only needs to read sdshdr.len: O(1);
  2. Buffer overflow
  • A C string does not record its own length; if enough memory is not allocated before a modification, a buffer overflow may occur.
  • SDS checks whether there is enough space via the sdshdr.free attribute before a modification; if the space is insufficient, it expands first.
  3. Binary safety
  • A C string cannot contain null characters except as the terminator, otherwise the first embedded null would be mistaken for the end of the string (so C strings can only hold text, not binary data).
  • SDS uses sdshdr.len to determine where the string ends, so it can store binary data.
  4. Number of memory allocations
  • A C string requires a memory reallocation on every modification.
  • SDS requires at most n reallocations for n modifications;

Linked list

Characteristics

  1. Doubly linked: a node's predecessor and successor can be obtained in O(1);
  2. Acyclic: the head node's prev pointer and the tail node's next pointer both point to NULL;
  3. Insertion and deletion are fast, O(1); searching is slow, O(n);

Structure definition

Node definition:

typedef struct listNode{
	// predecessor node pointer
	struct listNode *prev;
	// successor node pointer
	struct listNode *next;
	// node value
	void *value;
} listNode;

Linked list definitions:

typedef struct list {
    // head node
    listNode *head;
    // tail node
    listNode *tail;
    // node value copy function
    void *(*dup)(void *ptr);
    // node value free function
    void (*free)(void *ptr);
    // node value comparison function
    int (*match)(void *ptr, void *key);
    // number of nodes
    unsigned long len;
} list;

Dictionary

Characteristics

A dictionary holds key-value pairs; each key in a dictionary is unique.

Structure definition

A dictionary's ht[2] is an array of two hash tables: ht[0] is used in normal operation, while ht[1] is used only during rehashing. dict.rehashidx records the current rehash progress.

typedef struct dict {
    // type-specific operation functions
    dictType *type;
    // private data (arguments passed to the type-specific functions)
    void *privdata;
    // hash tables
    dictht ht[2];
    // rehash index; -1 when no rehash is in progress
    long rehashidx;
} dict;

The underlying hash table is implemented as an array of dictEntry pointers. Each entry holds one key-value pair, and entries hashed to the same slot are chained together through the next pointer.

typedef struct dictht {
    // hash table array
    dictEntry **table;
    // hash table size
    unsigned long size;
    // hash table size mask, used to compute indexes; always equal to size-1
    unsigned long sizemask;
    // number of existing nodes
    unsigned long used;
} dictht;

typedef struct dictEntry {
    // key
    void *key;
    // value
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    // next node in the chain
    struct dictEntry *next;
} dictEntry;

Hash process

  1. Compute the hash value of the key with the hash function;
  2. Compute the index by combining the hash value with the hash table mask dictht.sizemask (index = hash & sizemask);
  3. Place the key-value pair at the corresponding index of the hash table array;

How are hash conflicts resolved? By separate chaining: each hash table node dictEntry holds a next pointer, and nodes assigned to the same slot are connected into a singly linked list through it.
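A small sketch of the indexing and chaining steps, assuming the dictht/dictEntry definitions shown above (dict_add_entry is a hypothetical helper; the real dictAdd also handles rehashing and duplicate keys):

#include <stdint.h>

/* Compute the slot for a key and chain the new entry at the head
 * of that slot's linked list (assumes the dictht/dictEntry structs above). */
void dict_add_entry(dictht *ht, dictEntry *entry, uint64_t hash) {
    unsigned long idx = hash & ht->sizemask;  /* index = hash & (size - 1) */
    entry->next = ht->table[idx];             /* new node points to the old head */
    ht->table[idx] = entry;                   /* becomes the new head: O(1) insert */
    ht->used++;
}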

Rehash process

  1. Allocate space for ht[1]: for an expansion, the size of ht[1] is the first power of 2 that is ≥ ht[0].used * 2; for a shrink, it is the first power of 2 that is ≥ ht[0].used;
  2. Rehash all key-value pairs from ht[0] into ht[1]. This is done incrementally: each add/delete/update/lookup on the dictionary during the rehash migrates the key-value pairs at index rehashidx of ht[0] into ht[1] and then increments rehashidx;
  3. When all key-value pairs in ht[0] have been migrated to ht[1], release ht[0], make ht[1] the new ht[0], and create a new empty ht[1] ready for the next rehash.

1. When is a rehash triggered? It depends on the hash table's load factor (dictht.used / dictht.size); a rehash is triggered when the table is too full or too empty (see the sketch at the end of this subsection):

  1. When the server is not running BGSAVE/BGREWRITEAOF and the load factor of the hash table is ≥ 1;
  2. When the server is running BGSAVE/BGREWRITEAOF and the load factor of the hash table is ≥ 5;

2. Benefits of incremental rehash: the rehash work is spread over every add/delete/update/lookup on the dictionary, avoiding the server stalls that a one-off, centralized rehash of a huge table would cause.

3. Search/insert operations during rehash

  1. Lookup: both ht[0] and ht[1] are used during a rehash. A key is first looked up in ht[0]; if it is not found, it is looked up in ht[1].
  2. Insert: new key-value pairs added during a rehash are saved only to ht[1], ensuring that the number of key-value pairs in ht[0] only decreases.
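A sketch of the expansion trigger described in point 1 above (dict_needs_expand is a hypothetical helper; 1 and 5 are the load-factor thresholds mentioned, and bgsave_in_progress covers both BGSAVE and BGREWRITEAOF):

#include <stdbool.h>

/* Decide whether ht[0] needs to be expanded, per the rules above. */
bool dict_needs_expand(unsigned long used, unsigned long size,
                       bool bgsave_in_progress) {
    if (size == 0) return true;                 /* empty table: create it */
    double load_factor = (double)used / size;   /* used / size */
    return bgsave_in_progress ? (load_factor >= 5)
                              : (load_factor >= 1);
}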

Skip list

Characteristics

A skip list keeps its elements sorted by score in ascending order, and searching it is similar to binary search.

Structure definition

A skip list is built on an ordered linked list, adding multiple levels of indexes on top of it to speed up searching. Each level of a skip list is a linked list, and the lowest level contains all the elements. Each node holds a backward pointer to the previous node at the lowest level, plus an array of levels, each level containing a forward pointer to the next node at that level.

typedef struct zskiplist {
    // Head node, tail node
    struct zskiplistNode *header, *tail;
    // Skip list length (number of nodes)
    unsigned long length;
    // The level of the node with the largest level in the skip list
    int level;
} zskiplist;

typedef struct zskiplistNode {
    // member object, unique within the skip list
    robj *obj;
    // score; nodes are sorted by score in ascending order
    double score;
    // backward pointer to the previous node, used for traversing from the tail
    struct zskiplistNode *backward;
    // level array
    struct zskiplistLevel {
        // forward pointer, used for traversing from the head toward the tail
        struct zskiplistNode *forward;
        // span between this node and the node the forward pointer points to
        unsigned int span;
    } level[];
} zskiplistNode;

Operations

  1. Search: start at the current highest level and move forward; while the next node's score is smaller than the target score, keep moving forward, otherwise drop down one level and continue (similar to binary search). See the sketch after this list. The time complexity is O(logN).

  2. Insert: search in the same way, level by level, down to the first level to find the insertion position, then link the new node in at each of its levels. The time complexity is O(logN).

  3. Delete: search from the top down to the level at which the node first appears and locate the node at every level it occupies, then unlink it from each of those levels; if a level (other than the base list) is left with no nodes, the level can be removed. The time complexity is O(logN).
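A sketch of the top-down search, assuming the zskiplist/zskiplistNode definitions shown earlier (simplified: it compares only scores, whereas Redis also compares the member object when scores are equal):

#include <stddef.h>

/* Find the first node whose score equals `target`, starting from the
 * highest level and dropping down one level at a time. */
zskiplistNode *zsl_search(zskiplist *zsl, double target) {
    zskiplistNode *x = zsl->header;
    for (int i = zsl->level - 1; i >= 0; i--) {
        /* Move forward while the next node's score is still smaller. */
        while (x->level[i].forward && x->level[i].forward->score < target)
            x = x->level[i].forward;
        /* Otherwise drop down to the next level and continue. */
    }
    x = x->level[0].forward;   /* candidate node at the bottom level */
    return (x && x->score == target) ? x : NULL;
}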

Integer set

Characteristics

An integer set stores integer values in sorted order without duplicates.

An integer set uses binary search to locate elements for lookups and to find the insertion position.

Structure definition

An integer set can hold 16-bit, 32-bit, or 64-bit integers, depending on its encoding attribute.

typedef struct intset {
    // encoding: int16_t, int32_t, or int64_t
    uint32_t encoding;
    // number of elements
    uint32_t length;
    // array holding the elements
    int8_t contents[];
} intset;

Type upgrade

A type upgrade is triggered when a newly inserted element has a type longer than the integer set's current element type.

Type upgrade process:

  1. Based on the new element's type, extend the space of the integer set's underlying array intset.contents and allocate space for the new element;
  2. Convert the existing elements to the same type as the new element and place them at the correct positions in the array;
  3. Add the new element to the array;

Type upgrades make integer sets more flexible (different integer types can be inserted without type errors), and because an upgrade only happens when needed, memory is saved.

Integer sets do not support type downgrades.
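A sketch of how the encoding needed for a new value is determined and compared with the set's current encoding (illustrative; the real upgrade then re-encodes and repositions the existing elements as described above):

#include <stdint.h>

#define INTSET_ENC_INT16 (sizeof(int16_t))
#define INTSET_ENC_INT32 (sizeof(int32_t))
#define INTSET_ENC_INT64 (sizeof(int64_t))

/* Smallest encoding able to hold `v`. */
static uint8_t intset_value_encoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return INTSET_ENC_INT64;
    if (v < INT16_MIN || v > INT16_MAX) return INTSET_ENC_INT32;
    return INTSET_ENC_INT16;
}

/* An upgrade is needed when the new value does not fit the current encoding. */
static int intset_needs_upgrade(uint32_t current_encoding, int64_t v) {
    return intset_value_encoding(v) > current_encoding;
}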

Compressed list (ziplist)

Characteristics

A compressed list is a sequential data structure developed to save memory.

Structure definition

A compressed list can contain any number of nodes, and each node can hold either a byte array or an integer value.

  1. zlbytes: records the total size of the compressed list in bytes;
  2. zltail: offset of the tail node (zltail locates the tail node directly);
  3. zllen: records the number of nodes in the compressed list;
  4. entry: a compressed list node;
  5. zlend: marks the end of the compressed list;

Each node (entry) consists of:

  1. previous_entry_length: records the number of bytes occupied by the previous node. The previous node's start address can be computed from the current node's start address minus previous_entry_length, enabling traversal from the tail toward the head.
  • If the previous node is < 254 bytes, previous_entry_length takes 1 byte;
  • If the previous node is ≥ 254 bytes, previous_entry_length takes 5 bytes;
  2. encoding: records the data type and length of content;
  3. content: the node's value, which can be a byte array or an integer value;

Cascading updates

The previous_entry_length attribute of each node in a compressed list records the length of the previous node. If the size of a node changes (because of an insertion, deletion, or update), the previous_entry_length of the following node may need to be re-encoded (1 byte → 5 bytes, or 5 bytes → 1 byte). This can trigger a chain reaction in which the previous_entry_length fields of multiple subsequent nodes have to be reallocated.
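A sketch of the previous_entry_length sizing rule that drives cascading updates (hypothetical helper names; 254 is the threshold described above):

#include <stddef.h>

/* Number of bytes needed to store the previous entry's length. */
size_t ziplist_prevlen_size(size_t prev_entry_len) {
    return (prev_entry_len < 254) ? 1 : 5;
}

/* If a previous entry grows past 253 bytes, the current entry's prevlen
 * field must grow from 1 to 5 bytes, which may in turn push the current
 * entry itself past 253 bytes — the cascading update. */
int prevlen_field_must_grow(size_t old_prev_len, size_t new_prev_len) {
    return ziplist_prevlen_size(old_prev_len) == 1 &&
           ziplist_prevlen_size(new_prev_len) == 5;
}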

Basic data types

Redis builds an object system on top of the basic data structures, with five types: string, list, hash, set, and ordered set. The encoding of an object can be chosen according to the usage scenario, which increases Redis's flexibility and efficiency.

typedef struct redisObject {
    // object type (string, list, hash, set, ordered set)
    unsigned type:4;
    // object encoding
    unsigned encoding:4;
    // time the object was last accessed, used to compute its idle time
    unsigned lru:REDIS_LRU_BITS;
    // reference count, used for memory reclamation and object sharing
    int refcount;
    // points to the underlying data structure
    void *ptr;
} robj;

When executing a command, Redis reads redisObject.type to determine whether the object supports that command.

String (string)

Redis strings are mutable and are implemented on top of SDS.

In Redis, floating-point numbers (long double values) are converted to strings before being saved, and converted back to floating-point numbers when read.

Encodings

  1. int: long integer values;
  2. raw: strings longer than 32 bytes;
  3. embstr: strings of at most 32 bytes; read-only;

With raw encoding, the memory allocation function is called twice, once for the redisObject structure and once for the sdshdr structure. With embstr encoding, the memory allocation function is called only once, allocating a single contiguous block that contains both the redisObject and the sdshdr.
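A rough sketch of the embstr idea: one contiguous allocation holds both the object header and the string (simplified layouts and a hypothetical helper name; the real createEmbeddedStringObject differs in details):

#include <stdlib.h>
#include <string.h>

/* Simplified headers, mirroring the structures shown in this article. */
typedef struct robj { unsigned type; unsigned encoding; void *ptr; } robj;
struct sdshdr { int len; int free; char buf[]; };

/* embstr: one malloc holds both the object header and the string. */
robj *create_embstr_object(const char *s, size_t len) {
    robj *o = malloc(sizeof(robj) + sizeof(struct sdshdr) + len + 1);
    struct sdshdr *sh = (struct sdshdr *)(o + 1);  /* sds sits right after robj */
    o->type = 0;       /* placeholder for the string type constant */
    o->encoding = 0;   /* placeholder for the embstr encoding constant */
    o->ptr = sh->buf;
    sh->len = (int)len;
    sh->free = 0;
    memcpy(sh->buf, s, len);
    sh->buf[len] = '\0';
    return o;
}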

Encoding conversion

  1. If an int-encoded string is modified so that it no longer holds an integer value, it is converted from int to raw;
  2. Any modification to an embstr-encoded string converts it to raw (embstr is read-only);

Commands

127.0.0.1:6379> set name1 java	# add
OK
127.0.0.1:6379> mset name2 c name3 go	# batch add
OK
127.0.0.1:6379> keys *	# get all keys
1) "name3"
2) "name2"
3) "name1"
127.0.0.1:6379> mget name1 name3	# batch get
1) "java"
2) "go"
127.0.0.1:6379> set a 1
OK
127.0.0.1:6379> incr a	# increment by 1
(integer) 2
127.0.0.1:6379> incrby a 10	# increment by 10
(integer) 12
127.0.0.1:6379> decr a	# decrement by 1
(integer) 11
127.0.0.1:6379> decrby a 5	# decrement by 5
(integer) 6

List (list)

Encodings

  1. ziplist: implemented with a compressed list, where each node holds one list element;

  2. linkedlist: implemented with a doubly linked list, where each node holds a string object containing one list element;

Encoding conversion

ziplist encoding is used when every string element in the list is shorter than 64 bytes and the list holds fewer than 512 elements; otherwise linkedlist encoding is used.

Commands

127.0.0.1:6379> rpush books java c go python	# add
(integer) 4
127.0.0.1:6379> llen books	# get length
(integer) 4
127.0.0.1:6379> lpop books	# pop from the left (first in, first out)
"java"
127.0.0.1:6379> rpop books	# pop from the right (last in, first out)
"python"
127.0.0.1:6379> lrange books 0 -1	# get all elements
1) "c"
2) "go"
127.0.0.1:6379> lindex books 1	# get the element at index 1
"go"

Hash table (hash)

Encodings

  1. ziplist: implemented with a compressed list; the two nodes of a key-value pair sit next to each other, with the key first and the value after it;

  2. hashtable: implemented with a dictionary;

Encoding conversion

ziplist encoding is used when every key and value string in the hash is shorter than 64 bytes and the hash holds fewer than 512 key-value pairs; otherwise hashtable encoding is used.

Commands

127.0.0.1:6379> hset books book1 java	# add
(integer) 1
127.0.0.1:6379> hmset books book2 go book3 c	# batch add
OK
127.0.0.1:6379> hgetall books	# get all key-value pairs
1) "book1"
2) "java"
3) "book2"
4) "go"
5) "book3"
6) "c"
127.0.0.1:6379> hlen books	# get length
(integer) 3

Set (set)

Encodings

  1. intset: implemented with an integer set;

  2. hashtable: implemented with a dictionary, where each key is a string object holding one set element and the value is NULL;

Encoding conversion

intset encoding is used when all elements of the set are integer values and the set holds fewer than 512 elements; otherwise hashtable encoding is used.

Commands

127.0.0.1:6379> sadd books c	# add
(integer) 1
127.0.0.1:6379> sadd books java go python	# batch add
(integer) 3
127.0.0.1:6379> smembers books	# get all elements (unordered, not in insertion order)
1) "java"
2) "python"
3) "go"
4) "c"
127.0.0.1:6379> sismember books html	# check whether an element exists
(integer) 0
127.0.0.1:6379> sismember books java
(integer) 1
127.0.0.1:6379> scard books	# get length
(integer) 4
127.0.0.1:6379> spop books	# pop an element
"java"

Ordered set (Zset)

Encodings

  1. ziplist: implemented with a compressed list; each set element is stored in two adjacent nodes, the first holding the element and the second holding its score;

  2. skiplist: implemented with a zset structure, which contains both a skip list and a dictionary; the skip list and the dictionary share the element objects through pointers (see the sketch after this list).
  • The skip list stores all elements sorted by score from small to large, enabling range operations on the ordered set;
  • The dictionary maps each member object to its score, so a member's score can be looked up in O(1).
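A sketch of the zset structure described above, reusing the dict and zskiplist types shown earlier:

/* A sorted set keeps the same elements in both structures:
 * the dict maps member -> score for O(1) score lookups, and the
 * skip list keeps members ordered by score for range operations. */
typedef struct zset {
    dict *dict;        /* member -> score mapping */
    zskiplist *zsl;    /* members ordered by score */
} zset;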

Encoding conversion

ziplist encoding is used when every element in the ordered set is shorter than 64 bytes and the set holds fewer than 128 elements; otherwise skiplist encoding is used.

Commands

127.0.0.1:6379> zadd books 5 java	# add
(integer) 1
127.0.0.1:6379> zadd books 2 c
(integer) 1
127.0.0.1:6379> zadd books 6 go
(integer) 1
127.0.0.1:6379> zadd books 9 html
(integer) 1
127.0.0.1:6379> zrange books 0 10	# get elements with rank [0,10] in ascending order
1) "c"
2) "java"
3) "go"
4) "html"
127.0.0.1:6379> zrange books 2 3	# get elements with rank [2,3] in ascending order
1) "go"
2) "html"
127.0.0.1:6379> zrevrange books 0 -1	# get all elements in descending order
1) "html"
2) "go"
3) "java"
4) "c"
127.0.0.1:6379> zcard books	# get length
(integer) 4
127.0.0.1:6379> zscore books java	# get the score of the specified element
"5"
127.0.0.1:6379> zrangebyscore books 3 7	# get elements with score in [3,7]
1) "java"
2) "go"
127.0.0.1:6379> zrem books c	# delete element
(integer) 1
127.0.0.1:6379> zrange books 0 -1	# get all elements
1) "java"
2) "go"
3) "html"


Special data types

Geo

Since Redis 3.2, a Geo module based on geohash and ordered sets has been provided to implement location-based features.

Implementation principle

Geo is implemented on top of geohash and an ordered set: the ordered set stores the location objects, and each object's score is the 52-bit geohash value computed from its longitude and latitude.

Commands

127.0.0.1:6379> geoadd users:locations 120.346111 31.556381 user1 120.375821 31.5603682 user2	# add location information
(integer) 2
127.0.0.1:6379> geopos users:locations user1	# get the location of user1
1) 1) "120.34611314535140991"
   2) "31.55637987511895659"
127.0.0.1:6379> geodist users:locations user1 user2 m	# distance between user1 and user2 (m: in meters)
"2850.3519"
127.0.0.1:6379> georadius users:locations 120.375821 31.556381 5 km withcoord withdist withhash asc count 100	# up to 100 elements within 5 km of the given coordinates (asc: nearest first, desc: farthest first, count 100: return at most 100 elements)
1) 1) "user2"
   2) "0.4433"	# withdist: distance from the target location
   3) (integer) 4054421167795118
   4) 1) "120.37582129240036011"	# withcoord: longitude and latitude of the element
      2) "31.5603669915025975"
2) 1) "user1"
   2) "2.8157"
   3) (integer) 4054421060663027
   4) 1) "120.34611314535140991"
      2) "31.55637987511895659"
127.0.0.1:6379> georadiusbymember users:locations user1 5 km withcoord withdist withhash asc count 100	# up to 100 elements within 5 km of the specified member
1) 1) "user1"
   2) "0.0000"
   3) (integer) 4054421060663027
   4) 1) "120.34611314535140991"
      2) "31.55637987511895659"
2) 1) "user2"
   2) "2.8504"
   3) (integer) 4054421167795118
   4) 1) "120.37582129240036011"
      2) "31.5603669915025975"

Application scenarios

  1. Finding people nearby;

HyperLogLog

HyperLogLog is a probabilistic data structure that provides approximate deduplicated counting (cardinality estimation) with a standard error of 0.81%.

Difference between HyperLogLog and set

  1. A set stores the counted elements themselves, so a large number of elements takes a large amount of memory;
  2. HyperLogLog does not store the elements, only marks of whether elements have been seen, and needs at most 12 KB of memory;

Commands

127.0.0.1:6379> pfadd users1 user1 user3 user5 user7	# add
(integer) 1
127.0.0.1:6379> pfcount users1	# count users1
(integer) 4
127.0.0.1:6379> pfadd users2 user1 user2 user4
(integer) 1
127.0.0.1:6379> pfcount users2
(integer) 3
127.0.0.1:6379> pfmerge users1 users2	# merge users1 and users2 into a new users1
OK
127.0.0.1:6379> pfcount users1
(integer) 6

Application scenarios

  1. Daily/monthly active user statistics: use one HyperLogLog per day with user IDs as identifiers to record daily active users; merging the daily HyperLogLogs of a month gives the monthly active users;
  2. Counting page UV (unique visitors): use the cookie as the identifier, so the same client is counted only once;

Bloom filter

A Bloom filter is a probabilistic data structure: when it says a value exists, the value may not actually exist; when it says a value does not exist, it definitely does not exist.

Implementation principle

A Bloom filter is implemented with a bit array plus multiple hash functions.

  1. Insert: when adding a key to the Bloom filter, each hash function is applied to the key to produce a hash value, and each hash value is taken modulo the bit array's length to get an index (each hash function yields a different index). The bits at those index positions are then set to 1.

  2. Query: when checking whether a key exists, the same indexes are computed. If any of those bits is 0, the key definitely does not exist; if all of them are 1, the key only probably exists, because those bits may have been set to 1 by other keys.
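A minimal sketch of the insert and query logic above, using a fixed-size bit array and k stand-in hash functions (illustrative only; in practice Redis Bloom filters are provided by the RedisBloom module):

#include <stdint.h>

#define BITS  (1 << 20)            /* size of the bit array */
#define K     3                    /* number of hash functions */

static uint8_t bitmap[BITS / 8];

/* A simple seeded string hash (FNV-1a variant) standing in for the
 * k independent hash functions. */
static uint64_t hash_k(const char *key, uint64_t seed) {
    uint64_t h = 1469598103934665603ULL ^ seed;
    while (*key) { h ^= (uint8_t)*key++; h *= 1099511628211ULL; }
    return h;
}

/* Insert: set the bit at each of the k computed positions to 1. */
void bloom_add(const char *key) {
    for (uint64_t i = 0; i < K; i++) {
        uint64_t idx = hash_k(key, i) % BITS;
        bitmap[idx / 8] |= (uint8_t)(1 << (idx % 8));
    }
}

/* Query: if any bit is 0 the key definitely does not exist;
 * if all bits are 1 the key only *probably* exists. */
int bloom_maybe_contains(const char *key) {
    for (uint64_t i = 0; i < K; i++) {
        uint64_t idx = hash_k(key, i) % BITS;
        if (!(bitmap[idx / 8] & (1 << (idx % 8)))) return 0;
    }
    return 1;
}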

Application scenarios

1. Filtering spam; 2. URL deduplication; 3. Preventing cache penetration.

Persistence

Redis has two main persistence modes: RDB persistence and AOF persistence.

RDB persistence

RDB persistence can be triggered manually, or the server can run bgsave periodically according to its configuration.

Implementation principle

RDB persistence saves data at a point in time to an RDB file (snapshot), which is then saved to disk. When the server starts, you can restore the data by loading the RDB file.

When the server starts, it automatically loads the RDB file (if the RDB file exists and AOF is not enabled). The server is blocked while loading.

RDB persistence can be performed using the save/bgsave command.

The save command

Executing the save command blocks the Redis server process until the RDB file has been created; while blocked, the server cannot process any command requests.

The bgsave command

The bgsave command forks a child process of the Redis server, and the child creates the RDB file. When the RDB file is ready, the child sends a signal to the parent; during this time the server process keeps handling command requests and polls for the child's signal.

1. Why does bgsave use a child process instead of a thread? To take advantage of the operating system's copy-on-write mechanism and avoid unnecessary memory copying.

2. Automatic bgsave: the server periodically runs bgsave based on the configured save options (save conditions), which are stored in the redisServer.saveparams array as (seconds, number of changes) pairs. Each time the serverCron() function runs, it checks whether any save condition is satisfied (meeting any one of them is enough) and, if so, executes bgsave. A sketch of this check follows the list below.

  • The number of changes comes from redisServer.dirty (how many modifications have been made to the data since the last successful save/bgsave);
  • The interval comes from redisServer.lastsave (the time of the last successful save/bgsave);
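A sketch of the periodic check, with field names modeled on the saveparams/dirty/lastsave attributes mentioned above (hypothetical struct; the real check lives in serverCron()):

#include <time.h>

struct saveparam { time_t seconds; int changes; };

/* Simplified view of the server state used by the check. */
struct server_state {
    struct saveparam *saveparams;  /* configured "save <seconds> <changes>" rules */
    int numsaveparams;
    long long dirty;               /* changes since the last successful save */
    time_t lastsave;               /* time of the last successful save */
};

/* Return 1 if any save condition is satisfied and bgsave should run. */
int should_bgsave(struct server_state *s, time_t now) {
    for (int i = 0; i < s->numsaveparams; i++) {
        struct saveparam *sp = &s->saveparams[i];
        if (s->dirty >= sp->changes && (now - s->lastsave) > sp->seconds)
            return 1;
    }
    return 0;
}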

RDB file structure

  1. REDIS: the RDB file starts with the 5-byte string "REDIS", which lets the server check whether a file is an RDB file when loading it;
  2. db_version: a 4-byte string-encoded integer recording the RDB file's version number, e.g. "0006" means the sixth version of the RDB format;
  3. databases: the key-value pair data saved by each database;

  • SELECTDB: 1 byte, indicating that a database number follows;
  • db_number: the database number; after reading it, the server executes a select to switch to that database;
  • key_value_pairs: records the type, key, and value of each key; for keys with an expiration time, an expiration flag and the expiration time are stored as well;
  4. EOF: 1 byte, marking the end of the RDB file body;
  5. check_sum: an 8-byte unsigned integer checksum computed over REDIS + db_version + databases + EOF, used to check whether the RDB file is corrupted when it is loaded;

AOF persistence

AOF persistence saves the write commands executed by the server in the Redis command request protocol format (plain text). When the server starts, the data can be restored by re-executing the write commands in the AOF file.

Implementation process

  1. Command appending: after executing a write command, the server appends it to the end of the redisServer.aof_buf buffer (a string);
  2. File writing: each time the serverCron() function runs, the contents of the redisServer.aof_buf buffer are written to the AOF file;
  3. File synchronization: whether the AOF file is flushed to disk is decided by the server's configured policy:
  • always: write everything in the redisServer.aof_buf buffer to the AOF file and flush the AOF file to disk. Slowest, but safest;
  • everysec: the default. Write everything in the redisServer.aof_buf buffer to the AOF file, and if more than 1 second has passed since the AOF file was last flushed to disk, flush it again. Faster, but up to 1 second of command data can be lost on failure;
  • no: write everything in the redisServer.aof_buf buffer to the AOF file and let the operating system decide when to flush it to disk. Fastest, but a failure loses all data written since the last AOF flush;

At the operating-system level, the write() system call only places the written data in a kernel buffer; the data is flushed to disk at some later point.
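A sketch of the three appendfsync policies described above (simplified and with hypothetical names; the real flushAppendOnlyFile also offloads the everysec fsync to a background thread):

#include <time.h>
#include <unistd.h>

enum aof_fsync_policy { AOF_FSYNC_ALWAYS, AOF_FSYNC_EVERYSEC, AOF_FSYNC_NO };

/* Called after the aof_buf contents have been write()ten to the AOF
 * file descriptor: decide whether to fsync() it to disk now. */
void maybe_fsync_aof(int aof_fd, enum aof_fsync_policy policy,
                     time_t *last_fsync, time_t now) {
    switch (policy) {
    case AOF_FSYNC_ALWAYS:           /* flush after every write: safest, slowest */
        fsync(aof_fd);
        *last_fsync = now;
        break;
    case AOF_FSYNC_EVERYSEC:         /* flush at most once per second */
        if (now - *last_fsync >= 1) {
            fsync(aof_fd);
            *last_fsync = now;
        }
        break;
    case AOF_FSYNC_NO:               /* let the OS decide when to flush */
        break;
    }
}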

How AOF files are loaded

When the server starts loading the AOF file, it only needs to execute all commands in the AOF file once to restore the data.

  1. Create a pseudo-client without a network connection (Redis commands can only be executed in a client context);
  2. Execute the commands in the AOF file through the pseudo-client;

AOF rewrite

The longer the server runs, the more content the AOF file accumulates and the larger it grows, and the longer it takes to restore data from it. AOF rewriting solves the problem of AOF file bloat.

What is AOF rewriting?

Redis creates a new AOF file to replace the existing one. The new and old AOF files describe the same data, but the new file contains no redundant commands, so it is usually much smaller than the old one.

AOF rewrite process

AOF rewriting does not read or manipulate the existing AOF file; it works from the server's current data.

  1. The bgrewriteaof command forks a child process of the Redis server; the child reads the server's current data and, for each key-value pair, generates a write command and writes it to the new AOF file;
  2. The server saves every write command executed while the child is rewriting into the AOF rewrite buffer redisServer.aof_rewrite_buf_blocks as well; that is, during an AOF rewrite each write command is appended to both the AOF buffer and the AOF rewrite buffer;
  3. When the child process finishes the AOF rewrite, it sends a signal to the parent process;
  4. The parent receives the signal, appends everything in the AOF rewrite buffer to the new AOF file, then renames the new AOF file to replace the existing one. (The parent is blocked and does not handle client requests while doing this.)

Compare AOF and RDB

  1. RDB stores a snapshot of the data at a point in time: a full backup;
  2. RDB's bgsave command achieves persistence through a child process and the copy-on-write mechanism;
  3. AOF buffers write commands and then writes them to the file: an incremental backup;
  4. AOF is usually updated more frequently than RDB, so if AOF is enabled the server prefers the AOF file when restoring data.