The second part is the realization of single machine database

The database


Database on the server

  • All databases of the Redis server are stored in the redisServer.db array, and the number of databases is stored using the redisServer.dbnum property

Switching databases

  • The client switches between different databases by modifying the target database pointer to point to different elements in the redisServer.db array

Database key space

  • The database consists of two dictionary fields, dict, which holds key-value pairs, and Expires, which holds the expiration date of keys
  • Because databases consist of dictionaries, operations on databases are based on operations on dictionaries
  • The database key is always a string object, and the value can be any Redis object type, including string object, hash table object, collection object, list object, ordered collection object.

Set the expiration time of the key

  • The expires dictionary key points to a key in the database, and the value records the expiration time of the database key, a UNIX timestamp in milliseconds

Expiration key Deletion policy

  • Three different deletion strategies

    • Time to delete
      • When setting a key, create a timer to delete the key when the timer expires
      • Advantages: Memory friendly, can quickly release the memory occupied by expired keys
      • Disadvantages: Not time friendly, if there are many expired keys, it can take up a lot of CPU time, affecting server response time and throughput
    • Lazy to delete
      • Ignore the expired key, but each time a key is retrieved from the key space, the retrieved key is checked to see if it is expired, and if so, the key is deleted. If not, the key is returned
      • Advantages: Time – friendly, only when the expiration key is removed, the key will be deleted
      • Disadvantages: space unfriendly, large number of expired useless keys occupy memory, by the risk of memory leaks
    • Periodically delete
      • Every period of time, the program checks the database and deletes expired keys. The algorithm decides how many expired keys to delete and how many databases to check
      • Advantages: A compromise between the two strategies. Memory friendly, space friendly
      • The key is how do you decide how often and how often to perform a delete operation
  • Redis expiration key deletion policy

    • Redis uses periodic delete + lazy delete to ensure that the expiration key will be deleted. Reasonable use of CPU time and avoid memory space waste

    • Lazy delete: An expiration check is performed on the entered key before the command is executed

    • Periodic deletion: In a specified period of time, iterate multiple databases in the server, randomly check the expiration time of some keys from the expires dictionary in the database, and delete the expired keys.

Processing of expired keys by AOF, RDB, and replication functions

  • New RDB files generated by executing the SAVE command or BGSAVE command do not contain expired keys
  • The rewrite AOF file produced by executing the BGREWRITEAOF command does not contain expired keys
    • When an expired key is removed, the server appends a DEL command to the end of the existing AOF file to explicitly remove the expired key
  • When loading an RDB or AOF file, the saved keys are checked and expired keys are ignored.
  • The secondary server does not delete expired keys, but waits for the DEL command from the primary node. This unified and centralized deletion policy ensures data consistency between the primary and secondary servers.

Database notification

  • When the Redis command changes the database, the server sends database notifications to the client based on the configuration (PUB/SUB)
    • Key space notification: what command was executed for a key (SET/EXPIRE/DEL)
  • Key event notification: which keys executed a command (KEY1 / KEY2 / KEY3)

RDB persistence


Preknowledge: processes and child processes

As you can see, the code areas of the child and parent are shared, while the data areas and PCB blocks are copies of the parent.

The PID field in the child PCB is the NEW assigned sub-process PID, and the data set field is the data set address.

Parent and child processes can be executed in parallel. Don’t interfere.

RDB file creation and loading

  • RDB persistence records the state of the database by saving key-value pairs in the database, generating compressed binaries.

  • The creation process

    • The SAVE command blocks the server because the SAVE operation is performed directly by the server process
    • BGSAVE is performed by the child process, so the command does not block the server
  • The loading process

    • If the server has AOF persistence enabled, the server will use AOF files to restore the database state in preference
    • If AOF is closed, the server will use the RDB file to restore the database state (the former will lose less data).

Automatic interval saving

  • All save conditions set with the save option are saved in the server state. When any of the save conditions are met, the server automatically executes the BGSAVE command.

    #redis.conf format: Save time change times save 900 1 (change time within 900 seconds) save 300 10 save 60 10000 (change time within 60 seconds)Copy the code
    struct redisServer{
        struct saveparam *saveparams;	// Record the data of the save condition
        long long dirty;	// Modify the counter
        time_t lastsave;	// The last time the save was performed
    }
    Copy the code

RDB file structure

  • For different types of key-value pairs, RDB files use different ways to store them

AOF persistence


AOF (Append Only File) persistence implementation

  • RDB persistence records database state differences by saving key-value pairs in the database

  • AOF persistence records the state of the database by saving write commands executed by the Redis server

  • All commands in the AOF file are saved in the Redis command request protocol format (text protocol)

  • Command requests are stored in the AOF buffer and then periodically written and synchronized to the AOF file

    • Data is written to the buffer first because of a mismatch between memory and disk input/output speeds. The system providesfsync, fdatasyncTwo synchronization functions (system calls) that allow the operating system to immediately write buffer data to disk, reducing the impact of buffer loss due to downtime
  • An invalid appendfsync option has a significant impact on the security of AOF persistence and the performance of the Redis server

    • Always: Each event loop writes the contents of the AOF_buf buffer to a synchronized AOF file
    • Everysec (default) : After each event loop, determine if the last AOF was 1S, if so, write the contents of the AOF_buf buffer to the AOF file. So even if the outage fails, only 1S of data is lost in the cache.
    • No: It is up to the operating system when to synchronize buffer contents to AOF files

AOF file loading and data restoration

  • The server simply loads and re-executes the commands saved in the AOF file (using a pseudo-client) to restore the database to its original state.

AOF rewrite

  • In order to solve the problem of AOF volume expansion, AOF rewriting mechanism is provided. AOF overwriting can create a new AOF file that holds the same database state as the original AOF file, but is smaller
  • AOF override is an ambiguous name that requires the program to do nothing to load, analyze, or write existing AOF files. It is done by reading key-value pairs in the database.
  • The AOF rewrite is executed in a child process, at which point the server process can continue processing command requests
  • The child process has a copy of the server process data (data consistency issue), so if a new write request changes the database state during the rewrite, the current database state will be inconsistent with the rewritten AOF file state.
  • When the BGREWRITEAOF command is executed, the Redis server maintains an AOF rewrite buffer that records all write commands executed by the server while the child process creates a new AOF file. When the child process finishes creating a new AOF file, the server appends all the contents of the rewrite buffer to the end of the new AOF file, making the state of the data stored in the old and new AOF files consistent. The AOF file rewrite is then completed by replacing the old AOF file with the new one
    • The execution of the server during AOF rewrite
      • Execute client commands
      • Append the executed write command to the AOF buffer (keep the old AOF file intact)
      • Append the executed write command to the AOF rewrite buffer (to resolve data inconsistencies)

The event

Redis server is an event driver. The events handled by the server are classified into file events and time events

File events

  • File event handlers are based onReactor modelImplementation of network communication procedures
  • File event handlerIO multiplexingProgram to listen for multiple sockets at the same time. And associate the socket with different tasks based on what the socket is currently performingEvent handler
  • When the socket being listened to is ready to accept, read, write, or close, file events corresponding to the operation are generated, and the file event handler calls the event handler associated with the socket to handle these events.
  • File events are abstractions of socket operations that are generated each time a socket becomes acceptable, writeable, or reable
  • File events are classified into AE_READABLE events (read events) and AE_WRITEABLE events (write events)

  • A complete example of a client-server connection event

    • When the Redis server runs, associate the connection reply handler with the AE_READABLE event

    • When the Redis client initiates a connection, the listening socket will generate an AE_READABLE event that triggers the connection reply handler to execute. Handles establishing a connection to the client and associating the AE_READABLE event for the client socket with the command request handler

    • When a client makes a request to Redis, the client socket will generate AE_READABLE events, which will then be processed by the corresponding command request handler. Read the client command content, and pass to the corresponding program to execute.

    • When Redis is ready to give the client response data, the server associates the AE_WRITEABLE event with the command reply handler. When the client is ready to attempt to read the response data, the client socket generates an AE_WRITEABLE event that triggers the command reply processor to perform processing and return the prepared data to the client. When the reply is complete, the server disassociates the AE_WRITABLE event of the client socket from the command reply handler.

Time event

  • Time events are divided into periodic events and periodic events. Timed events arrive only once at a specified time, while periodic events arrive at intervals.

  • The server normally only executes the serverCorn function as a time event, and this is periodic (100ms at a time)

    The event implementation has three attributes: ID: time event global ID, when: event arrival time timeProc: event handler and an unordered linked list of event nodesCopy the code

Scheduling and execution of events

  • The relationship between file events and time events is cooperative, and the server takes turns handling both events without preemption.

  • The actual processing event of the time event usually arrives later than the set (because there is no way to interrupt the file event)

  • ServerCron is the primary Redis function for periodic events. Its work mainly includes

    • Update server statistics such as time and memory usage
    • Clean up database stale key-value pairs
    • Try AOF and RDB operations and so on

The client

  • The server state structure uses the Clients list to indicate how many client states are connected, and the newly added client states are placed at the end of the chain

  • The client state Flags property uses different flags to indicate the role of the client and the current state of the client

  • The input buffer records command requests sent by the client and is no larger than 1GB

  • The client uses argv and argc attributes to record the parameters and number of a command, while the CMD attribute records the implementation function of the command to be executed by the client

  • The maximum size of a fixed-size buffer is 16KB, while the maximum size of a variable-size buffer (consisting of multiple buffers linked by a linked list) cannot exceed a hard limit set by the server

  • There are two types of output buffer limits. If the output buffer size exceeds the hard limit set by the server, the client is shut down immediately. Besides; If the client consistently exceeds the soft limit set by the server for a certain period of time, the client will also be shut down.

    # set hard, Soft link command name client role hard link soft link soft link length client-output-buffer-limit normal 0 0 0 client-output-buffer-limit slave 256mb 64mb 60 client-output-buffer-limit pubsub 32mb 8mb 60Copy the code
  • The client is shut down because the network connection is closed. A command request was sent in case of a failure; Become the CLIENT KILL target; Idle time timeout; The size of the output buffer exceeds the limit.

The service side

  • The steps a command request goes through from send to completion:
    • The client sends the command request to the server
    • The server reads the command request and parses the command parameters
    • The command executor looks for the implementation function of the command based on the arguments, then executes the implementation function and returns the command reply
      • Perform preparatory actions, such as checking the format of a command; Whether the memory is enough; Whether the command is valid at this time; Check whether the transaction is enabled
      • Call the command implementation function
      • Perform subsequent operations: Change statistics, such as time spent; If YOU have AOF on you have to write to the buffer; If it is master, then data is also synchronized to the slave server
    • The server returns the command reply to the client
  • ServerCron function (executed every 100ms to maintain server-related resources and do statistics)
    • Update the server time cache
    • Update the LRU clock (idle time = LRU clock – the last time a key was accessed)
    • Update the number of commands executed by the server per second (throughput statistics); Update memory peak
    • Processing SIGTERM signal (interrupt signal)
    • Managing database resources (Check expiration key)
    • Write the contents of the AOF buffer to the AOF (checked each time the event loop occurs)
  • The steps the server goes through from being started to being able to handle client requests
    • Initialize the server state
    • Loading the server configuration
    • Initialize the server data structure
    • Restoring database State
    • Execute event loop