Redis basic data type

String

Can be used to store ints, floats, and strings.

The storage model

Take set Hello World as an example

key

Key is a string, and Redis implements a string type of its own: SDS. Hello is stored in SDS.

Advantages of SDS:

  1. Don’t worry about running out of memory, expand if needed.
  2. The time to get the string length is O(1) because the len attribute is defined.
  3. Space preallocation and lazy space release prevent multiple memory reallocation.
  4. Deciding whether to end is the len attribute, which can contain ‘\0’.

value

The value is stored in the RedisObject object. The value of the five basic data types in Redis is stored in the RedisObject object.

Underlying data structure

You can use the type key to query the storage type of a value

  1. Int: long integer that stores 8 bytes (long,2^63-1)
  2. Embstr: One of the SDS formats that stores strings of less than 44 bytes
  3. Raw: One of the SDS formats that stores strings larger than 44 bytes

Embstr, raw:

  1. Embstr RedisObject and SDS are contiguous and only need to allocate memory space in sequence. Raw RedisObject and SDS are not contiguous, so need to allocate memory space twice.
  2. Embstr is read-only and converted to RAW for storage if modified.

When the length is less than the threshold, whether the encoding will be automatically converted: No, encoding conversion is completed when the data is written, and the conversion process is irreversible, only from small memory encoding to large memory encoding (unless the value is reset).

Hash

Hash stores key-value pairs. The maximum value is 2^32-1

Differences between Hash and String:

  1. Hash aggregates all key-related values into one key to save storage space.
  2. Use only one key to reduce key conflicts.
  3. To obtain data in batches, only one command is required, reducing I/O consumption.

Disadvantages of Hash:

  1. Field cannot set an expiration time independently.
  2. Because fields are clustered together, they cannot be distributed to multiple nodes.

Underlying data structure

Ziplist (Compressed list)

A bidirectional linked list of contiguous memory

When to use Ziplist?

  1. The number of key-value pairs is less than 512.
  2. All key-value pairs have key and value strings of less than 64bytes.

If there is more than one conditional encoding type, it becomes hashTable

Hashtable

Array + linked list structure

As shown in the figure above, there are two hash tables in a value. Ht [0] stores data, but HT [1] is null. Redis uses HT [0] by default,h[1] does not initialize and allocate space. H [1] is used for rehash expansion

Rehash capacity
  1. Allocates memory space for character HT [1].
  2. Rehash all nodes on HT [0] to HT [1], recalculate the hash value and index, and place them in the corresponding position.
  3. When all data migration is complete, clear HT [0], set HT [1] to HT [0], and create a new HT [1].

Conclusion under

  1. The bottom layer of String uses INT,embstr,row
  2. Hash uses ziplist and Hashtable

list

Earlier versions: Ziplist storage is used when data is low, and linkList storage is converted to linkList storage when critical. After version 3.2, QuickList storage is used to store ordered data.

quicklist

Quicklist is a bidirectional list + array structure.

  1. Head: points to the head of a two-way list;
  2. Tail: points to the tail of a bidirectional list.
  3. Count: How many elements are stored in all ziplists;
  4. Len: The length of the bidirectional list, i.e. the number of nodes;

Set

It doesn’t store duplicate, unordered data if all the elements are integers it’s inset, more than 512 it’s hashTable if they’re not integers it’s HashTable

Application scenario: Like, check-in, punch card, commodity label

Zset

Ordered non-repeating sets. Each element has a score attribute, which can be sorted from small to large according to score. If score is the same, it can be sorted according to ASCLL code of key.

Data structure comparison

Underlying data structure

Ziplist is used by default. If the number of elements is greater than 128, or any member length is greater than 64 bytes, it is converted to skiplist+dict.

skiplist

An element is randomly selected as a level element with the next node pointing to the corresponding level

Application scenario: Baidu hot search, weibo hot search

Bitmap

Is a bit operation defined on a string. A byte consists of eight binaries.

Hyperloglogs

An imprecise cardinality statistic used to count the number of non-repeating elements in a set.

Geo

Store location information and have corresponding location operation API

Streams

A multicast, persistent message queue for publish-subscribe functionality.

conclusion

The transaction

Redis provides transaction capabilities that allow a group of commands to be executed together.

Features:

  1. When a transaction is enabled, the command sent by the client is not executed immediately, but placed in a queue.
  2. Execute in the order in which they are queued.
  3. It will not be affected by requests from other clients.
  4. Transactions cannot be nested. Multiple multi commands have the same effect.

Related commands:

  1. Multi (Start transaction)
  2. Exec (executes transactions)
  3. Discard (cancel transaction)
  4. Watch (watch)/unwatch(unwatch) : Monitors one or more keys to prevent them from being modified by other clients. If they are modified, the transaction is cancelled.

Rollback when an exception occurs:

  1. Exception occurred before exec execution: Command exception exists, transaction will not execute, and all commands will not be executed.
  2. Exception after exec execution: If an exception occurs at runtime, the transaction will not be rolled back and the command already executed will take effect. Does not satisfy the definition of atomicity.

The Lua script

Lightweight scripting language, batch command execution, atomicity

Advantages:

  1. Send multiple commands at once.
  2. Redis will execute the script as a whole and will not be affected by other client requests, guaranteedatomic.
  3. For complex combined commands, lua can be used to reuse commands.

Lua script cache

script load “return ‘hello world'”

>evalsha

Script timeout

The script has a default timeout of 5 seconds

Suspend the script

  1. Script kill: No script changes (set,del)
  2. Shutdown nosave: script has changed operation (set,del)

Why is Redis so fast

  1. Pure memory structure, time complexity O(1);
  2. Request single thread;
  3. Synchronous non-blocking I/O: multiplexing mechanism;

Single thread

After version 4.0, Redis introduced other threads to handle other things, such as cleaning up dirty data, useless connections, cleaning up big keys, etc.

Advantages:

  1. There are no costs associated with creating and destroying threads;
  2. Avoid CPU consumption associated with context switching
  3. Avoid contention issues before the thread, such as lock release, deadlock, etc.

Note: Since requests are single-threaded, do not run long commands in production, such as save,keys *,flushall,flushdb, which will block requests

multiplexing

Multiplexing: Multiplexing multiple TCP connections (sockets or channels) : multiplexing one or more threads

Q: What is the difference between SELECT and epoll?

A:

  1. Select uses undifferentiated polling for all FDS, and the number of FDS is limited (1024 32 bits, 2048 64 bits). Epoll is triggered by event listening. The number of FDS is unlimited. Epoll is efficient when the number of FDS is large, and select is better when the number of FDS is small.
  2. Select must be copied from the kernel space to the user space, which consumes a lot of performance. Epoll data is stored in memory shared by kernel space and user space, which is efficient.

Redis memory reclamation mechanism

LRU: deletes the least recently used LFU: deletes the least frequently used LFU. Deletes the LFU based on usage frequency. Random: deletes the LRU based on usage frequency

The LRU Redis

Each object in Redis has an LRU attribute of 24-bit bytes LRU_BITS, which records the last time the current object was accessed. The object is assigned a value when it is created and updated when it is accessed.

The time recorded by this LRU is not the current system time, but the redis server.lruclock global variable (self-timed update time), mainly to improve efficiency.

The LFU Redis

LRU_BITS When used as an LFU, the highest 16 bits record the access time and the lowest 8 bits record the access frequency. There is also a timer that is reduced when the object has not been accessed for a while.

Redis persistence mechanism

RDB

RDB is the default Redis persistence scheme (AOF is preferred when AOF is enabled). When certain conditions are met, data in the current memory is written to disks and a snapshot file dump. RDB is generated. Redis will use dup. RDB to recover data.

Automatic trigger

  1. Configured rules
  2. shutdown
  3. flushall

Manual trigger

  1. Save: generates the current memory snapshot, which blocks Redis. Redis cannot process other commands and blocks for a long time if there is a large amount of data.
  2. Bgsave :fork A child process produces a snapshot and does not record data after the fork. The blocking occurs during the fork phase, half of which is short

advantage

  1. RDB is a very compact file that is ideal for backup and recovery.
  2. When the RDB file is generated, the child process is forked. The main process does not need to do a lot of I/O operations.
  3. RDB recovers data faster than AOF

disadvantage

  1. RDB has no way to do real-time persistence, and BGSave requires forking child processes, making frequent execution expensive.
  2. If Redis fails, the latest unbacked data will be lost

If the data is important, use AOF to back it up.

AOF

It is disabled by default. Append the command for each change to the file. When Redis restarts, each command is executed backwards and forwards.

Mechanism for synchronizing to disk

Rewriting mechanism

To prevent AOF files from getting bigger, call bgrewriteAof

  1. When the AOF file exceeds the number of percent of the previous AOF file to be overwritten.
  2. When the AOF file reaches the minimum file size.

advantage

Aof synchronization frequency is much higher than RDB

disadvantage

  1. Aof files are larger than RDB files
  2. RDB performs better than AOF at high concurrency

Both mechanisms are used together. Redis uses AOF to recover data preferentially because AOF data is more complete.

Redis distributed

A master-slave replication

The primary node reads and writes data, while the secondary node can only read but cannot write data

disadvantages

Fixed data backup and some performance issues, but not high availability (the primary node is down and not available).

Sentinel

Start an odd number of Sentinel monitoring Redis nodes and monitor each other.

The basic process of electing a master: In Sentinel, a leader is elected based on raft protocol. The leader then selects a node to become the master according to the following conditions.

  1. If the node is disconnected from the sentry for too long, the election is directly excluded.
  2. View the priority. The item with a higher priority is selected.
  3. Priority Consistency View the replication offset.
  4. Offset Indicates the process ID. The minimum ID is selected.

Master sends the slaveof no one command to other nodes to make them independent, and then slaveof X.X.X.X.

disadvantages

  1. Data will be lost during the primary/secondary switchover.
  2. The single write function does not solve the problem of horizontal expansion.

shard

ShardingJedis

Consistency hash (hash ring)

Put all of the hash space into a circle, and the whole space goes clockwise, because it’s circular, 0 minus 2^32 minus 1. Put the data on the first node that has a hash value larger than the key.

Advantages:

  1. Consistent hashing solves the problem of redistributing the data of dynamically adding and subtracting nodes. It only affects the next adjacent node, and does not affect other nodes.
  2. However, when the number of nodes is small, the data distribution is not even, but after the introduction of virtual nodes, this problem is solved.
  3. Independent of other middleware, partitioning logic is customizable.

Concrete implementation: the node is put into the red black tree, when accessing the key value, calculate the key hash value, and then from the red black tree to find a larger than the value of the first node

The agent

The shard policy code is extracted into a common service, and all clients are connected to this proxy, which requests and forwards requests.

Twemproxy

Advantages: Stable and high availability Disadvantages: Dependent on other components and cannot be automatically transferred when a fault occurs.

Redis Cluster

Highly available, decentralized, clients can connect to any node. Nodes interact with each other in pairs, sharing information such as data fragments and node status. The client doesn’t care which node the data exists on, just the whole collection.

The data distribution

Redis Cluster does not use hashing or consistent hashing, but virtual slots.

Concrete implementation:

  1. Redis creates 16,384 virtual slots. Each node is responsible for a certain range of slots.
  2. When the object comes in, it first executes the crC16384 module on the key to obtain a slot value, and then drops the data into the redis node responsible for the slot.

Q: How do I get related data to fall on the same node? A: Add {hash tag} to the key. Redis calculates slot numbers by obtaining only the characters between {}.

Data migration

The relationship between keys and slots does not change. When a new node is created, data corresponding to this node slot is also migrated to this node.

High availability and principle of master/slave switchover

When a master hang occurs:

  1. The slave finds that its master is down.
  2. Increment currentEpoch and broadcast failover_request.
  3. When other nodes receive the message, only the master responds to check the validity and returns failover_ACK, sending only one ACK for each epoch.
  4. Initiated slave collection ACK;
  5. More than half become masters;
  6. Broadcast Pong to inform other cluster nodes.

conclusion

  1. No central architecture.
  2. Data is distributed to multiple nodes based on slot, and data is directly shared among nodes to dynamically adjust data distribution.
  3. Scalability, linear expansion to 1000 nodes, nodes can be added and removed dynamically.
  4. High availability, some nodes are unavailable, the cluster is still available. The slave can be added as a data copy of the standby to implement automatic failover.
  5. Reduce o&M costs and improve system scalability and availability.

Redis client

Jedsi

Prior to Spring Boot2. x, Jedsi was used by default

Disadvantages: Multithreading is not secure when using a connection. Solution: Connection pooling, creating a different connection for each request.

Connection pools: JedisPool(common connection pool),ShardedJedisPool(sharded connection pool),JedisSentinelPool(Sentinelpool)

Cluster Connection Principle

When using Jedis to connect to a Cluster, we only need to connect to any one or more redis group instance addresses. To avoid redirection errors during GET and set, we need to save the relationship between slot and Redis node, locally calculate the slot in which the key should be stored, and then obtain the corresponding Redis node.

How do I save the relationship between slot and Redis connection pool?

  1. When the program starts to initialize the cluster environment, it reads the node configuration in the configuration file. No matter how many nodes are in the configuration file, it only takes one and obtains the corresponding Redis instance.
  2. Obtain the virtual slot information corresponding to the Redis connection instance.
  3. Obtain all slot values based on virtual slot information
  4. Get the corresponding master node and build the hostAndPost object.
  5. Get the corresponding JedisPool information from the cache according to hostAndPort. If not, create and save.
  6. Store slot values and JedisPool into Map

    (key is the subscript of slot, value is the connection pool).
    ,jedispool>

When obtaining the slot value, the corresponding slot value is calculated according to the key, the corresponding jedisPool instance is obtained from the map, and the corresponding Jedis instance is obtained according to the jedisPool to complete the access work.

Redis distributed lock implementation

Lock: Use Hash(key lock name, value(thread Id, expiration time, wait time to acquire lock, reentrant times) to store the thread Id, expiration time, wait time to acquire lock. Check whether the key of the corresponding thread Id exists. If it does not, the lock is created directly; if it does, it checks whether it is a thread Id; if it is, the thread Id is reentrant; if it is not, the thread Id is waiting.

Release lock: Checks whether the thread Id of the key is the same as the current one. If the thread Id is the same, the reentrant count is reduced by one to 0 and the lock is released (DEL). If not, it doesn’t matter.

Lua scripts are used to implement lock adding and lock releasing to prevent unsafe problems in high concurrency scenarios.

Pipleline

The client needs to send multiple commands in batches. These commands are cached. When the size of these commands exceeds 8 meters, they are sent together. The server executes one by one and returns the results together.

Some scenarios, such as batch writing of data, have low requirements for real-time and success of results. Pipelines can be used.

Lettuce

Springboot 2.x Default client. No thread safety issues, Netty based, support all functions.

Redission

Provides a distributed lock API, implementation ideas as above. And has a watchdog function (periodic key renewal).

Data consistency

There are no transactions between Redis and mysql. When cached data needs to be modified, should the database or cache be modified first?

Update the database first, then delete the cache

Exception: The database is updated successfully but the cache fails to be deleted. Solution: Provide a retry mechanism

  1. If the cache fails to be deleted, the key to be deleted is sent to the message queue for deletion by the consumer. Disadvantages: Can cause code intrusion.
  2. Because database updates are written to binglog, we listen for binlog changes through a service (such as Ali’s Canal) and then delete the key on the client side. If the deletion fails, it is sent to the message queue.

Delete the cache first, then update the database

Exception: ABA problem occurs in concurrent condition. When USER A deletes the cache and does not update the database, user B queries the database again and writes the value to the cache. Solution: Delete the key again after data is written.

High concurrency problem

How do I find hot data

The client

Number of key counts

Question:

  1. Intrusion into client code
  2. You don’t know how many keys are stored, which could cause a memory leak
  3. Only the hotspot key of the current client can be collected

To create a statistics server, the client sends key information to the server to collect statistics.

The service side

Redis has a monitor command that monitors all commands executed by Redis.

Question:

  1. High concurrency has performance issues and is not suitable for long-term use
  2. You can only listen on one Redis node

The machine level

TCP packet capture.

Cache avalanche

A large amount of hot data expires at the same time due to the same expiration time, and the concurrency is high, causing all requests to fall into mysql.

Solution:

  1. Add mutex or use queues to allow only one thread to query the database for the same key.
  2. The cache is regularly updated in advance to avoid simultaneous invalidation.
  3. Random number is added to avoid simultaneous invalidation.
  4. No expiration time is set.

The cache to penetrate

Frequently query nonexistent data.

  1. The problem with caching non-existent values in Redis is that if it is a nasty attack, it will query different values every time. If it is stored in Redis, it will cause Redis to store a lot of junk data.
  2. Bloom filter. When the project starts, the data will be stored in the Bloom filter. When querying redIS, the bloom filter will be checked first to determine whether there is any existence. If not, the redis and database will not be queried.

Bloom filter

An array of 0,1, and gano doing a hash function.

How to quickly determine the existence of an element in a large amount of data (1 billion, disordered, non-repetitive, variable length)?

The principle of

If you have three hash functions, hash each element three times to get three positions, and mark the three positions as 1.

To determine whether an element e exists, suppose you have three hash functions. If you hash e three times to the length of the % array, you get three positions. If the three positions are all 1’s, then the element e probably exists, otherwise it certainly does not exist.

conclusion

  1. If the bloem filter determines that the element does not exist, it definitely does not exist.
  2. If a Bloom filter exists, it may exist

Disadvantage: No deletion, if an element is deleted in the database, there is no way to delete it in the Bloom filter. Workaround: Add a counter to the subscript of each location, which is 2 if it hits twice and -1 if it drops. The cuckoo filter already does this.