Redis basic data types
String
Can be used to store ints, floats, and strings.
The storage model
Take set hello world (key hello, value world) as an example.
key
The key is a string. Redis implements its own string type, SDS (Simple Dynamic String); the key hello is stored as an SDS.
Advantages of SDS:
- No need to worry about buffer overflows: the SDS expands automatically when more space is needed.
- Getting the string length is O(1) because a len attribute is maintained.
- Space preallocation and lazy space release reduce the number of memory reallocations.
- The end of the string is determined by the len attribute rather than a terminator, so the content can safely contain '\0' (binary safe).
value
The value is wrapped in a redisObject; the values of all five basic data types in Redis are stored this way.
Underlying data structure
You can check a value's type with type key, and its underlying encoding with object encoding key:
- int: the value can be represented as a 64-bit signed integer (8-byte long, up to 2^63 - 1).
- embstr: an SDS-based encoding for strings of no more than 44 bytes.
- raw: an SDS-based encoding for strings longer than 44 bytes.
Embstr, raw:
- With embstr, the redisObject and the SDS are stored contiguously and need only a single memory allocation. With raw, they are not contiguous, so two allocations are needed.
- embstr is read-only; if the value is modified, it is converted to raw.
Does the encoding convert back automatically when the length drops below the threshold again? No: the encoding is decided when the data is written, and the conversion is one-way, only from the smaller-memory encoding to the larger one (unless the value is set again).
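A quick way to observe these encodings from a client is sketched below with Jedis, assuming a local instance at 127.0.0.1:6379; class and key names are illustrative only.

```java
import redis.clients.jedis.Jedis;

public class StringEncodingDemo {
    public static void main(String[] args) {
        // Assumes a local Redis at 127.0.0.1:6379; key names are made up.
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            jedis.set("counter", "123");
            System.out.println(jedis.objectEncoding("counter"));   // int

            jedis.set("greeting", "hello");
            System.out.println(jedis.objectEncoding("greeting"));  // embstr (<= 44 bytes)

            jedis.set("article", "x".repeat(100));
            System.out.println(jedis.objectEncoding("article"));   // raw (> 44 bytes)
        }
    }
}
```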
Hash
Hash stores field-value pairs; a single hash can hold up to 2^32 - 1 of them.
Differences between Hash and String:
- Hash aggregates all key-related values into one key to save storage space.
- Use only one key to reduce key conflicts.
- To obtain data in batches, only one command is required, reducing I/O consumption.
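For illustration, here is a minimal Jedis sketch of the "fetch everything in one command" point; the instance address and key/field names are assumptions.

```java
import java.util.Map;
import redis.clients.jedis.Jedis;

public class HashDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {   // hypothetical local instance
            // Aggregate all fields of one object under a single key.
            jedis.hset("user:1000", "name", "Alice");
            jedis.hset("user:1000", "age", "30");

            // One command, one round trip, returns every field-value pair.
            Map<String, String> user = jedis.hgetAll("user:1000");
            System.out.println(user);   // {name=Alice, age=30}
        }
    }
}
```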
Disadvantages of Hash:
- A field cannot have its own expiration time.
- Because fields are clustered together, they cannot be distributed to multiple nodes.
Underlying data structure
Ziplist (Compressed list)
A sequential structure stored in a single block of contiguous memory that can be traversed in both directions, similar to a doubly linked list but without pointers.
When to use Ziplist?
- The number of key-value pairs is less than 512.
- Every key and value string is shorter than 64 bytes.
If either condition is exceeded, the encoding is converted to hashtable.
Hashtable
Array + linked list structure
A dict contains two hash tables: ht[0] stores the data, while ht[1] is null. Redis uses ht[0] by default; ht[1] is not initialized and has no space allocated, and is only used during rehash expansion.
Rehash (expansion) steps, illustrated by the sketch after this list:
- Allocate memory space for ht[1].
- Rehash every node from ht[0] to ht[1]: recalculate each hash value and index and place the node in its new position.
- When all data has been migrated, clear ht[0], make ht[1] the new ht[0], and create an empty ht[1].
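A minimal sketch of these steps in Java (not Redis source code, and done in one pass rather than Redis's incremental rehash; the load threshold is made up):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a tiny dictionary that expands by rehashing ht[0] into ht[1].
public class DictRehashSketch {
    private Map<String, String> ht0 = new HashMap<>(); // ht[0]: serves reads and writes
    private Map<String, String> ht1 = null;            // ht[1]: only allocated for rehash

    public void put(String key, String value) {
        ht0.put(key, value);
        if (ht0.size() > 4) {                  // hypothetical threshold that triggers expansion
            rehash();
        }
    }

    private void rehash() {
        ht1 = new HashMap<>(ht0.size() * 2);   // 1. allocate space for ht[1]
        ht1.putAll(ht0);                       // 2. re-insert every node (hash/index recomputed)
        ht0.clear();                           // 3. clear ht[0] ...
        ht0 = ht1;                             //    ... promote ht[1] to ht[0]
        ht1 = null;                            //    ... and reset ht[1]
    }
}
```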
Conclusion
- Under the hood, String uses the int, embstr, and raw encodings.
- Hash uses ziplist and hashtable.
List
In earlier versions, a ziplist was used when the amount of data was small and converted to a linkedlist once a threshold was crossed. Since version 3.2, quicklist is used to store this ordered data.
quicklist
Quicklist is a doubly linked list whose nodes are ziplists (linked list + array). Its main fields:
- Head: points to the head of the doubly linked list.
- Tail: points to the tail of the doubly linked list.
- Count: the total number of elements stored across all ziplists.
- Len: the length of the doubly linked list, i.e. the number of nodes.
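For completeness, basic List usage with Jedis, sketched below; the instance address and key name are assumptions.

```java
import java.util.List;
import redis.clients.jedis.Jedis;

public class ListDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {   // hypothetical local instance
            jedis.rpush("timeline", "msg1", "msg2");          // append to the tail
            jedis.lpush("timeline", "msg0");                  // push to the head

            List<String> all = jedis.lrange("timeline", 0, -1);
            System.out.println(all);                          // [msg0, msg1, msg2]
        }
    }
}
```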
Set
Stores unordered, non-duplicate data. If every element is an integer and there are no more than 512 elements, the intset encoding is used; if there are more than 512 elements, or any element is not an integer, hashtable is used.
Application scenarios: likes, sign-ins/check-ins, product tags.
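A sketch of the "like" scenario with Jedis; the instance address and key names are assumptions.

```java
import redis.clients.jedis.Jedis;

public class LikeDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {   // hypothetical local instance
            jedis.sadd("like:post:1", "user1", "user2");
            jedis.sadd("like:post:1", "user1");               // duplicate, ignored

            System.out.println(jedis.scard("like:post:1"));               // 2 likes in total
            System.out.println(jedis.sismember("like:post:1", "user1"));  // true: already liked
        }
    }
}
```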
Zset
An ordered set without duplicates. Each element has a score attribute, and elements are sorted by score from small to large; if scores are equal, elements are sorted by the ASCII order of the member.
Data structure comparison
Underlying data structure
Ziplist is used by default. If the number of elements is greater than 128, or any member length is greater than 64 bytes, it is converted to skiplist+dict.
skiplist
A skip list is a multi-level linked list: each element is randomly assigned a level, and nodes at that level hold forward pointers to the next node of the same level, so lookups can skip ahead instead of walking every node.
Application scenarios: hot-search rankings (e.g. Baidu or Weibo trending lists)
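A sketch of such a trending ranking with Jedis; scores and key names are made up.

```java
import redis.clients.jedis.Jedis;

public class HotSearchDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {   // hypothetical local instance
            jedis.zadd("hot:search", 100, "topicA");
            jedis.zincrby("hot:search", 50, "topicB");        // bump a topic's score
            jedis.zadd("hot:search", 300, "topicC");

            // Top 10 topics, highest score first.
            System.out.println(jedis.zrevrange("hot:search", 0, 9));
        }
    }
}
```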
Bitmap
A set of bit operations defined on the String type; each byte consists of eight bits.
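A sketch of a daily sign-in record using bitmap commands via Jedis; the key name and user offsets are assumptions.

```java
import redis.clients.jedis.Jedis;

public class SignInDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {   // hypothetical local instance
            jedis.setbit("sign:2024-06-01", 5, true);         // user at offset 5 signed in
            jedis.setbit("sign:2024-06-01", 9, true);

            System.out.println(jedis.getbit("sign:2024-06-01", 5));   // true
            System.out.println(jedis.bitcount("sign:2024-06-01"));    // 2 users signed in
        }
    }
}
```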
Hyperloglogs
An approximate cardinality counter used to estimate the number of distinct elements in a set.
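A sketch of counting unique visitors with HyperLogLog via Jedis; the key name is an assumption.

```java
import redis.clients.jedis.Jedis;

public class UvDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {   // hypothetical local instance
            jedis.pfadd("uv:2024-06-01", "user1", "user2", "user1");  // duplicates do not count
            System.out.println(jedis.pfcount("uv:2024-06-01"));       // ~2 (approximate)
        }
    }
}
```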
Geo
Stores geographic coordinates and provides APIs for location operations (distance, radius queries, and so on).
Streams
A multicast, persistent message queue for publish-subscribe functionality.
conclusion
The transaction
Redis provides transaction capabilities that allow a group of commands to be executed together.
Features:
When a transaction is enabled, the command sent by the client is not executed immediately, but placed in a queue.
- Execute in the order in which they are queued.
- It will not be affected by requests from other clients.
- Transactions cannot be nested; issuing MULTI multiple times has the same effect as issuing it once.
Related commands:
- MULTI: starts a transaction.
- EXEC: executes the transaction.
- DISCARD: cancels the transaction.
- WATCH / UNWATCH: monitors one or more keys; if a watched key is modified by another client before EXEC, the transaction is aborted.
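A sketch of MULTI/EXEC with WATCH using Jedis; the keys and amounts are illustrative.

```java
import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class TransactionDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {   // hypothetical local instance
            jedis.watch("balance");                // abort if another client changes "balance"

            Transaction tx = jedis.multi();        // MULTI: start queuing commands
            tx.decrBy("balance", 10);
            tx.incrBy("debt", 10);

            List<Object> results = tx.exec();      // EXEC: null if the watched key was modified
            System.out.println(results == null ? "aborted" : results);
        }
    }
}
```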
Rollback when an exception occurs:
- Error before EXEC (e.g. a command fails while being queued): the transaction is discarded and none of the commands run.
- Error during EXEC (a runtime error): the transaction is not rolled back; the commands that already ran keep their effect, so Redis transactions do not satisfy the strict definition of atomicity.
The Lua script
A lightweight scripting language; a script executes a batch of commands atomically.
Advantages:
- Send multiple commands at once.
- Redis executes the script as a whole, unaffected by other client requests, so atomicity is guaranteed.
- Complex command combinations can be written once as a Lua script and reused.
Lua script cache
script load "return 'hello world'" returns the script's SHA1 digest.
evalsha <sha1> 0 then executes the cached script by its digest.
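The same flow from Jedis, sketched here; the script body and key are illustrative.

```java
import java.util.Collections;
import redis.clients.jedis.Jedis;

public class LuaCacheDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {   // hypothetical local instance
            String script = "return redis.call('set', KEYS[1], ARGV[1])";
            String sha = jedis.scriptLoad(script);            // SCRIPT LOAD -> SHA1 digest

            Object reply = jedis.evalsha(sha,                 // EVALSHA runs the cached script
                    Collections.singletonList("greeting"),
                    Collections.singletonList("hello world"));
            System.out.println(reply);                        // OK
        }
    }
}
```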
Script timeout
The script has a default timeout of 5 seconds
Stopping a timed-out script
- SCRIPT KILL: only works if the script has not yet performed any write operation (e.g. SET, DEL).
- SHUTDOWN NOSAVE: required if the script has already performed write operations.
Why is Redis so fast
- Pure in-memory operations, with O(1) time complexity for most accesses;
- Requests are processed by a single thread;
- Synchronous non-blocking I/O with a multiplexing mechanism.
Single thread
Since version 4.0, Redis has introduced additional background threads for other work, such as cleaning up dirty data, closing useless connections, and deleting big keys.
Advantages:
- There are no costs associated with creating and destroying threads;
- Avoid CPU consumption associated with context switching
- Avoid contention issues between threads, such as lock waiting and deadlock.
Note: since requests are single-threaded, do not run long commands in production, such as SAVE, KEYS *, FLUSHALL, or FLUSHDB; they block all other requests.
multiplexing
Multiplexing: multiple TCP connections (sockets or channels) are multiplexed onto one thread (or a small number of threads).
Q: What is the difference between SELECT and epoll?
A:
- select polls all FDs indiscriminately, and the number of FDs is limited (1024 on 32-bit systems, 2048 on 64-bit). epoll is driven by event notifications and has no FD limit. epoll is more efficient when there are many FDs; select can be preferable when there are only a few.
- select copies the FD set between kernel space and user space on every call, which is expensive. epoll keeps its data in memory shared between kernel space and user space, which is more efficient.
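For illustration only, here is the multiplexing idea expressed with Java NIO (whose Selector is backed by epoll on Linux). This is not Redis's C event loop, and the port is arbitrary.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class MultiplexingSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9999));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {                     // a single thread watches every connection
            selector.select();             // blocks until at least one channel is ready
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    int n = client.read(ByteBuffer.allocate(1024));   // handle the request
                    if (n == -1) {
                        client.close();                               // peer closed the connection
                    }
                }
            }
            selector.selectedKeys().clear();
        }
    }
}
```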
Redis memory reclamation mechanism
- LRU: evicts the least recently used keys.
- LFU: evicts the least frequently used keys, based on access frequency.
- Random: evicts keys at random.
LRU in Redis
Each Redis object has a 24-bit lru field (LRU_BITS) that records when the object was last accessed; it is set when the object is created and updated on every access.
The time it records is not the current system time but the global server.lruclock variable, which Redis updates periodically on its own, mainly for efficiency.
LFU in Redis
When LRU_BITS is used for LFU, the high 16 bits record the last access time and the low 8 bits record the access frequency. A decay mechanism lowers the counter when the object has not been accessed for a while.
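For contrast, a classic exact LRU cache can be sketched in a few lines of Java. This only illustrates the eviction idea; it is not how Redis implements its approximate, clock-based LRU.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Exact LRU cache built on LinkedHashMap; capacity is illustrative.
public class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCacheSketch(int capacity) {
        super(16, 0.75f, true);            // accessOrder = true: reads move entries to the tail
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;          // evict the least recently used entry
    }
}
```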
Redis persistence mechanism
RDB
RDB is the default Redis persistence scheme (AOF takes precedence when AOF is enabled). When certain conditions are met, the data currently in memory is written to disk as a snapshot file, dump.rdb, and Redis uses dump.rdb to recover data.
Automatic trigger
- Configured rules
- shutdown
- flushall
Manual trigger
- SAVE: generates a snapshot of the current memory synchronously, blocking Redis; with a large dataset the block can be long, and Redis cannot process other commands in the meantime.
- BGSAVE: forks a child process to produce the snapshot; data written after the fork is not included. Blocking happens only during the fork itself, which is usually brief.
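Both can also be triggered from a client; a minimal Jedis sketch, assuming a local instance, where BGSAVE is what you would normally use because SAVE blocks the server.

```java
import redis.clients.jedis.Jedis;

public class SnapshotDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {   // hypothetical local instance
            System.out.println(jedis.bgsave());   // forks a child process; only the fork blocks
            // jedis.save();                      // synchronous snapshot: blocks the whole server
        }
    }
}
```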
advantage
- RDB is a very compact file that is ideal for backup and recovery.
- When the RDB file is generated, the child process is forked. The main process does not need to do a lot of I/O operations.
- RDB recovers data faster than AOF
disadvantage
- RDB has no way to do real-time persistence, and BGSave requires forking child processes, making frequent execution expensive.
- If Redis crashes, the data written since the last snapshot is lost.
If the data is important, use AOF to back it up.
AOF
Disabled by default. Every write command is appended to the file; when Redis restarts, the commands are replayed from start to finish to rebuild the data.
Mechanism for synchronizing to disk (appendfsync):
- always: fsync after every write command; safest but slowest.
- everysec: fsync once per second (the default); at most about one second of data can be lost.
- no: leave syncing to the operating system; fastest but least safe.
Rewriting mechanism
To prevent the AOF file from growing indefinitely, Redis rewrites it with BGREWRITEAOF when both conditions are met:
- The AOF file has grown by the configured percentage relative to its size after the last rewrite (auto-aof-rewrite-percentage).
- The AOF file has reached the configured minimum size (auto-aof-rewrite-min-size).
advantage
AOF syncs far more frequently than RDB, so less data is lost.
disadvantage
- Aof files are larger than RDB files
- RDB performs better than AOF at high concurrency
When both mechanisms are enabled, Redis prefers AOF for data recovery because the AOF data is more complete.
Redis distributed
Master-slave replication
The primary node reads and writes data, while the secondary node can only read but cannot write data
disadvantages
It solves data backup and part of the performance problem, but not high availability: if the master goes down, writes are no longer possible.
Sentinel
Start an odd number of Sentinel instances to monitor the Redis nodes; the Sentinels also monitor each other.
The basic failover process: the Sentinels elect a leader using the Raft protocol; the leader then chooses a slave to promote to master according to the following conditions:
- A slave that has been disconnected from the Sentinel for too long is excluded from the election.
- Compare the replica priority; the slave with the higher priority is chosen.
- If priorities are equal, compare the replication offset; the slave with the largest (most up-to-date) offset is chosen.
- If offsets are equal, compare the run ID; the slave with the smallest run ID is chosen.
The chosen slave is sent SLAVEOF NO ONE to promote it to master, and the remaining slaves are sent SLAVEOF <new-master-ip> <port> so they replicate from the new master.
disadvantages
- Data will be lost during the primary/secondary switchover.
- Writes still go through a single master, so horizontal scaling of writes is not solved.
Sharding
ShardedJedis (client-side sharding)
Consistent hashing (hash ring)
Map the whole hash space, 0 to 2^32 - 1, onto a ring traversed clockwise. Each piece of data is placed on the first node whose hash value is greater than or equal to the key's hash (wrapping around the ring if necessary).
Advantages:
- Consistent hashing solves the problem of redistributing the data of dynamically adding and subtracting nodes. It only affects the next adjacent node, and does not affect other nodes.
- When the number of nodes is small the data distribution is uneven, but introducing virtual nodes solves this problem.
- Independent of other middleware, partitioning logic is customizable.
Concrete implementation: the nodes are placed in a red-black tree (for example a TreeMap). To locate a key, compute the key's hash and find the first node in the tree whose position is greater than or equal to that hash, as in the sketch below.
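A minimal sketch of that lookup with a TreeMap (Java's red-black tree); the hash function and node names are simplified stand-ins, not ShardedJedis internals.

```java
import java.util.Map;
import java.util.TreeMap;

public class HashRingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node) {
        ring.put(hash(node), node);                 // place the node on the ring
    }

    public String nodeFor(String key) {
        // first node whose position is >= the key's hash; wrap around if none
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        return entry != null ? entry.getValue() : ring.firstEntry().getValue();
    }

    private int hash(String s) {
        return s.hashCode() & 0x7fffffff;           // simplified stand-in for a real hash
    }

    public static void main(String[] args) {
        HashRingSketch ring = new HashRingSketch();
        ring.addNode("node-A");
        ring.addNode("node-B");
        System.out.println(ring.nodeFor("user:1000"));   // first node clockwise from the key
    }
}
```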
Proxy
The sharding logic is extracted into a standalone proxy service; all clients connect to the proxy, which receives requests and forwards them to the right shard.
Twemproxy
Advantages: stable and highly available. Disadvantages: depends on extra components, and failover is not automatic when a fault occurs.
Redis Cluster
Highly available and decentralized; clients can connect to any node. Nodes exchange information with each other in pairs, such as data slot assignments and node status. The client does not care which node a piece of data lives on, only about the cluster as a whole.
The data distribution
Redis Cluster does not use hashing or consistent hashing, but virtual slots.
Concrete implementation:
- Redis creates 16,384 virtual slots. Each node is responsible for a certain range of slots.
- When a key arrives, Redis computes CRC16(key) % 16384 to get a slot number, then stores the data on the node responsible for that slot.
Q: How do I make related keys land on the same node? A: Add a {hash tag} to the key; Redis computes the slot using only the characters between { and }.
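A sketch with JedisCluster showing the hash-tag trick; the seed address and key names are assumptions.

```java
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class HashTagDemo {
    public static void main(String[] args) {
        // Any reachable cluster node works as a seed; this address is hypothetical.
        try (JedisCluster cluster = new JedisCluster(new HostAndPort("127.0.0.1", 7000))) {
            // Only "user:1000" (the part inside {}) is hashed, so both keys share one slot/node.
            cluster.set("{user:1000}:name", "Alice");
            cluster.set("{user:1000}:cart", "book,pen");
        }
    }
}
```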
Data migration
The mapping from keys to slots never changes. When a new node is added, some slots are reassigned to it and the data in those slots is migrated to the new node along with them.
High availability and principle of master/slave switchover
When a master hang occurs:
- A slave detects that its master is down.
- It increments currentEpoch and broadcasts a failover request (FAILOVER_AUTH_REQUEST).
- Other nodes receive the message; only masters respond. They check the request's validity and return a FAILOVER_AUTH_ACK, granting at most one ack per epoch.
- The initiating slave collects the acks.
- If it receives acks from more than half of the masters, it becomes the new master.
- It broadcasts PONG to inform the other cluster nodes.
conclusion
- No central architecture.
- Data is distributed to multiple nodes based on slot, and data is directly shared among nodes to dynamically adjust data distribution.
- Scalability, linear expansion to 1000 nodes, nodes can be added and removed dynamically.
- High availability, some nodes are unavailable, the cluster is still available. The slave can be added as a data copy of the standby to implement automatic failover.
- Lower O&M costs and better system scalability and availability.
Redis client
Jedis
Prior to Spring Boot 2.x, Jedis was the default client.
Disadvantage: a single Jedis connection is not thread-safe when shared across threads. Solution: use a connection pool and take a separate connection for each request.
Connection pools: JedisPool (standard connection pool), ShardedJedisPool (sharded connection pool), JedisSentinelPool (Sentinel pool).
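Basic JedisPool usage, sketched below; the address is an assumption.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class PoolDemo {
    public static void main(String[] args) {
        JedisPool pool = new JedisPool("127.0.0.1", 6379);    // hypothetical local instance
        try (Jedis jedis = pool.getResource()) {               // borrow a connection per request
            jedis.set("k", "v");
            System.out.println(jedis.get("k"));
        }                                                      // returned to the pool on close
        pool.close();
    }
}
```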
Cluster Connection Principle
When using Jedis to connect to a Cluster, we only need to supply the address of one or more nodes. To avoid MOVED redirections on GET and SET, the client saves the mapping between slots and Redis nodes, computes locally which slot a key belongs to, and then talks to the corresponding node.
How do I save the relationship between slot and Redis connection pool?
- When the program starts to initialize the cluster environment, it reads the node configuration in the configuration file. No matter how many nodes are in the configuration file, it only takes one and obtains the corresponding Redis instance.
- Obtain the virtual slot information corresponding to the Redis connection instance.
- Obtain all slot values based on virtual slot information
- Get the corresponding master node and build a HostAndPort object.
- Get the corresponding JedisPool information from the cache according to hostAndPort. If not, create and save.
- Store the slot-to-pool mapping in a Map<Integer, JedisPool> (key is the slot number, value is the connection pool).
When accessing a key, the client computes the slot from the key, looks up the corresponding JedisPool in the map, obtains a Jedis instance from that pool, and completes the call.
Redis distributed lock implementation
Lock: a hash is used (key = lock name; the value holds the thread id, expiration time, lock-wait time and reentrant count). When locking, check whether the key exists: if not, create the lock; if it does, check whether it holds the current thread id; if so, increase the reentrant count; if not, wait.
Release: check whether the key's thread id matches the current thread. If it does, decrease the reentrant count, and when it reaches 0 release the lock (DEL). If it does not match, do nothing.
Both locking and unlocking are implemented with Lua scripts to avoid race conditions under high concurrency; a simplified sketch follows.
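A simplified, non-reentrant sketch of the same idea with Jedis: SET NX PX to lock, and a Lua script to release only your own lock. The lock name, token, and timeout are illustrative.

```java
import java.util.Collections;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class SimpleLockDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {    // hypothetical local instance
            String lockKey = "lock:order:42";
            String token = UUID.randomUUID().toString();       // identifies this holder

            // Lock: set only if absent, with an expiration so a crash cannot hold it forever.
            String ok = jedis.set(lockKey, token, SetParams.setParams().nx().px(30_000));
            if ("OK".equals(ok)) {
                try {
                    // ... protected section ...
                } finally {
                    // Release atomically, and only if we still own the lock.
                    String release =
                        "if redis.call('get', KEYS[1]) == ARGV[1] then " +
                        "  return redis.call('del', KEYS[1]) " +
                        "else return 0 end";
                    jedis.eval(release, Collections.singletonList(lockKey),
                                        Collections.singletonList(token));
                }
            }
        }
    }
}
```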
Pipeline
The client batches multiple commands: they are buffered and, once the buffered commands exceed a threshold (about 8 MB), sent together. The server executes them one by one and returns all the results at once.
Some scenarios, such as batch writing of data, have low requirements for real-time and success of results. Pipelines can be used.
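A Jedis pipeline sketch for such a batch write; the key prefix and count are made up.

```java
import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;

public class PipelineDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {   // hypothetical local instance
            Pipeline pipeline = jedis.pipelined();
            for (int i = 0; i < 1000; i++) {
                pipeline.set("batch:key:" + i, String.valueOf(i));   // buffered, not yet sent
            }
            List<Object> replies = pipeline.syncAndReturnAll();       // flush and read all replies
            System.out.println(replies.size());                       // 1000
        }
    }
}
```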
Lettuce
The default client since Spring Boot 2.x. No thread-safety issues, based on Netty, and supports the full feature set.
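A minimal Lettuce sketch; the URI is an assumption.

```java
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;

public class LettuceDemo {
    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://127.0.0.1:6379");  // hypothetical URI
        try (StatefulRedisConnection<String, String> connection = client.connect()) {
            RedisCommands<String, String> sync = connection.sync();   // connection is thread-safe
            sync.set("k", "v");
            System.out.println(sync.get("k"));
        }
        client.shutdown();
    }
}
```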
Redisson
Provides a distributed lock API whose implementation follows the ideas above, plus a watchdog that periodically renews the lock key's expiration.
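A usage sketch of the Redisson lock; the server address and lock name are assumptions.

```java
import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonLockDemo {
    public static void main(String[] args) {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");   // hypothetical address
        RedissonClient redisson = Redisson.create(config);

        RLock lock = redisson.getLock("lock:order:42");
        lock.lock();                 // the watchdog keeps renewing the key while we hold the lock
        try {
            // ... protected section ...
        } finally {
            lock.unlock();
        }
        redisson.shutdown();
    }
}
```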
Data consistency
There is no transaction spanning Redis and MySQL. When cached data needs to be modified, should the database or the cache be updated first?
Update the database first, then delete the cache
Exception: the database update succeeds but deleting the cache fails. Solution: provide a retry mechanism.
- If deleting the cache fails, send the key to a message queue and let a consumer retry the deletion. Disadvantage: intrudes into business code.
- Since database updates are written to the binlog, a separate service (such as Alibaba's Canal) can subscribe to binlog changes and delete the corresponding key; if deletion fails, the key is sent to a message queue for retry.
Delete the cache first, then update the database
Exception: a stale-read race occurs under concurrency: after user A deletes the cache but before A updates the database, user B queries the database and writes the old value back into the cache. Solution: delete the key again after the data is written (delayed double delete).
High concurrency problem
How to find hot keys
Client side
Count key accesses in the client.
Question:
- Intrusion into client code
- You don’t know how many keys are stored, which could cause a memory leak
- Only the hotspot key of the current client can be collected
Alternatively, build a statistics server: clients send key access information to it, and it aggregates the statistics.
Server side
Redis has a monitor command that monitors all commands executed by Redis.
Question:
- High concurrency has performance issues and is not suitable for long-term use
- You can only listen on one Redis node
The machine level
TCP packet capture.
Cache avalanche
A large amount of hot data expires at the same moment (because the keys share the same TTL) while concurrency is high, so all requests fall through to MySQL.
Solution:
- Add mutex or use queues to allow only one thread to query the database for the same key.
- The cache is regularly updated in advance to avoid simultaneous invalidation.
- Add a random offset to the expiration time so keys do not all expire at once (see the sketch after this list).
- Do not set an expiration time for particularly hot keys.
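A sketch of the random-expiration idea with Jedis; the base TTL, jitter, and key name are illustrative.

```java
import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class JitterTtlDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {   // hypothetical local instance
            int baseTtlSeconds = 600;                                 // 10 minutes
            int jitter = ThreadLocalRandom.current().nextInt(120);    // up to 2 extra minutes
            // Keys written together now expire at slightly different times.
            jedis.setex("hot:item:42", baseTtlSeconds + jitter, "cached-value");
        }
    }
}
```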
Cache penetration
Frequent queries for data that does not exist.
- Cache the non-existent value (e.g. a null placeholder) in Redis. Problem: a malicious attacker can query a different value every time, filling Redis with junk data.
- Bloom filter: when the application starts, load the existing data into a Bloom filter. Before querying Redis, check the filter; if the element is not present, skip both Redis and the database.
Bloom filter
An array of 0s and 1s plus several hash functions.
How to quickly determine the existence of an element in a large amount of data (1 billion, disordered, non-repetitive, variable length)?
Principle
Assume there are three hash functions. To add an element, hash it with each of the three functions to get three positions and set those three bits to 1.
To determine whether an element e exists, hash e with the same three functions and take each result modulo the array length to get three positions. If all three bits are 1, e probably exists; otherwise it definitely does not. (See the sketch below.)
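A minimal Bloom filter sketch matching this description; the bit-array size and the three "hash functions" are illustrative.

```java
import java.util.BitSet;

public class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int[] seeds = {7, 11, 13};   // three stand-in hash functions

    public BloomFilterSketch(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    public void add(String element) {
        for (int seed : seeds) {
            bits.set(position(element, seed));      // mark all three positions as 1
        }
    }

    public boolean mightContain(String element) {
        for (int seed : seeds) {
            if (!bits.get(position(element, seed))) {
                return false;                       // a 0 bit => definitely absent
            }
        }
        return true;                                // all 1s => probably present
    }

    private int position(String element, int seed) {
        return ((element.hashCode() * seed) & 0x7fffffff) % size;  // hash, then mod array length
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1 << 20);
        filter.add("user:1000");
        System.out.println(filter.mightContain("user:1000"));   // true: probably present
        System.out.println(filter.mightContain("user:9999"));   // false here: definitely absent
    }
}
```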
conclusion
- If the Bloom filter says an element does not exist, it definitely does not exist.
- If a Bloom filter exists, it may exist
Disadvantage: elements cannot be deleted. If a record is removed from the database, there is no way to remove it from the Bloom filter. Workaround: replace each bit with a counter that is incremented when an element maps to that position and decremented when an element is deleted (a counting Bloom filter); cuckoo filters support deletion natively.