Java Notes – Redis Server
Preface
Some time ago I organized my notes on the MySQL database; in this post I organize the key points of the Redis server. Redis can be a sharp tool for improving performance in web applications.
For more on MySQL, see my blog post Java Notes – MySQL Database.
Message queues are covered in a separate blog post.
Body
Redis
Common application scenarios of Redis:
- Cache system and in-memory database: session caching and full-page caching
- Message queue: Redis can be used to build a simple message queue
- Leaderboards/counters: Redis is very good at incrementing and decrementing numbers in memory
- Publish/subscribe: for example, a social feed ("friend circle") feature can be built on Redis pub/sub
Advantages of Redis:
- The vast majority of operations are pure in-memory operations.
- A single-threaded model avoids unnecessary context switches and race conditions. "Single-threaded" here means only the network request module uses a single thread (so there is no need to worry about concurrency safety for command execution): one thread handles all network requests, while other modules still use multiple threads.
- Non-blocking I/O multiplexing is used, and Redis pipelining keeps requests from blocking one another.
- Dynamic strings (SDS) reserve spare space for strings, avoiding the performance cost of repeated memory reallocation during concatenation and truncation.
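The preallocation idea behind SDS can be sketched in a few lines of Python. This is a toy model, not Redis's actual C implementation; the class and field names here are invented for illustration, though the 1 MB preallocation threshold matches the one Redis uses.

```python
class SimpleSDS:
    """Toy model of the Simple Dynamic String (SDS) growth policy:
    when the buffer must grow, reserve extra space so that repeated
    appends do not trigger a reallocation every time."""
    SDS_MAX_PREALLOC = 1024 * 1024  # 1 MB threshold, as in Redis

    def __init__(self, s=""):
        self.length = len(s)
        self.alloc = self.length        # total buffer capacity
        self.buf = list(s)

    def append(self, s):
        needed = self.length + len(s)
        if needed > self.alloc:
            # Preallocate: double the needed size while small,
            # otherwise add a fixed 1 MB of slack.
            if needed < self.SDS_MAX_PREALLOC:
                self.alloc = needed * 2
            else:
                self.alloc = needed + self.SDS_MAX_PREALLOC
        self.buf.extend(s)
        self.length = needed

    def free(self):
        return self.alloc - self.length

    def __str__(self):
        return "".join(self.buf)
```

Appending `" world"` to `"hello"` leaves 11 bytes of slack, so the next small append needs no reallocation.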
Cache invalidation policy
There are three main algorithms:
- FIFO (First In, First Out): evict by age in the cache; the data that entered longest ago is evicted first.
- LRU (Least Recently Used): evict by recency of use; the data untouched for the longest time is evicted first. (Measured by time since last access.)
- LFU (Least Frequently Used): evict by frequency of use; the data used the fewest times over a period is evicted first. (Measured by access count.)
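The LRU policy can be sketched with Python's `OrderedDict`. Note this is the textbook algorithm; real Redis uses an approximate, sampling-based LRU rather than maintaining an exact recency list like this.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: on overflow, evict the least recently used key."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the LRU entry
```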
Redis offers six data eviction strategies:
- volatile-lru: evict the least recently used key among keys with an expiration time set
- volatile-ttl: evict the key closest to expiry among keys with an expiration time set
- volatile-random: evict a random key among keys with an expiration time set
- allkeys-lru: evict the least recently used key from the whole dataset
- allkeys-random: evict a random key from the whole dataset
- noeviction: the default; never evict, and return an error on writes once memory is full
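In redis.conf the strategy is chosen with the `maxmemory-policy` directive (newer Redis versions also add LFU variants, `allkeys-lfu` and `volatile-lfu`). A minimal fragment:

```conf
maxmemory 100mb
maxmemory-policy allkeys-lru
```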
There are three expiration strategies:
- Timed deletion: when setting a key's expiration time, create a timer that deletes the key the moment it expires.
- Lazy deletion: do not delete a key when it expires; instead, each time the key is fetched, check whether it has expired, and if so delete it and return null.
- Periodic deletion: every so often, run a pass that deletes expired keys.
Redis combines lazy deletion with periodic deletion, while Memcached uses lazy deletion only.
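Lazy and periodic deletion can be modeled together in a small sketch. This is a simplified in-memory illustration, not how Redis's dict and expires table actually work; the injectable clock exists only to make the behavior easy to demonstrate.

```python
import time

class ExpiringDict:
    """Sketch of Redis-style expiration: lazy deletion on access,
    plus a periodic sweep that removes expired keys in a batch."""
    def __init__(self, clock=time.time):
        self.clock = clock            # injectable clock for testing
        self.data = {}                # key -> (value, expire_at or None)

    def set(self, key, value, ttl=None):
        expire_at = self.clock() + ttl if ttl is not None else None
        self.data[key] = (value, expire_at)

    def get(self, key):
        item = self.data.get(key)
        if item is None:
            return None
        value, expire_at = item
        if expire_at is not None and self.clock() >= expire_at:
            del self.data[key]        # lazy deletion: expired on access
            return None
        return value

    def sweep(self):
        """Periodic deletion: scan and drop all expired keys."""
        now = self.clock()
        expired = [k for k, (_, exp) in self.data.items()
                   if exp is not None and now >= exp]
        for k in expired:
            del self.data[k]
        return len(expired)
```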
Redis persistence
Method 1: RDB snapshots (the default). Persistence generates point-in-time snapshots of the dataset at a specified interval. For example, the setting `save 60 1000` makes Redis automatically save the dataset once at least 1000 keys have changed within 60 seconds. Advantages and disadvantages of RDB:
- Disadvantage: if the save condition has not been met within the interval, a power failure or system crash can lose the most recent writes.
- Advantage: fast data recovery.
Method 2: AOF. Persistence records every write command the server executes to an append-only file, and restores the dataset on startup by re-executing those commands (all instructions are stored in a text file). Advantages and disadvantages of AOF:
- Disadvantages: logging every instruction to a file drags down Redis, and data recovery is slow.
- Advantage: the sync interval is shorter than RDB's, reducing data loss from a crash between snapshots.
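Both persistence modes are enabled in redis.conf; a fragment matching the save condition mentioned above:

```conf
# RDB: snapshot if at least 1000 keys changed within 60 seconds
save 60 1000

# AOF: log every write command; fsync once per second
appendonly yes
appendfsync everysec
```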
Method 3: virtual memory (removed in later Redis versions).
How to handle an outage
- Create scheduled tasks: back up the RDB file to one folder every hour, and to another folder every day.
- Make sure each backup is named with the corresponding date and time, and each time the scheduled script runs, use the find command to delete expired snapshots: for example, keep hourly snapshots from the last 48 hours and daily snapshots from the last month or two.
- At least once a day, copy an RDB backup outside your data center, or at least onto a physical machine other than the one running the Redis server.
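The cleanup step above can be sketched in Python instead of the `find` command. The `.rdb` suffix, flat directory layout, and 48-hour window are assumptions taken from the text, not a standard layout.

```python
import os
import time

def prune_snapshots(backup_dir, max_age_hours=48):
    """Delete RDB backup files older than max_age_hours,
    mirroring the find-based cleanup described above."""
    cutoff = time.time() - max_age_hours * 3600
    removed = []
    for name in os.listdir(backup_dir):
        path = os.path.join(backup_dir, name)
        if name.endswith(".rdb") and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```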
Redis transactions
A Redis transaction is a collection of commands. Redis transactions can execute multiple commands at once, with the following guarantees:
- Batched commands are placed in a queue cache before EXEC is sent.
- After EXEC is received, the transaction is executed. If any command in the transaction fails, the remaining commands are still executed (no atomicity in the rollback sense).
- While a transaction executes, command requests from other clients are not inserted into its command sequence.
A transaction goes through three phases from start to execution:
- Start the transaction: the client first sends the MULTI command to the Redis server.
- Command enqueueing: the client then sends the commands to be processed in this transaction.
- Execute the transaction: the client finally sends EXEC to mark the end of the transaction's commands.
Commands are not executed immediately; they are executed one by one only when EXEC is received.
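The queue-then-execute behavior can be modeled in a few lines. This is a toy simulation, not a Redis client; the command helpers (`cmd_set`, `cmd_incr_field`) are invented for the demonstration.

```python
class MiniTransaction:
    """Toy model of MULTI/EXEC: commands are buffered after MULTI and
    only run, in order, when EXEC arrives. A failing command does not
    stop the rest (no rollback), mirroring Redis behaviour."""
    def __init__(self, store):
        self.store = store
        self.queue = None

    def multi(self):
        self.queue = []

    def send(self, func, *args):
        if self.queue is not None:
            self.queue.append((func, args))
            return "QUEUED"
        return func(self.store, *args)

    def exec(self):
        results = []
        for func, args in self.queue:
            try:
                results.append(func(self.store, *args))
            except Exception as e:      # keep executing later commands
                results.append(e)
        self.queue = None
        return results

# Tiny command set used for the demonstration
def cmd_set(store, key, value):
    store[key] = value
    return "OK"

def cmd_incr_field(store, key):
    store[key] += 1                     # fails if the value is not a number
    return store[key]
```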
Transaction-related commands:
- MULTI: starts a transaction
- EXEC: commits (executes) a transaction
- DISCARD: discards a transaction
- WATCH: watches keys; if a watched key changes before EXEC, the transaction is aborted
- QUEUED: the reply returned when a command is added to the execution queue
Differences between Redis transactions and MySQL transactions: Redis transactions only guarantee queued, isolated execution and do not roll back when a command fails, whereas MySQL transactions are fully ACID and support rollback.
Redis distributed lock
A Redis distributed lock involves three behaviors:
- Lock: use SETNX to grab the lock; successfully setting the lock key marks the lock as occupied.
- Unlock: use DEL to release the lock, marking it as free.
- Lock expiry: use EXPIRE to give the lock a time-to-live in case the holder forgets to release it; once the time is up, the key is gone.
SETNX and EXPIRE are not a single atomic operation, so in practice a Lua script (or the SET command's NX and EX options) is used to ensure atomicity.
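A common pattern is to acquire with a single `SET key token NX PX ttl` and to release with a compare-and-delete Lua script, so that only the lock holder can delete it. Below is the widely used release script as a string, plus an in-memory sketch of the token check; `FakeLock` is a simulation for illustration, not a real Redis client.

```python
import uuid

# The release script commonly run via EVAL (compare-and-delete):
RELEASE_LUA = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

class FakeLock:
    """In-memory model of SET ... NX plus token-checked release."""
    def __init__(self):
        self.store = {}

    def acquire(self, name):
        token = str(uuid.uuid4())
        if name in self.store:          # SET ... NX fails if key exists
            return None
        self.store[name] = token
        return token

    def release(self, name, token):
        # Only the holder (matching token) may delete the lock,
        # which is what the Lua script guarantees atomically.
        if self.store.get(name) == token:
            del self.store[name]
            return True
        return False
```

The token check matters: without it, a client whose lock already expired could delete a lock now held by someone else.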
Redis publish/subscribe
Redis publish/subscribe (pub/sub) is a message communication mode:
- The publisher (pub) sends messages.
- The subscribers (sub) receive messages.
A Redis client can subscribe to any number of channels. Consider a channel channel1 with three clients subscribed to it: client2, client5, and client1.
When a new message is sent to channel1 with the PUBLISH command, the message is delivered to the three clients subscribed to it.
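The fan-out behavior can be sketched in-process. This is a toy model: real subscribers receive messages over their own connections, and `PUBLISH` in Redis likewise returns the number of receivers.

```python
from collections import defaultdict

class MiniPubSub:
    """In-process sketch of Redis PUBLISH/SUBSCRIBE fan-out."""
    def __init__(self):
        self.channels = defaultdict(list)   # channel -> subscriber inboxes

    def subscribe(self, channel):
        inbox = []
        self.channels[channel].append(inbox)
        return inbox

    def publish(self, channel, message):
        # Like Redis, publish returns the number of receivers.
        for inbox in self.channels[channel]:
            inbox.append(message)
        return len(self.channels[channel])
```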
Data types supported by Redis
- String: a string, integer, or floating point number
- Hash: an unordered hash table containing key-value pairs
- List: a linked list in which each node is a string
- Set: an unordered collection of unique strings
- Zset: an ordered set
String
Redis strings use dynamic strings (SDS):
- No memory-overflow problems caused by string changes
- Getting the string length is O(1)
- Space preallocation and lazy space release: the free field keeps spare space by default, preventing repeated memory reallocations
Application scenarios:
- Caching structured user information, counters
Hash
Implemented as an array plus linked lists, with an incremental-rehash optimization. Redis hashes use separate chaining to handle collisions and do not add a red-black-tree optimization; hash table nodes form a singly linked list.
Rehash optimization: a divide-and-conquer approach spreads the huge migration effort across individual CRUD operations, avoiding a stall while the service is busy.
Rehash means that when the hash table's load factor reaches its limit, the table doubles its capacity (number of buckets) and redistributes the existing entries into the new buckets.
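The incremental rehash can be sketched as two coexisting tables where every operation migrates one bucket from the old table to the new one. This is a simplified model of the idea, not Redis's actual dict implementation.

```python
class IncrementalHash:
    """Sketch of incremental rehash: while a resize is in progress,
    each get/put migrates one bucket (divide and conquer)."""
    def __init__(self, nbuckets=4):
        self.old = [[] for _ in range(nbuckets)]
        self.new = None
        self.rehash_idx = 0
        self.count = 0

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def _step(self):
        """Migrate one old bucket per operation."""
        if self.new is None:
            return
        if self.rehash_idx < len(self.old):
            for k, v in self.old[self.rehash_idx]:
                self._bucket(self.new, k).append((k, v))
            self.old[self.rehash_idx].clear()
            self.rehash_idx += 1
        if self.rehash_idx >= len(self.old):
            self.old, self.new = self.new, None   # rehash finished
            self.rehash_idx = 0

    def _remove(self, key):
        for table in (self.old, self.new):
            if table is None:
                continue
            bucket = self._bucket(table, key)
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    del bucket[i]
                    return True
        return False

    def put(self, key, value):
        self._step()
        # Start a rehash at load factor 1: double the bucket count.
        if self.new is None and self.count >= len(self.old):
            self.new = [[] for _ in range(len(self.old) * 2)]
        if self._remove(key):
            self.count -= 1
        table = self.new if self.new is not None else self.old
        self._bucket(table, key).append((key, value))
        self.count += 1

    def get(self, key):
        self._step()
        for table in (self.old, self.new):
            if table is None:
                continue
            for k, v in self._bucket(table, key):
                if k == key:
                    return v
        return None
```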
Application scenarios:
- Saving structured information whose fields can be read individually, without serializing or deserializing the whole object
List
Application scenarios:
- Follower lists and fan lists (as on Twitter), implemented with the Redis list structure
The List is implemented as a doubly linked list, so it supports reverse lookup and traversal.
Set
The internal implementation of a Set is a hash table whose values are all null; hashing is what makes deduplication fast, and it is also why a Set can cheaply answer whether a member is present.
Application scenarios:
- Deduplication, plus intersection (SINTER), union (SUNION), and difference (SDIFF)
- Features such as mutual follows, shared interests, and second-degree friends
Zset
A Zset uses a hash table and a skip list internally to keep data both stored and ordered:
- The hash table maps members to their score, and the score is the basis for sorting.
- The skip list stores all the members; each node maintains multiple pointers to other nodes, enabling fast access.
Application scenarios:
- Implementing a delay queue
Skip list
- A skip list is an extension built on top of an ordered linked list.
- It trades storage space for performance by building indexes (possibly multiple levels of them) over the linked list.
- The indexes take memory, but while the original list may store very large objects, an index node only needs to store a key and a few pointers, not the object itself. So when nodes are large or numerous, the benefit is amplified while the overhead is negligible.
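A minimal skip list showing the multi-level index idea. The fixed max level and p = 0.5 are conventional textbook choices; Redis's own implementation additionally stores span values and backward pointers to support rank queries.

```python
import random

class SkipListNode:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * level   # one pointer per index level

class SkipList:
    MAX_LEVEL = 8
    P = 0.5

    def __init__(self):
        self.head = SkipListNode(None, self.MAX_LEVEL)
        self.level = 1

    def _random_level(self):
        # Each extra level is taken with probability P.
        lvl = 1
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, value):
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):   # descend level by level
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = SkipListNode(value, lvl)
        for i in range(lvl):                      # splice into each level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, value):
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.value == value
```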
Redis cluster
Master-slave backup
In Redis you can run the SLAVEOF command, or set the slaveof configuration option, to make one server replicate another. The replicated server is called the master, and the servers that replicate the master are called slaves.
Format: run `SLAVEOF <master-host> <master-port>` on the slave.
Redis supports master/slave synchronization and slave/slave synchronization:
- RDB image synchronization: the master performs a bgsave and records subsequent modifications in an in-memory buffer; once complete, the RDB file is synchronized in full to the replica, which loads the RDB image into memory.
- AOF command synchronization: the master then finishes the process by sending the replica the modification commands recorded during the sync, for replay.
bgsave
The bgsave command saves the current database to disk asynchronously, in the background. Both save and bgsave call the rdbSave function, but they call it differently:
- save calls rdbSave directly, blocking the Redis main process until the save completes. While the main process is blocked, the server cannot handle any client requests.
- bgsave forks a child process that is responsible for calling rdbSave, then signals the main process when the save completes. The Redis server can keep handling client requests while bgsave runs.
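The fork-based approach can be sketched with `os.fork` (POSIX-only). JSON stands in for the RDB format and `json.dump` plays the role of rdbSave; the temp-file-then-rename step mirrors how Redis atomically replaces the old dump file.

```python
import json
import os

def bgsave(dataset, path):
    """Sketch of BGSAVE: fork a child to write the snapshot while the
    parent keeps serving requests. Copy-on-write gives the child a
    frozen view of memory; the parent returns immediately."""
    pid = os.fork()
    if pid == 0:
        # Child: serialize the dataset (stand-in for rdbSave) and exit.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(dataset, f)
        os.rename(tmp, path)   # atomic replace, like Redis's temp-RDB rename
        os._exit(0)
    return pid                 # parent continues; reap with os.waitpid
```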
The underlying principles of Redis clustering
- Sharding: data is automatically partitioned, with each master holding part of it.
- Hash slots: the key space is divided into 16384 slots, and a hash of the key determines which slot (and thus which node) a piece of data lands in. (Strictly speaking this is hash-slot sharding, not classic consistent hashing: Redis Cluster uses CRC16(key) mod 16384.)
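Slot assignment is CRC16(key) mod 16384, where the CRC is the CRC16/XMODEM variant, and `{hash tags}` let related keys be forced into the same slot. A sketch:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0), the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """HASH_SLOT = CRC16(key) mod 16384, honoring {hash tags}:
    if the key contains a non-empty {...} section, only that
    section is hashed, so related keys share a slot."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```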
Redis clustering has evolved through three stages:
- Primary/secondary replication: reads and writes are separated.
- Sentinel mode: master and slave can fail over automatically, making the system more robust and more available.
- Redis Cluster: distributed storage with decentralized data.
Redis clustering solutions:
- The official Redis Cluster solution
- The Twemproxy proxy solution; twemproxy is a single point that easily comes under heavy pressure, so it is usually combined with keepalived to make twemproxy highly available
- The Codis proxy solution
- Client-side sharding
Redis clustering features
- High availability: when a master goes down, a slave is automatically promoted to master and continues serving.
- Scalability: when a single Redis instance runs out of memory, the cluster shards storage across nodes.
Redis clustering does not guarantee strong data consistency:
- Under certain conditions, a Redis cluster may lose write commands that have already been acknowledged.
- Redis replication is asynchronous: the master returns to the client immediately after processing a write, without waiting for the slaves to finish replicating it.
Redis Cluster
Redis Cluster is the official Redis multi-node deployment solution. Six instances are recommended: three masters and three slaves.
In the Redis Cluster framework:
- Nodes share information through the meet operation, so every node knows which node is responsible for which range of slots.
- By default, the redis-master nodes receive reads and writes while the redis-slave nodes serve as backups; a request sent to a slave is redirected to the master that owns the key.
- If real-time freshness is not required, slaves can be marked readable with readonly and keys fetched from them directly, achieving read/write separation.
Cache and database consistency issues
CAP principle
The CAP principle says a storage system that provides data services cannot simultaneously satisfy all three of:
- C, consistency: all clients see the same data.
- A, availability: any client can read and write at any time.
- P, partition tolerance: the system keeps working (and can scale) across network partitions.
Large sites typically sacrifice C and choose AP, then take various measures to limit the impact of inconsistency:
- Strong consistency: replicas are always identical in physical storage.
- User-perceived consistency: replicas in physical storage may diverge, but error correction and verification return one consistent, correct value to the user.
- Eventual consistency: physical replicas may diverge and users may briefly see inconsistent data, but the data converges over time.
Redis offers eventual consistency, because its replication is asynchronous.
Solutions to cache consistency:
- Delayed double-delete policy
- Updating the cache via a message queue
- Syncing the MySQL database into Redis via the binlog
Delayed double-delete policy
A write operation performs the following steps:
- Evict the cache
- Write the database
- Sleep for one second, then evict the cache again
Next, let's see why the strategy is to evict the cache first and write the database later.
1. Concurrency perspective (what goes wrong if you write the database and then update the cache):
- Thread A updates the database;
- Thread B updates the database;
- Thread B updates the cache;
- Thread A updates the cache (delayed by a network fluctuation).
A should have updated the cache before B, but because of the network B got there first, leaving dirty data. Worse, the problem only resolves itself when the cache entry expires, so the business impact can be large.
2. Business perspective (why evict the cache instead of updating it):
- The purpose of a cache is to speed up reads. Frequently updating the cache without any reads wastes that work, so cache population should be triggered by reads, and writes should simply evict.
- Sometimes building a cache entry involves extra transformation work; recomputing it on every write also wastes performance.
Evict-then-write is not a perfect solution, only the most reasonable one. It has the following failure case:
- Write request A performs its write and deletes the cache.
- Read request B finds the cache empty.
- Read request B queries the database and gets the old value.
- Read request B writes the old value into the cache.
- Write request A writes the new value to the database. (If nothing more is done, the database and the cache now disagree.)
This leads to inconsistency.
The delayed double delete addresses exactly this: write request A sleeps for about one second and then evicts the cache again:
- With this approach, data can be inconsistent for up to about 1 second (minus the read request's duration) after the first write; after the second eviction, the next read repopulates the cache and consistency is restored.
- The "1 second" is there to ensure in-flight reads (usually a few hundred ms) have finished, so the write can delete any stale entry those reads cached.
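The write path can be sketched with a timer for the second delete. Plain dicts stand in for the cache and the database, and the delay is a parameter (1 second in the text above).

```python
import threading

def delayed_double_delete(cache, db, key, value, delay=1.0):
    """Delayed double delete: evict, write the database, then evict
    again after a delay to clear any stale value that a concurrent
    read may have written back in the meantime."""
    cache.pop(key, None)            # first delete
    db[key] = value                 # write the database

    def second_delete():
        cache.pop(key, None)        # second delete, after readers finish

    timer = threading.Timer(delay, second_delete)
    timer.start()
    return timer                    # caller may join() or ignore it
```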
There is also an extreme case: if the second cache eviction fails, the database and cache stay inconsistent indefinitely. Therefore:
- Give cache entries an expiration time.
- Add a retry mechanism, or use a message queue, to ensure the eviction eventually succeeds.
Updating the cache through a message queue
Message queue middleware can keep database data and cache data consistent:
- The cache is updated asynchronously, reducing coupling in the system
- But it breaks the ordering of data changes
- And the cost is relatively high
Syncing via the binlog
Every change to a MySQL database is recorded in the binlog: additions, deletions, and modifications are all logged there, and database replication itself is binlog-based. Using it to update the cache means:
- Low latency when MySQL is not under heavy pressure;
- Complete decoupling from the business code;
- The ordering problem is solved;
- But the cost is relatively high.
Caching mechanisms
A high-performance website usually uses a caching architecture: caching means trading space for time.
The difference between storage and caching:
- Storage requires data to be persistent; it cannot be lost lightly.
- Storage must preserve the integrity of data structures, so it needs to support richer data types; a cache can hold simplified, disposable copies.
Four levels of cache:
- Client-side caching in devices such as browsers
- Network-layer caching via CDN acceleration; a CDN can cache whole pages
- Routing-layer caching in load-balancing components such as Nginx
- Business-layer caching based on Redis and the like
The business-layer cache can be subdivided into three levels:
- Level 1 cache (session level): while a session is open, query results are stored in the level 1 cache and served from it on the next use.
- Level 2 cache (application level): when the session closes, its level 1 cache data moves into the level 2 cache.
- Level 3 cache (database level): can span JVMs, with data synchronized via remote calls.
Common problems in caching
Cache avalanche
A cache avalanche happens when a whole set of cached entries expires within the same short period. Solutions:
- Set different expiration times for different records based on business characteristics.
- When concurrency is not very high, use lock-based queueing.
- Attach a flag to each cached entry recording whether it is stale, and refresh the data when the flag says so.
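The first mitigation, spreading expirations, is often implemented as a random jitter added to the base TTL, so entries cached at the same moment do not all expire together. The 300-second spread below is an arbitrary example.

```python
import random

def jittered_ttl(base_ttl, spread=300):
    """Return base_ttl plus a random offset in [0, spread] seconds,
    spreading out expirations to avoid a cache avalanche."""
    return base_ttl + random.randint(0, spread)
```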
Cache preheating
A freshly started cache system holds no data, so rebuilding the cache stresses both system performance and the database. Solution:
- Load hot data when the cache system starts, such as metadata like city name lists and category information.
Cache penetration
Cache penetration means querying for data that does not exist in the database at all, so every request falls through the cache. A malicious attacker can exploit this to pressure, or even crush, the database; even with UUID keys it is easy to construct a key that does not exist and attack with it. Solutions:
- When the web server starts, write data likely to be accessed frequently and concurrently into the cache in advance.
- If a database query returns empty, cache the empty result too, but with a short expiration time, such as 60 seconds (to keep masses of null keys from occupying cache space).
- Standardize key naming, and use a Bloom filter over well-defined key patterns to detect and filter malicious requests.
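A minimal Bloom filter for the last mitigation: a key the filter reports as absent is definitely not in the database, so the request can be rejected before it ever hits storage. The sizes and the hash-via-SHA-256 scheme here are illustrative choices, not what any particular library uses.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, small chance of
    false positives. Good for rejecting keys that cannot exist."""
    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive num_hashes positions by salting SHA-256 (illustrative).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

At startup the filter is loaded with every valid key; lookups whose key fails `might_contain` are rejected without touching cache or database.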
Cache breakdown
Cache breakdown happens when a single key is extremely hot and a large number of concurrent requests focus on it; the moment that key expires, the sustained concurrency pierces the cache and hits the database directly. Solution:
- Give hot data a long lifetime, or never let it expire.
Concurrency contention
Concurrency contention occurs when multiple subsystems set the same key at the same time. There are two main solutions:
- A distributed lock, for example built on the Redis SETNX command
- A message queue: serialize the parallel reads and writes through messaging middleware