Trail source Java notes – Redis server

Preface

Some time ago I organized my notes on the MySQL database. In this post I organize the key points of the Redis server. Redis is a powerful tool for improving performance in web applications.

For more on MySQL, see my blog post Java notes – MySQL Database.

For message queues, see my blog post Trail source Java notes – Redis Server.

Main content

Redis

Common application scenarios of Redis:

  • Cache system and in-memory database: session caching and full-page caching
  • Message queue: use redis to build a simple message queue
  • Leaderboards/counters: redis is well suited to incrementing and decrementing numbers in memory (see the sketch after this list)
  • Publish/subscribe: for example, implementing a friends-feed (Moments-style) feature with redis
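
A minimal sketch of the counter and leaderboard scenarios, assuming the Jedis Java client and a Redis instance on localhost:6379 (both are assumptions, not part of the original post; key and member names are hypothetical):

```java
import redis.clients.jedis.Jedis;

public class CounterAndLeaderboard {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {   // assumed local instance
            jedis.incr("page:views");                         // atomic in-memory counter
            jedis.zadd("leaderboard", 42, "alice");           // sorted set as a leaderboard
            jedis.zadd("leaderboard", 99, "bob");
            // top 3 members ordered by score, highest first
            System.out.println(jedis.zrevrange("leaderboard", 0, 2));
        }
    }
}
```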

Advantages of Redis:

  • The vast majority of operations are pure in-memory operations.
  • It uses a single-threaded model, which avoids unnecessary context switches and race conditions. Single-threaded here means that the network request module uses only one thread (so there is no concurrency-safety concern): one thread handles all network requests, while other modules still use multiple threads.
  • It uses non-blocking I/O multiplexing, and Redis pipelining keeps requests from blocking one another (see the sketch after this list).
  • It uses dynamic strings (SDS) and reserves spare space for strings, avoiding the cost of repeated memory reallocation caused by string concatenation and truncation.
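
A minimal pipelining sketch, again assuming the Jedis client and a local Redis instance (the key names are hypothetical):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;

public class PipelineExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            Pipeline pipeline = jedis.pipelined();
            for (int i = 0; i < 100; i++) {
                pipeline.set("key:" + i, String.valueOf(i)); // queued locally, not sent yet
            }
            Response<String> last = pipeline.get("key:99");   // resolved only after sync()
            pipeline.sync();                                  // send the whole batch in one round trip
            System.out.println(last.get());
        }
    }
}
```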

Cache invalidation policy

There are three main algorithms:

  • FIFO (First In, First Out): the data that has been in the cache the longest is evicted first.
  • LRU (Least Recently Used): the data that has gone unused for the longest time is evicted first (measured by recency).
  • LFU (Least Frequently Used): the data used least often over a period of time is evicted first (measured by access count).

Redis offers six data elimination strategies:

  • volatile-lru: evict the least recently used key from the set of keys with an expiration time
  • volatile-ttl: evict keys with the shortest remaining time to live from the set of keys with an expiration time
  • volatile-random: evict a random key from the set of keys with an expiration time
  • allkeys-lru: evict the least recently used key from the whole data set
  • allkeys-random: evict a random key from the whole data set
  • noeviction (the default): never evict data; writes that would exceed the memory limit return an error

There are three expiration strategies:

  • Timed deletion: when a key's expiration time is set, create a timer for the key; when the expiration time arrives, the timer deletes the key.
  • Lazy deletion: an expired key is not deleted immediately; each time the key is read, its expiration is checked, and if it has expired it is deleted and null is returned.
  • Periodic deletion: every so often, a pass is made over the database to delete expired keys.

Redis combines lazy deletion with periodic deletion, while Memcached uses lazy deletion only.
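
A minimal sketch of expiration from the client's point of view (an expired key simply returns null when accessed), assuming the Jedis client and a local Redis instance; the key name is hypothetical:

```java
import redis.clients.jedis.Jedis;

public class ExpirationExample {
    public static void main(String[] args) throws InterruptedException {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.setex("session:42", 2, "user-data");    // SET with a 2-second TTL
            System.out.println(jedis.ttl("session:42"));  // remaining lifetime in seconds
            Thread.sleep(2500);
            System.out.println(jedis.get("session:42"));  // null: the expired key is gone when accessed
        }
    }
}
```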

Redis persistence

Method 1: RDB snapshots (the default). RDB persistence generates point-in-time snapshots of the data set at specified intervals. With the setting sketched after the list below, Redis automatically saves the data set once at least 1000 keys have changed within 60 seconds. Advantages and disadvantages of RDB:

  • Disadvantage: if a power failure or system crash occurs before the next save condition is met, the changes made since the last snapshot are lost.
  • Advantage: data recovery is fast.
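
The 60-seconds/1000-keys condition above corresponds to an RDB save rule; a minimal sketch of setting it at runtime, assuming the Jedis client and a local Redis instance (in redis.conf the equivalent directive is "save 60 1000"):

```java
import redis.clients.jedis.Jedis;

public class RdbSaveRule {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Equivalent to the redis.conf directive "save 60 1000":
            // snapshot the data set if at least 1000 keys changed within 60 seconds.
            jedis.configSet("save", "60 1000");
            System.out.println(jedis.configGet("save"));
        }
    }
}
```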

Method 2: AOF persistence records every write command executed by the server to an append-only data file; the data set is restored by re-executing these commands when the server starts (all commands are stored in a text file). Advantages and disadvantages of AOF:

  • Disadvantage: logging every command to a text file is a drag on redis, and data recovery is slower.
  • Advantage: the write interval is shorter than RDB's, so less data is lost if the server fails unexpectedly between saves.

Method 3: use virtual memory.

How to handle outages

  • Create a scheduled task that backs up the RDB file to one folder every hour, and to another folder every day.
  • Make sure each backup is named with its date and time. Each time the scheduled script runs, use the find command to delete expired snapshots: for example, keep hourly snapshots for the last 48 hours and daily snapshots for the last one or two months.
  • At least once a day, copy an RDB backup outside your data center, or at the very least onto a machine other than the one running the Redis server.

Redis transactions

A Redis transaction is a collection of commands. A Redis transaction can execute multiple commands in one batch, with the following properties:

  • Before the EXEC command is sent, the batched commands are placed in a queue.
  • When EXEC is received, the queued commands are executed. If one command in the transaction fails, the remaining commands are still executed (there is no atomicity or rollback).
  • During transaction execution, command requests submitted by other clients are not inserted into the transaction execution command sequence.

A transaction goes through the following three phases from inception to execution:

  1. Start the transaction: first send the MULTI command to the Redis server
  2. Queue the commands: then send the commands to be processed in this transaction
  3. Execute the transaction: finally send the EXEC command to mark the end of the transaction and run it

Commands are not executed immediately; they are executed one by one only when the EXEC command is executed.
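
A minimal sketch of these three phases, assuming the Jedis client and a local Redis instance (the key and values are hypothetical):

```java
import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class TransactionExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            Transaction tx = jedis.multi();   // 1. start the transaction (MULTI)
            tx.set("account:1", "100");       // 2. commands are queued, not executed yet
            tx.incrBy("account:1", 50);
            List<Object> results = tx.exec(); // 3. EXEC runs the queued commands in order
            System.out.println(results);      // e.g. [OK, 150]
        }
    }
}
```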

Transaction-related commands:

  • MULTI: enables a transaction
  • EXEC: Commits a transaction
  • DISCARD: discards a transaction
  • WATCH: watches one or more keys; if a watched key changes before EXEC, the transaction is aborted
  • QUEUED: the reply returned when a command is added to the transaction queue

Differences between Redis transactions and MySQL transactions

Redis distributed locks

A Redis distributed lock involves three operations:

  • Lock: use setnx to grab the lock, setting the lock key to mark the lock as held.
  • Unlock: delete the lock key to mark the lock as released, so that others can acquire it.
  • Lock timeout: use expire to give the lock an expiration time, in case the holder never releases it.

Acquiring the lock with SETNX and setting the timeout with EXPIRE are two separate commands and are not atomic by themselves; in practice the SET command's NX/EX options, or a Lua script, are used to make locking and unlocking atomic, as the sketch below shows.
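
A minimal sketch of such a lock, assuming the Jedis client (3.x or later, for SetParams) and a local Redis instance; the lock key name is hypothetical:

```java
import java.util.Collections;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisLockExample {
    // Delete the lock only if we still own it, atomically, via a Lua script.
    private static final String UNLOCK_SCRIPT =
        "if redis.call('get', KEYS[1]) == ARGV[1] then "
      + "  return redis.call('del', KEYS[1]) "
      + "else return 0 end";

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String token = UUID.randomUUID().toString();
            // Acquire: SET lock:order <token> NX EX 30 -- "set if absent" plus expiry in one atomic command
            String reply = jedis.set("lock:order", token, SetParams.setParams().nx().ex(30));
            if ("OK".equals(reply)) {
                try {
                    // ... critical section ...
                } finally {
                    jedis.eval(UNLOCK_SCRIPT,
                               Collections.singletonList("lock:order"),
                               Collections.singletonList(token));
                }
            }
        }
    }
}
```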

Redis publish/subscribe

Redis publish subscription (PUB/SUB) is a message communication mode:

  • The sender (pub) sends messages
  • The subscriber (sub) receives messages

A Redis client can subscribe to any number of channels. Consider, for example, channel Channel1 and the three clients that subscribe to it — Client2, Client5, and Client1.

When a new message is sent to Channel1 with the PUBLISH command, it is delivered to the three clients subscribed to that channel.
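
A minimal sketch of this pattern, assuming the Jedis client and a local Redis instance; the channel name follows the example above:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class PubSubExample {
    public static void main(String[] args) throws InterruptedException {
        // Subscriber: SUBSCRIBE blocks its connection, so it runs in its own thread.
        new Thread(() -> {
            try (Jedis subscriber = new Jedis("localhost", 6379)) {
                subscriber.subscribe(new JedisPubSub() {
                    @Override
                    public void onMessage(String channel, String message) {
                        System.out.println("received on " + channel + ": " + message);
                    }
                }, "channel1");
            }
        }).start();

        Thread.sleep(500); // crude wait so the subscription is in place before publishing

        // Publisher: PUBLISH delivers the message to every client subscribed to channel1.
        try (Jedis publisher = new Jedis("localhost", 6379)) {
            publisher.publish("channel1", "hello subscribers");
        }
    }
}
```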

Data types supported by Redis databases

  • String: a String, integer, or floating point number
  • Hash: An unordered Hash table containing key-value pairs
  • List: a linked List in which each node is a string
  • Set: an unordered collection of unique strings
  • Zset: an ordered (sorted) set

String

Redis implements strings as dynamic strings (SDS):

  • Modifying a string cannot cause a buffer overflow
  • Getting the string length takes O(1) time
  • Space pre-allocation and lazy space release: a free field keeps spare space by default, so repeated memory reallocations are avoided

Application Scenarios:

  • Strings are used to cache structured user information and to implement counters

Hash

The hash type is implemented as an array plus linked lists, with some rehash optimizations:

  1. Redis hashes resolve collisions with separate chaining and do not add a red-black tree optimization.
  2. The hash table nodes in a bucket form a singly linked list.
  3. Rehash is optimized with a divide-and-conquer approach: the large migration job is spread across individual CRUD operations so the server is never tied up by one big rehash.

Rehash indicates that when the hash table load factor reaches the upper limit, the hash table automatically doubles the capacity (number of buckets) and redistributes the original objects to new buckets.

Application Scenarios:

  • Store structured information so that individual fields can be read without serializing and deserializing the whole object (see the sketch below)
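
A minimal sketch of that scenario, assuming the Jedis client and a local Redis instance; the key and field names are hypothetical:

```java
import java.util.Map;
import redis.clients.jedis.Jedis;

public class HashExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.hset("user:1001", "name", "alice");     // store fields individually, no serialization
            jedis.hset("user:1001", "city", "Beijing");
            String name = jedis.hget("user:1001", "name");        // read one field
            Map<String, String> all = jedis.hgetAll("user:1001"); // or the whole object
            System.out.println(name + " / " + all);
        }
    }
}
```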

List

Application Scenarios:

  • For example, Twitter-style following lists and follower lists can be implemented with the redis list structure (see the sketch after this list)
  • A list is implemented as a doubly linked list, so it supports lookup and traversal in both directions
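
A minimal sketch of a follower list, assuming the Jedis client and a local Redis instance; the key and member names are hypothetical:

```java
import java.util.List;
import redis.clients.jedis.Jedis;

public class FollowerListExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.lpush("followers:alice", "bob", "carol", "dave");    // newest follower at the head
            List<String> page = jedis.lrange("followers:alice", 0, 9); // first page of followers
            System.out.println(page);
        }
    }
}
```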

Set

Internally, a set is a hash map whose values are null; hashing is what makes deduplication fast, and it is also why a set can quickly tell whether a member is present.

Application Scenarios:

  • Deduplication, plus intersection (sinter), union (sunion), and difference (sdiff) operations (see the sketch after this list)
  • For example, mutual follows, shared interests, and second-degree friends
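
A minimal sketch of the intersection/union/difference operations, assuming the Jedis client and a local Redis instance; the key and member names are hypothetical:

```java
import redis.clients.jedis.Jedis;

public class CommonFollowsExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.sadd("follows:alice", "u1", "u2", "u3");
            jedis.sadd("follows:bob", "u2", "u3", "u4");
            System.out.println(jedis.sinter("follows:alice", "follows:bob")); // common follows
            System.out.println(jedis.sunion("follows:alice", "follows:bob")); // followed by either
            System.out.println(jedis.sdiff("follows:alice", "follows:bob"));  // followed by alice only
        }
    }
}
```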

Zset

A zset uses a hash map and a skip list internally to store members and keep them ordered:

  • The hash map stores the mapping from member to score; the score is the basis for sorting.
  • The skip list stores all the members; each node keeps multiple pointers to other nodes, which allows nodes to be reached quickly.

Application Scenarios:

  • Implement delay queue
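
A minimal sketch of a zset-based delay queue, assuming the Jedis client and a local Redis instance; the key and task names are hypothetical:

```java
import redis.clients.jedis.Jedis;

public class DelayQueueExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Producer: the score is the timestamp at which the task becomes due.
            jedis.zadd("delay-queue", System.currentTimeMillis() + 5000, "task:send-email");

            // Consumer: periodically fetch tasks whose due time has passed.
            for (String task : jedis.zrangeByScore("delay-queue", 0, System.currentTimeMillis())) {
                if (jedis.zrem("delay-queue", task) > 0) { // only one consumer wins the removal
                    System.out.println("processing " + task);
                }
            }
        }
    }
}
```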

Skip list

  • A skip list is an extension of a sorted linked list.
  • A skip list trades storage space for performance by building indexes (possibly multiple levels of indexes) on top of the linked list.
  • The indexes take up memory, but while the original linked list may store very large objects, an index node only stores a key and a few pointers, not the object itself. So when the nodes are large or the elements are numerous, the advantages are amplified and the extra memory can be ignored.

Redis cluster

Master-slave backup

In Redis, you can run the SLAVEOF command or set the slaveof configuration option to make one server replicate another. The server being replicated is called the master, and the servers that replicate the master are called slaves.

Format: run SLAVEOF <master-host> <master-port> on the slave.

Redis can use master/slave synchronization and slave/slave synchronization:

  • RDB image synchronization: the master performs a bgsave and records subsequent write operations in an in-memory buffer; when the save completes, the RDB file is synchronized in full to the replica, which loads the RDB image into memory.
  • AOF command synchronization: the synchronization is completed by having the master forward the write operations recorded during the sync to the replica for replay.

Bgsave

The bgsave command asynchronously saves the current data set to disk in the background. Both save and bgsave call the rdbSave function, but they call it differently:

  • save calls rdbSave directly and blocks the Redis main process until the save completes; while the main process is blocked, the server cannot handle any client requests.
  • bgsave forks a child process that calls rdbSave and then signals the main process when the save is complete; the Redis server can keep handling client requests while bgsave runs (see the sketch after this list).
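
A minimal sketch of triggering a background save from a client, assuming the Jedis client and a local Redis instance:

```java
import redis.clients.jedis.Jedis;

public class BgsaveExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            System.out.println(jedis.bgsave());   // forks a child to write the RDB file; returns immediately
            System.out.println(jedis.lastsave()); // unix timestamp of the last successful save
        }
    }
}
```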

The underlying principles of Redis clustering

  • Sharding: data is automatically partitioned, with each master holding part of the data.
  • Hash slots: the cluster provides 16384 slots, and a hash of the key determines which slot a piece of data is placed in.

There are three stages of Redis clustering:

  • Master/slave replication: reads and writes can be separated.
  • Sentinel mode: the master and slaves can fail over automatically, making the system more robust and more available.
  • Redis Cluster: distributed redis storage, with data spread across nodes.

Redis clustering solution:

  • The official Redis Cluster solution
  • The Twemproxy proxy solution: twemproxy is a single point and easily comes under heavy pressure, so it is usually combined with keepalived to make twemproxy highly available
  • Codis, another proxy-based sharding solution

Redis clustering features

  • High availability: if a master goes down, a slave is automatically promoted to master and continues serving requests.
  • Scalability: when a single redis instance runs out of memory, the cluster shards the data across nodes.

Redis clustering does not guarantee strong data consistency

  • Under certain conditions, a Redis cluster can lose write commands that have already been executed.
  • Redis replication is asynchronous: the master replies to the client as soon as it has processed a write command, without waiting for the slave to finish replicating it.

Redis Cluster

Redis Cluster is the official multi-node deployment solution for Redis. Six instances are recommended: three masters and three slaves.

In the Redis Cluster framework:

  • Redis Cluster nodes share information through the meet operation, so each node knows which node is responsible for which range of slots.
  • By default, the redis-master nodes in a Redis Cluster receive both reads and writes, while the redis-slave nodes act as backups; a request sent to a slave is redirected to the master responsible for that key.
  • When real-time accuracy of the data is not critical, the readonly command can make a slave readable, so the relevant keys can be fetched directly from the slave to achieve read/write separation.

Cache and database consistency issues

CAP principle

The CAP principle says that a storage system providing data services cannot satisfy all three of the following at the same time:

  • C (consistency): every application sees the same data.
  • A (availability): any application can read and write at any time.
  • P (partition tolerance): the system can scale linearly across network partitions (in layman's terms, the amount of data can keep growing).

Large web sites usually sacrifice C and choose AP, and then take various measures to limit the impact of data inconsistency. Consistency comes in several degrees:

  • Strong consistency: the replicas in physical storage are always identical.
  • User-perceived consistency: replicas in physical storage may differ, but error correction and validation ensure that a single consistent, correct value is returned to the user.
  • Eventual consistency: physically stored data may be inconsistent, and end users may temporarily see inconsistent values, but the data becomes consistent over time.

Because Redis replication is asynchronous, Redis provides eventual consistency.

Solutions to cache consistency:

  • Delayed double-delete policy
  • Update the cache via message queues
  • Synchronize the mysql database to redis through the binlog

Delayed double-delete policy

A write operation performs the following steps (see the sketch after this list):

  1. Evict the cache
  2. Write the database
  3. Sleep for about 1 second, then evict the cache again
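
A minimal sketch of these three steps, assuming the Jedis client and a local Redis instance; updateDatabase() is a hypothetical placeholder for the real MySQL write:

```java
import redis.clients.jedis.Jedis;

public class DelayedDoubleDelete {
    // Hypothetical placeholder for the real database write, e.g. UPDATE user SET name = ? WHERE id = ?
    static void updateDatabase(long userId, String name) { /* ... */ }

    public static void main(String[] args) throws InterruptedException {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String cacheKey = "user:42";
            jedis.del(cacheKey);          // 1. evict the cache
            updateDatabase(42, "alice");  // 2. write the database
            Thread.sleep(1000);           // 3. wait for in-flight reads to finish ...
            jedis.del(cacheKey);          //    ... then evict again to clear any stale value they cached
        }
    }
}
```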

Next, let's see why the strategy is to evict the cache first and write the database afterwards, rather than to write the database and then update the cache.

1. Concurrency direction: consider what can happen if we write the database and then update the cache.

  • Thread A updates the database;
  • Thread B updates the database;
  • Thread B updates the cache;
  • Thread A updates the cache (delayed by a network fluctuation).

A should have updated the cache before B, but because of the network delay B's update lands first, so the cache ends up holding dirty data. Worse, the dirty value stays there until the cache entry expires, so the service can be affected for a long time.

2. Business direction (why evict the cache instead of updating it)

  • The purpose of caching is to improve read performance. If the cache is updated frequently but never read, that work is wasted. So the cache should be populated by read operations, and write operations should simply evict it.
  • Sometimes extra conversion work is done when building the cached value; recomputing it immediately on every write would also waste performance.

Evicting the cache first and writing the database afterwards is not a perfect solution either, but it is the most reasonable one. It has the following problem case:

  1. Write request A performs write operations and deletes the cache.
  2. Read request B finds that the cache does not exist.
  3. Read request B queries the database and gets the old value;
  4. Read request B writes the old value to the cache;
  5. Write request A writes the new value to the database (if nothing more is done, the database and the cached data are now inconsistent).

This leads to inconsistencies.

The delayed double-delete policy solves exactly this inconsistency, which can occur when the cache is evicted before the database is written: write request A sleeps for about one second and then evicts the cache again:

  • With this approach, the data may be inconsistent for up to about 1 second after the first write (1 second minus the duration of the read request); after the second eviction, the next read repopulates the cache and the database and cache are consistent again.
  • The 1 second is chosen so that the concurrent read request has finished (reads usually take a few hundred milliseconds), which lets the write request's second eviction remove any dirty data that the read request cached.

There is also an extreme case: if the second eviction fails, the database and cache stay inconsistent indefinitely, so:

  • The cache is set to expire
  • Set up a retry mechanism or use message queuing to ensure that the cache is eliminated.

Updating the cache through a message queue

Message queue middleware can be used to keep the database data and the cache data consistent:

  • The cache is updated asynchronously, which reduces coupling in the system
  • But it can break the ordering of data changes
  • The cost is relatively high

Synchronizing MySQL to redis through the binlog

Every change made to the MySQL database is recorded in the binlog: inserts, deletes, and updates all leave a record there, and database replication itself is based on the binlog. The same mechanism can be used to synchronize data into redis:

  • When mysql is not under heavy pressure, the delay is low;
  • It is completely decoupled from the business;
  • The ordering problem is solved;
  • But the cost is relatively high.

Caching mechanisms

A high-performance website usually adopts a caching architecture: caching buys performance by trading space for time.

The difference between storage and caching

  • Storage requires that data be persistent and cannot be easily lost
  • Storage must preserve the integrity of data structures, so it needs to support more data types

Four levels of cache

  • Client-side caching based on devices such as browsers
  • Network-layer caching based on CDN acceleration: pages can be cached on the CDN
  • Routing-layer caching based on load-balancing components such as Nginx
  • Business-layer caching based on Redis, and so on

The business layer cache can be subdivided into three levels of cache:

  • Level 1 cache (session-level cache): while a session is open, query results are stored in the level 1 cache and reused on the next access.
  • Level 2 cache (application-level cache): when the session is closed, data from the level 1 cache is stored into the level 2 cache.
  • Level 3 cache (database-level cache): can span JVMs, with data synchronized through remote calls.

Common caching problems

Cache avalanche: a cache avalanche occurs when a large portion of the cache expires within the same short period, so the requests all fall through to the database. Solutions:

  • Set different expiration times for different records according to business characteristics.
  • When concurrency is not very high, use locking and queuing.
  • Give each cached entry a flag that records whether it is stale, and refresh the cached data when the flag shows it has expired.

Cache warm-up: a freshly started cache system holds no data, and rebuilding the cache on demand puts pressure on both system performance and the database. Solutions:

  • When the cache system starts, it loads hot data, such as metadata – a list of city names, category information, and so on.

Cache penetration: cache penetration means querying for data that does not exist in the database at all, so every such request falls through to the database. A malicious attacker can exploit this to put pressure on the database or even crush it; even with UUID keys it is easy to construct keys that do not exist and attack with them. Solutions:

  1. When the web server starts, write data that is likely to be accessed frequently and concurrently into the cache in advance.
  2. If the object queried from the database is empty, cache it anyway, but with a shorter expiration time such as 60 seconds, so that large numbers of null keys do not take up cache space (see the sketch after this list).
  3. Standardize key naming, and use a Bloom filter over the well-defined key space to detect and filter malicious requests.
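
A minimal sketch of caching empty results (point 2), assuming the Jedis client and a local Redis instance; loadFromDatabase() and the key prefix are hypothetical placeholders:

```java
import redis.clients.jedis.Jedis;

public class NullCachingExample {
    // Hypothetical placeholder for the real database query; returns null when the id does not exist.
    static String loadFromDatabase(String id) { return null; }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String id = "no-such-id";
            String key = "product:" + id;
            String value = jedis.get(key);
            if (value == null) {                       // cache miss
                value = loadFromDatabase(id);
                if (value == null) {
                    jedis.setex(key, 60, "");          // cache the miss briefly so repeated lookups
                } else {                               // for a missing id do not hit the database
                    jedis.setex(key, 3600, value);
                }
            }
            System.out.println(value == null || value.isEmpty() ? "not found" : value);
        }
    }
}
```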

Cache breakdown: cache breakdown happens when a single key is very hot and a large number of concurrent requests concentrate on it; the moment that key expires, the continuous stream of requests pierces the cache and hits the database directly. Solutions:

  • Set a long life cycle or never expire for hot data.

Concurrent key contention: multiple subsystems set the same key at the same time. There are two main solutions:

  • A distributed lock, for example built on the redis setnx command described earlier in this post
  • A message queue: serialize the parallel reads and writes through message middleware for sequential processing