The most commonly used Redis data types are String, Hash, List, Set, and SortedSet (a Set ordered by score).

Redis common data types

Today we are going to introduce the data types of Redis, along with some typical scenarios in which each of them is used:

String: the most basic data type. It is binary-safe, a single String value can store up to 512 MB, and a String can hold any kind of data, whether a JPG image or a serialized object.

If you want to count the number of daily visits to your website, how should you do it? Use UserId + date as the Key, initialize the value to 0, and increment the Key's value by 1 (INCR) every time the user visits.
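A minimal sketch of that counter, assuming the Jedis client (the key format and connection details here are illustrative):

import redis.clients.jedis.Jedis;

public class DailyVisitCounter {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            // Key is userId + date; INCR creates the key at 0 if it is missing and then adds 1
            String key = "visit:10086:2023-01-01";
            long visitsToday = jedis.incr(key);
            System.out.println("visits today: " + visitsToday);
        }
    }
}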

String data structures:

struct sdshdr {
    // length of the string stored in buf
    int len;
    // free space remaining in buf
    int free;
    // data space
    char buf[];
};

Hash: the Hash type is a map of String fields to String values, which makes it well suited to storing objects.

List: a list of Strings ordered by insertion order; a single list can hold up to about 4 billion (2^32 - 1) members. Lists can be used to show the latest messages, with the newest messages displayed first.

Set: an unordered collection of String elements, implemented with a hash table; members cannot repeat.

Redis provides intersection, union, and difference operations on Sets, which makes it very easy to implement features such as mutual follows and shared interests.
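For example, mutual follows can be computed with a single set intersection. A sketch assuming the Jedis client (the key names are made up for illustration):

import java.util.Set;
import redis.clients.jedis.Jedis;

public class CommonFollows {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            jedis.sadd("follows:userA", "tom", "jerry", "spike");
            jedis.sadd("follows:userB", "jerry", "tyke");
            // SINTER returns the accounts both users follow
            Set<String> common = jedis.sinter("follows:userA", "follows:userB");
            System.out.println(common); // [jerry]
        }
    }
}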

SortedSet: Sort the members of the collection from smallest to largest by score

In fact, Redis also supports several other types, such as HyperLogLog for approximate counting and Geo for storing geographic location information.

Redis massive data query for a fixed prefix Key

Let’s start with a script that inserts 20 million pieces of data into Redis:

for ((i=1; i<=20000000; i++)); do echo "set k$i v$i" >> /tmp/redisTest.txt; done

This generates 20 million SET commands with key=kn and value=vn and writes them into the file /tmp/redisTest.txt.

vim /tmp/redisTest.txt
:set fileformat=dos    # save the file with DOS (\r\n) line endings so redis-cli --pipe can read it

Using the pipe mode provided by redis-cli, the commands in the file are fed in batches; loading all the data takes roughly 10 minutes:

cat /tmp/redisTest.txt | redis-cli -h <host ip> -p <port> --pipe

The keys command looks like this:

KEYS pattern: finds all the keys matching the given pattern and returns them all at once. If the number of keys is large, this will block the service.

So what to do?

The SCAN command iterates over the keys in the current database incrementally, returning only a small number of elements on each call. It can therefore be used in production without the problem that KEYS has: when run against a large database, KEYS can block the server for several seconds.

SCAN is a cursor-based iterator: each call continues the iteration from the cursor returned by the previous call. A new iteration starts with cursor 0, and the traversal is complete when the command returns cursor 0 again. There is no guarantee how many elements each call returns; pattern (fuzzy) matching is supported, but the number of results per call is not controllable and only follows the COUNT hint with high probability.

Keys may also be returned more than once, so the application simply needs to deduplicate them.
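A sketch of a full SCAN traversal over the fixed-prefix keys above, assuming the Jedis 3.x client:

import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;

public class ScanPrefix {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            Set<String> keys = new HashSet<>();              // deduplicate in the application
            ScanParams params = new ScanParams().match("k*").count(1000);
            String cursor = ScanParams.SCAN_POINTER_START;   // "0"
            do {
                ScanResult<String> page = jedis.scan(cursor, params);
                keys.addAll(page.getResult());
                cursor = page.getCursor();                   // continue from the returned cursor
            } while (!"0".equals(cursor));                   // finished when the cursor is "0" again
            System.out.println("matched keys: " + keys.size());
        }
    }
}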

How does Redis implement distributed locking

This problem has already been discussed in my previous blog post, "Implementing Distributed Locks Based on Redis". Let's review the main commands. SETNX key value: if the key does not exist, create it and set its value; the time complexity is O(1). It returns 1 if the set succeeds and 0 if it fails.

But if one thread successfully sets a Key, won't that Key exist forever, so that no other thread can ever set it? That is what EXPIRE is for: EXPIRE key seconds sets an expiration time for the Key, and when the Key expires it is deleted automatically.

As a result, we can get the pseudo-code as follows:

int status = redisService.setnx(key, "1");
if(status == 1){
    redisService.expire(key, expire);
    //TODO...
}

The problem with this is that if the program crashes right after executing SETNX, the timeout is never set and a deadlock is created. This approach is therefore not desirable: the SETNX and the timeout must be set atomically. Since Redis 2.6.12, the SET command can merge the original SET and EXPIRE into one atomic operation (a sketch follows the option list below):

SET key value [EX seconds] [PX milliseconds] [NX] [XX]

EX seconds: Sets the key expiration time to seconds

PX millisecond: Set the expiration time of the key to millisecond

NX: The key is set only if it does not exist

XX: The operation is performed only when the key already exists

Return value: OK is returned if the set was performed, otherwise nil is returned
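Putting it together, a hedged sketch of acquiring a lock atomically with SET ... NX EX, assuming the Jedis 3.x client (the key, value, and timeout are illustrative):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisLock {
    public static boolean tryLock(Jedis jedis, String lockKey, String requestId, int expireSeconds) {
        // NX: only set if the key does not exist; EX: expire after expireSeconds.
        // Both take effect in one atomic command, closing the SETNX + EXPIRE gap.
        String result = jedis.set(lockKey, requestId, SetParams.setParams().nx().ex(expireSeconds));
        return "OK".equals(result);
    }
}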

Redis avalanche problem

E-commerce sites generally cache the home page and other hot data, and that cache is usually refreshed by a scheduled task (or simply left to expire). Here is the problem: suppose all the home-page Keys have an expiration time of 12 hours and were refreshed at noon, and at midnight there is a big promotion that brings in a flood of users, say 6,000 requests per second. The cache could have handled 5,000 requests per second, but at that moment every Key in the cache has expired. All 6,000 requests per second now fall on the database, which cannot possibly withstand the load; in reality it may crash before the DBA can even react. If nothing special is done, the DBA anxiously restarts the database, only to see it killed again immediately by the new traffic. This is what I understand as a cache avalanche.

In short, a large number of cached keys expire at the same moment, Redis suddenly cannot absorb the load, and requests of that magnitude hitting the database directly are almost catastrophic.

So how do you deal with it? When writing data into Redis in batches, simply add a random value to each Key's expiration time, which guarantees that the keys will not all expire at the same moment.

setRedis(key, value, time + Math.random() * 10000);

If Redis is clustered, evenly distributing hotspot data among different Redis libraries can also prevent all failures.

Alternatively, mark hot data as never expiring and refresh the cache whenever the data is updated (for example, when operations staff update the home-page products, refresh the cache at the same time and do not set an expiration time). Home-page data on e-commerce sites can use this approach as well; it is the safer practice.

Cache penetration and breakdown

Cache penetration happens when the requested data exists neither in the cache nor in the database, yet the user (often an attacker) keeps sending such requests. For example, if our database IDs start at 1, requests for id=-1 or for an absurdly large ID that does not exist will always miss the cache and hit the database; a sustained attack like this puts enormous pressure on the database and can easily bring it down.

There are two solutions to cache penetration:

Method 1: add validation at the interface layer, such as user authentication and parameter checks, and return immediately if validation fails.

Method 2: Redis can also use the Bloom filter to prevent cache penetration. The principle is simple: an efficient data structure and algorithm quickly determines whether your Key could exist in the database at all. If it definitely does not, return immediately; if it might, query the DB, refresh the cache (KV), and return.
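A minimal sketch of the Bloom-filter check, here using Guava's BloomFilter as a stand-in (the cache/DB helpers are hypothetical placeholders):

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class PenetrationGuard {
    // All legal IDs are pre-loaded into the filter, e.g. at application startup
    private static final BloomFilter<String> EXISTING_IDS =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    public static String query(String id) {
        if (!EXISTING_IDS.mightContain(id)) {
            return null;                      // definitely not in the DB: return directly, no DB hit
        }
        String value = getFromCache(id);      // hypothetical cache lookup
        if (value == null) {
            value = getFromDb(id);            // possible false positive: fall through to the DB
            if (value != null) {
                setToCache(id, value);        // refresh the cache and return the value
            }
        }
        return value;
    }

    // hypothetical helpers standing in for the real cache/DB access layer
    private static String getFromCache(String id) { return null; }
    private static String getFromDb(String id) { return null; }
    private static void setToCache(String id, String value) { }
}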

A cache breakdown is a bit like a cache avalanche, but an avalanche means a large number of keys expire at once and crush the DB. Cache breakdown is about a single Key that is extremely hot and constantly carrying a large number of concurrent requests. At the moment that Key expires, the continuous flood of concurrent requests punches through the cache at that one point and falls directly on the database.

Cache breakdown solution:

Set the hotspot data to never expire. Or add a mutex and you’re done:

public static String getData(String key) throws InterruptedException {
    // 1. Query the cache first
    String result = getDataByKV(key);
    if (StringUtils.isBlank(result)) {
        if (reenLock.tryLock()) {
            try {
                // 2. Cache miss: this thread rebuilds the cache from the DB
                result = getDataByDB(key);
                if (StringUtils.isNotBlank(result)) {
                    // 3. Write the result back into the cache
                    setDataToKV(key, result);
                }
            } finally {
                // Release the lock only if we acquired it
                reenLock.unlock();
            }
        } else {
            // 4. Another thread is rebuilding the cache: wait briefly and retry
            Thread.sleep(100L);
            result = getData(key);
        }
    }
    return result;
}

How does Redis do asynchronous queues

Using a list as a queue, RPUSH produces messages and LPOP consumes messages.

RPUSH produces messages and LPOP consumes them, but when there is no message in the queue LPOP does not wait; it returns immediately. The usual workaround is to let the thread sleep for a while and then try LPOP again. Is there a better way? There is:

BLPOP instruction: block until the queue has a message or times out

BLPOP key [key ...] timeout
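A small producer/consumer sketch with RPUSH and BLPOP, assuming the Jedis client (the queue name is illustrative):

import java.util.List;
import redis.clients.jedis.Jedis;

public class SimpleQueue {
    public static void main(String[] args) {
        try (Jedis producer = new Jedis("127.0.0.1", 6379);
             Jedis consumer = new Jedis("127.0.0.1", 6379)) {
            producer.rpush("queue:tasks", "task-1");              // produce a message
            // block for up to 5 seconds waiting for a message
            List<String> msg = consumer.blpop(5, "queue:tasks");  // returns [key, value], or null on timeout
            if (msg != null) {
                System.out.println("consumed: " + msg.get(1));
            }
        }
    }
}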

However, this approach has a drawback: each message can only be consumed by a single consumer. How do we solve that?

Pub/Sub topic subscriber model:

  • The sender (pub) sends the message and the subscriber (sub) receives the message
  • Subscribers can subscribe to any number of channels

Here’s a demonstration:
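A minimal sketch of that demonstration, assuming the Jedis client (the channel name is illustrative; subscribe() blocks, so the subscriber runs in its own thread):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class PubSubDemo {
    public static void main(String[] args) throws InterruptedException {
        // Subscriber thread: subscribe() blocks while listening for messages
        Thread sub = new Thread(() -> {
            try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
                jedis.subscribe(new JedisPubSub() {
                    @Override
                    public void onMessage(String channel, String message) {
                        System.out.println("received on " + channel + ": " + message);
                    }
                }, "news");
            }
        });
        sub.start();
        Thread.sleep(500);                               // crude wait so the subscription registers first

        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            jedis.publish("news", "hello subscribers");  // the sender (pub)
        }
    }
}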

Disadvantages of the Pub/Sub subscriber pattern: message publication is stateless and delivery cannot be guaranteed. To solve this problem, dedicated message-queue middleware such as Kafka must be used.

How does Redis persist

To ensure efficiency, Redis stores data in memory, but periodically writes updated data to disk or writes modification operations to additional record files to ensure data persistence.

Redis has two persistence strategies:

  • **RDB: ** Snapshot format is to directly save memory data to a dump file, periodic saving, saving policies.
  • **AOF: ** Saves all the commands to modify the Redis server in a file, a collection of commands. Redis is the snapshot RDB persistence mode by default.

When Redis restarts, it will use AOF files to restore datasets in preference, because AOF files usually hold more complete datasets than RDB files. You can even turn off persistence so that data is stored only while the server is running.

How RDB works: by default Redis persists data to disk as a snapshot in a binary file named dump.rdb. When Redis needs to persist, it forks a child process, and the child writes the data to a temporary RDB file on disk; when the child finishes writing, the temporary file replaces the original RDB. This takes advantage of copy-on-write, and RDB is perfect for disaster recovery. The downside of RDB is that if you need to minimize data loss in the event of a server failure, RDB is not for you. In addition, each RDB save is a full dump of the in-memory data, whose I/O can severely affect performance.

How AOF works:

appendonly yes          # enable AOF persistence
appendfsync always      # write to the AOF file every time a data change occurs
appendfsync everysec    # fsync once per second, the default AOF policy

AOF persistence is enabled with appendonly yes in the configuration. Every time Redis executes a command that modifies data, the command is appended to the AOF file; when Redis restarts, it reads the AOF file and replays it to rebuild the state just before Redis was shut down. The advantage of AOF is that it makes Redis much more durable, and different fsync policies can be set: the default is fsync once per second, and with that configuration data loss is limited to at most one second in the event of an outage. The disadvantage is that for the same dataset the AOF file is usually larger than the RDB file, and depending on the fsync policy used, AOF may be slower than RDB.

Comparison of the two types of persistence:

1. If you care deeply about your data, but can still afford to lose it for a few minutes, use RDB persistence only.

2. AOF appends every command executed by Redis to disk. Handling large writes will degrade Redis performance

Database backup and disaster recovery: Periodically generating RDB snapshots is very convenient for database backup, and RDB can recover data sets faster than AOF. Redis supports RDB and AOF at the same time. After the system restarts, Redis preferentially uses AOF to recover data to minimize data loss.

The save section of the configuration file defines the conditions that automatically trigger an RDB snapshot (a typical excerpt is shown below); RDB persistence can also be triggered manually with two commands:
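For reference, the classic default save rules look roughly like this (a hedged redis.conf excerpt, since the original screenshot is not reproduced here):

save 900 1       # snapshot if at least 1 key changed within 900 seconds
save 300 10      # snapshot if at least 10 keys changed within 300 seconds
save 60 10000    # snapshot if at least 10000 keys changed within 60 seconds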

SAVE instruction: block the Redis server process until the RDB file is created

BGSAVE directive: Fork out a child process to create the RDB file without blocking the server process

Log rewriting solves the problem of increasing AOF file sizes as follows

  • Call fork() to create a child process
  • The child process writes the new AOF to a temporary file, independent of the original AOF file
  • The main process keeps writing new changes to both memory and the original AOF
  • The master process receives the completion signal of the child’s rewriting of the AOF and synchronizes incremental changes to the new AOF
  • Replace the old AOF file with the new AOF file

The recovery process when RDB and AOF files coexist

Redis default persistence mode: RDB and AOF mixed persistence mode flow

BGSAVE for full persistence, AOF for incremental persistence. Because BGSAVE takes a long time, is not real-time enough, and can cause a lot of data loss problems, AOF is needed for incremental persistence.

Pipeline and master-slave synchronization

A Pipeline is similar to a Linux Pipeline, remember the 20 million data inserts we did earlier?

cat /tmp/redisTest.txt | redis-cli -h <host ip> -p <port> --pipe

Redis is based on a request/response model: each request is processed and answered one at a time. If you need to operate on data in bulk, every single operation pays the cost of a full request/response round trip, so the I/O load becomes very high. To improve efficiency, a Pipeline executes instructions in batches, i.e. it sends multiple commands at once and saves many I/O round trips (provided the batched commands do not depend on one another's results).
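A sketch of batching commands with a pipeline, assuming the Jedis client:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;

public class PipelineDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            Pipeline pipeline = jedis.pipelined();
            for (int i = 1; i <= 1000; i++) {
                pipeline.set("k" + i, "v" + i);       // queued locally, not sent yet
            }
            Response<String> v1 = pipeline.get("k1"); // responses are placeholders until sync()
            pipeline.sync();                          // flush all queued commands in one round trip
            System.out.println("k1 = " + v1.get());
        }
    }
}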

Principle of master-slave synchronization:

The BGSAVE image file is synchronized first, and the incremental data is synchronized during the period.

Fully synchronized process:

1. The Slave sends a SYNC command to the Master

2. The Master starts a background process to save a snapshot of the data in Redis to a file

3. The Master caches the write commands received while the snapshot is being taken

4. After the file is written, the Master sends it to the Slave

5. The Slave loads the snapshot file and replaces its old data with it

6. The Master then sends the buffered incremental write commands to the Slave

Incremental synchronization process:

1. The Master receives the user’s operation command and determines whether to transmit it to the Slave

2. Append the operation record to AOF file

3. Propagate the operation to the other Slaves: write the command into the replication (response) buffer

4. Send cached data to Slave

Redis Sentinel is used to solve the primary/secondary switchover problem when the Master fails. Its responsibilities:

1. Monitoring: check whether the master and slave servers are running properly

2. Alerts: Send fault notifications to administrators or other applications through the API

3. Automatic failover: master/slave switchover

Gossip protocol

Find consistency in clutter

  • Each node periodically communicates with a few randomly chosen nodes, and eventually all nodes converge on the same state
  • Seed nodes periodically send the node list, together with the messages to be propagated, to randomly chosen nodes
  • There is no guarantee that information will be passed to all nodes, but it will eventually converge

This protocol is used in the decentralized implementation of blockchain.

Redis cluster and consistent Hash

How do you quickly find what you need in massive data? Sharding: split the data according to certain rules and store it on multiple nodes. But with conventional hash-modulo partitioning, nodes cannot be added or removed dynamically.

For example, taking userId modulo 2 can spread user data across two different database servers, but the data easily becomes unevenly distributed, and it is hard to add or remove nodes dynamically.

What is consistent hashing? Hash values are taken modulo 2^32, and the hash space is organized into a virtual ring:

The same hash function is used to compute a hash value for each data Key.

To find the node that serves a key, just hash the key and walk clockwise to the nearest node:

So what are the benefits of this?

Now assume that Node C is down, as shown below:

Even if Node C goes down, its keys are simply remapped to the nearest clockwise node, Node D, so the damage is kept to a minimum.

What about adding servers?

If a new server is added, only a small portion of the data needs to move, because each key still maps to the nearest clockwise node.

Consistent hashing also has a drawback: when there are only a few nodes, the data can be distributed very unevenly (data skew). To solve this problem, virtual nodes are introduced.
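A compact sketch of a consistent-hash ring with virtual nodes (the hash function and the replica count are arbitrary choices for illustration):

import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes, String... nodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) {
            addNode(node);
        }
    }

    public void addNode(String node) {
        // Each physical node is mapped to many points on the ring to smooth out data skew
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#VN" + i), node);
        }
    }

    public String getNode(String key) {
        // Walk clockwise: the first ring position >= hash(key) owns the key,
        // wrapping around to the first entry when we run off the end of the ring
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return entry != null ? entry.getValue() : ring.firstEntry().getValue();
    }

    private long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();   // value in [0, 2^32), i.e. a position on the modulo-2^32 ring
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(100, "NodeA", "NodeB", "NodeC", "NodeD");
        System.out.println("user:10086 -> " + ring.getNode("user:10086"));
    }
}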