Preface
This distributed-systems series covers distributed messaging middleware, distributed storage middleware, and distributed frameworks. For messaging middleware I picked the two most commonly used products, already covered in earlier posts for anyone interested:
- Distributed Messaging Middleware (1): Getting Started with RabbitMQ and High Availability in Practice
- Distributed Messaging Middleware (2): Learning Kafka Systematically – Cluster Setup and Usage, the Replication Mechanism, and a Real-Time Log Statistics Pipeline
This article covers the distributed storage middleware Redis. No programmer finds Redis unfamiliar, and Redis articles are everywhere online, arguably to excess, yet they always feel fragmented, so I still want to cover Redis systematically. Space is limited, and this article certainly cannot cover all of Redis, but I will do my best to lay out the important parts systematically, so that readers new to Redis can start writing code after reading it, and readers who already know some Redis can fill in the gaps.
All right, enough talk. Sit tight and let's go!
1. Redis Data Structures
Redis has the following five data structures.
In Redis, every key is a string; the five types describe the value.
1. String
String is the most basic data type in Redis: one key maps to one value.
The String type is binary safe, meaning a Redis string can contain any data, such as numbers, text, JPEG images, or serialized objects.
Common commands: get, set, del, incr, decr, etc.
```
127.0.0.1:6379> set hello world
OK
127.0.0.1:6379> get hello
"world"
127.0.0.1:6379> del hello
(integer) 1
127.0.0.1:6379> get hello
(nil)
127.0.0.1:6379> get counter
"2"
127.0.0.1:6379> incr counter
(integer) 3
127.0.0.1:6379> get counter
"3"
127.0.0.1:6379> incrby counter 100
(integer) 103
127.0.0.1:6379> get counter
"103"
127.0.0.1:6379> decr counter
(integer) 102
127.0.0.1:6379> get counter
"102"
```
Practical use cases:
1. Cache: the classic scenario. Put frequently accessed information such as strings, images, or video metadata into Redis, with Redis as the cache layer and MySQL as the persistence layer, reducing the read/write pressure on MySQL.
2. Counter: Redis is single-threaded, executing one command before starting the next, so incr/decr counts are naturally atomic; the counts can then be flushed to another data source step by step.
3. Session: a common scheme is Spring Session + Redis to implement session sharing.
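To make the cache and counter scenarios concrete, here is a minimal cache-aside sketch using the Jedis client. The key names, the TTL, and the loadUserFromMysql helper are illustrative assumptions, not something from the original setup.

```java
import redis.clients.jedis.Jedis;

public class StringCacheDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("127.0.0.1", 6379);

        // Cache-aside read: try Redis first, fall back to MySQL on a miss
        String key = "user:1001";
        String value = jedis.get(key);
        if (value == null) {
            value = loadUserFromMysql(1001);  // hypothetical persistence-layer call
            jedis.setex(key, 300, value);     // cache the result for 5 minutes
        }

        // Counter: incr is atomic because Redis executes commands one at a time
        long pv = jedis.incr("page:view:home");
        System.out.println(value + ", pv=" + pv);
        jedis.close();
    }

    private static String loadUserFromMysql(long id) {
        return "{\"id\":" + id + ",\"name\":\"hao\"}"; // stand-in for a real DB query
    }
}
```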
2. Hash
A hash is a map: the value itself is a set of field-value pairs, such as value = {{field1, value1}, ..., {fieldN, valueN}}.
Hash commands all start with h: hget, hset, hdel, and so on.
```
127.0.0.1:6379> hset user name1 hao
(integer) 1
127.0.0.1:6379> hset user email1 [email protected]
(integer) 1
127.0.0.1:6379> hgetall user
1) "name1"
2) "hao"
3) "email1"
4) "[email protected]"
127.0.0.1:6379> hget user user
(nil)
127.0.0.1:6379> hget user name1
"hao"
127.0.0.1:6379> hset user name2 xiaohao
(integer) 1
127.0.0.1:6379> hset user email2 [email protected]
(integer) 1
127.0.0.1:6379> hgetall user
1) "name1"
2) "hao"
3) "email1"
4) "[email protected]"
5) "name2"
6) "xiaohao"
7) "email2"
8) "[email protected]"
```
Practical use cases:
1. Cache: more intuitive and more space-efficient than String for structured records, e.g. caching user info, video info, and similar objects.
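As a sketch of the user-info caching idea, assuming Jedis and made-up field names:

```java
import redis.clients.jedis.Jedis;
import java.util.Map;

public class HashCacheDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("127.0.0.1", 6379);

        // One hash per user; each attribute is a field, so a single
        // attribute can be read or updated without touching the rest
        jedis.hset("user:1001", "name", "hao");
        jedis.hset("user:1001", "age", "26");

        jedis.hset("user:1001", "age", "27");              // update one field only
        Map<String, String> user = jedis.hgetAll("user:1001");
        System.out.println(user);                          // {name=hao, age=27}
        jedis.close();
    }
}
```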
3. List
A list is simply a linked list (Redis uses a doubly linked list). It is ordered, values may repeat, elements can be fetched by index, and data can be inserted or removed at both ends.
Common list patterns:
- lpush + lpop = Stack
- lpush + rpop = Queue
- lpush + ltrim = Capped Collection
- lpush + brpop = Message Queue (a sketch follows the use cases below)
Usage:
```
127.0.0.1:6379> lpush mylist 1 2 ll ls mem
(integer) 5
127.0.0.1:6379> lrange mylist 0 -1
1) "mem"
2) "ls"
3) "ll"
4) "2"
5) "1"
```
Practical use cases:
1. Timeline: for example, Weibo's timeline. When someone posts, lpush the post onto the timeline list so the newest entries are shown first.
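Here is a minimal sketch of the lpush + brpop message-queue pattern from the technique list above, using Jedis; the queue name and payload are made up for illustration.

```java
import redis.clients.jedis.Jedis;
import java.util.List;

public class ListQueueDemo {
    public static void main(String[] args) {
        Jedis producer = new Jedis("127.0.0.1", 6379);
        producer.lpush("queue:task", "task-1");   // producer pushes on the left

        Jedis consumer = new Jedis("127.0.0.1", 6379);
        // brpop blocks up to 5 seconds waiting for an element on the right;
        // the result is [key, value]
        List<String> msg = consumer.brpop(5, "queue:task");
        if (msg != null) {
            System.out.println("consumed " + msg.get(1) + " from " + msg.get(0));
        }
        producer.close();
        consumer.close();
    }
}
```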
4. Set
A set also holds multiple string elements, but unlike a list: 1. duplicate elements are not allowed; 2. elements are unordered and cannot be fetched by index. Sets support operations across sets: you can take the intersection, union, and difference of multiple sets.
Usage: set commands start with s: sadd, srem, scard, smembers, sismember.
```
127.0.0.1:6379> sadd myset hao1 xiaohao hao
(integer) 3
127.0.0.1:6379> smembers myset
1) "xiaohao"
2) "hao1"
3) "hao"
127.0.0.1:6379> sismember myset hao
(integer) 1
```
Practical use cases:
1. Tags: tag users, or let users tag content, then use shared or similar tags to recommend content or people to follow (a sketch follows this list).
2. Likes, dislikes, favorites, and the like can all be implemented with sets.
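A small sketch of the tag scenario, assuming Jedis; the tags two users share is a set intersection. The key and tag names are illustrative.

```java
import redis.clients.jedis.Jedis;
import java.util.Set;

public class SetTagDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("127.0.0.1", 6379);

        jedis.sadd("tags:user:1", "java", "redis", "mysql");
        jedis.sadd("tags:user:2", "redis", "go", "mysql");

        // Tags both users share -> a basis for recommendations
        Set<String> common = jedis.sinter("tags:user:1", "tags:user:2");
        System.out.println(common);   // [redis, mysql]
        jedis.close();
    }
}
```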
5. Zset (sorted set)
A sorted set is naturally related to the set and keeps the property that members cannot repeat. The difference is that the elements of a sorted set can be ordered: each element carries a score that serves as the sorting basis.
(Members of a sorted set cannot repeat, but scores can, just as student IDs in a class are unique while exam scores may tie.)
Usage: sorted-set commands all start with z: zadd, zrange, zscore.
```
127.0.0.1:6379> zadd myscoreset 100 hao 90 xiaohao
(integer) 2
127.0.0.1:6379> zrange myscoreset 0 -1
1) "xiaohao"
2) "hao"
127.0.0.1:6379> zscore myscoreset hao
"100"
```
Practical use cases:
1. Leaderboards: the classic sorted-set scenario. For example, a fiction or video site needs to rank works uploaded by users; the ranking can be scored by follower count, update time, word count, and so on.
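A minimal leaderboard sketch with Jedis; the key and member names are illustrative assumptions.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Tuple;
import java.util.Set;

public class ZsetRankDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("127.0.0.1", 6379);

        // Each follow/upvote bumps the work's score
        jedis.zincrby("rank:novel", 1, "novel:A");
        jedis.zincrby("rank:novel", 3, "novel:B");
        jedis.zincrby("rank:novel", 2, "novel:C");

        // Top 3 by score, highest first
        Set<Tuple> top = jedis.zrevrangeWithScores("rank:novel", 0, 2);
        for (Tuple t : top) {
            System.out.println(t.getElement() + " -> " + t.getScore());
        }
        jedis.close();
    }
}
```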
2. The Redis Persistence Mechanism
Redis has two persistence schemes: RDB (Redis DataBase) and AOF (Append Only File). If you just want a quick grasp of RDB and AOF, skip to the summary at the end of this chapter. Here we will work through the key points of Redis persistence via the configuration file, the ways snapshots are triggered, data-recovery operations, command demonstrations, and the pros and cons of each scheme.
1. RDB in detail
RDB is the default Redis persistence scheme. When the specified number of write operations occurs within the specified time window, Redis writes the in-memory data to disk, generating a dump.rdb file in the configured directory. On restart, Redis restores the data by loading dump.rdb.
Learning RDB from the configuration file: open redis.conf and find the SNAPSHOTTING section.
1.1 Configure the core RDB rules (the key part)
```
# save <seconds> <changes>
# save ""
save 900 1
save 300 10
save 60 10000
```
Explanation: save <interval in seconds> <number of update operations>. When the condition is met, the in-memory data is synchronized to disk. The factory defaults write a snapshot when there is 1 change within 900 seconds, 10 changes within 300 seconds, or 10000 changes within 60 seconds. If you do not want the RDB scheme, uncomment save "" and comment out the three save lines below it.
1.2 Specify the local database file name; the default is dump.rdb
```
dbfilename dump.rdb
```
1.3 Specify the directory where the local database file is stored; the default is usually fine
```
dir ./
```
1.4 Data compression is enabled by default
```
rdbcompression yes
```
Description: whether to compress the data when writing it to the local database file; the default is yes. Redis uses LZF compression, which costs a little CPU time; if you disable it, the database file becomes large. Keeping it enabled is recommended.
2. Ways an RDB snapshot is triggered
2.1 The configured number of write operations occurs within the configured interval.
2.2 The save command is executed (blocking: only the snapshot runs while other requests wait) or the bgsave command is executed (asynchronous); a small sketch below shows triggering these from code.
2.3 The flushall command is executed, flushing all data in the database.
2.4 The shutdown command is executed, which saves a snapshot so the server shuts down without losing data.
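As a small illustration of item 2.2, here is a sketch of triggering a snapshot from code with Jedis; the host and port are assumptions.

```java
import redis.clients.jedis.Jedis;

public class RdbTriggerDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("127.0.0.1", 6379);

        Long before = jedis.lastsave();  // unix timestamp of the last successful save
        jedis.bgsave();                  // fork a child and snapshot asynchronously
        // (jedis.save() would do the same synchronously, blocking other commands)
        Long after = jedis.lastsave();   // may still equal 'before' until the background save finishes
        System.out.println(before + " -> " + after);
        jedis.close();
    }
}
```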
3. Restoring data from an RDB file
Copy the dump.rdb file into the bin directory of the Redis installation and restart the Redis service. In real development, dump.rdb is usually backed up elsewhere as well, since the physical machine's disk can fail; you can see this in the operation demo below.
4. Pros and cons of RDB
Advantages: 1. suitable for large-scale data recovery; 2. a good choice when the business has only moderate requirements on data integrity and consistency.
Disadvantages: 1. data integrity and consistency are weaker, because the server may go down some time after the last snapshot; 2. taking a snapshot costs memory, because Redis forks a separate child process that writes the data to a temporary file (for a moment the memory footprint can roughly double) and then replaces the previous backup file with it. It is therefore sensible to schedule Redis persistence and recovery in the dead of night.
Operation demo
```
[root@itdragon bin]# vim redis.conf
save 900 1
save 120 5
save 60 10000
[root@itdragon bin]# ./redis-server redis.conf
[root@itdragon bin]# ./redis-cli -h 127.0.0.1 -p 6379
127.0.0.1:6379> keys *
(empty list or set)
127.0.0.1:6379> set key1 value1
OK
127.0.0.1:6379> set key2 value2
OK
127.0.0.1:6379> set key3 value3
OK
127.0.0.1:6379> set key4 value4
OK
127.0.0.1:6379> set key5 value5
OK
127.0.0.1:6379> set key6 value6
OK
127.0.0.1:6379> SHUTDOWN
not connected> QUIT
[root@itdragon bin]# cp dump.rdb dump_bk.rdb
[root@itdragon bin]# ./redis-server redis.conf
[root@itdragon bin]# ./redis-cli -h 127.0.0.1 -p 6379
127.0.0.1:6379> FLUSHALL
OK
127.0.0.1:6379> keys *
(empty list or set)
127.0.0.1:6379> SHUTDOWN
not connected> QUIT
[root@itdragon bin]# cp dump_bk.rdb dump.rdb
cp: overwrite `dump.rdb'? y
[root@itdragon bin]# ./redis-server redis.conf
[root@itdragon bin]# ./redis-cli -h 127.0.0.1 -p 6379
127.0.0.1:6379> keys *
1) "key5"
2) "key1"
3) "key3"
4) "key4"
5) "key6"
6) "key2"
```
Step 1: edit redis.conf so that 5 changes within 120 seconds trigger persistence. Step 2: restart the service so the configuration takes effect. Step 3: set 5 keys; after two minutes a dump.rdb file is generated in the current bin directory; then set key6 to verify that shutdown can trigger an RDB snapshot. Step 4: back up the current dump.rdb (simulating online practice). Step 5: execute FLUSHALL to clear the database (simulating data loss). Step 6: restart the Redis service to recover the data... and the data is empty?! That is because FLUSHALL itself also triggers an RDB snapshot. Step 7: copy dump_bk.rdb over dump.rdb and restart Redis; the data is back.
Both the SHUTDOWN and FLUSHALL commands trigger an RDB snapshot.
Other commands:
- keys * matches all keys in the database.
- save synchronously (blocking) triggers an RDB snapshot to back up the data.
- FLUSHALL flushes the entire Redis server (rarely used).
2. AOF in detail
AOF is disabled by default in Redis. It was introduced to make up for RDB's weakness (possible data loss and inconsistency), so it records every write operation as a log entry appended to a file. On restart, Redis replays the write commands in the log from front to back to rebuild the data.
1. Learning AOF from the configuration file
Open redis.conf and find the APPEND ONLY MODE section. 1.1 AOF is disabled by default; manually change no to yes.
```
appendonly yes
```
1.2 Specify the AOF file name; the default is appendonly.aof
```
appendfilename "appendonly.aof"
```
1.3 Specify the log update (fsync) conditions
```
# appendfsync always
appendfsync everysec
# appendfsync no
```
Description: always: synchronous persistence; every change is immediately written to disk; poor performance but the best data integrity (slow and safe). everysec: the factory-recommended default; writes are recorded asynchronously once per second. no: Redis never calls fsync itself and leaves flushing to the operating system (fastest, least safe).
1.4 Configure the rewrite trigger mechanism
```
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
```
Explanation: a rewrite is triggered when the AOF file has doubled in size since the last rewrite and the file is larger than 64 MB. In practice the minimum size is usually raised, e.g. to 3 GB; 64 MB is too small.
2. Ways AOF persistence is triggered
According to the configuration file: it can fire on every write (always), once per second (everysec), or never explicitly (no).
3. Restoring data from an AOF file
Normally, copy the appendonly.aof file into the bin directory of the Redis installation and restart the Redis service. In real development, however, the appendonly.aof file may be corrupted and data recovery fails; run redis-check-aof --fix appendonly.aof to repair it. You can see this in the operation demo below.
4. The AOF rewrite mechanism
As mentioned, AOF works by appending writes to a file, so the file grows ever more redundant. Redis is therefore smart enough to add a rewrite mechanism: when the AOF file exceeds the configured threshold, Redis compacts its contents.
How it works: Redis forks a new process that reads the data in memory and writes it to a temporary file, instead of reading the old file, and finally replaces the old AOF file with the new one.
Trigger mechanism: a rewrite fires when the AOF file has doubled since the last rewrite and exceeds 64 MB; both the "doubled" and the "64 MB" can be changed in the configuration file.
5. Pros and cons of AOF
Advantages: better data integrity and consistency. Disadvantages: as the AOF log grows, the file gets larger and data recovery gets slower.
Operation demo
```
[root@itdragon bin]# vim redis.conf
appendonly yes
[root@itdragon bin]# ./redis-server redis.conf
[root@itdragon bin]# ./redis-cli -h 127.0.0.1 -p 6379
127.0.0.1:6379> keys *
(empty list or set)
127.0.0.1:6379> set keyAOf valueAof
OK
127.0.0.1:6379> FLUSHALL
OK
127.0.0.1:6379> SHUTDOWN
not connected> QUIT
[root@itdragon bin]# ./redis-server redis.conf
[root@itdragon bin]# ./redis-cli -h 127.0.0.1 -p 6379
127.0.0.1:6379> keys *
1) "keyAOf"
127.0.0.1:6379> SHUTDOWN
not connected> QUIT
[root@itdragon bin]# vim appendonly.aof
fjewofjwojfoewifjowejfwf
[root@itdragon bin]# ./redis-server redis.conf
[root@itdragon bin]# ./redis-cli -h 127.0.0.1 -p 6379
Could not connect to Redis at 127.0.0.1:6379: Connection refused
not connected> QUIT
[root@itdragon bin]# redis-check-aof --fix appendonly.aof
0x              3e: Expected prefix '*', got: 'f'
AOF analyzed: size=92, ok_up_to=62, diff=30
This will shrink the AOF from 92 bytes, with 30 bytes, to 62 bytes
Continue? [y/N]: y
Successfully truncated AOF
[root@itdragon bin]# ./redis-server redis.conf
[root@itdragon bin]# ./redis-cli -h 127.0.0.1 -p 6379
127.0.0.1:6379> keys *
1) "keyAOf"
```
Step 1: edit the configuration file to enable AOF persistence. Step 2: restart the Redis service and enter the bundled Redis client. Step 3: save a value, then simulate data loss with FLUSHALL and shut the Redis service down. Step 4: restart the service and find that the data has been recovered. (One extra note: some tutorials show the FLUSHALL command being written into the AOF file and breaking data recovery; on my Redis 4.0.2 install this problem did not occur.) Step 5: edit appendonly.aof to simulate file corruption. Step 6: the Redis service fails to restart. This also shows that RDB and AOF can coexist and that the AOF file is loaded first. Step 7: repair the appendonly.aof file with redis-check-aof; after a restart the Redis service is normal again.
Just as redis-check-aof repairs AOF files, redis-check-rdb can repair RDB files.
Summary
- Redis enables RDB persistence by default: when the specified number of write operations occurs within the specified interval, the in-memory data is written to disk.
- RDB persistence suits large-scale data recovery, but its data consistency and integrity are weaker.
- AOF persistence must be enabled manually; by default, write-operation logs are appended to the AOF file once per second.
- AOF's data integrity is higher than RDB's, but its log grows large and slows down data recovery.
- Redis provides a rewrite mechanism to slim down oversized AOF files.
- If you only intend to use Redis as a cache, you can turn persistence off.
- If you plan to use Redis persistence, enabling both RDB and AOF is recommended. RDB is better suited to data backup and keeps a fallback: if the AOF file breaks, the RDB file is still there.
3. The Four Modes of Redis
1. Standalone mode
This is the easiest one to understand.
Install one Redis instance, start it, and have the business call it. The concrete installation and startup steps need no elaboration; any online search will turn them up.
Standalone mode is still useful in many scenarios, for example when high availability is not strictly required.
In fact our own service was using Redis in standalone mode, and my task was to switch it to sentinel mode.
Let's talk about the pros and cons of standalone mode.
Advantages:
- Simple deployment; zero cost to get started.
- Low cost: no replica nodes, no extra overhead.
- High performance: no data synchronization needed, so no consistency overhead.
Disadvantages:
- Reliability is limited; the single node can go down.
- Single-machine performance is capped by the CPU's processing power, since Redis is single-threaded.
Choose standalone mode according to your business scenario; if you need high performance and high reliability, it is not suitable.
2. Master-slave replication
Master-slave replication copies the data of one Redis server to other Redis servers.
The former is called the master node and the latter the slave nodes. Replication is one-way: data flows only from master to slave.
Master-slave configuration is simple: on the slave node, configure the master's IP address and port.
```
slaveof <masterip> <masterport>
```
Start all services on the master and slave nodes; the logs show the replication connections between them.
An obvious question follows: since master-slave replication keeps identical data on master and slaves, there is a data-redundancy problem.
In programming, redundancy is deliberately allowed in exchange for high availability and high performance. Keep that trade-off in mind when designing systems; do not skimp just to save the company resources.
For products that pursue the best user experience, downtime is absolutely unacceptable.
Many system designs adopt the master-slave pattern: one master with multiple slaves attached, and when the master goes down, a new master is elected to keep the service highly available.
Advantages of master-slave mode:
- Backup: once the master node goes down, a slave can step in at any time.
- Read scaling: slaves extend the master's read capacity and share its read load.
- The cornerstone of high availability: beyond the above, master-slave replication is the foundation on which sentinel mode and cluster mode are built, so it is the cornerstone of high availability in Redis.
There are also disadvantages, besides the data redundancy just mentioned:
- Once the master goes down, promoting a slave to master requires changing the master address on the application side and ordering all the other slaves to replicate the new master; the whole process needs manual intervention.
- The master's write capacity is limited by the single machine.
- The master's storage capacity is limited by the single machine.
3. Sentinel mode
As mentioned earlier, in master-slave mode a slave node can take over when the master goes down and continue to provide service.
However, there is a problem: the master's IP address has changed, while the application service still accesses the original master's address.
Thus the sentinel concept was introduced in Redis 2.8.
On top of replication, the sentinel implements automated failure recovery.
As the figure shows, a sentinel deployment consists of two parts, sentinel nodes and data nodes:
- Sentinel nodes: the sentinel system consists of one or more sentinel nodes, special Redis nodes that store no data.
- Data nodes: both the master and the slaves are data nodes.
Applications reach the Redis data nodes through the sentinel cluster, and the sentinels monitor the whole Redis deployment.
Once a problem is found in the Redis deployment, such as the master failure just mentioned, a slave is promoted to take over. Although the master's address changes, the application service is unaware and does not need to change its access address, because it is the sentinels that the application talks to.
The sentinel handles failover well and is a step up for high availability, but it has other capabilities too, such as master liveness detection, master-slave health detection, and master-slave switchover.
The minimum Redis sentinel configuration is one master and one slave.
Let's talk about how monitoring works in sentinel mode.
Each sentinel sends a PING command once per second to all the masters, slaves, and other sentinels it knows about.
If an instance takes longer than the configured down-after-milliseconds value to reply, that sentinel flags it as subjectively down (SDOWN).
If a master is flagged as subjectively down, all the sentinels monitoring it confirm once per second whether it really is subjectively down.
If enough sentinels (at least the quorum specified in the configuration file) agree with that judgment within the specified time range, the master is marked as objectively down (ODOWN).
In general, each sentinel sends an INFO command every 10 seconds to all the masters and slaves it knows about.
When a master is marked objectively down, the sentinels send INFO to all of its slaves once per second instead of once every 10 seconds.
The sentinels negotiate the state of the master; if it is indeed in the SDOWN state, they vote to elect a new master automatically, and the remaining slaves are pointed at the new master for replication.
When there is no longer a sufficient number of sentinels agreeing that the master is down, its objective-down status is removed; when the master starts returning valid replies to a sentinel's PING again, that sentinel removes its subjective-down flag.
Pros and cons of sentinel mode
Advantages:
- Sentinel mode is based on master-slave mode and has all of its advantages.
- Master and slaves switch automatically, making the system more robust and more available.
- Sentinel continuously checks whether the masters and slaves are running properly, and when a monitored Redis server has a problem it can notify administrators or other applications via its API or scripts.
Disadvantages:
- Online capacity expansion is still hard for Redis; once the deployment nears its capacity ceiling, expanding it becomes complicated.
My task
I deployed the Redis service as shown in the figure above: three sentinel nodes plus three data nodes in a master-slave replication group.
Java access goes through Jedis; here is a simple demonstration (not project code):
```java
public static void testSentinel() throws Exception {
    // masterName is usually read from configuration or an environment variable
    String masterName = "master";
    Set<String> sentinels = new HashSet<>();
    // The sentinel addresses are usually read from configuration as well
    sentinels.add("192.168.200.213:26379");
    sentinels.add("192.168.200.214:26380");
    sentinels.add("192.168.200.215:26381");

    // The pool asks the sentinels for the current master's address
    JedisSentinelPool pool = new JedisSentinelPool(masterName, sentinels);
    Jedis jedis = pool.getResource();
    // Write a value to Redis, then read it back
    jedis.set("key1", "value1");
    jedis.get("key1");
}
```
The concrete deployment configuration files are too long to include here.
The task sounded difficult when it was assigned, but the deployment was done by the second day.
In fact it turned out to be an easy job: apply for a Redis cluster and wire up the configuration. Inside the project, the way Redis is used barely changes; previously it pointed at a single standalone node.
Done.
Finishing the leader's task does not end the road of learning Redis, though. Ever-studious Uncle Long went on to research Redis cluster mode.
4. Cluster mode
Master-slave replication cannot fail over automatically, and sentinel mode fixes failover, so why do we still need cluster mode?
Because there are problems neither master-slave nor sentinel solves: a single node's storage capacity is limited, and so is its access capacity.
Redis Cluster mode brings high availability, scalability, distribution, and fault tolerance.
How cluster mode works
Data is sharded across nodes to spread it out, with data replication and failover on top.
In the previous two modes the data lives entirely on one node, whose storage is limited. Cluster mode shards the data: each node stores one shard, and when the data outgrows the shards it can be split across more of them.
How is the data sharded?
The cluster's key space is divided into 16384 hash slots, and keys are hashed into different shards:
```
HASH_SLOT = CRC16(key) & 16383
```
CRC16 is a cyclic redundancy check algorithm; it is not our focus here, but worth a look if you are curious.
Because 16384 is a power of two, the bitwise AND with 16383 gives the same result as taking the value modulo 16384, and bit operations are faster than the modulo operator.
A very important question: why 16384 slots? Interviewers like to drop this one casually.
After sharding, how are reads and writes routed?
Read requests are directed to the slave nodes and write requests to the master node, with data synchronized from master to slaves.
This read/write separation improves concurrency and performance; a client sketch follows.
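A minimal sketch of talking to a cluster with the JedisCluster client, which computes each key's slot and routes the command to the node that owns it; the node addresses are assumptions.

```java
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;
import java.util.HashSet;
import java.util.Set;

public class ClusterDemo {
    public static void main(String[] args) throws Exception {
        // Any subset of the cluster's nodes is enough; the client discovers the rest
        Set<HostAndPort> nodes = new HashSet<>();
        nodes.add(new HostAndPort("192.168.200.213", 7000));
        nodes.add(new HostAndPort("192.168.200.214", 7001));
        nodes.add(new HostAndPort("192.168.200.215", 7002));

        JedisCluster cluster = new JedisCluster(nodes);
        // The client computes CRC16(key) & 16383 and sends the command
        // to the node that owns that slot
        cluster.set("key1", "value1");
        System.out.println(cluster.get("key1"));
        cluster.close();
    }
}
```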
How does it scale horizontally?
Master nodes can be added, and Redis completes the data migration automatically.
Adding a master node requires data migration, but the Redis service does not need to go offline.
For example, with three master nodes the 16384 slots are divided into three ranges, say 0-7000, 7001-12000, and 12001-16383.
If business needs require a fourth master node, the four nodes together then occupy the 16384 slots.
The slots must be reassigned and the data relocated, but the service does not go offline.
Resharding a Redis cluster is performed by redis-trib, Redis's internal cluster-management tool. Redis provides all the commands needed for resharding, and redis-trib works by sending those commands to the nodes.
How does failover work?
If the red node in the figure fails along the way, the slaves under Master3 elect a new master to replace the failed node.
The process is the same as failover in sentinel mode.
4. Redis Cache Penetration, Avalanche, and Breakdown
1. Redis cache penetration
Understanding:
Penetration means requests pass through Redis and hit MySQL directly, usually for a key that does not exist and whose database query returns null. Every such request lands on the database, and under high concurrency the database collapses.
Solutions:
- Cache a null object for that key (a sketch follows this list).
- Validate obviously invalid keys at the logic layer.
- Analyze user behavior to tell deliberate requests, crawlers, and attackers apart, and restrict their access.
- Other options, such as putting a Bloom filter (think of it as a very large HashMap) in front.
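A minimal sketch of the first solution, caching the miss with a short TTL, assuming Jedis and a hypothetical queryDb helper; the key names and TTLs are illustrative.

```java
import redis.clients.jedis.Jedis;

public class PenetrationDemo {
    private static final Jedis jedis = new Jedis("127.0.0.1", 6379);

    static String getById(String id) {
        String key = "item:" + id;
        String cached = jedis.get(key);
        if (cached != null) {
            // "" is our null marker: the DB already told us this id does not exist
            return cached.isEmpty() ? null : cached;
        }
        String value = queryDb(id);        // hypothetical DB lookup
        if (value == null) {
            jedis.setex(key, 60, "");      // cache the miss briefly
        } else {
            jedis.setex(key, 300, value);
        }
        return value;
    }

    private static String queryDb(String id) { return null; } // stand-in
}
```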
2. Redis cache avalanche
Understanding:
In an avalanche, everything comes crashing down at once. Here it means a massive collective expiration of Redis keys: under high concurrency, the suddenly uncached keys slam into MySQL at scale and crash the database. Imagine an aging population: when everyone is 70 or 80 at the same time, nobody is left to work, and the remaining labor force is crushed by the pressure.
Solutions:
- The usual fix is to add a random number to each key's expiration time so the keys expire evenly (a sketch follows this list).
- Consider using queues or locks to keep the rebuild pressure on the program bounded, though this option can hurt concurrency.
- Consider letting hotspot data never expire.
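A tiny sketch of the random-expiry idea; the base TTL and jitter range are assumed values.

```java
import redis.clients.jedis.Jedis;
import java.util.concurrent.ThreadLocalRandom;

public class AvalancheDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("127.0.0.1", 6379);
        for (int i = 0; i < 1000; i++) {
            // Base TTL of 30 minutes plus up to 5 random minutes,
            // so the 1000 keys do not all expire in the same instant
            int ttl = 30 * 60 + ThreadLocalRandom.current().nextInt(5 * 60);
            jedis.setex("hot:key:" + i, ttl, "value-" + i);
        }
        jedis.close();
    }
}
```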
3. Redis cache breakdown
Understanding:
Cache breakdown means one key is extremely hot and continuously carries heavy concurrency, with that concurrency focused on this single point; the instant the key expires, the sustained flood of requests pierces the cache and hits the database directly, like a brute-force punch through one spot.
Breakdown differs from penetration. Penetration means requests bypass Redis (the key was never cached) and crash the database. Breakdown is a head-on punch: massive concurrent reads and writes on a single key, which during the moment of cache expiry sends so many requests to the database that it buckles under the pressure. For example, in a flash sale where a 10,000-yuan Mac is offered for 100 yuan, the orders for that one item will certainly flood in as continuous requests (real flash sales have their own dedicated handling; this is just an illustration). So a cache breakdown is a database crash caused by massive requests on one commonly used key.
Solutions:
- Use a mutex so that requests do not all fall on the DB at the same time (a sketch follows this list).
- A Bloom filter, which tests whether an element is in a set, can screen requests.
- For some cases, set the hot key's cache to never expire.
- Prepare circuit breaking and degradation to keep the system from collapsing.
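A sketch of the mutex idea using SET NX EX as a simple lock, assuming Jedis 3.x's SetParams and a hypothetical loadFromDb helper: only the caller that grabs the lock rebuilds the cache, while the others briefly wait and retry.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class BreakdownDemo {
    private static final Jedis jedis = new Jedis("127.0.0.1", 6379);

    static String getHotKey(String key) throws InterruptedException {
        String value = jedis.get(key);
        if (value != null) {
            return value;
        }
        // Only the caller that sets the lock rebuilds; the lock auto-expires in 10s
        String lockKey = "lock:" + key;
        if ("OK".equals(jedis.set(lockKey, "1", SetParams.setParams().nx().ex(10)))) {
            try {
                value = loadFromDb(key);   // hypothetical DB rebuild
                jedis.setex(key, 300, value);
            } finally {
                jedis.del(lockKey);
            }
            return value;
        }
        Thread.sleep(50);                  // someone else is rebuilding: wait and retry
        return getHotKey(key);
    }

    private static String loadFromDb(String key) { return "value"; } // stand-in
}
```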
5. Bloom Filters
1. Use cases for Bloom filters
For example, consider the following requirements:
① There is a library of 1 billion numbers, and 100,000 new numbers arrive. How do we quickly judge whether those 100,000 numbers are in the 1-billion-number library?
Solution 1: put the billion numbers in a database and query it. Accuracy yes, but the speed suffers.
Solution 2: hold the billion numbers in memory, e.g. in a Redis cache. Sizing the memory: 1 billion * 8 bytes = 8 GB. Accuracy and speed yes, but roughly 8 GB of memory is quite a waste of space.
② If you have touched crawlers, you know this need: when crawling thousands of websites, given a new URL, how do we judge whether we have already crawled it?
The same two solutions apply, and obviously neither is good.
③ Spam email filtering poses the same problem.
So, for such large data sets, how do we judge accurately and quickly whether some data is in the set, without hogging memory? The Bloom filter was born for this.
2. Introduction to Bloom filters
With these questions in mind, let's look at what a Bloom filter is.
A Bloom filter is a data structure consisting of a long binary vector; you can think of it as a binary array. Being binary, each cell holds either 0 or 1, and the default value of every cell is 0.
As follows:
① Adding data
When introducing the concept we said a Bloom filter can be seen as a container; so how is data added to it?
When we add a key to the Bloom filter, we compute several values for it with multiple hash functions, then set the cell at each computed position to 1.
For example, if hash1(key) = 1, change the second cell from 0 to 1 (the array counts from 0); if hash2(key) = 7, set the eighth cell to 1; and so on.
② Judging whether data exists
Now that we know how data is added, how do we judge whether a new piece of data is in the Bloom filter?
Simple: compute the new data's positions with the same hash functions and check whether every corresponding cell is 1. If any cell is not 1, we can say the new data is definitely not in the Bloom filter.
Conversely, if all the positions computed by the hash functions are 1, can we be certain the data is in the Bloom filter?
No, because different data can produce the same hash values, so some positions may have been set to 1 by other data.
We can draw a conclusion: a Bloom filter can assert that some data definitely does not exist, but it cannot assert that data definitely exists.
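To make the add/lookup logic concrete, here is a toy in-memory Bloom filter in plain Java. The two hash functions and the array size are arbitrary choices for illustration, not how Redis or any library actually implements it.

```java
public class ToyBloomFilter {
    private final boolean[] bits = new boolean[1 << 16];   // the "binary array"

    // Two simple hash functions, for illustration only
    private int hash1(String key) {
        return (key.hashCode() & 0x7fffffff) % bits.length;
    }

    private int hash2(String key) {
        int h = 0;
        for (char c : key.toCharArray()) {
            h = h * 131 + c;
        }
        return (h & 0x7fffffff) % bits.length;
    }

    public void add(String key) {           // set every computed position to 1
        bits[hash1(key)] = true;
        bits[hash2(key)] = true;
    }

    public boolean mightContain(String key) {
        // Any 0 -> definitely absent; all 1 -> only *possibly* present
        return bits[hash1(key)] && bits[hash2(key)];
    }

    public static void main(String[] args) {
        ToyBloomFilter f = new ToyBloomFilter();
        f.add("10086");
        System.out.println(f.mightContain("10086"));   // true
        System.out.println(f.mightContain("123456"));  // false with high probability
    }
}
```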
③ Pros and cons of Bloom filters
Advantages: obvious. A binary array takes very little memory, and both inserts and queries are fast.
Disadvantages: as the data grows, the false-positive rate grows; it cannot assert that data definitely exists; and, another important one, it cannot delete data.
3. Implementing a Bloom filter with Redis
① Bitmaps
We know computers use binary bits as the basic unit of underlying storage; one byte equals 8 bits.
For example, the string "big" consists of three characters whose ASCII codes are 98, 105, and 103; the corresponding binary storage is shown below.
In Redis, Bitmaps provide a set of commands that manipulate each bit of a string like the one above.
Setting a bit:
```
setbit key offset value
```
We know the binary form of "b" is 0110 0010. If we set its seventh bit (counting from 0) to 1, we get 0110 0011, which represents the character "c", so the string "big" finally becomes "cig".
Getting a bit:
```
getbit key offset
```
Counting the bits set to 1 in a range:
```
bitcount key [start end]
```
If start and end are not specified, it counts all bits set to 1.
Note: start and end specify byte offsets, not bit-array indexes.
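The same commands through Jedis, as a small sketch; the key name is assumed.

```java
import redis.clients.jedis.Jedis;

public class BitmapDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("127.0.0.1", 6379);

        jedis.set("bigkey", "big");
        jedis.setbit("bigkey", 7, true);          // flip bit 7: 'b' (0110 0010) -> 'c' (0110 0011)
        System.out.println(jedis.get("bigkey"));  // "cig"
        System.out.println(jedis.getbit("bigkey", 7));   // true
        System.out.println(jedis.bitcount("bigkey"));    // number of bits set to 1
        jedis.close();
    }
}
```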
② Redisson
Under the hood, the Redis implementation of a Bloom filter is built on the bitmap data structure. Rather than reinvent that wheel here, let's use a client tool widely adopted in the industry: Redisson.
Redisson is a Java library for operating Redis; with Redisson we can use Redis comfortably from a program.
Let's construct a Bloom filter with Redisson.
```java
package com.ys.rediscluster.bloomfilter.redisson;

import org.redisson.Redisson;
import org.redisson.api.RBloomFilter;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonBloomFilter {

    public static void main(String[] args) {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://192.168.14.104:6379");
        config.useSingleServer().setPassword("123");
        // Construct the Redisson client
        RedissonClient redisson = Redisson.create(config);
        RBloomFilter<String> bloomFilter = redisson.getBloomFilter("phoneList");
        // Initialize: expected insertions 100,000,000, false-positive rate 3%
        bloomFilter.tryInit(100000000L, 0.03);
        // Insert the number 10086 into the filter
        bloomFilter.add("10086");
        // Check whether the following numbers are in the filter
        System.out.println(bloomFilter.contains("123456")); // false
        System.out.println(bloomFilter.contains("10086"));  // true
    }
}
```
This is the single-node Redis implementation. If the data volume is large and the expected error rate must be very low, a single node's memory cannot keep up; in that case you can use a distributed Bloom filter, which can also be built with Redisson. I will not demo that code here; give it a try yourself.
③ Guava
Finally, how do we implement a Bloom filter without Redis?
I am sure you have all used the Guava toolkit provided by Google; it also ships a Bloom filter implementation.
```java
package com.ys.rediscluster.bloomfilter;

import com.google.common.base.Charsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class GuavaBloomFilter {
    public static void main(String[] args) {
        // Expected insertions: 100,000; false-positive rate: 1%
        BloomFilter<String> bloomFilter =
                BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_8), 100000, 0.01);
        bloomFilter.put("10086");
        System.out.println(bloomFilter.mightContain("123456")); // false
        System.out.println(bloomFilter.mightContain("10086"));  // true
    }
}
```
That is it for this article. It covers most of what Redis is commonly used for day to day; I hope it helps you learn Redis.
End