Persistence mechanism

What is persistence

Persistence is the mechanism of writing data to permanent storage media so that the saved data can be restored at a later time.

To protect the data held in memory, Redis saves it to a file on the hard disk. After the server restarts, the data on disk is automatically loaded back into Redis.

Redis supports two types of persistence:

- snapshotting (RDB), the default mode
- append-only file (AOF)

Snapshotting snapshot

This persistence mode is enabled by default. All data in Redis is stored in a single file on the hard disk (named dump.rdb by default).

If the dataset is very large (10-20 GB), it is not suitable to run this persistence operation frequently.

  • You can set the save location and the backup file name.

    ```
    dbfilename dump.rdb    # snapshot file name (default: dump.rdb)
    dir ./                 # directory where the snapshot file is stored
    ```

Other Configuration Items

```
rdbchecksum yes                  # verify the checksum when loading an RDB file
stop-writes-on-bgsave-error yes  # stop accepting writes if a background save fails
```
  • Manually Initiating a Snapshot

    Method 1: from a logged-in client

    ```
    bgsave
    ```

    Method 2: without logging in

    ```
    ./redis-cli -a <password> bgsave
    ```

  • Automatic execution (configuration files)

    Data is dumped from the memory every N minutes or N write operations to form RDB files and compressed to the backup directory.

  • How to enable (default enabled, has its own trigger conditions)

    ```
    save 900 1      # snapshot if at least 1 key changed within 900 seconds
    save 300 10     # snapshot if at least 10 keys changed within 300 seconds
    save 60 10000   # snapshot if at least 10000 keys changed within 60 seconds
    ```

    Note: you can disable snapshot mode by commenting out these trigger conditions.
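The trigger rules above can be sketched as a small check (a toy model, not Redis's actual implementation; the rule list and function name are invented for illustration):

```python
# Toy model of the "save <seconds> <changes>" trigger rules above.
# A snapshot fires when, for any rule, at least `changes` keys were
# modified AND at least `seconds` have passed since the last snapshot.
SAVE_RULES = [(900, 1), (300, 10), (60, 10000)]

def should_snapshot(dirty: int, seconds_since_save: float,
                    rules=SAVE_RULES) -> bool:
    return any(seconds_since_save >= seconds and dirty >= changes
               for seconds, changes in rules)
```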

  • Advantages of RDB

RDB is a compact binary file with high storage efficiency

RDB stores redis data snapshots at a certain point in time, which is very suitable for data backup, full replication and other scenarios

RDB can recover data much faster than AOF

Application: run a bgsave backup on the server every X hours and copy the RDB file to a remote machine for disaster recovery

  • RDB shortcomings

Whether triggered by command or by configuration, RDB snapshots are taken at intervals, so if Redis goes down unexpectedly, all changes since the last snapshot are lost.

The bgsave command forks a child process each time it runs, sacrificing some performance

The RDB file format is not unified across the many versions of Redis, so files produced by different versions of the server may be incompatible with each other

Append-only file (AOF)

Disadvantages of RDB storage

  • The storage efficiency is low due to a large amount of data

    Based on the snapshot concept, all data is read and written each time. When the amount of data is large, the efficiency is very low

  • The I/O performance of a large amount of data is low

  • Creating child processes based on fork incurs additional memory consumption

  • Data loss risks caused by downtime

Essence: record every "write" command (add, modify, delete) in an independent log, and replay the commands in the AOF file on restart to recover the data. In contrast to RDB, it records the operations that generate the data instead of the data itself.

  • AOF has three data-writing strategies:

    ```
    appendfsync always    # fsync on every write command; slowest, but fully durable; not recommended
    appendfsync everysec  # fsync once per second; a good compromise between performance and durability; recommended
    appendfsync no        # leave syncing entirely to the OS; fastest, no durability guarantee
    ```

  • How to enable

    ```
    appendonly yes
    appendfsync everysec
    appendfilename appendonly.aof   # log file name (the path can also be specified)
    ```
  • Aof file rewrite

Problem: every command is appended to the AOF once. If one key is operated on 100 times, producing 100 lines, the AOF file becomes very large.

For example, when `incr number` is executed many times, the AOF file stores the `incr number` command that many times.

This inflates the AOF file. We can rewrite the AOF file to compress the repeated commands into a single command.

For example, ten `incr number` operations (on a key starting at 1) can be compressed into a single `set number 11`.

As commands write to AOF, files get bigger and bigger. To solve this problem, Redis introduced AOF rewriting to reduce file size.

Aof file rewrite is the process of converting data in the Redis process into write commands to synchronize to a new AOF file.

Simply put, it is to convert the execution results of several commands on the same data into the corresponding instructions of the final result data for recording.
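The idea can be sketched with a toy rewriter for counter commands (a hypothetical function, assuming only set/incr/incrby/del appear in the log):

```python
def rewrite_aof(commands):
    """Replay the command log in memory, then emit one `set` per
    surviving key holding its final value."""
    state = {}
    for cmd, key, *args in commands:
        if cmd == "set":
            state[key] = int(args[0])
        elif cmd == "incr":
            state[key] = state.get(key, 0) + 1
        elif cmd == "incrby":
            state[key] = state.get(key, 0) + int(args[0])
        elif cmd == "del":
            state.pop(key, None)
    # the rewritten log records only the final result of each key
    return [("set", key, value) for key, value in state.items()]
```

For instance, `set number 1` followed by ten `incr number` commands collapses to the single command `set number 11`.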

Function:

Reduce disk usage and improve disk utilization

Improves the persistence efficiency, reduces the persistent write time, and improves I/O performance

Reduces the data recovery time and improves the data recovery efficiency

  • Manually execute the override command:

    Logged in: run `bgrewriteaof`

    Not logged in: `./bin/redis-cli -a <password> bgrewriteaof`

  • Perform automatic override conditions:

    ```
    auto-aof-rewrite-percentage 100   # rewrite when the file has grown 100% since the last rewrite
    auto-aof-rewrite-min-size 64mb    # ...and the file is at least 64 MB
    no-appendfsync-on-rewrite yes     # do not fsync the AOF while a rewrite is in progress
    ```

Other problems

  • When RDB is dumped, will AOF be lost if synchronization is stopped?

No. All write operations are buffered in an in-memory queue; once the dump completes, they are applied in one batch.

  • What does AOF rewrite mean

AOF rewriting means writing the current in-memory data set into a new .aof log as a minimal set of commands, to solve the problem of the AOF log growing too large.

  • If both RDB and AOF files exist, who is the first to recover data

AOF

  • Whether the two can be used simultaneously

Yes, and recommended

  • When recovering, which is faster, RDB or AOF

RDB is fast because it is a memory map of data that is loaded directly into memory, whereas AOF is a command that needs to be executed line by line.

Note: if both persistence modes are enabled, AOF takes precedence on recovery. Although snapshot recovery is faster, the AOF log is the more complete record, so Redis loads it instead.

Redis transactions

A Redis transaction is a queue of commands: a series of predefined commands is wrapped into a whole (a queue) and, on execution, they run in order as one unit, without interruption or interference.

Redis vs. MySQL transactions

|           | MySQL             | Redis            |
| --------- | ----------------- | ---------------- |
| open      | start transaction | multi            |
| statement | ordinary SQL      | ordinary command |
| failure   | rollback          | discard          |
| success   | commit            | exec             |

Basic operation

```
set zhao 1000
set wang 2000
multi             // start the transaction
decrby zhao 100   // queued
incrby wang 100   // queued
exec              // actually execute
mget zhao wang    // 900, 2100
```

```
mget zhao wang    // 900, 2100
multi             // start another transaction
decrby zhao 200   // queued
incrby wang 200   // queued
discard           // cancel the transaction and release the queue
mget zhao wang    // 900, 2100 (unchanged)
```

A statement in a transaction can fail in two ways

1. Syntax errors

In this case an error is reported and none of the statements are executed

```
mget zhao wang    // 900, 2100
multi
decrby zhao 200
aghdsajd          // syntax error: the whole transaction is rejected
exec
mget zhao wang    // 900, 2100 (nothing was executed)
```

2. The syntax itself is correct, but the command is applied to the wrong kind of object.

For example, running a set command such as `sadd` on a string key. After `exec`, the valid statements are executed and the inappropriate ones fail.

```
mget zhao wang    // 900, 2100
multi
decrby zhao 200
sadd wang 200     // no syntax error, but wang is a string and sadd is a set command, so it fails at runtime
exec              // on commit, the transaction half succeeds, which somewhat violates atomicity
mget zhao wang    // 700, 2100
```

Watch locks

The watch command monitors one or more keys; if any of them is changed (or deleted) before exec, the subsequent transaction is not executed.

Monitoring lasts until the exec command (the commands in the transaction run after exec, and the watched keys are automatically un-watched once exec has run).

Scene: I’m buying a ticket

ticket -1 , money -100

There is only one ticket; if someone else buys it between multi and exec, it becomes 0.

How do we observe this situation and avoid committing anyway?

Pessimistic view: the world is full of danger and someone is bound to compete with me, so lock the ticket and let only me operate on it (pessimistic lock).

Optimistic view: nobody is likely to compete with me; I just need to watch, and react if anyone changes the ticket's value (optimistic lock).

In Redis transactions, optimistic locks are enabled and only responsible for monitoring whether the key has been changed

```
set ticket 1
set money 100
watch ticket      // start monitoring; if the ticket value changes, the transaction will not run
// ok
multi
decr ticket
decrby money 100
exec              // (nil) if the watched key was modified
```
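The watch/multi/exec behaviour can be modelled as a compare-and-set on a per-key version number (a toy in-memory sketch; the class and method names are invented for illustration):

```python
class WatchedStore:
    """Toy optimistic lock: each write bumps the key's version; a
    transaction commits only if the watched version is unchanged."""
    def __init__(self):
        self.data = {}
        self.version = {}

    def set(self, key, value):
        self.data[key] = value
        self.version[key] = self.version.get(key, 0) + 1

    def watch(self, key):
        # remember the version seen at WATCH time
        return self.version.get(key, 0)

    def execute(self, key, watched, ops):
        if self.version.get(key, 0) != watched:
            return None                 # like EXEC returning (nil)
        for op in ops:
            op()
        return True
```

If another client writes `ticket` between `watch` and `execute`, the transaction returns None, just as Redis's EXEC returns nil.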

A distributed lock

Business scenario: How to avoid the last item being purchased by more than one person at the same time? [Oversold problem]

Solution:

Set a public lock with setnx, relying on the return value of the setnx command (setting fails if the key already has a value, succeeds if it does not)

If the set succeeds, this client has the right to perform the next business operation

If the set fails, the client has no such right; it queues or waits

Release the lock by performing the DEL operation

```php
$redis = new Redis();
$redis->connect('127.0.0.1');
$redis->auth('321612');

// acquire the lock
$lock = $redis->setnx('lock-num', 1);

if ($lock) {
    $num = $redis->get('num');
    if ($num > 0) {
        $redis->decr('num');
    }
    // release the lock
    $redis->del('lock-num');
}
```

A deadlock

With the distributed lock mechanism above, what happens if the client that has already acquired the lock crashes mid-operation?

Analysis:

Because the lock operation is controlled by the user, there is a risk that the lock is not unlocked

The unlocking operation cannot be controlled only by the user. The system must provide the corresponding guarantee solution.

Solution:

Use expire to put a time limit on the lock key:

```
expire key seconds
pexpire key milliseconds
```
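A minimal sketch of a lock that cannot deadlock, following the setnx-plus-expire idea above (the names are invented; a real implementation would set the value and the TTL atomically, e.g. with `SET key value NX EX seconds`):

```python
import time

class TTLLock:
    """Toy lock table: acquire fails while a lock is held and unexpired,
    but a lock left behind by a crashed client expires on its own."""
    def __init__(self):
        self._expiry = {}    # lock name -> expiry timestamp

    def acquire(self, name, ttl_seconds, now=None):
        now = time.time() if now is None else now
        expiry = self._expiry.get(name)
        if expiry is not None and expiry > now:
            return False     # setnx-style failure: lock already held
        self._expiry[name] = now + ttl_seconds
        return True

    def release(self, name):
        self._expiry.pop(name, None)
```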

Deletion policy

Stale data

Redis is a memory-level database: all data lives in memory. A key's expiry state can be queried with the TTL command, which returns:

a positive number: the remaining time to live of volatile (time-limited) data

-1: permanently valid data

-2: expired data, deleted data, or a key that was never defined

Data Deletion Policy

Objectives of the data deletion policy:

Find a balance between memory usage and CPU usage; favoring one at the expense of the other degrades overall Redis performance and can even cause server outages or memory leaks.

  • Timed deletion

Create a timer; when a key expires, the timer task deletes it immediately

Advantages: saves memory; expired data is removed on time, quickly freeing unneeded memory

Disadvantages: heavy CPU pressure. The deletion runs no matter how high the CPU load already is, affecting the Redis server's response time and instruction throughput

Summary: trades processor performance for storage (time for space)

  • Lazy deletion

Data is not touched when it expires; it is checked on the next access:

If the key has not expired, return the data

If it has expired, delete it and return "does not exist"

Advantages: saves CPU; a key is deleted only when it must be

Disadvantages: heavy memory pressure; expired data can occupy memory for a long time

Summary: trades storage for processor performance (space for time)

  • Periodic deletion

Periodically poll the time-limited data in each Redis database, using a random sampling strategy and controlling the deletion frequency by the proportion of expired keys found

Feature 1: CPU usage is capped, and the detection frequency can be customized

Feature 2: memory pressure is not high; cold data that occupies memory for a long time is continuously cleared

Summary: periodically checks storage space (random sampling, focused cleanup)

Principle:

When the Redis server starts and initializes, it reads the configured server.hz value, which defaults to 10

serverCron() runs server.hz times per second:

serverCron() → databasesCron() → activeExpireCycle()

activeExpireCycle() checks each expires[*] dictionary in turn, spending 250ms / server.hz per run

For a given expires[*], W keys are randomly selected and checked:

If a key has expired, it is deleted

If the number of keys deleted in a round is greater than W * 25%, the process is repeated

If the number deleted in a round is <= W * 25%, move on to the next expires[*], cycling through databases 0-15

W = the value of the ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP attribute

The parameter current_db records which expires[*] activeExpireCycle() has reached; if the run's time budget is used up, the next run continues from current_db
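The sampling loop can be sketched as follows (a simplified model of activeExpireCycle; the 20-key sample size and function name are illustrative, and the real cycle is also bounded by a time budget):

```python
import random

def expire_cycle(expires, now, sample_size=20, threshold=0.25):
    """Repeatedly sample keys from `expires` (key -> expiry timestamp),
    delete the expired ones, and stop once a round removes no more
    than 25% of its sample."""
    removed_total = 0
    while expires:
        sample = random.sample(list(expires), min(sample_size, len(expires)))
        removed = 0
        for key in sample:
            if expires[key] <= now:   # key has expired
                del expires[key]
                removed += 1
        removed_total += removed
        if removed <= len(sample) * threshold:
            break                     # few expired keys found; stop this cycle
    return removed_total
```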

  • Deletion policy comparison

|            | Timed deletion                         | Lazy deletion                           | Periodic deletion                                |
| ---------- | -------------------------------------- | --------------------------------------- | ------------------------------------------------ |
| memory     | saved immediately, nothing lingers     | heavy usage                             | cleared periodically and randomly                |
| CPU        | occupied at any moment, high frequency | deferred execution, high CPU efficiency | a fixed slice of CPU per second maintains memory |
| conclusion | trades time for space                  | trades space for time                   | random sampling, focused checks                  |

Internally, Redis combines lazy deletion with periodic deletion

Eviction algorithms

What if I run out of memory when new data comes into Redis?

Redis uses memory to store data and calls freeMemoryIfNeeded() to check that memory is sufficient before executing each command.

If the memory does not meet the minimum storage requirements for newly added data, Redis temporarily deletes some data to clear storage space for the current instruction.

The strategy for cleaning up data is called an eviction algorithm.

Note:

The process of evicting data is not 100% likely to clean up enough usable memory and is repeated if unsuccessful.

When all data has been attempted, an error message will appear if memory cleanup requirements are not met.

Configuration items that affect data eviction:

```
maxmemory <bytes>       // maximum usable memory, as a proportion of physical memory; default 0 means no limit
maxmemory-samples <n>   // number of keys randomly sampled per eviction check
                        // (a full scan would be too slow, so eviction candidates are chosen by random sampling)
maxmemory-policy <p>    // which data to evict once the maximum memory is reached
```

There are three types of eviction strategies:

  • Check volatile data (the data set that may expire, server.db[i].expires)

volatile-lru: evict the least recently used keys

volatile-lfu: evict the least frequently used keys

volatile-ttl: evict the keys closest to expiry

volatile-random: evict keys at random

  • Check the full data set (all data, server.db[i].dict)

allkeys-lru: evict the least recently used keys

allkeys-lfu: evict the least frequently used keys

allkeys-random: evict keys at random

  • Give up on eviction

no-eviction: do not evict anything (the default policy since Redis 4.0) and raise an OOM (Out Of Memory) error instead

Example configuration:

```
maxmemory-policy volatile-lru
```
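The lru idea can be sketched with an exact LRU cache (a toy model; real Redis approximates LRU by sampling maxmemory-samples keys rather than keeping a perfect usage order):

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least recently used key once capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)        # touch: now most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the LRU entry
```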

Data expulsion policy configuration basis

Run the INFO command to output monitoring information, check the number of cache hits and misses, and adjust the Redis configuration according to business needs:

```
keyspace_hits
keyspace_misses
```

Server Configuration

Server setup

```
daemonize yes|no   // run the server as a daemon
bind 127.0.0.1     // bind the host address
port 6379          // server port
databases 16       // number of databases
```

The log configuration

```
loglevel debug|verbose|notice|warning   // server log level
logfile <port>.log                      // log file name
```

Client Configuration

```
maxclients 0   // maximum simultaneous client connections; Redis refuses new
               // connections once the limit is reached; 0 disables the limit
timeout 300    // close a connection after this many seconds of inactivity
```

Quick configuration of multiple servers

Import and load a shared configuration file to create Redis instances quickly: keep the many common Redis settings in one file and maintain it in one place.

```
include /path/server-<port>.conf
```

The advanced data type bitmaps

Function: Used for information state statistics

Basic operation

  • setbit key offset value — set the bit at the given offset of the key (value is 1 or 0)

```
setbit bits 0 1
```
  • getbit key offset — get the bit at the given offset of the key

```
getbit bits 0
// 1

getbit bits 10
// 0
```

Extended operations

Movie website

Statistics on whether a movie is on demand at a given time each day

Count how many movies are shown on demand every day

Count how many movies are on demand per week/month/year

Figure out which movies weren’t on demand this year

  • bitcount key [start end] — count the number of 1 bits in the key

```
setbit 20200808 11 1
setbit 20200808 333 1
setbit 20200808 1024 1

setbit 20200809 44 1
setbit 20200809 55 1
setbit 20200809 1024 1

bitcount 20200808
// 3
bitcount 20200809
// 3

setbit 20200808 6 1
bitcount 20200808
// 4
```
  • bitop op destKey key1 [key2…] — combine the given keys bit by bit with and, or, not, or xor, saving the result in destKey

```
bitop or 08-09 20200808 20200809
```
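The same bookkeeping can be sketched with a plain integer as the bitmap (a toy model of setbit, bitcount and bitop or; the function names are invented):

```python
def setbit(bits: int, offset: int) -> int:
    """Return the bitmap with the bit at `offset` set to 1."""
    return bits | (1 << offset)

def bitcount(bits: int) -> int:
    """Count the 1 bits, like BITCOUNT."""
    return bin(bits).count("1")

def bitop_or(*bitmaps: int) -> int:
    """Union of several bitmaps, like BITOP OR."""
    result = 0
    for b in bitmaps:
        result |= b
    return result
```

Replaying the 20200808/20200809 example above: each day counts 3 movies, and the union across both days counts 5 distinct offsets (1024 is shared).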

The advanced data type HyperLogLog

Purpose: used for cardinality statistics

Cardinality: The cardinality is the number of elements in the dataset after deduplication

Note:

It is used for cardinality statistics only: it is not a collection and does not store the elements themselves, only the estimation state

The core of the algorithm is cardinality estimation, and the final value has some error

Margin of error: The cardinality estimate results in an approximation with a standard error of 0.81%

Space consumption is minimal, with each Hyperloglog key taking up 12K of memory for marking cardinality

A key produced by pfmerge likewise occupies 12 KB of storage, regardless of how much data was merged
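The 0.81% figure comes from the HyperLogLog standard-error formula, roughly 1.04 / sqrt(m), where m is the number of registers. Redis uses m = 16384, which also explains the 12 KB footprint (16384 registers × 6 bits ≈ 12 KB):

```python
import math

def hll_standard_error(registers: int = 16384) -> float:
    """Standard error of a HyperLogLog estimate: ~1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(registers)

# 1.04 / sqrt(16384) = 1.04 / 128 = 0.008125, i.e. about 0.81%
```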

Basic operation

  • pfadd key element [element…] Add data
```
pfadd hll 1
pfadd hll 1
pfadd hll 1
pfadd hll 1
pfadd hll 2
pfadd hll 2
pfadd hll 3
```
  • pfcount key [key…] statistics
```
pfcount hll
// 3
```
  • pfmerge destkey sourcekey [sourcekey…] Merge data

The advanced data type GEO

Purpose: It is used in geographical location information calculation

Basic operation

  • Add the coordinates of the geographic location

    geoadd key longitude latitude member [longitude latitude member …]

```
geoadd geos 1 1 a
geoadd geos 2 2 b
```
  • Gets the coordinates of a geographic location

    geopos key member [member…]

```
geopos geos a
geopos geos b
```
  • Calculate the distance between two positions

    geodist key member1 member2 [unit]

```
geodist geos a b m
geodist geos a b km
```
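Under the hood, geodist computes a great-circle distance from the two members' coordinates. A sketch using the haversine formula (Redis uses a slightly different Earth-radius constant, so results may differ marginally):

```python
import math

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance between two (longitude, latitude) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))
```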
  • Retrieves a set of geographic locations within a specified range according to the latitude and longitude coordinates given by the user

    GEORADIUS key longitude latitude radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [COUNT count] [ASC|DESC] [STORE key] [STOREDIST key]

    ```
    geoadd geos 1 1 1,1
    geoadd geos 1 2 1,2
    geoadd geos 1 3 1,3
    geoadd geos 2 1 2,1
    geoadd geos 2 2 2,2
    geoadd geos 2 3 2,3
    geoadd geos 3 1 3,1
    geoadd geos 3 2 3,2
    geoadd geos 3 3 3,3
    geoadd geos 5 5 5,5

    georadius geos 1.5 1.5 90 km
    // 1,2  2,2  1,1  2,1
    ```
  • Gets a set of geographic locations within a specified range based on a location stored in the location set

    GEORADIUSBYMEMBER key member radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [COUNT count] [ASC|DESC] [STORE key] [STOREDIST key]

    ```
    georadiusbymember geos 2,2 180 km
    // 1,1  1,2  2,1  2,2  1,3  2,3  3,1  3,2  3,3
    ```
  • Returns the geohash value of one or more location objects

    GEOHASH key member [member …]

    ```
    geohash geos 2,2
    // s037ms06g70
    ```

A master-slave replication

To reduce the load on each Redis server, you can add servers and use master-slave mode

One server handles the "write" load (add, modify, delete) while another handles the "read" load, and the master server's data is automatically synchronized to the slave servers.

Redis supports easy-to-use master-slave replication, which allows a slave server to become an exact replica of the master Server.

The role of master-slave replication:

Read/write separation: Master writes data and slave reads data, improving the read/write load of the server

Load balancing: Based on the master-slave structure, with read/write separation, the slave shares the master load and changes the number of slaves according to the change of demand. The data read load is shared by multiple slave nodes, greatly improving the concurrency and data throughput of redis server

Fault recovery: When the master fails, the slave provides services for rapid fault recovery

Data redundancy: Hot data backup is a data redundancy method other than persistence

High availability cornerstone: Based on master/slave replication, build sentinel mode and cluster to realize the high availability solution of Redis

Note:

Redis uses asynchronous replication, which cannot block the master and slave servers

A master server can have multiple slave servers. Not only can the master server have slave servers, but slave servers can also have their own slave servers

There are three phases of master/slave replication:

Establish connection phase

Data synchronization phase

Command propagation phase

Master slave communication process

The configuration steps

Prepare two VMS:

192.168.1.69 — master; 192.168.1.70 — slave

Master server configuration:

Change `bind 127.0.0.1` to `#bind 127.0.0.1`, and `protected-mode yes` to `protected-mode no`:

```
#bind 127.0.0.1
protected-mode no
```

Slave server configuration:

  • Use slaveof to specify the role and the master server's address and port

    ```
    // slaveof <master ip> <master port>
    slaveof 192.168.1.69 6379
    ```
  • Make the slave server read-only

    ```
    // Since Redis 2.6, slave servers support a read-only mode, controlled by
    // the slave-read-only option; it is the default mode for slave servers.
    slave-read-only yes
    ```
  • Specify the password the slave uses to connect to the master

    If the master has a password set via the requirepass option, configure masterauth on the slave so that synchronization can proceed smoothly.

    ```
    masterauth 321612   // the master server's password
    ```
**Verify**: run `info replication` on the slave server. `master_link_status:up` means the configuration succeeded; `down` means it failed.

**Cancel**: simply comment out the configuration above on the slave server.

Notes for the master:

- If the master holds a large amount of data, avoid traffic peaks when synchronizing, so that the master is not blocked and normal services keep running.
- If the replication buffer is sized incorrectly, it can overflow. When a full replication cycle takes too long and data is lost during the subsequent partial replication, a second full replication is forced and the slave falls into a dead loop.
- Keep the master's memory usage moderate: use 50%-70% of the host's memory, and leave 30%-50% for bgsave execution and the replication backlog buffer.

```
repl-backlog-size 1mb   // default 1mb
```

Notes for the slave:

- It is advisable to refuse external service requests during full or partial replication, to avoid blocked responses or interference with data synchronization:

```
slave-serve-stale-data yes|no
```

- During data synchronization the master sends pings to the slave; the master can be understood as a client of the slave, actively sending it commands.
- If multiple slaves request synchronization from the master at the same time, the master sends many RDB files at once, putting heavy pressure on bandwidth. If the master's bandwidth is insufficient, schedule synchronization according to business demand.

## Command propagation phase

- When the master's state changes, the master and slave become inconsistent, and the master must bring the slave back to a consistent state. This synchronization action is called command propagation.
- The master forwards the write commands it receives to the slave, and the slave executes them on arrival.
- Network interruptions during command propagation:
  - brief flickering: ignored
  - short interruption: partial replication
  - long interruption: full replication
- Three core elements of partial replication:
  - the server run ID
  - the master's replication backlog buffer
  - the master's and slave's replication offsets

## Common problems with master/slave replication

**Frequent full replication**, **frequent network interruptions**, **inconsistent data**.

Every time a slave server disconnects, whether actively or through a network fault, it must dump the full RDB from the master again and then apply the AOF; the whole synchronization process starts over. So remember: if you have multiple slave servers, do not start them all at once.

## Sentinel mode

Sentinel is a distributed system used to monitor each server in a master/slave structure. When a failure occurs, a new master is chosen by a voting mechanism and all slaves are connected to it.

The roles of sentinel:

- Monitoring
  - continuously check whether the master and slaves are running properly
  - master survival detection, master and slave health detection
- Notification (alerts)
- Automatic failover
  - disconnect the failed master from its slaves, select a slave as the new master, connect the other slaves to it, and inform clients of the new server address

Note: Sentinel is also a Redis server, but it does not serve data. Sentinels are usually deployed in odd numbers.

## Enabling Sentinel mode

- Configure a one-master-two-slaves structure
- Configure three sentinel.conf files (same configuration, different ports)

```
// sentinel port
port 26379

// working directory
dir /tmp

// mymaster: a custom name for the monitored master
// 2: how many sentinels must agree the master is down, usually (number of sentinels / 2) + 1
sentinel monitor mymaster 127.0.0.1 6379 2

// how long the master may fail to respond before a sentinel considers it down
sentinel down-after-milliseconds mymaster 30000

// how many slaves may resynchronize with the new master at the same time
sentinel parallel-syncs mymaster 1

// failover timeout
sentinel failover-timeout mymaster 180000
```

The sentinels go through three stages during a master/slave switchover:

- Monitoring: synchronize the state information of each node
  - get the state of every sentinel (online or not)
  - get the master's state
    - master attributes: runid, role: master
    - details of each slave
  - get the state of all slaves (based on the master's information)
    - slave attributes: runid, role: slave, master_host, master_port, offset, ...
- Notification
- Failover
  - the sentinels vote for a leader sentinel, which selects the replacement master
  - selection criteria: online; responsive (not slow); not disconnected from the original master for too long; priority; offset; runid
  - the leader then sends the commands (as sentinel):
    - `slaveof no one` to the new master
    - `slaveof <new master ip> <port>` to the other slaves

The cluster

A cluster connects a number of computers over the network and provides a unified management mode, so that externally they present the effect of a single service

Cluster role:

Load balancing is implemented by distributing the access pressure on a single server

The storage pressure of a single server is dispersed to achieve scalability

Reduce service disasters caused by the failure of a single server

Redis cluster structure design

Data storage design

  • Through algorithm design, calculate the location where the key should be saved
  • All storage space plans are cut into 16,384 pieces, and each host saves a portion
    • Each copy represents a storage space, not a storage space for a key
  • Put the key into the corresponding storage space according to the calculated result
  • Enhanced scalability
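The slot calculation itself is CRC16 of the key modulo 16384; a sketch follows (ignoring the {hash tag} rule that lets related keys share a slot):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def keyslot(key: str) -> int:
    """Map a key to one of the 16384 cluster slots."""
    return crc16_xmodem(key.encode()) % 16384
```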

Design of cluster internal communication

  • The nodes communicate with each other, and each stores the slot-number mapping of every node
  • On a hit, the node returns the result directly
  • On a miss, the node tells the client which node holds the slot

Cluster Sets up the cluster structure

A three-master, three-slave (one slave per master) architecture

Configure redis.conf

```
cluster-enabled yes                   // enable cluster mode
cluster-config-file nodes-6379.conf   // cluster configuration file name; this file is generated automatically
cluster-node-timeout 10000            // timeout for a node's service response, in milliseconds
cluster-migration-barrier <count>     // minimum number of slaves a master keeps connected
```

Start the cluster

```
// --replicas 1: each master gets one slave; the first three nodes become masters
./redis-trib.rb create --replicas 1 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 127.0.0.1:6383 127.0.0.1:6384
```

operation

```
// connect with cluster support
redis-cli -c

// set data
set name xiaoming

// get data
get name
```

Enterprise-level solutions

Cache warming

Problem: the server crashes soon after startup

Investigation:

1. The request volume is high

2. The data throughput between master and slave is large, and synchronization runs frequently

Solution:

  • Preparatory work:

    1. Daily routine data access records and hotspot data with high access frequency

    2. Use LRU data deletion strategy to build data retention queue

  • Preparations:

    The data in the statistics result is classified. Redis preferentially loads the hotspot data with a higher level

    Using distributed multiple servers to read data at the same time to speed up the data loading process

  • Implementation:

    Use a script to trigger the data warm-up process automatically

    If conditions permit, use a CDN (content delivery network) for even better results

Conclusion:

Cache warming means loading the relevant cache data into the cache system before the system starts serving traffic. This avoids the pattern where a user's request first queries the database and only then caches the data: users directly query cache data that has already been warmed up.

Cache avalanche

Database server crash:

  1. The system runs smoothly, suddenly the database connection quantity surges

  2. The application server cannot process requests in a timely manner

  3. A large number of 408 and 500 error pages appear

  4. The customer refreshes the page repeatedly for data

  5. Database crash

  6. Application Server Crash

  7. Restarting the application server fails

  8. Redis server crashes

  9. The Redis cluster crashes

  10. After the database is restarted, it is knocked down by instantaneous traffic again

Troubleshoot problems

  1. A large number of keys in the cache expire within a short period of time

  2. Requests for this expired data miss in Redis, so the data is fetched from the database

  3. The database receives a large number of requests simultaneously and cannot process them in a timely manner

  4. Redis has a large number of requests backlogged and starts to time out

  5. The database crashes due to the database traffic surge

  6. No data is available in the cache after the restart

  7. The Redis server resources are heavily occupied and the Redis server crashes

  8. The Redis cluster collapses

  9. The application server fails to receive data and respond to requests in a timely manner. As a result, the number of requests from clients increases, and the application server crashes

  10. The application server, Redis and database are all restarted, but the effect is not ideal

Problem analysis

  • In a short time frame

  • A large number of keys expire in a centralized manner

Solution (strategy)

  1. Render more pages statically

  2. Build a multi-level cache architecture

    Nginx cache + Redis cache + EhCache

  3. Optimize business logic to detect severely time-consuming MySQL operations

    Troubleshoot database bottlenecks, such as timeout queries and time-consuming transactions

  4. Disaster warning system

    Monitor redis server performance metrics

    • CPU occupancy and CPU usage rate

    • Memory capacity

    • Average query response time

    • The number of threads

  5. Traffic limiting and degradation

    Sacrifice some customer experience for a short period of time, restrict access to some requests, reduce the pressure on the application server, and gradually release access after services run at a low speed

Solution (technique)

  1. Switch between LRU and LFU

  2. Data validity period policy adjustment

    • Staggered peaks are classified according to the validity period of service data: 90 minutes for class A, 80 minutes for class B, and 70 minutes for class C

    • Expiration time is in the form of fixed time + random value, diluting the number of expired keys in the set

  3. Super hot data uses a permanent key

  4. Regular maintenance (automatic + manual)

    Analyze the traffic volume of the data that is about to expire, confirm whether the data is delayed, and delay the hot data according to the traffic statistics

  5. lock

    Careful!

Conclusion

A cache avalanche occurs when a very large amount of cached data expires at once, putting sudden pressure on the database server.

If concentrated expiration can be effectively avoided, avalanches can be largely mitigated (about 40%); combine this with the other policies, monitor the server's runtime data, and adjust quickly based on the run records.
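The class-based TTLs and the "fixed time + random value" expiration described above can be sketched as follows; the jitter window is an assumed example:

```python
import random

# Staggered base TTLs per business class, as in the text: A=90, B=80, C=70 minutes
BASE_TTL_MINUTES = {"A": 90, "B": 80, "C": 70}

def expire_seconds(category, jitter_seconds=300):
    """Fixed base TTL plus a random jitter, diluting concentrated expiry."""
    base = BASE_TTL_MINUTES[category] * 60
    return base + random.randint(0, jitter_seconds)

ttl = expire_seconds("A")   # somewhere between 5400 and 5700 seconds
```

The jitter spreads expirations of keys written at the same time across a window instead of a single instant, so the database never sees them all miss at once.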

Cache breakdown

Database server Crash

  1. The system is running smoothly

  2. The number of database connections shot up

  3. Redis server does not have a large number of keys expired

  4. Redis memory is smooth with no fluctuations

  5. The CPU of the Redis server is normal

  6. Database crash

Troubleshoot problems

  1. A key in Redis expired, and that key was under heavy access

  2. Many data requests hit Redis directly, but none was a hit

  3. In a short period of time, a large number of accesses to the same data are initiated against the database

Problem analysis

  • Hot data of a single key

  • The key expired

Solution (technique)

  1. Presetting

    Take e-commerce as an example. According to the store level, each merchant designates several main products and increases the expiration time of such information keys during the shopping festival

    Note: the shopping festival is not just the day itself but also the following days, during which peak traffic tends to taper off gradually

  2. Adjustment

    Monitor the traffic volume and extend the expiration period or set the key as permanent for the data with natural traffic surge

  3. Background Data Refresh

    Start scheduled tasks to refresh data validity periods before peak hours, ensuring no data is lost

  4. Second-level cache

    Set different expiration times so that keys are not evicted at the same time

  5. lock

    Use a distributed lock to prevent breakdown, but watch out for performance bottlenecks. Use with caution!

Conclusion

Cache breakdown occurs at the moment a single piece of hot data expires under heavy traffic: after the Redis miss, a large number of requests for the same data hit the database, putting pressure on the database server.

The coping strategy is mainly prevention based on business data analysis, combined with runtime monitoring and real-time policy adjustment; after all, the expiration of a single key is difficult to monitor.
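The locking approach (solution 5) can be sketched as a double-checked rebuild. This is a single-process illustration: `threading.Lock` stands in for a distributed lock, a dict stands in for Redis, and `query_db` is a placeholder for the real query:

```python
import threading

cache = {}                       # stands in for Redis
rebuild_lock = threading.Lock()  # stands in for a distributed lock

def query_db(key):
    # Placeholder for the real database query
    return f"db-value-of-{key}"

def get_with_lock(key):
    """Only one thread rebuilds an expired hot key; the rest reuse its result."""
    value = cache.get(key)
    if value is not None:
        return value
    with rebuild_lock:
        # Re-check after acquiring the lock: another thread may have rebuilt it
        value = cache.get(key)
        if value is None:
            value = query_db(key)
            cache[key] = value
    return value
```

The second `cache.get` inside the lock is what prevents the stampede: of the many threads that saw the miss, only the first actually queries the database.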

Cache penetration

Database server Crash

  1. The system is running smoothly

  2. The traffic on the application server increases over time

  3. Redis server hit ratio decreases over time

  4. Redis memory is stable, with no memory pressure

  5. Redis server CPU usage increases rapidly

  6. Database server stress spikes

  7. Database crash

Troubleshoot problems

  1. Widespread misses appear in Redis

  2. Abnormal URL access occurred

Problem analysis

  • The requested data does not exist in the database, so the database query returns nothing

  • Redis returns null and does not cache the null result

  • Repeat the process the next time such data arrives

  • A hacker attack occurred on the server

Solution (technique)

  1. Cache null values

    Cache data whose query result is null (for long-term use, cleared periodically). Set a short time limit, such as 30-60 seconds, 5 minutes at most

  2. Whitelist policy

  • Preheat bitmaps for the ids of each data category in advance, using the id as the bitmap offset; this effectively sets up a data whitelist. Normal data passes the check at load time; abnormal data is intercepted directly (low efficiency)

  • Use a Bloom filter (the false-positive issues of Bloom filters are negligible under current conditions)

  3. Monitoring

    Monitor in real time the ratio of the Redis hit ratio (usually a stable fluctuation value when business is normal) to null-data hits

  • Off-peak fluctuation: usually 3-5 times the baseline; above 5 times, add to the key screening list

  • Activity-period fluctuation: usually 10-50 times the baseline; above 50 times, add to the key screening list

    Start a different troubleshooting process depending on the multiple, then use the blacklist for prevention and control (operations work)

  4. Key encryption

    When the problem occurs, temporarily start the disaster-prevention business key: the business layer transmits keys through an encryption service, and a verification program checks each incoming key

    For example, randomly allocate 60 encryption strings every day, pick two or three, and mix them into the page data IDs. If an access key does not match the rules, reject the data access
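A minimal sketch of the key-encryption idea under stated assumptions: HMAC-SHA256 stands in for the (hypothetical) encryption service, the daily strings are random tokens, and only a few are active at a time:

```python
import hashlib
import hmac
import secrets

DAILY_KEYS = [secrets.token_hex(8) for _ in range(60)]  # rotated daily
ACTIVE_KEYS = DAILY_KEYS[:3]                            # the few mixed into page data IDs

def sign(data_id, key):
    # Derive a short access token for a data id from one of the active keys
    return hmac.new(key.encode(), data_id.encode(), hashlib.sha256).hexdigest()[:16]

def verify(data_id, token):
    # Reject any request whose token was not produced with an active key
    return any(hmac.compare_digest(sign(data_id, k), token) for k in ACTIVE_KEYS)
```

A crawler or attacker fabricating ids cannot produce valid tokens without the day's active keys, so fabricated requests are rejected before they reach Redis or the database.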

Conclusion

Cache penetration is access to data that does not exist: it skips the Redis caching stage that legitimate data goes through, so every such access hits the database and pressures the database server. The volume of this kind of data is usually low; alert promptly when the situation occurs.

The countermeasures should focus more on temporary contingency-plan prevention.

Both blacklists and whitelists put pressure on the whole system; remove them as soon as the alarm is cleared.
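The cache-null strategy (solution 1) can be sketched as follows. A dict of `(value, expires_at)` pairs stands in for Redis, and `query_db` is a toy stand-in where only even-numbered ids exist:

```python
import time

NULL_TTL = 60      # short TTL for cached nulls (30-60 seconds per the text)
DATA_TTL = 3600    # normal TTL for real data
cache = {}         # key -> (value, expires_at); stands in for Redis

def query_db(key):
    # Toy database: only even-numbered ids exist
    n = int(key.split(":")[1])
    return f"row-{n}" if n % 2 == 0 else None

def get(key):
    now = time.time()
    hit = cache.get(key)
    if hit is not None and hit[1] > now:
        return hit[0]   # may legitimately be None: a cached null
    value = query_db(key)
    ttl = NULL_TTL if value is None else DATA_TTL
    cache[key] = (value, now + ttl)
    return value
```

Repeated requests for a nonexistent id are absorbed by the cached null for `NULL_TTL` seconds instead of each one reaching the database.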

Performance Indicator Monitoring

  • Performance indicators: Performance

| Name | Description |
| --- | --- |
| latency | Time Redis takes to respond to a request |
| instantaneous_ops_per_sec | Average number of requests processed per second |
| hit ratio (calculated) | Cache hit ratio (must be calculated) |

  • Memory indicators: Memory

| Name | Description |
| --- | --- |
| used_memory | Used memory |
| mem_fragmentation_ratio | Memory fragmentation ratio |
| evicted_keys | Number of keys evicted due to the maximum memory limit |
| blocked_clients | Clients blocked by BLPOP, BRPOP, or BRPOPLPUSH |

  • Basic activity indicators: Basic Activity

| Name | Description |
| --- | --- |
| connected_clients | Number of client connections |
| connected_slaves | Number of slaves |
| master_last_io_seconds_ago | Seconds since the last master-slave interaction |
| keyspace | Total number of keys in the database |

  • Persistence indicators: Persistence

| Name | Description |
| --- | --- |
| rdb_last_save_time | Timestamp of the last persistent save to disk |
| rdb_changes_since_last_save | Number of database changes since the last persistence |

  • Error indicators: Error

| Name | Description |
| --- | --- |
| rejected_connections | Connections rejected because the maxclients limit was reached |
| keyspace_misses | Number of failed key lookups (no match) |
| master_link_down_since_seconds | Duration of master-slave disconnection, in seconds |
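The hit ratio flagged as "calculated" above is not reported directly; it is derived from the `keyspace_hits` and `keyspace_misses` counters in `INFO stats`. A small helper (the sample counts are illustrative):

```python
def hit_ratio(keyspace_hits, keyspace_misses):
    """Cache hit ratio derived from the INFO stats counters."""
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else 0.0

ratio = hit_ratio(9000, 1000)   # 9000 hits out of 10000 lookups -> 0.9
```

In practice the two counters would be read from the output of `redis-cli INFO stats` and the ratio tracked over time; a falling trend is an early warning sign of penetration.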

Commands for monitoring performance indicators

  • benchmark

The command

```
redis-benchmark [-h <host>] [-p <port>] [-c <clients>] [-n <requests>] [-k <boolean>]
```

```
# Default: 50 connections, 100000 requests
redis-benchmark

# 100 connections, 5000 requests
redis-benchmark -c 100 -n 5000
```
  • monitor: displays debugging information about the server in real time

```
monitor
```
  • slowlog [operator]
    • Get: Obtains slow query logs

    • Len: Obtains the number of slow query log entries

    • Reset: Resets slow query logs

```
slowlog get
```

The related configuration

```
slowlog-log-slower-than 1000   # log commands slower than 1000 microseconds
slowlog-max-len 100            # maximum number of slow-log entries kept
```