Persistence mechanism

What is persistence

Persistence is the mechanism of writing data to permanent storage media so that the saved data can be restored at a later time.

To protect the data held in memory, Redis saves it to a file on the hard disk. After the server restarts, the data on disk is automatically loaded back into Redis.

Redis supports two types of persistence:

- snapshotting (RDB), the default mode
- append-only file (AOF)

Snapshotting snapshot

This persistence mode is enabled by default. All data in Redis is stored in a single file on the hard disk (named dump.rdb by default).

If the dataset is very large (10-20 GB), it is not suitable to run this persistence operation frequently.

  • You can set the save location and the backup file name.

    ```
    dbfilename dump.rdb    # snapshot file name (default: dump.rdb)
    dir ./                 # directory where the snapshot file is stored
    ```

Other Configuration Items

```
rdbchecksum yes                  # verify the checksum when loading an RDB file
stop-writes-on-bgsave-error yes  # stop accepting writes if a background save fails
```
  • Manually Initiating a Snapshot

    Method 1: from a logged-in client

    ```
    bgsave
    ```

    Method 2: without logging in

    ```
    ./redis-cli -a <password> bgsave
    ```

  • Automatic execution (configuration files)

    Data is dumped from the memory every N minutes or N write operations to form RDB files and compressed to the backup directory.

  • How to enable (default enabled, has its own trigger conditions)

    ```
    save 900 1      # snapshot if at least 1 key changed within 900 seconds
    save 300 10     # snapshot if at least 10 keys changed within 300 seconds
    save 60 10000   # snapshot if at least 10000 keys changed within 60 seconds
    ```

    Note: you can disable snapshot mode by commenting out these trigger conditions.
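The trigger rules above can be sketched as a small check (a toy model, not Redis's actual implementation; the rule list and function name are invented for illustration):

```python
# Toy model of the "save <seconds> <changes>" trigger rules above.
# A snapshot fires when, for any rule, at least `changes` keys were
# modified AND at least `seconds` have passed since the last snapshot.
SAVE_RULES = [(900, 1), (300, 10), (60, 10000)]

def should_snapshot(dirty: int, seconds_since_save: float,
                    rules=SAVE_RULES) -> bool:
    return any(seconds_since_save >= seconds and dirty >= changes
               for seconds, changes in rules)
```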

  • Advantages of RDB

RDB is a compact binary file with high storage efficiency

RDB stores redis data snapshots at a certain point in time, which is very suitable for data backup, full replication and other scenarios

RDB can recover data much faster than AOF

Application: run a bgsave backup on the server every X hours and copy the RDB file to a remote machine for disaster recovery

  • RDB shortcomings

Whether triggered by command or by configuration, RDB snapshots are taken at intervals, so if Redis goes down unexpectedly, all changes since the last snapshot are lost.

The bgsave command forks a child process each time it runs, sacrificing some performance

The RDB file format is not unified across the many versions of Redis, so files produced by different versions of the server may be incompatible with each other

Append-only file (AOF)

Disadvantages of RDB storage

  • The storage efficiency is low due to a large amount of data

    Based on the snapshot concept, all data is read and written each time. When the amount of data is large, the efficiency is very low

  • The I/O performance of a large amount of data is low

  • Creating child processes based on fork incurs additional memory consumption

  • Data loss risks caused by downtime

Essence: record every "write" command (add, modify, delete) in an independent log, and replay the commands in the AOF file on restart to recover the data. In contrast to RDB, it records the operations that generate the data instead of the data itself.

  • AOF has three data-writing strategies:

    ```
    appendfsync always    # fsync on every write command; slowest, but fully durable; not recommended
    appendfsync everysec  # fsync once per second; a good compromise between performance and durability; recommended
    appendfsync no        # leave syncing entirely to the OS; fastest, no durability guarantee
    ```

  • How to enable

    ```
    appendonly yes
    appendfsync everysec
    appendfilename appendonly.aof   # log file name (the path can also be specified)
    ```
  • Aof file rewrite

Problem: every command is appended to the AOF once. If one key is operated on 100 times, producing 100 lines, the AOF file becomes very large.

For example, when `incr number` is executed many times, the AOF file stores the `incr number` command that many times.

This inflates the AOF file. We can rewrite the AOF file to compress the repeated commands into a single command.

For example, ten `incr number` operations (on a key starting at 1) can be compressed into a single `set number 11`.

As commands write to AOF, files get bigger and bigger. To solve this problem, Redis introduced AOF rewriting to reduce file size.

Aof file rewrite is the process of converting data in the Redis process into write commands to synchronize to a new AOF file.

Simply put, it is to convert the execution results of several commands on the same data into the corresponding instructions of the final result data for recording.
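The idea can be sketched with a toy rewriter for counter commands (a hypothetical function, assuming only set/incr/incrby/del appear in the log):

```python
def rewrite_aof(commands):
    """Replay the command log in memory, then emit one `set` per
    surviving key holding its final value."""
    state = {}
    for cmd, key, *args in commands:
        if cmd == "set":
            state[key] = int(args[0])
        elif cmd == "incr":
            state[key] = state.get(key, 0) + 1
        elif cmd == "incrby":
            state[key] = state.get(key, 0) + int(args[0])
        elif cmd == "del":
            state.pop(key, None)
    # the rewritten log records only the final result of each key
    return [("set", key, value) for key, value in state.items()]
```

For instance, `set number 1` followed by ten `incr number` commands collapses to the single command `set number 11`.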

Function:

Reduce disk usage and improve disk utilization

Improves the persistence efficiency, reduces the persistent write time, and improves I/O performance

Reduces the data recovery time and improves the data recovery efficiency

  • Manually execute the override command:

    Logged in: run `bgrewriteaof`

    Not logged in: `./bin/redis-cli -a <password> bgrewriteaof`

  • Perform automatic override conditions:

    ```
    auto-aof-rewrite-percentage 100   # rewrite when the file has grown 100% since the last rewrite
    auto-aof-rewrite-min-size 64mb    # ...and the file is at least 64 MB
    no-appendfsync-on-rewrite yes     # do not fsync the AOF while a rewrite is in progress
    ```

Other problems

  • When RDB is dumped, will AOF be lost if synchronization is stopped?

No. All write operations are buffered in an in-memory queue; once the dump completes, they are applied in one batch.

  • What does AOF rewrite mean

AOF rewriting means writing the current in-memory data set into a new .aof log as a minimal set of commands, to solve the problem of the AOF log growing too large.

  • If both RDB and AOF files exist, who is the first to recover data

AOF

  • Whether the two can be used simultaneously

Yes, and recommended

  • When recovering, which is faster, RDB or AOF

RDB is fast because it is a memory map of data that is loaded directly into memory, whereas AOF is a command that needs to be executed line by line.

Note: if both persistence modes are enabled, AOF takes precedence on recovery. Although snapshot recovery is faster, the AOF log is the more complete record, so Redis loads it instead.

Redis transactions

A Redis transaction is a queue of commands: a series of predefined commands is wrapped into a whole (a queue) and, on execution, they run in order as one unit, without interruption or interference.

Redis vs. MySQL transactions

|           | MySQL             | Redis            |
| --------- | ----------------- | ---------------- |
| open      | start transaction | multi            |
| statement | ordinary SQL      | ordinary command |
| failure   | rollback          | discard          |
| success   | commit            | exec             |

Basic operation

```
set zhao 1000
set wang 2000
multi             // start the transaction
decrby zhao 100   // queued
incrby wang 100   // queued
exec              // actually execute
mget zhao wang    // 900, 2100
```

```
mget zhao wang    // 900, 2100
multi             // start another transaction
decrby zhao 200   // queued
incrby wang 200   // queued
discard           // cancel the transaction and release the queue
mget zhao wang    // 900, 2100 (unchanged)
```

A statement in a transaction can fail in two ways

1. Syntax errors

In this case an error is reported and none of the statements are executed

```
mget zhao wang    // 900, 2100
multi
decrby zhao 200
aghdsajd          // syntax error: the whole transaction is rejected
exec
mget zhao wang    // 900, 2100 (nothing was executed)
```

2. The syntax itself is correct, but the command is applied to the wrong kind of object.

For example, running a set command such as `sadd` on a string key. After `exec`, the valid statements are executed and the inappropriate ones fail.

```
mget zhao wang    // 900, 2100
multi
decrby zhao 200
sadd wang 200     // no syntax error, but wang is a string and sadd is a set command, so it fails at runtime
exec              // on commit, the transaction half succeeds, which somewhat violates atomicity
mget zhao wang    // 700, 2100
```

Watch locks

The watch command monitors one or more keys; if any of them is changed (or deleted) before exec, the subsequent transaction is not executed.

Monitoring lasts until the exec command (the commands in the transaction run after exec, and the watched keys are automatically un-watched once exec has run).

Scene: I’m buying a ticket

ticket -1 , money -100

There is only one ticket; if someone else buys it between multi and exec, it becomes 0.

How do we observe this situation and avoid committing anyway?

Pessimistic view: the world is full of danger and someone is bound to compete with me, so lock the ticket and let only me operate on it (pessimistic lock).

Optimistic view: nobody is likely to compete with me; I just need to watch, and react if anyone changes the ticket's value (optimistic lock).

In Redis transactions, optimistic locks are enabled and only responsible for monitoring whether the key has been changed

```
set ticket 1
set money 100
watch ticket      // start monitoring; if the ticket value changes, the transaction will not run
// ok
multi
decr ticket
decrby money 100
exec              // (nil) if the watched key was modified
```
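The watch/multi/exec behaviour can be modelled as a compare-and-set on a per-key version number (a toy in-memory sketch; the class and method names are invented for illustration):

```python
class WatchedStore:
    """Toy optimistic lock: each write bumps the key's version; a
    transaction commits only if the watched version is unchanged."""
    def __init__(self):
        self.data = {}
        self.version = {}

    def set(self, key, value):
        self.data[key] = value
        self.version[key] = self.version.get(key, 0) + 1

    def watch(self, key):
        # remember the version seen at WATCH time
        return self.version.get(key, 0)

    def execute(self, key, watched, ops):
        if self.version.get(key, 0) != watched:
            return None                 # like EXEC returning (nil)
        for op in ops:
            op()
        return True
```

If another client writes `ticket` between `watch` and `execute`, the transaction returns None, just as Redis's EXEC returns nil.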

A distributed lock

Business scenario: How to avoid the last item being purchased by more than one person at the same time? [Oversold problem]

Solution:

Set a public lock with setnx, relying on the return value of the setnx command (setting fails if the key already has a value, succeeds if it does not)

If the set succeeds, this client has the right to perform the next business operation

If the set fails, the client has no such right; it queues or waits

Release the lock by performing the DEL operation

```php
$redis = new Redis();
$redis->connect('127.0.0.1');
$redis->auth('321612');

// acquire the lock
$lock = $redis->setnx('lock-num', 1);

if ($lock) {
    $num = $redis->get('num');
    if ($num > 0) {
        $redis->decr('num');
    }
    // release the lock
    $redis->del('lock-num');
}
```

A deadlock

With the distributed lock mechanism above, what happens if the client that has already acquired the lock crashes mid-operation?

Analysis:

Because the lock operation is controlled by the user, there is a risk that the lock is not unlocked

The unlocking operation cannot be controlled only by the user. The system must provide the corresponding guarantee solution.

Solution:

Use expire to put a time limit on the lock key:

```
expire key seconds
pexpire key milliseconds
```
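A minimal sketch of a lock that cannot deadlock, following the setnx-plus-expire idea above (the names are invented; a real implementation would set the value and the TTL atomically, e.g. with `SET key value NX EX seconds`):

```python
import time

class TTLLock:
    """Toy lock table: acquire fails while a lock is held and unexpired,
    but a lock left behind by a crashed client expires on its own."""
    def __init__(self):
        self._expiry = {}    # lock name -> expiry timestamp

    def acquire(self, name, ttl_seconds, now=None):
        now = time.time() if now is None else now
        expiry = self._expiry.get(name)
        if expiry is not None and expiry > now:
            return False     # setnx-style failure: lock already held
        self._expiry[name] = now + ttl_seconds
        return True

    def release(self, name):
        self._expiry.pop(name, None)
```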

Deletion policy

Stale data

Redis is a memory-level database: all data lives in memory. A key's expiry state can be queried with the TTL command, which returns:

a positive number: the remaining time to live of volatile (time-limited) data

-1: permanently valid data

-2: expired data, deleted data, or a key that was never defined

Data Deletion Policy

Objectives of the data deletion policy:

Find a balance between memory usage and CPU usage; favoring one at the expense of the other degrades overall Redis performance and can even cause server outages or memory leaks.

  • Timed deletion

Create a timer; when a key expires, the timer task deletes it immediately

Advantages: saves memory; expired data is removed on time, quickly freeing unneeded memory

Disadvantages: heavy CPU pressure. The deletion runs no matter how high the CPU load already is, affecting the Redis server's response time and instruction throughput

Summary: trades processor performance for storage (time for space)

  • Lazy deletion

Data is not touched when it expires; it is checked on the next access:

If the key has not expired, return the data

If it has expired, delete it and return "does not exist"

Advantages: saves CPU; a key is deleted only when it must be

Disadvantages: heavy memory pressure; expired data can occupy memory for a long time

Summary: trades storage for processor performance (space for time)

  • Periodic deletion

Periodically poll the time-limited data in each Redis database, using a random sampling strategy and controlling the deletion frequency by the proportion of expired keys found

Feature 1: CPU usage is capped, and the detection frequency can be customized

Feature 2: memory pressure is not high; cold data that occupies memory for a long time is continuously cleared

Summary: periodically checks storage space (random sampling, focused cleanup)

Principle:

When the Redis server starts and initializes, it reads the configured server.hz value, which defaults to 10

serverCron() runs server.hz times per second:

serverCron() → databasesCron() → activeExpireCycle()

activeExpireCycle() checks each expires[*] dictionary in turn, spending 250ms / server.hz per run

For a given expires[*], W keys are randomly selected and checked:

If a key has expired, it is deleted

If the number of keys deleted in a round is greater than W * 25%, the process is repeated

If the number deleted in a round is <= W * 25%, move on to the next expires[*], cycling through databases 0-15

W = the value of the ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP attribute

The parameter current_db records which expires[*] activeExpireCycle() has reached; if the run's time budget is used up, the next run continues from current_db
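The sampling loop can be sketched as follows (a simplified model of activeExpireCycle; the 20-key sample size and function name are illustrative, and the real cycle is also bounded by a time budget):

```python
import random

def expire_cycle(expires, now, sample_size=20, threshold=0.25):
    """Repeatedly sample keys from `expires` (key -> expiry timestamp),
    delete the expired ones, and stop once a round removes no more
    than 25% of its sample."""
    removed_total = 0
    while expires:
        sample = random.sample(list(expires), min(sample_size, len(expires)))
        removed = 0
        for key in sample:
            if expires[key] <= now:   # key has expired
                del expires[key]
                removed += 1
        removed_total += removed
        if removed <= len(sample) * threshold:
            break                     # few expired keys found; stop this cycle
    return removed_total
```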

  • Deletion policy comparison

|            | Timed deletion                         | Lazy deletion                           | Periodic deletion                                |
| ---------- | -------------------------------------- | --------------------------------------- | ------------------------------------------------ |
| memory     | saved immediately, nothing lingers     | heavy usage                             | cleared periodically and randomly                |
| CPU        | occupied at any moment, high frequency | deferred execution, high CPU efficiency | a fixed slice of CPU per second maintains memory |
| conclusion | trades time for space                  | trades space for time                   | random sampling, focused checks                  |

Internally, Redis combines lazy deletion with periodic deletion

Eviction algorithms

What if I run out of memory when new data comes into Redis?

Redis uses memory to store data and calls freeMemoryIfNeeded() to check that memory is sufficient before executing each command.

If the memory does not meet the minimum storage requirements for newly added data, Redis temporarily deletes some data to clear storage space for the current instruction.

The strategy for cleaning up data is called an eviction algorithm.

Note:

The process of evicting data is not 100% likely to clean up enough usable memory and is repeated if unsuccessful.

When all data has been attempted, an error message will appear if memory cleanup requirements are not met.

Configuration items that affect data eviction:

```
maxmemory <bytes>       // maximum usable memory, as a proportion of physical memory; default 0 means no limit
maxmemory-samples <n>   // number of keys randomly sampled per eviction check
                        // (a full scan would be too slow, so eviction candidates are chosen by random sampling)
maxmemory-policy <p>    // which data to evict once the maximum memory is reached
```

There are three types of eviction strategies:

  • Check volatile data (the data set that may expire, server.db[i].expires)

volatile-lru: evict the least recently used keys

volatile-lfu: evict the least frequently used keys

volatile-ttl: evict the keys closest to expiry

volatile-random: evict keys at random

  • Check the full data set (all data, server.db[i].dict)

allkeys-lru: evict the least recently used keys

allkeys-lfu: evict the least frequently used keys

allkeys-random: evict keys at random

  • Give up on eviction

no-eviction: do not evict anything (the default policy since Redis 4.0) and raise an OOM (Out Of Memory) error instead

Example configuration:

```
maxmemory-policy volatile-lru
```
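The lru idea can be sketched with an exact LRU cache (a toy model; real Redis approximates LRU by sampling maxmemory-samples keys rather than keeping a perfect usage order):

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least recently used key once capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)        # touch: now most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the LRU entry
```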

Data expulsion policy configuration basis

Run the INFO command to output monitoring information, check the number of cache hits and misses, and adjust the Redis configuration according to business needs:

```
keyspace_hits
keyspace_misses
```

Server Configuration

Server setup

```
daemonize yes|no   // run the server as a daemon
bind 127.0.0.1     // bind the host address
port 6379          // server port
databases 16       // number of databases
```

The log configuration

```
loglevel debug|verbose|notice|warning   // server log level
logfile <port>.log                      // log file name
```

Client Configuration

```
maxclients 0   // maximum simultaneous client connections; Redis refuses new
               // connections once the limit is reached; 0 disables the limit
timeout 300    // close a connection after this many seconds of inactivity
```

Quick configuration of multiple servers

Import and load a shared configuration file to create Redis instances quickly: keep the many common Redis settings in one file and maintain it in one place.

```
include /path/server-<port>.conf
```

The advanced data type bitmaps

Function: Used for information state statistics

Basic operation

  • setbit key offset value — set the bit at the given offset of the key (value is 1 or 0)

```
setbit bits 0 1
```
  • getbit key offset — get the bit at the given offset of the key

```
getbit bits 0
// 1

getbit bits 10
// 0
```

Extended operations

Movie website

Statistics on whether a movie is on demand at a given time each day

Count how many movies are shown on demand every day

Count how many movies are on demand per week/month/year

Figure out which movies weren’t on demand this year

  • bitcount key [start end] — count the number of 1 bits in the key

```
setbit 20200808 11 1
setbit 20200808 333 1
setbit 20200808 1024 1

setbit 20200809 44 1
setbit 20200809 55 1
setbit 20200809 1024 1

bitcount 20200808
// 3
bitcount 20200809
// 3

setbit 20200808 6 1
bitcount 20200808
// 4
```
  • bitop op destKey key1 [key2…] — combine the given keys bit by bit with and, or, not, or xor, saving the result in destKey

```
bitop or 08-09 20200808 20200809
```
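The same bookkeeping can be sketched with a plain integer as the bitmap (a toy model of setbit, bitcount and bitop or; the function names are invented):

```python
def setbit(bits: int, offset: int) -> int:
    """Return the bitmap with the bit at `offset` set to 1."""
    return bits | (1 << offset)

def bitcount(bits: int) -> int:
    """Count the 1 bits, like BITCOUNT."""
    return bin(bits).count("1")

def bitop_or(*bitmaps: int) -> int:
    """Union of several bitmaps, like BITOP OR."""
    result = 0
    for b in bitmaps:
        result |= b
    return result
```

Replaying the 20200808/20200809 example above: each day counts 3 movies, and the union across both days counts 5 distinct offsets (1024 is shared).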

The advanced data type HyperLogLog

Purpose: used for cardinality statistics

Cardinality: The cardinality is the number of elements in the dataset after deduplication

Note:

It is used for cardinality statistics only: it is not a collection and does not store the elements themselves, only the estimation state

The core of the algorithm is cardinality estimation, and the final value has some error

Margin of error: The cardinality estimate results in an approximation with a standard error of 0.81%

Space consumption is minimal, with each Hyperloglog key taking up 12K of memory for marking cardinality

A key produced by pfmerge likewise occupies 12 KB of storage, regardless of how much data was merged
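The 0.81% figure comes from the HyperLogLog standard-error formula, roughly 1.04 / sqrt(m), where m is the number of registers. Redis uses m = 16384, which also explains the 12 KB footprint (16384 registers × 6 bits ≈ 12 KB):

```python
import math

def hll_standard_error(registers: int = 16384) -> float:
    """Standard error of a HyperLogLog estimate: ~1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(registers)

# 1.04 / sqrt(16384) = 1.04 / 128 = 0.008125, i.e. about 0.81%
```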

Basic operation

  • pfadd key element [element…] Add data
```
pfadd hll 1
pfadd hll 1
pfadd hll 1
pfadd hll 1
pfadd hll 2
pfadd hll 2
pfadd hll 3
```
  • pfcount key [key…] statistics
```
pfcount hll
// 3
```
  • pfmerge destkey sourcekey [sourcekey…] Merge data

The advanced data type GEO

Purpose: It is used in geographical location information calculation

Basic operation

  • Add the coordinates of the geographic location

    geoadd key longitude latitude member [longitude latitude member …]

```
geoadd geos 1 1 a
geoadd geos 2 2 b
```
  • Gets the coordinates of a geographic location

    geopos key member [member…]

```
geopos geos a
geopos geos b
```
  • Calculate the distance between two positions

    geodist key member1 member2 [unit]

```
geodist geos a b m
geodist geos a b km
```
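Under the hood, geodist computes a great-circle distance from the two members' coordinates. A sketch using the haversine formula (Redis uses a slightly different Earth-radius constant, so results may differ marginally):

```python
import math

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance between two (longitude, latitude) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))
```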
  • Retrieves a set of geographic locations within a specified range according to the latitude and longitude coordinates given by the user

    GEORADIUS key longitude latitude radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [COUNT count] [ASC|DESC] [STORE key] [STOREDIST key]

    ```
    geoadd geos 1 1 1,1
    geoadd geos 1 2 1,2
    geoadd geos 1 3 1,3
    geoadd geos 2 1 2,1
    geoadd geos 2 2 2,2
    geoadd geos 2 3 2,3
    geoadd geos 3 1 3,1
    geoadd geos 3 2 3,2
    geoadd geos 3 3 3,3
    geoadd geos 5 5 5,5

    georadius geos 1.5 1.5 90 km
    // 1,2  2,2  1,1  2,1
    ```
  • Gets a set of geographic locations within a specified range based on a location stored in the location set

    GEORADIUSBYMEMBER key member radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [COUNT count] [ASC|DESC] [STORE key] [STOREDIST key]

    ```
    georadiusbymember geos 2,2 180 km
    // 1,1  1,2  2,1  2,2  1,3  2,3  3,1  3,2  3,3
    ```
  • Returns the geohash value of one or more location objects

    GEOHASH key member [member …]

    ```
    geohash geos 2,2
    // s037ms06g70
    ```

A master-slave replication

To reduce the load on each Redis server, you can add servers and use master-slave mode

One server handles the "write" load (add, modify, delete) while another handles the "read" load, and the master server's data is automatically synchronized to the slave servers.

Redis supports easy-to-use master-slave replication, which allows a slave server to become an exact replica of the master Server.

The role of master-slave replication:

Read/write separation: Master writes data and slave reads data, improving the read/write load of the server

Load balancing: Based on the master-slave structure, with read/write separation, the slave shares the master load and changes the number of slaves according to the change of demand. The data read load is shared by multiple slave nodes, greatly improving the concurrency and data throughput of redis server

Fault recovery: When the master fails, the slave provides services for rapid fault recovery

Data redundancy: Hot data backup is a data redundancy method other than persistence

High availability cornerstone: Based on master/slave replication, build sentinel mode and cluster to realize the high availability solution of Redis

Note:

Redis uses asynchronous replication, which cannot block the master and slave servers

A master server can have multiple slave servers. Not only can the master server have slave servers, but slave servers can also have their own slave servers

There are three phases of master/slave replication:

Establish connection phase

Data synchronization phase

Command propagation phase

Master slave communication process

The configuration steps

Prepare two VMS:

192.168.1.69 — master; 192.168.1.70 — slave

Master server configuration:

Change `bind 127.0.0.1` to `#bind 127.0.0.1`, and `protected-mode yes` to `protected-mode no`:

```
#bind 127.0.0.1
protected-mode no
```

Slave server configuration:

  • Use slaveof to specify the role and the master server's address and port

    ```
    // slaveof <master ip> <master port>
    slaveof 192.168.1.69 6379
    ```
  • Make the slave server read-only

    ```
    // Since Redis 2.6, slave servers support a read-only mode, controlled by
    // the slave-read-only option; it is the default mode for slave servers.
    slave-read-only yes
    ```
  • Specify the password the slave uses to connect to the master

    If the master has a password set via the requirepass option, configure masterauth on the slave so that synchronization can proceed smoothly.

    ```
    masterauth 321612   // the master server's password
    ```
**Verify**: run `info replication` on the slave server. `master_link_status:up` means the configuration succeeded; `down` means it failed.

**Cancel**: simply comment out the configuration above on the slave server.

Notes for the master:

- If the master holds a large amount of data, avoid traffic peaks when synchronizing, so that the master is not blocked and normal services keep running.
- If the replication buffer is sized incorrectly, it can overflow. When a full replication cycle takes too long and data is lost during the subsequent partial replication, a second full replication is forced and the slave falls into a dead loop.
- Keep the master's memory usage moderate: use 50%-70% of the host's memory, and leave 30%-50% for bgsave execution and the replication backlog buffer.

```
repl-backlog-size 1mb   // default 1mb
```

Notes for the slave:

- It is advisable to refuse external service requests during full or partial replication, to avoid blocked responses or interference with data synchronization:

```
slave-serve-stale-data yes|no
```

- During data synchronization the master sends pings to the slave; the master can be understood as a client of the slave, actively sending it commands.
- If multiple slaves request synchronization from the master at the same time, the master sends many RDB files at once, putting heavy pressure on bandwidth. If the master's bandwidth is insufficient, schedule synchronization according to business demand.

## Command propagation phase

- When the master's state changes, the master and slave become inconsistent, and the master must bring the slave back to a consistent state. This synchronization action is called command propagation.
- The master forwards the write commands it receives to the slave, and the slave executes them on arrival.
- Network interruptions during command propagation:
  - brief flickering: ignored
  - short interruption: partial replication
  - long interruption: full replication
- Three core elements of partial replication:
  - the server run ID
  - the master's replication backlog buffer
  - the master's and slave's replication offsets

## Common problems with master/slave replication

**Frequent full replication**, **frequent network interruptions**, **inconsistent data**.

Every time a slave server disconnects, whether actively or through a network fault, it must dump the full RDB from the master again and then apply the AOF; the whole synchronization process starts over. So remember: if you have multiple slave servers, do not start them all at once.

## Sentinel mode

Sentinel is a distributed system used to monitor each server in a master/slave structure. When a failure occurs, a new master is chosen by a voting mechanism and all slaves are connected to it.

The roles of sentinel:

- Monitoring
  - continuously check whether the master and slaves are running properly
  - master survival detection, master and slave health detection
- Notification (alerts)
- Automatic failover
  - disconnect the failed master from its slaves, select a slave as the new master, connect the other slaves to it, and inform clients of the new server address

Note: Sentinel is also a Redis server, but it does not serve data. Sentinels are usually deployed in odd numbers.

## Enabling Sentinel mode

- Configure a one-master-two-slaves structure
- Configure three sentinel.conf files (same configuration, different ports)

```
// sentinel port
port 26379

// working directory
dir /tmp

// mymaster: a custom name for the monitored master
// 2: how many sentinels must agree the master is down, usually (number of sentinels / 2) + 1
sentinel monitor mymaster 127.0.0.1 6379 2

// how long the master may fail to respond before a sentinel considers it down
sentinel down-after-milliseconds mymaster 30000

// how many slaves may resynchronize with the new master at the same time
sentinel parallel-syncs mymaster 1

// failover timeout
sentinel failover-timeout mymaster 180000
```

The sentinels go through three stages during a master/slave switchover:

- Monitoring: synchronize the state information of each node
  - get the state of every sentinel (online or not)
  - get the master's state
    - master attributes: runid, role: master
    - details of each slave
  - get the state of all slaves (based on the master's information)
    - slave attributes: runid, role: slave, master_host, master_port, offset, ...
- Notification
- Failover
  - the sentinels vote for a leader sentinel, which selects the replacement master
  - selection criteria: online; responsive (not slow); not disconnected from the original master for too long; priority; offset; runid
  - the leader then sends the commands (as sentinel):
    - `slaveof no one` to the new master
    - `slaveof <new master ip> <port>` to the other slaves

The cluster

A cluster connects a number of computers over the network and provides a unified management mode, so that externally they present the effect of a single service

Cluster role:

Load balancing is implemented by distributing the access pressure on a single server

The storage pressure of a single server is dispersed to achieve scalability

Reduce service disasters caused by the failure of a single server

Redis cluster structure design

Data storage design

  • Through algorithm design, calculate the location where the key should be saved
  • All storage space plans are cut into 16,384 pieces, and each host saves a portion
    • Each copy represents a storage space, not a storage space for a key
  • Put the key into the corresponding storage space according to the calculated result
  • Enhanced scalability
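The slot calculation itself is CRC16 of the key modulo 16384; a sketch follows (ignoring the {hash tag} rule that lets related keys share a slot):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def keyslot(key: str) -> int:
    """Map a key to one of the 16384 cluster slots."""
    return crc16_xmodem(key.encode()) % 16384
```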

Design of cluster internal communication

  • The nodes communicate with each other, and each stores the slot-number mapping of every node
  • On a hit, the node returns the result directly
  • On a miss, the node tells the client which node holds the slot

Cluster Sets up the cluster structure

A three-master, three-slave (one slave per master) architecture

Configure redis.conf

```
cluster-enabled yes                   // enable cluster mode
cluster-config-file nodes-6379.conf   // cluster configuration file name; this file is generated automatically
cluster-node-timeout 10000            // timeout for a node's service response, in milliseconds
cluster-migration-barrier <count>     // minimum number of slaves a master keeps connected
```

Start the cluster

```
// --replicas 1: each master gets one slave; the first three nodes become masters
./redis-trib.rb create --replicas 1 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 127.0.0.1:6383 127.0.0.1:6384
```

operation

```
// connect with cluster support
redis-cli -c

// set data
set name xiaoming

// get data
get name
```

Enterprise-level solutions

Cache warming

Problem: the server crashes soon after startup

Investigation:

1. The request volume is high

2. The data throughput between master and slave is large, and synchronization runs frequently

Solution:

  • Preparatory work:

    1. Daily routine data access records and hotspot data with high access frequency

    2. Use LRU data deletion strategy to build data retention queue

  • Preparations:

    The data in the statistics result is classified. Redis preferentially loads the hotspot data with a higher level

    Using distributed multiple servers to read data at the same time to speed up the data loading process

  • Implementation:

    Use a script to trigger the data warm-up process automatically

    If conditions permit, use a CDN (content delivery network) for even better results

Conclusion:

Cache warming means loading the relevant cache data into the cache system before the system starts serving traffic. This avoids the pattern where a user's request first queries the database and only then caches the data: users directly query cache data that has already been warmed up.

Cache avalanche

Database server crash:

  1. The system runs smoothly, suddenly the database connection quantity surges

  2. The application server cannot process requests in a timely manner

  3. A large number of 408 and 500 error pages appear

  4. The customer refreshes the page repeatedly for data

  5. Database crash

  6. Application Server Crash

  7. Restarting the application server fails

  8. Redis server crashes

  9. The Redis cluster crashes

  10. After the database is restarted, it is knocked down by instantaneous traffic again

Troubleshoot problems

  1. A large number of keys in the cache expire within a short period of time

  2. Requests for this expired data miss in Redis, so the data is fetched from the database

  3. The database receives a large number of requests simultaneously and cannot process them in a timely manner

  4. Redis has a large number of requests backlogged and starts to time out

  5. The database crashes due to the database traffic surge

  6. No data is available in the cache after the restart

  7. The Redis server resources are heavily occupied and the Redis server crashes

  8. The Redis cluster collapses

  9. The application server fails to receive data and respond to requests in a timely manner. As a result, the number of requests from clients increases, and the application server crashes

  10. The application server, Redis and database are all restarted, but the effect is not ideal

Problem analysis

  • In a short time frame

  • A large number of keys expire in a centralized manner

Solution (strategy)

  1. Render more pages statically

  2. Build a multi-level cache architecture

    Nginx cache + Redis cache + EhCache

  3. Optimize business logic to detect severely time-consuming MySQL operations

    Troubleshoot database bottlenecks, such as timeout queries and time-consuming transactions

  4. Disaster warning system

    Monitor redis server performance metrics

    • CPU occupancy and CPU usage rate

    • Memory capacity

    • Average query response time

    • The number of threads

  5. Traffic limiting and degradation

    Sacrifice some customer experience for a short period of time, restrict access to some requests, reduce the pressure on the application server, and gradually release access after services run at a low speed

Solution (technique)

  1. Switch between LRU and LFU

  2. Data validity period policy adjustment

    • Staggered peaks are classified according to the validity period of service data: 90 minutes for class A, 80 minutes for class B, and 70 minutes for class C

    • Expiration time is in the form of fixed time + random value, diluting the number of expired keys in the set

  3. Super hot data uses a permanent key

  4. Regular maintenance (automatic + manual)

    Analyze the traffic volume of the data that is about to expire, confirm whether the data is delayed, and delay the hot data according to the traffic statistics

  5. lock

    Careful!

Conclusion

A cache avalanche occurs when a very large amount of cached data expires at once, putting sudden pressure on the database server.

If concentrated expiration can be effectively avoided, avalanches can be largely mitigated (about 40%); combine this with the other policies, monitor the server's runtime data, and adjust quickly based on the run records.
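The class-based TTLs and the "fixed time + random value" expiration described above can be sketched as follows; the jitter window is an assumed example:

```python
import random

# Staggered base TTLs per business class, as in the text: A=90, B=80, C=70 minutes
BASE_TTL_MINUTES = {"A": 90, "B": 80, "C": 70}

def expire_seconds(category, jitter_seconds=300):
    """Fixed base TTL plus a random jitter, diluting concentrated expiry."""
    base = BASE_TTL_MINUTES[category] * 60
    return base + random.randint(0, jitter_seconds)

ttl = expire_seconds("A")   # somewhere between 5400 and 5700 seconds
```

The jitter spreads expirations of keys written at the same time across a window instead of a single instant, so the database never sees them all miss at once.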

Cache breakdown

Database server Crash

  1. The system is running smoothly

  2. The number of database connections shot up

  3. Redis server does not have a large number of keys expired

  4. Redis memory is smooth with no fluctuations

  5. The CPU of the Redis server is normal

  6. Database crash

Troubleshoot problems

  1. A key in Redis expired, and that key was under heavy access

  2. Many data requests hit Redis directly, but none was a hit

  3. In a short period of time, a large number of accesses to the same data are initiated against the database

Problem analysis

  • Hot data of a single key

  • The key expired

Solution (technique)

  1. Presetting

    Take e-commerce as an example. According to the store level, each merchant designates several main products and increases the expiration time of such information keys during the shopping festival

    Note: the shopping festival is not just the day itself but also the following days, during which peak traffic tends to taper off gradually

  2. Adjustment

    Monitor the traffic volume and extend the expiration period or set the key as permanent for the data with natural traffic surge

  3. Background Data Refresh

    Start scheduled tasks to refresh data validity periods before peak hours, ensuring no data is lost

  4. Second-level cache

    Set different expiration times so that keys are not evicted at the same time

  5. lock

    Use a distributed lock to prevent breakdown, but watch out for performance bottlenecks. Use with caution!

Conclusion

Cache breakdown occurs at the moment a single piece of hot data expires under heavy traffic: after the Redis miss, a large number of requests for the same data hit the database, putting pressure on the database server.

The coping strategy is mainly prevention based on business data analysis, combined with runtime monitoring and real-time policy adjustment; after all, the expiration of a single key is difficult to monitor.
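The locking approach (solution 5) can be sketched as a double-checked rebuild. This is a single-process illustration: `threading.Lock` stands in for a distributed lock, a dict stands in for Redis, and `query_db` is a placeholder for the real query:

```python
import threading

cache = {}                       # stands in for Redis
rebuild_lock = threading.Lock()  # stands in for a distributed lock

def query_db(key):
    # Placeholder for the real database query
    return f"db-value-of-{key}"

def get_with_lock(key):
    """Only one thread rebuilds an expired hot key; the rest reuse its result."""
    value = cache.get(key)
    if value is not None:
        return value
    with rebuild_lock:
        # Re-check after acquiring the lock: another thread may have rebuilt it
        value = cache.get(key)
        if value is None:
            value = query_db(key)
            cache[key] = value
    return value
```

The second `cache.get` inside the lock is what prevents the stampede: of the many threads that saw the miss, only the first actually queries the database.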

Cache penetration

Database server Crash

  1. The system is running smoothly

  2. The traffic on the application server increases over time

  3. Redis server hit ratio decreases over time

  4. Redis memory is stable, with no memory pressure

  5. Redis server CPU usage increases rapidly

  6. Database server stress spikes

  7. Database crash

Troubleshoot problems

  1. Widespread misses appear in Redis

  2. Abnormal URL access occurred

Problem analysis

  • The requested data does not exist in the database, so the database query returns nothing

  • Redis returns null and does not cache the null result

  • Repeat the process the next time such data arrives

  • A hacker attack occurred on the server

Solution (technique)

  1. Cache null values

    Cache data whose query result is null (for long-term use, cleared periodically). Set a short time limit, such as 30-60 seconds, 5 minutes at most

  2. Whitelist policy

  • Preheat bitmaps for the ids of each data category in advance, using the id as the bitmap offset; this effectively sets up a data whitelist. Normal data passes the check at load time; abnormal data is intercepted directly (low efficiency)

  • Use a Bloom filter (the false-positive issues of Bloom filters are negligible under current conditions)

  3. Monitoring

    Monitor in real time the ratio of the Redis hit ratio (usually a stable fluctuation value when business is normal) to null-data hits

  • Off-peak fluctuation: usually 3-5 times the baseline; above 5 times, add to the key screening list

  • Activity-period fluctuation: usually 10-50 times the baseline; above 50 times, add to the key screening list

    Start a different troubleshooting process depending on the multiple, then use the blacklist for prevention and control (operations work)

  4. Key encryption

    When the problem occurs, temporarily start the disaster-prevention business key: the business layer transmits keys through an encryption service, and a verification program checks each incoming key

    For example, randomly allocate 60 encryption strings every day, pick two or three, and mix them into the page data IDs. If an access key does not match the rules, reject the data access
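A minimal sketch of the key-encryption idea under stated assumptions: HMAC-SHA256 stands in for the (hypothetical) encryption service, the daily strings are random tokens, and only a few are active at a time:

```python
import hashlib
import hmac
import secrets

DAILY_KEYS = [secrets.token_hex(8) for _ in range(60)]  # rotated daily
ACTIVE_KEYS = DAILY_KEYS[:3]                            # the few mixed into page data IDs

def sign(data_id, key):
    # Derive a short access token for a data id from one of the active keys
    return hmac.new(key.encode(), data_id.encode(), hashlib.sha256).hexdigest()[:16]

def verify(data_id, token):
    # Reject any request whose token was not produced with an active key
    return any(hmac.compare_digest(sign(data_id, k), token) for k in ACTIVE_KEYS)
```

A crawler or attacker fabricating ids cannot produce valid tokens without the day's active keys, so fabricated requests are rejected before they reach Redis or the database.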

Conclusion

Cache penetration is access to data that does not exist: it skips the Redis caching stage that legitimate data goes through, so every such access hits the database and pressures the database server. The volume of this kind of data is usually low; alert promptly when the situation occurs.

The countermeasures should focus more on temporary contingency-plan prevention.

Both blacklists and whitelists put pressure on the whole system; remove them as soon as the alarm is cleared.
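The cache-null strategy (solution 1) can be sketched as follows. A dict of `(value, expires_at)` pairs stands in for Redis, and `query_db` is a toy stand-in where only even-numbered ids exist:

```python
import time

NULL_TTL = 60      # short TTL for cached nulls (30-60 seconds per the text)
DATA_TTL = 3600    # normal TTL for real data
cache = {}         # key -> (value, expires_at); stands in for Redis

def query_db(key):
    # Toy database: only even-numbered ids exist
    n = int(key.split(":")[1])
    return f"row-{n}" if n % 2 == 0 else None

def get(key):
    now = time.time()
    hit = cache.get(key)
    if hit is not None and hit[1] > now:
        return hit[0]   # may legitimately be None: a cached null
    value = query_db(key)
    ttl = NULL_TTL if value is None else DATA_TTL
    cache[key] = (value, now + ttl)
    return value
```

Repeated requests for a nonexistent id are absorbed by the cached null for `NULL_TTL` seconds instead of each one reaching the database.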

Performance Indicator Monitoring

  • Performance indicators: Performance

| Name | Description |
| --- | --- |
| latency | Time Redis takes to respond to a request |
| instantaneous_ops_per_sec | Average number of requests processed per second |
| hit ratio (calculated) | Cache hit ratio (must be calculated) |

  • Memory indicators: Memory

| Name | Description |
| --- | --- |
| used_memory | Used memory |
| mem_fragmentation_ratio | Memory fragmentation ratio |
| evicted_keys | Number of keys evicted due to the maximum memory limit |
| blocked_clients | Clients blocked by BLPOP, BRPOP, or BRPOPLPUSH |

  • Basic activity indicators: Basic Activity

| Name | Description |
| --- | --- |
| connected_clients | Number of client connections |
| connected_slaves | Number of slaves |
| master_last_io_seconds_ago | Seconds since the last master-slave interaction |
| keyspace | Total number of keys in the database |

  • Persistence indicators: Persistence

| Name | Description |
| --- | --- |
| rdb_last_save_time | Timestamp of the last persistent save to disk |
| rdb_changes_since_last_save | Number of database changes since the last persistence |

  • Error indicators: Error

| Name | Description |
| --- | --- |
| rejected_connections | Connections rejected because the maxclients limit was reached |
| keyspace_misses | Number of failed key lookups (no match) |
| master_link_down_since_seconds | Duration of master-slave disconnection, in seconds |
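The hit ratio flagged as "calculated" above is not reported directly; it is derived from the `keyspace_hits` and `keyspace_misses` counters in `INFO stats`. A small helper (the sample counts are illustrative):

```python
def hit_ratio(keyspace_hits, keyspace_misses):
    """Cache hit ratio derived from the INFO stats counters."""
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else 0.0

ratio = hit_ratio(9000, 1000)   # 9000 hits out of 10000 lookups -> 0.9
```

In practice the two counters would be read from the output of `redis-cli INFO stats` and the ratio tracked over time; a falling trend is an early warning sign of penetration.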

Commands for monitoring performance indicators

  • benchmark

The command

```
redis-benchmark [-h <host>] [-p <port>] [-c <clients>] [-n <requests>] [-k <boolean>]
```

```
# Default: 50 connections, 100000 requests
redis-benchmark

# 100 connections, 5000 requests
redis-benchmark -c 100 -n 5000
```
  • monitor: displays debugging information about the server in real time

```
monitor
```
  • slowlog [operator]
    • Get: Obtains slow query logs

    • Len: Obtains the number of slow query log entries

    • Reset: Resets slow query logs

```
slowlog get
```

The related configuration

```
slowlog-log-slower-than 1000   # log commands slower than 1000 microseconds
slowlog-max-len 100            # maximum number of slow-log entries kept
```