A summary of Java interview topics, covering core Java knowledge as well as common open-source frameworks — welcome to read. There may be mistakes, since the author's knowledge is limited; corrections from readers are welcome! The article is continuously updated……

ID Title Address
1 Design Pattern Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
2 Java Basics Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
3 Java Collections Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
4 Java IO, BIO, NIO, AIO, Netty Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
5 Java Concurrent Programming Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
6 Java Exception Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
7 Java Virtual Machine (JVM) Interview Questions juejin.cn/post/684490…
8 Spring Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
9 Spring MVC Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
10 Spring Boot Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
11 Spring Cloud Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
12 Redis Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
13 MyBatis Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
14 MySQL Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
15 TCP, UDP, Socket, HTTP Interview Questions juejin.cn/post/684490…
16 Nginx Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
17 ElasticSearch Interview Questions
18 Kafka Interview Questions
19 RabbitMQ Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
20 Dubbo Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
21 ZooKeeper Interview Questions juejin.cn/post/684490…
22 Netty Interview Questions (the most comprehensive summary)
23 Tomcat Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
24 Linux Interview Questions (the most comprehensive summary) juejin.cn/post/684490…
25 Internet-related Interview Questions (the most comprehensive summary)
26 Internet Security Interview Questions (the most comprehensive summary)

Overview

What is Redis?

  • Redis is an open-source, high-performance key-value NoSQL database written in C. It supports a rich set of value types, including string, list, set, hash, and zset (sorted set). Because Redis keeps its data in memory, it can handle more than 100,000 read/write operations per second, making it one of the fastest key-value databases known. Redis can also write data to disk, protecting against data loss, and Redis operations are atomic.

What are the pros and cons of Redis?

  • Advantages

    • Excellent read/write performance: Redis can read about 110,000 times/s and write about 81,000 times/s.
    • Supports data persistence via the AOF and RDB persistence modes.
    • Supports transactions: all Redis operations are atomic, and Redis can also execute several combined operations atomically.
    • Rich data structures: in addition to string values, it supports hash, set, zset, and list.
    • Supports master/slave replication: the master automatically synchronizes data to slaves, enabling read/write separation.
  • Disadvantages

    • Database capacity is limited by physical memory, so Redis cannot provide high-performance reads and writes over truly massive data. It is mainly suitable for high-performance operations on smaller data sets.
    • Redis has no automatic fault tolerance and recovery: if the master or a slave goes down, some front-end read/write requests fail, and service recovers only after the machine restarts or the front-end IP address is switched manually.
    • When the master goes down, some data may not yet have been synchronized to the slaves. After the IP address is switched, data inconsistency may occur, reducing system availability.
    • Redis is difficult to scale online, which becomes complicated once the cluster reaches its capacity limit. To avoid this, O&M staff must provision sufficient space before the system goes live, which wastes resources.

What are the benefits of using Redis?

  • (1) Fast: data is stored in memory, similar to a HashMap, whose advantage is O(1) time complexity for lookups and updates.
  • (2) Rich data types: string, list, set, sorted set, and hash are all supported.
  • (3) Transactions are supported and operations are atomic: atomicity means the operations either all execute or none execute.
  • (4) Rich features: Redis can be used for caching and messaging; an expiration time can be set per key, and expired keys are deleted automatically.

Why use Redis / why use a cache?

Look at this in terms of "high performance" and "high concurrency."

  • High performance:

    • Suppose the user accesses some data in the database for the first time. This process is slow because it is being read from the hard disk. The data accessed by the user is stored in the cache so that the next time the data is accessed, it can be retrieved directly from the cache. Operating cache is directly operating memory, so it’s pretty fast. If the corresponding data in the database changes, the corresponding data in the cache can be synchronized to change!
  • High concurrency:

    • The direct operation cache can handle far more requests than the direct access to the database, so we can consider moving some of the data from the database to the cache, so that some of the user’s requests will go directly to the cache without going through the database.
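The read path described above is commonly called the cache-aside pattern. A minimal sketch, where a plain dict stands in for Redis and `slow_db_query` stands in for a relational database (both names are illustrative, not real client code):

```python
cache = {}  # stands in for Redis

def slow_db_query(user_id):
    """Pretend this hits MySQL on disk."""
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:                 # cache hit: pure memory access
        return cache[key]
    row = slow_db_query(user_id)     # cache miss: fall back to the DB
    cache[key] = row                 # populate the cache for next time
    return row

def update_user(user_id, name):
    # When the DB row changes, refresh the cached copy so that
    # cache and database stay in sync.
    row = {"id": user_id, "name": name}
    cache[f"user:{user_id}"] = row
    return row
```

After the first `get_user(1)` call, subsequent reads are served from memory without touching the database.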

Why Redis and not Map/Guava?

  • Caches are classified into local caches and distributed caches. Taking Java as an example, local cache is realized by using its own Map or Guava, which is lightweight and fast. The life cycle ends with the destruction of JVM. In the case of multiple instances, each instance needs to save its own cache, and the cache is inconsistent.

  • Using something like Redis or memcached is called distributed caching. In the case of multiple instances, each instance shares a cache of data and the cache is consistent. The disadvantage is that the redis or memcached service needs to be highly available and the overall program is architecturally complex.

Why is Redis so fast

  • 1. Completely memory based: most requests are pure memory operations, which are very fast. Data is stored in memory, similar to a HashMap, giving O(1) time complexity for both lookup and update.

  • 2. Simple data structures and simple data operations: Redis's data structures are specially designed for this.

  • 3. Single threaded: this avoids unnecessary context switches and race conditions. There is no CPU cost from switching between processes or threads, no locks to acquire and release, and no performance loss from potential deadlocks.

  • 4. I/O multiplexing with non-blocking I/O.

  • 5. A different underlying model: Redis builds its own mechanisms (such as its own VM mechanism) and its own client protocol, because calling general-purpose system functions wastes a certain amount of time moving data and servicing requests.

What data types does Redis have?

  • Redis mainly has five data types: String, List, Set, Zset (sorted set), and Hash, which cover most requirements.
  • String — stores strings, integers, or floating-point numbers. Operations: act on the whole string or a substring; increment or decrement integers and floats. Typical use: simple key-value caching.
  • List — stores lists. Operations: push or pop elements at both ends; trim to keep only a range of elements. Typical use: list-shaped data structures such as fan lists or the comments on an article.
  • Set — stores unordered collections. Operations: add, get, and remove single elements; test whether an element exists; compute intersection, union, and difference; fetch random elements. Typical use: set operations — for example, intersecting two users' fan lists to find their common fans.
  • Hash — stores unordered tables of key-value pairs. Operations: add, get, and remove single key-value pairs; get all pairs; test whether a field exists. Typical use: structured data, such as an object.
  • ZSet — stores ordered sets (member plus score). Operations: add, get, and delete elements; fetch elements by score range or by member; compute a member's rank. Typical use: deduplicated but sortable data, such as a top-users leaderboard.
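As an illustration of the ZSet use case (a leaderboard), here is a toy in-memory class mimicking ZSET semantics (member → score, ranked queries). Real code would call ZADD/ZREVRANGE on a Redis client; this pure-Python stand-in just shows the data model:

```python
class TinyZSet:
    def __init__(self):
        self.scores = {}               # member -> score

    def zadd(self, member, score):
        self.scores[member] = score

    def zincrby(self, member, delta):
        self.scores[member] = self.scores.get(member, 0) + delta
        return self.scores[member]

    def zrevrange(self, start, stop):
        """Members by descending score, inclusive bounds like Redis."""
        ranked = sorted(self.scores, key=self.scores.get, reverse=True)
        return ranked[start:stop + 1]

board = TinyZSet()
board.zadd("alice", 300)
board.zadd("bob", 500)
board.zadd("carol", 400)
print(board.zrevrange(0, 1))   # top two players: ['bob', 'carol']
```

Members are automatically deduplicated (a dict key appears once) yet stay sortable by score — exactly the property the table above describes.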

Application scenarios of Redis

  • Counter

    Strings can be incremented and decremented to implement counters. Redis is an in-memory database with very high read/write performance, which makes it ideal for counts that are read and written frequently.

  • Cache

    Put hotspot data in memory, set a maximum memory limit, and configure an eviction policy to keep the cache hit ratio high.

  • Session cache

    Redis can be used to store session information for multiple application servers in a unified manner. When the application server no longer stores user session information, it is no longer stateful, and a user can request any application server, making it easier to achieve high availability and scalability.

  • Full page Cache (FPC)

    In addition to the basic session token, Redis also provides a very simple FPC platform. Magento, for example, provides a plug-in to use Redis as a full-page caching back end. Also, for WordPress users, Pantheon has a great plugin called WP-Redis that will help you load pages you’ve viewed as quickly as possible.

  • A lookup table

    DNS records, for example, are ideal for Redis. Lookup tables are similar to caches in that they take advantage of Redis’s fast lookup features. But the contents of the lookup table cannot be invalidated, and the contents of the cache can be invalidated because the cache is not a reliable source of data.

  • Message queues (publish/subscribe)

    The List type is a doubly linked list that can write and read messages via LPUSH and RPOP. However, dedicated messaging middleware such as Kafka or RabbitMQ is usually a better choice.

  • Distributed lock implementation

    In distributed scenarios, locks from a single-node environment cannot synchronize processes on multiple nodes. The SETNX command of Redis can be used to implement a distributed lock. In addition, the officially described RedLock algorithm provides a distributed lock implementation.

  • Other

    Set can realize the intersection, union and other operations, so as to achieve common friends and other functions. ZSet can achieve ordered operations, so as to achieve leaderboards and other functions.
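The SETNX-style lock mentioned above can be sketched without a Redis server. The dict below stands in for Redis; `acquire` mimics `SET key value NX PX ttl`, and `release` only succeeds for the token holder (in real Redis that check-and-delete must be done in a Lua script to stay atomic). All names here are illustrative:

```python
import time
import uuid

locks = {}   # key -> (token, expires_at); stands in for Redis

def acquire(key, ttl_ms):
    now = time.monotonic()
    holder = locks.get(key)
    if holder and holder[1] > now:      # lock held and not yet expired
        return None
    token = str(uuid.uuid4())           # unique value identifies holder
    locks[key] = (token, now + ttl_ms / 1000)
    return token

def release(key, token):
    holder = locks.get(key)
    if holder and holder[0] == token:   # only the holder may release
        del locks[key]
        return True
    return False
```

The TTL ensures a crashed holder cannot block others forever, and the random token prevents one client from releasing another client's lock.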

Persistence

  • What is Redis persistence? Persistence is to write the data in memory to disk to prevent the loss of memory data when the service is down.

What is the persistence mechanism of Redis? What are their strengths and weaknesses?

  • Redis provides two persistence mechanisms, RDB (the default) and AOF:
RDB: Redis DataBase snapshot
  • RDB is the default Redis persistence mode. It saves memory data to the hard disk as a snapshot at certain intervals; the generated data file is dump.rdb. The snapshot period is defined by the save parameter in the configuration file.

  • Advantages:

    1. There is only one file, dump.rdb, which makes persistence convenient.

    2. Good disaster recovery: a single file can be copied to a safe disk.

    3. Maximized performance: a child process is forked to complete the write while the main process keeps handling commands, so I/O is maximized. A separate child process handles persistence and the main process performs no disk I/O, which preserves Redis's high performance.

    4. Startup is more efficient than AOF when the data set is large.

  • Disadvantages:

    1. Low data safety. RDB persists periodically, so if Redis fails between snapshots, the data written since the last snapshot is lost. This approach is more suitable when data requirements are not strict.

AOF: Append Only File
  • AOF persistence records every write command executed by Redis to a separate log file, stored in the Redis command request protocol format. When Redis restarts, it restores the data by replaying the commands in the log file.

  • When both modes are enabled, Redis prefers the AOF file for data recovery.

  • Advantages:

    1. Better data safety: with appendfsync always configured, every command operation is synced to the AOF file immediately.

    2. Files are written in append mode. Even if the server crashes mid-write, the redis-check-aof tool can repair data consistency problems.

    3. The AOF mechanism has a rewrite mode: when the file grows too big, it is rewritten in the background (merging commands); before a rewrite happens, accidental commands (such as FLUSHALL) can still be removed from the AOF file.

  • Disadvantages:

    1. The AOF file is larger than the RDB file, and the recovery speed is slow.

    2. When the data set is large, the startup efficiency is lower than RDB.

  • What are the pros and cons of both types of persistence?

    • AOF files are updated more frequently than RDB files, so AOF files are used preferentially to restore data.
    • AOF is safer than RDB but produces larger files.
    • RDB performs better than AOF.
    • If both are configured, AOF is loaded first.

How to choose the right persistence method

  • In general, if you want to achieve data security comparable to PostgreSQL, you should use both persistence features. In this case, when Redis restarts, AOF files will be loaded first to recover the original data, because AOF files usually hold more complete data sets than RDB files.

  • If you care deeply about your data, but can still afford to lose it within minutes, you can use RDB persistence only.

  • Many users use AOF persistence only, but this approach is not recommended: periodic snapshots are very convenient for database backups, RDB recovers data sets faster than AOF, and RDB also avoids potential bugs in the AOF code path.

  • If you only want your data to exist as long as the server is running, you can do so without any persistence.

Redis persistent data and cache

  • If Redis is used as a cache, use consistent hashing for dynamic scaling and scaling.

  • If Redis is used as a persistent store, a fixed keys-to-nodes mapping must be used, and the number of nodes cannot be changed once determined. Otherwise (that is, if the Redis nodes need to change dynamically), you must use a system that can rebalance data at run time, which currently only the Redis cluster can do.

Redis delete policy for expired keys

As we all know, Redis is a key-value database. We can set the expiration time of the cached key in Redis. Redis's expiration policy refers to how Redis treats cached keys when they expire.

  • There are three types of expiration policies:

  • Timed expiration: a timer is created for each key that has an expiration time, and the key is cleared immediately when it expires. This policy clears expired data immediately and is memory friendly; however, it consumes a lot of CPU to process expirations, which affects the cache's response time and throughput.

  • Lazy expiration: Only when a key is accessed, the system determines whether the key has expired, and the expired key is cleared. This strategy maximizes CPU savings but is very memory unfriendly. In extreme cases, a large number of expired keys may not be accessed again and thus will not be cleared, occupying a large amount of memory.

  • Periodic expiration: at regular intervals, a certain number of keys in the expires dictionaries of a certain number of databases are scanned and the expired ones are cleared. This strategy is a compromise between the first two; by tuning the scan interval and the per-scan time limit, CPU and memory usage can be balanced for different situations. (The expires dictionary holds expiration data for all keys with an expiration time set: the key is a pointer to a key in the key space, and the value is that key's expiration time as a millisecond-precision UNIX timestamp. The key space refers to all keys stored in the Redis database.)

Redis uses both lazy expiration and periodic expiration strategies.
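The lazy + periodic combination can be sketched with two dicts mirroring Redis's key space and expires dictionary. The function names and sampling scheme here are illustrative, not the Redis source:

```python
import random
import time

store = {}      # key -> value (the key space)
expires = {}    # key -> absolute deadline (like Redis's expires dict)

def set_key(key, value, ttl=None):
    store[key] = value
    if ttl is not None:
        expires[key] = time.monotonic() + ttl

def get_key(key):
    # Lazy expiration: check the deadline only when the key is touched.
    deadline = expires.get(key)
    if deadline is not None and deadline <= time.monotonic():
        store.pop(key, None)
        expires.pop(key, None)
        return None
    return store.get(key)

def expire_cycle(sample_size=20):
    # Periodic expiration: scan a random sample of keys that have a
    # TTL and purge the expired ones, bounding CPU cost per cycle.
    now = time.monotonic()
    sample = random.sample(list(expires), min(sample_size, len(expires)))
    for key in sample:
        if expires[key] <= now:
            store.pop(key, None)
            expires.pop(key, None)
```

Lazy checks alone would let never-touched expired keys pile up; the periodic cycle sweeps those out, which is exactly why the two strategies complement each other.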

How to set the expiration time and permanent validity of Redis key respectively?

  • Expire and persist commands.

We know that the expiration time of the key is set through expire, but what about expired data?

  • In addition to the cache invalidation policies provided by the cache server (Redis has six policies to choose from by default), we can also customize cache invalidation based on specific business needs. There are two common approaches:
    • 1. Periodically clear expired cache entries;
    • 2. When a user request arrives, check whether the cache entries the request uses have expired; if they have, fetch fresh data from the underlying system and update the cache.

Both have trade-offs: the drawback of the first is that maintaining a large number of cache keys is troublesome; the drawback of the second is that every user request must check for cache invalidation, which complicates the logic. Weigh the two against your own application scenario.

MySQL has 20 million rows of data but Redis only stores 200,000 — how do you ensure the data in Redis is hot data?

  • When the Redis in-memory data set grows to a certain size, a data eviction strategy kicks in, so the keys that survive are the ones being actively read and written.

What are Redis's memory eviction policies?

Redis's memory eviction policy determines how to handle data that needs to be written but requires additional space when Redis runs out of memory for the cache.

  • Eviction across the entire key space

    • noeviction: when memory is insufficient to hold newly written data, new write operations report an error.
    • allkeys-lru: when memory is insufficient to hold newly written data, remove the least recently used key from the whole key space. (This is the most commonly used policy.)
    • allkeys-random: when memory is insufficient to hold newly written data, remove a random key from the whole key space.
  • Eviction restricted to keys with an expiration time

    • volatile-lru: when memory is insufficient to hold newly written data, remove the least recently used key from among the keys with an expiration time set.
    • volatile-random: when memory is insufficient to hold newly written data, remove a random key from among the keys with an expiration time set.
    • volatile-ttl: when memory is insufficient to hold newly written data, remove the key with the nearest expiration time from among the keys with an expiration time set.
  • Conclusion

The choice of memory eviction policy does not affect the handling of expired keys. The eviction policy deals with data that needs extra space when memory is insufficient; the expiration policy deals with cached data whose time-to-live has elapsed.
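The allkeys-lru policy can be sketched with an `OrderedDict`: on overflow, evict the least recently used key. Real Redis uses an approximate LRU based on sampling, but this is the policy it approximates:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:  # memory "full": evict LRU
            self.data.popitem(last=False)

c = LRUCache(2)
c.put("a", 1)
c.put("b", 2)
c.get("a")            # touch "a" so "b" becomes least recently used
c.put("c", 3)         # capacity exceeded: evicts "b"
print(list(c.data))   # ['a', 'c']
```

This is also how the "hot data" question above is answered: under LRU eviction, frequently accessed keys keep getting refreshed while cold keys fall off the end.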

What physical resources does Redis consume?

  • Memory.

What happens when Redis runs out of memory?

  • If the configured limit is reached, Redis write commands return an error message (read commands still return normally). Alternatively, a memory eviction policy can be configured so that Redis evicts old content when it hits its memory limit.

How does Redis optimize memory?

  • Make good use of collection types such as hash, list, sorted set, and set, because many small key-value pairs can often be grouped together in a more compact way. Use hashes whenever possible: hashes use very little memory, so abstract your data model into hashes where you can. For example, if your web system has a user object, do not create a separate key for the user's first name, last name, email address, and password; instead, store all of the user's information in one hash.

Threading model

Redis threading model

Redis developed a network event handler based on the Reactor pattern, called the file event handler. It consists of four parts: multiple sockets, the I/O multiplexing program, the file event dispatcher, and event handlers. Redis is called single-threaded because the file event dispatcher's queue is consumed by a single thread.

  • The file event handler uses the I/O multiplexing program to listen for multiple sockets at the same time and associate the socket with different event handlers based on the task it is currently performing.
  • When the socket being listened to is ready to perform accept, read, write, close, etc., file events corresponding to the operation are generated, and the file event handler invokes the event handler associated with the socket to handle these events.

Although the file event handler runs in a single-threaded manner, by using I/O multiplexing programs to listen for multiple sockets, the file event handler implements a high-performance network communication model and interconnects well with other modules in the Redis server that also run in a single-threaded manner. This keeps the single-threaded design inside Redis simple.
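A minimal Reactor-style loop with Python's `selectors` module shows the same shape: one thread, one I/O multiplexer, and a callback associated with each socket. This is an illustrative echo server, not Redis's actual event loop:

```python
import selectors
import socket
import threading

sel = selectors.DefaultSelector()

def accept(server_sock):
    conn, _ = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)   # handler per socket

def echo(conn):
    data = conn.recv(1024)
    if data:
        conn.sendall(data)          # echo back what arrived
    else:
        sel.unregister(conn)
        conn.close()

def serve_once(server_sock, rounds=10):
    # Single-threaded event loop: wait for ready sockets, dispatch.
    sel.register(server_sock, selectors.EVENT_READ, accept)
    for _ in range(rounds):
        for key, _ in sel.select(timeout=0.2):
            key.data(key.fileobj)   # invoke the associated handler

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
server.setblocking(False)
port = server.getsockname()[1]
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"PING")
resp = client.recv(1024)
print(resp)
```

One thread multiplexes all sockets, just as Redis's file event handler associates each ready socket with its accept/read/write handler.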

Transactions

What is a transaction?

  • A transaction is a single isolated operation: all commands in the transaction are serialized and executed sequentially. The transaction will not be interrupted by command requests from other clients during execution.

  • A transaction is an atomic operation: all or none of the commands in a transaction are executed.

The concept of Redis transactions

  • The essence of a Redis transaction is a collection of commands through MULTI, EXEC, WATCH, etc. Transactions allow multiple commands to be executed at once, and all commands in a transaction are serialized. During the execution of a transaction, commands in the execution queue are sequentially serialized, and command requests submitted by other clients are not inserted into the transaction execution command sequence.

  • Summary: Redis transaction is a one-time, sequential, exclusive execution of a series of commands in a queue.

The three phases of Redis transactions

  1. Transaction start MULTI
  2. Commands are queued
  3. Transaction execution EXEC

During transaction execution, if the server receives a request other than EXEC, DISCARD, WATCH, or MULTI, the request will be queued

Redis transaction related commands

Redis transaction functionality is implemented through four primitives: MULTI, EXEC, DISCARD and WATCH

Redis serializes all the commands in a transaction and executes them sequentially.

  1. Redis does not support rollback: "Redis does not roll back when a transaction fails, but continues to execute the remaining commands," which keeps the internals simple and fast.
  2. If a command fails at queueing time (for example, a syntax error), none of the commands in the transaction are executed.
  3. If a command fails at runtime, the remaining correct commands are still executed.
  • The WATCH command is an optimistic lock that gives Redis transactions check-and-set (CAS) behavior. One or more keys can be monitored; if any of them is modified (or deleted) before EXEC, the transaction will not execute. Monitoring lasts until the EXEC command.
  • The MULTI command starts a transaction and always returns OK. After MULTI, the client can send any number of commands to the server; they are not executed immediately but placed in a queue. When EXEC is invoked, all commands in the queue are executed.
  • EXEC executes all commands within the transaction block and returns their return values in execution order. It returns nil when the transaction is aborted.
  • By calling DISCARD, the client empties the transaction queue, abandons the transaction, and exits the transaction state.
  • The UNWATCH command cancels WATCH's monitoring of all keys.
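The queue-then-execute semantics, and the "runtime error does not roll back the others" rule, can be sketched in a few lines. The class and command names below are made up for illustration; real code would send MULTI/EXEC over a Redis client:

```python
class TinyTx:
    def __init__(self, db):
        self.db = db
        self.queue = []

    def multi(self):
        self.queue = []                 # start a fresh transaction

    def queue_cmd(self, fn, *args):
        self.queue.append((fn, args))   # queued, not executed yet

    def exec(self):
        results = []
        for fn, args in self.queue:     # run serially, no interleaving
            try:
                results.append(fn(self.db, *args))
            except Exception as e:      # runtime error: record, move on
                results.append(e)
        self.queue = []
        return results

def set_cmd(db, k, v):
    db[k] = v
    return "OK"

def incr_cmd(db, k):
    db[k] = int(db.get(k, 0)) + 1       # raises if value is not numeric
    return db[k]

db = {}
tx = TinyTx(db)
tx.multi()
tx.queue_cmd(set_cmd, "a", "hello")
tx.queue_cmd(incr_cmd, "a")             # will fail at runtime
tx.queue_cmd(set_cmd, "b", "world")     # still executes afterwards
results = tx.exec()
print(results)
print(db)
```

The failed INCR yields an error in the results, but the SET after it still runs — there is no rollback, mirroring Redis's behavior.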

Overview of Transaction Management (ACID)

  • Atomicity: a transaction is an indivisible unit of work; either all of its operations happen or none do.

  • Consistency: data integrity must be consistent before and after a transaction.

  • Isolation: when multiple transactions execute concurrently, the execution of one transaction should not affect the others.

  • Durability: once a transaction is committed, its changes to the data in the database are permanent and should survive a subsequent database failure.

Redis transactions always have consistency and isolation from ACID; the other properties are not guaranteed. Transactions are also durable when the server runs in AOF persistence mode with the appendfsync option set to always.

Do Redis transactions support isolation

  • Redis is a single-process program, and it guarantees that the transaction will not be interrupted while executing the transaction, and the transaction can run until all the commands in the transaction queue are executed. Therefore, Redis transactions are always isolated.

Do Redis transactions guarantee atomicity and support rollback

  • In Redis, single commands are executed atomically, but transactions are not guaranteed atomicity and there is no rollback. If any command in a transaction fails to execute, other commands will still be executed.

Redis transaction other implementations

  • Based on Lua scripts: Redis guarantees that the commands in a script are executed atomically and in order. It does not provide rollback on execution errors either; if some command fails during execution, the remaining commands still run.
  • Based on an intermediate marker variable: an extra marker variable identifies whether the transaction has completed. When reading data, the marker is read first to judge whether the transaction finished. But this requires writing extra code and is more tedious.

Cluster solution

1. Sentinel mode

Introduction to Sentinel

Sentinel (哨兵) is a very important component in the Redis cluster architecture. It has the following functions:

  • Cluster monitoring: Monitors whether the Redis master and slave processes are working properly.
  • Message notification: If a Redis instance fails, the sentry is responsible for sending a message as an alarm notification to the administrator.
  • Failover: If the master node fails, it is automatically transferred to the slave node.
  • Configuration center: notifies clients of the new master address when a failover occurs.

Sentinel achieves high availability for a Redis cluster. It is itself distributed, running as a cluster of sentinels that work together.

  • During failover, determining whether a master node is down requires the agreement of most of the sentinels, which relates to distributed elections.
  • Even if some of the sentinels fail, the sentinel cluster still works — if the failover system, itself a key part of the high-availability mechanism, were a single point of failure, that would be bad.

The core knowledge of sentinels

  • Sentinels need at least three instances to be robust.
  • The sentinel + Redis master-slave deployment architecture guarantees high availability of the Redis cluster, not zero data loss.
  • For the sentinel + Redis master-slave deployment architecture, do adequate testing and drills in both test and production environments.

2. Official Redis Cluster solution (server routing query)

  • Can you explain how redis cluster mode works? How is redis key addressed in clustered mode? What are the algorithms for distributed addressing? Do you know consistent hash algorithms?

Introduction

  • Redis Cluster is a server-side sharding technology available since version 3.0. Instead of consistent hashing, Redis Cluster uses the concept of slots: the key space is divided into 16,384 slots. A request can be sent to any node, and the node that receives it routes the query to the correct node for execution.

How it works

  1. Data is partitioned by hash slot; each node stores the data for a range of hash slots. By default there are 16,384 slots.
  2. Each data shard is stored on multiple nodes that form a master/slave group.
  3. Data is written to the master node and then synchronized to the slave nodes (blocking synchronization can be configured).
  4. Consistency is not guaranteed among the nodes of the same shard.
  5. When reading data, if the key the client operates on is not assigned to the contacted node, Redis returns a redirect pointing to the correct node.
  6. During capacity expansion, part of the data on existing nodes must be migrated to the new node.
  • In the Redis Cluster architecture, each Redis node exposes two ports: the service port, e.g. 6379, and the service port plus 10000, e.g. 16379.
    • Port 16379 is used for communication between nodes — the cluster bus — which handles fault detection, configuration updates, and failover authorization. The cluster bus uses a different, binary protocol, the gossip protocol, for efficient data exchange between nodes with less network bandwidth and processing time.

Internal communication mechanism between nodes

  • Basic Communication Principles

  • Cluster metadata can be maintained in two modes: centralized mode and Gossip protocol. Redis Cluster nodes communicate with each other using the Gossip protocol.

Distributed addressing algorithm

  • Hash algorithm (massive cache reconstruction)
  • Consistent Hash algorithm (automatic cache migration) + Virtual Node (automatic load balancing)
  • Hash Slot algorithm of redis cluster
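The hash slot computation is simply `CRC16(key) mod 16384`, using the CRC16-CCITT (XMODEM) variant named in the Redis Cluster specification, which yields 0x31C3 for the input "123456789". The sketch below also honors hash tags (only the part between the first `{...}` is hashed, so related keys can share a slot):

```python
def crc16(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): init 0, polynomial 0x1021, no reflection.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    # Hash tags: if the key contains a non-empty {...}, only that part
    # is hashed, so {user1000}.following and {user1000}.followers
    # land in the same slot.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
        # empty {} or no closing brace: hash the whole key
    return crc16(key) % 16384

print(hex(crc16(b"123456789")))                       # 0x31c3
print(key_slot(b"{user1000}.following")
      == key_slot(b"{user1000}.followers"))           # True
```

Because every node agrees on this function, any node can compute which peer owns a key and answer with a redirect when it is not the owner.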

Advantages

  • Without a central architecture, it supports dynamic capacity expansion and is transparent to services
  • It has Sentinel monitoring and automatic Failover capability
  • The client does not need to connect to all nodes in the cluster, but to any available node in the cluster
  • High performance, the client directly connects to redis service, eliminating the loss of proxy

Disadvantages

  • Operation and maintenance are complex, and data migration requires manual intervention
  • Only database 0 can be used
  • Batch operations are not supported
  • The distributed logic is coupled with the storage module, etc.

3. Client-based allocation

Introduction

  • Redis Sharding was the multi-instance clustering method commonly used in industry before Redis Cluster appeared. The main idea is to hash the keys of Redis data: a hash function maps each key to a specific Redis node. The Java Redis client Jedis supports Redis Sharding through ShardedJedis, and ShardedJedisPool combines it with a connection pool.
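A minimal consistent-hash ring of the kind such client-side sharding typically uses: nodes (with virtual replicas) are placed on a ring, and each key routes to the first node clockwise from its hash. The node names below are made up for illustration:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, replicas=100):
        self.ring = []                       # sorted (point, node) pairs
        for node in nodes:
            for i in range(replicas):        # virtual nodes smooth load
                point = self._hash(f"{node}#{i}")
                bisect.insort(self.ring, (point, node))

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def get_node(self, key):
        point = self._hash(key)
        idx = bisect.bisect(self.ring, (point, ""))
        if idx == len(self.ring):            # wrap around the ring
            idx = 0
        return self.ring[idx][1]

ring = HashRing(["redis-a", "redis-b", "redis-c"])
node = ring.get_node("user:42")   # the same key always routes to
                                  # the same node
```

Virtual replicas mean that when a node is added or removed, only the keys adjacent to it on the ring move — which is exactly the property plain modulo hashing lacks.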

Advantages

  • Very simple: the server-side Redis instances are independent of each other with no correlation between them. Each instance runs like a standalone server, which makes linear expansion easy and the system very flexible.

disadvantages

  • Because sharding is handled on the client side, operation and maintenance becomes challenging as the cluster scales further.
  • Client-side sharding does not support dynamically adding or removing nodes. When the topology of the server-side Redis instance group changes, every client must be updated accordingly. Connections cannot be shared, so as applications scale up, resource waste constrains optimization.
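
The rigidity described above is easy to demonstrate. With naive modulo sharding (conceptually what client-side sharding does; real ShardedJedis actually uses consistent hashing), adding a single node remaps most keys. A small sketch with made-up key names:

```java
import java.util.List;

// Client-side sharding by simple modulo, to show why topology changes hurt.
class ClientSharding {
    // The client alone decides which node a key lives on
    static int shardIndex(String key, int nodeCount) {
        return Math.abs(key.hashCode() % nodeCount);
    }

    // Fraction of keys whose shard changes when the node count changes
    static double remappedFraction(List<String> keys, int before, int after) {
        long moved = keys.stream()
            .filter(k -> shardIndex(k, before) != shardIndex(k, after))
            .count();
        return (double) moved / keys.size();
    }
}
```

Going from 4 to 5 nodes moves the large majority of keys, which is exactly the cache-reconstruction pain that consistent hashing and hash slots avoid.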

4. Sharding based on proxy server

Introduction

  • The client sends the request to a proxy component, which parses the client’s data, forwards the request to the correct node, and finally returns the result to the client

Characteristics

  • Transparent access, business procedures do not care about the back-end Redis instance, low switching cost
  • The Proxy logic is isolated from the storage logic
  • The proxy layer has one more forwarding, which degrades the performance

Industry open source solutions

  • Twitter's open-source Twemproxy
  • Wandoujia's (Pea Pod) open-source Codis

5. Redis master-slave architecture

  • A single Redis instance can carry roughly 10,000 to several tens of thousands of QPS. For caches, this is generally used to support high read concurrency. Therefore, the architecture is made master-slave, with one master and many slaves: the master handles writes and replicates data to the slave nodes, and the slave nodes handle reads. All read requests go to the slave nodes. This also makes horizontal expansion easy, supporting high read concurrency.

Redis Replication -> Master/slave Architecture -> Read/write Separation -> Horizontal expansion supports high read concurrency

The core mechanism of Redis Replication

  • Redis replicates data to slave nodes asynchronously. However, starting from Redis 2.8, slave nodes periodically acknowledge how much data they have replicated.
  • A master node can be configured with multiple slave nodes.
  • Slave nodes can also connect to other slave nodes.
  • The slave node does not block the normal operation of the master node.
  • The slave node does not block its own query operations while performing replication; it serves queries from the old data set. However, when replication completes, the old data set must be deleted and the new one loaded, and during that moment external service is suspended.
  • The slave node is used for horizontal capacity expansion and read/write separation. The expanded slave node improves read throughput.

Note:

  • If you use a master/slave architecture, it is recommended to enable persistence on the master node. It is not recommended to rely on slave nodes as the master's hot data backup, because if persistence is turned off on the master, its data may be empty after a crash and restart, and the slaves will then replicate that empty data set and lose their copies as well.

  • In addition, various backup schemes for the master are still needed. If all local files are lost, pick an RDB from the backups to restore the master, so that data is available at startup. Even with the high-availability mechanism described later, in which a slave node can automatically take over the master, the master may restart on its own before Sentinel detects the failure, and all the slave nodes replicating from it could then have their data wiped clean.

The core principles of Redis master-slave replication

  • When a slave node is started, it sends a PSYNC command to the master node.

  • If this is the first time the slave node connects to the master node, a full resynchronization (full replication) is triggered: the master starts a background thread and begins generating an RDB snapshot file.

  • In addition, all new write commands received from the client are cached in the memory. After the RDB file is generated, the master sends the RDB to the slave. The slave writes the RDB to the local disk and then loads the RDB from the local disk into memory.

  • The master then sends the write commands cached in memory to the slave, and the slave synchronizes the data.

  • If the slave node is disconnected from the master node due to a network fault, it automatically reconnects to the master node. After the connection is re-established, the master copies only the missing data to the slave.

Process principle

  1. After the MS relationship between the slave and master is established, the SYNC command is sent to the master database
  2. After receiving the SYNC command, the master library starts saving snapshots in the background (RDB persistence) and caches write commands received during the process
  3. When the snapshot is complete, the master Redis sends the snapshot file and all cached write commands to the slave Redis
  4. Upon receipt from Redis, the snapshot file is loaded and the received cache command is executed
  5. Then, whenever the master Redis receives a write command, it sends the command to the slave Redis to ensure data consistency
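
The five steps above can be modeled in a toy, in-memory form (no networking, no real RDB files; class and method names are illustrative only):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of the SYNC flow: snapshot, buffered writes, then streaming.
class SyncSketch {
    static class Slave {
        final Map<String, String> data = new HashMap<>();
        void load(Map<String, String> snapshot) { data.putAll(snapshot); } // step 4
        void apply(String k, String v) { data.put(k, v); }
    }

    static class Master {
        final Map<String, String> data = new HashMap<>();
        final Queue<String[]> buffer = new ArrayDeque<>();
        Map<String, String> snapshot;
        Slave slave;
        boolean snapshotting;

        void set(String k, String v) {
            data.put(k, v);
            if (snapshotting) buffer.add(new String[]{k, v});  // step 2: cache writes
            else if (slave != null) slave.apply(k, v);         // step 5: stream writes
        }

        void beginSync() {                                     // steps 1-2: RDB snapshot
            snapshotting = true;
            snapshot = new HashMap<>(data);
        }

        void finishSync(Slave s) {                             // steps 3-4: ship snapshot
            s.load(snapshot);
            for (String[] cmd : buffer) s.apply(cmd[0], cmd[1]); // replay cached writes
            buffer.clear();
            snapshotting = false;
            slave = s;
        }
    }
}
```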

disadvantages

  • All data replication and synchronization for the slave nodes is handled by the master node, so the master comes under too much pressure; a master-slave-slave (cascaded replication) structure can be used to solve this.

What is the master-slave replication model for Redis clusters?

  • In order to make the cluster usable even if some nodes fail or most nodes fail to communicate, the cluster uses a master-slave replication model, with n-1 replicas per node

How is Redis deployed in production?

  • Redis cluster has 10 machines, 5 of which deploy the master instance of Redis, and the other 5 deploy the slave instance of Redis. Each master instance has a slave instance. 5 nodes provide read and write services externally, and the peak QPS of each node may reach 50,000 per second. The maximum for five machines is 250,000 read/write requests /s.

  • What is the machine configuration? 32 GB memory + 8-core CPU + 1 TB disk, with 10 GB of memory allocated to the Redis process. In a typical online production environment, Redis memory should not exceed 10 GB if possible; going beyond that may cause problems.

  • Five machines provide external reading and writing, with a total of 50G of memory.

  • Because each primary instance has a secondary instance, the setup is highly available. If any primary instance goes down, failover happens automatically: the corresponding slave instance is promoted to primary and continues to provide read and write services.

  • What data is written into memory, and how large is each piece? Commodity data, 10 KB per item: 100 items are 1 MB, and 100,000 items are 1 GB. Two million items reside in memory, occupying 20 GB, less than 50% of the total memory. The current peak is around 3,500 requests per second.

In large companies, there is an infrastructure team responsible for the operation and maintenance of the cache cluster.

What is the concept of a Redis hash slot?

  • The Redis cluster does not use consistent hashing but introduces the concept of hash slots. A Redis cluster has 16384 hash slots. CRC16 is computed over each key, and the result modulo 16384 determines which slot the key is placed in.
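
The slot computation itself is small enough to sketch. The CRC16 variant is XMODEM (polynomial 0x1021), and the Redis Cluster specification gives 0x31C3 as the CRC16 of "123456789"; hash tags ({...}) are omitted here for brevity:

```java
// Redis Cluster hash-slot computation: CRC16 (XMODEM) of the key, mod 16384.
class HashSlot {
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                // Shift left; on carry, XOR with the 0x1021 polynomial
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slotFor(String key) {
        return crc16(key.getBytes()) % 16384;   // slot in [0, 16383]
    }
}
```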

Will there be write losses in the Redis cluster? Why is that?

  • Redis does not guarantee strong data consistency, which means that in practice the cluster may lose writes under certain conditions.

How are Redis clusters replicated?

  • Asynchronous replication

What is the maximum number of nodes in Redis cluster?

  • 16384

How do Redis clusters select databases?

  • The Redis cluster cannot select databases; it uses database 0 by default.

partition

Redis is single-threaded, how to improve the utilization of multi-core CPU?

  • You can deploy multiple Redis instances on the same server and use them as if they were different servers. At some point one server's capacity is not enough anyway, so if you want to use multiple CPUs, consider sharding.

Why do Redis partitions?

  • Partitioning allows Redis to manage more memory, and Redis will be able to use all the machine’s memory. Without partitions, you can only use the memory of one machine. Partitioning allows Redis computing power to be multiplied by simply adding computers, and Redis network bandwidth to be multiplied by adding computers and network cards.

Do you know of any Redis partitioning implementations?

  • Client partitioning is where the client determines which Redis node data will be stored or read from. Most clients already implement client partitioning.
  • Proxy partitioning means that the client sends requests to a proxy, which then decides which node to write data to or read data from. The proxy decides which Redis instances to request based on partitioning rules and then returns the Redis response to the client. One proxy implementation for Redis and Memcached is Twemproxy.
  • Query routing means the client sends a request to a random Redis instance, and Redis forwards the request to the correct node. Redis Cluster implements a hybrid form of query routing: rather than forwarding requests directly from one Redis node to another, it redirects the client to the correct Redis node.

What are the disadvantages of Redis partitioning?

  • Operations involving more than one key are generally not supported. For example, you can’t intersect two collections, because they might be stored in different Redis instances (there are ways to do this, but you can’t use intersection directives directly).
  • You cannot use Redis transactions if you operate on multiple keys simultaneously.
  • The partitioning granularity is the key, so a data set cannot be sharded using a single huge key (such as a very large sorted set).
  • Data processing can be very complicated when using partitions, for example you have to collect RDB/AOF files from different Redis instances and hosts simultaneously for backup.
  • Dynamic capacity expansion or reduction while partitioned can be complex. Redis Cluster can transparently rebalance data by adding or removing Redis nodes at run time, but some other client- or proxy-based partitioning methods do not support this. However, presharding can mitigate this problem.

Distributed problem

Redis implements distributed locking

  • Redis runs in single-process, single-threaded mode, using a queue model to turn concurrent access into serial access, so multiple clients do not compete on a Redis connection. In Redis, the SETNX command can be used to implement a distributed lock.

  • Set the value of the key to value if and only if the key does not exist. If the given key already exists, setNx does nothing

  • SETNX is short for SET if Not eXists.

  • Return value: 1 if the key was set successfully; 0 if setting failed.

  • The process and matters of using setNx to complete synchronization lock are as follows:

  • Run the SETNX command to obtain the lock. If 0 is returned (the key already exists, the lock already exists), the lock fails to be obtained. Otherwise, the lock succeeds

  • To prevent a program that crashes after acquiring the lock from leaving other threads/processes deadlocked (their SETNX calls would keep returning 0), set a "reasonable" expiration time on the key, and release the lock by deleting the key with the DEL command.
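
A minimal sketch of these SETNX semantics, with Redis simulated by an in-memory map so the flow can be shown without a server (with a real client you would issue SET key value NX PX <ttl> instead; the key and client names are made up):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory simulation of SETNX/DEL to illustrate the locking flow.
class SetnxLock {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // SETNX semantics: succeeds (returns true) only if the key did not exist
    boolean setnx(String key, String value) {
        return store.putIfAbsent(key, value) == null;
    }

    // DEL guarded by an owner check, so we only ever release our own lock
    boolean releaseIfOwner(String key, String owner) {
        return store.remove(key, owner);
    }
}
```

The owner check matters in practice: without it, a client whose lock already expired could delete a lock now held by someone else.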

How to solve the Redis concurrent competing Key problem

  • The problem with the so-called Redis concurrent contention for keys is that multiple systems operate on the same Key at the same time, but the order in which the Key is executed is not the order we expect it to be executed, resulting in different results!

  • One recommended solution: distributed locks (which can be implemented with either ZooKeeper or Redis). (Do not use distributed locks if Redis has no concurrent key-contention issue, as this hurts performance.)

  • Distributed lock based on ZooKeeper temporary ordered node. The general idea is that when each client locks a method, a unique instantaneous ordered node is generated in the directory of the specified node corresponding to the method on ZooKeeper. The way to determine whether to obtain the lock is very simple, just need to determine the smallest serial number in the ordered node. When the lock is released, the instantaneous node is simply removed. At the same time, it can avoid deadlock problems caused by locks that cannot be released due to service downtime. When the business process is complete, delete the corresponding child node to release the lock.

In practice, of course, reliability comes first, which is why ZooKeeper is the first recommendation.

Should distributed Redis be done early, or later once the scale grows? Why?

  • Since Redis is so lightweight (a single instance uses only 1M memory), it is best to start many instances at first to prevent future expansion. Even if you only have one server, you can start with Redis running in a distributed fashion, using partitions and launching multiple instances on the same server.

  • Setting up a few more Redis instances to start with, such as 32 or 64 instances, may seem cumbersome to most users, but the sacrifice is worth it in the long run.

  • That way, when your data is growing and you need more Redis servers, all you need to do is simply migrate Redis instances from one service to another (without worrying about repartitioning). Once you add another server, you need to migrate half of your Redis instances from the first machine to the second machine.

What is a RedLock

  • The Redis official site proposes an authoritative Redis-based distributed lock method named Redlock, which is more secure than the original single-node approach. It guarantees the following features:
    1. Security features: Mutually exclusive access, that is, only one client can always get the lock
    2. Deadlock avoidance: The client can eventually acquire the lock and no deadlock occurs, even if the client that originally locked the resource crashes or a network partition occurs
    3. Fault tolerance: Service can be provided as long as most Redis nodes are alive

Cache anomalies

What is Redis penetration?

  • Cache penetration means requests for data that exists in neither Redis nor MySQL bypass the cache and land directly on the MySQL server, overloading it. In a web service the bottleneck easily appears at MySQL, and the point of Redis is to share MySQL's load, so this kind of problem must be avoided.

  • Solutions:

    1. If the data can be found in neither the cache nor the database, write the key into the cache with a null value (key-null) and a short validity period, for example 30 seconds (setting it too long would prevent normal use of the cache). This stops an attacker from repeatedly hammering the same nonexistent ID.
    2. Add verification at the interface layer, such as user authentication verification, id basic verification, ID <=0 direct interception;
    3. A Bloom filter is used to hash all possible data into a large enough bitmap. A non-existent data will be intercepted by the bitmap, thus avoiding the query pressure on the underlying storage system
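
Solution 3 can be sketched with a small Bloom filter over a BitSet. The sizes and the double-hashing scheme below are illustrative choices, not parameters from any particular library:

```java
import java.util.BitSet;

// Minimal Bloom filter: absent keys are rejected before hitting storage.
class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive k bit positions from two base hashes (double hashing)
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) bits.set(index(key, i));
    }

    // false => definitely absent; true => possibly present (false positives only)
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++)
            if (!bits.get(index(key, i))) return false;
        return true;
    }
}
```

Keys the filter rejects can be turned away at the interface layer without ever querying the database, which is what blocks the penetration attack.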


What is a Redis avalanche?

  • That is, the Redis service breaks down under heavy load, which shifts the heavy load onto MySQL and breaks it down too, until finally the whole system crashes.

  • Solutions:

    1. Redis cluster, which distributes the work done by one person to multiple people
    2. Cache preheating (Disable extranet access, start mysql first, use the preheating script to write hotspot data into the cache, and start the cache. Enabling the Extranet service)
    3. Data should not be set to the same lifetime, otherwise redis will be under too much pressure when it expires

What is Redis breakdown?

  • Under high concurrency, when a key expires, multiple threads go to MySQL to query the same business data and each saves it to Redis (under concurrency, multiple copies of the data are saved). After a period of time, those copies expire at the same time, causing a sudden spike in pressure.

  • Solutions:

    1. Tiered cache (Cache two copies of data, the second copy of data lives longer as a backup, the first copy of data is used to be hit, if the second copy of data is hit, the first copy of data has expired, you need to go to mysql to request data to cache two copies of data again)
    2. Scheduled task (If the data lifetime is 30 minutes, the scheduled task will update cached data every 20 minutes)

Cache warming

  • Cache preheating means that relevant cache data is directly loaded into the cache system after the system goes online. This avoids the problem of first querying the database and then caching the data when the user requests it! Users directly query cached data that has been preheated in advance!

  • The solution

    1. Write a cache refresh page directly, manual operation when on-line;
    2. The amount of data is not large and can be loaded automatically when the project is started.
    3. Periodically refresh the cache;

Cache degradation

  • When traffic surges, service problems (such as slow or unresponsive response times) occur, or non-core services affect the performance of the core process, you still need to ensure that the service is still available, even at the expense of the service. The system can automatically degrade according to some key data, or manually degrade by configuring switches.

  • The ultimate goal of cache degradation is to keep the core service available, even if it is lossy. And some services can’t be downgraded (add to cart, checkout).

  • Before degrading, the system should be reviewed to decide whether it can sacrifice the pawn to protect the king: sort out what must be fiercely protected and what can be degraded. For example, you can refer to log level settings:

    1. General: For example, if some services time out due to network jitter or online services, they can be automatically degraded.
    2. Warning: If the success rate of some services fluctuates within a period of time (for example, between 95 and 100%), the service can be automatically degraded or manually degraded, and an alarm is sent.
    3. Error: for example, the availability rate drops below 90%, the database connection pool is exhausted, or the number of visits suddenly jumps to the maximum threshold the system can bear; at this point, degrade automatically or manually as the situation requires;
    4. Critical error: For example, the data is wrong due to special reasons, and an emergency manual downgrade is required.
  • The purpose of service degradation is to prevent the Redis service failure from causing an avalanche of database problems. Therefore, for unimportant cached data, service degradation strategy can be adopted. For example, a common practice is that Redis does not query the database, but directly returns the default value to the user.

Hot data and cold data

  • Hot data, cache is valuable

  • For cold data, most of the data may be squeezed out of memory before it is accessed again, taking up memory and having little value. For frequently modified data, consider caching as appropriate

  • For hot data, such as one of our IM products, birthday greeting module, and the list of birthday stars of the day, the cache may read hundreds of thousands of times. Another example is a navigation product where we cache the navigation information and then read it millions of times.

  • To make sense, the cache must be read at least twice before the data is updated. This is the most basic strategy, and if the cache fails before it can take effect, it won’t be of much value.

  • What about data that exists and changes frequently, but which you still have to cache? There are such cases! For example, some read interfaces put heavy pressure on the database, yet the data is hot data: in some of our assistant products, like counts, favourite counts, and share counts are typical hot data that change constantly. In this situation you need to synchronize the data to a Redis cache to reduce database pressure.

Cache hot Key

  • A key in the cache (such as a promotional item) expires at some point in time, and exactly then a large number of concurrent requests arrive for that key. Finding the cache expired, the requests all load data from the backend DB and write it back to the cache, and this burst of concurrent requests may instantly crush the backend DB.

The solution

  • Use a mutex key. When the cache is found to have expired, do not load the DB immediately; first try to set a mutex key with a command such as SETNX. If the set succeeds, load the DB, write the cache back, and then delete the mutex key. Other processes that find the lock in place wait, and after the unlock either read the returned data from the cache or go to the DB to query.
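
The mutex-rebuild flow just described can be sketched as follows. Redis and the database are simulated with in-memory maps, and putIfAbsent plays the role of SETNX; the key names are made up:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Only the caller that wins the mutex reloads the hot key from the database.
class HotKeyMutex {
    final Map<String, String> cache = new ConcurrentHashMap<>();
    final Map<String, String> locks = new ConcurrentHashMap<>();
    final AtomicInteger dbLoads = new AtomicInteger();

    String loadFromDb(String key) {              // stand-in for the real DB query
        dbLoads.incrementAndGet();
        return "value-of-" + key;
    }

    String get(String key) {
        String v = cache.get(key);
        if (v != null) return v;                                // cache hit
        if (locks.putIfAbsent("mutex:" + key, "1") == null) {   // SETNX: lock won
            try {
                v = loadFromDb(key);
                cache.put(key, v);                              // rebuild the cache
            } finally {
                locks.remove("mutex:" + key);                   // DEL the mutex
            }
            return v;
        }
        try { Thread.sleep(10); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return get(key);                                        // lost: wait, retry
    }
}
```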

Commonly used tools

What Java clients are supported by Redis? Which is the official recommendation?

  • Redisson, Jedis, lettuce, etc. Redisson is officially recommended.

What does Redis have to do with Redisson?

  • Redisson is an advanced distributed coordination Redis client that helps users easily implement some Java objects in a distributed environment (Bloom filter, BitSet, Set, SetMultimap, ScoredSortedSet, SortedSet, Map, ConcurrentMap, List, ListMultimap, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, ReadWriteLock, AtomicLong, CountDownLatch, Publish/Subscribe, HyperLogLog).

What are the pros and cons of Jedis versus Redisson?

  • Jedis is a Java client for Redis whose API provides comprehensive support for Redis commands. Redisson implements distributed and scalable Java data structures. Compared with Jedis, Redisson's functionality is relatively simple: it does not support string manipulation, sorting, transactions, pipelines, partitioning, or other Redis features. The goal of Redisson is to promote separation of concerns from Redis, so that users can focus more on business logic.

Other problems

Redis vs. Memcached

  • Both are non-relational in-memory key-value databases. Companies now mainly use Redis to implement caching, and Redis itself is growing ever more powerful! Redis differs from Memcached in the following key ways:
| Comparison item | Redis | Memcached |
| --- | --- | --- |
| Type | In-memory, non-relational key-value database | In-memory key-value store |
| Data types | String, List, Set, Hash, Sorted Set | Text type, binary type |
| Query operations | Batch operations; transaction support; different CRUD commands per type | Common CRUD; a few other commands |
| Additional features | Publish/subscribe; master/slave replication; serialization support; Lua scripting | Multi-threaded service support |
| Network IO model | Single-threaded IO multiplexing | Multi-threaded, non-blocking IO |
| Event library | Self-built simple event library AeEvent | The mature LibEvent event library |
| Persistence | RDB and AOF | Not supported |
| Cluster mode | Native cluster mode with master/slave replication and read/write separation | No native cluster mode; clients must shard data across instances |
| Memory management | Not all data is kept in memory at all times; values unused for a long time can be swapped to disk | Data stays in memory; memory is broken into fixed-length chunks, which avoids fragmentation but lowers utilization (a 128-byte chunk storing 100 bytes wastes 28 bytes) |
| Applicable scenario | Complex data structures, persistence, high availability, large values | Pure key-value workloads with very high data volume and concurrency |
  1. As an alternative to Memcached, where all values are simple strings, Redis supports richer data types

  2. Redis is much faster than memcached

  3. Redis can persist its data

How to ensure data consistency between the cache and the database in dual write?

  • As long as you use a cache, you may be doing double storage and double writes across the cache and the database, and wherever there are double writes there will be data consistency problems. So how do you solve the consistency problem?

  • In general, if your system does not strictly require the cache and database to be consistent, and the cache may occasionally be slightly inconsistent with the database, it is best not to adopt the scheme of serializing read and write requests into an in-memory queue; that scheme guarantees no inconsistency will ever appear, but at a heavy cost.

  • Serialization drastically reduces the system's throughput, requiring several times more machines than normal to support the same volume of online requests.

  • Another option, which may cause temporary inconsistencies but is very rare, is to update the database and then delete the cache.
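
That "update the database, then delete the cache" option can be sketched as the classic cache-aside pattern; both storage layers are simulated with maps here, and the key names are made up:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside: writes update the DB and invalidate the cache; reads repopulate.
class CacheAside {
    final Map<String, String> db = new ConcurrentHashMap<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();

    void write(String key, String value) {
        db.put(key, value);   // 1. update the database first
        cache.remove(key);    // 2. then delete (invalidate) the cache entry
    }

    String read(String key) {
        String v = cache.get(key);
        if (v == null) {                       // cache miss: fall back to the DB
            v = db.get(key);
            if (v != null) cache.put(key, v);  // repopulate for later reads
        }
        return v;
    }
}
```

Deleting rather than updating the cache on write is what keeps the inconsistency window small: the next read always pulls the fresh value from the database.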

| Problem scenario | Description | Solution |
| --- | --- | --- |
| Write cache first, then database; cache write succeeds, database write fails | The cache write succeeds but the database write fails or is delayed, so the next cache read (concurrent read) is a dirty read | Write the database first and invalidate the old cache; on read, if the cache is missing, read the database and then write the cache |
| Write database first, then cache; database write succeeds, cache write fails | Writing to the database succeeds but writing to the cache fails, so the next cache read misses the data | When a cache read fails, read the database first and then write back to the cache |
| Asynchronous cache flush required | The database operation and the cache write are not in the same step, e.g. in a distributed scenario where a simultaneous cache write is impossible and an asynchronous flush (remedy) is needed | Determine which data suits this scenario, and set a reasonable inconsistency window and refresh interval based on experience |

Redis Common performance issues and solutions?

  1. It is best for the Master not to do any persistence work, including memory snapshots and AOF log files, and especially not to enable memory snapshots for persistence.
  2. If data is critical, a Slave enables AOF backup and synchronizes data once per second.
  3. For the speed of Master/Slave replication and connection stability, the Slave and Master should be on the same LAN.
  4. Try to avoid adding slave libraries to stressed master libraries
  5. The Master invokes BGREWRITEAOF to rewrite the AOF file. The AOF occupies a large amount of CPU and memory resources during the AOF rewrite. As a result, the load of services is too high and the service is temporarily suspended.
  6. A chain structure Master <- Slave1 <- Slave2 <- Slave3… makes it easy for a Slave to replace the Master: if the Master fails, Slave1 can immediately take over as the Master.

Why doesn’t Redis officially offer a Windows version?

  • Because the current Linux version has been quite stable, and a large number of users, there is no need to develop a Windows version, but will bring compatibility issues.

What is the maximum capacity that a string value can store?

  • 512M

How does Redis do mass data insertion?

  • Starting with Redis 2.6, redis-cli supports a new mode called pipe mode for performing mass data insertion.

Suppose there are 100 million keys in Redis, and 100,000 of them start with a fixed, known prefix. How would you find all of them?

  • Use the keys command to scan out a list of keys for a given pattern.
  • If Redis is serving online business traffic, what is the problem with using keys? This is where a key Redis characteristic matters: Redis is single-threaded. The keys command blocks the thread for a while, pausing the online service until it finishes. In this case the scan command can be used instead: scan extracts the key list for the given pattern without blocking, though with some probability of returning duplicates. Deduplicating on the client is fine, but the overall time is longer than a direct keys call.

Have you done asynchronous queues with Redis? How is it implemented

  • Use a list to hold the data: RPUSH produces messages and LPOP consumes them. When LPOP returns no message, you can sleep for a while and check again. If you do not want to sleep, use BLPOP, which blocks until a message arrives. Redis can also implement one producer with multiple consumers via the pub/sub topic subscription model; the drawback, of course, is that messages produced while a consumer is offline are lost.
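
The RPUSH/BLPOP pattern can be sketched with a blocking deque standing in for the Redis list (with Jedis the calls would be jedis.rpush(...) and jedis.blpop(...); the message names are made up):

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// RPUSH appends at the tail; BLPOP blocks popping from the head (FIFO).
class RedisListQueue {
    private final LinkedBlockingDeque<String> list = new LinkedBlockingDeque<>();

    void rpush(String msg) { list.offerLast(msg); }

    // BLPOP with a timeout; returns null if nothing arrives in time
    String blpop(long timeoutMs) {
        try {
            return list.pollFirst(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```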

How does Redis implement delay queue

  • Use a sorted set with the timestamp as the score and the message content as the member; call ZADD to produce messages, and have consumers poll with ZRANGEBYSCORE to fetch messages whose due time has arrived.
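
A sketch of that sorted-set delay queue, with a TreeMap keyed by score standing in for the Redis sorted set (the message names are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Delay queue: the due timestamp is the score (ZADD), consumers poll all
// entries with score <= now (ZRANGEBYSCORE 0 now) and remove them.
class DelayQueueSketch {
    private final TreeMap<Long, List<String>> zset = new TreeMap<>();

    void zadd(long dueAtMillis, String message) {
        zset.computeIfAbsent(dueAtMillis, t -> new ArrayList<>()).add(message);
    }

    // Pop every message whose due time has passed
    List<String> pollDue(long nowMillis) {
        NavigableMap<Long, List<String>> head = zset.headMap(nowMillis, true);
        List<String> due = new ArrayList<>();
        head.values().forEach(due::addAll);
        head.clear();                  // consumed messages leave the set
        return due;
    }
}
```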

How does the Redis recycle process work?

  1. A client runs a new command to add new data.
  2. Redis checks memory usage, and if it exceeds the maxMemory limit, it reclaims according to the preset policy.
  3. A new command is executed, etc.
  4. So we keep crossing the boundary of the memory limit, by constantly reaching the boundary and then constantly reclaiming back below the boundary.

If the result of a command is that a large amount of memory is used (for example, the intersection of a large set is saved to a new key), it does not take long for the memory limit to be exceeded by this memory usage.

What algorithm does Redis recycle use?

  • LRU algorithm
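
The LRU policy can be sketched with a LinkedHashMap in access order; note that Redis itself implements an approximated LRU by sampling candidate keys, not an exact linked-list LRU like this one:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded LRU cache: the least-recently-used entry is evicted when full.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder = true: get() refreshes recency
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the eldest (least recently used) entry
    }
}
```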