What data types does Redis support?

  • String: the basic data type; a binary-safe string with a maximum size of 512 MB
  • List: a list of strings, kept in the order they were added
  • Set: an unordered collection of strings with no duplicate elements
  • Sorted set: a collection of strings ordered by an associated score
  • Hash: a collection of field/value pairs

Is Redis single-process or single-threaded?

Redis is single-process and single-threaded. It uses queueing to turn concurrent access into serial access, eliminating the overhead of traditional database concurrency control.

Why is Redis single threaded?

Multithreading requires locking, and switching between threads consumes CPU. Since CPU is not the Redis bottleneck (the bottleneck is most likely memory or network bandwidth), a single thread is enough. A single thread cannot exploit a multi-core CPU, but this can be addressed by running several Redis instances on one machine.

The advantage of Redis

  • **Fast.** Since data lives in memory, Redis behaves much like a HashMap, with O(1) lookups and updates
  • Rich data types: string, list, set, sorted set, and hash are all supported
  • Transactions are supported, and operations are atomic: a change to the data either happens completely or not at all
  • Rich features: Redis can be used as a cache or message broker, and keys can be given an expiration time after which they are deleted automatically

What are the advantages of Redis over Memcached?

  • All of Memcached’s values are plain strings, whereas Redis supports much richer data types
  • Redis is much faster than Memcached
  • Redis can persist its data
  • Redis supports data backup, namely master/slave replication

How to set the expiration time and permanent validity of Redis key respectively?

EXPIRE and PERSIST commands.

What physical resources does Redis consume?

Memory.

Why does Redis need to put all data in memory?

Redis keeps all data in memory to achieve the fastest possible read/write speed, and writes data to disk asynchronously.

So Redis is characterized by fast and persistent data. If data is not kept in memory, disk I/O speed severely affects Redis performance.

As memory gets cheaper, Redis will become more and more popular. If a maximum memory limit is set, new values cannot be inserted once the data already in memory reaches that limit.

What is the concept of a Redis hash slot?

Redis Cluster does not use consistent hashing; instead it introduces the concept of hash slots. The cluster has 16384 hash slots. Each key is hashed with CRC16 and the result is taken modulo 16384 to determine which slot the key belongs to.
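
As an illustration, the slot calculation can be sketched in a few lines of Python. Redis Cluster uses the CRC16-CCITT (XMODEM) variant, and a key containing a non-empty {...} hash tag is hashed only on the tag; the function names here are our own:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): poly 0x1021, init 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster hash slots.

    If the key contains a non-empty {...} hash tag, only the tag is
    hashed, so related keys can be forced onto the same slot.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

Keys that share the same hash tag land in the same slot, which is what makes multi-key operations possible inside a cluster.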

What is the use of pipes in Redis?

Redis is a TCP request/response server that can process new requests even when old ones have not yet been answered. Pipelining exploits this: the client sends multiple commands to the server without waiting for the replies, then reads all the replies in a single step.

This is pipelining, a technique that has been in wide use for decades. For example, many POP3 implementations support it, greatly speeding up the download of new messages from the server.
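
A toy simulation (all names invented for illustration, not a real Redis client) of why pipelining helps: each flush of buffered commands models one network round trip.

```python
class ToyConnection:
    """Toy model of a request/response connection: each flush of
    buffered commands to the server costs one network round trip."""
    def __init__(self):
        self.round_trips = 0
        self._buffer = []

    def send(self, command):
        self._buffer.append(command)

    def flush(self):
        """One round trip: send everything buffered, read all replies."""
        self.round_trips += 1
        replies = [f"OK {cmd}" for cmd in self._buffer]
        self._buffer = []
        return replies

# Without pipelining: one round trip per command.
plain = ToyConnection()
for i in range(100):
    plain.send(f"SET key:{i} {i}")
    plain.flush()

# With pipelining: buffer all commands, then a single round trip,
# and all 100 replies are read in one step.
piped = ToyConnection()
for i in range(100):
    piped.send(f"SET key:{i} {i}")
replies = piped.flush()
```

On a real network each round trip costs at least one round-trip time, so batching 100 commands into one trip saves roughly 99 RTTs.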

How to understand Redis transactions?

A transaction is a single isolated operation: all commands in the transaction are serialized and executed sequentially. The transaction will not be interrupted by command requests from other clients during execution.

A transaction is an atomic operation: all or none of the commands in a transaction are executed.

What are the Redis transaction related commands?

MULTI, EXEC, DISCARD, and WATCH
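
A toy simulation of the MULTI/EXEC semantics described above (ToyTransaction is our own illustration, not a real client): commands issued after MULTI are queued rather than executed, EXEC runs the whole queue back to back, and DISCARD throws the queue away.

```python
class ToyTransaction:
    """Toy model of MULTI/EXEC: queued commands run only on EXEC."""
    def __init__(self, store):
        self.store = store
        self.queue = []

    def multi(self):
        self.queue = []

    def set(self, key, value):
        self.queue.append((key, value))
        return "QUEUED"          # Redis replies QUEUED inside MULTI

    def discard(self):
        self.queue = []          # drop all queued commands

    def execute(self):           # EXEC
        results = []
        for key, value in self.queue:
            self.store[key] = value
            results.append("OK")
        self.queue = []
        return results

store = {}
tx = ToyTransaction(store)
tx.multi()
tx.set("a", 1)
tx.set("b", 2)
before_exec = dict(store)        # still empty: nothing has run yet
results = tx.execute()           # both writes applied back to back
```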

Redis memory reclamation mechanism

Redis’s memory reclamation focuses on the following two aspects:

  1. Expiration policy: delete keys whose expiration time has passed
  2. Eviction policy: eviction is triggered when the maxmemory limit is reached

Redis expiration policy

There are three types of Redis expiration policies:

  1. Timed expiration: a timer is created for every key that has an expiration time set, and the key is deleted the moment it expires. This policy clears expired data immediately and is very memory-friendly, but it consumes a lot of CPU handling expirations, which hurts the cache’s response time and throughput.

  2. Lazy expiration: only when a key is accessed does the system check whether it has expired and, if so, delete it. This policy saves the most CPU but is very memory-unfriendly: in the extreme case, a large number of expired keys are never accessed again and therefore never cleared, occupying a large amount of memory.

  3. Periodic expiration: at regular intervals, a certain number of keys in the expires dictionaries of a certain number of databases are scanned, and any expired keys found are deleted. This policy is a compromise between the first two; by tuning the scan interval and the per-scan time limit, CPU and memory usage can be balanced for different situations.

Redis uses both lazy expiration and periodic expiration strategies.
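
The interplay of these two strategies can be sketched with a toy store (a simplification for illustration, not Redis’s implementation; real Redis similarly samples a bounded number of keys from the expires dictionary on each periodic pass):

```python
import random
import time

class ExpiringStore:
    """Toy store combining lazy and periodic expiration."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}
        self.expires = {}            # key -> absolute expiry timestamp

    def set(self, key, value, ttl=None):
        self.data[key] = value
        if ttl is not None:
            self.expires[key] = self.clock() + ttl
        else:
            self.expires.pop(key, None)

    def _expired(self, key):
        return key in self.expires and self.expires[key] <= self.clock()

    def get(self, key):
        # Lazy expiration: checked only when the key is accessed.
        if self._expired(key):
            del self.data[key]
            del self.expires[key]
        return self.data.get(key)

    def periodic_sweep(self, sample_size=20):
        # Periodic expiration: scan only a bounded random sample of
        # keys with TTLs, so each sweep does a limited amount of work.
        keys = random.sample(list(self.expires),
                             min(sample_size, len(self.expires)))
        for key in keys:
            if self._expired(key):
                del self.data[key]
                del self.expires[key]

now = [0.0]                                  # controllable fake clock
store = ExpiringStore(clock=lambda: now[0])
store.set("session", "abc", ttl=10)
store.set("config", "xyz")                   # no TTL: lives forever
now[0] = 11.0                                # advance past the TTL
```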

What data eviction policies does Redis have?

Redis allows the user to set a maximum memory size (maxmemory). When the in-memory data set grows to that size, one of the following eviction policies is applied:

  • volatile-lru: evict the least recently used key from the set of keys with an expiration time set
  • volatile-ttl: evict the key with the shortest remaining time to live from the set of keys with an expiration time set
  • volatile-random: evict a random key from the set of keys with an expiration time set
  • allkeys-lru: evict the least recently used key from the whole data set
  • allkeys-random: evict a random key from the whole data set
  • noeviction: never evict; writes that need more memory return an error
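
For intuition, the allkeys-lru idea can be sketched with an OrderedDict, modelling maxmemory as a maximum entry count. (Real Redis uses an approximated LRU based on random sampling rather than the exact tracking shown here.)

```python
from collections import OrderedDict

class LRUCache:
    """Sketch of allkeys-lru: when the size limit is reached,
    evict the least recently used key."""
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.max_entries:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(max_entries=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")          # touch "a", so "b" becomes the LRU entry
cache.set("c", 3)       # over the limit -> "b" is evicted
```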

What persistence methods does Redis support

RDB persistence works by periodically dumping the in-memory Redis data set to an RDB file on disk: within a specified period, a snapshot of the in-memory data set is written to disk. In practice, Redis forks a child process, which writes the data set to a temporary file that then replaces the previous RDB file.

AOF (Append Only File) persistence works by appending a log of Redis write operations to a file. Every write and delete processed by the server is recorded; queries are not. You can open the file and read the detailed operation records directly. When the server restarts, these commands are replayed to rebuild the original data. The AOF appends each write to the end of the file in the Redis protocol format, and Redis can also rewrite the AOF in the background so that the file does not grow too large.
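
For reference, a minimal redis.conf fragment enabling both mechanisms (the directives are standard; the values shown are the common defaults and purely illustrative):

```
# RDB: snapshot to dump.rdb if at least 1 key changed in 900 s,
# 10 keys in 300 s, or 10000 keys in 60 s
save 900 1
save 300 10
save 60 10000

# AOF: log every write, fsync once per second
appendonly yes
appendfsync everysec

# Rewrite the AOF when it doubles in size and is at least 64 MB
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
```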

Pros and cons of Redis persistence?

RDB persistence

  • Advantages: RDB files are compact and small, transfer quickly over the network, and are well suited to full copies; recovery is much faster than with AOF. Most importantly, RDB has a relatively small impact on performance compared with AOF
  • Disadvantages: the fatal drawback of RDB is that, being a point-in-time snapshot, it cannot persist data in real time. As data becomes ever more important, losing large amounts of it is usually unacceptable, which is why AOF persistence has become mainstream. In addition, RDB files must conform to a specific binary format, so compatibility across versions can be poor.

AOF persistence is the counterpart of RDB. Its advantages are second-level durability and good compatibility; its disadvantages are larger files, slower recovery, and a bigger impact on performance

How to choose Redis Persistence policy?

Before discussing persistence strategies, it is important to understand that whether you enable RDB or AOF, persistence has a performance cost. For RDB persistence, the main Redis process blocks while bgsave forks, and the child process puts I/O pressure on the machine while writing the snapshot to disk. For AOF persistence, data is written to disk more frequently (every second under the everysec policy), the I/O pressure is higher, and fsync delays can cause AOF append blocking. In addition, AOF rewriting resembles RDB’s bgsave: there is blocking on the fork and I/O pressure from the child process. Because AOF writes to disk more often, its impact on the performance of the main Redis process is generally greater.

In real production environments, persistence strategies vary with data volume, the application’s data-safety requirements, and budget: some deployments use no persistence at all, some use either RDB or AOF alone, and some enable RDB and AOF at the same time. The choice must also be considered together with Redis’s master-slave policy, because replication also provides a form of data backup, and master and slave can each choose their persistence scheme independently.

Why do Redis partitions?

Partitioning allows Redis to manage more memory, and Redis will be able to use all the machine’s memory. Without partitions, you can only use the memory of one machine. Partitioning allows Redis computing power to be multiplied by simply adding computers, and Redis network bandwidth to be multiplied by adding computers and network cards.

What is the master-slave replication model for Redis clusters?

To keep the cluster available even when some nodes fail or cannot communicate with the majority, the cluster uses a master-slave replication model in which each master node has N-1 replicas.

Will there be write losses in the Redis cluster? Why is that?

Redis does not guarantee strong data consistency, which means that in practice the cluster may lose writes under certain conditions.

How are Redis clusters replicated

Asynchronous replication.

How does Redis optimize memory

Use hashes wherever possible. Small hashes (ones storing only a few fields) use very little memory, so you should model your data as hashes when you can. For example, if a web system has a user object, do not store the user’s name, surname, email, and password under separate keys; store all of the user’s fields in a single hash.

Common usage scenarios of Redis

  • Session sharing (single sign-on)
  • Page caching
  • Queues
  • Leaderboards/calculators
  • Publish/subscribe

What are the architectural patterns of Redis? What are their characteristics?

Stand-alone version

Features: Simple

Existing problems:

  1. Limited memory
  2. Limited processing capacity
  3. Not highly available

Master-slave replication

Redis replication lets users create any number of replicas of a Redis server. The server being replicated is the master, and the servers replicating it are the slaves. As long as the network connection between them is working, master and slave hold the same data: the master continuously synchronizes its data updates to the slaves, ensuring that their data stays identical.

Question:

  1. High availability is not guaranteed
  2. It does not relieve write pressure on the master

Sentinel

Redis Sentinel is a distributed system that monitors master and slave Redis servers and performs automatic failover when a master goes offline. It has three features:

  • Monitoring: Sentinel continuously checks whether your primary and secondary servers are functioning properly.

  • Notification: Sentinel can send notifications to administrators or other applications via the API when a monitored Redis server has a problem.

  • Automatic failover: When a primary server fails, Sentinel starts an Automatic failover operation.

Features:

  1. Ensure high availability
  2. Monitor each node
  3. Automatic failover

Disadvantages:

  1. Failover from master to slave takes time, and writes can be lost during the switch
  2. It does not relieve write pressure on the master

Cluster (Proxy)

Twemproxy is Twitter’s open-source fast, lightweight proxy server for Redis and Memcached. It is a fast single-threaded proxy that supports the Memcached ASCII and Redis protocols.

Features:

  1. Multiple hash algorithms: MD5, CRC16, CRC32, CRC32a, Hsieh, Murmur, and Jenkins
  2. Failed nodes can be automatically deleted
  3. Back-end sharding logic is transparent to the application; reads and writes work the same as against a single Redis instance

Disadvantages:

  1. It introduces a new proxy layer, whose own high availability must be maintained.
  2. Failover logic has to be implemented separately; the proxy cannot fail over automatically. Scalability is also poor: scaling out or in requires manual intervention

Cluster (direct connection)

Redis 3.0 and later support redis-cluster. Redis-cluster uses a decentralized architecture: every node stores data as well as the state of the entire cluster, and every node is connected to all other nodes.

Features:

  1. A decentralized architecture (no central node to become a performance bottleneck) and no proxy layer.
  2. Data is distributed across nodes by slot; nodes share the data, and the distribution can be adjusted dynamically.
  3. Scalability: linear scaling up to about 1000 nodes, and nodes can be added or removed dynamically.
  4. High availability: the cluster remains available when some nodes fail, and slaves provide backup copies of the data.
  5. Automatic failover: nodes exchange status via the gossip protocol, and a slave is promoted to master by a voting mechanism.

Disadvantages:

  1. Resource isolation is poor, and it is easy to affect each other.
  2. Asynchronous data replication does not ensure strong data consistency

Have you ever used Redis distributed lock and how is it implemented?

Use setnx to contend for the lock, and use expire to give the lock a timeout so that it is released even if the holder forgets to do so. But what happens if the process crashes, or is restarted for maintenance, after the setnx succeeds but before the expire runs? The lock would never be released. The answer is that the SET command takes additional parameters that combine setnx and expire into a single atomic instruction: SET key value NX EX seconds.
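
A sketch of this pattern against a minimal in-memory stand-in for the client (ToyRedis and the helper names are our own; a real implementation would use an actual Redis client and a Lua script for the atomic compare-and-delete on release):

```python
import time
import uuid

class ToyRedis:
    """Minimal in-memory stand-in for a Redis client, supporting just
    enough of SET key value NX EX to demonstrate the lock pattern."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.store = {}              # key -> (value, expire_at or None)

    def set(self, key, value, nx=False, ex=None):
        entry = self.store.get(key)
        if entry and entry[1] is not None and entry[1] <= self.clock():
            entry = None             # expired: treat as absent
        if nx and entry is not None:
            return None              # NX: refuse, key already exists
        self.store[key] = (value, self.clock() + ex if ex else None)
        return True

    def get(self, key):
        entry = self.store.get(key)
        if entry is None or (entry[1] is not None
                             and entry[1] <= self.clock()):
            return None
        return entry[0]

    def delete(self, key):
        self.store.pop(key, None)

def acquire_lock(client, name, timeout=10):
    """SET name token NX EX timeout: acquire and set the TTL atomically."""
    token = str(uuid.uuid4())
    if client.set(f"lock:{name}", token, nx=True, ex=timeout):
        return token
    return None

def release_lock(client, name, token):
    """Release only if we still hold the lock (token matches)."""
    if client.get(f"lock:{name}") == token:
        client.delete(f"lock:{name}")
        return True
    return False
```

The random token guards against one client deleting a lock that has expired and since been acquired by another client.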

Have you ever used Redis for asynchronous queues and how? What are the disadvantages?

The usual approach is to use the list structure as a queue: rpush produces messages and lpop consumes them. When lpop returns no message, sleep for a while and retry (or use blpop to block until a message arrives).

Disadvantages:

  • If the consumer goes offline, produced messages are lost; when that matters, use a professional message queue such as RabbitMQ.

Can a message be produced once and consumed many times?

Using the pub/sub topic-subscriber pattern, a 1:N message queue can be implemented.
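
The list-as-queue pattern can be sketched as follows (ToyListQueue is a toy stand-in, not a Redis client; rpush appends at the tail and lpop pops from the head, giving FIFO order):

```python
class ToyListQueue:
    """Toy model of the Redis list-as-queue pattern."""
    def __init__(self):
        self.items = []

    def rpush(self, value):
        """Producer side: append to the tail of the list."""
        self.items.append(value)

    def lpop(self):
        """Consumer side: pop from the head, or None when empty
        (a real consumer would sleep and retry, or use blpop)."""
        return self.items.pop(0) if self.items else None

queue = ToyListQueue()
queue.rpush("job-1")
queue.rpush("job-2")
first = queue.lpop()     # jobs come out in the order they went in
```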

What is cache penetration? How to avoid it?

Cache penetration

A cache system generally looks entries up by key; when no value is cached, the request falls through to the backend system (such as the DB). Malicious requests that deliberately look up nonexistent keys can, at volume, put heavy pressure on the backend. This is called cache penetration.

How to avoid it?

  1. If a query returns an empty result, cache the empty result with a short expiration time, or clear the cached entry as soon as real data for that key is inserted.

  2. Pre-filter keys that cannot possibly exist. Put all possible keys into a large Bitmap and filter each query through that Bitmap.
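
A minimal sketch of approach 2 using a Bloom filter over a bitmap (the class and parameter choices are our own illustration; in practice the bit array could live in Redis via SETBIT/GETBIT, or the RedisBloom module could be used):

```python
import hashlib

class BloomFilter:
    """Small Bloom filter for pre-filtering keys: a key the filter
    rejects definitely was never added, so the request can be dropped
    before it reaches the cache or the database."""
    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive num_hashes independent bit positions from SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

bf = BloomFilter()
for user_id in ("u1", "u2", "u3"):   # preload every valid key
    bf.add(user_id)
```

A request whose key fails might_contain can be rejected immediately, so floods of made-up keys never reach the backend.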

What is cache avalanche? How to avoid it?

Cache avalanche

When a cache server restarts, or a large number of cached entries expire within a short period, the sudden load falls on the backend system and can crash it.

How to avoid it?

  1. After cache invalidation, the number of threads that read the database write cache is controlled by locking or queuing. For example, only one thread is allowed to query data and write to the cache for a key, while the other threads wait.
  2. A1 is the original cache and A2 is the copy cache. When A1 fails, access to A2 is available. Set the expiration time of A1 cache to short-term and A2 to long-term
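
Approach 1 can be sketched with a lock guarding the database reload (a simplification: a single global lock is used here, whereas a real implementation would hold one lock per key):

```python
import threading

class DogpileSafeCache:
    """Sketch of avalanche mitigation 1: when a key is missing, only
    one thread is allowed to query the database and refill the cache;
    the others wait on the lock and then reuse the cached result."""
    def __init__(self, load_from_db):
        self.load_from_db = load_from_db
        self.cache = {}
        self.lock = threading.Lock()
        self.db_calls = 0            # instrumentation for the demo

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value
        with self.lock:                  # one rebuilder at a time
            value = self.cache.get(key)  # re-check: maybe refilled
            if value is None:
                self.db_calls += 1
                value = self.load_from_db(key)
                self.cache[key] = value
        return value

cache = DogpileSafeCache(load_from_db=lambda key: f"row-for-{key}")

# 20 concurrent requests for the same missing key.
threads = [threading.Thread(target=cache.get, args=("hot-key",))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Despite 20 concurrent misses, the database is queried exactly once; the double-check inside the lock is what prevents the stampede.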

Cache concurrency

This refers to the concurrency problems caused by multiple Redis clients setting keys at the same time. Redis itself executes commands on a single thread, so concurrent operations from multiple clients are executed in arrival order: first come, first served, while the rest wait. The alternative is to serialize the set operations yourself through a queue, so that they are executed one by one.

Cache warming

Cache preheating means that relevant cache data is directly loaded into the cache system after the system goes online.

This avoids the situation where a user’s first request has to query the database before the data can be cached; users directly hit data that was preloaded into the cache.

Solution:

  1. Write a cache-refresh page and trigger it manually at launch;
  2. If the data volume is small, load it automatically when the application starts.

The goal is to load data into the cache before the system goes live.

How does the Redis recycle process work?

  1. The client writes data
  2. After the Redis server receives the write, it checks the maxmemory limit; if the limit is exceeded, it evicts some data according to the configured policy
  3. The write operation is complete.

Redis is single-threaded, how to improve the utilization of multi-core CPU?

You can deploy multiple instances of Redis on the same server and use them as different servers. At some point, one server is not enough anyway, so if you want to use more than one CPU, you can consider shards.

Does Redis take effect in real time without restarting the configuration?

For a running instance, many configuration options can be modified with the CONFIG SET command without any kind of restart. Starting with Redis 2.2, you can even switch between AOF and RDB snapshot persistence without restarting. Run CONFIG GET * for more information. Occasionally a restart is still necessary, for example to upgrade Redis to a new version or to change configuration parameters that the CONFIG command does not yet support.

What happens when Redis runs out of memory?

If the limit is reached, Redis write commands return an error (read commands still work). Alternatively, when using Redis as a cache, you can configure an eviction policy so that old content is evicted once Redis reaches its memory limit.

Distributed Redis is early or late scale up to do well again? Why is that?

Since Redis is so lightweight (a single empty instance uses only about 1 MB of memory), the best way to prepare for future expansion is to start many instances from the beginning. Even if you have only one server, you can run Redis in a distributed fashion from day one, using partitioning to launch multiple instances on that server. Setting up a few extra instances at the start, say 32 or 64, may seem cumbersome, but the sacrifice is worth it in the long run: when your data grows and you need more Redis servers, all you have to do is migrate whole instances from one server to another, without worrying about repartitioning. Once you add a second server, you simply migrate half of your instances from the first machine to it.

How is Redis different from other key-value stores?

Redis has more complex data structures and provides atomic operations on them; this is an evolutionary path different from other databases. Redis data types are transparent to the programmer while being based on fundamental data structures, without extra abstraction layers. Redis runs in memory but can persist to disk, so it represents a different trade-off: very fast reads and writes against data sets that cannot be larger than memory. Another advantage of an in-memory database is that complex data structures are far simpler to manipulate in memory than on disk, so Redis can do a great deal with little internal complexity. At the same time, its on-disk formats are compact and generated in an append-only fashion, since they do not need to support random access.

