Preface

Let’s see what Redis is. The official synopsis explains:

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. It supports data structures such as strings, lists, hashes, sets, sorted sets, bitmaps, HyperLogLogs, and geospatial indexes. It also has built-in replication, Lua scripting, LRU eviction, transactions, on-disk persistence, publish/subscribe, high availability via Redis Sentinel, and automatic sharding via Redis Cluster.
To sum up, Redis provides a wealth of features that can be dazzling at first sight. What is each feature for? What problem does it solve? When should it be used? Let's start from zero and evolve the picture step by step.

Starting from 0

The initial requirement was very simple: we had an API that returned a list of hot news stories, and the consumers of the API complained that each request took about 2 seconds to return a result.

We then looked at how to improve the performance perceived by API consumers, and quickly came up with the simplest, crudest solution: add HTTP cache control to the API response, Cache-Control: max-age=600, allowing the consumer to cache the response for ten minutes.
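As a minimal sketch (assuming a WSGI-style Python handler; the handler name and response body are illustrative, not from the original service), setting the header looks like this:

```python
def hot_news_app(environ, start_response):
    """Toy WSGI app that tells consumers they may cache the response for 10 minutes."""
    headers = [
        ("Content-Type", "application/json"),
        ("Cache-Control", "max-age=600"),  # consumers may cache for 600 s
    ]
    start_response("200 OK", headers)
    return [b'["story-1", "story-2"]']
```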

If API consumers make effective use of the cache control information in the response, they perceive a significant performance improvement (for up to 10 minutes). But there are two drawbacks: first, consumers may get stale data during the 10 minutes the cache is in effect; second, if an API client ignores the cache and hits the API directly, the request still takes 2 seconds.

Local in-memory caching

To fix the calls that still took 2 seconds, we investigated and found the main culprit: the SQL query that fetches the hot news takes nearly 2 seconds. So we came up with another simple, crude solution: cache the result of the SQL query directly in the memory of the API server itself, with a cache validity period of 1 minute. Subsequent requests within that minute read from the cache instead of spending 2 seconds executing SQL.
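A minimal sketch of such a local cache in Python (the decorator name and TTL parameter are illustrative, not from the original code):

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds=60):
    """Cache a function's result in process memory for ttl_seconds."""
    def decorator(fn):
        cache = {}  # args -> (expires_at, value)
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = cache.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]      # fresh cache hit: no SQL executed
            value = fn(*args)      # cache miss: run the slow query
            cache[args] = (now + ttl_seconds, value)
            return value
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=60)
def hot_news():
    # stands in for the ~2 s SQL query
    return ["story-1", "story-2"]
```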

If the API receives 100 requests per second, that is 6,000 requests per minute. Only the requests that crowd in during the first 2 seconds of each minute, while the cache is being populated, have to wait the full 2 seconds; all requests in the remaining 58 seconds get a response without that wait.

Other APIs found this a good idea too, and soon we discovered that the API server was running out of memory…

Redis on the server

When the API server's memory was filled up with caches, we found ourselves needing yet another solution. The most straightforward idea was to move the cache onto a dedicated server with a large memory configuration. That is when we turned to Redis… How to configure and deploy Redis is not covered here; the official Redis documentation explains it in detail. We set up a separate machine as the Redis server, and the memory pressure on the API server was resolved.
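The access pattern here is "cache-aside": check Redis first, fall back to SQL on a miss, then populate the cache. A sketch in Python, using a toy in-memory stand-in for a real Redis client (such as redis-py) so the example is self-contained; the key name and TTL are illustrative:

```python
import json

class FakeRedis:
    """Toy in-memory stand-in for a real Redis client (e.g. redis-py)."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def setex(self, key, ttl_seconds, value):
        self._store[key] = value  # the toy stand-in ignores the TTL

def get_hot_news(client, run_query):
    """Cache-aside: try the cache first, fall back to the slow SQL query."""
    cached = client.get("hot_news")
    if cached is not None:
        return json.loads(cached)
    rows = run_query()                              # the ~2 s SQL query
    client.setex("hot_news", 60, json.dumps(rows))  # cache for 1 minute
    return rows
```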

3.1 Persistence

A single Redis server would be in a bad mood for a few days each month and go down, losing all the cached data (Redis keeps its data in memory). We could bring the Redis server back online, but the loss of the in-memory data caused a cache avalanche, and the pressure on the API server and the database suddenly spiked.

This is where Redis persistence comes in handy to mitigate the effects of a cache avalanche. Redis persistence means that Redis writes the data in memory to disk, via RDB snapshots and/or the append-only file (AOF), and loads it back on restart, minimizing the impact of cache loss.
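For example, the relevant redis.conf directives look like this (the thresholds are illustrative defaults, not a recommendation):

```
# RDB: snapshot to disk if at least 1 key changed within 900 s,
# 10 keys within 300 s, or 10000 keys within 60 s
save 900 1
save 300 10
save 60 10000

# AOF: additionally log every write command, fsync once per second
appendonly yes
appendfsync everysec
```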

3.2 Sentinel and Replication

An unannounced Redis server outage is a nuisance. So what do we do? Answer: keep a backup server and switch to it when the original goes down. But how do we know that a Redis server is down? How do we switch? How do we ensure the backup is a complete copy of the original server?

That’s where Sentinel and replication come in. Sentinel can manage multiple Redis servers, providing monitoring, alerting, and automatic failover. Replication is what allows one Redis server to have multiple replicas that hold full copies of its data. Redis uses these two features together to provide high availability. Incidentally, Sentinel itself takes advantage of Redis’s publish/subscribe capability.
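A minimal sentinel.conf sketch (the master name, address, and timeouts are illustrative): monitor a master called mymaster, consider it down after 5 seconds of silence, and require 2 Sentinels to agree before failing over:

```
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```

On the replication side, a replica is pointed at its master with a single directive in its own redis.conf, e.g. replicaof 127.0.0.1 6379.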

3.3 Cluster

The CPU and I/O resources of a single server are always limited. We can use master/slave replication to separate reads from writes and shift some CPU and I/O pressure onto the slave servers. But what about memory? Master/slave replication only copies the same data; it cannot scale memory horizontally. The memory of a single machine can be enlarged, but there is always a limit.

So we needed a solution that scales horizontally. The ultimate goal: each server is responsible for only part of the data, while to outside consumers the group of distributed servers behaves like a single centralized server (I explained this distinction earlier in my REST blog series, in the post on network-based application architectures).

Before Redis’s official distributed solution came out, there were the Twemproxy and Codis solutions. Broadly speaking, both rely on a proxy for distribution; that is, Redis itself does not deal with distribution at all and delegates it to Twemproxy or Codis. The cluster solution Redis itself provides, by contrast, does the distributed work inside each Redis server, so it can meet distributed requirements without any additional component.

We don’t care here about the relative advantages of these solutions; we care about what “distribution” actually has to handle. In other words: what is the distributed logic that Twemproxy and Codis handle as separate components, and that Redis Cluster integrates into the Redis server itself?

As we said earlier, a distributed service looks like a centralized service from the outside. The implication is that adding or removing servers in the distributed service should be transparent to the clients consuming it. That means clients cannot be allowed to see through the distributed service and bind themselves to a single server, because then you could neither add new servers nor fail over.

There are two ways to solve this problem:

The first approach is the most straightforward: add an intermediate layer to isolate the specific dependencies. This is what Twemproxy does; all clients consume the Redis service only through it, and it isolates the dependency (though you can see that Twemproxy itself becomes a single point of failure). In this scheme each Redis server is independent and unaware of the others’ existence;

The second approach is to let the Redis servers know about each other and use a redirection mechanism to guide the client to complete its operation. For example, a client connects to some Redis server and asks it to perform an operation; the server finds that it cannot complete that operation itself, so it returns the information of the server that can, and the client then sends its request to that other server. You will notice that this requires every Redis server to keep a full copy of the distributed topology; otherwise, how would it know which server to point the client at for the operation the client wants to execute?

In both approaches, there exists full information about which server provides which part of the service. The difference is that the first approach manages this information in a separate component, which uses it to coordinate multiple independent Redis servers at the back end; the second approach makes each Redis server hold this information and know of the others’ existence, achieving the same goal. Its advantage is that no additional component is needed to handle this concern.
Concretely, Redis Cluster uses the concept of hash slots: 16384 slots are pre-allocated. On the client side, the slot a key belongs to is computed as CRC16(key) % 16384. On the server side, each server is responsible for a portion of the slots; when a server is added or removed, slots and their data are migrated accordingly. At the same time, every server holds the full mapping of slots to servers, which is what allows any server to redirect a client’s request.
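The client-side slot calculation can be sketched in Python. Redis Cluster uses the CRC16-CCITT (XMODEM) variant, and the cluster specification gives CRC16("123456789") = 0x31C3 as a reference value (this sketch omits the {hash tag} rule that real cluster clients also implement):

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 pre-allocated hash slots."""
    return crc16(key.encode()) % 16384
```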

Redis from the client’s perspective

The sections above mainly described the evolution of the Redis server, explaining how Redis evolved from a stand-alone service into a highly available, decentralized, distributed storage system. This section focuses on the Redis services that clients can consume.

4.1 Data Types

Redis supports a wealth of data types, from the most basic string to complex and commonly used data structures:

  1. String: the basic type; a binary-safe string of up to 512 MB.
  2. List: a list of strings, ordered by insertion.
  3. Set: an unordered collection of strings with no duplicate elements.
  4. Sorted set: like a set, but each member carries a score that keeps the collection sorted.
  5. Hash: a collection of field-value pairs stored under one key.
  6. Bitmap: bit-level operations over a string value.
  7. HyperLogLog: a probabilistic data structure for estimating the cardinality of a set.
These numerous data types exist to support the needs of various scenarios, and each operation on them has its own time complexity. In fact, these complex data structures are equivalent to an implementation of the Remote Data Access (RDA) style that I described earlier in my Uncovering REST blog series on network-based application architectures: a set of standard operation commands is executed on the server, and the client receives only the desired, scaled-down result set, which simplifies client usage and improves network performance. For example, if there were no list type, you could only store a list as one big string: the client would have to fetch the complete list, modify it, and submit the complete list back to Redis, which would be very wasteful.
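A few illustrative redis-cli commands for these types (the key names are made up for the example):

```
LPUSH news "story-2" "story-1"      # list: push onto the head
LRANGE news 0 -1                    # read the whole list back
SADD tags "redis" "cache"           # set: duplicates are ignored
ZADD ranking 100 "story-1"          # sorted set: member with score 100
HSET story:1 title "Hello" views 7  # hash: field-value pairs under one key
SETBIT active:today 42 1            # bitmap: mark user 42 as active
PFADD visitors "1.2.3.4"            # hyperloglog: approximate distinct count
```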

4.2 Transactions

Each of the data types above has its own commands, and in many cases you need to execute several commands at once and have them succeed or fail together. Redis’s support for transactions stems from this requirement: the MULTI/EXEC commands let you queue multiple commands and execute them sequentially as one atomic batch.
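A small redis-cli sketch of a transaction: commands are queued after MULTI and executed together by EXEC (the replies shown are what you would typically see for a fresh key):

```
redis> MULTI
OK
redis> INCR page_views
QUEUED
redis> EXPIRE page_views 60
QUEUED
redis> EXEC
1) (integer) 1
2) (integer) 1
```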

4.3 Lua scripts

Building on transactions, Lua becomes useful when we need to perform more complex operations on the server side in one go, including logical judgments, such as extending a cache entry’s expiration while fetching it. Redis guarantees the atomicity of Lua scripts, which can replace transaction-related commands in certain scenarios. This is equivalent to a concrete implementation of the Remote Evaluation (REV) style introduced in the post on network-based application architectures.
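For instance, a small Lua script (key and argument names are illustrative) that fetches a cached value and extends its expiry in one atomic step:

```
local value = redis.call("GET", KEYS[1])
if value then
  redis.call("PEXPIRE", KEYS[1], ARGV[1])
end
return value
```

It would be invoked with something like EVAL "<script>" 1 hot_news 600000, where 600000 is the new TTL in milliseconds.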

4.4 Pipelining

Because the Redis client and server communicate over TCP, each command normally costs a full request/response round trip on the connection. Pipelining allows multiple commands to be sent on a single connection without waiting for each reply, saving much of that round-trip overhead. The difference between pipelining and transactions is that pipelining saves communication overhead but does not guarantee atomicity.
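A pipelining sketch using the third-party redis-py client (it requires a running Redis server, so this is illustrative rather than something you can run standalone):

```
import redis

r = redis.Redis()
pipe = r.pipeline(transaction=False)  # plain pipeline, no MULTI/EXEC
for i in range(3):
    pipe.set(f"key:{i}", i)
results = pipe.execute()  # all three commands share one round trip
```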

4.5 Distributed Lock

Redis documents a locking pattern built on the string type (the multi-instance Redlock algorithm extends this single-instance pattern): to acquire the lock, SET a specific key to a random value, only if the key does not yet exist and with an expiry; to release it, use a Lua script that first gets and compares the value, and only then deletes the key. The specific commands are as follows:

# acquire: set the key only if it does not exist (NX), with a 30-second expiry (PX 30000)
SET resource_name my_random_value NX PX 30000

# release (Lua, run via EVAL): delete the key only if the random value still matches
if redis.call("get",KEYS[1]) == ARGV[1] then
  return redis.call("del",KEYS[1])
else
  return 0
end


Conclusion

This article explains the features of Redis and the reasons they exist at an abstract level, without going into implementation details. This lets us focus on the problems Redis solves, and thinking in abstractions lets us choose a more appropriate solution for a particular scenario, rather than being constrained by technical details.

Finally

Welcome to follow my WeChat official account [programmer chasing wind]; articles will be updated there, and the materials I collect will be posted there as well.