Introduction to Redis
Author | Xie Ning, R&D Engineer at Volcano Engine
Redis is a typical KV store used in everyday work, ranking first among key-value stores in DB-Engines. Redis is a memory-based store that provides rich data structures, supporting strings, hashes, lists, sets, and the Stream type. Redis has a number of built-in features, among which the most important are:
- Replication: Redis supports asynchronous full and incremental synchronization. Data can be copied from Master to Slave to achieve high availability of Redis data.
- Persistence: Data can be persisted through RDB and AOF mechanisms.
- Sentinel support: Sentinel's primary role is to monitor the health of the Master node. When the Master is found to be unavailable, a failover is performed to promote a Slave node to Master, ensuring high availability of the Redis service.
- Cluster mode: a single Redis instance is limited by the memory of its physical machine. When large Redis cluster capacity is required, Redis Cluster mode can be used. Its principle is that the stored data is sharded, with each part of the data served by a different Redis instance.
There are three typical application scenarios of Redis:
- Cache: because Redis is memory-based storage, read and write requests are executed in memory and response latency is very low, so Redis is used as a cache in many scenarios.
- Database: Redis supports persistence and can be used as KV database.
- Message queue: Redis supports the Stream data type and also provides pub/sub commands for publishing and subscribing to messages, which together offer the basic functionality of a message queue.
The Redis protocol is a binary-safe text protocol. It is simple enough that you can connect to a Redis server with Telnet and perform GET and SET operations.
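As an illustration of that binary-safe, length-prefixed format, here is a minimal sketch that builds the RESP bytes a client sends for SET and GET. It only does the framing; no real server is contacted:

```python
def encode_resp(*args):
    """Encode a Redis command as a RESP array of bulk strings.

    Length-prefixed bulk strings are what makes the protocol
    binary-safe: values may contain any bytes, even CR/LF.
    """
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

# The same bytes a simple client would send for SET and GET:
print(encode_resp("SET", "greeting", "hello"))
print(encode_resp("GET", "greeting"))
```

Over Telnet you can also type the "inline" form (`GET greeting`), but the array form above is what client libraries send.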
Overview of K8s
K8s is a container orchestration system that automates the deployment, scaling, and management of containerized applications. K8s provides several basic features:
- Automatic bin packing: you specify the minimum and maximum resources a Pod requires in K8s, i.e., its request and limit values. K8s schedules the Pod based on the request value and brings it up on a suitable node.
- Service discovery and load balancing: K8s provides a DNS-based service discovery mechanism as well as Service-based load balancing.
- Automated rollouts and rollbacks: this involves K8s workload resources. K8s provides several different workload resources for different business scenarios:
- Support Deployment/DaemonSet
- Support StatefulSet
- Support CronJob/Job
These different workload resources enable configuration changes to a service, such as updating the image, upgrading the binary, scaling replicas, and so on.
- Horizontal scaling: K8s naturally supports horizontal scaling, which can be dynamically scaled based on Pod CPU utilization, memory utilization, and third-party custom metrics.
- Storage orchestration: K8s supports a PV- and PVC-based storage provisioning model, so a Pod can consume storage through PVs and PVCs.
- Self-healing: one example is replica keeping. If you host a service with a Deployment and the host of one of its Pods becomes unavailable, K8s pulls up a new Pod on an available node to keep the service running.
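To make the resource declaration idea concrete, here is a hedged sketch that builds a Deployment manifest with request/limit values as a plain Python dict, the shape the K8s API accepts as JSON/YAML. The names and image tag are illustrative, not from the original talk:

```python
# Sketch of a Deployment manifest with resource requests/limits,
# built as a plain dict (illustrative names, not a real deployment).
def make_deployment(name, image, replicas,
                    cpu_request, cpu_limit, mem_request, mem_limit):
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "resources": {
                            # K8s schedules on the request, enforces the limit.
                            "requests": {"cpu": cpu_request, "memory": mem_request},
                            "limits": {"cpu": cpu_limit, "memory": mem_limit},
                        },
                    }],
                },
            },
        },
    }

dep = make_deployment("redis-proxy", "example/redis-proxy:1.0",
                      3, "500m", "1", "1Gi", "2Gi")
print(dep["spec"]["template"]["spec"]["containers"][0]["resources"])
```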
Services encountered in real work can be divided into stateful services and stateless services according to whether data persistence is needed. Services that do not require data persistence are considered stateless and include the following types:
- API-type services: can run on any node. To deploy such services on K8s, use a K8s Deployment.
- Agent or daemon-type services: deployed on every machine, with at most one process per machine. On K8s, a DaemonSet can be used for this kind of deployment.
- There is also a class of stateless services that require fixed, unique identifiers. To meet this requirement, use a K8s StatefulSet: although StatefulSet is mainly intended for stateful services, it provides fixed, unique identities and can also host stateless services.
Stateful services require stable persistent storage. In addition, there may be some other feature requirements:
- Stable, unique identifiers
- Ordered, graceful deployment and scaling
- Ordered, automated rolling updates
On K8s, we typically use StatefulSet Resource to host stateful services.
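A quick sketch of what StatefulSet's stable identity looks like in practice: Pods get fixed ordinal names, and a headless Service gives each Pod a stable DNS record. The service and namespace names below are assumptions for illustration:

```python
# Sketch: StatefulSet Pods keep ordinal names ("<sts>-0", "<sts>-1", ...)
# across reschedules, and a headless Service exposes per-Pod DNS entries.
def statefulset_pod_dns(sts_name, replicas, headless_svc, namespace="default"):
    return [
        f"{sts_name}-{i}.{headless_svc}.{namespace}.svc.cluster.local"
        for i in range(replicas)
    ]

# Pod "redis-server-0" keeps this identity even if it is recreated:
print(statefulset_pod_dns("redis-server", 2, "redis-server-headless"))
```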
Redis Cloud native practice
Below we introduce the Volcano Engine Redis cloud-native practice. First, let us clarify the goals of Redis cloud native, which mainly include the following:
- The abstraction and delivery of resources is done by K8s, without the need to focus on specific models. In the physical machine era, we need to determine the number of Redis instances that can be deployed on each machine based on the CPU and memory configuration of different models. Through Redis cloud native, we only need to declare the required CPU and memory size to K8s, and the rest of the scheduling, resource supply, machine screening is done by K8s.
- Node scheduling is performed by K8s. In the actual deployment of a Redis cluster, some components of the Redis cluster need to meet certain placement strategies to ensure high availability. In order to meet the placement strategy, in the era of physical machines, operation and maintenance systems are required to complete the screening of machines and the logic of calculation, which is relatively complex. K8s itself provides rich scheduling capabilities that can easily implement these placement strategies, thus reducing the burden on the operation and maintenance system.
- Node management and state maintenance are performed by K8s. In the physical-machine era, if a machine failed, the O&M system had to step in, work out which services and components were deployed on it, and then bring up new nodes on other available machines to replace those lost to the outage. Letting K8s manage nodes and maintain state reduces the complexity of the O&M system.
- Standardize Redis deployment and operation mode. The most important thing is to minimize manual intervention and improve operation and maintenance automation.
Redis cluster architecture
Let’s take a look at our Redis cluster architecture. A cluster has three components: Server, Proxy, and Configserver, each of which performs different functions.
- Server: the component that stores data, namely Redis Server. The backend is deployed as multiple shards; Server Pods in different shards do not communicate with each other, i.e., a share-nothing architecture. Within a shard there is one master and one or more slaves: one master and one slave, one master and two slaves, or even more.
- Proxy: receives requests from clients and forwards them to the backend Server shards based on the read/write topology.
- Configserver: a configuration management component. It is itself stateless; all state information is stored in etcd. All information about the Server shards during the cluster lifecycle is kept in the Configserver. The Configserver periodically checks the Master node of each shard; if a shard's Master is unavailable, the Configserver performs a failover, turning an available Slave in that shard into the new Master so the shard can continue serving. In addition, the Configserver periodically updates the read/write topology based on failovers and other instance information, ensuring the Proxy can pull fresh, correct configuration from it.
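The Configserver's periodic health check and failover can be sketched as one simplified probing round. The data shapes and failure threshold here are illustrative assumptions, not Volcano Engine's actual implementation:

```python
def check_and_failover(shard, probe, fail_threshold=3):
    """One health-check round for a shard (illustrative sketch).

    shard: {"master": addr, "slaves": [addr, ...], "fails": int}
    probe(addr) -> bool reports whether the node answers.
    """
    if probe(shard["master"]):
        shard["fails"] = 0
        return shard
    shard["fails"] += 1
    if shard["fails"] < fail_threshold:
        return shard  # not enough consecutive failures yet
    # Promote the first healthy slave; demote the old master to slave.
    for i, slave in enumerate(shard["slaves"]):
        if probe(slave):
            shard["slaves"][i], shard["master"] = shard["master"], slave
            shard["fails"] = 0
            break
    return shard

# Simulate a dead master: after three failed probes, a slave is promoted.
shard = {"master": "10.0.0.1:6379", "slaves": ["10.0.0.2:6379"], "fails": 0}
down = lambda addr: addr != "10.0.0.1:6379"
for _ in range(3):
    check_and_failover(shard, down)
print(shard["master"])  # -> 10.0.0.2:6379
```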
Based on the Redis architecture and the characteristics of K8s, we abstract the basic configuration of a Redis cluster deployed on a K8s cluster:
- Deploy the stateless Configserver on K8s using Deployment. Because the Configserver can be shared by all Redis clusters, we specify that all Redis clusters share one Configserver to simplify operation and maintenance.
- Proxies are also stateless components and are deployed with a Deployment.
- Because we have multiple shards and the Server is stateful, each shard is hosted with StatefulSet. When creating a cluster, we default Pod 0 in the shard as Master Pod and all other pods as slaves. This is an initial state that may change with Failover or other exceptions, but the Configserver records the latest status information in real time.
Redis Server needs some configuration files at startup, which involve usernames and passwords. We store these in a Secret, and when the Server Pod runs, the Secret is mounted into it through the volume mechanism.
For the Proxy, HPA is used to dynamically scale the Proxy service based on its CPU utilization.
Placement policy
For Server and Proxy components involved in a Redis cluster, we have some placement policy requirements, such as:
- Nodes in a Server shard cannot be on the same machine; that is, the master and slaves of one shard must land on different machines. Translated into the K8s model, we want all Pods of a StatefulSet deployed on different machines. We use the required semantics of Pod anti-affinity to ensure that all Pods under a StatefulSet are deployed on different machines.
- Proxy Pods in a cluster should be distributed across different machines as much as possible. This can be satisfied with the preferred semantics of Pod anti-affinity plus topology spread constraints. The preferred semantics can only ensure Pods are spread across as many different machines as possible; to avoid the extreme case where all Pods end up on the same machine, we add a topology spread constraint.
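The two placement policies can be sketched as pod-spec fragments, expressed as plain dicts in the shape the K8s API expects. The labels are illustrative:

```python
# Server Pods: hard anti-affinity -- no two Pods of one shard per node.
server_affinity = {
    "podAntiAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": [{
            "labelSelector": {"matchLabels": {"app": "redis-server", "shard": "0"}},
            "topologyKey": "kubernetes.io/hostname",
        }]
    }
}

# Proxy Pods: soft anti-affinity, so spreading is preferred but not required.
proxy_affinity = {
    "podAntiAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": [{
            "weight": 100,
            "podAffinityTerm": {
                "labelSelector": {"matchLabels": {"app": "redis-proxy"}},
                "topologyKey": "kubernetes.io/hostname",
            },
        }]
    }
}

# Plus a spread constraint capping per-node skew, to rule out the
# extreme case of every Proxy Pod landing on one machine.
proxy_spread = [{
    "maxSkew": 1,
    "topologyKey": "kubernetes.io/hostname",
    "whenUnsatisfiable": "ScheduleAnyway",
    "labelSelector": {"matchLabels": {"app": "redis-proxy"}},
}]
print(sorted(server_affinity["podAntiAffinity"]))
```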
Storage
Storage uses PVC, PV, and a StorageClass with dynamic provisioning. The StorageClass abstracts different storage backends, supporting local disks and distributed storage; you can request storage simply by specifying a StorageClass, without knowing the backend implementation.
In addition, we use a StorageClass that supports dynamic provisioning, which automatically creates PVs of different sizes on demand. With static provisioning, it would be impossible to predict the specifications of all Redis instances in advance and pre-create the right number of PVs for them.
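A sketch of what requesting dynamically provisioned storage looks like: the PVC only names a StorageClass, and the provisioner creates a matching PV on demand. The class name and size below are illustrative:

```python
# Sketch of a PVC that triggers dynamic provisioning: naming a
# StorageClass is enough; no PV has to be pre-created.
def make_pvc(name, storage_class, size):
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,  # picks the backend
            "resources": {"requests": {"storage": size}},
        },
    }

pvc = make_pvc("redis-data-shard0-0", "local-ssd", "20Gi")
print(pvc["spec"]["storageClassName"])
```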
Redis cloud native features
After Redis moved to cloud native, the Operator component, a custom controller built on the Operator pattern, is mainly used to orchestrate the Redis Cluster resource and manage changes to Redis clusters. The Configserver is also deployed on K8s, so all Redis-related components run cloud natively.
Creating a new cluster
- A request to create a cluster is first sent to the APIServer. After the APIServer receives it, the Operator learns of the request through client-go's Watch mechanism.
- The Operator then asks ApiServer to create a StatefulSet for the Server.
- After K8s successfully creates the StatefulSets for all Servers and all Pods reach the Ready state, there is still no master/slave relationship among the Server Pods in any shard.
- When the Operator senses that all StatefulSets are ready, it collects the information of all Server Pods and registers them with the Configserver.
- The Configserver then connects to all Slave nodes in the shard and executes the actual SLAVEOF command to ensure the true master/Slave relationship is established.
- The Operator periodically queries the Configserver for the progress of master/slave setup. Once the master/slave relationships of all shards are successfully established, the Operator asks the APIServer to create the corresponding Proxy.
- After the Proxy is created, it pulls the latest topology from the Configserver, ensuring that external requests are routed to healthy backend shards.
Shard scale-out
In actual use, when capacity is insufficient, you scale out horizontally by adding shards. The flow of an add-shard request is similar to cluster creation:
- The request is sent to ApiServer.
- ApiServer pushes requests to the Operator.
- After the Operator senses this, it sends a request to ApiServer to create the StatefulSet corresponding to the new fragment.
- K8s creates the StatefulSet for the new shard. Once it is Ready, each Pod in the StatefulSet is still in an independent state, with no real master/slave relationship established.
- When the Operator learns that the newly created Server StatefulSet is ready, it registers the instance addresses of the new Server shard with the Configserver. The Configserver then proceeds in two steps:
- Step 1: Guide the establishment of real master-slave relationships in new shards. That is, connect to all the slaves of the new shard and execute the SLAVEOF command.
- Step 2: more importantly, guide the migration of data from the old shards to the new shard so that the new shard can take effect.
- The Operator keeps checking the data migration (rebalance) progress. Once it completes, the Operator updates the status field of the Redis Cluster resource to indicate that the shard scale-out operation is finished.
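A hedged sketch of planning the rebalance when a shard is added, assuming a fixed slot space like Redis Cluster's 16384 slots. The actual slot count and migration mechanics of the Volcano Engine system are not specified in the talk:

```python
# Sketch: when a new shard joins, move each old shard's surplus slots
# to it so every shard ends up near total_slots / shard_count.
def rebalance_plan(old_shards, new_shard, total_slots=16384):
    """old_shards: {shard_name: slot_count}. Returns (src, dst, n) moves."""
    target = total_slots // (len(old_shards) + 1)
    plan = []
    for shard, slots in old_shards.items():
        surplus = max(0, slots - target)
        if surplus:
            plan.append((shard, new_shard, surplus))
    return plan

# Two 8192-slot shards each hand ~2731 slots to the new third shard:
print(rebalance_plan({"shard-0": 8192, "shard-1": 8192}, "shard-2"))
```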
Shard scale-in
The shard scale-in process is similar to scale-out: a request is sent to the APIServer, and the Operator senses it and forwards the request to the Configserver.
The Configserver does this:
- Start with data migration. Since some shards will go offline, the data on those shards must first be migrated to the remaining available shards to avoid data loss.
- The Operator keeps querying the progress of the rebalance directed by the Configserver. Once it completes, the Operator asks the APIServer to actually delete the Server StatefulSet, which makes the deletion a safe operation.
Component upgrade
A Redis cluster involves two components: Proxy and Server.
The stateless Proxy is hosted by Deployment. If component upgrade is required, the corresponding image can be directly upgraded.
Server is a stateful component, and its upgrade process is relatively complex. To simplify, we use a Redis cluster with a single Server shard as an example to walk through the upgrade.
- The upgrade request of the Server component is sent to the ApiServer, which receives the request and pushes it to the Operator.
- The Operator first selects the Pod to be upgraded in the order of Pod numbers in the shard from largest to smallest.
- After the Pod is selected, it is removed from the Configserver's read topology. (If the Pod to be removed is the Master in the cluster topology, we call the Configserver API to perform a failover, demote it to Slave, and then remove it from the read topology.)
- After that, the Operator waits for 30 seconds. The starting point of this mechanism is:
- First, the Proxy pulls configuration from the Configserver asynchronously, so a change takes effect only after at least one round of synchronization. Waiting 30 seconds ensures that all Proxies have fetched the latest read topology and that no new read requests are sent to the Server node about to be upgraded.
- In addition, the 30-second wait gives the Server Pod to be upgraded time to finish processing and return all requests it has already received, before we kill it.
- After 30 seconds, the APIServer is asked to perform the actual Pod deletion. After deletion, K8s recreates a new Pod, which at this point is an independent Server with no master/slave relationship to any node. The Operator senses the new Server Pod is ready and registers it with the Configserver.
- The Configserver connects to a new Server Pod, establishes a master-slave relationship with the Master node of the shard, and synchronizes data.
- The Operator periodically asks the Configserver whether the new Server Pod has finished synchronizing data. If so, the Operator considers that Server Pod upgraded. The process is similar for all other Server Pods in the shard, and the shard is considered upgraded only when all of its Server Pods are. Finally, the Operator updates the Status of the Redis Cluster CRD to indicate that the component upgrade has completed.
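The per-shard rolling-upgrade order described above can be sketched as a small planner. The step names are illustrative; the sketch only encodes the ordering (highest ordinal first, failover before the master is touched, a wait before each delete):

```python
# Sketch: plan the per-Pod upgrade steps for one shard, highest
# ordinal first; a master Pod is failed over before deletion.
def upgrade_order(pods, master):
    steps = []
    ordinal = lambda p: int(p.rsplit("-", 1)[1])
    for pod in sorted(pods, key=ordinal, reverse=True):
        if pod == master:
            steps.append(("failover", pod))      # demote master first
        steps.append(("remove-from-read-topology", pod))
        steps.append(("wait-30s", pod))          # let Proxies catch up
        steps.append(("delete", pod))            # K8s recreates the Pod
    return steps

print(upgrade_order(["srv-0", "srv-1", "srv-2"], master="srv-0"))
```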
Conclusion and outlook
Using Redis as an example, this article introduced how stateful services can be abstracted and deployed on K8s, along with some of Volcano Engine's exploration and practice in the Redis cloud-native direction. Volcano Engine has completed the construction of basic Redis cloud-native functionality. In the future we will think and practice more deeply around dynamic elasticity, failure recovery, and fine-grained resource allocation and management, in combination with the characteristics of K8s. We hope the cloud-native approach will further improve operational automation and resource utilization.
Q&A
Q1: Didn’t you use Cluster mode?
A: No. Cluster mode was used at the earliest stage, but as business volume grew, we found Redis Cluster has an upper limit on cluster size that could not meet business requirements.
Q2: Does Redis Proxy calculate which shard Key is in?
A: Yes. The Proxy hashes keys with an algorithm similar to Redis Cluster's key hashing and distributes them to different Server shards.
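For illustration, this is the CRC16-based slot mapping Redis Cluster itself uses; the answer only says the Proxy's algorithm is "similar", so treat this as a sketch (hash-tag handling omitted):

```python
def crc16_ccitt(data: bytes) -> int:
    """CRC16-CCITT/XMODEM, the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str, num_slots: int = 16384) -> int:
    """Map a key to one of num_slots shard slots, Redis Cluster style."""
    return crc16_ccitt(key.encode()) % num_slots

print(key_slot("greeting"))  # every Proxy computes the same slot for a key
```

Because the mapping is deterministic, any Proxy instance routes a given key to the same Server shard without coordination.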
Q3: How is it defined that Slave can be promoted to Master? What are the switching steps?
A: The Configserver periodically sends health-check requests to the Master. The Master is considered unavailable only after several consecutive probes of the same Master fail. The Configserver then selects an available Slave from all the Slaves in the shard and promotes it to the new Master. (Not every Slave is eligible: a Slave may itself be down, or its replication lag behind the Master may be too high.)
Q4: Is the Proxy unique for each Redis cluster or shared by all clusters?
A: The Proxy is exclusive to each Redis cluster. First, sharing one Proxy cluster across all Redis clusters would raise isolation issues. Second, the Proxy supports dynamic, elastic scaling, so dedicated Proxies do not waste resources.
Q5: What is the stability of the system and the time consuming of master/slave switchover?
A: Stability is good. The time spent on master/slave switching is a policy issue that requires some tradeoff. If the judgment policy is too aggressive, the primary/secondary switchover may be triggered frequently due to temporary network jitter. In practice, the master-slave switchover takes about 10s.
Q6: At what scale will Redis K8s deployment require more default configuration changes or direct source code changes? Will it be more difficult to build a Redis cluster based on dynamic expansion? Is there any way to make the Redis cluster infinitely scalable? Up to what?
A: The size of the K8s cluster on which Redis is deployed is up to you; choose it based on the required Redis cluster capacity. Adapting to cloud native required some adjustment of how components discover each other, but not much source-code modification. Currently we only support dynamic scaling of the Proxy. The Server is a stateful service and is hard to put behind HPA (scaling may involve data migration), even though HPA does support StatefulSet workloads. Our Redis architecture theoretically allows very large clusters; the current CRD limit is 1024 shards per Redis cluster.