Rui Gu: The road to building an open source enterprise Redis client

Abstract:

On behalf of Aliyun, the author attended RedisConf 2018 and interviewed Rui Gu, an author of the open source Redisson client. The author was deeply impressed by Rui Gu’s international influence in the Redis community and by Rui Gu’s open source work. The details of the interview are as follows.

Pictured above: Rui Gu, with Bai Chen and Zexian of Aliyun.

1. Why did you design and develop Redisson? I have worked in industrial automation and industrial IoT since 2004. The work involves monitoring and signal processing for large numbers of devices across many scenarios, which places high demands on real-time processing, system stability, high availability, and disaster recovery. Many of my ideas grew out of the decision in 2012 to adopt Redis as a real-time database. Redis data structures look similar to the ones commonly used in programming languages such as Java, yet they differ, and I kept looking for ways to connect the two. The idea intensified after the commercialization of Redis began in 2013. I started exploring in my spare time and eventually settled on using dynamic classes to make Redis data structures behave more like their Java counterparts.

Nikita, in Moscow, seems to have had a similar idea. He started working on it on January 14, 2014, and soon open-sourced Redisson. Around the same time my own work had made some progress, and a few basic functions were implemented. However, for various work-related reasons and for lack of confidence (this was, after all, a road no one had traveled before), progress slowed after half a year. Nikita faced the same problems, but he persevered and showed no sign of giving up.

In the second half of 2014 I began to follow the Redisson project. The more I learned, the more it resonated with me: it shared the concept behind my own work but approached it from a different starting point. We began to talk, and at the beginning of 2015 I decided to give up my own implementation and join Redisson. From that point on, neither of us walked this untraveled road alone.

2. What problem does Redisson solve? What advantages does it have over other Redis clients?

2.1) In the IoT industry, the real-time status values of a group of devices are often treated as a single object with business meaning, which the JVM manages in memory. If this object is stored in a Redis String, every update to a single status value requires serializing and deserializing the whole object. It can also run into concurrency problems when different status values of the same object are updated at the same time. In practice, the object is therefore stored in the Hash data structure provided by Redis, which is the only way to avoid these problems effectively. Although a Redis Hash is very similar to a Java HashMap, manipulating one from an application is not as easy as manipulating a HashMap. If the relevant Redis commands are not well understood, or details are handled carelessly, the result can be business problems. Redisson’s Map was created to close the gap between the Redis Hash and the Java HashMap.

Figure 1 – Hash operation of Jedis

Figure 2 – Java ConcurrentHashMap

Figure 3 – Redisson’s ConcurrentHashMap
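The update problem described in 2.1 can be illustrated without a Redis server. The sketch below is a minimal in-memory model, not Jedis or Redisson code: the class name `DeviceStateSketch`, its methods, and the field names `temperature` and `pressure` are all hypothetical. Plain Java maps stand in for a serialized String value and for a Redis Hash, and the interleaving of the two writers is written out by hand.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory illustration only; the maps stand in for values held by Redis. */
final class DeviceStateSketch {

    /** Two writers change different fields of an object stored as one String value. */
    static Map<String, Integer> updateAsWholeObject() {
        Map<String, Integer> stored = new HashMap<>();      // stands in for the serialized value
        stored.put("temperature", 20);
        stored.put("pressure", 1);

        Map<String, Integer> copyA = new HashMap<>(stored); // writer A: GET + deserialize
        Map<String, Integer> copyB = new HashMap<>(stored); // writer B: GET + deserialize
        copyA.put("temperature", 70);
        copyB.put("pressure", 3);
        stored = copyA;                                     // writer A: serialize + SET
        stored = copyB;                                     // writer B: serialize + SET, overwrites A
        return stored;
    }

    /** The same two updates applied field by field, as a Hash allows. */
    static Map<String, Integer> updateAsHashFields() {
        Map<String, Integer> hash = new HashMap<>();
        hash.put("temperature", 20);
        hash.put("pressure", 1);
        hash.put("temperature", 70);                        // writer A touches only its own field
        hash.put("pressure", 3);                            // writer B touches only its own field
        return hash;
    }
}
```

In the whole-object version, writer A's temperature update is silently lost because writer B writes back a stale copy. In the field-by-field version both updates survive, which is the property the Hash structure provides.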

2.2) Industrial control and some IoT scenarios demand real-time processing: every signal must be handled within milliseconds. These scenarios also involve high concurrency. Unlike, say, social e-commerce, they have almost no traffic peaks and valleys, so the peak shaving and valley filling common elsewhere only adds overhead. With a synchronous programming model such as Jedis, you must always ensure that the number of concurrent threads matches the number of connections one-to-one; otherwise an error is reported when no connection is available. Redisson, by contrast, builds on Netty’s asynchronous framework, uses an event-loop style thread pool similar to the architecture of the Redis server, and manages connections flexibly through connection pooling. As a result, a small number of connections can serve a large number of threads, and contention between threads is fundamentally reduced. The asynchronous mode also keeps data requests from blocking business threads.
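The idea that a few connections can serve many requests can be sketched with the JDK alone. This is a loose analogy, not Redisson or Netty code: a fixed thread pool stands in for a small set of connections, `CompletableFuture` stands in for the asynchronous result, and the class name `AsyncPoolSketch` and its `run` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Loose analogy: a small worker pool plays the role of a small set of connections. */
final class AsyncPoolSketch {

    /** Issues `requests` async calls over `connections` workers; returns {completed, distinct workers used}. */
    static int[] run(int requests, int connections) {
        ExecutorService pool = Executors.newFixedThreadPool(connections);
        Set<String> workers = ConcurrentHashMap.newKeySet();
        try {
            List<CompletableFuture<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                // The caller receives a future immediately; requests queue up
                // instead of failing when every worker is busy.
                futures.add(CompletableFuture.supplyAsync(() -> {
                    workers.add(Thread.currentThread().getName());
                    return id * 2; // stand-in for a network round trip
                }, pool));
            }
            int completed = 0;
            for (int i = 0; i < futures.size(); i++) {
                if (futures.get(i).join() == i * 2) {
                    completed++;
                }
            }
            return new int[] {completed, workers.size()};
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `AsyncPoolSketch.run(100, 2)` completes all 100 requests while never using more than two workers. A real asynchronous client differs in important ways (non-blocking I/O, pipelining on a single connection), so this only shows the queuing behaviour, not the performance characteristics.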

2.3) Redis has gone through many technological changes. Over successive iterations, the official version has not only added many useful features but also developed several high availability solutions. Meanwhile, the community and cloud providers have built a variety of proxy-based high availability solutions on top of the official version. Each of these schemes has advantages and disadvantages and suits different scenarios. This diversity brings convenience as well as trouble. For example, a service that scales out may move from simple single-node or master-slave mode to sentinel or cluster mode; a business may migrate from a self-built Redis environment to the cloud; or a project may use different Redis running modes at different stages of its CI/CD pipeline. Developers are often required to write a separate set of access code for each high availability solution. This couples the project tightly to the Redis running mode, so the business code must change whenever the running mode changes. Redisson offers a simple configuration-file approach: different JSON, YAML, or Spring XML files support different Redis running modes and environments without modifying the program code. This reduces the difficulty of both development and operation.

3. Redisson has done a lot of work on distributed locking. Could you introduce its practice in this area? Articles discussing Redis distributed lock implementations are a dime a dozen online. However, almost all of them describe a simple wrapper around the setnx command, and few analyze the defects of that design. In an era when blogs are everywhere and code is posted casually, this has quietly created the illusion that a Redis distributed lock can only exist in this simple form, and that any defects can only be worked around in business code. So why not accept a slightly more complex design in exchange for flexibility in the business code? Before redesigning Redis distributed locks, let’s look at the limitations of distributed locks built on setnx alone.

1). No reentrancy. When the setnx command is executed, a name specified by the business is usually used as the key, and a timestamp or random value is used as the value. Such implementations cannot track the requesting thread or count reentrant acquisitions, and some do not even guarantee atomic operations. When a business needs the same lock in more than one place, a lock without reentrancy can easily cause deadlocks, especially where the logic is recursive. Because the Lock objects and synchronized blocks in the Java concurrency toolkit are reentrant, ordinary users easily overlook this flaw in setnx.

2). No renewal. In a distributed environment, a distributed lock usually carries an expiration time, after which it is considered automatically released. This keeps the lock live and avoids deadlocks when a program crashes. The design assumes that the developer can judge the automatic unlock time accurately. If the time is too short, the lock may expire before the task finishes. If it is too long, other nodes must wait a long time to recover when the program or the service node goes down, and the service SLA becomes hard to guarantee. A setnx-based design has no mechanism for renewing the validity period, so it can guarantee neither that the lock is held until the work is finished nor that other nodes quickly regain processing capacity when a program crashes or a business node hangs.

3). No blocking. Locks differ according to their locking strategies, but under normal circumstances they share two properties: mutual exclusion and blocking. Mutual exclusion means that at most one thread is allowed through at any time. Blocking means that, when there is contention, a thread that has not acquired the resource stops and waits until it acquires the resource or the operation is canceled. The setnx command clearly provides only mutual exclusion, not blocking. A spin mechanism can be added to the business code to retry, but that merely moves functionality that belongs in the lock into the business code. Simplifying the lock by complicating the business code seems counterproductive.
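The first and third limitations above can be reproduced with a small in-memory model. This is a sketch under stated assumptions, not a Redis client: `ConcurrentHashMap.putIfAbsent` stands in for the set-if-not-exists behaviour of setnx, and the class `SetnxLockSketch` and its methods are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

/** In-memory model of a lock built only on set-if-not-exists semantics. */
final class SetnxLockSketch {
    // key -> token; putIfAbsent stands in for SETNX
    private final ConcurrentHashMap<String, String> keys = new ConcurrentHashMap<>();

    /** Returns immediately: true if the key was created, false if it already existed. */
    boolean tryLock(String name, String token) {
        return keys.putIfAbsent(name, token) == null;
    }

    /** Removes the key only if it still carries the caller's token. */
    boolean unlock(String name, String token) {
        return keys.remove(name, token);
    }

    /** Recursive business logic that takes the same lock at every level. */
    boolean processNested(String name, String token, int depth) {
        if (!tryLock(name, token)) {
            // The outer call of the same caller already holds the key.
            // A caller that waited here instead of returning would deadlock.
            return false;
        }
        try {
            return depth == 0 || processNested(name, token, depth - 1);
        } finally {
            unlock(name, token);
        }
    }
}
```

With this model, `processNested("job", "t1", 0)` succeeds, while `processNested("job", "t1", 1)` fails at the second level even though the same caller holds the lock. The immediate `false` from `tryLock` also shows the lack of blocking: any waiting has to be written by the caller.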

Redisson’s distributed locks meet the three basic requirements above and add thread safety. A Redis Hash is the storage unit: the name specified by the business is the key, a random UUID combined with the thread ID is the field, and the lock count is the value. The UUID is also kept on the client side as an instance variable of the lock. Tagging with the UUID and thread ID keeps operations independent and ensures thread safety when multiple threads use the same lock instance simultaneously.

To lock, a Lua script first checks whether the lock exists. If it does not, the script creates the hash field, sets an expiration time, and returns, indicating that the lock was acquired. If the hash already exists, the script checks whether its field matches the caller’s UUID and thread ID. If it matches, the value is incremented and the expiration time is refreshed; the same thread on the same node has locked again, which ensures reentrancy. If the hash exists but the field does not match, another node or thread already owns the lock, so the script returns the remaining validity period of the hash. When the result reaches the client and locking succeeded, a thread pool starts renewing the lock according to the configured parameters, and the requesting thread is told to continue. If locking failed, the client listens on a PubSub channel whose name is suffixed with this key until it receives the unlock message, then tries again.

To unlock, a Lua script likewise checks whether the lock exists. If it does not, the script publishes the unlock message and returns. If the lock exists, the script checks whether the caller’s tag exists. If it does not, the lock is not owned by this thread, and the requesting thread receives an error. If it does, the lock is owned by this thread, and the script decrements the value of the tag’s field. If the resulting count is still greater than zero, the lock remains held with a reduced reentrant count. Otherwise the lock is fully released: the script deletes it immediately and publishes the unlock message.
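The locking and unlocking logic described above can be modelled in plain Java. This is a minimal sketch, not Redisson's implementation or its Lua scripts: nested maps stand in for the Redis hash, `synchronized` stands in for the atomicity of a script, time is passed in explicitly so the behaviour is deterministic, and the names `ReentrantLockSketch`, `ACQUIRED`, and `UnlockResult` are hypothetical. Renewal and the PubSub wait are left out; a counter records how many unlock messages would be published.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of a hash-based reentrant lock with an expiration time. */
final class ReentrantLockSketch {
    enum UnlockResult { RELEASED, STILL_HELD, NOT_OWNER }

    /** Hypothetical sentinel meaning "lock acquired". */
    static final long ACQUIRED = -1L;

    // lock name -> (owner tag "uuid:threadId" -> hold count); models the Redis hash
    private final Map<String, Map<String, Integer>> locks = new HashMap<>();
    // lock name -> absolute expiry time in ms; models the key's expiration
    private final Map<String, Long> expiresAt = new HashMap<>();
    // number of messages that would be published on the unlock channel
    int unlockMessages = 0;

    /** Returns ACQUIRED on success, otherwise the remaining lease in ms. */
    synchronized long tryLock(String name, String owner, long leaseMs, long nowMs) {
        dropIfExpired(name, nowMs);
        Map<String, Integer> holders = locks.get(name);
        if (holders == null) {
            holders = new HashMap<>();
            locks.put(name, holders);
        } else if (!holders.containsKey(owner)) {
            return expiresAt.get(name) - nowMs;   // held by another node or thread
        }
        holders.merge(owner, 1, Integer::sum);    // first acquisition or reentry
        expiresAt.put(name, nowMs + leaseMs);     // set or refresh the expiration
        return ACQUIRED;
    }

    synchronized UnlockResult unlock(String name, String owner, long nowMs) {
        dropIfExpired(name, nowMs);
        Map<String, Integer> holders = locks.get(name);
        if (holders == null) {                    // lock no longer exists
            unlockMessages++;
            return UnlockResult.RELEASED;
        }
        Integer count = holders.get(owner);
        if (count == null) {
            return UnlockResult.NOT_OWNER;        // caller does not own the lock
        }
        if (count > 1) {
            holders.put(owner, count - 1);        // still held, one level fewer
            return UnlockResult.STILL_HELD;
        }
        locks.remove(name);                       // fully released
        expiresAt.remove(name);
        unlockMessages++;
        return UnlockResult.RELEASED;
    }

    private void dropIfExpired(String name, long nowMs) {
        Long expiry = expiresAt.get(name);
        if (expiry != null && expiry <= nowMs) {
            locks.remove(name);
            expiresAt.remove(name);
        }
    }
}
```

An owner tag such as `"A:1"` plays the role of the UUID plus thread ID. Locking twice with the same tag succeeds both times and needs two unlocks, while a different tag gets back the remaining lease instead of the lock.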

Redisson’s reentrant lock resolves many of the inherent deficiencies of setnx-based locks, but it is still stored as a single key on one fixed Redis node, and it still expires automatically. This design largely avoids the impact of a client node going down or a business process crashing. It leaves a gap, however: if the Redis process on the server side crashes or the node goes down, the lock information may be lost. Such a defect clearly cannot meet the high availability requirements of certain scenarios.

In response, Redis author Salvatore proposed a highly available distributed lock algorithm based on multiple nodes, named Redlock (Redlock: redis.io/topics/dist…). In this algorithm, the client attempts to acquire an independent lock on several nodes at the same time. Only when the client obtains a majority of the locks in a single attempt is it considered to have won the highly available distributed lock. Otherwise, the client must release the locks it partially acquired, wait for a random time, and try again.

In describing the algorithm, Salvatore still uses setnx as the example to explain the mutual exclusion property of distributed locks. In its implementation, Redisson’s RedissonRedLock uses the more flexible and convenient reentrant lock described earlier. Redisson’s extended algorithm is the only approved Java implementation on the Redis website.

Although the Redlock algorithm provides high availability, it relies on the majority principle, which limits its applicability. Redisson therefore also provides RedissonMultiLock, a highly available distributed lock based on an enhanced algorithm. This algorithm requires the client to acquire the locks on all nodes before locking is considered successful, which further improves the reliability of the algorithm.
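The difference between the majority rule and the all-nodes rule can be shown with one small model. This is a simplified sketch, not the Redlock specification or Redisson's code: each in-memory map stands in for one independent Redis node, and the class `QuorumLockSketch` and its methods are hypothetical. Lock validity time, clock handling, and the random wait before retrying are all omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory model of acquiring one logical lock across several independent nodes. */
final class QuorumLockSketch {
    // each map models one independent node: lock name -> owner token
    private final List<ConcurrentHashMap<String, String>> nodes = new ArrayList<>();

    QuorumLockSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new ConcurrentHashMap<>());
        }
    }

    /** Takes the lock on a single node, to simulate a competing client. */
    boolean lockOnNode(int index, String name, String owner) {
        return nodes.get(index).putIfAbsent(name, owner) == null;
    }

    /** Tries every node; keeps the locks only if at least `required` nodes granted them. */
    boolean tryLock(String name, String owner, int required) {
        int acquired = 0;
        for (ConcurrentHashMap<String, String> node : nodes) {
            if (node.putIfAbsent(name, owner) == null) {
                acquired++;
            }
        }
        if (acquired >= required) {
            return true;
        }
        release(name, owner);  // give back the partial set so other clients can progress
        return false;
    }

    /** Releases the lock on every node where this owner holds it. */
    void release(String name, String owner) {
        for (ConcurrentHashMap<String, String> node : nodes) {
            node.remove(name, owner);
        }
    }

    /** Majority rule: more than half of the nodes. */
    int majority() {
        return nodes.size() / 2 + 1;
    }

    /** All-nodes rule: every node must grant the lock. */
    int all() {
        return nodes.size();
    }
}
```

With five nodes and a competitor holding two of them, `tryLock(name, owner, majority())` succeeds with the remaining three, while `tryLock(name, owner, all())` fails and releases what it took. The stricter rule trades availability for a stronger guarantee.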

4. Could you introduce the most cutting-edge development direction of Redisson? Redisson’s development path means that it consistently leads the industry in extending and applying Redis, and the most representative example is the local cache feature. It was developed in 2016 to meet the real needs of an enterprise edition user. The principle is to trade client memory for the network time otherwise spent repeatedly fetching frequently used data. The feature was released to the public in September of the same year and immediately drew the attention of many users. It has accelerated the migration of traditional IT users from other similar platforms to Redis, and it has been more popular than Nikita or I could have imagined. Every year, companies travel thousands of miles to Redis conferences to share their experience of using Redisson to migrate to Redis from other platforms. It was this trend that caught the attention of Redis author Salvatore. After meeting some of these users face to face, Salvatore decided that client-side caching would be an important part of the future development of Redis and proposed the RESP3 protocol. RESP3 will give client-side caching the server-side coordination it needs. Salvatore also invited the Redisson team to join the expert group defining the Redis client-side caching standard.

5. How does Redisson ensure continued growth as an open source project? To keep the Redisson project developing in a sustainable and healthy way, and to avoid the embarrassing situation in which an open source project goes unmaintained after a certain period, Nikita and I decided in early 2017 to offer paid consulting services on top of the open source project, providing the funds needed for its normal operation. We also provide comprehensive enterprise-level solutions for the specific scenarios that large enterprise users encounter, and we package all of these solutions, together with enterprise-level SLA support, as Redisson PRO for enterprise users.

Although the Redisson project is relatively young compared with other clients, it has earned the trust of enterprises in many industries, including a number of industry leaders. The most notable are these world-class enterprise users:

• IBM, in the computer industry. Everyone is familiar with IBM, the originator of the PC. That IBM is nonetheless willing to use Redisson is the greatest support we could ask for.
• Boeing, in aerospace and defense manufacturing. Until they reached out to us, it was hard for me to imagine Boeing being interested in Redisson. In fact, besides making aircraft, Boeing is also the world’s largest provider of flight charts and mobile electronic flight kit solutions, which are used by almost every airline. Redisson provides a solid foundation for their online flight navigation business.
• American International Group (AIG), in insurance. Founded in Shanghai, China in 1919, AIG was the first Western company to bring the concept of insurance to the Chinese people, and it operates in more than 130 countries and regions around the world. Although AIG was thrust into the limelight in 2008 when its share price plunged in the wake of the financial crisis, it is still a 99-year-old global conglomerate with assets of more than $600 billion. After a lengthy investigation by AIG’s team, Redisson was adopted to support its various financial and insurance businesses.
• S&P Global, in finance. Standard & Poor’s, the world’s leading financial analysis agency, was frequently mentioned during the global financial crisis. It is one of three rating organizations recognized by the Securities and Exchange Commission (SEC) and specializes in providing investors with credit ratings, investment research, and advisory services. It is well known both inside and outside the industry for creating and maintaining the prestigious S&P 500 American stock index. S&P rates not only public companies but also national governments. When it bluntly downgraded the U.S. government in 2011 and put its outlook on negative, financial markets immediately swung wildly. Yet S&P, an institution so powerful that it does not defer even to the U.S. government, has become a loyal Redisson user, relying on it for sophisticated analysis and processing of financial data.

So Redisson’s trust rating is very high.
