Introduction: Zhang Rui, a researcher at Alibaba, is in charge of the Alibaba Group database technology team. He has lived through the technological transformation of Alibaba's databases and has led database preparations for Double 11 as the overall database lead for six consecutive years. Today we invite him to share how the new generation of database technology was applied during Double 11.

Zhang Rui, head of the Alibaba database technology team

Zhang Rui: Double 11 is a large-scale technical drill, a super-project of the Internet industry. We need to support the highest possible transaction peak at midnight to give users the best experience; we need to drive cost as low as possible through extreme elasticity; and we need to keep the overall system stable.

How does the database achieve extreme elasticity?

Moving the database to the cloud

As we all know, elasticity is difficult to achieve for databases: on the one hand, databases have very high performance requirements; on the other, elasticity means relocating large amounts of data, which is very expensive. Our first direction for database elasticity was moving the database onto the cloud.

Moving the database to the cloud faces the following difficulties:

1. How to quickly move the database to the cloud and build a hybrid cloud?

2. How to reduce the performance loss caused by virtualization?

3. How to connect the public cloud environment with the internal network?


After several years of exploration, these difficulties have been solved. First, the database runs on high-performance ECS: with SPDK and DPDK technology plus NVMe storage, virtualization overhead becomes very small, close to a physical machine. Second, we built a hybrid-cloud database management system that manages on-cloud and off-cloud environments at the same time, so a hybrid cloud can be built quickly before Double 11 to support the event. Third, we connected Alibaba's internal network and the public cloud through a VPC, solving the network interconnection problem in the hybrid-cloud scenario.


Elastic database scheduling

Using cloud resources alone is not enough. To achieve more extreme elasticity, we let the database use the computing resources of offline clusters through online/offline colocation, reducing cost as much as possible. Colocation has two prerequisites. The first is containerization, which provides resource isolation and unified scheduling of compute nodes. The second is separation of storage and compute, which is the foundation of the database's elastic scheduling capability. Fortunately, technology developments in recent years have made storage-compute separation practical: 25G high-speed networks, RDMA, high-performance distributed storage, and so on.

The database storage-compute separation architecture is shown in the figure, with a storage layer, a network layer, and a compute layer. Storage uses Pangu, the distributed storage system developed by Alibaba; database compute nodes are deployed in Pouch containers and connect to the storage nodes over a 25G network.


To achieve storage-compute separation for the database, we made many optimizations to the distributed storage system Pangu, such as:

  • Response latency: single read/write response latency of 0.4 ms; with the RDMA network, response latency below 0.2 ms;

  • Two sync, one async: the third data copy is completed asynchronously, which greatly stabilizes latency (see the sketch after this list);

  • QoS flow control: Controls background I/O traffic according to foreground service loads to ensure write performance.

  • Fast failover: failover within a single storage cluster is optimized to 5 seconds, an industry-leading level;

  • High-availability deployment: a single cluster deployed across four racks raises data reliability to ten nines.
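To make "two sync, one async" concrete, here is a minimal Python sketch of the idea; it is my own illustration, not Pangu's code, and the `Replica` class and thread-pool plumbing are assumptions. The caller is acknowledged once two of the three copies are durable, while the third write completes in the background, so write latency is set by the second-fastest replica rather than the slowest.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

class Replica:
    """Hypothetical stand-in for one Pangu chunk replica."""
    def __init__(self, name):
        self.name = name

    def write(self, payload):
        # Pretend this is a network hop plus a disk write.
        return f"{self.name}: {len(payload)} bytes durable"

def write_two_sync_one_async(replicas, payload, pool):
    """Return after 2 of 3 copies land; the 3rd finishes asynchronously."""
    futures = [pool.submit(r.write, payload) for r in replicas]
    for i, fut in enumerate(as_completed(futures), start=1):
        if i == 2:   # latency is decided by the 2nd-fastest copy, not the slowest
            return fut.result()

pool = ThreadPoolExecutor(max_workers=3)
replicas = [Replica(f"replica-{i}") for i in range(3)]
print(write_two_sync_one_async(replicas, b"page-0001", pool))
```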


At the same time, we made many optimizations in the database itself, the most important being to reduce network transfers between compute nodes and storage nodes, and thereby the impact of network latency on database performance. The first was a redo log sync optimization that increased database throughput by 100%. Second, since Pangu supports atomic writes, we turned off the database's double write buffer; under high load this increases database throughput by 20% and eliminates the extra network bandwidth that double writes would otherwise consume.
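The article does not spell out the redo-log optimization, but a standard way to raise log-bound throughput is group commit, which amortizes one network sync across many concurrent transactions. The sketch below is my own illustration under that assumption, not AliSQL code; `sync_to_storage` stands in for a single round trip to remote storage.

```python
import threading, queue

log_queue = queue.Queue()

def flusher(sync_to_storage):
    """Single writer thread: drain whatever has accumulated, then sync once."""
    while True:
        batch = [log_queue.get()]                 # block for the first record
        while not log_queue.empty():
            batch.append(log_queue.get_nowait())  # piggyback waiting records
        sync_to_storage(b"".join(rec for rec, _ in batch))  # one round trip for all
        for _, done in batch:
            done.set()                            # wake every committing session

def commit(redo_record: bytes):
    done = threading.Event()
    log_queue.put((redo_record, done))
    done.wait()                                   # returns once the batch is durable

# Hypothetical storage sync; in the real system this would be a write to Pangu.
threading.Thread(target=flusher, args=(lambda buf: None,), daemon=True).start()
commit(b"begin;update;commit;")
print("committed")
```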


Double 11 database colocation technology

Containerization and storage-compute separation make the database stateless and schedulable. During the Double 11 peak, the shared storage is mounted to a different computing cluster (the offline cluster) to achieve fast database elasticity.
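As an illustration of why storage-compute separation makes this cheap, here is a hedged Python sketch of the scheduling flow. Every identifier (`OfflineCluster`, `Container`, the `pangu://` volume names) is hypothetical, standing in for the real container platform and Pangu volumes.

```python
class Container:
    """Hypothetical compute node borrowed from the offline cluster."""
    def mount(self, volume):
        self.volume = volume          # attach existing data over the network, no copy
    def start_mysql(self):
        print(f"serving {self.volume} from a stateless compute node")

class OfflineCluster:
    """Hypothetical scheduler facade for offline computing resources."""
    def allocate(self):
        return Container()

def scale_out_for_peak(shared_volumes, cluster):
    """State lives in shared storage, so scaling out is pure scheduling."""
    for vol in shared_volumes:
        node = cluster.allocate()
        node.mount(vol)
        node.start_mysql()
    # After the peak the containers are released; the data never moves.

scale_out_for_peak(["pangu://trade/db-01", "pangu://trade/db-02"], OfflineCluster())
```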

Alibaba's new-generation database technology

Alibaba started on commercial databases. Then came the de-IOE movement, during which we developed our MySQL branch AliSQL and the distributed middleware TDDL. In 2016 we began developing Alibaba's new generation of database technology, which we named X-DB. The X stands for the pursuit of extreme performance and for challenging infinite possibilities.


Alibaba's business scenarios place high demands on the database:

  • Data must be scalable;

  • Continuous availability and strong data consistency;

  • Huge data volumes of high importance;

  • Data with a clear life cycle and distinct hot and cold portions;

  • Transaction, inventory, payment, and similar businesses with simple operation logic but very high performance requirements.

Therefore, our definition of a new-generation database includes several important characteristics: strong data consistency and global deployment capability; built-in distribution, high performance, and high availability; and automatic data life cycle management.


X-DB architecture diagram

The X-DB architecture is shown in the figure. The Paxos distributed consistency protocol is introduced to solve data consistency. X-DB can be deployed across sites: although network latency increases, throughput stays high. In the intra-city three-node deployment mode, performance is on par with a single machine, with high tolerance of network jitter.

X-DB core technology 1: X-Paxos. X-Paxos is the core of the three-node capability. It achieves strong data consistency across AZs and across regions, and continuous availability above five nines (99.999%).

X-DB core technology 2: Batching & Pipelining. At commit time, X-DB must ensure that logs are received and committed by a majority of database nodes; this is the basis of strong data consistency. Because commits must cross the network, latency inevitably rises, and sustaining throughput under high latency is very hard. Batching & Pipelining delivers logs in batches as much as possible, lets them be received and acknowledged out of order, and still applies them strictly in log order, so throughput stays high even under high latency.
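A minimal sketch of the pipelining half of this idea, assuming a Paxos/Raft-style replicated log (my illustration, not X-Paxos source): the follower acknowledges batches in whatever order they arrive, but applies them strictly in log order, so the leader never has to stall the pipeline waiting for the previous batch.

```python
import heapq

class FollowerLog:
    """Follower side: accept log entries out of order, apply them in order."""
    def __init__(self):
        self.pending = []        # min-heap of (index, entry) received early
        self.next_index = 1      # next log index we may apply

    def receive(self, index, entry):
        heapq.heappush(self.pending, (index, entry))   # ack immediately, any order
        applied = []
        while self.pending and self.pending[0][0] == self.next_index:
            applied.append(heapq.heappop(self.pending)[1])  # apply stays sequential
            self.next_index += 1
        return applied

follower = FollowerLog()
# The leader pipelines batches without waiting; batch 2 may arrive before batch 1.
print(follower.receive(2, "txn-b"))   # [] - held until index 1 arrives
print(follower.receive(1, "txn-a"))   # ['txn-a', 'txn-b'] - applied in order
```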

X-DB core technology 3: asynchronous commit. The database thread pool would otherwise sit waiting during commit. To maximize performance, we use asynchronous commit, keeping the thread pool working as efficiently as possible. Together, these technologies keep X-DB's throughput high in three-node mode.
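Here is a small sketch of the asynchronous-commit pattern, again an illustration rather than X-DB's implementation: the session thread hands the log entry to the consensus layer with a completion callback and returns to the pool immediately, instead of blocking on the majority acknowledgment.

```python
from concurrent.futures import ThreadPoolExecutor

workers = ThreadPoolExecutor(max_workers=4)    # stands in for the session thread pool
consensus = ThreadPoolExecutor(max_workers=1)  # stands in for X-Paxos log replication

def replicate(txn):
    return f"{txn} durable on majority"        # pretend the majority ack happens here

def on_durable(future):
    print(future.result(), "-> reply OK to client")  # completion runs via callback

def commit_async(txn):
    """Enqueue the commit and return: the worker never blocks on the network."""
    consensus.submit(replicate, txn).add_done_callback(on_durable)

workers.submit(commit_async, "txn-42").result()
consensus.shutdown(wait=True)                  # only so the demo exits cleanly
```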

Comparison test: X-DB vs. MySQL Group Replication

Let's compare with Oracle's official MySQL Group Replication, using Sysbench standard tests in a three-node, same-IDC deployment. In the Insert scenario, X-DB achieves 2.4 times the throughput of MySQL GR, with a response time lower than official MySQL's.

In Sysbench standard tests with a cross-region deployment, X-DB's advantage in the Insert scenario is especially obvious: its throughput (50,400) is 5.94 times that of MySQL GR (8,500), and its response latency (58 ms) is 38% of MySQL GR's (150 ms).

Typical Application Scenarios

Intra-city cross-AZ deployment replaces the traditional active/standby mode. We turned the original active/standby pair into three nodes, solving cross-AZ data quality and high-availability problems: data is consistent across AZs, no data is lost when a single AZ fails, failover on a single-AZ failure completes within seconds, and cost does not increase compared with active/standby.

Cross-region deployment solves multi-active across regions at the database layer. Compared with active/standby mode, six copies of data across three regions are reduced to five replicas across three regions (five nodes, four of which hold full data). For the business, it guarantees cross-region data consistency with no data loss on a single-region failure, keeps performance high under strong cross-region synchronization, and offers flexible switching policies: fail over within the same region first, or customize the cross-region switching order.

Database black technology in Double 11

Application of X-KV in Double 11

X-KV is an enhancement of the official MySQL Memcached plugin. This year we made significant improvements, including support for more data types, non-unique indexes, composite indexes, multi-get, and online schema change. The biggest change is support for SQL transformation through TDDL. For the business side, X-KV offers ultra-high read performance, strong data consistency, lower application response times, and lower cost. And because SQL is supported, applications can migrate transparently, greatly reducing the cost of adoption.


TDDL for X-KV implements the following functions:

  • Independent connection pools: the SQL and KV connection pools are independent of each other and kept consistent during changes, so applications can switch quickly between the two interfaces;

  • Optimized KV communication protocol: delimiters are no longer needed, simplifying the protocol implementation;

  • Automatic result-set type conversion: strings are automatically converted to MySQL types.
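To show what this SQL-to-KV routing might look like, here is a toy Python sketch; it is an assumption-laden illustration, not TDDL's code. A point lookup on an indexed key is recognized and served through a multi-get against the KV interface, while anything else falls back to the normal SQL connection pool.

```python
import re

# Hypothetical in-memory stand-in for the X-KV interface, keyed by (table, id).
KV_STORE = {("orders", "42"): {"id": "42", "buyer": "alice", "amount": "99"}}

POINT_GET = re.compile(
    r"select \* from (\w+) where id in \(([\d, ]+)\)", re.IGNORECASE)

def execute(sql):
    """Route indexed point-lookups to the KV interface, everything else to SQL."""
    m = POINT_GET.match(sql.strip())
    if m:
        table, ids = m.group(1), [i.strip() for i in m.group(2).split(",")]
        return [KV_STORE.get((table, i)) for i in ids]   # one multi-get, no SQL parse
    raise NotImplementedError("fall back to the normal SQL connection pool")

print(execute("SELECT * FROM orders WHERE id IN (42)"))
```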

Solving the seller-database performance bottleneck

As Double 11 transaction volumes grew, the synchronization delay between the buyer database and the seller database became significant over the past two years, so merchants could not process their Double 11 orders in time. The seller database also carries a large number of complex queries and performs poorly. We optimized it with separate queues for large sellers, merged operations on the synchronization link, and rate limiting on the seller database, but these measures never completely solved the problem.

ESDB is a distributed document database built on ElasticSearch. We support a SQL interface on top of ElasticSearch, so applications can migrate seamlessly from MySQL to ESDB. For large sellers it provides dynamic secondary hashing, completely eliminating the data-synchronization performance bottleneck, and ESDB also provides complex query capabilities.
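A sketch of what "dynamic secondary hashing" can mean (my reading of the idea, not ESDB internals): a normal seller maps to one shard by seller ID, while a hot seller's rows are further spread across N sub-shards by a second hash on the order ID, so one large seller no longer serializes on a single queue. `HOT_SELLERS` and the shard counts are made-up parameters.

```python
import hashlib

HOT_SELLERS = {"big_shop": 8}     # hot sellers get N sub-shards; others get 1

def shard_of(seller_id, order_id, total_shards=64):
    base = int(hashlib.md5(seller_id.encode()).hexdigest(), 16)
    fanout = HOT_SELLERS.get(seller_id, 1)
    if fanout > 1:                # secondary hash spreads one seller's writes
        offset = int(hashlib.md5(order_id.encode()).hexdigest(), 16) % fanout
        return (base + offset) % total_shards
    return base % total_shards

# A hot seller's orders land on 8 shards instead of one; a small seller stays put.
print({shard_of("big_shop", f"order-{i}") for i in range(100)})
print(shard_of("small_shop", "order-1"))
```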


Evolution of the database monitoring system

The database monitoring system faces the following technical challenges:

1. Massive data: 10 million monitoring metrics per second on average, 14 million at peak;

2. Complex aggregation logic: data is aggregated across many dimensions, such as region, data center, unit, service cluster, and database primary/standby;

3. High real-time requirements: on the monitoring dashboard, the latest second's values must be visible immediately;

4. Computing resources: collection and computation must use as few resources as possible.


The monitoring pipeline has gone through three generations of architecture: first generation, Agent + MySQL; second generation, Agent + DataHub + distributed NoSQL; third generation, Agent + real-time computing engine + HiTSDB.

HiTSDB is a time-series database developed in-house at Alibaba and is well suited to storing massive monitoring data. Second-granularity performance data and full SQL records are pre-processed by the real-time computing engine and then stored in HiTSDB. With the third-generation architecture we achieved second-level monitoring at the Double 11 peak, which helps greatly in understanding system state and diagnosing problems.
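The pre-processing step is what makes second-level dashboards affordable at 14 million metrics per second. Below is a simplified Python sketch of the idea, assuming per-second samples tagged with dimensions (the field names are my own): raw per-instance points are rolled up per dimension before being written to the time-series store, so the dashboard reads a few thousand series instead of millions of raw points.

```python
from collections import defaultdict

def preaggregate(samples):
    """Roll one second of raw samples up every dimension before hitting the TSDB."""
    rollups = defaultdict(float)
    for s in samples:  # s: {"region":..., "cluster":..., "metric":..., "value":...}
        for dims in (("region",), ("region", "cluster")):
            key = (s["metric"],) + tuple(s[d] for d in dims)
            rollups[key] += s["value"]          # one write per (metric, dimensions)
    return rollups

samples = [
    {"region": "cn-hz", "cluster": "trade-a", "metric": "qps", "value": 1200.0},
    {"region": "cn-hz", "cluster": "trade-b", "metric": "qps", "value": 800.0},
]
for key, value in preaggregate(samples).items():
    print(key, value)   # e.g. ('qps', 'cn-hz') 2000.0 -> one point written to HiTSDB
```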


CloudDBA application in Double 11

Alibaba has the industry's most experienced DBAs and a huge amount of performance diagnostic data. Our goal is to combine the experience of Alibaba's DBAs with big data and machine intelligence so that, within three years, database diagnosis, optimization, and similar work no longer need a DBA; machines will manage databases intelligently instead. We believe self-diagnosis, self-optimization, and self-operation are important directions for the future of database technology.

CloudDBA made some explorations at this year's Double 11. By analyzing full SQL and monitoring data, we implemented automatic SQL optimization (slow SQL tuning), space optimization (detecting useless tables and indexes), access-model optimization (choosing between SQL and KV), and storage-space growth prediction.
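As a flavor of what automated slow-SQL tuning can look like, here is a deliberately naive Python heuristic over full-SQL statistics. The field names (`digest`, `rows_examined`, `rows_sent`) mirror MySQL's terminology, but the thresholds and structure are my assumptions, not CloudDBA's logic: a query that examines far more rows than it returns, at high frequency, is a likely missing-index candidate.

```python
def flag_missing_index(sql_stats):
    """Naive heuristic: many rows examined per row returned suggests a bad plan."""
    suspects = []
    for s in sql_stats:   # fields assumed from a full-SQL audit log
        ratio = s["rows_examined"] / max(s["rows_sent"], 1)
        if ratio > 100 and s["count"] > 1000:
            suspects.append((s["digest"], ratio))
    return sorted(suspects, key=lambda x: -x[1])   # worst offenders first

stats = [
    {"digest": "SELECT * FROM orders WHERE buyer_id=?", "rows_examined": 5_000_000,
     "rows_sent": 20, "count": 120_000},
    {"digest": "SELECT * FROM users WHERE id=?", "rows_examined": 1,
     "rows_sent": 1, "count": 900_000},
]
print(flag_missing_index(stats))
```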

Looking ahead to next year's Double 11

Looking ahead to next year's Double 11, I sum it up in three key words: Higher, Faster, Smarter.

Higher means a higher transaction peak, which is really the pursuit of lower cost: supporting a higher peak with extreme elasticity so users get the best shopping experience. We hope that one day no traffic throttling will be needed at all.

Faster is the constant pursuit of us technologists: faster applications, faster databases, faster storage, faster hardware, and so on. As the martial-arts saying goes, speed alone is unbeatable.

Smarter is the application of machine intelligence in Double 11. Whether in databases, scheduling, personalized recommendation, or even customer service, we hope machine intelligence will be used more widely and lead to greater technological breakthroughs.