Introduction | Against the background of IT localization in the financial industry, domestic financial institutions have begun to promote the localization of their IT infrastructure and to gradually move away from dependence on the traditional IOE architecture. Since its establishment, WeBank has forgone the traditional IOE architecture and, together with Tencent's financial-grade distributed database TDSQL, built a distributed basic platform based on the DCN unitized architecture. This architecture now supports WeBank's hundreds of millions of users, hundreds of core banking systems, and hundreds of millions of financial transactions every day. This article is organized from the speech "A Song of Ice and Fire of Data — From Online Database Technology to Mass Data Analysis Technology", delivered at the Techo TVP Developer Summit by Hu Panpan, manager of the Database Platform Room at WeBank and a Tencent Cloud TVP, on the practice of distributed databases in WeBank's core systems. It mainly introduces database trends in the financial industry, WeBank's unitized distributed architecture and unitized database architecture, and shares the future evolution direction of WeBank's databases.


I. Trends in financial industry databases

The content I want to share today covers four aspects: first, the database trends in the financial industry; second, WeBank's unitized architecture; third, how WeBank builds its database architecture on top of that unitized architecture; and fourth, WeBank's internal view of the future evolution of its database architecture.

As a relatively traditional industry, the financial sector tends to be conservative in its choice of IT infrastructure, and the traditional IOE architecture has long been the stable default. In recent years, however, some changes have taken place, with three main trends:

The first trend is localization. The international situation of recent years is well known; in a critical sector such as finance, the localization of key IT infrastructure is a direction promoted at the national level. At the same time, driven by this wave of localization, domestic database products are flourishing, giving domestic financial enterprises plenty of home-grown databases to choose from.

The second trend is the move away from centralized architectures. The growth of the Internet industry has brought explosive data growth. Traditional banks are moving their services online, for example through mobile banking apps; much of the business that used to require a visit to an offline branch can now be completed successfully on a mobile phone. This brings exponential growth in data volume, and a centralized architecture, whether based on stand-alone storage or shared storage, may no longer be able to cope with such growth in terms of performance or capacity.

The third trend is open source. A few years ago the financial industry, and traditional banks in particular, did not trust the open-source model, believing that the stability of open-source products could not be guaranteed and that there was no dedicated technical team behind them. In recent years this perception has gradually changed. Many traditional banks in the industry have begun to try open-source databases such as MySQL and Redis, and running non-core business scenarios on open-source database platforms is no longer unusual. In addition, our country is gradually increasing its support for the open-source software ecosystem, and domestic open-source communities are maturing, which further promotes the adoption of open-source databases in the financial industry.

This chart, taken from a domestic database ranking website, shows the current domestic database products. We can see database products from major companies such as Tencent, Alibaba, and Huawei, as well as some popular open-source products and traditional vendor products. It really is an era of a hundred flowers blooming, which would have been unimaginable five or six years ago.

II. The unitized architecture of WeBank

Here is a brief introduction to WeBank's internal unitized architecture. WeBank began its preparation in 2014. As the first private bank in China, WeBank had no historical burden in its IT infrastructure and could build a brand-new architecture from scratch. Therefore, the traditional IOE architecture of banks was not adopted; instead, a distributed architecture was chosen to carry the core systems of the whole bank.

Ultimately, we chose unitization as the foundation of our architecture. How should unitization be understood? Take traditional offline banks as an analogy: a traditional bank generally has one branch in each province, and that branch is responsible only for the customers of that province. WeBank's unitization is similar to the concept of branches. We divide all customers into units, and each unit manages a fixed number of users; for example, a unit may carry only 5 million users, and when it is full we directly add another whole unit. Internally we call this unit a DCN, the minimum deployment unit. A DCN contains the application layer, the middleware, and the underlying database; you can think of a DCN as a small, self-contained WeBank. The number of users carried by a DCN is fixed, for example 5 million or 8 million. When a DCN is full, a new DCN is added horizontally and new users are placed in it.
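As a rough, purely illustrative sketch of this capacity-driven expansion model (the capacity figure, class names, and logic below are assumptions, not WeBank's actual implementation), new users can be assigned to the first DCN with spare room, and a brand-new DCN is added only when every existing one is full:

```python
# Minimal sketch of capacity-driven DCN expansion (illustrative only; names
# and the 5,000,000-user capacity are assumptions, not WeBank's real code).

DCN_CAPACITY = 5_000_000

class DCN:
    def __init__(self, name: str):
        self.name = name
        self.user_count = 0

    def is_full(self) -> bool:
        return self.user_count >= DCN_CAPACITY

class DCNPool:
    def __init__(self):
        self.dcns = [DCN("DCN-001")]

    def assign_new_user(self, user_id: str) -> str:
        """Place a new user in the first DCN with spare capacity,
        expanding horizontally with a brand-new DCN when all are full."""
        for dcn in self.dcns:
            if not dcn.is_full():
                dcn.user_count += 1
                return dcn.name
        new_dcn = DCN(f"DCN-{len(self.dcns) + 1:03d}")
        self.dcns.append(new_dcn)
        new_dcn.user_count += 1
        return new_dcn.name

pool = DCNPool()
print(pool.assign_new_user("user-42"))  # -> "DCN-001" until it fills up
```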

This DCN architecture raises two questions.

First question: if you are a WeBank user and want to use WeBank's services, how do you know which DCN you belong to? This is where a key component comes in: GNS, which is responsible for storing and querying the DCN routing information of all users. When a user's request arrives, GNS is queried first to find the user's DCN, and subsequent requests are then sent to that DCN.
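A minimal sketch of this user-to-DCN routing lookup might look like the following; the mapping store, class, and function names are illustrative assumptions rather than GNS's real interface:

```python
# Illustrative sketch of GNS-style routing: look up a user's DCN before
# dispatching the request. The mapping store and names are assumptions.

class RoutingService:
    def __init__(self):
        # user_id -> DCN name; in reality this would be a replicated store.
        self._routes: dict[str, str] = {}

    def register_user(self, user_id: str, dcn: str) -> None:
        self._routes[user_id] = dcn

    def locate_dcn(self, user_id: str) -> str:
        dcn = self._routes.get(user_id)
        if dcn is None:
            raise KeyError(f"no DCN route registered for user {user_id}")
        return dcn

def handle_request(gns: RoutingService, user_id: str, payload: dict) -> str:
    dcn = gns.locate_dcn(user_id)      # 1. resolve the user's home DCN
    return f"forwarded to {dcn}"       # 2. forward the request to that DCN

gns = RoutingService()
gns.register_user("user-42", "DCN-007")
print(handle_request(gns, "user-42", {"op": "balance_query"}))
```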

Second question: DCNs are isolated from one another. If a user in DCN A wants to transfer money to a user in DCN B, how is the message exchange for the transfer realized? This is where another key component comes in: RMB, the Reliable Message Bus, whose main job is to handle message exchange between DCNs. With GNS and RMB together, the routing and message-exchange needs of the whole DCN architecture are essentially covered.
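Purely to illustrate the idea of DCN-to-DCN message exchange (this is a toy model, not the real RMB protocol or API), a bus can queue messages per target DCN and hand them over to that DCN's consumer:

```python
# Toy model of a message bus between DCNs (illustrative only; the real RMB
# protocol, acknowledgement, and persistence mechanisms are not shown here).
from collections import defaultdict, deque

class MessageBus:
    def __init__(self):
        self._queues = defaultdict(deque)   # target DCN -> pending messages

    def publish(self, source_dcn: str, target_dcn: str, body: dict) -> None:
        self._queues[target_dcn].append({"from": source_dcn, "body": body})

    def consume(self, dcn: str):
        """Deliver pending messages for one DCN; a reliable bus would only
        discard a message after the consumer acknowledges it."""
        while self._queues[dcn]:
            yield self._queues[dcn].popleft()

bus = MessageBus()
bus.publish("DCN-A", "DCN-B", {"type": "transfer", "amount": 100})
for msg in bus.consume("DCN-B"):
    print(msg)
```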

Of course, DCNs cannot carry every business scenario. Some archiving scenarios require data to be aggregated out of the DCNs for unified storage and computation, and some data is global and cannot be split by DCN. That is why, below the DCNs, there is an area for global data management and back-office management called the ADM area, which handles this type of business scenario.

In continuous production practice, this DCN architecture has brought many advantages:

The first advantage is that it limits the impact of failures. Because each DCN is independently partitioned from software down to hardware, the impact of a hardware failure in one DCN is contained. For example, WeBank's largest business, the Weilidai micro-loan, now spans more than 20 DCNs, so a failure in a single DCN affects only part of the overall business, which effectively controls the blast radius of failures.

The second advantage is efficient expansion. Through automated deployment we can now bring up a new DCN within an hour, which is equivalent to adding capacity for 5 million users, from software to hardware, in an hour.

The third advantage is effective grayscale changes. Since the DCNs are largely homogeneous, a dedicated grayscale DCN can be set up for a small number of users. Every release, whether a database DDL change or an application version, can first be rolled out to this small DCN; once grayscale verification passes, it is released in full to the other DCNs, which effectively reduces release risk (a simple sketch of this grayscale-first rollout appears after this list of advantages).

The final advantage is that the DCN unitized architecture simplifies the database architecture. The user scale of a DCN is bounded, so the database within a DCN, including its TPS, IO/CPU load, and data volume, has an upper limit. Therefore there is no need for a more complex distributed or middleware-based database architecture to provide scalability inside a DCN. Within a DCN we adopt the simplest TDSQL single-instance architecture, without having to take on the database complexity brought by distributed transactions.
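To make the grayscale release process from the third advantage concrete, here is a hedged sketch of a rollout that applies a change to a dedicated grayscale DCN first, verifies it, and only then fans out to the remaining DCNs; the function names and verification hook are assumptions, not WeBank's release tooling:

```python
# Hedged sketch of a grayscale-first rollout across DCNs; the verification
# hook and DCN names are assumptions, not WeBank's actual release tooling.
from typing import Callable

def rollout(change: str, grayscale_dcn: str, other_dcns: list[str],
            apply: Callable[[str, str], None],
            verify: Callable[[str], bool]) -> bool:
    """Apply a change (e.g. a DDL or app version) to the grayscale DCN first;
    continue to all other DCNs only if verification passes."""
    apply(change, grayscale_dcn)
    if not verify(grayscale_dcn):
        print(f"{change}: grayscale verification failed, rollout stopped")
        return False
    for dcn in other_dcns:
        apply(change, dcn)
    return True

# Toy usage with stubbed-out apply/verify callbacks.
applied = []
ok = rollout(
    "ALTER TABLE acct ADD COLUMN tag INT",
    grayscale_dcn="DCN-GRAY",
    other_dcns=["DCN-001", "DCN-002"],
    apply=lambda chg, dcn: applied.append((dcn, chg)),
    verify=lambda dcn: True,       # pretend the grayscale checks pass
)
print(ok, applied)
```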

Of course, this DCN unitized architecture also brings some disadvantages:

First, the physical resources of the DCNs are independent of each other, and each DCN may reserve some buffer resources, which can lead to relatively low overall resource utilization.

Second, the DCN architecture places very high demands on operations automation. We now have more than 100 DCNs in production, and version management, version releases, and daily changes across that many DCNs all require automated operation and maintenance.

Third, the application layer needs to implement a distributed transaction framework across DCNs. For example, if I am in DCN A and you are in DCN B and I need to transfer money to you, a centralized architecture could rely directly on a database transaction to guarantee consistency, but under this cross-DCN architecture the application layer may need to implement a similar transaction framework to guarantee the consistency of the overall transaction.
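One common way to build such an application-level transaction framework is a compensation pattern: debit in the sender's DCN, credit in the receiver's DCN, and refund if the credit step fails. The sketch below is only an illustration of that pattern under assumed names, not WeBank's actual framework:

```python
# Hedged sketch of a cross-DCN transfer using compensation; illustrative
# only, not WeBank's real transaction framework.

class DcnAccountService:
    """Stand-in for the account service running inside one DCN."""
    def __init__(self, name: str):
        self.name = name
        self.balances: dict[str, int] = {}

    def debit(self, user: str, amount: int) -> None:
        if self.balances.get(user, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[user] -= amount

    def credit(self, user: str, amount: int) -> None:
        self.balances[user] = self.balances.get(user, 0) + amount

def transfer(src: DcnAccountService, dst: DcnAccountService,
             payer: str, payee: str, amount: int) -> bool:
    src.debit(payer, amount)              # step 1: local transaction in DCN A
    try:
        dst.credit(payee, amount)         # step 2: credit message to DCN B
        return True
    except Exception:
        src.credit(payer, amount)         # compensation: refund on failure
        return False

dcn_a, dcn_b = DcnAccountService("DCN-A"), DcnAccountService("DCN-B")
dcn_a.balances["alice"] = 500
print(transfer(dcn_a, dcn_b, "alice", "bob", 200), dcn_b.balances)
```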

III. Database architecture based on the unitized architecture

The previous section introduced WeBank's DCN unitized architecture. Given this architecture as a premise, how is WeBank's database architecture built? Before introducing the database architecture, some background information is needed.

1. WeBank's IDC architecture

WeBank's current IDC setup is a "two locations, seven centers" architecture: five production data centers in Shenzhen serve as the production center, and two disaster-recovery data centers in Shanghai serve as the disaster-recovery center. The siting of the five Shenzhen data centers is also constrained: the distance between any two IDCs is kept within 10-50 kilometers, which keeps the network latency between IDCs at roughly 2 milliseconds and is the precondition for deploying multiple database replicas across IDCs within the same city.

2. WeBank's TDSQL deployment architecture

Let's start with the TDSQL product itself. TDSQL is a financial-grade distributed database launched by Tencent, and at present essentially all of WeBank's core systems run on it.

As shown in the figure below, looking at the horizontal dimension, an application request on the left first passes through a load-balancing component, which forwards it to the TDSQL Proxy module. The proxy implements SQL parsing, read/write separation, flow control, and other functions, effectively acting as a middleware layer. After receiving a request, TDSQL Proxy forwards it to the corresponding SET at the back end. The minimum unit of TDSQL is the SET, and a SET is essentially MySQL; TDSQL has made some customized optimizations to the MySQL kernel. A SET generally uses a one-master, two-standby architecture, with TDSQL's optimized strong synchronization mechanism between the master and the standbys to ensure consistency across the replicas. Multiple TDSQL SETs can be mounted under one TDSQL Proxy.
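To make the proxy's read/write separation role more concrete, here is a minimal sketch of how such routing can work, with writes sent to the master and reads spread over the standbys; this is illustrative only and does not reflect TDSQL Proxy's actual implementation:

```python
# Illustrative read/write separation, roughly what a proxy layer does:
# statements that modify data go to the master, reads can use a standby.
import random

WRITE_PREFIXES = ("insert", "update", "delete", "replace", "create",
                  "alter", "drop")

def route_statement(sql: str, master: str, standbys: list[str]) -> str:
    """Return the node a statement should be sent to (hypothetical logic)."""
    first_word = sql.lstrip().split(None, 1)[0].lower()
    if first_word in WRITE_PREFIXES:
        return master                           # writes always hit the master
    return random.choice(standbys or [master])  # spread reads over standbys

master = "set1-master"
standbys = ["set1-slave1", "set1-slave2"]
print(route_statement("SELECT balance FROM acct WHERE id=42", master, standbys))
print(route_statement("UPDATE acct SET balance=0 WHERE id=42", master, standbys))
```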

Looking at the vertical dimension, ZooKeeper is the configuration management system for the whole of TDSQL; all metadata and monitoring statistics are reported to the ZooKeeper cluster. On top of ZooKeeper there is also a module called Scheduler, which is responsible for orchestrating TDSQL's task flows. For example, if the master node of SET1 goes down and a master/standby switchover is needed, the switchover is driven by the Scheduler module, which controls each step of the process until the switch succeeds.

TDSQL has two usage modes. The first is the No Shard mode, that is, TDSQL's single-instance mode. In this architecture, TDSQL Proxy simply performs SQL forwarding; each SET at the bottom is an independent single-instance deployment, and there is no relationship between the databases in different SETs. Because this No Shard architecture does not split databases and tables at the middleware layer, the logic is much simpler. However, if a SET in this mode needs to be expanded, it can only be scaled vertically within the SET. The advantage of this architecture is that it involves neither distributed transactions nor sharding, so SQL syntax is fully compatible and the architecture is simple.

The second mode is TDSQL's Shard mode. Shard mode can simply be understood as middleware-based sharding of databases and tables: through TDSQL Proxy, a database is split into, say, three shards distributed across SET1, SET2, and SET3. The advantage of this mode is horizontal scalability. The drawback is that syntax compatibility is not perfect: because sharding is implemented at the middleware layer, some special syntax is required, for example a shard key must be specified when creating tables, and the shard key may also need to appear in SQL statements, so the application layer has to be adapted.
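The routing idea behind middleware sharding can be sketched as hashing the shard key to pick a backend SET; this is a simplified illustration under assumed names, not TDSQL's actual sharding algorithm, which may use different rules:

```python
# Simplified illustration of shard-key routing across three SETs
# (not TDSQL's real algorithm; hashing or range rules may differ).
import zlib

SETS = ["SET1", "SET2", "SET3"]

def shard_for(shard_key: str) -> str:
    """Map a shard key (e.g. a user id) to one of the backend SETs."""
    bucket = zlib.crc32(shard_key.encode("utf-8")) % len(SETS)
    return SETS[bucket]

# Every statement on a sharded table must carry the shard key so the
# middleware can route it to a single SET instead of broadcasting.
for uid in ("user-1", "user-2", "user-3"):
    print(uid, "->", shard_for(uid))
```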

As mentioned earlier, WeBank's architecture is based on DCN unitization, which keeps the performance and capacity requirements on any single database bounded. Therefore we do not need the Shard mode for expansion and directly adopt the No Shard mode, which greatly simplifies our work on database architecture, operation, and maintenance.

With this background in mind, let's look at WeBank's TDSQL deployment architecture. In the vertical dimension, each box represents an IDC: three production IDCs on the left and two Shanghai IDCs on the right. Starting from the bottom, the database layer uses TDSQL with a one-master, two-standby architecture, and the three replicas are distributed across three IDCs in the same city, for example the master in the Nanshan data center, slave 1 in the Guanlan data center, and slave 2 in the Futian data center. TDSQL's strong synchronization mechanism is used for data synchronization between the master and the slaves. Within a single DCN there may be multiple SETs carrying the databases.

Above the database layer, each data center has its own access layer with independent load balancing, and the application layer sits on top of that.

Based on this deployment architecture, we implemented same-city multi-active at the application level. Service traffic can enter from any of the production IDCs. Traffic from the application layer to the load-balancing and access layers stays within the same data center, but access from the access layer down to the database may cross data centers. For example, for traffic entering through IDC1, the database master may sit in IDC2, in which case the access layer makes a cross-data-center call to the master in IDC2.

In addition to the three production replicas, our cross-city disaster-recovery site in Shanghai holds two more replicas, one master and one slave. Because of network latency, data synchronization between the cross-city DR site and the production IDCs is asynchronous.

This architecture provides IDC-level disaster recovery within the same city. If any one of the same-city production IDCs is lost, for example a whole data center loses power or network connectivity, business can still recover quickly, because the database automatically fails over to the other two IDCs.

You may have a question: with one master and two standbys distributed across three data centers in the same city, there is latency between the data centers, which we keep at around two milliseconds, but does this hurt performance? First, at the infrastructure level we set some principles for data center construction. As mentioned, the distance between same-city IDCs is kept within 50 kilometers, and multiple dedicated lines are built between IDCs to guarantee bandwidth and stability; all of our infrastructure now runs on a 10-gigabit network, which is the hardware-level guarantee. In addition, at the software level, TDSQL itself has applied asynchronous and batching optimizations to the strong synchronization mechanism between master and standbys, so that the performance loss of cross-data-center strong synchronization is kept within roughly 10% as far as possible.

According to our measured results, the additional performance loss of same-city, cross-data-center strong synchronization compared with strong synchronization within a single data center is only around 10%, which is acceptable at our business level.
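As a rough back-of-the-envelope illustration with assumed numbers (not WeBank's measurements), the sketch below shows why batching the cross-IDC acknowledgement matters: a ~2 ms round trip is significant per transaction but small once amortized over a group of commits:

```python
# Back-of-the-envelope look at cross-IDC strong-sync overhead.
# All numbers are illustrative assumptions, not measured WeBank figures.

cross_idc_rtt_ms = 2.0        # ~2 ms round trip between same-city IDCs
local_commit_ms = 4.0         # assumed commit latency within one IDC

# Naive per-transaction view: one extra ack round trip per commit.
naive_overhead = cross_idc_rtt_ms / local_commit_ms
print(f"naive overhead: {naive_overhead:.0%}")          # 50% with these numbers

# With batching, one ack round trip covers a group of transactions,
# which is how the observed loss can stay near 10% or below.
batch_size = 10               # assumed transactions acknowledged per batch
batched_overhead = (cross_idc_rtt_ms / batch_size) / local_commit_ms
print(f"batched overhead: {batched_overhead:.0%}")      # 5% with these numbers
```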

Take a look at TDSQL’s operation and maintenance management system.

TDSQL's accompanying operation and maintenance platform, known as the Red Rabbit platform, implements TDSQL's monitoring and O&M functions. It monitors a range of indicators across all TDSQL instances, including slow queries, IO, CPU, and so on. Most O&M operations, such as active/standby switchover, migration, capacity expansion, and node replacement, are integrated into the platform, achieving a high degree of O&M automation.

Another important operational component of TDSQL is called CLOUD DBA. It is an intelligent fault-location module that can take over part of a DBA's work, such as analyzing SQL performance, giving index suggestions, and automatically analyzing and locating the causes of faults. CLOUD DBA proactively collects performance metrics, slow-query SQL, and execution plans from TDSQL instances to generate health reports and optimization recommendations.

In the CLOUD DBA interface, the picture on the upper left shows the scoring function. It performs a comprehensive check on an instance on a regular daily schedule, and the final score is turned into a health report that points out the instance's potential defects and risks. For example, when examining and optimizing a particular SQL statement, it can analyze which indexes may be missing and propose further optimization suggestions.

Below that is the real-time diagnosis interface, which diagnoses the running state of the current instance in real time, such as lock issues, lock waits, and newly appearing abnormal indicators. These tools are of great help in our daily operation and maintenance.

At present, WeBank's TDSQL deployment has more than 400 SETs and more than 2,000 instances across the whole network, with data volume at the PB level, carrying hundreds of the bank's core systems. Financial transaction volume has now reached roughly 600 million a day, with a peak TPS of more than 100,000.

IV. Future evolution of the architecture

Finally, I would like to share the future evolution direction of WeBank's databases. There are three overall directions.

The first is something we are already doing: pushing hardware localization. Most of our hardware is still based on Intel CPUs on the x86 platform. Since last year we have been migrating the underlying server hardware to a domestically produced ARM platform, currently using Huawei Kunpeng CPUs. Last year, for one business line, we got the whole chain, from the hardware layer through the middleware to the database, running on the Huawei Kunpeng ARM server platform. This year we plan to continue promoting large-scale migration to the domestic ARM platform.

The second direction is cloud native. At present the whole of TDSQL is deployed on physical machines and virtual machines. This deployment model brings some problems: resource management costs are high; resource delivery is slow, because physical machines involve racking, initialization, and allocation workflows; resource isolation is relatively poor; and resource utilization is low. So what we plan to do this year is gradually migrate TDSQL onto a K8s plus Docker container architecture to improve resource utilization and delivery efficiency. The K8s plus Docker architecture is relatively mature for stateless applications, because stateless scenarios are comparatively simple, but for stateful applications such as databases the problems and complexity are much greater, so we will proceed cautiously.

The third direction is intelligent early warning and operations. The headache for DBAs is that problems are often located and solved only after they have occurred, by which time they have already affected business and transactions. So what we have been working on is detecting, warning about, and resolving such risks and failures in advance through intelligent means; this remains one of our current pain points.

To give two examples: the first is the deep-learning-based intelligent fault-prediction alerting we have already launched. We aggregate all of the database's performance indicators, such as IO usage, CPU usage, and the number of slow queries, onto a deep-learning platform, which fits and forecasts the expected curves of these metrics.

When we find that some performance metric of an instance falls outside the predicted range, we raise an alert in advance. In practice this has proved very effective and has surfaced a number of potential risks ahead of time.
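As a minimal illustration of this idea (not the actual deep-learning platform), a metric can be forecast with a simple moving average and an alert raised when an observation strays outside a tolerance band; the metric, window, and thresholds below are assumptions:

```python
# Minimal sketch of curve-prediction alerting: forecast a metric with a
# moving average and alert when the observed value leaves the expected band.
# The model, window, and tolerance are illustrative assumptions.

def expected_range(history: list[float], window: int = 6,
                   tolerance: float = 0.3) -> tuple[float, float]:
    recent = history[-window:]
    baseline = sum(recent) / len(recent)
    return baseline * (1 - tolerance), baseline * (1 + tolerance)

def check_metric(history: list[float], observed: float) -> bool:
    low, high = expected_range(history)
    if not (low <= observed <= high):
        print(f"ALERT: observed {observed:.1f} outside expected "
              f"[{low:.1f}, {high:.1f}]")
        return False
    return True

cpu_usage_history = [32.0, 35.0, 31.0, 34.0, 33.0, 36.0]  # assumed samples (%)
check_metric(cpu_usage_history, 34.0)   # within range, no alert
check_metric(cpu_usage_history, 78.0)   # spike -> early-warning alert
```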

The second thing we are working on is a log analysis system built on database logs and ES. We feed all of TDSQL's logs, including the middleware, monitoring, and scheduling logs, into an ES cluster, and on top of that run statistics on SQL execution time and analysis of SQL execution plans. Based on this analysis we can identify, ahead of time, SQL statements that have not yet caused problems but may pose a risk, so the business side can optimize them in advance.
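A hedged sketch of the kind of ES query involved is shown below: aggregating slow-query documents to find the SQL fingerprints with the worst average latency. The index name, field names, and thresholds are assumptions, and the sketch assumes the official elasticsearch-py 8.x client rather than WeBank's actual pipeline:

```python
# Hedged sketch: aggregate slow-query logs in Elasticsearch to find the
# SQL fingerprints with the worst average latency. Index and field names
# are assumptions; assumes the elasticsearch-py 8.x client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # assumed ES endpoint

resp = es.search(
    index="tdsql-slow-query-*",               # hypothetical index pattern
    size=0,
    query={"range": {"query_time_ms": {"gte": 500}}},
    aggs={
        "by_fingerprint": {
            "terms": {"field": "sql_fingerprint", "size": 10},
            "aggs": {"avg_ms": {"avg": {"field": "query_time_ms"}}},
        }
    },
)

for bucket in resp["aggregations"]["by_fingerprint"]["buckets"]:
    print(bucket["key"], f"avg {bucket['avg_ms']['value']:.0f} ms",
          f"count {bucket['doc_count']}")
```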

Speaker introduction

Hu Panpan

Manager of the Database Platform Room at WeBank, Tencent Cloud TVP. I graduated from Huazhong University of Science and Technology with a master's degree and after graduation joined Tencent as a senior engineer, working on R&D and operations for distributed storage and cloud databases. In 2014 I joined WeBank's infrastructure team during the bank's preparation phase, experienced and witnessed the construction and development of WeBank's distributed core architecture from scratch, and took part in the major evolution from WeBank's infrastructure 1.0 to infrastructure 2.0. I am currently fully responsible for the construction and operation of WeBank's database platform, including both the relational database platform and the KV database platform.