Li Wenjie is a senior database engineer at NetEase Interactive Entertainment and a TUG 2019 and 2020 MVA. He is mainly responsible for big data R&D and data analysis, providing refined operations guidance for products. He has also promoted the adoption of TiDB within the department, accumulating experience and exploring the best approach to cloud deployment and distributed databases for the business, and he now leads the TiDB management team.

This article is based on a talk given by Li Wenjie, senior database engineer at NetEase Games, at a TUG NetEase online event. It mainly introduces NetEase Games' experience with the distributed database TiDB.

NetEase Games initially introduced TiDB for AP (analytical processing) workloads, migrating compute-heavy batch jobs to it. During the migration, jobs that touched large volumes of data frequently failed with the error “Transaction too large”.

TiDB transaction limits

After some investigation, we found that because distributed transactions require two-phase commit and Raft replication underneath, a very large transaction makes the commit process very slow and can block the Raft replication pipeline. To keep the system from getting stuck, TiDB limits transaction size, including the number of SQL statements in a single transaction, the number and total size of KV key-value pairs, and the size of a single KV key-value pair.

Knowing this limitation, the workaround was clear: split the large transaction into several smaller transactions according to the business logic. With that change, the SQL that previously failed ran successfully, and the batch programs on MySQL/Oracle could be migrated to TiDB.
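To make this concrete, here is a minimal sketch of the batching pattern over plain JDBC. The table, columns, and batch size are hypothetical placeholders rather than our real schema; the point is simply that each small batch commits as its own transaction and stays under the transaction limits.

```scala
import java.sql.DriverManager

case class MetricRow(playerId: Long, metric: String, value: Double)

// In practice this would be the day's computed result set; empty here so the
// sketch is self-contained.
val rows: Seq[MetricRow] = Seq.empty

val conn = DriverManager.getConnection(
  "jdbc:mysql://tidb-host:4000/report_db", "user", "password")
conn.setAutoCommit(false)

val batchSize = 5000  // keep each transaction well under TiDB's size limits
rows.grouped(batchSize).foreach { batch =>
  val ps = conn.prepareStatement(
    "INSERT INTO daily_metrics (player_id, metric, value) VALUES (?, ?, ?)")
  batch.foreach { r =>
    ps.setLong(1, r.playerId)
    ps.setString(2, r.metric)
    ps.setDouble(3, r.value)
    ps.addBatch()
  }
  ps.executeBatch()
  conn.commit()        // each small batch is its own transaction
  ps.close()
}
conn.close()
```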

But this raised a concern. When nothing goes wrong, the program runs smoothly; however, if there is a network problem in the data center or some other failure, part of the data will have been written to TiDB while the rest has not. In this scenario the task is no longer atomic: it is only partially executed, with some batches succeeding and others failing.

The cause is that we split the transaction manually. Once we do that, the atomicity of the original large transaction can no longer be guaranteed, only the atomicity of each small transaction in each batch. From the perspective of the whole task, the data ends up inconsistent.

So how do we solve this problem?

TiDB large transaction optimization

After we reported the problem to the official TiDB team, TiDB 4.0 deeply optimized large transactions: it not only removed several restrictions but also raised the single-transaction size limit from 100 MB to 10 GB (via the txn-total-size-limit setting), a 100x improvement. But this brought another problem. In T+1 batch jobs, the previous day can produce millions or even tens of millions of rows. Processing them through JDBC and TiDB alone is not efficient, and the jobs often take several hours, or even dozens of hours.

So how do you improve the overall throughput of computing tasks? The answer is TiSpark.

TiSpark: Efficient processing of complex OLAP calculations

TiSpark is a plug-in built on top of Spark that can read data from TiKV efficiently. It supports index selection and pushes computation down to the storage layer, so query performance is high; in practice we found that TiSpark can read 200 million rows from TiKV in about 5 minutes. Its write performance is also good. With TiSpark, we can access TiKV data directly from Spark, and experience has shown that it performs well for both reading and writing TiKV, which lets us handle complex operations on very large data sets.
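As a rough illustration, the following sketch shows how a TiSpark-enabled Spark session reads TiDB data (TiSpark 2.x-style configuration; the PD addresses, database, and table names are placeholders, not our real environment):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("tispark-read-demo")
  .config("spark.sql.extensions", "org.apache.spark.sql.TiExtensions")
  .config("spark.tispark.pd.addresses", "pd0:2379,pd1:2379,pd2:2379")
  .getOrCreate()

// TiSpark reads Region data from TiKV directly and pushes filters and
// aggregations down to the storage layer where possible.
spark.sql("USE game_db")
val df = spark.sql(
  """SELECT player_id, SUM(pay_amount) AS total_pay
    |FROM pay_log
    |WHERE dt = '2020-06-01'
    |GROUP BY player_id""".stripMargin)
df.show(10)
```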

TiSpark practice

In practice, there are two main ways to use TiSpark:

  • Method 1: TiSpark + JDBC write

TiSpark + JDBC writes can automatically split a large transaction, but they do not guarantee the atomicity or isolation of the overall write, and manual intervention is required for fault recovery. This method writes up to about 1.8 million rows per minute; because the SQL passes through TiDB before being written to TiKV, the speed is only average.

  • Method 2: TiSpark batch write to TiKV

TiSpark batch write to TiKV does not split large transactions. Reading and writing TiKV data with TiSpark is like issuing one large transaction, so atomicity and isolation are preserved, and write performance is good, at about 3 million rows per minute. A minimal sketch of both write paths follows below.

With TiSpark, the problem of batch-processing large volumes of data was solved, but some risks remain. When TiSpark reads and writes TiKV directly, TiKV is still the storage engine of the whole TiDB cluster; if the read/write pressure on the storage layer is high, other online services are noticeably affected. In addition, if I/O is not throttled when TiSpark accesses TiKV, performance jitter may occur, access latency rises, and online services can again be affected.
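For reference, here is a minimal sketch of the two write paths; the connection details, database, and table names are placeholders, and resultDf is assumed to be the DataFrame produced by the upstream batch computation:

```scala
// resultDf: DataFrame produced by the upstream batch computation (assumed).

// Method 1: JDBC write through the TiDB SQL layer. Spark splits the data into
// many small writes, so there is no atomicity across the whole job.
resultDf.write
  .format("jdbc")
  .option("url", "jdbc:mysql://tidb-host:4000/report_db?rewriteBatchedStatements=true")
  .option("dbtable", "daily_metrics")
  .option("user", "user")
  .option("password", "password")
  .option("isolationLevel", "NONE")
  .option("batchsize", 10000)
  .mode("append")
  .save()

// Method 2: TiSpark batch write straight into TiKV as one large transaction,
// which keeps atomicity and isolation.
resultDf.write
  .format("tidb")
  .option("database", "report_db")
  .option("table", "daily_metrics")
  .option("tidb.addr", "tidb-host")
  .option("tidb.port", "4000")
  .option("tidb.user", "user")
  .option("tidb.password", "password")
  .mode("append")
  .save()
```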

How can effective isolation be achieved? Perhaps the TiFlash column storage engine can provide the answer.

TiFlash: column storage engine

TiFlash complements TiKV's row-based storage engine. It holds a Raft replica of the TiKV data: TiFlash maintains a columnar copy of the data in TiKV, and the Raft protocol guarantees that the replicated data stays consistent and complete. The same data is therefore stored in both engines: TiKV stores it as rows and TiFlash stores it as columns.

When running Spark analysis, we can read data directly from the TiFlash cluster, which is very efficient: for AP analysis, columnar data gives an overwhelming advantage over row data.
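A minimal sketch of this workflow, continuing the TiSpark session shown earlier: the TiFlash replica is created once with DDL through TiDB, and TiSpark is then pointed at the column engine. The table name is a placeholder, and we assume here that the read-engine setting can be changed at runtime.

```scala
// Create the column replica once through any TiDB SQL client:
//   ALTER TABLE game_db.pay_log SET TIFLASH REPLICA 1;

// Then, in the TiSpark session configured earlier, point reads at TiFlash.
// (Assumption: spark.tispark.isolation_read_engines can be set at runtime;
// it can also go into spark-defaults.)
spark.conf.set("spark.tispark.isolation_read_engines", "tiflash")
val daily = spark.sql(
  "SELECT dt, COUNT(DISTINCT player_id) AS payers FROM pay_log GROUP BY dt")
daily.show()
```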

TiFlash: TPC-H performance analysis

The combination of TiSpark and TiFlash brings both a quantitative and a qualitative improvement in computing efficiency. TPC-H performance analysis shows that, in a side-by-side comparison with TiKV, TiFlash executes queries faster in almost every query scenario, and far faster in some. With TiFlash, the analytical workload no longer affects TiKV's performance or the online business on the cluster, while offline big data analysis still enjoys good performance and throughput.

Practice has shown that TiFlash solves many of our problems and is an excellent tool.

TiFlash application: More efficient computing

In some metric-computation scenarios for NetEase Games user portraits, SQL over various business data runs at least 4 times faster with TiSpark + TiFlash than with TiSpark + TiKV. Adopting TiFlash therefore clearly improved the efficiency of our offline batch processing.

JSpark: Cross-source offline computing

As the business grows and application scenarios multiply, data ends up spread across different storage engines: for example, log data is stored in Hive while database data is stored in TiDB. Accessing data across these systems used to require a lot of data migration, which is time-consuming and laborious. Can we access different data sources directly and communicate across sources? To solve this problem, NetEase built the JSpark tool. JSpark is an offline computing tool designed to bridge the underlying storage layers and provide cross-source access. Its core is TiSpark + Spark, with Spark acting as the bridge between the different data sources.

JSpark wraps TiSpark and JDBC: it can read and write data in TiKV, run AP computation on TiFlash, and run SQL computation through TiDB. So far we have implemented bidirectional reads and writes between TiDB and Hive. In the future, JSpark will also support reads and writes between TiDB and ES, achieving multi-source data access across TiDB, Hive, and ES.
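The following sketch shows the cross-source pattern that JSpark wraps, not JSpark's actual API: Spark reads a Hive table and TiSpark batch-writes the result into TiDB. All database, table, and address names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val session = SparkSession.builder()
  .appName("hive-to-tidb")
  .config("spark.sql.extensions", "org.apache.spark.sql.TiExtensions")
  .config("spark.tispark.pd.addresses", "pd0:2379")
  .enableHiveSupport()
  .getOrCreate()

// Read yesterday's user-portrait indicators from a Hive table.
val portrait = session.sql(
  "SELECT player_id, tag, score FROM hive_dw.user_portrait WHERE dt = '2020-06-01'")

// Batch-write the result into the online TiDB table through TiSpark.
portrait.write
  .format("tidb")
  .option("database", "portrait_db")
  .option("table", "user_portrait")
  .option("tidb.addr", "tidb-host")
  .option("tidb.port", "4000")
  .option("tidb.user", "user")
  .option("tidb.password", "password")
  .mode("append")
  .save()
```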

Currently, the JSpark tool provides the following functions:

  • Supports TiSpark + JDBC reads and writes of both TiDB and Hive. This path is relatively inefficient.
    • Application scenario: wide TiDB tables where only the few columns the business needs are touched.
  • TiDB table data can be read and the Spark computation results written to a Hive target table. Reading through TiSpark (from TiKV or TiFlash) is recommended for higher efficiency.
    • Application scenario: periodically rotating expired partitions of TiDB partitioned tables and archiving a permanent copy to Hive, to keep TiDB tables from growing too large.
  • Hive table data can be read and the Spark computation results written to a TiDB target table. Writing through TiSpark batch write to TiKV is recommended for higher efficiency.
    • Application scenario: computing user portrait indicators from Hive data and writing them into the online TiDB to serve TP queries from online services. Another scenario is restoring Hive backups back into TiDB.
  • The front-end web or business services can send HTTP requests that remotely launch Spark jobs to run joint queries over TiDB and Hive.
    • Application scenario: on the front-end management platform, clicking a query button aggregates a player's Hive tracking logs with their TiDB data and extracts the relevant behavior data.

During the development and use of JSpark-related features, we also found an area where TiSpark could be improved.

Currently, TiSpark cannot access multiple TiDB data sources at the same time: only one TiDB cluster can be registered at runtime, which makes computing across TiDB clusters inconvenient. We hope that in the future TiSpark will support simultaneous access to multiple TiDB clusters.

TiDB application: HTAP data system

JSpark is currently our core framework for offline computing. Combined with the JFlink real-time computing framework, it forms our overall big data processing capability: JSpark is responsible for offline big data computation and JFlink for real-time computation, and together they make up the computing layer of our HTAP data architecture.

HTAP computing capability: JSpark+JFlink

First, online data is synchronized and aggregated into the TiDB cluster in real time. JSpark + JFlink then run offline and real-time computation over TiDB and the other data sources, producing user portraits and other analytical metrics that are fed back to serve online business queries.

TiDB application: current scale

At present, after three years of development, NetEase Games runs a total of 170 cluster instances, with data volume at the 100+ TB level. Our business covers user portraits, anti-addiction, operations, reporting, business monitoring, and more, and both the business scale and the cluster scale continue to grow.

The above is the evolution of AP computing at NetEase Games in the course of using TiDB. I hope today's sharing is helpful to you.