About the author: Tian Weifan, senior database engineer at NetEase Games and co-leader of the TUG South China region. A database veteran who has maintained a variety of databases, he is currently in charge of the operation and development of the database private cloud platform at NetEase Games.

At the TUG NetEase online event, Tian Weifan, a senior database engineer from NetEase Games, gave a talk on tuning TiDB clusters with massive numbers of Regions. The following content is compiled from the transcript of that session. The talk covers three topics:

  • The first part introduces NetEase Games and the current status of TiDB usage at NetEase Games.

  • The second part shares the tombstone key case, including the entire troubleshooting process.

  • The third part covers our initial experience with TiDB 5.0 and the outlook for the future.

About us

We are the department that provides one-stop database private cloud services for all of NetEase Games, covering MongoDB, MySQL, Redis, TiDB, and other database services. More than 99% of NetEase Games' game businesses use our database private cloud service.

TiDB usage at NetEase Games is as follows: we run 50+ TiDB clusters with 200 TB+ of data in total; the largest cluster has 25+ nodes, mostly TiKV plus a small number of TiFlash nodes, and the largest single cluster holds 25 TB+ of data. Although TiDB is still at an early stage of adoption for us, it has reached a certain scale, and we have run into quite a few problems in the course of supporting the business.

I have chosen an interesting case to walk through the whole investigation process, and I hope you can take something useful away from it.

A "blood case" caused by tombstone keys

What is a tombstone key? Its meaning will be explained in more detail later.

Why call it a "blood case"? Because when people hear the phrase, they immediately understand the problem was really serious.

The problem affected the real-time and offline analysis scenarios of a large NetEase game, making the entire analysis service unavailable and in turn affecting the availability of the internal data reporting system. Moreover, because the incident lasted a long time, product operations and the company's senior management could not see the game's data in a timely manner.

Background

Before analyzing the specific problem, here is a brief introduction to the service background: the cluster carries a very large number of Regions per node. There are more than 25 TiKV nodes, and a single node holds more than 90,000 Regions. The workload is mainly real-time analysis, with a large number of transactional delete and insert operations.

The problem

Now to the case itself, which started with a phone call. One day at 4 a.m., we received an alert call. After assessing the impact, we immediately notified the business side and then began troubleshooting to find out what had triggered the alarm.

As shown in the figure above, the CPU of one node was completely saturated. Our first guess was a hotspot, since abnormally high CPU on a single node is usually treated as a hotspot problem. To handle the issue quickly and keep the entire database from becoming unavailable, we took the node offline for the time being. However, once that node went offline, the CPU on other TiKV nodes spiked as well, which meant the service could not be restored simply by taking down or restarting a node. Treating it as a hotspot problem, we urgently investigated the hotspots: the heat map showed several of them, and the tables corresponding to those hotspots could be identified from it.

Next, using the heat map and some hotspot metadata tables, we pinned down a large number of hot tables and found that together they made up the hotspot. To disperse them conveniently, we set the SHARD_ROW_ID_BITS attribute on these tables in batches.
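A minimal sketch of what that batch change looks like per table, assuming a hypothetical table t_event_log that has no integer primary key (SHARD_ROW_ID_BITS only takes effect on tables that use the hidden _tidb_rowid):

```sql
-- Scatter the hidden _tidb_rowid across 2^4 = 16 shards so new writes
-- no longer pile up on a single Region (table name is hypothetical).
ALTER TABLE t_event_log SHARD_ROW_ID_BITS = 4;

-- For newly created tables, pre-splitting Regions at creation time
-- avoids the initial write hotspot altogether.
CREATE TABLE t_event_log_new (
    id BIGINT NOT NULL,
    payload VARCHAR(255),
    KEY idx_id (id)
) SHARD_ROW_ID_BITS = 4 PRE_SPLIT_REGIONS = 3;
```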

At the same time, we contacted the business side to identify other potential hotspot tables and scatter them as well. In addition, we set up monitoring of hot-Region tables: if a hot table has not yet been split, it is split automatically. We also increased the concurrency of PD's hotspot scheduling appropriately, so that similar hotspot problems could be handled quickly. After this round of adjustments the CPU did come down and the business reported that access was back to normal, so the problem seemed to be solved.

But at 10 o'clock the next day we received another wave of alert calls. The problem was so urgent for the business that we immediately went through the whole issue again. While doing so, we reviewed the first round of handling and concluded that either it had not been thorough enough, or the root cause was not a hotspot at all. So we went back to the symptoms. Since it was the CPU that spiked, we started from the CPU metrics and found that the Storage ReadPool CPU was the one that soared.
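For hot tables whose Regions had not been split yet, a split can also be triggered manually; below is a hedged sketch with a hypothetical table, index, and placeholder key range:

```sql
-- Evenly split the row data of a hypothetical hot table into 16 Regions
-- over a placeholder rowid range.
SPLIT TABLE t_event_log BETWEEN (0) AND (1000000000) REGIONS 16;

-- A hot index can be split the same way if the hotspot is on index reads or writes.
SPLIT TABLE t_event_log INDEX idx_id BETWEEN (0) AND (1000000000) REGIONS 16;
```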

We had never actually seen this metric spike before and were rather puzzled, so we contacted TiDB community technical experts Qi Bin, Qi Hang, and Jump for online support.

Through the logs and a number of other metrics, including seek duration and Batch Get, the problem was pinned down to seeking tombstone keys.

Back to the original question: what exactly is a tombstone key? Here is a brief introduction.

In TiDB, multiple versions of data are retained after a delete. That is, when new data is inserted to overwrite old data, the old data is not cleaned up immediately; it is kept alongside the new data, with timestamps distinguishing the versions. So when does this multi-version data actually get deleted and cleaned up? A GC thread runs in the background and traverses keys whose versions are older than the GC lifetime, marking them as tombstone keys. At that point the keys are only marked, hence the name tombstone key. The space is actually reclaimed by the background compaction threads of the RocksDB engine. Here is a brief overview of the RocksDB compaction process: after data is inserted into RocksDB, it is first written to the WAL and then written into the in-memory memtable. When the memtable is full, it is marked as an immutable memtable.

After that, RocksDB flushes the immutable memtable to disk, generating an SST file; this layer is called level 0.

As the level 0 SST files accumulate, tombstone keys start to appear at this level, and RocksDB merges these SST files into level 1 through compaction.

When level 1 reaches a certain threshold, it is compacted into level 2 in turn. That is a brief description of the RocksDB compaction process.
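As a side note, the GC lifetime that decides when old MVCC versions become eligible for tombstoning can be inspected and adjusted through the mysql.tidb table; a minimal sketch follows, where the 10m value is only illustrative:

```sql
-- Inspect the current GC settings and progress.
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM mysql.tidb
WHERE VARIABLE_NAME LIKE 'tikv_gc%';

-- Adjust how long old MVCC versions are kept before GC marks them
-- as tombstones; '10m' is only an illustrative value.
UPDATE mysql.tidb
SET VARIABLE_VALUE = '10m'
WHERE VARIABLE_NAME = 'tikv_gc_life_time';
```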

At this point, the cause of the problem was largely confirmed.

Before the fault occurred, a large amount of data had been deleted, which means a large number of tombstone keys existed. If many transactions are executed in this situation, TiKV has to seek over and skip these deleted tombstone keys, which can severely degrade Batch Get performance, driving the Storage ReadPool CPU up and leaving the whole node stuck.

It is worth revisiting the first round of handling: after we scattered the hotspots, everything seemed to recover, right?

Why did the problem appear to recover after the hotspots were scattered? In our review we found that it happened to coincide with a RocksDB compaction cleaning up the accumulated tombstone keys, which is why the symptoms went away at that time.

After determining the root cause, the next step was to optimize and solve the problem thoroughly. To this end, we formulated several optimization measures.

How to optimize

  • Business side

First, convert large tables to partitioned tables. In some scenarios, if the business can drop an old partition each day while backfilling data, TiDB performs a fast Delete Ranges operation instead of waiting for compaction (see the sketch after this list).

Second, avoid hotspot issues. Read requests from the service were switched to follower read, and tables are created with sharding attributes specified, which alleviates read and write hotspots and, to a certain extent, reduces the impact of seeking tombstone keys.

Third, break up large transactions as much as possible. Oversized transactions put extra pressure on the whole TiKV layer, and a transaction that has to operate on a huge number of keys is more likely to trigger the tombstone key problem.
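A hedged sketch of the first two measures, using hypothetical table, partition, and column names:

```sql
-- A hypothetical daily-partitioned table: dropping an old partition is a fast
-- Delete Ranges in TiDB rather than a row-by-row delete, so it does not leave
-- piles of tombstone keys behind.
CREATE TABLE t_report (
    dt DATE NOT NULL,
    metric VARCHAR(64),
    val BIGINT
)
PARTITION BY RANGE (TO_DAYS(dt)) (
    PARTITION p20210501 VALUES LESS THAN (TO_DAYS('2021-05-02')),
    PARTITION p20210502 VALUES LESS THAN (TO_DAYS('2021-05-03'))
);

ALTER TABLE t_report DROP PARTITION p20210501;

-- Route this session's read-only queries to follower replicas to take
-- load off the Region leaders.
SET SESSION tidb_replica_read = 'follower';
```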

After putting these optimizations in place on the business side, we also prepared some handling and optimization measures on the operations side.

  • Ops side

First, manual compaction. When the problem occurs, the faulty TiKV node can be taken offline, compacted manually, and then brought back into the cluster.

Second, hot table scattering. We built an automatic monitoring mechanism that watches the cluster's hotspot tables and scatters them automatically according to the hotspot type and table type (see the query sketch below).

Third, better communication channels. We established a unified communication channel with the official TiDB team to discuss TiDB-related issues in real time, and we organize offline exchange activities from time to time.
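As a sketch of what such hotspot monitoring might query (the ordering and limit are illustrative):

```sql
-- List the tables currently reported as read/write hotspots, busiest first.
SELECT DB_NAME, TABLE_NAME, TYPE, FLOW_BYTES
FROM INFORMATION_SCHEMA.TIDB_HOT_REGIONS
ORDER BY FLOW_BYTES DESC
LIMIT 20;
```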

  • Thorough optimization

In fact, the best solution is a thorough optimization at the database layer itself, and from the introduction above the idea should be fairly clear.

If write operations and Batch Get can bypass tombstone keys rather than seeking over large numbers of them, the problem is avoided entirely. The related optimization has been incorporated into TiDB 5.0, and a corresponding optimization was also made in version 4.0.12.

Initial experience of TiDB 5.0

We had been looking forward to 5.0, so as soon as it was released I ran some related tests. Here are a few test cases to share.

TiDB 5.0 test experience

  • Stability

First, stability. On 4.0 and earlier versions we had more or less run into problems such as performance jitter under high concurrency, which had been bothering us. In the test, we found that TiDB 4.0.12 showed fluctuating duration and QPS under high concurrency.

The comparison test on 5.0 showed that both duration and QPS were quite stable, almost a flat line.

  • MPP

There are some new features in 5.0 that we are interested in, so we have done some testing.

For example, async commit, which shortens the two-phase commit path: in our tests, with async commit enabled, overall insert TPS increased by 33% and response latency dropped by 37%, a significant improvement over before.
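For reference, a minimal sketch of how async commit (together with the related 1PC option in 5.0) can be switched on; whether to enable these globally is of course a per-cluster decision:

```sql
-- Enable async commit for new sessions (TiDB 5.0+): transactions return
-- once prewrite succeeds instead of waiting for the second commit phase.
SET GLOBAL tidb_enable_async_commit = ON;

-- Related 5.0 option: commit small single-Region transactions in one phase.
SET GLOBAL tidb_enable_1pc = ON;
```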

Speaking of 5.0, I have to mention one of its most important features, MPP, which we paid particular attention to.

We did a lot of testing on it, including benchmarks and some single-SQL tests.

Here I want to share a test case from a real business database. The source MySQL instance held about 6 TB of data, and we compared its performance against TiDB 5.0 with MPP. Even with a single TiFlash node, 5.0 MPP outperformed MySQL most of the time, often by a wide margin, and in some cases by more than ten times.

With more than one TiFlash node, the MPP feature delivers even better performance. These tests gave us a lot of confidence in 5.0.
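For context, a hedged sketch of the setup such a comparison typically requires (the table name and replica count are hypothetical):

```sql
-- Create TiFlash replicas of the table so MPP can read the columnar copies.
ALTER TABLE t_report SET TIFLASH REPLICA 2;

-- Check replica sync progress before running analytical queries.
SELECT TABLE_SCHEMA, TABLE_NAME, AVAILABLE, PROGRESS
FROM INFORMATION_SCHEMA.TIFLASH_REPLICA;

-- Allow the optimizer to choose the MPP execution mode (TiDB 5.0+).
SET SESSION tidb_allow_mpp = 1;
```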

Future outlook for TiDB 5.0

Next, we will promote the adoption of 5.0, for example in real-time analysis scenarios, and we will focus on landing big data scenarios on TiDB 5.0. In addition, we are planning to move some TP workloads onto TiKV 5.0.

For our database private cloud service as a whole, we built a lot of the surrounding ecosystem while bringing in TiDB, such as database migration, a unified monitoring platform, and automated cluster creation and scaling. Going forward, we will keep improving this ecosystem so that TiDB can land even better in NetEase Games' cloud services and take worries off our business teams. Throughout this adoption we have also received great support from TUG partners, and while we benefit from the TUG community, we will be deeply involved in building it as well.