Background

Ifeng.com (listed on the New York Stock Exchange, ticker: FENG) is a leading global cross-platform new media company. It integrates three platforms: the ifeng.com portal, Phoenix TV, and mobile. Adhering to the media philosophy of “Chinese sentiment, global vision, inclusiveness and openness, progressive force”, it delivers seamless new media content and quality services to mainstream Chinese-speaking users across the Internet, wireless communication, and the convergence of the three networks.

In the media industry, news content is the core business data, so we need a stable, highly available, and easily scalable storage system to hold the company’s core data. Initially we used the popular MySQL to store the content of each business module, relying on master-slave switching for high availability. However, as data volume grew, the capacity of a standalone MySQL instance became a bottleneck, and the traditional MySQL-based sharding schemes are expensive to implement and operate. Moreover, due to their mechanisms, mainstream MySQL master-slave failover solutions have deficiencies in deciding that “the master is dead”, electing a new master, and broadcasting the new routing information; the overall failover takes too long and does not meet the high availability requirements of the company’s core business. So we had to find a new solution.

Choosing TiDB

Preliminary evaluation

At the beginning of the evaluation, we mainly had the following considerations:

1. Horizontal scale-out and scale-in for elastic services;

2. High availability with automatic recovery from service failures;

3. Convenient and stable migration from MySQL without affecting online business;

4. SQL support, requiring as few code changes as possible;

5. Easy operation and maintenance, ideally with support for online DDL.

During our search, we were pleasantly surprised to find TiDB, an open-source distributed database led by a Chinese R&D team.

Database capacity and expansion

There is a saying: “If a single MySQL instance can solve the problem, don’t use TiDB!” Our data was originally stored in multiple MySQL databases, and it is true that for small data volumes MySQL performs better than TiDB on common point and range queries. In the short term, if scaling is not taken into account, master-slave MySQL serves online needs better than TiDB, but even then we have to consider cost: migrating master-slave MySQL to TiDB makes fuller use of server resources and reduces waste. For many businesses, the scaling problem is impossible to avoid. As data keeps growing, the response time of a single table becomes longer and longer, while splitting tables is too expensive: the code needs to be changed and tested, and even if the business can accept that operation once, what about the next expansion? TiDB solves this problem by sharding automatically at the storage layer; increases or decreases in traffic can be handled simply by adding or removing nodes, with essentially no code changes, which is exactly what we expected.

High availability

Our original master-slave MySQL was not configured for high availability. We did research third-party tools such as MHA, but they came with many problems and technical costs, so that direction was not our first choice; besides, even after a master-slave switch, network configuration changes or a database middleware (DBProxy) are still needed to distribute the new routing information, which costs time. Our previous practice was that once the master MySQL went down, we learned of it through the internal monitoring system and then changed the Keepalived + HAProxy configuration by hand; even when the human response was very timely, the outage had already exceeded what the business could tolerate. Choosing the naturally multi-node TiDB avoids this: together with HAProxy, it fully achieves high availability at the DB level. Some time ago our internal monitoring system was upgraded and one TiKV machine was accidentally left unmonitored; its TiKV service was down for several days due to a hardware failure, and the business never noticed. That was, of course, our fault, but it also demonstrates TiDB’s own high availability mechanism.
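As a rough illustration, a minimal HAProxy fragment for fronting several TiDB servers might look like the following (the listen port, server names, and IP addresses are hypothetical; TiDB speaks the MySQL protocol on port 4000 by default):

```
listen tidb-cluster
    bind 0.0.0.0:3390          # clients connect here with any MySQL client
    mode tcp                   # the MySQL protocol is plain TCP to HAProxy
    balance roundrobin
    server tidb-1 10.0.0.11:4000 check inter 2000 rise 2 fall 3
    server tidb-2 10.0.0.12:4000 check inter 2000 rise 2 fall 3
    server tidb-3 10.0.0.13:4000 check inter 2000 rise 2 fall 3
```

With TCP health checks, a failed TiDB server is taken out of rotation automatically, so no manual configuration change is needed when a node dies.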

The online schema change (OSC) problem

MySQL 5.7 already supports online DDL, but in practice it has many limitations and often causes slave lag. TiDB supports online DDL, and adding a column is non-blocking and smooth, imperceptible to the business, which is a big reason why we chose it.
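For example, a non-blocking column addition on TiDB can be issued through any MySQL client (the host, database, table, and column names below are made up for illustration; 4000 is TiDB’s default port):

```shell
# Add a nullable column; in TiDB this schema change runs online,
# without blocking concurrent reads and writes on the table.
mysql -h tidb-host -P 4000 -u root \
  -e "ALTER TABLE news.articles ADD COLUMN summary VARCHAR(255) NULL;"
```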

Migrating to TiDB

To migrate MySQL data to TiDB, we used Mydumper and Loader to export and import the data. For subsequent incremental synchronization, PingCAP provides the Syncer tool, which replays the binlog sent by MySQL, acting as a simulated MySQL slave. To our surprise, it supports synchronizing multiple MySQL databases into one TiDB cluster simultaneously, and that is how we built the environment at the beginning. This convenient solution freed our energy from data migration and let us focus on business testing and trials. We found TiDB to be highly compatible with MySQL, so we gradually switched the traffic of each database over to TiDB; the whole data and traffic migration was very friendly and convenient. In the actual migration, the only incompatibility was the binlog format of a few original MySQL instances whose versions were too old; otherwise everything went smoothly.
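A rough sketch of the export, import, and catch-up steps (host names, credentials, and the database name are placeholders; the flags follow the Mydumper, Loader, and Syncer documentation):

```shell
# 1. Full logical export from MySQL: 16 threads, 64 MB file chunks.
./bin/mydumper -h mysql-host -P 3306 -u dumper -p '<password>' \
  -t 16 -F 64 -B news_db -o ./dump_dir

# 2. Restore the dump into TiDB through its MySQL protocol port.
./bin/loader -h tidb-host -P 4000 -u root -d ./dump_dir

# 3. Incremental catch-up: Syncer acts like a MySQL slave and replays
#    the binlog into TiDB from the position recorded in the dump metadata.
./bin/syncer -config syncer.toml
```

Once Syncer has caught up, application traffic can be switched over database by database.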

Node migration

We also performed online node migration of TiDB. Some of our online TiDB clusters were deployed from binaries, and their version (a 2016 release) was too old to be upgraded in a timely manner, so when a machine-room migration came up, we worried about whether the service would be affected. Fortunately, the migration only involved the stateless TiDB nodes and the PD nodes that store metadata. After verifying the addition and removal of PD nodes step by step, we started TiDB nodes in the new machine room, gradually shifted traffic to them at the HAProxy layer over several canary releases, and then removed the original TiDB nodes to complete the migration.
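The PD side of such a migration can be sketched with pd-ctl roughly as follows (addresses and member names are hypothetical; a new PD server joins the existing cluster via the --join flag before any old member is deleted):

```shell
# Inspect the current PD membership (2379 is PD's default client port).
./bin/pd-ctl -u http://10.0.0.21:2379 member

# Start a new PD server in the new machine room, joining the existing cluster.
./bin/pd-server --name=pd-new-1 \
  --client-urls=http://10.0.1.21:2379 \
  --peer-urls=http://10.0.1.21:2380 \
  --join=http://10.0.0.21:2379

# Once the new member is healthy, remove an old one, one at a time.
./bin/pd-ctl -u http://10.0.0.21:2379 member delete name pd-old-1
```

Because TiDB servers are stateless, the SQL layer only needs new nodes started in the new machine room and a traffic switch at the HAProxy layer.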

Strong official support

Of course, friendly as the official migration plan is, there were still some difficulties in learning and hands-on operation. We had not been in contact with PingCAP for long, but when we ran into migration problems and turned to them for help, they responded very quickly. They helped us solve a PD node removal failure caused by etcd during the online migration, and also arranged for architects to visit our company for technical exchanges. Timely help in trouble is worth more than icing on the cake, as we deeply realized when communicating with their staff.

TiDB has also been very stable: the Beta4 version we were using had been running stably for nearly 220 days by the time we contacted the official team.

Current TiDB environment

Currently, our company has three TiDB clusters in use, all serving OLTP workloads. The first two were installed from binaries of an old version (Beta4), and the third runs TiDB GA, deployed with the official Ansible playbooks.
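For reference, a GA deployment with the official tidb-ansible repository goes roughly like this (the inventory contents depend on your own hosts):

```shell
git clone https://github.com/pingcap/tidb-ansible.git
cd tidb-ansible
# Edit inventory.ini to list the PD, TiKV, and TiDB hosts.
ansible-playbook local_prepare.yml   # download the component binaries
ansible-playbook bootstrap.yml       # initialize machines, check hardware
ansible-playbook deploy.yml          # distribute binaries and configuration
ansible-playbook start.yml           # start the whole cluster
```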

We learned from communication with the official team that the two clusters on the ancient version (Beta4) need to be migrated and upgraded because they are too far behind the current release, and the team is very willing to provide us with technical support. We would like to thank them in advance for their help.

A small episode

The emergence of TiDB let us skip the traditional Sharding + Proxy route and saved us huge technical costs. We love TiDB, but we also ran into some problems while getting to know and using it. Although the official server requirements are clear (SSD-class disks or better), at the beginning it was hard for us to requisition high-performance machines just for functional testing and for learning TiDB deployment, scaling, and migration, so we started with several low-performance test machines. When we used Loader to import data for an incremental test, an error occurred; the TiDB error message gave no hint that it was caused by the low machine performance, and the official documentation offered no guidance on this either, so we imported and tested repeatedly while the problem persisted. After suspecting machine performance as the cause (the latest Ansible installation scripts now perform a hardware performance check), we applied for several high-performance machines and reran the test, confirming that machine performance was indeed the culprit. Later, in communication with the official staff, we learned that the whole TiDB architecture is oriented toward future high-concurrency scenarios with massive data, and the underlying storage (for example, data seek and positioning) is designed and optimized for today’s mainstream SSDs rather than traditional SATA/SAS mechanical disks. As for the documentation and error messages, they are also being improved rapidly.

Looking forward

After detailed communication with PingCAP and the sharing sessions given by their architects, we plan to consolidate the master-slave MySQL instances of our various business lines into two or three TiDB clusters in the near future. This will not only gradually unify and standardize our use of databases, but also greatly reduce wasted machine resources and operation and maintenance costs, while achieving high availability for our databases with the help of TiDB. Colleagues in other departments have also taken action: for a new key OLTP project, we have decided to use Cloud TiDB (TiDB deployed on Kubernetes) as the database system. We also learned that TiDB serves not only OLTP but also many OLAP scenarios, and after the sharing sessions our big data group is eager to try it as well.

In the future, we will strengthen communication with PingCAP, carry out deeper research and development together, keep getting more out of TiDB, and apply it comprehensively across Ifeng.com’s business.

✎ Author: engineer at Ifeng.com