Author: Yin Fei

Small T introduction: SF Technology's big data cluster collects massive amounts of monitoring data every day to ensure the stable operation of the cluster. OpenTSDB+HBase had been used to store the full monitoring data of the big data monitoring platform, but it had many pain points and needed to be replaced. After investigating several time-series data storage solutions, including IoTDB, Druid, ClickHouse, and TDengine, we chose TDengine. With TDengine, the big data monitoring platform improved significantly in stability, write performance, query performance, and other aspects, and the storage cost dropped to 1/10 of the original solution.

Scenarios and pain points

SF Technology is committed to building an intelligent brain and intelligent logistics services, and continues to cultivate the fields of big data and products, artificial intelligence and applications, and comprehensive logistics solutions, holding a leading position in China's logistics technology industry. To ensure the smooth operation of its various big data services, we built a big data monitoring platform around OpenFalcon. OpenFalcon uses RRDTool as its data store, which is not suitable for storing the full monitoring data, so OpenTSDB+HBase was adopted to store the full monitoring data of the big data monitoring platform.

Currently the entire platform averages billions of writes per day. As the amount of data ingested by the big data monitoring platform keeps growing, we face several pain points that must be solved, including excessive dependencies, high usage cost, and unsatisfactory performance.

  • High dependency, low stability: As the underlying big data monitoring platform, it relies on big data components such as Kafka, Spark, and HBase for data storage, and the long data-processing link reduces the reliability of the platform. In addition, the platform depends on these big data components, while the monitoring of those components depends on the monitoring platform; if a big data component becomes unavailable, the monitoring platform cannot locate the fault in time.

  • High usage cost: Monitoring data is written in large volumes, and the full monitoring data must be retained for more than six months to trace problems. Based on capacity planning, we therefore used a 4-node OpenTSDB + 21-node HBase cluster to store the full monitoring data. Even after compression, 1.5 TB (3 replicas) of storage space is still needed every day, resulting in a high overall cost.

  • Performance fails to meet requirements: As the storage solution for the full monitoring data, OpenTSDB meets the write-performance requirement but falls short on the large-span, high-frequency queries used daily. On the one hand, OpenTSDB returns query results slowly, taking more than ten seconds for large time spans. On the other hand, OpenTSDB supports only a low QPS; as the number of users grows, OpenTSDB is prone to crashing, making the entire service unavailable.

Technology selection

To solve the above problems, the storage solution for the full monitoring data had to be upgraded. For database selection, we did pre-research and analysis on the following databases:

  • IoTDB: an Apache top-level project newly graduated from the incubator, contributed by Tsinghua University, with good single-machine performance. However, at the time of our investigation it did not yet support cluster mode, and single-machine mode could not meet our requirements for disaster recovery and scaling.

  • Druid: a high-performance, scalable distributed system that is self-healing, self-balancing, and easy to operate. However, it relies on ZooKeeper and on Hadoop as deep storage, and its overall complexity is high.

  • ClickHouse: the best performance of the candidates, but high maintenance cost, very complex to scale, and resource-intensive.

  • TDengine: meets our requirements on performance, cost, and ease of operation and maintenance, and supports horizontal scaling and high availability.

After a comprehensive comparison, we selected TDengine as the monitoring data storage solution. TDengine supports multiple data ingestion methods, including JDBC and HTTP, which are convenient to use. Because writing the monitoring data demands high performance, we finally adopted the Go Connector. Ingestion involves the following steps:

  • Data cleaning: discard data with an incorrect format;

  • Data formatting: convert the data into entity objects;

  • SQL statement splicing: inspect the data and determine the SQL statement to write;

  • Batch writing: to improve write efficiency, individual records are spliced into a single SQL statement and written in batches (a sketch of such a spliced statement follows this list).
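
The following is a minimal sketch of such a spliced batch insert; the sub-table names, values, and timestamps are purely illustrative and assume the metric model described in the next section (timestamp, monitoring value, sampling frequency).

```sql
-- One INSERT statement carries many records and can target several sub-tables,
-- which is how individual records are spliced together and written in batches.
-- Table names and values below are illustrative only.
INSERT INTO
  t_host01_cpu_idle VALUES ('2021-01-01 10:00:00.000', 95.2, 60)
                           ('2021-01-01 10:01:00.000', 94.8, 60)
  t_host02_cpu_idle VALUES ('2021-01-01 10:00:00.000', 88.7, 60);
```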

Data modeling

Before ingesting data, a TDengine schema needs to be designed around the characteristics of the data to achieve the best performance. The data characteristics of the big data monitoring platform are as follows:

  • The data format is fixed, and each record carries its own timestamp;

  • The uploaded data content is unpredictable: new nodes or services upload new label content, so data models cannot be created uniformly in advance and must be created in real time based on the incoming data;

  • There are not many label columns, but the label content varies a lot; the value columns are relatively fixed, consisting of the timestamp, the monitoring value, and the sampling frequency;

  • A single record is small, about 100 bytes;

  • The daily data volume is large, more than 5 billion records;

  • Data must be kept for more than 6 months.

Based on the above characteristics, we constructed the following data model.

According to the data model recommended by TDengine, each type of data collection point needs a supertable; for example, disk utilization, which is collected from the disks of every host, can be abstracted into a supertable. Combining our data characteristics and usage scenarios, we created the data model as follows:

  • Each metric is used as a supertable, which facilitates aggregation and analysis of the same type of data;

  • The monitoring data carries label information, which is used directly as the tag columns of the supertable; records with the same tag values form a sub-table.

The database structure is as follows:
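
A minimal sketch of such a database, assuming a database named monitor; KEEP 180 and REPLICA 2 follow the 6-month retention and the 2 copies mentioned later in this article, while the DAYS, CACHE, and BLOCKS values are only illustrative.

```sql
-- Database for the full monitoring data (name and some option values assumed).
CREATE DATABASE IF NOT EXISTS monitor
  KEEP 180      -- retain data for roughly 6 months
  DAYS 10       -- each data file covers 10 days
  REPLICA 2     -- keep 2 copies of the data
  CACHE 16      -- size of each memory block in MB
  BLOCKS 6;     -- number of memory blocks per vnode

USE monitor;
```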

The supertable structure is as follows:
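
A minimal sketch of one such supertable, using CPU idle rate as an example; the column and tag names are assumptions that follow the model above (timestamp, monitoring value, and sampling frequency as value columns, the reported labels as tags).

```sql
-- One supertable per metric; each unique combination of tag values forms a sub-table.
CREATE STABLE IF NOT EXISTS cpu_idle (
  ts     TIMESTAMP,     -- timestamp carried by the record
  value  DOUBLE,        -- monitoring value
  period INT            -- sampling frequency, assumed in seconds
) TAGS (
  endpoint BINARY(128), -- host or service that reported the metric
  cluster  BINARY(64)   -- illustrative label; real labels vary by metric
);
```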

Implementation

The big data monitoring platform is the foundation for the stable operation of the upper-layer big data platform, so the whole system must be highly available. As the volume of monitoring data grows, the storage system must also be easy to scale horizontally. Based on these two points, the overall architecture built on TDengine is as follows:

To ensure the high availability and scalability of the whole system, an Nginx cluster is used for load balancing at the front end to ensure high availability; the client layer is deployed separately so that it can be scaled out or in based on traffic.

The implementation difficulties are as follows.

  • Data writing: The upload interface for monitoring metrics is open and validates only the format, so the metrics to be written are not known in advance and the supertables and sub-tables cannot be created beforehand. Instead, for each incoming record we must check whether a new supertable needs to be created. If that check required a query to TDengine for every record, the write speed would drop far below the requirement. To solve this, we built a local cache: TDengine only needs to be queried once per metric, and subsequent data for that metric can be written directly in batches, which greatly improves the write speed. In addition, before version 2.0.10.0 batch table creation was very slow; to maintain the write speed we had to create tables and insert data in separate batches and cache the sub-table information. Later versions optimized sub-table creation, greatly improving the speed and simplifying the insert process (see the SQL sketch after this list).

  • Query problems: 1. Query bugs. The monitoring platform displays data mainly through Grafana. While using it we found that the official plug-in did not support parameter setting, so we modified it for our own needs and contributed the change to the community. We also triggered a serious query bug: with many dashboards configured, refreshing a page would crash the server. Investigation showed that a dashboard refresh in Grafana issues multiple query requests at the same time, and the handling of these concurrent queries caused the crash; this was later fixed by the TDengine team. 2. Query single point of failure. TDengine's native HTTP queries go directly to a specific server, which is risky in a production environment: all queries are concentrated on one server, which can easily overload a single machine, and the query service is not guaranteed to be highly available. For these two reasons, we placed an Nginx cluster as a reverse proxy in front of the TDengine cluster to distribute query requests evenly across the nodes, which in theory allows query capacity to be scaled out indefinitely.

  • Capacity planning: Data types and data scale have a great impact on TDengine performance, so capacity should be planned for each scenario based on its own characteristics, including the number of tables, data length, number of replicas, and table activity, and configuration parameters such as blocks, cache, and ratioOfQueryCores should be tuned accordingly to achieve the best performance. The capacity planning model for TDengine was worked out in communication with the TAOS Data engineers. The difficult part of TDengine capacity planning is memory planning. In general, a three-node cluster with 256 GB of memory supports at most about 20 million sub-tables; if the number of sub-tables keeps growing, the write speed decreases, and some memory must also be reserved for the query cache. If the number of sub-tables exceeds 20 million, new nodes can be added to share the load.
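
The write path described above leans on TDengine's ability to create a sub-table automatically during an insert; a minimal sketch follows, with the supertable, sub-table name, and tag values assumed for illustration.

```sql
-- If the sub-table does not exist yet, it is created from the supertable on the
-- fly, so the client only needs its local cache to know whether the supertable
-- itself already exists.
INSERT INTO t_host03_cpu_idle
  USING cpu_idle TAGS ('host03', 'bigdata')
  VALUES (now, 91.4, 60);
```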

Transformation results

After the transformation, the TDengine cluster easily handles writes of the full monitoring data and is currently running stably. The architecture after the transformation is as follows:

  • Stability: After the transformation, the big data monitoring platform no longer depends on big data components, and the data-processing link is effectively shortened. It has been running stably since going online, and we will continue to observe it.

  • Write performance: TDengine's write performance is closely related to the data being written. After tuning the relevant parameters according to the capacity planning, the cluster reaches a maximum write speed of about 900,000 records per second under ideal conditions; under normal conditions (new tables being created, mixed inserts), the write speed is about 200,000 records per second.

  • Query performance: When pre-computed aggregate functions can be used, the query P99 is within 0.7 seconds, which already meets most of our daily query needs (an illustrative query follows this list). For large-span (6-month) queries that cannot use pre-computation, the first query takes about 10 seconds, and subsequent similar queries drop significantly to 2-3 seconds, mainly because TDengine caches the latest query results: a similar query first reads the cached data and then aggregates only the newly arrived data.

  • Cost: The number of physical servers is reduced from 21 to 3, and the daily storage requirement is 93 GB (2 replicas), only about 1/10 of what OpenTSDB+HBase needs with the same number of replicas. TDengine thus has a great cost advantage over the general-purpose big data platform.
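
A minimal sketch of the kind of aggregate query the performance figures refer to, written against the illustrative cpu_idle supertable defined earlier; MAX, MIN, and AVG are aggregate functions that can make use of TDengine's pre-computed per-block statistics.

```sql
-- Daily aggregation over a 6-month span; the supertable and column names are
-- the illustrative ones used in the schema sketch above.
SELECT MAX(value), MIN(value), AVG(value)
FROM cpu_idle
WHERE ts >= NOW - 180d
INTERVAL(1d);
```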

Conclusion

From the perspective of our big data monitoring scenario, TDengine has very big advantages in cost, performance, and ease of use, and the cost saving in particular was a pleasant surprise. We would like to thank the TAOS Data engineers for their professional and timely help during the pre-research and the project implementation. We hope TDengine keeps improving its performance and stability and developing new features, and we will contribute code to the community based on our own needs. Best wishes to TDengine. There are also some features we look forward to seeing improved:

  • Friendlier table name support;

  • Support for joint queries with other big data platforms;

  • Support for richer SQL statements;

  • Smooth gray-release (canary) upgrades;

  • Automatic cleanup of sub-tables;

  • Faster recovery after an abnormal cluster shutdown.

In the future, we will try to apply TDengine in more SF Technology scenarios, including:

  • IoT platform: using TDengine as the underlying IoT data storage engine to build SF Technology's big data IoT platform;

  • Hive on TDengine: enabling Hive on TDengine to query data jointly with other data sources, so that TDengine can be used smoothly in the existing system, lowering the access threshold.