Author | Zhao Yunzhou

In our business, we use Nginx as the reverse proxy, and we need real-time monitoring of Nginx status to ensure the availability of the proxy layer. For basic server monitoring, we gradually replaced OpenFalcon with Nightingale, which was also used initially for Nginx reqstat monitoring. However, both monitoring systems share a drawback: the number of entries to display grows as the number of domain names multiplied by the number of machines, and they could not keep up with that demand.

To solve this problem, we decided to upgrade the existing monitoring module and select a new database. During the research and analysis stage, based on our current business needs, we shortlisted two open-source time-series databases, InfluxDB and TDengine. Both offer high-performance queries and data storage, but TDengine has three advantages over InfluxDB:

  • The clustering feature is open source and supports horizontal scaling and high availability

  • The balance between performance and cost is optimized, and the difficulty of operation and maintenance is significantly reduced

  • Its super table is particularly suitable for our domain-based monitoring scheme

After a comprehensive comparison, we chose TDengine as the database for the monitoring module. In addition, TAOS Data's official test results showed that TDengine writes faster than InfluxDB, which reinforced our decision.

TDengine also supports a variety of data interfaces, including C/C++, Java, Python, Go, and RESTful. Our existing service was written in Python, so it was convenient to continue with the Python connector.

Data modeling with TDengine

As a structured time-series database, TDengine requires the schema to be designed around the characteristics of the data before any data is ingested, in order to achieve optimal performance.

The Nginx monitoring data has the following characteristics:

  • Fixed data format: once req_status_zone is configured, the log format is fixed and the total number of fields is limited

  • Data rarely needs to be updated or deleted

    • It is a factual record of service access and is not deleted unless it is dirty
  • The number of data tags to be collected is small and fixed

  • A single record is about 1 KB

  • Data must be retained for more than 6 months

In addition, TDengine's documentation states:

TDengine creates a separate table for each data collection point, but in practice it is often necessary to aggregate data across collection points. To make such aggregation efficient, TDengine introduces the concept of the super table (STable). A super table represents a specific type of data collection point. It is a collection of tables that all share exactly the same schema, but each table has its own static tags; there can be multiple tags, and they can be added, deleted, and modified at any time.

Following the suggested data model, we need to create a super table. Based on our data characteristics and usage scenarios, we designed the data model as follows:

  • Super table: the metric serves as the super table

  • Subtable: one subtable is created for each domain name

    • Tag: the tag information is used directly as the tag columns of the super table
    • Column: the monitoring data itself, excluding the tags

Specific examples are as follows:
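
As a minimal sketch of the model described above, the DDL statements could be generated as follows. The super table name `nginx_req_status`, the column names `req_total` and `bytes_out`, the tag name `domain_name`, and both helper functions are assumptions made for illustration; they are not the schema used in production.

```python
from typing import Dict, List


def create_stable_sql(metric: str, columns: Dict[str, str], tags: Dict[str, str]) -> str:
    """Build a CREATE STABLE statement: a timestamp, value columns, and tag columns."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
    tag_defs = ", ".join(f"{name} {ttype}" for name, ttype in tags.items())
    return f"CREATE STABLE IF NOT EXISTS {metric} (ts TIMESTAMP, {cols}) TAGS ({tag_defs})"


def create_subtable_sql(table: str, metric: str, tag_values: List[str]) -> str:
    """Build a CREATE TABLE ... USING statement for one subtable (one domain name)."""
    values = ", ".join(f"'{value}'" for value in tag_values)
    return f"CREATE TABLE IF NOT EXISTS {table} USING {metric} TAGS ({values})"


# Hypothetical metric with two value columns and one tag.
stable_sql = create_stable_sql(
    "nginx_req_status",
    {"req_total": "BIGINT", "bytes_out": "BIGINT"},
    {"domain_name": "BINARY(128)"},
)
# One subtable per domain name; the table name is already converted to a legal identifier.
subtable_sql = create_subtable_sql("d_www_example_com", "nginx_req_status", ["www.example.com"])
```

The generated strings can then be passed to whichever connector is in use. Keeping the original domain name as a tag value means it stays queryable even though the table name is a converted form.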

Implementation and results

Because this is a new database integration, some problems were inevitable during implementation. We summarize our experience in the following three points for your reference:

  • Data writing

For data writing, our initial design was one subtable per domain name. However, a domain name cannot be used directly as a table name, because it contains characters that table names do not allow, so the domain name must be converted first.
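
One possible conversion is sketched below. It assumes that a table name may contain only letters, digits, and underscores and must not start with a digit; the function name `domain_to_table_name` and the `d_` prefix are hypothetical and are not the conversion actually used in our service.

```python
import re


def domain_to_table_name(domain: str, prefix: str = "d_") -> str:
    """Map a domain name to an identifier made of lowercase letters, digits, and underscores."""
    # Replace every character outside [0-9a-z] (dots, hyphens, ...) with an underscore.
    cleaned = re.sub(r"[^0-9a-z]", "_", domain.strip().lower())
    # The prefix guarantees that the name never starts with a digit.
    return prefix + cleaned
```

Note that this mapping is not reversible and can collide (for example, `a-b.com` and `a.b.com` produce the same name), so the original domain name should still be stored somewhere, such as in a tag.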

  • Query problem

Because the values we write are real-time readings, while the monitoring business mostly needs the difference between consecutive values, the TDengine function DIFF is required. According to the official documentation, from version 2.1.3.0 onward DIFF can be applied to super tables together with GROUP BY TBNAME, so that each subtable's timeline is processed separately. In addition, the super table greatly simplifies the query code, and TDengine's sharding ensures that multiple domains can be queried concurrently across multiple cores.
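
A minimal sketch of such a query, plus a pure-Python picture of what it returns, is shown below. The super table name, column name, and time range used in the usage note are hypothetical, and `diff_query` and `per_table_diff` are illustrative helpers rather than part of any TDengine API.

```python
from typing import Dict, List


def diff_query(stable: str, column: str, start: str, end: str) -> str:
    """Build a DIFF query over a super table, one timeline per subtable."""
    return (
        f"SELECT DIFF({column}) FROM {stable} "
        f"WHERE ts >= '{start}' AND ts < '{end}' "
        "GROUP BY TBNAME"
    )


def per_table_diff(series: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Consecutive differences, computed independently for each subtable."""
    return {
        table: [later - earlier for earlier, later in zip(values, values[1:])]
        for table, values in series.items()
    }
```

For example, `diff_query("nginx_req_status", "req_total", "2021-06-01 00:00:00", "2021-06-01 01:00:00")` yields a single statement covering every domain, while `per_table_diff` illustrates that values from different subtables are never subtracted from each other.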

  • Capacity planning

During rollout, we found that data types and data scale have a large impact on TDengine performance, so capacity is best planned according to the characteristics of each scenario. The influencing factors include:

  1. Number of tables
  2. Record length
  3. Number of replicas
  4. Table activity, etc.

Adjusting configuration parameters according to these factors helps achieve optimal performance. After discussing with the engineers at TAOS Data, we settled on our current capacity planning model. It is worth noting that the hard part of TDengine capacity planning is memory planning, which requires balancing memory usage against read/write performance.
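
The capacity planning model itself is not reproduced here. As a rough illustration of how the first three factors interact, the following sketch computes an uncompressed upper bound on disk usage only. It is not our model and not TDengine's: it ignores compression and memory entirely, and the table count and collection interval in the usage note are hypothetical.

```python
def raw_storage_bytes(
    tables: int,
    interval_seconds: int,
    record_bytes: int,
    retention_days: int,
    replicas: int = 1,
) -> int:
    """Uncompressed upper bound: tables x records per table x record size x replicas."""
    records_per_table = retention_days * 86400 // interval_seconds
    return tables * records_per_table * record_bytes * replicas
```

With the record size (about 1 KB) and retention (about 180 days) mentioned earlier, a hypothetical 1,000 subtables sampled every 60 seconds give `raw_storage_bytes(1000, 60, 1024, 180)`, which is 265,420,800,000 bytes (roughly 247 GiB) before compression; each additional replica multiplies that figure.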

After integrating TDengine, our current system topology is as follows:

After the migration to TDengine, online monitoring meets our expectations and current business requirements, and it has been running very stably. With Grafana, we can monitor traffic, connection counts, response time, and other information for each domain name in real time.

Final thoughts

All in all, TDengine has clear advantages in cost, performance, and ease of use, which our practice has confirmed, especially with regard to cost control. We would also like to thank our colleagues at TAOS Data for their professional and timely help, and we hope TDengine keeps delivering more and better features. As TDengine users, we also contribute code to TDengine on GitHub.

In addition, based on our own projects and practice, there are some things we would like to see improved in TDengine:

  • Friendlier table name support: fewer restrictions on special characters (reportedly coming in a later version), which would match real use cases more closely and save special handling on the application side

  • Richer SQL support: more flexible SQL statements for less common scenarios, to enable more complex analysis, which is an advanced part of AIOps

  • Smooth canary upgrades: TDengine currently releases roughly every two weeks, and we want to adopt new features quickly. However, every upgrade requires downtime, which is troublesome, so we look forward to official support for rolling upgrades as soon as possible

  • Custom aggregation methods: because of timing, we were not able to use the official UDF feature. We hope it is released soon so that more complex aggregate computations can be implemented

  • Automatic subtable cleanup: domain names go offline over time, but the current TTL policy applies only to data, not to the tables themselves, so removing retired subtables still requires manual operation and maintenance

Despite these shortcomings, our first attempt with TDengine was a good one, and we look forward to working with TDengine in more scenarios in the future, including more monitoring items and business-side time-series database needs. Finally, we wish TDengine all the best!