In the previous article, we described how to migrate from OpenTSDB to TDengine in an operations monitoring scenario.
If your application is particularly complex, or its domain is not an O&M monitoring scenario, this article covers the advanced topic of migrating OpenTSDB applications to TDengine in a more comprehensive and in-depth manner.
Migration assessment and strategy for other scenarios
1. Differences between TDengine and OpenTSDB
This section describes in detail the differences between OpenTSDB and TDengine at the system function level.
After reading this section, you will be able to fully evaluate whether complex OpenTSDB-based applications can be migrated to TDengine, and what to look out for after migration.
TDengine currently supports dashboard rendering only in Grafana, so if an application uses another dashboard (TSDash, Status Wolf, etc.), it cannot be migrated directly to TDengine and must be re-adapted to Grafana to run properly.
As of version 2.3.0.x, TDengine supports only collectd and StatsD as data collection and aggregation software; more such software will be supported in the future. If the collection side uses another type of data aggregator, it must first be adapted to one of these two before data can be written. Besides these two aggregation protocols, TDengine also supports writing data via InfluxDB's line protocol and via OpenTSDB's write protocols (text lines and JSON format). You can rewrite the logic on the data-push side and write data using TDengine's line protocols.
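To make the text-line format concrete, here is a minimal sketch of building an OpenTSDB-style line of the kind the schemaless write interface accepts. The metric name and tag values are hypothetical placeholders, not taken from a real deployment.

```python
# Sketch: build an OpenTSDB telnet-style data line.
# Format: <metric> <timestamp> <value> <tagk1=tagv1> <tagk2=tagv2> ...
# Metric name, timestamp, and tags below are hypothetical.
def opentsdb_line(metric, ts, value, tags):
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric} {ts} {value} {tag_str}"

line = opentsdb_line("memory", 1632979445, 3.0656,
                     {"host": "vm130", "memory_type": "buffer"})
print(line)  # memory 1632979445 3.0656 host=vm130 memory_type=buffer
```

A batch of such lines can then be handed to the schemaless write API in one call.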
In addition, if your application uses the following features of OpenTSDB, you need to be aware of the following considerations before migrating:
/api/stats
: If an application uses this endpoint to monitor the OpenTSDB service status and has built linkage logic around it in the application, the logic for reading OpenTSDB status must be adapted to TDengine. TDengine provides a new mechanism for cluster status monitoring that meets the monitoring and maintenance requirements of applications.

/api/tree
: If you rely on this OpenTSDB feature for hierarchical organization and maintenance of timelines, they cannot be migrated directly to TDengine. TDengine organizes and maintains timelines in a database -> supertable -> subtable hierarchy. All timelines belonging to the same supertable sit at the same level in the system, but the multi-level structure of application logic can be simulated through careful construction of different tag values.

Rollup and PreAggregates
: With Rollup and PreAggregates, applications must determine where to access the rollup results and, in some cases, the raw results. The opacity of this structure makes application processing logic extremely complex and entirely non-portable. We see this strategy as a compromise made when a time-series database cannot deliver high-performance aggregation. TDengine does not support automatic downsampling and pre-aggregation of multiple timelines: thanks to its high-performance query processing logic, it delivers very fast query responses without relying on rollup or pre-aggregated results, which makes your application's query logic much simpler.

Rate
: TDengine provides two functions for calculating the rate of change: Derivative (whose behavior is consistent with InfluxDB's derivative) and IRate (whose result is consistent with Prometheus's irate function). The results of these two functions differ slightly from Rate, but they are more powerful overall. In addition, every calculation function provided by OpenTSDB has a corresponding TDengine query function, and TDengine's query functions go far beyond what OpenTSDB supports, so application processing logic can be greatly simplified.
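To illustrate the semantic difference between the two rate functions, here is a pure-Python sketch of the underlying arithmetic on a few sample points. This models only the simplified semantics (per-second slope vs. instantaneous rate from the last two samples), not TDengine's implementation, and ignores counter-reset handling.

```python
# Sketch of rate-of-change semantics on (timestamp_seconds, value) samples.
# derivative(): per-second slope between each pair of consecutive points.
# irate(): instantaneous rate based only on the last two samples.
def derivative(samples):
    return [(t2, (v2 - v1) / (t2 - t1))
            for (t1, v1), (t2, v2) in zip(samples, samples[1:])]

def irate(samples):
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    return (v2 - v1) / (t2 - t1)

samples = [(0, 10.0), (10, 30.0), (20, 35.0)]
print(derivative(samples))  # [(10, 2.0), (20, 0.5)]
print(irate(samples))       # 0.5
```

An OpenTSDB Rate query would typically be replaced by one of these two functions, chosen by which semantics the application actually relies on.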
With this information, you should be able to understand what changes migrating from OpenTSDB to TDengine will bring. It will also help you decide whether migrating your application to TDengine is acceptable, so you can experience the powerful time-series data processing capabilities and ease of use that TDengine provides.
2. Migration strategy
Migrating a system based on OpenTSDB involves data schema design, system scale estimation, adaptation of the data-writing side, data diversion, and application adaptation. Then run the two systems in parallel for a while before migrating the historical data to TDengine. Of course, if your application relies heavily on certain OpenTSDB features and you do not want to stop using them, you can keep OpenTSDB running while starting TDengine to provide the main services.
Data model design
On the one hand, TDengine requires a strict schema definition for the data it ingests. On the other hand, TDengine's data model is richer than OpenTSDB's: the multi-value model is compatible with all the requirements of the single-value model. Now let's assume an operations monitoring scenario in which collectd gathers the basic metrics of a device, including memory, swap, and disk.
TDengine requires data to be stored in a data schema, that is, a supertable must be created and its schema specified before data is written. For data schema establishment, there are two ways to accomplish this:
1) Take full advantage of TDengine's native support for OpenTSDB-format writes: call the API provided by TDengine to write the data (as text lines or JSON), and a single-value model is established automatically. This approach requires no major adjustment to the data-writing application and no conversion of the written data format. At the C level, TDengine provides taos_insert_lines() to write data in OpenTSDB format directly (in version 2.3.x this corresponds to taos_schemaless_insert()). For example code, see schemaless.c in the installation package directory.
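For reference, here is a sketch of the JSON shape used by OpenTSDB-style writes, which is also the form accepted by the JSON schemaless path. The metric and tag values are hypothetical placeholders.

```python
import json

# Sketch of an OpenTSDB-style JSON data point (the shape used by
# OpenTSDB's /api/put and by JSON-format schemaless writes).
# Metric, timestamp, and tag values here are hypothetical.
point = {
    "metric": "memory",
    "timestamp": 1632979445,
    "value": 3.0656,
    "tags": {"host": "vm130", "memory_type": "buffer",
             "source": "collectd"},
}
payload = json.dumps([point])  # a list allows batching several points
print(payload)
```

A payload like this can be written as-is, so the push side often needs no format conversion at all.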
2) On the basis of fully understanding TDengine's data model, and considering the characteristics of the generated data, manually establish a mapping from OpenTSDB's data model to TDengine's. TDengine supports both the multi-value and the single-value model. Considering that OpenTSDB uses a single-value model, it is recommended to model with the single-value model in TDengine.
- Single value model
The steps are as follows: use the metric name as the name of the TDengine supertable, which has two basic data columns, timestamp and value. The tags of the supertable are equivalent to the tag information of the metric, and the number of tags equals the number of the metric's tags. Subtables are named by a fixed rule: metric + '_' + tag1_value + '_' + tag2_value + '_' + tag3_value ...
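The naming rule above can be sketched as a small helper; the metric and tag values used here are the hypothetical collectd examples from this scenario.

```python
# Sketch: derive a subtable name from the metric name and its tag values,
# following the rule metric + '_' + tag1_value + '_' + tag2_value + ...
def subtable_name(metric, tag_values):
    return "_".join([metric] + list(tag_values))

name = subtable_name("memory", ["vm130", "memory", "buffer", "collectd"])
print(name)  # memory_vm130_memory_buffer_collectd
```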
Create 3 supertables in TDengine:
```sql
create stable memory(ts timestamp, val float)
  tags(host binary(12), memory_type binary(20),
       memory_type_instance binary(20), source binary(20));
create stable swap(ts timestamp, val double)
  tags(host binary(12), swap_type binary(20),
       swap_type_instance binary(20), source binary(20));
create stable disk(ts timestamp, val double)
  tags(host binary(12), disk_point binary(20),
       disk_instance binary(20), disk_type binary(20), source binary(20));
```
Create a dynamic table for the subtables as follows:
```sql
insert into memory_vm130_memory_buffer_collectd
  using memory tags('vm130', 'memory', 'buffer', 'collectd')
  values (1632979445, 3.0656);
```
Eventually, about 340 subtables and three supertables will be created. Note that if concatenating the tag values produces a subtable name exceeding the system limit (191 bytes), some encoding (such as MD5) is required to convert it to an acceptable length.
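One way to handle the length limit is sketched below, assuming MD5 as the encoding; the `t_` prefix is a hypothetical convention to keep the hashed name a valid table identifier.

```python
import hashlib

MAX_NAME_BYTES = 191  # the subtable-name limit mentioned above

def shorten(name):
    # Keep the name as-is when it fits; otherwise replace it with an
    # MD5 digest (32 hex chars) so the result always fits the limit.
    if len(name.encode("utf-8")) <= MAX_NAME_BYTES:
        return name
    return "t_" + hashlib.md5(name.encode("utf-8")).hexdigest()

print(shorten("memory_vm130_memory_buffer_collectd"))  # unchanged, it fits
print(shorten("disk_" + "_".join(["verylongtagvalue"] * 20)))  # hashed
```

The trade-off is that hashed names are no longer human-readable, so the tag values themselves remain the way to locate a timeline.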
- Multiple value model
To take advantage of TDengine's multi-value model, the following requirements must first be met: different collected metrics have the same collection frequency and can reach the data-writing end at the same time through the message queue, so that multiple metrics can be written with a single SQL statement. The measurement name is used as the supertable name, establishing a multi-column data model for metrics with the same collection frequency that arrive together. Subtables are named in the fixed way described above. Since each of the measures above contains only one measurement, it cannot be converted to a multi-value model.
Data diversion and application adaptation
Subscribe to the data from the message queue and start the adjusted writer to write the data.
After data has been written for a period of time, you can use SQL statements to check whether the volume of written data meets expectations. The following SQL statement counts the rows:
```sql
select count(*) from memory;
```
After the query is complete, if the data written is the same as expected, and there is no exception error message from the writing program, you can confirm that the data written is complete and valid.
TDengine does not support queries written in OpenTSDB's query syntax or its data-retrieval processing, but it provides equivalent support for each of OpenTSDB's queries. For details, please refer to the relevant documentation.
TDengine supports database operations through the standard JDBC 3.0 interface, and connectors for other high-level languages can also be used to query and read data to suit the application. Please refer to the user manual for detailed operations and help.
Historical Data Migration
1. Use tools to automatically migrate data
To facilitate the migration of historical data, we provide a plug-in for the data synchronization tool DataX that writes data into TDengine automatically. Note that DataX's automatic migration supports only the single-value model. For details on DataX and on using it to write data to TDengine, see its help manual at github.com/taosdata/da… .
2. Manually migrate data
If you need to use the multi-value model to write data, you must develop a tool to export data from OpenTSDB, determine which timelines can be merged into the same timeline, and then write the values that share a timestamp into the database together via SQL statements.
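The merging step can be sketched as follows, assuming the export tool produces single-value rows of (metric, timestamp, tags, value); the row contents are hypothetical.

```python
from collections import defaultdict

# Sketch: merge exported single-value rows into multi-value rows keyed by
# (tags, timestamp), so metrics sharing tags and collection times can be
# written together in one SQL statement.
def merge_timelines(rows):
    merged = defaultdict(dict)
    for metric, ts, tags, value in rows:
        merged[(tags, ts)][metric] = value
    return merged

rows = [
    ("mem_used", 1632979445, ("host=vm130",), 3.0),
    ("mem_free", 1632979445, ("host=vm130",), 5.0),
    ("mem_used", 1632979455, ("host=vm130",), 3.1),
]
for (tags, ts), values in sorted(merge_timelines(rows).items()):
    print(ts, tags, values)
```

Rows that end up with a full set of columns map onto one multi-value insert; incomplete rows must either be written with nulls or kept in the single-value model.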
Note the following issues when migrating data manually:
1) The disk must have sufficient space to hold the exported data files. To avoid running out of disk space after exporting all the data, you can adopt a partial-import mode: export the timelines belonging to one supertable first, import those files into TDengine, then proceed to the next.
2) While the system is running, if there are enough spare computing and I/O resources, you can build a multi-threaded import mechanism to maximize data migration efficiency. Given the heavy load that data parsing places on the CPU, cap the maximum number of parallel tasks to avoid overloading the whole system while importing historical data.
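A bounded-parallelism importer can be sketched like this; `import_file()` is a hypothetical stand-in for the real parse-and-write routine, and the worker count is an assumed tuning knob.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: import exported data files with a capped number of parallel
# workers so historical-data import does not overload the system.
MAX_WORKERS = 4  # tune to the system's spare CPU/IO capacity

def import_file(path):
    # Placeholder: parse the file and write its rows to the database.
    return f"imported {path}"

files = [f"export_part_{i}.csv" for i in range(8)]
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    results = list(pool.map(import_file, files))
print(results[0])  # imported export_part_0.csv
```

Raising `MAX_WORKERS` speeds up migration only until parsing saturates the CPU; beyond that it just adds contention.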
Because TDengine itself is easy to operate, there is no need to maintain indexes or convert data formats during the whole process; the steps only need to be executed in sequence.
Once the historical data has been fully imported into TDengine and the two systems have been running in parallel, query requests can be switched over to TDengine, achieving a seamless application switchover.