The second part of this video tutorial is to quickly clarify the abstract concepts in TDEngine and learn how to plan the data model in a production scenario.
Click the link for a video tutorial.
Welcome to the world of data in the Internet of Things
In a typical iot scenarios, generally speaking, there are a variety of different types of acquisition device, collect a variety of different physical quantities, the same types of collection devices, often have multiple devices distributed in different locations, the system need to all sorts of acquisition of data, calculation and analysis for the same kind of equipment, the collected data is very regular.
In this paper, we take the smart meter (the acquisition amount is current and voltage) as an example to discuss how to build database, super meter and meter in TDEngine.
Assume that each smart meter collects two quantities, current and voltage, and the data it collects are shown in the figure below.
Each record has a device ID, a timestamp, a physical quantity collected (such as current and voltage in the figure above), and a static label associated with each device (such as Location and Groupid in the figure above). Each device is triggered by the outside world, or in accordance with a set period of data acquisition. The data points collected are sequential and a data stream.
So how does TDengine abstract this IoT data?
Here, the key innovation of Tdengine needs to be mentioned — one collection point, one table. Collection points of the same type are described by a super table, namely a table structure Schema and a static label Schema. As for the above figure, the electricity meter ID is used as the name of the child table (D1001, D1002, D1003, D1004, etc.), the dynamically collected physical quantity is used as each field, and the static attribute (Location and Groupid) is used as the label of the child table. The super table is used as a template to generate sub-tables corresponding to each collection point. The super table greatly facilitates the data retrieval, query and aggregation of the same collection point.
This design has several advantages:
- It can ensure that the data of a collection point is continuously stored in blocks on the storage medium. If the data is read over a period of time, it can significantly reduce random reads and increase read and query speed by orders of magnitude.
- Because the process of data generation by different acquisition devices is completely independent, the data source of each device is unique, and there is only one writer for a table, so the lock free way can be used to write, and the writing speed can be greatly improved.
- For a data collection point, the data produced by it is sequential, so the write operation can be implemented in the way of append, further greatly improving the data write speed.
If the data of multiple devices are written into a table in the traditional way, the timing of the data of different devices arriving at the server cannot be guaranteed due to the uncontrollable network delay, the writing operation should be protected by lock, and the data of a single device is difficult to ensure that they are continuously stored together. The method of one data collection point and one table can ensure the performance of single data collection point insertion and query is optimal to the greatest extent.
Basic methods of data modeling
TDengine uses the relational data model, and needs to build a database and a table. Therefore, for a specific application scenario, we need to consider the design of library, super table and ordinary table.
CREATE DATABASE DBNAME USE DBNAME CREATE TABLE STBNAME (TS TIMESTAMP, OTHER FIELDS...) Tags (Tag fields) CREATE TABLE tbname using stbname tags INSERT INTO tbname VALUES(now, VALUES...)
Create the library
Different types of data collection points often have different data characteristics, including the frequency of data collection, the length of data retention time, the number of copies, the size of data blocks, etc. To get the most out of TDEngine in a variety of scenarios, it is recommended that tables with different data characteristics be created in different repositories, since each repository can be configured with different storage policies.
When creating a library, in addition to the SQL standard options, an application can specify a number of parameters such as retention time, number of copies, number of memory blocks, time accuracy, maximum and minimum number of records in a file block, whether to compress, and the number of days a data file can be overwritten. For example, it is suggested to create a library for tables with the same data characteristics, each of which can be configured with a different storage policy.
CREATE DATABASE power KEEP 365;
This creates a library called power, whose data is kept for 365 days. For more parameters and syntax, see:
https://www.taosdata.com/cn/documentation20/taos-sql/
After creating the library, you need to USE the SQL command USE to switch the current library over, for example:
USE power;
Change the current action library to POWER. You can also use the “library name. Table name to specify the name of the operation library, table.
Introducing Super Table
A data collection point for a table means that 10 million smart electricity meters correspond to 10 million meters. For an Internet of Things system, there are often a large number of data collection points of the same type. Working with that many tables is a huge challenge. To facilitate the operation of multiple tables of the same type, TDEngine introduces super tables.
When creating a super table, you need to provide: table name, table structure Schema, tag Schema.
CREATE TABLE meters (ts timestamp, current float, voltage int) TAGS (location binary(64), groupdId int);
The columns of a super table are divided into two parts: the dynamic part and the static part.
The dynamic part is the collected data, the first column is the time stamp (TS), the other columns are the collected physical quantity (CURRENT, VOLTAGE).
The static part refers to the static attribute of the collection point and is generally used as a label. Such as the geographic location of the collection point, device model, device group, administrator ID, etc.
Tags can be added, removed, or modified afterwards.
TDEngine supports the following data types.
In-depth understanding of super tables
Simultaneous collection of the same table: In a super table, the collection physical quantities contained must be collected at the same time, that is to say, the timestamp is the same.
For a type of equipment, there may be multiple groups of physical quantities, each group of physical quantities is not collected at the same time, so it is necessary to build a separate super table for each group of physical quantities. Therefore, for a single type of device, you may need to create multiple super tables.
If the system has N different types of devices, you need to create at least N super tables.
A system can have multiple DB libraries, and a DB library can have one or more super tables.
Create a table/child table
TDengine needs to build tables independently for each data collection point; Because of the creation of super tables (meters), they are also called child tables (D1001); Create a super table as a template, and specify the value of the label; A super table that can contain several child tables with no limit on the number of child tables.
CREATE TABLE d1001 USING meters TAGS ("Beijing.Chaoyang", 2);
D1001 is the name of the child table, meters is the name of the super table, followed by the Location tag value “Beijing.chaoyang”, and the tag value of groupID is 2. Label values need to be specified when creating a table/child table and can be changed later. It is recommended that the global unique ID of the data collection point be used as the child table name (such as the device serial number).
The child table is automatically created
In some special scenarios, when a user is writing data, he is not sure whether a child table exists or not. At this point, you can use the automatic table build syntax to create a non-existent table. If the table already exists, a new table will not be created.
INSERT INTO d1001 USING meters TAGS (" beijng.chaoyang ", 2) VALUES (now, 10.2, 219);
Table D1001 is not created yet. If you have not created a table, you will automatically create it by using a template with a value of “Beijing.chaoyang “, 2.
Multi-column model vs. single-column model
Tdengine supports both multi-column and single-column models.
- Multi-column model: As long as physical quantities are simultaneously collected at the same data collection point, these quantities can be grouped as separate columns in a super table.
- Single column model: Each physical quantity is separately tabulated. For example, current and voltage are two quantities, and two super tables are built.
We recommend: Use the multi-column model as much as possible because it is more efficient to insert and store. For scenarios where the types of physical quantities at a collection point change frequently, the single-column model can be used.
Examples of new energy vehicles
Scenario and modeling analysis
- A car company plans to track and analyze the new energy vehicles it produces and sells;
- Each vehicle is equipped with a remote acquisition terminal to collect vehicle status information: position (latitude and longitude), vehicle speed, battery temperature, battery current, ambient temperature, tire pressure;
- Background statistical analysis needs to be classified and aggregated according to models, sales regions, salespeople, battery pack capacity and motor power;
- The first four of the six volumes will be collected simultaneously and will be placed in a single super table – vehicle_main and the remaining two points will be created separately on vehicle_temp and vehicle_tire where the temperature is not at all the same as the frequency of tire pressure.
- Each vehicle has a unique code VIN, which is used with the table name prefix of the super table as the unique table name.
SQL statement example
CREATE DATABASE nev KEEP 3650; USE nev; CREATE TABLE vehicle_main (ts timestamp, longitude double, latitude double, vspeed int, btemp int, bcurrent int) TAGS (vin binary(30), model binary(20), szone binary(30), sales int, bcapacity float, mpower float); CREATE TABLE vmTS8392EGV062192009 USING vehicle_main TAGS ("TS8392EGV062192009", "GTS7180", "Beijing.haidian", "10060089", 86.0, 125.5); CREATE TABLE vehicle_temp (ts timestamp, vtemp int) TAGS (vin binary(30)); CREATE TABLE vtpTS8392EGV062192009 USING vehicle_vtemp TAGS ("TS8392EGV062192009"); CREATE TABLE vehicle_tire (ts timestamp, vpressure int) TAGS (vin binary(30)); CREATE TABLE vtrTS8392EGV062192009 USING vehicle_vtire TAGS ("TS8392EGV062192009"); SELECT ts, longtitude, longtitude, longtitude SELECT AVG(vspeed), AVG(btemp), AVG(btemp), AVG(btemp), AVG(btemp), AVG(bcurrent) FROM vehicle_main GROUP BY model
I believe that by this point, you have completely clarified the concept of library, table, super table in TDEngine, you can start to operate!
Follow the public account TDengine, reply “1203” in the background, and get the PPT corresponding to this tutorial.