Last week, at a joint meetup, TAOS Data and EMQ released an Industrial Internet integration solution. Built on TDengine and EMQ X, it provides a lightweight edge-computing Industrial Internet platform that integrates industrial data collection, aggregation, cleaning, storage, analysis, and visualization. TDengine now fully supports ARM32 and ARM64 processors, so why is TDengine a more efficient storage option for edge-side data? How does it compare with SQLite? At the meetup, TAOS Data co-founder Jiang Yi shared the technology behind it.
From the Internet to the mobile Internet and now to the Internet of Things, computers, mobile terminals, wearable devices, cars, even household lights and all kinds of factory equipment have been connected to the network. Typically, these devices continuously collect real-time status data, which is then gathered into a computing platform in the cloud. This is the general idea behind cloud computing for the Internet of Things.
The Internet of Things technology chain has four layers: sensors collect the device's status data -> a communication module sends the data to the cloud -> the cloud stores, queries, and computes on it -> analysis and application systems consume the results.
In cloud computing, however, all data must be transferred to the cloud for centralized storage, archiving, and analysis. An edge node may be a gateway, or it may be the terminal device itself. If it has no computing capacity of its own, it must send the collected data to the cloud, rely on cloud resources for the complex calculation, and wait for the resulting decision to be sent back over the network. How much work the terminal can do in this process clearly depends on the network: if the network is interrupted or fails, the terminal cannot interact with the cloud and part of its work is severely affected. This cloud-centric control model therefore places very high demands on edge-to-cloud communication, and in practice it often requires costly high-speed networks. In addition, as data volume keeps growing, so does the cost of storage and computing in the cloud.
A good way to solve this problem is edge computing: part of the storage and computing capacity is pushed down to the edge side (that is, the device side), so that terminal devices can store data, compute, make decisions, and run applications on their own. The edge becomes smarter, less dependent on the cloud, more responsive, and no longer tied to network availability.
The advantages of edge computing can be summarized as follows.
[Figure: summary of the advantages of edge computing]
But what difficulty in edge computing remains unsolved? The edge side usually consists of small intelligent terminals deployed in large numbers. For cost reasons, their hardware resources, such as memory and CPU, are very limited. The challenge is to achieve the most efficient data storage, analysis, and computation within these limited resources, which makes the choice of database on the edge particularly important. The data collected by edge devices has distinct characteristics: it is generally a structured, timestamped time-series data stream. The requirements that edge computing places on a database are therefore:
- Ultra-high read/write performance
- Low hardware overhead
- General-purpose interfaces that suit the various computing needs on the edge
- Real-time data caching and stream computing capabilities
- Persistent storage and efficient compression of historical data
- The ability to query historical data and aggregate statistics by time window
- Edge-cloud collaboration
TDengine: a big data engine better suited to the edge
A time-series database is the best choice for edge-side data storage. However, time-series databases such as OpenTSDB (built on HBase) and InfluxDB are too heavy for the edge, and their hardware resource overhead is too high. TDengine is an extremely lightweight open-source time-series database whose entire installation package is just over 2MB. At its core it is a high-performance distributed time-series database; in addition, it provides message queuing, caching, stream computing, data subscription, and other functions, offering an all-in-one solution for storing structured time-series data.
The TDengine community edition currently supports ARM32 and ARM64 processors. It runs smoothly on mainstream edge hardware such as the Raspberry Pi and provides real-time data caching, historical data queries, time-window aggregation, and other capabilities. Although a distributed cluster is rarely needed on the edge, building one from a few Raspberry Pis, boxes, or gateways is also possible.
The TDengine ARM version also supports a wide range of interfaces and is virtually indistinguishable from the regular cluster version. It also ships with the taos shell client, so developers can easily check TDengine's running status while debugging.
TDengine's approach to edge-cloud collaboration
Resources on the edge side are limited, and so is the total amount of data that can be stored there. Data therefore still needs to be backed up to, and coordinated with, the cloud. There are many approaches to edge-cloud collaboration; here are some of ours.
Here is an example. A factory has many gateways on the edge side. We can install an edge version of TDengine on each gateway, so that TDengine serves as the edge-side storage engine and the data collected by the gateway is stored persistently. Depending on the collection frequency and the compression achieved, the edge side can keep raw data for a chosen period (for example, one to six months) within the available storage. For integer or floating-point data, TDengine can compress it to about 10% of its original size, depending on the data type. If the values fluctuate widely and randomly, the compression ratio suffers, but overall it remains around 10%. So even with a 1GB or 2GB SD card in the gateway, we can store on the order of 10GB of raw data. This is sufficient for real-time analysis on the edge side.
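The capacity estimate above is simple arithmetic, and a short sketch makes it easy to redo for other card sizes and sampling rates. The function names, the 16-byte row size, and the 100 rows per second rate below are illustrative assumptions, not figures from the talk; the sketch also ignores the space taken by the operating system and by TDengine itself.

```python
def raw_capacity_gb(card_gb: float, compression_ratio: float = 0.10) -> float:
    """Raw data volume (GB) that fits on a card when stored data shrinks
    to `compression_ratio` of its original size (0.10 means about 10%)."""
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    return card_gb / compression_ratio


def retention_days(card_gb: float, bytes_per_row: int, rows_per_second: float,
                   compression_ratio: float = 0.10) -> float:
    """Days of raw history that fit on the card at a steady ingest rate."""
    raw_bytes = raw_capacity_gb(card_gb, compression_ratio) * 1024 ** 3
    bytes_per_day = bytes_per_row * rows_per_second * 86_400
    return raw_bytes / bytes_per_day


if __name__ == "__main__":
    # A 1GB card at roughly 10% compression holds about 10GB of raw data.
    print(raw_capacity_gb(1.0))
    # Assumed workload: 16-byte rows (8-byte timestamp + 8-byte value)
    # arriving at 100 rows per second.
    print(retention_days(1.0, bytes_per_row=16, rows_per_second=100))
```

Under these assumed numbers a 1GB card holds between two and three months of raw data, which falls inside the one-to-six-month range mentioned above.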
However, if a longer history must be kept or further analysis such as big data mining is required, the data needs to be synchronized to a cloud data center. The edge version of TDengine can be accessed directly by TDengine clients in the cloud (when the network is available), so synchronizing data from the edge to the cloud is very simple. A cloud application can use TDengine's subscription module to pull the latest data from the edge gateway in real time and then write the received incremental data into the cloud-side TDengine cluster for historical archiving. Because the subscription is essentially a timed query, TDengine lets users add filter conditions and synchronize edge data selectively (for example, pulling only records that exceed a certain threshold) instead of reporting all historical data to the cloud.
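The "timed query with a filter" idea can be sketched without any database at all. The class below is a hypothetical in-memory model, not TDengine's subscription API: the edge store is a plain list of `(timestamp, value)` rows, and the names `IncrementalPuller`, `keep`, and `poll` are invented for illustration. Each poll returns only rows that are newer than the last synchronized timestamp and that pass the filter.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, float]  # (timestamp in ms, measured value)


class IncrementalPuller:
    """Hypothetical cloud-side poller that models a timed, filtered query."""

    def __init__(self, keep: Callable[[Row], bool] = lambda row: True):
        self.keep = keep            # filter condition, e.g. a threshold check
        self.last_ts = -1           # newest timestamp already archived
        self.archive: List[Row] = []  # stands in for the cloud-side cluster

    def poll(self, edge_rows: List[Row]) -> List[Row]:
        """Fetch rows newer than last_ts that pass the filter, then archive them."""
        fresh = [r for r in edge_rows if r[0] > self.last_ts and self.keep(r)]
        if fresh:
            self.last_ts = max(r[0] for r in fresh)
            self.archive.extend(fresh)
        return fresh


if __name__ == "__main__":
    edge = [(1000, 20.5), (2000, 81.2), (3000, 79.9)]
    puller = IncrementalPuller(keep=lambda r: r[1] > 80.0)
    print(puller.poll(edge))   # [(2000, 81.2)]
    edge.append((4000, 95.0))
    print(puller.poll(edge))   # [(4000, 95.0)]
```

Only the two readings above the threshold ever reach the archive, which is the selective synchronization described above: the cloud receives the increments it asked for rather than the full edge history.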
Building on TDengine's edge storage advantages and the overall idea of edge-cloud collaboration, TAOS Data and EMQ jointly developed an edge solution. Put simply, EMQ X Neuron, EMQ X Edge, EMQ X Kuiper, and TDengine are all deployed on the edge-side gateway. Neuron parses the device protocols, converts the collected streaming data into MQTT messages, and publishes them to Edge (the edge-side MQTT broker). Kuiper then writes the messages into the TDengine instance deployed on the edge. Applications running on the edge can fetch and process data from TDengine for real-time display and alarms. EMQ's Edge Manager, which also runs on the edge, provides an administrative console that makes it easy to configure the software and manage the other three components. The detailed configuration steps for this solution are covered in a separate guide. In this solution, the edge-cloud collaboration work is effectively handed over to EMQ.
However, some users already run a TDengine cluster in the cloud and want the cloud-side TDengine client to access TDengine on the edge directly as new industrial devices come online. This can also be done through TDengine's data subscription module: the cloud application calls the subscription module to create a series of subscription tasks that pull the latest incremental data from the edge-side TDengine in real time. This approach leaves the collaboration work to TDengine, provided, of course, that the network is available.
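In the solution described above, Kuiper is the component that moves MQTT messages into TDengine. The sketch below only illustrates the kind of transformation involved in that step and is not Kuiper's implementation. The JSON payload layout, the column names, and the table name `meter_01` are all hypothetical, and the generated statement follows the general `INSERT INTO <table> VALUES (...)` form; check TDengine's SQL documentation for the exact syntax your version accepts.

```python
import json
import math
import re

# Allow only plain identifiers as table names, since the name is
# interpolated into the SQL text.
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def payload_to_insert(table: str, payload: str,
                      columns=("temperature", "humidity")) -> str:
    """Turn one JSON message (hypothetical schema) into an INSERT statement."""
    if not _TABLE_NAME.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    msg = json.loads(payload)
    ts = int(msg["ts"])                       # epoch timestamp in milliseconds
    values = [float(msg[c]) for c in columns]  # coerce every reading to float
    if not all(math.isfinite(v) for v in values):
        raise ValueError("non-finite reading in payload")
    rendered = ", ".join(repr(v) for v in values)
    return f"INSERT INTO {table} VALUES ({ts}, {rendered})"


if __name__ == "__main__":
    message = '{"ts": 1605600000000, "temperature": 21.5, "humidity": 40}'
    print(payload_to_insert("meter_01", message))
    # INSERT INTO meter_01 VALUES (1605600000000, 21.5, 40.0)
```

Converting every field to `int` or `float` before formatting, and validating the table name, keeps untrusted message content out of the SQL text.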
Compiling the TDengine edge version on Raspberry Pi
Here are the practical steps to compile, install, and run TDengine on a Raspberry Pi.
Preparing the environment
1. Flash the operating system
Flash the operating system onto the SD card. TDengine supports mainstream operating systems such as Ubuntu 16.04, CentOS 7.0, and later versions.
2. Set up the network
Configure the network on the Raspberry Pi: set a static IP address and hostname for the board, and connect it to the network.
3. Download and compile TDengine
Clone the TDengine source code from www.github.com/taosdata/TDengine onto the Raspberry Pi, then compile and run it.
Build process
    # clone the source code
    $ git clone --recursive https://github.com/taosdata/TDengine.git

    # check out the latest version
    $ cd TDengine/
    $ git checkout ver-2.0.7.0

    # compile and install
    $ mkdir build && cd build
    $ cmake .. -DCPUTYPE=aarch64 -DVERNUMBER=2.0.7.0 -DVERCOMPATIBLE=2.0.0.0
    $ make && make install

    # start taosd
    $ systemctl start taosd
    $ taosdemo
After compilation and installation, the taosdemo program is available so that you can experience TDengine's speed firsthand. Use taosdemo to test TDengine's write and query performance.
A simple comparison between TDengine and SQLite
No discussion of data storage on edge and embedded devices is complete without SQLite. SQLite is an ultra-lightweight database that needs no background server process, works out of the box, and is the most widely deployed database in the world. On its website, SQLite even compares itself to fopen() rather than to a database server: "Think of SQLite not as a replacement for Oracle but as a replacement for fopen()". SQLite is a compact library. It nevertheless provides the standard relational database APIs and even supports transactions, so the industry often uses it as an embedded relational database.
For comparison, SQLite on Linux is 1.9MB and TDengine is 2.7MB; both are extremely lightweight. Because TDengine is designed specifically for structured time-series data, it does not support transactions or complex relations between tables. Instead, it provides time-based indexing of time-series data, real-time stream computing, columnar storage with a better compression ratio, down-sampling and aggregation by time window, data retention policies, and so on. In this respect, TDengine is closer than SQLite to the time-series processing needs of edge-side production environments. The TDengine edge version also connects seamlessly with TDengine in the cloud: if the network is down, TDengine caches data automatically and transmits it once the connection is restored, which enables edge-cloud collaboration.
[Figure: summary of the differences between TDengine and SQLite]
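To make "down-sampling by time window" concrete, here is a minimal sketch of how it can be expressed in a general-purpose relational engine, using Python's built-in `sqlite3` module. The table name `readings`, its two columns, and the helper `downsample_avg` are assumptions made for this example. In SQLite the window has to be derived by hand with integer arithmetic on the timestamp, whereas a time-series database typically offers a dedicated clause for it (TDengine's documentation describes an `INTERVAL` clause; consult it for the exact syntax).

```python
import sqlite3
from typing import List, Tuple


def downsample_avg(rows: List[Tuple[int, float]],
                   window_ms: int) -> List[Tuple[int, float]]:
    """Average `value` per fixed time window; timestamps are in milliseconds."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (ts INTEGER PRIMARY KEY, value REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        # Integer division truncates each timestamp to the start of its window.
        cur = conn.execute(
            "SELECT (ts / ?) * ? AS window_start, AVG(value) "
            "FROM readings GROUP BY window_start ORDER BY window_start",
            (window_ms, window_ms),
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    samples = [(0, 10.0), (4000, 20.0), (10000, 30.0), (15000, 50.0)]
    print(downsample_avg(samples, window_ms=10_000))
    # [(0, 15.0), (10000, 40.0)]
```

The query works, but the windowing logic lives in the application's SQL rather than in the storage engine, which is one reason a purpose-built time-series store is a closer fit for this workload.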
As a representative of the new generation of time-series databases, TDengine has many advantages, and challenging the veteran SQLite for edge-side storage may look a little brash for a newcomer. It is important to realize, however, that TDengine and SQLite focus on different problems. There is no need to choose one over the other; they can be combined flexibly according to business needs, with TDengine handling time-series data and SQLite handling relational data, to better achieve data autonomy on the edge.
Follow the "TDengine" official account and reply "1117" to obtain the complete slides.