Abstract: Ali Cloud recently announced the official commercial launch of its High-performance Time Series Database (HiTSDB).


Let me first say a little about time series data. In simple terms, it is a series of values distributed over time. The key word is value. We tend to think of time series data as a record of what happened at what time, but in the time series field the definition is all about values. In other words, a single event with a timestamp is not time series data. For example, "I went upstairs to eat at 8:30 in the morning" is a log entry and does not constitute time series data. But if 50 people are eating in a restaurant at 8:30 in the morning, the value 50, together with the identity of the restaurant and the point in time, constitutes a time series data point.


Single-value and multi-value modeling


There are two common approaches to modeling time series data: the single-value model and the multi-value model. The two model different things. The multi-value model is built around the data source: each row of data corresponds to one data source at one point in time, and all of its measured metrics (three, for example) are stored as columns of that same row.
The single-value model is built around the individual time series. Each time series has exactly one value at each point in time, hence "single value", so each row corresponds to one time series rather than to one data source. Put differently, in the multi-value model a data source generates one row of data at each point in time, while in the single-value model each metric on a data source generates its own row.
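The difference between the two models can be sketched in a few lines of Python. This is only an illustration: the field names (`tags`, `fields`, `metric`), the device name, and the readings are hypothetical and are not HiTSDB's actual schema.

```python
from datetime import datetime

# Hypothetical reading from one data source (a sensor) at one point in time.
timestamp = datetime(2018, 1, 1, 8, 30, 0)
source = {"device": "sensor-01", "city": "hangzhou"}

# Multi-value model: one row per data source per timestamp,
# with every metric stored as a column of that row.
multi_value_row = {
    "timestamp": timestamp,
    "tags": source,
    "fields": {"temperature": 21.5, "humidity": 0.43, "pressure": 1012.0},
}


def to_single_value_rows(row):
    """Split one multi-value row into one row per metric (one per time series)."""
    return [
        {
            "metric": name,
            "timestamp": row["timestamp"],
            "tags": row["tags"],
            "value": value,
        }
        for name, value in row["fields"].items()
    ]


# Single-value model: the same reading becomes three rows, one per time series.
single_value_rows = to_single_value_rows(multi_value_row)
for r in single_value_rows:
    print(r["metric"], r["value"])
```

One multi-value row with three metrics becomes three single-value rows, each belonging to a different time series.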


What are interpolation and precision reduction?


As mentioned above, a time series is distributed along a timeline. Once the data source and the measured metric are fixed, the series extends forward along the time axis. In a typical scenario the samples are taken at a fixed time interval, and processing them involves interpolation and precision reduction. For example, if a point in the middle is lost, the simple fix is to insert a value in its place. The most common method is linear interpolation: draw a straight line between the neighboring points and take the value on that line at the missing time.
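Linear interpolation as described above can be sketched as follows. This is a minimal illustration, not HiTSDB's implementation; it assumes integer timestamps, a sorted series, and a fixed sampling interval `step`, and the function names are made up for the example.

```python
def linear_interpolate(t0, v0, t1, v1, t):
    """Estimate the value at time t on the straight line through (t0, v0) and (t1, v1)."""
    if t1 == t0:
        raise ValueError("t0 and t1 must differ")
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def fill_gaps(points, step):
    """Fill missing samples in a sorted list of (timestamp, value) pairs spaced `step` apart."""
    filled = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        filled.append((t0, v0))
        t = t0 + step
        # Insert one interpolated point for every missing sample between t0 and t1.
        while t < t1:
            filled.append((t, linear_interpolate(t0, v0, t1, v1, t)))
            t += step
    if points:
        filled.append(points[-1])
    return filled


series = [(0, 10.0), (10, 12.0), (30, 16.0)]  # the sample at t=20 is missing
filled = fill_gaps(series, step=10)
print(filled)
```

The missing sample at `t=20` is filled with the value halfway between its two neighbors.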
The other operation is precision reduction (downsampling). For example, if a time series is sampled every second and we want to display a full year of it, we need to reduce the time precision to one day for easy viewing. We can take the maximum, minimum, or average of each day, just as a day's temperature is summarized by its highest, lowest, and average temperature. Using an algorithm to convert time series data into a lower-precision series that is easier to observe and understand is something traditional databases do not offer.
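A rough sketch of precision reduction, assuming timestamps are integer seconds and each bucket is one day wide. The function name and the sample readings are hypothetical.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400


def downsample(points, bucket_seconds=SECONDS_PER_DAY):
    """Group (timestamp, value) pairs into fixed-width buckets and summarize each one."""
    buckets = defaultdict(list)
    for ts, value in points:
        # The bucket key is the timestamp rounded down to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return {
        start: {"min": min(vals), "max": max(vals), "avg": sum(vals) / len(vals)}
        for start, vals in sorted(buckets.items())
    }


# Hypothetical temperature readings spanning two days.
readings = [(0, 18.0), (43200, 26.0), (86399, 20.0), (86400, 17.0), (100000, 23.0)]
daily = downsample(readings)
for day_start, summary in daily.items():
    print(day_start, summary)
```

Five raw readings collapse into two daily summaries, each carrying the minimum, maximum, and average for its day.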
Another operation is data aggregation, which is typical for device data, where many devices each produce timelines of metric data. Time series aggregation works on whole timelines rather than on individual points. Aggregation in the usual sense gathers a large number of individual data points. In time series processing, the points are first connected into lines, which are the time series described above: each data source generates one timeline for each measured value. If the series share a dimension, they can be aligned along that dimension and processed together.


Take a smart campus as an example. A business system that needs to show the power consumption of a lamp in a building has to query that lamp's power consumption data from the database and display it. If some data points are missing because of a failure, they have to be approximated by a specific algorithm. This process of computing the missing data is called interpolation. To check the lamp's power consumption trend over a year, it is usually enough to compute the power consumption of each day rather than output every sample that was collected. This process of converting the original precision into the precision the business requires is called “precision reduction”. To obtain the overall power consumption trend of a floor or a building, the power consumption data of all lamps within the statistical range has to be combined. This kind of statistical process is called aggregation.
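The aggregation step for the lamp example can be sketched like this. It is a simplified illustration with hypothetical lamp names and values, and it assumes the individual series have already been aligned to the same timestamps (for instance by interpolation).

```python
def aggregate_sum(series_by_source):
    """Sum several time series timestamp by timestamp.

    series_by_source maps a source name to a {timestamp: value} dict.
    Assumes the series are already aligned to the same timestamps.
    """
    total = {}
    for series in series_by_source.values():
        for ts, value in series.items():
            total[ts] = total.get(ts, 0.0) + value
    return dict(sorted(total.items()))


# Hypothetical power readings (in watts) for three lamps on one floor.
lamps = {
    "lamp-1": {0: 40.0, 60: 42.0},
    "lamp-2": {0: 35.0, 60: 30.0},
    "lamp-3": {0: 25.0, 60: 28.0},
}
floor_power = aggregate_sum(lamps)
print(floor_power)
```

Three per-lamp timelines are merged into a single timeline for the floor, which is the "whole timeline" style of aggregation the text describes.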


Precision reduction on time series data is done along the dimension of the time series. In a relational database, the whole time series first has to be pulled out and its gaps interpolated, because SQL operates on a point-by-point basis. So to reduce precision, the application has to query the entire time series, insert the interpolated values, and only then aggregate. The traffic between the service and the SQL server becomes very large: SQL acts merely as a data channel that pulls all the values out so that they can be processed elsewhere, and query performance is very slow. In addition, every calculation has to pull the data again, which makes this approach hundreds of times slower than HiTSDB. HiTSDB's spatial aggregation (its Aggregator) also supports comprehensive ad hoc queries. HiTSDB improves the efficiency of time series retrieval by introducing inverted indexes and data sharding, so overall computing performance is significantly improved.
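To give a feel for why an inverted index speeds up time series retrieval, here is a generic sketch. The article does not describe HiTSDB's actual index layout, so the `TagIndex` class, the tag names, and the series IDs below are all hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index mapping each (tag key, tag value) pair to series IDs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, series_id, tags):
        """Register a time series under every tag pair it carries."""
        for key, value in tags.items():
            self._postings[(key, value)].add(series_id)

    def find(self, **tags):
        """Return the IDs of all series that match every given tag pair."""
        sets = [self._postings.get(pair, set()) for pair in tags.items()]
        return set.intersection(*sets) if sets else set()


index = TagIndex()
index.add("s1", {"building": "A", "floor": "3", "device": "lamp"})
index.add("s2", {"building": "A", "floor": "4", "device": "lamp"})
index.add("s3", {"building": "B", "floor": "3", "device": "lamp"})

matches = index.find(building="A", device="lamp")
print(sorted(matches))
```

A query looks up a few posting sets and intersects them instead of scanning every series, which is the general idea behind using an inverted index for ad hoc queries.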


HiTSDB high compression technology reduces storage costs by 90%


The Internet of Things is the most typical scenario in which time series data is generated and used. These scenarios share some characteristics, above all a very large volume of data, as with the temperature sensor of a smart device. First, a time series generates a large amount of data continuously. What does continuous generation mean? Time series are usually sampled at a regular interval. A day has 86,400 seconds, so if a sensor produces one data point every second, a single instrument generates 86,400 data points in 24 hours. If every county in the country had one sampling point, that would be hundreds of millions of data points a day. In practice, one temperature sensor per county is not enough for meteorological sampling. There may be a sensor for every street or even every residential community, so the data adds up to a staggering amount.
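The arithmetic above is easy to check. In the sketch below, the sensor count of 3,000 is a hypothetical round number chosen only to illustrate the scale. It is not a figure from the article.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def points_per_day(sensor_count, samples_per_second=1):
    """Data points produced in one day by sensors sampling at a fixed rate."""
    return sensor_count * samples_per_second * SECONDS_PER_DAY


one_sensor = points_per_day(1)
# Hypothetical sensor count, used only to show the order of magnitude.
many_sensors = points_per_day(3000)
print(one_sensor, many_sensors)
```

With a few thousand sensors sampling once per second, the daily total already reaches the hundreds of millions of points.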


HiTSDB's high-compression technology improves the compression of raw time series data by about 10 times compared with OpenTSDB. A raw time series data point is generally 200-300 bytes. A single data point consumes about 20 bytes in OpenTSDB and about 2 bytes in HiTSDB, so HiTSDB can save about 90% of the database storage cost.
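The saving follows directly from the per-point sizes quoted above. The snippet below only reproduces that arithmetic with the article's approximate figures, and the function name is made up for the example.

```python
OPENTSDB_BYTES_PER_POINT = 20  # approximate figure quoted in the article
HITSDB_BYTES_PER_POINT = 2     # approximate figure quoted in the article


def storage_saving(baseline_bytes, compressed_bytes):
    """Fraction of storage saved relative to the baseline."""
    return 1 - compressed_bytes / baseline_bytes


saving = storage_saving(OPENTSDB_BYTES_PER_POINT, HITSDB_BYTES_PER_POINT)
ratio = OPENTSDB_BYTES_PER_POINT / HITSDB_BYTES_PER_POINT
print(f"{ratio:.0f}x smaller, {saving:.0%} saved")
```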


For Internet of Things platforms, enterprises can use the product capabilities of HiTSDB and Ali Cloud to build an Internet of Things platform on the cloud based on the following architecture.
In addition to using efficient time series data services on the cloud, enterprises can combine HiTSDB with Ali Cloud's big data solutions, the industrial brain and the urban brain, to realize intelligent manufacturing and smart cities. The “edge + center” solution of HiTSDB meets the requirements of the industrial Internet of Things (IoT), especially in the power and energy industry: local storage and analysis of data at the edge, level-by-level data reporting, stable reporting over unstable networks, and global monitoring and analysis of device data at the center. It also opens up the data channel to the intelligent brain.
During the commercial launch of HiTSDB, a 15% discount is offered on the official website. See the official HiTSDB website for more details.

