Author: Chen Yu | Tao Si Data

TDengine is a high-performance time series database that provides powerful data analysis capabilities. In the first section of the TDengine website, the description reads: “Whether the data is 10 years old or 1 second old, the specified time range can be queried. Data can be aggregated on a timeline or across multiple devices.”

Today, our hero is INTERVAL, a powerful function that can be aggregated on a timeline.

INTERVAL is an important feature of TDengine that allows us to reduce the frequency of data collection — that is, reduce sampling. Take a simple example: suppose we have a device’s data for a year, and the frequency of time data is one day, so there are 365 pieces of data. Now, if we want to count by the frequency of ‘month’, the number of data becomes 12.

According to the grammar description on the official website, there are three relevant functional modules:

  1. The INTERVAL itself
  2. SLIDING
  3. INTERVAL OFFSET

For the sequential database based on the processing of sequential data, how to flexibly use time frequency to calculate and analyze data is very important. Below, we give a simple example of application scenarios and make specific explanations for the above three functional modules:

INTERVAL: queries the average temperature and pressure recorded by temperature sensor T1 every five minutes

select avg(t), avg(p) from t1 interval(5m);
Copy the code

In the simplest case, INTERVAL is responsible for specifying the time range window, and the aggregate function AVG calculates the average for that time range. You can also use a selection function such as MAX/MIN to calculate the maximum/minimum value within this time range. (In TAOS SQL, aggregate functions refer to functions such as COUNT/AVG/TWA/SUM that are used to join and then calculate from the dataset, and select functions such as MIN/MAX/FIRST/LAST/LAST_ROW that are used to filter results from the dataset.)

INTERVAL is essentially a time version of Group by, so it must be used in conjunction with the above aggregation or selection functions. The time units after INTERVAL can be a(milliseconds), s(seconds), m(minutes), h(hours), D (natural days), W (weeks), N (natural months), and Y (natural years). (Natural weeks are not yet supported, interval(1w) is currently equivalent to interval(7d)).

SLIDING: statistical averages similar to the stock market

Select a social-media (T) from social-media (5d) sliding(1d).Copy the code

The average price of a stock on the stock market is calculated over a period of five days. When these values are combined, they are known as the five-day average.

SLIDING plays a key role in the above calculation. We already know that the value of INTERVAL is responsible for specifying the time window for each query to be executed. SLIDING represents the specified time for the window to slide forward. As shown below:

T0s, T1S and T2S are the starting points of the three time Windows respectively, while T0E, T1E and T2E are the end points of the three time Windows respectively. When we do not specify the VALUE of SLIDING in the query, it defaults to INTERVAL VALUE. Select avg(t), avg(p) from t1 interval(5m) as select AVg (t), avg(p) from T1 interval(5m).

3. INTERVAL OFFSET: Collects the total data volume of a device in a month in other time zones (three time zones to the west)

select sum(t) from t1 interval(1n,3h) ;
Copy the code

The current time zone of the server is pushed three time zones to the west to collect the monthly total data of the device. OFFSET 3h indicates that the OFFSET value is 3h, which means that this SQL is applicable to the statistics of natural monthly data in different time zones.

INTERVAL offsets are a little more complicated. To understand this, you need to learn more about INTERVAL and time zones.

If INTERVAL is a natural day (D), month (n), or year (y), then it is window shard starting at 0 in the time zone where the TDengine server is located, as shown below: all timestamp columns start at 0 regardless of the current time zone.

However, if the shard window is shard in hours and seconds or less, the INTERVAL value is aligned with the shard window starting at 00:00:00.000 UTC-0.

As shown in the figure below, the timestamp columns start at 8 o ‘clock because the TDengine client’s time zone is UTC-8, and utC-8 is 8 o ‘clock east when the standard time is 0 o ‘clock. (Note: ON POSIX, UTC-8 stands for EAST 8.

This scenario is the meaning of this passage in the official documentation:

Time zones for timestamps in TDengine are always handled by the client, not the server. In particular, the client converts the time stamp in the SQL statement to a Unix time stamp in THE UTC-0 time zone, which is then written and queried by the server. When reading data, the server uses the raw data provided by THE UTC-0 time zone. After receiving the data, the client converts the timestamp to the time zone required by the local system for display.

In summary, TDengine stores time data in the form of timestamps. Timestamps themselves are time-zone independent, but because TDengine queries the data and displays it to users in different parts of the world, it is time zone independent. In INTERVAL, time Windows start from 0 o ‘clock in the time zone where the TDengine server resides if the date is a natural day, month, or year. If Windows are shard in h (hours) and below, the start time for window shard is 0 o ‘clock in the UTC time zone.

Regardless of the time zone client, the final computed column results are the same, but there are some deviations in the timestamp column due to different time zones. Therefore, we strongly recommend that, except in special cases, the time zones of the client and server servers be the same so that the queries on both sides are displayed in a consistent manner, thereby reducing unnecessary misunderstandings.

For example, the left client count(*) looks like 1, 4, 2, but it’s actually 4 and 3. This is the visual difference caused by the different time zones. At this point, it’s all about the server side: four data points on June 30, three data points on July 1 – what-you-see-what-you-get query results on the server where the TDengine server is located.

Now that we know INTERVAL’s different logic for dividing natural days, months, years, minutes, and seconds, we can finally say OFFSET.

**OFFSET is actually the OFFSET of INTERVAL. ** By adjusting the OFFSET value, you can freely choose the starting point of the time window on the timeline. So as to complete the data analysis statistics of different time zones.

For example, the current time zone of the server is GMT + 8, but we want to know the total monthly data amount for device T1 in GMT + 5. You can write:

select sum(t) from t1 interval(1n,3h) ;
Copy the code

However, OFFSET currently does not support negative values, so the start of the time window can only be OFFSET from east to west of the time zone. Therefore, if you want to use the time zone offset function to count 24 time zones, you can temporarily set the time zone of the server where the server resides to UTC-12.

As an internationalized product, we will continue to improve the functionality of TDengine later. For more details, check out the source code on GitHub.