Part 1 – Background
Redis is a flexible, high-performance, in-memory key-value data structure store that can be used as a database, cache, and message queue. Compared with other key-value cache products, Redis has the following features:
- Redis supports data persistence. Data in memory can be saved to disk, and can be reloaded into memory for use upon restart.
- Redis supports the storage of data structures such as strings, hashes, lists, sets, and sorted sets.
Time series data is a sequence of data points indexed by time. It has no strict relational model: each record can be expressed as a key-value pair, so it does not need to be stored in a relational database. In practice, time series data is usually written continuously and with high concurrency. To handle this kind of workload, Redis offers two schemes for saving time series data, one built on its native data structures and one on an extension module:
1. Save time series data in the Hash and Sorted Set data types;
2. Save time series data with the RedisTimeSeries module.
1. Save time series data based on Hash
The advantage of a Hash is fast single-key lookup, which meets the need to query a time series at a single point in time. A Redis Hash stores its value internally as a hash map and exposes commands for accessing the members of that map directly. Here the timestamp is used as the field (key) of the Hash and the device status value as its value. Data can therefore be modified and read directly through the field, without storing data repeatedly and without the problems of serialization and concurrent-modification control.
However, a Hash cannot serve range queries. Although samples are inserted in chronological order, the underlying structure of the Hash type is a hash table, which maintains no ordered index. To run a range query, the client has to scan all the data in the Hash, sort it, and only then pick out the samples in the requested range, so query efficiency is low.
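The trade-off can be sketched in a few lines of Python. This is only an illustration: a plain dict stands in for a Redis Hash, and the function names and sample readings are hypothetical. Against a real server, the three operations would roughly correspond to HSET, HGET, and HGETALL.

```python
# A plain dict stands in for a Redis Hash: field = timestamp, value = reading.
# All names and readings here are illustrative, not part of any Redis API.
device_hash = {}

def hash_put(ts, value):
    """Store one reading, in the spirit of HSET key <ts> <value>."""
    device_hash[str(ts)] = str(value)

def hash_point_query(ts):
    """Single-key lookup, like HGET: one hash-table probe."""
    return device_hash.get(str(ts))

def hash_range_query(start, end):
    """No ordered index exists, so read every field (like HGETALL),
    then filter and sort on the client."""
    hits = [(int(t), float(v)) for t, v in device_hash.items()
            if start <= int(t) <= end]
    return sorted(hits)

for ts, temp in [(1000, 21.5), (1010, 21.7), (1020, 22.0), (1030, 22.4)]:
    hash_put(ts, temp)

print(hash_point_query(1010))
print(hash_range_query(1005, 1025))
```

The point lookup touches one entry, while the range query walks the whole structure no matter how narrow the range is.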
2. Save time series data based on Sorted Set
The advantage of a Sorted Set is that it supports queries by timestamp range, because its elements are kept sorted by their score (weight). For time series data, the timestamp is used as the score, and the member combines the timestamp with the measurement recorded at that time, for example <timestamp>:<measurement value>. A Redis Sorted Set uses a hash map together with a skip list to keep the data ordered. The skip list gives efficient queries and is relatively simple to implement.
However, a Sorted Set supports only range queries and cannot aggregate time series data directly. The data in the time range has to be fetched back to the client, which then performs the aggregation itself. This works, but it carries a risk: large amounts of data are frequently transferred between the Redis instance and the client, competing with other commands for network resources and slowing them down. In addition, a Sorted Set is not a memory-efficient data structure, and insertion has O(log(N)) time complexity, so the larger the set, the longer each write takes.
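The following Python sketch illustrates this pattern. It is only a stand-in: a list kept in score order replaces the real hash map and skip list, and the function names merely echo the Redis commands ZADD and ZRANGEBYSCORE. The sample values are hypothetical.

```python
import bisect

# A list of (score, member) pairs kept in score order stands in for a
# Redis Sorted Set. bisect is used only to keep the sketch short.
series = []

def zadd(ts, value):
    """Like ZADD key <ts> "<ts>:<value>": the timestamp is the score."""
    bisect.insort(series, (ts, f"{ts}:{value}"))

def zrangebyscore(start, end):
    """Like ZRANGEBYSCORE key start end: members in score order."""
    lo = bisect.bisect_left(series, (start, ""))
    hi = bisect.bisect_right(series, (end, "\uffff"))
    return [member for _, member in series[lo:hi]]

def client_side_avg(start, end):
    """The aggregation runs on the client, after every member in the
    range has been transferred from the server."""
    values = [float(m.split(":", 1)[1]) for m in zrangebyscore(start, end)]
    return sum(values) / len(values) if values else None

# Samples may arrive out of order; the structure keeps them sorted.
for ts, temp in [(1000, 20.0), (1020, 24.0), (1010, 22.0)]:
    zadd(ts, temp)

print(zrangebyscore(1005, 1020))
print(client_side_avg(1000, 1020))
```

The range lookup itself is cheap, but every sample in the range still has to cross the network before the average can be computed.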
Overall, the strategy based on Hash and Sorted Set has two main shortcomings. First, aggregation requires reading the data into the client, so the data transmission overhead is high when a large amount of data is aggregated. Second, all the data is saved in two data types, so the memory overhead is high.
3. Save time series data based on RedisTimeSeries
As an extension module of Redis, RedisTimeSeries addresses the high memory and data transmission overhead of saving time series data in Hash and Sorted Set. It provides a data type and access interface designed specifically for time series data, and it supports aggregating the data in a time range directly on the Redis server. It stores time series samples in fixed-size chunks of memory and uses the same Radix Tree implementation as Redis Streams for indexing. The underlying data structure is a linked list, and range queries have O(N) complexity. This strategy has the following characteristics:
- High-volume inserts and low-latency reads;
- Queries by start time and end time;
- Aggregated queries (min, max, avg, sum, range, count, first, and last) over time buckets of any size, with configurable retention time;
- Downsampling/compaction: automatically updated aggregated time series;
- Secondary index: each time series has labels, which allows querying by label.
Part 2 – RedisTimeSeries Storage structure
RedisTimeSeries stores all time series data in chunks. The chunks are linked in a doubly linked list, and each chunk holds two related arrays (one for timestamps and one for sample values). Each chunk has a predefined size, and when a chunk fills up, subsequent data is automatically stored in the next chunk. The chunk size can be set with the CHUNK_SIZE parameter (it must be a multiple of 8; default: 4096).
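The chunk layout can be pictured with a simplified Python sketch. It is not the module's actual implementation: the class names are hypothetical, a Python list replaces the linked list, capacity is counted in samples rather than bytes, and samples are assumed to arrive in timestamp order.

```python
class Chunk:
    """Fixed-capacity block holding two parallel arrays."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.timestamps = []   # one array for timestamps
        self.values = []       # one array for sample values

    def is_full(self):
        return len(self.timestamps) >= self.capacity


class ChunkedSeries:
    """Append-only series; a new chunk is opened when the tail is full."""

    def __init__(self, samples_per_chunk=4):
        self.samples_per_chunk = samples_per_chunk
        self.chunks = [Chunk(samples_per_chunk)]

    def add(self, ts, value):
        if self.chunks[-1].is_full():
            self.chunks.append(Chunk(self.samples_per_chunk))
        tail = self.chunks[-1]
        tail.timestamps.append(ts)
        tail.values.append(value)

    def query_range(self, start, end):
        """Walk the chunks in order, skipping those outside the range."""
        out = []
        for chunk in self.chunks:
            if not chunk.timestamps or chunk.timestamps[-1] < start:
                continue
            if chunk.timestamps[0] > end:
                break
            out.extend((t, v) for t, v in zip(chunk.timestamps, chunk.values)
                       if start <= t <= end)
        return out


series = ChunkedSeries(samples_per_chunk=4)
for ts in range(10):
    series.add(ts, ts * 10.0)

print([len(c.timestamps) for c in series.chunks])
print(series.query_range(3, 5))
```

Because each chunk knows its first and last timestamp, whole chunks can be skipped during a range query without inspecting their samples.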
The key of a RedisTimeSeries series consists of a metric name and tags, and each sample is a combination of a timestamp and a value. Labels are key-value metadata attached to a time series that allow grouping and filtering. They can be strings or numeric values and are added to the time series at creation time.
Part 3 – Use of RedisTimeSeries
Working with time series data in RedisTimeSeries mainly involves the following operations:
1. Run the ts.create command
The ts.create command creates a time series. When using this command, you set the key of the time series and can set the data retention time (in milliseconds). You can also set labels that describe the properties of the time series. Description:
RETENTION: optional, data retention duration in milliseconds; default: 0 (samples never expire).
ENCODING: optional, the sample encoding format, either COMPRESSED or UNCOMPRESSED.
CHUNK_SIZE: optional, chunk size.
DUPLICATE_POLICY: optional, the policy applied to a sample whose timestamp already exists. Allowed values: BLOCK, FIRST, LAST, MIN, MAX, SUM.
LABELS: optional, the labels (name-value pairs) attached to the time series.
Example 1:
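A minimal sketch of how such a command can be assembled in Python. The helper ts_create_args, the key device:1:temp, and the label names are hypothetical; only the TS.CREATE keywords come from the description above. The helper builds the argument list without contacting a server; with a client such as redis-py, the list could then be passed to execute_command(*cmd).

```python
def ts_create_args(key, retention_ms=None, encoding=None, chunk_size=None,
                   duplicate_policy=None, labels=None):
    """Build the argument list for TS.CREATE (hypothetical helper)."""
    args = ["TS.CREATE", key]
    if retention_ms is not None:
        args += ["RETENTION", str(retention_ms)]
    if encoding is not None:
        args += ["ENCODING", encoding]
    if chunk_size is not None:
        if chunk_size % 8 != 0:
            raise ValueError("CHUNK_SIZE must be a multiple of 8")
        args += ["CHUNK_SIZE", str(chunk_size)]
    if duplicate_policy is not None:
        args += ["DUPLICATE_POLICY", duplicate_policy]
    if labels:
        # LABELS goes last, followed by name/value pairs.
        args.append("LABELS")
        for name, value in labels.items():
            args += [name, str(value)]
    return args


# Keep samples for 10 minutes and tag the series with two labels.
cmd = ts_create_args("device:1:temp", retention_ms=600000,
                     labels={"device_id": 1, "area_id": 32})
print(" ".join(cmd))
```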
2. Run the ts.add command
The ts.add command inserts a sample, consisting of a timestamp and a value, into a time series. If the time series has not been created with ts.create beforehand, it is created automatically.
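Note: In RedisTimeSeries versions before 1.4, you cannot add data before the last used timestamp: the timestamp passed to ts.add must be greater than the timestamp of the last sample. Later versions accept older timestamps, and DUPLICATE_POLICY decides what happens when a sample with the same timestamp already exists.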
Example 2:
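A minimal sketch with an explicit timestamp. The helper ts_add_args, the key, the timestamp, and the reading are all hypothetical. The helper only builds the TS.ADD argument list and does not talk to a server.

```python
def ts_add_args(key, timestamp_ms, value, labels=None):
    """Build the argument list for TS.ADD with an explicit timestamp
    in milliseconds (hypothetical helper)."""
    args = ["TS.ADD", key, str(int(timestamp_ms)), str(value)]
    if labels:
        # Labels only take effect if the series is created by this call.
        args.append("LABELS")
        for name, label_value in labels.items():
            args += [name, str(label_value)]
    return args


cmd = ts_add_args("device:1:temp", 1609459200000, 21.5,
                  labels={"area_id": 32})
print(" ".join(cmd))
```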
You can also pass * as the timestamp, in which case Redis generates the timestamp automatically.
Example 3:
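A minimal sketch of the * form. The helper names and the key are hypothetical. The second function shows the kind of value the server fills in, a Unix timestamp in milliseconds, computed here from the client clock purely for comparison.

```python
import time


def ts_add_auto_args(key, value):
    """Build TS.ADD with '*' so the server assigns the timestamp
    (hypothetical helper)."""
    return ["TS.ADD", key, "*", str(value)]


def client_now_ms():
    """Client-side approximation of the value '*' stands for:
    milliseconds since the Unix epoch."""
    return int(time.time() * 1000)


cmd = ts_add_auto_args("device:1:temp", 21.5)
print(" ".join(cmd))
```

Letting the server assign the timestamp avoids clock differences between several writing clients.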
The ts.madd command appends new samples to one or more existing time series in a single command.
Example 4:
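A minimal sketch that batches samples for two series into one command. The helper ts_madd_args, the keys, and the readings are hypothetical. Each sample contributes a key, a timestamp, and a value, in that order.

```python
def ts_madd_args(samples):
    """Build the argument list for TS.MADD from (key, timestamp_ms, value)
    triples (hypothetical helper)."""
    if not samples:
        raise ValueError("TS.MADD needs at least one sample")
    args = ["TS.MADD"]
    for key, timestamp_ms, value in samples:
        args += [key, str(timestamp_ms), str(value)]
    return args


cmd = ts_madd_args([
    ("device:1:temp", 1609459200000, 21.5),
    ("device:2:temp", 1609459200000, 19.8),
])
print(" ".join(cmd))
```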
3. Run the ts.get command
The ts.get command is used to read the latest data of the time series.
Example 5:
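A minimal sketch of reading the latest sample. The helper names and the key are hypothetical, and the reply is simulated rather than fetched from a server. It assumes the reply arrives as a two-element array holding the timestamp and a string-encoded value, and as an empty array for an empty series. The exact shape may differ by client library and protocol version.

```python
def ts_get_args(key):
    """Build the argument list for TS.GET (hypothetical helper)."""
    return ["TS.GET", key]


def parse_sample(reply):
    """Turn an assumed [timestamp, value] reply into (int, float).
    An empty reply (series without samples) yields None."""
    if not reply:
        return None
    timestamp, value = reply
    if isinstance(value, bytes):
        value = value.decode()
    return int(timestamp), float(value)


cmd = ts_get_args("device:1:temp")
simulated_reply = [1609459200000, b"21.5"]  # not a real server response
print(" ".join(cmd))
print(parse_sample(simulated_reply))
```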
The ts.mget command queries the latest sample of multiple time series by label. When you create a time series with ts.create, you can set labels for it. At query time, the filter conditions are matched against these labels, and the result contains only the latest sample of each matching time series.
Example 6:
The following ts.mget command uses FILTER (which sets filter criteria on labels) to find all time series with area_id = 32 and return the latest sample of each.
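A minimal sketch of that query. The helper ts_mget_args is hypothetical, and the label filter area_id=32 is the one from the text. WITHLABELS is an optional flag that asks the server to include each series' labels in the reply.

```python
def ts_mget_args(filters, with_labels=False):
    """Build the argument list for TS.MGET (hypothetical helper).
    `filters` is a list of label filter expressions such as 'area_id=32'."""
    if not filters:
        raise ValueError("at least one filter expression is required")
    args = ["TS.MGET"]
    if with_labels:
        args.append("WITHLABELS")
    args.append("FILTER")
    args += list(filters)
    return args


cmd = ts_mget_args(["area_id=32"])
print(" ".join(cmd))
```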
4. Run the ts.range/ts.revrange command
The ts.range and ts.revrange commands query a time range of a time series, optionally with aggregation; ts.revrange returns the results in reverse order.
Description:
[FROM_TIMESTAMP][TO_TIMESTAMP]: mandatory, start and end timestamps of the range;
FILTER_BY_TS: optional, filters samples by a list of timestamps;
FILTER_BY_VALUE: optional, filters samples by value (minimum and maximum);
[COUNT]: optional, the maximum number of samples to return;
[AGGREGATION]: optional, the aggregation type to compute. RedisTimeSeries supports a wide range of aggregation types, including AVG, MAX, MIN, SUM, COUNT, LAST, and FIRST.
Example 7:
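A minimal sketch of a range query with aggregation. The helpers ts_range_args and bucket_avg, the key, and the readings are hypothetical. The first function only builds the command. The second mimics on the client what an avg aggregation over fixed-size buckets computes, assuming buckets are aligned to timestamp 0.

```python
def ts_range_args(key, from_ts, to_ts, count=None,
                  aggregation=None, bucket_ms=None, reverse=False):
    """Build the argument list for TS.RANGE or TS.REVRANGE
    (hypothetical helper)."""
    args = ["TS.REVRANGE" if reverse else "TS.RANGE",
            key, str(from_ts), str(to_ts)]
    if count is not None:
        args += ["COUNT", str(count)]
    if aggregation is not None:
        if bucket_ms is None:
            raise ValueError("AGGREGATION needs a bucket duration")
        args += ["AGGREGATION", aggregation, str(bucket_ms)]
    return args


def bucket_avg(samples, bucket_ms):
    """Average (timestamp, value) samples per time bucket, keyed by
    the bucket's start timestamp."""
    buckets = {}
    for ts, value in samples:
        start = ts - ts % bucket_ms
        buckets.setdefault(start, []).append(value)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


cmd = ts_range_args("device:1:temp", 0, 3000, aggregation="avg", bucket_ms=1000)
print(" ".join(cmd))

readings = [(0, 1.0), (500, 3.0), (1000, 5.0), (1500, 7.0), (2000, 9.0)]
print(bucket_avg(readings, 1000))
```

With server-side aggregation, only one value per bucket travels over the network instead of every raw sample.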
The ts.mrange command queries a time range across multiple time series selected by label filters.
Description:
[FROM_TIMESTAMP][TO_TIMESTAMP]: start and end timestamps; - and + can be used to denote the earliest and the latest timestamp;
[GROUPBY]: groups the results of different time series by the given label name;
[REDUCE]: the reducer used to aggregate the series that share the same label value.
Example 8:
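A minimal sketch of a multi-series query. The helper ts_mrange_args is hypothetical, and the filter area_id=32 reuses the label from the ts.mget description. It averages each matching series per minute and then reduces the series that share the same area_id to their maximum.

```python
def ts_mrange_args(from_ts, to_ts, filters, aggregation=None,
                   bucket_ms=None, groupby=None, reduce=None):
    """Build the argument list for TS.MRANGE (hypothetical helper)."""
    if not filters:
        raise ValueError("at least one filter expression is required")
    if (groupby is None) != (reduce is None):
        raise ValueError("GROUPBY and REDUCE must be used together")
    args = ["TS.MRANGE", str(from_ts), str(to_ts)]
    if aggregation is not None:
        if bucket_ms is None:
            raise ValueError("AGGREGATION needs a bucket duration")
        args += ["AGGREGATION", aggregation, str(bucket_ms)]
    args.append("FILTER")
    args += list(filters)
    if groupby is not None:
        args += ["GROUPBY", groupby, "REDUCE", reduce]
    return args


# "-" and "+" span from the earliest to the latest timestamp.
cmd = ts_mrange_args("-", "+", ["area_id=32"],
                     aggregation="avg", bucket_ms=60000,
                     groupby="area_id", reduce="max")
print(" ".join(cmd))
```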
5. Other RedisTimeSeries commands
ts.del KEY_NAME FROM_TIMESTAMP TO_TIMESTAMP: deletes the samples of KEY_NAME within the given timestamp range.
DEL KEY_NAME: deletes the key.
ts.alter KEY_NAME [RETENTION] [LABELS]: changes the metadata of an existing key, including its labels and retention.
ts.incrby/ts.decrby: adds a value to, or subtracts a value from, the latest sample;
ts.info: returns information and statistics about a time series;
KEYS *: gets all keys;
EXISTS KEY_NAME: checks whether the given key exists; returns 1 if it does and 0 if it does not.
Part 4 – Summary
As an extension module of Redis, RedisTimeSeries provides a new way to store and access time series data, with efficient queries and low access overhead, which makes real-time analysis of time series data practical.