Abstract: GaussDB(for Influx) offers a distinctive data storage management solution. Its cloud-native architecture separates storage from computing, so capacity can be scaled out or in quickly as business needs change. Efficient data compression and hot/cold data separation greatly reduce storage costs, and the high-throughput cluster meets the performance requirements of massive data writes and queries in large-scale O&M monitoring and Internet of Things scenarios.

This article is shared from the Huawei Cloud community post “Huawei's Self-Developed PB-Level Distributed Time Series Database Revealed, Part 1: A First Look at GaussDB(for Influx)”, original author: Qi Qi Yu Yiqiu.

Preface

With the growing scale of cloud computing and the gradual spread of Internet of Things (IoT) applications, massive amounts of time series data must be stored and managed in the fields of IoT (AIoT) and operation and maintenance monitoring (AIOps). Take Huawei Cloud Eye Service (CES) as an example: a single Region monitors more than 70 million metrics and processes 900,000 reported metric values every second. Assuming each value takes 50 bytes, the data accumulated in a year reaches the PB level. An earthquake monitoring system is another example: tens of thousands of monitoring sites collect data continuously, 24 hours a day. The metric data to be processed averages TBs per day and also reaches the PB level in a year, and it must be stored permanently. Traditional relational databases can hardly support this data volume and write pressure. Big data solutions such as Hadoop and existing open source time series databases also face great challenges. The need to interact with, store, and analyze time series data in real time will drive continuous innovation and optimization of time series databases in architecture, performance, and data compression.
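The CES figures above can be checked with a short back-of-the-envelope calculation. This is only a sketch of the arithmetic: the constants are the numbers quoted in the text, and it assumes raw, uncompressed data, a 365-day year, and decimal units (1 TB = 10^12 bytes).

```python
# Back-of-the-envelope estimate using the figures quoted in the text.
POINTS_PER_SECOND = 900_000    # reported metric values per second in one Region
BYTES_PER_POINT = 50           # assumed size of a single value
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365


def daily_volume_bytes(points_per_second: int, bytes_per_point: int) -> int:
    """Raw, uncompressed bytes written in one day."""
    return points_per_second * bytes_per_point * SECONDS_PER_DAY


def yearly_volume_bytes(points_per_second: int, bytes_per_point: int) -> int:
    """Raw, uncompressed bytes written in one 365-day year."""
    return daily_volume_bytes(points_per_second, bytes_per_point) * DAYS_PER_YEAR


bytes_per_day = daily_volume_bytes(POINTS_PER_SECOND, BYTES_PER_POINT)
bytes_per_year = yearly_volume_bytes(POINTS_PER_SECOND, BYTES_PER_POINT)

print(f"per day:  {bytes_per_day / 1e12:.2f} TB")   # per day:  3.89 TB
print(f"per year: {bytes_per_year / 1e15:.2f} PB")  # per year: 1.42 PB
```

At roughly 3.9 TB per day, a single Region accumulates about 1.4 PB in a year before compression, which matches the PB-level figure in the text.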

GaussDB(for Influx) builds on Huawei's years of practical experience in data storage and integrates Huawei Cloud's computing, storage, service assurance, and security capabilities. It introduces bold technical innovations in architecture, performance, and data compression and has achieved good results: it supports Huawei Cloud infrastructure services internally and is offered externally as a service to help enterprises on the cloud solve related business problems.

Cloud-native architecture with storage and computing separation

The GaussDB(for Influx) interface is fully compatible with InfluxDB, and its write interface is also compatible with OpenTSDB, Prometheus, and Graphite. Architecturally, a time series database cluster is divided into three components:

Shard node: A stateless node that is mainly responsible for writing and querying data. In addition to sharding and timeline management, it supports data pre-aggregation, data downsampling, and grouped queries by tag, all optimized for time series scenarios.

Config cluster: Stores and manages cluster metadata as a three-node replica set to ensure high metadata reliability.

Distributed storage system: Centrally stores persistent data and logs. Data is kept in three copies, and the replication is transparent to upper-layer applications. The storage system is developed by Huawei, and its high availability and reliability have been verified through years of product use.

Compared with open source time series databases such as InfluxDB, the cloud-native design with storage and computing separation has the following advantages:

Tolerates N-1 node failures for higher availability. Because storage and computing are separated, a mature distributed storage system can be reused to provide reliability. Time series data is written continuously at high rates while a large number of queries run, so a service interruption or data loss caused by any system failure has a serious impact on the business. Reusing a proven, mature distributed storage system significantly improves system reliability and reduces the risk of data loss.

Compute nodes scale out in minutes and storage nodes in seconds. The physical binding between data and nodes found in the traditional shared-nothing architecture is removed: data is only logically bound to a compute node, which makes the compute node stateless. As a result, scaling out compute nodes does not migrate large amounts of data between them. Only the logical ownership of some data is transferred from one node to another, which shortens scale-out time from days to minutes.

Eliminates redundant copies and reduces storage costs. Multi-copy replication is offloaded from the compute nodes to the distributed storage nodes. This avoids the 9-copy redundancy that arises when users build their own database on the cloud in Cloud Hosting mode, where the distributed database keeps 3 copies and the distributed storage underneath keeps 3 copies of each (3 × 3 = 9). This greatly reduces storage costs.

GaussDB(for Influx) uses a cloud-native architecture with storage and computing separation and has five main features: support for 100-million-level timelines, extreme write performance, low storage cost, high-performance multidimensional aggregate queries, and minute-level elastic scaling.

Support for 100-million-level timelines

A time series database handles a large number of concurrent queries and writes, so controlling memory usage is very important. In the open source InfluxDB, the process exits with an out-of-memory (OOM) error due to memory exhaustion once the number of timelines being written grows to the tens of millions. GaussDB(for Influx) applies the following optimizations to avoid memory exhaustion when writing data for massive numbers of timelines:

  • Memory allocation: memory pool reuse technology reduces temporary object allocations and memory fragmentation.

  • Memory reclamation: the GC frequency is adjusted dynamically according to memory load to speed up reclamation.

  • Single queries: quota control prevents any single query from exhausting memory.

  • Cache usage: different optimal configurations are provided for different node specifications.
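The per-query quota idea from the list above can be illustrated with a minimal sketch. The class and function names here (`QueryMemoryQuota`, `QuotaExceeded`, `load_series`) are hypothetical and are not part of any GaussDB(for Influx) API; the sketch only shows the general pattern of reserving memory against a fixed budget before doing the work.

```python
from typing import Iterable


class QuotaExceeded(Exception):
    """Raised when a query asks for more memory than its quota allows."""


class QueryMemoryQuota:
    """Tracks the memory reserved by one query against a fixed limit."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def reserve(self, nbytes: int) -> None:
        # Refuse the reservation instead of letting the process run out of memory.
        if self.used_bytes + nbytes > self.limit_bytes:
            raise QuotaExceeded(
                f"need {nbytes} bytes, only {self.limit_bytes - self.used_bytes} left"
            )
        self.used_bytes += nbytes

    def release(self, nbytes: int) -> None:
        self.used_bytes = max(0, self.used_bytes - nbytes)


def load_series(series_sizes: Iterable[int], quota: QueryMemoryQuota) -> int:
    """Reserve memory for each series before loading it; return how many were loaded."""
    loaded = 0
    for size in series_sizes:
        quota.reserve(size)  # raises QuotaExceeded when the query is over budget
        loaded += 1
    return loaded


quota = QueryMemoryQuota(limit_bytes=100)
print(load_series([30, 30, 30], quota))  # 3
try:
    load_series([20], quota)
except QuotaExceeded as exc:
    print("query aborted:", exc)
```

With this pattern an oversized query fails on its own, while other queries and writes on the same node keep their memory.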

After these improvements, write performance remains stable under massive numbers of timelines and significantly outperforms the open source InfluxDB. For aggregate queries involving a large number of timelines, such as high-cardinality aggregate queries, the improvement is even more significant.

Extreme write performance: Supports trillions of data writes per day

Compared with standalone mode, cluster mode distributes the write load across the compute nodes in the cluster to support larger write volumes. GaussDB(for Influx) supports trillions of data point writes per day through the following engineering optimizations:

First, time series data is hash-partitioned by timeline, and all nodes write in parallel to take full advantage of the cluster.

Second, the Shard node uses an LSM-tree layout optimized for write scenarios. Once the WAL has been written to make the log persistent, a write returns as soon as the data has been written to the buffer.

Finally, multi-copy replication is offloaded from the database to distributed storage, which reduces network traffic from compute nodes to storage nodes.
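The first optimization above, hash partitioning by timeline, can be sketched as follows. The series-key layout (measurement plus sorted tags) and the use of SHA-1 are assumptions made for this illustration, and `series_key`, `partition_for`, and `route` are hypothetical names; the article does not describe the actual hash function or key format.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[str, Dict[str, str], int, float]  # (measurement, tags, timestamp, value)


def series_key(measurement: str, tags: Dict[str, str]) -> str:
    """Build a canonical timeline key; tags are sorted so their order does not matter."""
    parts = [measurement] + [f"{k}={v}" for k, v in sorted(tags.items())]
    return ",".join(parts)


def partition_for(key: str, num_partitions: int) -> int:
    """Map a timeline key to a partition with a stable hash (assumed: SHA-1)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(points: List[Point], num_partitions: int) -> Dict[int, List[Tuple[int, float]]]:
    """Group points by partition so that each batch can be written in parallel."""
    batches: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for measurement, tags, timestamp, value in points:
        pid = partition_for(series_key(measurement, tags), num_partitions)
        batches[pid].append((timestamp, value))
    return dict(batches)


points = [
    ("cpu_info", {"hostname": "h1"}, 1604210727, 0.42),
    ("cpu_info", {"hostname": "h2"}, 1604210727, 0.37),
    ("cpu_info", {"hostname": "h1"}, 1604210737, 0.45),
]
for pid, batch in sorted(route(points, num_partitions=8).items()):
    print(pid, batch)
```

Because the partition depends only on the timeline key, all points of one timeline land in the same partition, while different timelines spread across the cluster.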

In mass write scenarios, the write performance of GaussDB(for Influx) scales with a linear speedup ratio above 80%.

Low storage cost: only 1/20 of the storage cost

In the two typical application scenarios of time series databases, AIOps O&M monitoring and AIoT, several GB or even several TB of time series data are generated every day. If this data is not managed and compressed well, it puts very high cost pressure on the enterprise.

GaussDB(for Influx) stores data in columns, so data of the same type is stored together, which facilitates compression. A self-developed adaptive compression algorithm for time series data samples and analyzes the data before compression, then selects the most suitable compression algorithm based on data volume, data distribution, and data type. Compared with the original InfluxDB, compression is optimized for the Float, String, and Timestamp types.

Float data type: The Gorilla compression algorithm is optimized. Values that can be converted losslessly are converted into integers, and the most suitable compression algorithm is then selected based on the data characteristics.

String data type: The ZSTD compression algorithm, which has better compression efficiency, is used, with different encoding methods chosen according to the length of the data to be compressed.

Timestamp data type: Delta compression is used. In addition, similarity compression is applied to the timestamps in data files to further reduce the storage cost of time series data.
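Two of the ideas above, delta encoding of timestamps and lossless conversion of floats to integers, can be illustrated with a short sketch. This is not the GaussDB(for Influx) algorithm, whose details the article does not give; the function names and the `max_digits` limit are assumptions chosen for the example, and special values such as NaN are not handled.

```python
from typing import List, Optional, Tuple


def delta_encode(timestamps: List[int]) -> List[int]:
    """Keep the first timestamp, then store only the difference to the previous one."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(encoded: List[int]) -> List[int]:
    """Rebuild the original timestamps by accumulating the deltas."""
    decoded: List[int] = []
    total = 0
    for i, value in enumerate(encoded):
        total = value if i == 0 else total + value
        decoded.append(total)
    return decoded


def floats_to_scaled_ints(
    values: List[float], max_digits: int = 6
) -> Optional[Tuple[int, List[int]]]:
    """Find the smallest decimal scale at which every float becomes an exact integer.

    Returns (scale, ints) such that ints[i] / 10**scale == values[i] exactly,
    or None when no scale up to max_digits is lossless.
    """
    for scale in range(max_digits + 1):
        factor = 10 ** scale
        ints = [round(v * factor) for v in values]
        if all(i / factor == v for i, v in zip(ints, values)):
            return scale, ints
    return None


print(delta_encode([1000, 1010, 1020, 1030]))    # [1000, 10, 10, 10]
print(floats_to_scaled_ints([12.5, 13.0, 12.75]))  # (2, [1250, 1300, 1275])
```

Regularly sampled timestamps turn into runs of identical small deltas, and metric values with a few decimal places turn into small integers. Both are much easier for a general-purpose integer encoder to compress than the raw 64-bit values.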

The following figure compares the data compression efficiency of GaussDB(for Influx) and InfluxDB using event log data (dataset 1) and cloud server monitoring metric data (dataset 2) from real service scenarios.

Data compression is not the only way to save on storage costs. GaussDB(for Influx) also provides tiered storage for time series data, with user-defined separation of hot and cold data. Hot data is small in volume and frequently accessed, so it is stored on faster, more expensive storage media. Cold data is large in volume, rarely accessed, and retained for a long time, so it is stored on low-cost media. According to actual service data, the storage cost for the same amount of data is only 1/20 of that of a relational database.

High-performance multidimensional aggregate queries

Multidimensional aggregation is a common type of query in time series databases and is typically repeated periodically. For example, in the AIOps O&M monitoring scenario, a query computes the average CPU and memory usage within a specified time range.

SELECT mean(usage_cpu), mean(usage_mem) FROM cpu_info WHERE time >= '2020-11-01T06:05:27Z' AND time < '2020-11-01T18:05:27Z' GROUP BY time(1h), hostname

To improve overall aggregate query performance, GaussDB(for Influx) makes the following optimizations:

  • MPP architecture: a single query statement can be executed concurrently on multiple nodes and multiple cores.

  • Vectorized query engine: when the result set is large, the traditional volcano model returns one row per iteration, which causes excessive overhead and becomes a performance bottleneck. GaussDB(for Influx) implements a vectorized query engine that returns a batch of data per iteration, significantly reducing this overhead.

  • Incremental aggregation engine: for aggregate queries over a sliding window, most of the result is served directly from the aggregate result cache, and only the incremental data needs to be aggregated.

  • Multi-dimensional inverted index: supports queries on multiple dimensions and multiple conditions, avoiding scans over large amounts of data.

  • Storage summary index: speeds up the filtering of irrelevant data during queries.
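The difference between the volcano model and a vectorized engine in the list above can be made concrete with a small sketch. It is a toy model written for this article, not the GaussDB(for Influx) engine: `scan_rows` and `scan_batches` are hypothetical operators, and the batch size of 1024 is an arbitrary assumption. Both functions assume a non-empty input.

```python
from typing import Iterator, List, Tuple


def scan_rows(values: List[float]) -> Iterator[float]:
    """Volcano-style operator: hands over one value per iteration."""
    for value in values:
        yield value


def scan_batches(values: List[float], batch_size: int) -> Iterator[List[float]]:
    """Vectorized operator: hands over a whole batch per iteration."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


def mean_row_at_a_time(values: List[float]) -> Tuple[float, int]:
    """Return (mean, number of operator iterations) for the volcano-style scan."""
    total, count, iterations = 0.0, 0, 0
    for value in scan_rows(values):
        iterations += 1
        total += value
        count += 1
    return total / count, iterations


def mean_vectorized(values: List[float], batch_size: int = 1024) -> Tuple[float, int]:
    """Return (mean, number of operator iterations) for the batch scan."""
    total, count, iterations = 0.0, 0, 0
    for batch in scan_batches(values, batch_size):
        iterations += 1
        total += sum(batch)  # one tight loop over the whole batch
        count += len(batch)
    return total / count, iterations


data = [float(i) for i in range(10_000)]
print(mean_row_at_a_time(data))  # (4999.5, 10000)
print(mean_vectorized(data))     # (4999.5, 10)
```

Both versions compute the same mean, but the batch version crosses the operator boundary 10 times instead of 10,000. That per-iteration overhead is what a vectorized engine amortizes.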

For the same node size, GaussDB(for Influx) delivers 10 times the aggregate query performance of InfluxDB Enterprise and 2 to 5 times that of Timescale.
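The incremental aggregation idea from the optimization list can also be sketched in a few lines. This is a simplified illustration under stated assumptions: windows are aligned to fixed boundaries, a window is only queried after it has closed (no late writes), and the class name `IncrementalMean` and its `computed` counter are hypothetical. It caches a (sum, count) pair per window so that a repeated sliding-window query only aggregates windows it has not seen before.

```python
from typing import Dict, List, Optional, Tuple


class IncrementalMean:
    """Caches per-window partial aggregates so repeated queries only do new work."""

    def __init__(self, window_seconds: int = 3600) -> None:
        self.window = window_seconds
        self.cache: Dict[int, Tuple[float, int]] = {}  # window start -> (sum, count)
        self.computed = 0  # windows aggregated from raw points, for illustration only

    def _aggregate(self, points: List[Tuple[int, float]], start: int) -> Tuple[float, int]:
        values = [v for t, v in points if start <= t < start + self.window]
        self.computed += 1
        return sum(values), len(values)

    def query(
        self, points: List[Tuple[int, float]], t_from: int, t_to: int
    ) -> Dict[int, Optional[float]]:
        """Mean per window over [t_from, t_to); both bounds are assumed window-aligned."""
        result: Dict[int, Optional[float]] = {}
        for start in range(t_from, t_to, self.window):
            if start not in self.cache:  # only the incremental part is aggregated
                self.cache[start] = self._aggregate(points, start)
            total, count = self.cache[start]
            result[start] = total / count if count else None
        return result


points = [(0, 1.0), (1800, 3.0), (3600, 10.0), (7200, 4.0), (9000, 6.0)]
engine = IncrementalMean(window_seconds=3600)
print(engine.query(points, 0, 7200))      # {0: 2.0, 3600: 10.0}
print(engine.query(points, 3600, 10800))  # {3600: 10.0, 7200: 5.0}
print(engine.computed)                    # 3
```

The second query slides the range forward by one hour, so only the newest window is computed from raw points and the overlapping window comes from the cache.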

Minute-level elastic scale-out and scale-in

While a time series database is running, growth in service volume often makes it necessary to scale out the database online. Traditional databases store data locally, so data must be migrated after scale-out. Once the data reaches a certain scale, migration time is usually measured in days, which makes operation and maintenance very difficult.

As shown in the figure above, each Database logically consists of multiple Partitions, each stored independently and self-describing. All Partition data resides on distributed shared storage, and there is no physical binding between the database Shard nodes and the data. During scale-out, a Partition is offloaded from the source node and then assigned to the target node.
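The scale-out step described above amounts to changing an ownership table rather than copying data. The sketch below models that with a plain dictionary from Partition ID to Shard node name. The balancing rule (move Partitions from the most loaded node until the new node holds an even share) and the names `rebalance` and `shard-N` are assumptions made for the example, not the scheduling policy of GaussDB(for Influx).

```python
from collections import Counter
from typing import Dict, List


def rebalance(assignment: Dict[int, str], new_node: str) -> List[int]:
    """Reassign Partitions to new_node until it holds an even share.

    Only the ownership map changes. The Partition data itself is assumed to
    stay on shared storage, so nothing is copied between nodes.
    Returns the IDs of the Partitions that changed owner.
    """
    nodes = sorted(set(assignment.values()) | {new_node})
    target = len(assignment) // len(nodes)
    moved: List[int] = []
    while len(moved) < target:
        load = Counter(assignment.values())
        # Pick the most loaded existing node (ties broken by name).
        source = max((n for n in nodes if n != new_node), key=lambda n: (load[n], n))
        pid = max(p for p, owner in assignment.items() if owner == source)
        assignment[pid] = new_node  # offload from source, assign to target
        moved.append(pid)
    return moved


# 12 Partitions spread over 3 Shard nodes, 4 each.
assignment = {pid: f"shard-{pid % 3 + 1}" for pid in range(12)}
moved = rebalance(assignment, "shard-4")
print(moved)                         # [11, 10, 9]
print(Counter(assignment.values()))  # every node now owns 3 Partitions
```

Only 3 of the 12 ownership entries change, and the amount of work does not depend on how much data each Partition holds. This is why scale-out time no longer grows with data volume.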

Conclusion

Time series data should be stored in a database system optimized for it. After a Huawei Cloud service switched from Cassandra to GaussDB(for Influx), the total number of compute nodes dropped from 39 (18 in hot clusters, 9 in cold clusters, and 12 in big data analysis clusters) to 9, a reduction by a factor of more than four. Storage consumption fell from 1 TB to less than 100 GB per day, a reduction by a factor of more than 10.

GaussDB(for Influx) provides a distinctive data storage management solution. Its cloud-native architecture with storage and computing separation allows capacity to be scaled out or in quickly as business needs change. Efficient data compression and hot/cold data separation greatly reduce storage costs, and the high-throughput cluster meets the performance requirements of massive data writes and queries in large-scale O&M monitoring and Internet of Things scenarios.
