preface

This paper focuses on delta-of-delta algorithm, a common timestamp or numerical compression method in sequential database, which can greatly reduce the cost of data storage and improve the performance of data writing and query.

The delta-of-delta compressed timestamp is described in the Facebook Gorilla Paper at www.vldb.org/pvldb/vol8/… The Prometheus TSDB project also borrowed ideas from the Facebook Gorilla paper to achieve high compression rates for sequential data.

Delta timestamp compression

Time stamps are generally stored as long and occupy 8 bytes of storage space. The most straightforward optimization is to store the timestamp difference, where the starting timestamp and the maximum range threshold of the delta are required. There are two common implementations:

  • Delta(n) = T(n) -t (n-1)
A Unix timestamp Delta
1571889600000 0
1571889600010 10
1571889600025 15
1571889600030 5
1571889600040 10
  • Delta(n) = T(n) -t (0)
A Unix timestamp Delta
1571889600000 0
1571889600010 10
1571889600025 25
1571889600030 30
1571889600040 40

Assume that the start time stamp is 1571889600000, the maximum range threshold of the delta is 3600s, and the value of each delta needs 13 bits to be stored. Therefore, the total space occupied by the above timestamp data is 64 + 13 * 4 = 116bit.

The advantage of Idea 2 is that the data in the block need not be traversed successively. However, compared with Idea 1, the start time may need to be changed more frequently, and an appropriate compression scheme should be selected based on actual requirements.

Delta-of-delta timestamp compression

Facebook Gorilla elaborates on how delta-of-delta codes are calculated. Here’s an excerpt from the paper.

Facebook Gorilla offers a more generic way to deal with data over different time spans.

D Identify a Total bits
0 0 1
[-63,64] 10 2 plus 7 is 9
[255256] 110 3 plus 9 is 12
8] [- 2047204. 1110 4 plus 12 is 16
> 2048 1111 4 plus 32 is 36

To get a sense of the compression effect of delta-of-delta encoding, again with a set of timestamp data:

A Unix timestamp delta delta-of-delta Total bits after compression
1571889600000 0 0
1571889600010 10 10 9
1571889600010 0 – 10 9
1571889600011 1 1 9
1571889600012 1 0 1
1571889600013 1 0 1
1571889600015 2 1 9
1571889600017 2 0 1

It is still assumed that the start time stamp is 1571889600000, the maximum range threshold of delta is 3600s, and the storage space occupied is compared as follows:

  • Delta algorithm: 64 + 13 * 7 = 155 bits.
  • Delta-of-delta algorithm: 64 + 9 x 4 + 1 x 3 = 103bit.

It can be seen that the delta-of-delta algorithm further achieves higher compression ratio than the Delta algorithm. In practical application scenarios, the time stamps of massive time series data are dense and continuous, and most of them meet the condition of delta-of-delta=0. In this way, the storage space of time stamps can be greatly reduced.

conclusion

  • The delta-of-delta algorithm is often used to monitor the compression of time stamps in data, which can greatly reduce storage costs and improve write and query performance.
  • For the data with large span, the system needs to be fault-tolerant.
  • The delta-of-delta algorithm also has some defects. Because it is an indefinite compression algorithm, it must be calculated from the first address of the data block after decompression.

The appendix

  • Implementation of time series compression method from the Facebook’s Gorilla paper

Reprint please note the source, welcome to pay attention to my public number: Yap technology wheel