The Role of Compaction

Because LevelDB handles updates by writing new records rather than modifying data in place, it relies on a background thread to run compactions, which:

  1. Remove obsolete data (old versions and deleted entries).
  2. Keep the data ordered.

Compaction Triggers

In addition to explicit external calls to CompactRange, LevelDB automatically triggers a compaction in the following cases:

  1. When the MemTable size reaches a threshold, LevelDB switches to a new MemTable and writes the Immutable MemTable to disk. This is called a Minor Compaction.
  2. When Level-n exceeds its limit, the Level-n and Level-n+1 SSTables are compacted together. This is called a Major Compaction.
    1. Level-0 decides whether a compaction is required based on the number of SSTables.
    2. Level-n (n > 0) decides whether a compaction is required based on the total size of its SSTables.
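
The two size-based triggers can be sketched as a per-level score, roughly in the spirit of what VersionSet::Finalize computes. This is a minimal illustration, not LevelDB's code: the function names are hypothetical, and the constants (a Level-0 trigger of 4 files, a 10 MB limit for Level-1, and a 10x growth per level) are what I believe the defaults to be, so treat them as assumptions.

```python
MB = 1024 * 1024

# Assumed defaults; check them against the LevelDB version you use.
L0_COMPACTION_TRIGGER = 4      # Level-0 is judged by file count
LEVEL1_MAX_BYTES = 10 * MB     # Level-1 size limit
LEVEL_SIZE_MULTIPLIER = 10     # each deeper level may hold 10x more


def max_bytes_for_level(level):
    """Size limit for a level >= 1."""
    return LEVEL1_MAX_BYTES * LEVEL_SIZE_MULTIPLIER ** (level - 1)


def compaction_score(level, file_sizes):
    """A score >= 1 means the level is due for a major compaction."""
    if level == 0:
        return len(file_sizes) / L0_COMPACTION_TRIGGER
    return sum(file_sizes) / max_bytes_for_level(level)


def pick_compaction_level(levels):
    """levels[i] is the list of SSTable sizes (bytes) at Level-i.

    Returns (level, score) for the highest-scoring level, ignoring the
    last level (it has nowhere to compact into), or None if no level
    has reached a score of 1.
    """
    best_level, best_score = None, 0.0
    for level, files in enumerate(levels[:-1]):
        score = compaction_score(level, files)
        if score > best_score:
            best_level, best_score = level, score
    return (best_level, best_score) if best_score >= 1 else None
```

For example, with five 2 MB files in Level-0 and 8 MB in Level-1, Level-0 scores 5/4 = 1.25 and Level-1 scores 0.8, so pick_compaction_level([[2 * MB] * 5, [4 * MB] * 2, []]) selects Level-0.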

Minor Compaction

Minor Compaction is simple. The basic code path is DBImpl::CompactMemTable => DBImpl::WriteLevel0Table => BuildTable.

Major Compaction

  1. After a compaction completes and the Manifest is updated, VersionSet::Finalize is called to compute the level of the next major compaction.
  2. At the beginning of each major compaction, VersionSet::PickCompaction is called to compute which SSTables need to be compacted.
  3. If the key ranges of the selected Level-n SSTables do not overlap those of the Level-n+1 SSTables, the Level-n SSTables can be moved directly to Level-n+1 just by modifying the Manifest.
  4. Otherwise, DBImpl::DoCompactionWork is called to merge the Level-n and Level-n+1 SSTables.
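
The choice between steps 3 and 4 comes down to a key-range overlap test. The sketch below shows only that idea: the function names and the tuple representation of a key range are hypothetical, and LevelDB's real check applies further conditions that are omitted here.

```python
def ranges_overlap(a, b):
    """a and b are inclusive (smallest_key, largest_key) ranges."""
    return a[0] <= b[1] and b[0] <= a[1]


def plan_major_compaction(inputs, next_level_files):
    """Decide how to push the selected Level-n SSTables down one level.

    inputs:           key ranges of the selected Level-n SSTables
    next_level_files: key ranges of all Level-n+1 SSTables

    Returns ("move", []) when nothing in Level-n+1 overlaps the inputs,
    so only the Manifest needs to change, or ("merge", overlapping)
    with the Level-n+1 SSTables that must be merged with the inputs.
    """
    overlapping = [
        f for f in next_level_files
        if any(ranges_overlap(f, i) for i in inputs)
    ]
    if not overlapping:
        return ("move", [])
    return ("merge", overlapping)
```

For instance, an input covering keys "a" to "c" can be moved past a Level-n+1 file covering "d" to "f", while an input covering "a" to "e" has to be merged with it.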

Problems with Compaction

Compaction consumes resources while it runs, which affects LevelDB’s performance and stability:

  1. CPU consumption: SSTables have to be parsed, decompressed, and recompressed.
  2. I/O consumption: reading and writing large numbers of SSTables consumes I/O bandwidth and shortens the service life of SSDs (an SSD supports only a limited number of writes).
  3. Cache invalidation: old SSTables are deleted and new ones are generated. The first requests to a new SSTable miss the cache, which may cause performance jitter.

A common practice is to limit the speed of compaction (for example, with RocksDB’s Rate Limiter) so that it does not cause sudden spikes in CPU usage, I/O, or cache misses. This raises the question: how fast should compaction run? If it runs too fast, it degrades system performance. If it runs too slowly, it blocks write requests. The right speed depends heavily on the specific hardware and workload, so it can only be set empirically and is difficult to generalize. Moreover, rate limiting only smooths out latency spikes and jitter; the write amplification caused by compaction remains the same.
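
To make "controlling the speed" concrete, here is a minimal token-bucket sketch. It is not RocksDB's Rate Limiter; the class name, its parameters, and the injectable clock are all hypothetical, and the bucket is allowed to go into debt so that a request larger than the burst size still completes eventually.

```python
import time


class TokenBucket:
    """Hypothetical byte-rate limiter for a compaction thread."""

    def __init__(self, bytes_per_sec, burst_bytes, clock=time.monotonic):
        self.rate = bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes   # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # Add tokens for the elapsed time, capped at the burst size.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def request(self, nbytes):
        """Charge nbytes and return how many seconds the caller should
        sleep before issuing the write (0.0 if it may proceed now)."""
        self._refill()
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate
```

A compaction loop would call request(len(block)) before each write and sleep for the returned duration. Choosing bytes_per_sec is exactly the tuning problem described above: the value has to come from measurements on the target hardware and workload.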

A Simple Analysis of Write Amplification

  • +1: the WAL write.
  • +1: the Immutable MemTable is written to a Level-0 file.
  • +2: Level-0 and Level-1 compaction (the key ranges of the Level-0 SSTables overlap one another, and when a compaction occurs the data size of Level-0 is about the same as that of Level-1, so the merge rewrites twice the incoming data).
  • +11: Level-n and Level-n+1 compaction (n >= 1; by default, Level-n+1 holds 10 times as much data as Level-n, so the merge rewrites 10 + 1 = 11 times the incoming data).

So the total write amplification is 4 + 11(n-1) = 11n-7, where n is the number of levels.

Assuming there are five levels, the maximum write amplification is 48; that is, if 1 GB of data is written externally, 48 GB of write I/O is observed internally.
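
The arithmetic can be checked with a few lines of Python. The function and parameter names are hypothetical, and the per-step costs are simply the ones listed in this section.

```python
def write_amplification(num_levels, size_ratio=10):
    """Worst-case write amplification under this section's simple model."""
    wal = 1                      # write-ahead log
    flush = 1                    # Immutable MemTable -> Level-0
    l0_to_l1 = 2                 # Level-0 and Level-1 are about the same size
    per_level = size_ratio + 1   # merging Level-n into Level-n+1
    return wal + flush + l0_to_l1 + per_level * (num_levels - 1)


for n in range(1, 8):
    # Matches the closed form 11n - 7 for the default size ratio of 10.
    assert write_amplification(n) == 11 * n - 7

print(write_amplification(5))
```

The model also shows how the size ratio drives the result: each additional level costs size_ratio + 1 more writes per byte of user data.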

Many papers discuss LSM-tree write amplification in detail and propose optimizations, for example:

  1. Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging. Clearly explains the methods and impact of compaction.
  2. WiscKey: Separating Keys from Values in SSD-Conscious Storage. Greatly reduces write amplification.

Summary

  1. As a rule of thumb, the problems caused by compaction are not noticeable in most scenarios.
  2. When compaction cannot keep up with the actual write rate, it can easily cause writes to stall or fail.
  3. In write-intensive scenarios, the shortened SSD lifespan caused by write amplification is also a problem.