preface

BlueStore’s BitMap allocator was introduced earlier, and we know that the new version of the BitMap allocator has the advantage of using contiguous memory space to hit as many CPU caches as possible to improve allocator performance. Here we take a look at interval tree-based Stupid allocator (similar to the Linux Buddy memory management algorithm) and compare its strengths and weaknesses.

directory

  • Partner algorithm
  • The data structure
  • Initialize the
  • Insert the delete
  • Space allocation
  • Space recycling
  • Advantages and disadvantages analysis

Partner algorithm

Linux memory management algorithm in order to quickly respond to requests, improve memory utilization as much as possible and reduce external memory fragmentation, Buddy System algorithm is introduced. The algorithm groups all the free pages into 11 linked lists, each of which contains 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous frame blocks respectively. The physical address of the first memory page of each frame block is an integer multiple of the size of the block. The characteristics of a partner are as follows: two blocks are of the same size, the addresses of the two blocks are consecutive, and the physical address of the first page box of the first block is an integer multiple of the total size of the two blocks (both belong to the same block, blocks 1 and 2 are partners, blocks 3 and 4 are partners, but blocks 2 and 3 are not partners). The details of memory allocation and memory release are available at Google.

Advantages:

  • Better resolution of external fragmentation issues cannot be completely resolved.
  • For large memory allocation designs, continuous memory can be allocated quickly.

Disadvantages:

  • The merge requirements are so strict that only blocks that satisfy the partnership can be merged.
  • If only one page in a contiguous memory block is occupied, the entire memory is not eligible for consolidation.
  • Algorithm page continuity is poor, DMA request large contiguous physical memory space may fail, when requiredCMAWe have a Contiguous Memory Allocator.
  • Space waste can be solved by using slab and kmem_cache.

The data structure

The Stupid allocator uses interval trees to organize data structures and efficiently manage offsets (length).

class StupidAllocator : public Allocator { CephContext* cct; STD ::mutex lock; // Total free space int64_t num_free; // Uint64_t last_alloc; STD ::vector<interval_set_t> free; STD ::vector<interval_set_t> free; // extent: offset, length typedef mempool::bluestore_alloc::pool_allocator< pair<const uint64_t, uint64_t>> allocator_t; // Stores extents in an orderly btree map. typedef btree::btree_map<uint64_t, uint64_t, std::less<uint64_t>, allocator_t> interval_set_map_t; // Interval tree. The main operations are insert, erase, etc. typedef interval_set<uint64_t, interval_set_map_t> interval_set_t; };Copy the code

The subscript of each interval tree is index(0-9), and index(1-9) represents the space size: [2^(index-1) * bdev_block_size, 2^(index) * bdev_block_size),

  • free[0]: [0, 4k)
  • free[1]: [4k, 8k)
  • free[2]: [8k, 16k)
  • free[3]: [16k, 32k)
  • free[4]: [32k, 64k)
  • free[5]: [64k, 128k)
  • free[6]: [128k, 256k)
  • free[7]: [256k, 512k)
  • free[8]: [512k, 1024k)
  • free[9]: [1024k, 2048k)

Initialize the

After the Stupid Allocator is initialized, the caller adds or deletes free space to the Allocator.

// Add free space void StupidAllocator::init_add_free(uint64_t offset, uint64_t length) {STD ::lock_guard< STD ::mutex> l(lock); // Insert space into free _insert_free(offset, length); Num_free += length; Void StupidAllocator::init_rm_free(uint64_t offset, uint64_t length) {STD ::lock_guard< STD ::mutex> l(lock); interval_set_t rm; rm.insert(offset, length);for(unsigned i = 0; i < free.size() && ! rm.empty(); ++i) { interval_set_t overlap; overlap.intersection_of(rm, free[i]); // Delete the corresponding spaceif(! overlap.empty()) { free[i].subtract(overlap); rm.subtract(overlap); } } num_free -= length; // Update available space}Copy the code

Insert the delete

Interval tree implementation code:

https://github.com/ceph/ceph/blob/master/src/include/interval_set.h

Insert:

https://github.com/ceph/ceph/blob/master/src/include/interval_set.h#L445

Erase:

https://github.com/ceph/ceph/blob/master/src/include/interval_set.h#L516

The core implementation is to insert and delete an interval into the interval tree as follows:

Extent is inserted into the interval tree

// According to the length of the interval, select the interval tree to be stored. The greater the length, the greater the bin value. unsigned StupidAllocator::_choose_bin(uint64_t orig_len) { uint64_t len = orig_len / cct->_conf->bdev_block_size; // cbits = (sizeof(v) * 8) -__builtin_clzll (v) // __builtin_clzll returns the number of leading zeros // Cbits results in the subscript of the highest bit 1 (starting from 0), len larger, Int bin = STD ::min(int)cbits(len), (int)free.size() -1);returnbin; } void StupidAllocator::_insert_free(uint64_t off, uint64_t len) {while (true) {// Free space to insert interval tree free[bin]. Insert (off, len, &off, &len); unsigned newbin = _choose_bin(len);if (newbin == bin)
            break; Bin may need to be adjusted. In this case, the old one needs to be deleted, and then the new one needs to be inserted. It merges behind the original interval. free[bin].erase(off, len); bin = newbin; }}Copy the code

Recalling the partner algorithm in section 1, there are differences between the two methods of merging:

  • The buddy algorithm is more stringent, see section 1.
  • Stupid extents are loosely merged, as long as the two extents are contiguous.

Delete Extent from the range tree

Deleting extents from an interval tree is relatively simple. You can delete the extents from the original extents and calculate whether the final extents fall into another interval tree. If the extents fall into another interval tree, delete the extents from the interval tree and add a new one to the interval tree.

Space allocation

The function of space allocation is defined as follows:

allocate(uint64_t want_size, uint64_t alloc_unit, 
        uint64_t max_alloc_size, int64_t hint,PExtentVector* extents);

allocate_int(uint64_t want_size, uint64_t alloc_unit, int64_t hint,
                         uint64_t* offset, uint32_t* length)Copy the code

Among themhintIs an important parameter, indicating that the start address assigned should be as large as possible as the hint value.

The core process consists of four 2-layer FOR loops roughly as follows: The contiguous space with want_size greater than or equal to want_size is first allocated from the hint address to the high-level interval tree. If no contiguous space with length greater than or equal to alloc_unit is first allocated from the hint address to the low-level interval tree.

A simple space allocation diagram looks like this:



The detailed flow chart of space allocation is as follows:


Space recycling

The function of space release is defined as follows:

release(const interval_set<uint64_t> &release_set)Copy the code

The process is very simple, first lock, then loop call _insert_free to insert into the corresponding interval tree, will involve the merge of adjacent free space, but will lead to the problem of space fragmentation allocation.

Advantages and disadvantages analysis

CPU Cache

Stupid uses BtreeMap to store a series of extents. Memory may not be contiguous. In addition, when space is allocated to traverse the interval tree, although extents in the interval tree are orderly, memory may not be contiguous or the memory span of two adjacent extents may be large. As a result, cpu-cache prefetch cannot reach the next Extent, and cpu-cache cannot be well used.

The Bitmap allocator initializes the three layers at BlueStore initialization, and the size is fixed, and the space allocation is sequential, making full use of cpu-cache capabilities. Thus improving the performance of the allocator.

Pseudo-space debris

Extent-based Stupid allocator has pseudo-space fragmentation (physical space is continuous, but the allocator is not) :

A 24K continuous space, after six 4K allocations and six out-of-order 4K releases, may become8K + 4K + 8K + 4KFour Spaces.

Among them, two 4K sections fall into different interval trees because they are the same size as the surrounding blocks, which makes it difficult to be merged. The continuous space of 24K turns into four discontinuous Spaces.

The Bitmap allocator does not have this problem because it allocates all the memory of the 3 layers at initialization, and the 3 layers are all in order and the space allocation is traversed in sequence. It is ok to set the corresponding bits when freeing the space without affecting the continuity.

According to Bitmap author’s performance comparison experiment, Bitmap allocator is better than Stupid. After Bitmap is stable, BlueStore’s default allocator can be set to Bitmap.

Author: Shi Mingya

  • [Now register didi Cloud, have the opportunity to get 30 yuan didi travel voucher without threshold] Console – Didi Cloud
  • [New purchase of cloud services 50% off in January, 4.5% off in March, as low as 40% off in June] Didi Cloud – for developers
  • [Didi Cloud messenger recruitment, recommended maximum rebate of 50%]