1 Bucket Index Background Overview

By default, each bucket index is stored in a single shard file (the number of shards is 0), with the index entries kept as omap keys in LevelDB. As the number of objects in a bucket grows, so does the size of that shard file. An oversized shard file can cause a variety of problems, the most common of which are:

  1. When the RADOS object backing a shard grows too large, operations on it put heavy pressure on the underlying storage device, leading to I/O request timeouts.

  2. Deep-scrub of an oversized shard object takes too long and blocks requests, causing large numbers of HTTP request timeouts and 50x errors and hurting the availability of the whole RGW service.

  3. When data has to be recovered after a disk failure or an OSD fault, recovering a huge shard file can exhaust the storage node's performance, and OSD response timeouts may even trigger an avalanche.

This article focuses on how to optimize the shard file size of a single bucket. Due to the current RGW index architecture, the shard problem can only be mitigated rather than eliminated (alternatively, you can use an indexless bucket).

For an introduction to indexless buckets and how to use them, see: www.ksingh.co.in/blog/2017/0…

For more background on the big-index issue, see: cephnotes.ksperis.com/blog/2015/0…

2 Basic idea of index shard optimization

The following points come from practical experience and are for your reference:

  1. The index pool must be on SSDs. This is the precondition for the optimizations in this article; without the hardware support, the operations below will not help much.

  2. Set a proper number of shards for the bucket

  • More shards are not always better. With too many shards, operations such as listing a bucket consume a large amount of I/O on the underlying storage, and some requests take noticeably longer.

  • The number of shards should also take into account your OSD failure isolation domain and the configured number of replicas. For example, if the index pool size (replica count) is 2 and you have two cabinets with 24 OSD nodes in total, ideally the two copies of each shard should sit in different cabinets. If you set the shard count to 8, a total of 16 shard copies have to be stored, and those 16 copies should be spread evenly across the 2 cabinets. By the same reasoning, going beyond 24 shards is clearly inappropriate here.

  3. Control the average size of each bucket index shard. The current recommendation is that a single shard hold roughly 100,000 to 150,000 (10-15W) object entries. If a shard holds far more than that, perform a separate reshard operation on the affected bucket (note that resharding is risky and should be used with caution). For example, if you expect a bucket to hold at most 1,000,000 (100W) objects, then 1,000,000 / 8 = 125,000 objects per shard, so a shard count of 8 is reasonable. Each omap key record in a shard file occupies about 200 bytes, so 150,000 * 200 / 1024 / 1024 ≈ 28.61 MB; in other words, a single shard file should be kept within roughly 28 MB. (See the small sizing sketch after this list.)

  4. The upper limit on the number of objects in each bucket should be enforced at the service level. It is recommended that each shard file hold an average of 100,000 to 150,000 (10-15W) objects.
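
As a quick aid, the arithmetic above can be written as a small shell sketch. This is only an illustration of the rule of thumb: the 200-bytes-per-key and 10-15W-objects-per-shard figures are the ones recommended in this article, and the expected object count and shard count below are example values.

EXPECTED_OBJECTS=1000000   # e.g. a bucket expected to hold at most 100W objects
SHARDS=8                   # chosen shard count
BYTES_PER_KEY=200          # approximate size of one omap key record

echo "objects per shard:   $(( EXPECTED_OBJECTS / SHARDS ))"              # 125000, within the 10-15W guideline
echo "max shard size (MB): $(( 150000 * BYTES_PER_KEY / 1024 / 1024 ))"   # about 28 MB if a shard reaches 15W entries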

3 How to determine shard requirements for an online cluster

Check the index pool name

root@demo:/home/user# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    92114M     88218M     3895M        4.23
POOLS:
    NAME                        ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd                         0      131       0         87674M        2
    .rgw.root                   16     324       0         87674M        2
    .zone.rgw.root              17     1279      0         87674M        2
    .zone.rgw.domain            18     1480      0         87674M        8
    .zone.rgw.control           19     0         0         87674M        8
    .zone.rgw.gc                20     0         0         87674M        32
    .zone.rgw.buckets.index     21     0         0         87674M        32        #index pool
    .zone.rgw.buckets.extra     22     0         0         87674M        54
    .zone.rgw.buckets           23     768M
    .zone.intent-log            25     0         0         87674M        0
    .zone.usage                 26     0         0         87674M        2
    .zone.users                 27     34        0         87674M        3
    .zone.users.email           28     0         0         87674M        0
    .zone.users.swift           29     0         0         87674M        0
    .zone.users.uid             30     1013      0         87674M        5

View bucket shard settings

This section describes shard parameters

bucket_index_max_shards                  # number of index shards per bucket, set in the region/zone configuration (cluster mode)
rgw_override_bucket_index_max_shards     # per-RGW override used in single-machine mode, default 0 (no sharding)
# Both parameters only take effect for newly created buckets, and all RGW services
# must be restarted for a change to apply. Do not adjust them on a production system
# that is already online except in special cases.

The cluster mode is used as an example

Get shard settings in cluster mode

root@demo:/home/user# radosgw-admin region get --name client.radosgw.zone1
{
    "name": "zone",
    "api_name": "zone",
    "is_master": "true",
    "endpoints": [
        "http:\/\/demo.ceph.work:80\/"
    ],
    "hostnames": [],
    "master_zone": "zone1",
    "zones": [
        {
            "name": "zone1",
            "endpoints": [
                "http:\/\/demo.ceph.work:80\/"
            ],
            "log_meta": "true",
            "log_data": "true",
            "bucket_index_max_shards": 8        #shard count = 8
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement"
}
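
For reference, the following is a sketch of how bucket_index_max_shards is typically changed in this kind of region/zone (federated) setup. The file name is a placeholder, the new value only applies to buckets created afterwards, and as noted earlier this should not be done casually on a production system.

# dump the current region configuration to a file (placeholder name region.json)
radosgw-admin region get --name client.radosgw.zone1 > region.json
# edit region.json, set "bucket_index_max_shards" to the desired value, then load it back
radosgw-admin region set --infile region.json --name client.radosgw.zone1
radosgw-admin regionmap update --name client.radosgw.zone1
# finally restart every RGW instance for the change to take effect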

Get shard settings in single-machine mode

root@demo:/home/user# ceph --admin-daemon /home/ceph/var/run/ceph-client.radosgw.zone1.asok config show|grep rgw_override_bucket_index_max_shards
    "rgw_override_bucket_index_max_shards": "0",
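
In single-machine mode the override is simply a ceph.conf option for the RGW instance; a minimal sketch (the section name follows the examples in this article) looks like the following, and it likewise only affects newly created buckets and requires an RGW restart.

# ceph.conf fragment (sketch)
[client.radosgw.zone1]
    rgw_override_bucket_index_max_shards = 8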

View the bucket list

root@demo:/home/user# radosgw-admin bucket list --name client.radosgw.zone1
[
    "multi-upload",
    "demo-abc",
    "test1",
    "user-bucket1"
]

Get the ID of the multi-upload bucket

root@demo:/home/user# radosgw-admin bucket stats --bucket=multi-upload --name client.radosgw.zone1
{
    "bucket": "multi-upload",
    "pool": ".zone.rgw.buckets",
    "index_pool": ".zone.rgw.buckets.index",
    "id": "zone1.14214.10",        #bucket ID
    "marker": "zone1.14214.10",
    "owner": "u-user",
    "ver": "0#1,1#3,2#345,3#1,4#1,5#1,6#681,7#5",
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0",
    "mtime": "2017-05-05 10:23:12.000000",
    "max_marker": "0#,1#00000000002.20.3,2#00000000344.367.3,3#,4#,5#,6#00000000680.711.3,7#00000000004.23.3",
    "usage": {
        "rgw.main": {
            "size_kb": 724947,
            "size_kb_actual": 725308,
            "num_objects": 114
        },
        "rgw.multimeta": {
            "size_kb": 0,
            "size_kb_actual": 0,
            "num_objects": 51
        }
    },
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}

The bucket has 8 shards, so there are 8 index objects, numbered 0 through 7.

root@demo:/home/user# rados ls -p .zone.rgw.buckets.index | grep zone1.14214.10
.dir.zone1.14214.10.5
.dir.zone1.14214.10.1
.dir.zone1.14214.10.7
.dir.zone1.14214.10.6
.dir.zone1.14214.10.4
.dir.zone1.14214.10.3
.dir.zone1.14214.10.0
.dir.zone1.14214.10.2

The total number of omap keys across all of this bucket's shards is 1329. Each omap key occupies roughly 200 bytes, so the 1329 keys take up about 1329 * 200 = 265800 bytes of disk space in total.

root@demo:/home/ceph/var/lib/osd/ceph-2# rados ls -p .zone.rgw.buckets.index | grep "zone1.14214.10" | awk '{print "rados listomapkeys -p .zone.rgw.buckets.index "$1}' | sh -x | wc -l
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.5
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.1
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.7
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.6
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.4
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.3
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.0
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.2
1329

Check the number of omap keys in each individual bucket shard file, using .dir.zone1.14214.10.6 as an example: 926 * 200 = 185200 bytes ≈ 180 KB, which meets the requirement.

root@demo:/home/ceph/var/lib/osd/ceph-2# rados ls -p .zone.rgw.buckets.index | grep "zone1.14214.10" | awk '{print "rados listomapkeys -p .zone.rgw.buckets.index "$1" | wc -l"}' | sh -x
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.5
+ wc -l
0
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.1
+ wc -l
3
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.7
+ wc -l
5
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.6
+ wc -l
926
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.4
+ wc -l
0
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.3
+ wc -l
0
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.0
+ wc -l
0
+ rados listomapkeys -p .zone.rgw.buckets.index .dir.zone1.14214.10.2
+ wc -l
395
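
The per-shard counts can also be sorted to make the distribution easier to read. The following small sketch is built from the same rados commands used above (pool name and bucket ID are the ones from this example):

for s in $(rados ls -p .zone.rgw.buckets.index | grep zone1.14214.10); do
    echo "$(rados listomapkeys -p .zone.rgw.buckets.index $s | wc -l) $s"
done | sort -rn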

If you look carefully, you will notice that the shard distribution is very uneven. This is because the shard for each object is chosen by computing hash(object_name) and then taking it modulo the number of shards, so when object names do not hash evenly, the bucket's index entries are not spread evenly across the shards. The corresponding source is shown below.

int RGWRados::get_bucket_index_object(const string& bucket_oid_base, const string& obj_key,
                                      uint32_t num_shards, RGWBucketInfo::BIShardsHashType hash_type,
                                      string *bucket_obj, int *shard_id)
{
  int r = 0;
  switch (hash_type) {
    case RGWBucketInfo::MOD:
      if (!num_shards) {
        // By default with no sharding, we use the bucket oid as itself
        (*bucket_obj) = bucket_oid_base;
        if (shard_id) {
          *shard_id = -1;
        }
      } else {
        uint32_t sid = ceph_str_hash_linux(obj_key.c_str(), obj_key.size());
        uint32_t sid2 = sid ^ ((sid & 0xFF) << 24);
        sid = sid2 % MAX_BUCKET_INDEX_SHARDS_PRIME % num_shards;
        char buf[bucket_oid_base.size() + 32];
        snprintf(buf, sizeof(buf), "%s.%d", bucket_oid_base.c_str(), sid);
        (*bucket_obj) = buf;
        if (shard_id) {
          *shard_id = (int)sid;
        }
      }
      break;
    default:
      r = -ENOTSUP;
  }
  return r;
}
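
If you want to confirm which shard a particular object's index entry ended up in without re-implementing the hash above, a brute-force sketch using the same rados commands works for simple cases (for plain objects the omap key is normally just the object name; the object name below is a made-up example):

OBJ="somefile.dat"   # hypothetical object name
for shard in $(rados ls -p .zone.rgw.buckets.index | grep zone1.14214.10); do
    if rados listomapkeys -p .zone.rgw.buckets.index "$shard" | grep -qx "$OBJ"; then
        echo "$OBJ -> $shard"
    fi
done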

4 How to reduce the impact of an oversized index shard

To be continued in the next article.