“This is the 35th day of my participation in the November Gwen Challenge. See details of the event: The Last Gwen Challenge 2021”.
1. HBase optimization
1.1. High availability
In HBase, the HMaster monitors the lifecycle of each HRegionServer and balances the load across HRegionServers. If the HMaster goes down, the entire HBase cluster falls into an unhealthy state and cannot keep working for long. Therefore, HBase supports a high-availability (HA) configuration for the HMaster.
-
Stop the HBase cluster. (If the HBase cluster is not enabled, skip this step.)
[moe@hadoop102 conf]$ bin/stop-hbase.sh
-
Create the backup-masters file in the conf directory
[moe@hadoop102 conf]$ vim backup-masters
-
Configure the backup HMaster nodes in the backup-masters file
hadoop103
hadoop104
-
Distribute backup-masters to the other nodes
[moe@hadoop102 conf]$ xsync backup-masters
-
Start the HBase cluster
[moe@hadoop102 conf]$ bin/start-hbase.sh
-
Open the HBase web UI to verify the backup masters
1.2. Pre-division
Each region maintains a StartRow and an EndRow. If newly added data falls within the RowKey range maintained by a region, the data is handed to that region. Based on this principle, you can plan in advance the partitions that data will be written to, which improves HBase performance.
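To make the routing concrete, here is a small Python sketch (not HBase code) that maps a RowKey to a region index given the split points used in the examples below; HBase compares keys byte-wise, which for these ASCII keys matches lexicographic order:

```python
import bisect

# Split points as in SPLITS => ['1000','2000','3000','4000'], which create
# 5 regions: (-inf,1000), [1000,2000), [2000,3000), [3000,4000), [4000,+inf)
split_points = ["1000", "2000", "3000", "4000"]

def region_for(rowkey: str) -> int:
    """Return the index of the region whose [StartRow, EndRow) range holds rowkey."""
    # bisect_right counts the split points <= rowkey, which is the region index.
    return bisect.bisect_right(split_points, rowkey)

print(region_for("0999"))  # 0 -> first region, EndRow '1000'
print(region_for("1000"))  # 1
print(region_for("2500"))  # 2
print(region_for("9999"))  # 4 -> last region, no EndRow
```

Data whose RowKey already falls inside a region's range never causes a move; pre-splitting simply decides these boundaries before any data arrives.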
-
Manually set pre-division
hbase> create 'staff1','info','partition1',SPLITS => ['1000','2000','3000','4000']
-
Generate hexadecimal sequence pre-partitioning
create 'staff2','info','partition2',{NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
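The idea behind HexStringSplit is to cut the space of fixed-width hex strings into evenly sized ranges. A simplified Python sketch of that idea (not HBase's exact implementation) looks like this:

```python
def hex_string_splits(num_regions: int, width: int = 8) -> list[str]:
    """Evenly spaced hex split keys over 00000000..ffffffff, in the spirit
    of HexStringSplit. Simplified sketch, not the HBase source algorithm."""
    max_val = 16 ** width
    step = max_val // num_regions
    # num_regions regions need num_regions - 1 split keys.
    return [format(i * step, f"0{width}x") for i in range(1, num_regions)]

splits = hex_string_splits(15)
print(len(splits))             # 14 split keys -> 15 regions
print(splits[0], splits[-1])   # 11111111 eeeeeeee
```

This only balances data if RowKeys are themselves uniformly distributed hex strings (e.g. hashed keys, as in section 1.3).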
-
Prepartition according to the rules set in the file
The contents of the splits.txt file are as follows:
aaaa
bbbb
cccc
dddd
create 'staff3','partition3',SPLITS_FILE => 'splits.txt'
-
Create pre-partitions using the Java API
// A custom algorithm generates a series of hash values stored in a two-dimensional array
byte[][] splitKeys = ...; // some hash function
// Create an HBaseAdmin instance
HBaseAdmin hAdmin = new HBaseAdmin(HBaseConfiguration.create());
// Create an HTableDescriptor instance
HTableDescriptor tableDesc = new HTableDescriptor(tableName);
// Create the pre-partitioned HBase table from the descriptor and the split keys
hAdmin.createTable(tableDesc, splitKeys);
1.3 RowKey Design
A RowKey uniquely identifies a piece of data. Which partition the data is stored in depends on which pre-split range its RowKey falls into. The goal of RowKey design is to distribute data evenly across all regions and prevent data skew to some extent. Let's look at common RowKey design approaches.
-
Generate random numbers or hash values
For example:
The original RowKey 1001 becomes dd01903921ea24941c26a48f2cec24e0bb0e8cc7 after SHA1.
The original RowKey 3001 becomes 49042c54de64a1e9bf0b33e00245660ef92dc7bd after SHA1.
The original RowKey 5001 becomes 7b61dec07e02c188790670af43e717f0f46e8913 after SHA1.
Before doing this, we usually draw a sample from the data set to decide which hashed RowKeys to use as the split thresholds of each partition.
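The hash-then-sample workflow can be sketched in Python; the sample values and region count below are made up for illustration:

```python
import hashlib

def hashed_key(rowkey: str) -> str:
    """Replace the original RowKey with its SHA1 hex digest."""
    return hashlib.sha1(rowkey.encode("utf-8")).hexdigest()

# Draw a sample of real keys, hash them, then pick evenly spaced hashed
# keys as the split thresholds for the pre-partitioned table.
sample = [str(i) for i in range(1000, 6000, 500)]
hashed = sorted(hashed_key(k) for k in sample)
num_regions = 5
splits = [hashed[i * len(hashed) // num_regions] for i in range(1, num_regions)]
print(len(splits))  # 4 split keys -> 5 regions
```

Because SHA1 digests are effectively uniform, the resulting regions receive roughly equal shares of future writes.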
-
String inversion
20170524000001 becomes 10000042507102
20170524000002 becomes 20000042507102
This also spreads sequentially inserted data across regions to some extent.
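The reversal above is a one-liner in Python; the fast-changing tail of the timestamp becomes the prefix, so consecutive keys land in different regions:

```python
def reverse_key(rowkey: str) -> str:
    """Reverse the RowKey so the fast-changing tail becomes the prefix."""
    return rowkey[::-1]

print(reverse_key("20170524000001"))  # 10000042507102
print(reverse_key("20170524000002"))  # 20000042507102
```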
-
String splicing
20170524000001_a12e
20170524000001_93i7
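A minimal sketch of this splicing, assuming a random four-character hex suffix (the suffix scheme here is illustrative; real designs often use a deterministic salt so keys can be found again):

```python
import random
import string

def salted_key(rowkey: str) -> str:
    """Append a short random suffix, e.g. 20170524000001_a12e."""
    suffix = "".join(random.choices(string.hexdigits.lower()[:16], k=4))
    return f"{rowkey}_{suffix}"

key = salted_key("20170524000001")
print(key)  # e.g. 20170524000001_7f3a
```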
1.4. Memory optimization
HBase operations require a large amount of memory, since tables can be cached in memory. Typically about 70% of the available memory is given to HBase's Java heap. However, a very large heap is not recommended, because the RegionServer becomes unavailable for a long time during GC; 16 to 48 GB is a common range. If the frameworks' memory usage leaves the system short of memory, their processes may be killed by the OS.
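The heap size is set in hbase-env.sh; the value below is only an example within the 16-48 GB range mentioned above and must be tuned for your cluster:

```shell
# hbase-env.sh -- illustrative value, tune for your machine's memory
export HBASE_HEAPSIZE=16G
```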
1.5. Basic optimization
-
Allow appending content to HDFS files
hdfs-site.xml, hbase-site.xml
Property: dfs.support.append
Description: Enabling HDFS append synchronization supports HBase data synchronization and persistence. The default value is true.
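In config-file form, the property above would be set like this (the same XML pattern applies to the other properties in this section):

```xml
<!-- hdfs-site.xml (mirrored in hbase-site.xml) -->
<property>
    <name>dfs.support.append</name>
    <value>true</value>
</property>
```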
-
Optimize the maximum number of open files allowed by DataNode
hdfs-site.xml
Property: dfs.datanode.max.transfer.threads
Description: HBase usually operates on a large number of files at the same time. Set this to 4096 or higher depending on cluster size and data volume. Default value: 4096.
-
Optimize the wait time for data operations with high latency
hdfs-site.xml
Property: dfs.image.transfer.timeout
Description: If a data operation has very high latency and the socket needs to wait longer, set this to a larger value (the default is 60000 ms) to ensure the socket is not timed out.
-
Optimize data write efficiency
mapred-site.xml
Properties: mapreduce.map.output.compress and mapreduce.map.output.compress.codec
Description: Enabling these two properties can greatly improve file write efficiency and reduce write time. Set the first property to true and the second to org.apache.hadoop.io.compress.GzipCodec or another compression codec.
-
Set the number of RPC listeners
hbase-site.xml
Property: hbase.regionserver.handler.count
Description: Specifies the number of RPC listeners; the default value is 30. Adjust it to the client load: increase it when there are many read and write requests.
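As a config fragment (the value 100 is an illustrative example for a busy cluster, not a recommendation from this article):

```xml
<!-- hbase-site.xml -->
<property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
</property>
```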
-
Optimize HStore file size
hbase-site.xml
Property: hbase.hregion.max.filesize
Description: The default value is 10737418240 (10 GB). Consider lowering it if HBase MR jobs will run, because one region corresponds to one map task. When a region's HFiles reach this size, the region is split in two.
-
Optimize the HBase client cache
hbase-site.xml
Property: hbase.client.write.buffer
Description: Specifies the HBase client write buffer size. Increasing it reduces the number of RPC calls but consumes more memory, and vice versa. Generally, set a reasonable buffer size to reduce the number of RPCs.
-
Specify the number of rows fetched by scan.next
hbase-site.xml
Property: hbase.client.scanner.caching
Description: Specifies the default number of rows returned by the scan.next method. The larger the value, the greater the memory consumption.
-
Flush, compact, split mechanisms
When a MemStore reaches its threshold, its data is flushed to a StoreFile. The compact mechanism merges small flushed files into a larger StoreFile. Split: when a region reaches its threshold, the oversized region is split in two.
Attributes involved:
That is, 128 MB is the default threshold of Memstore
hbase.hregion.memstore.flush.size: 134217728
That is, if the total size of all MemStores in a single HRegion exceeds the specified value, all MemStores in that HRegion are flushed. A RegionServer processes flushes asynchronously by adding requests to a queue, following a producer-consumer model. A problem arises when the queue cannot be consumed fast enough and requests pile up: memory usage can surge and, at worst, trigger an OOM.
hbase.regionserver.global.memstore.upperLimit: 0.4
hbase.regionserver.global.memstore.lowerLimit: 0.38
That is, when the total memory used by MemStores reaches the value specified by hbase.regionserver.global.memstore.upperLimit, multiple MemStores are flushed to files. MemStore flushes execute in descending order of size until the memory used by MemStores is slightly below lowerLimit.
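The upperLimit/lowerLimit policy just described can be sketched as a small simulation; this is an illustration of the described behavior, not the HBase source:

```python
def flush_until_below(memstores: dict[str, int], total_capacity: int,
                      upper: float = 0.4, lower: float = 0.38) -> list[str]:
    """Once total MemStore usage crosses upper * capacity, flush MemStores
    largest-first until usage drops below lower * capacity.
    Returns the names of flushed MemStores, in flush order."""
    flushed = []
    used = sum(memstores.values())
    if used < upper * total_capacity:
        return flushed  # under the upper limit: nothing to do
    for name in sorted(memstores, key=memstores.get, reverse=True):
        if used < lower * total_capacity:
            break
        used -= memstores[name]
        flushed.append(name)
    return flushed

# 1000 MB of MemStore capacity; usage 450 MB > 400 MB -> flush biggest first
stores = {"r1": 200, "r2": 150, "r3": 100}  # hypothetical sizes in MB
print(flush_until_below(stores, 1000))  # ['r1'] -- one flush already drops below 380
```

Flushing largest-first frees the most memory per flush, which is why a single flush is often enough to fall back under the lower watermark.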
2. Related links
Big data HBase Learning Journey 3
Big data HBase Learning Journey Part 2
Big data HBase Learning Journey 1