Introduction to HBase

HBase is a distributed, scalable NoSQL database that supports massive data storage.

Logical structure

Storage structure

Architecture diagram

Writing process

  1. The client accesses ZooKeeper and obtains the RegionServer that hosts the hbase:meta table.
  2. The client accesses that RegionServer and reads hbase:meta: using the requested namespace:table/RowKey, it looks up the region (and the RegionServer) where the target data resides, then caches the region and meta-table information in its meta cache for later accesses.
  3. The client communicates with the target RegionServer.
  4. The data is appended sequentially to the WAL.
  5. The data is written to the corresponding MemStore, where it is kept sorted.
  6. An ACK is sent to the client.
  7. The data is written to an HFile when the MemStore flush event fires (a client-side sketch of this path follows).
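
The write path can be observed from the client side. Below is a minimal sketch using the same old-style client API as the examples later in this post; the table, row, and column names are made up for illustration, and the Durability line is optional (syncing the WAL is the default behavior):

// The RegionServer appends the edit to the WAL (step 4) before updating
// the MemStore (step 5); put() returns once the server ACKs (step 6).
HTable htable = new HTable(conf, "student");   // hypothetical table name
Put put = new Put(Bytes.toBytes("row1"));      // hypothetical row key
put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("zhangsan"));
put.setDurability(Durability.SYNC_WAL);        // sync the WAL before ACK
htable.put(put);
htable.close();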

MemStore flush
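
A MemStore is flushed to a new HFile when it reaches its configured threshold (hbase.hregion.memstore.flush.size, 128 MB by default) or when overall RegionServer memory pressure forces it. A flush can also be requested manually; a minimal sketch with the same old-style HBaseAdmin API used below (the table name is illustrative):

// Force all MemStores of a table to be flushed to HFiles.
HBaseAdmin admin = new HBaseAdmin(conf);
admin.flush("student"); // hypothetical table name
admin.close();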

Reading process

  1. The client accesses ZooKeeper and obtains the RegionServer that hosts the hbase:meta table.
  2. The client accesses that RegionServer and reads hbase:meta: using the requested namespace:table/RowKey, it looks up the region (and the RegionServer) where the target data resides, then caches the region and meta-table information in its meta cache for later accesses.
  3. The client communicates with the target RegionServer.
  4. The target data is looked up in the Block Cache, MemStore, and StoreFiles, and everything found is merged. "Everything" here means different versions (timestamps) or different types (Put/Delete) of the same cell.
  5. Data blocks (the HFile storage unit, 64 KB by default) read from files are cached in the Block Cache.
  6. The merged result is returned to the client (a versioned-read sketch follows).
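
Since a read merges every version of a cell it finds, the client can ask for more than the latest version. A minimal sketch (old-style API, names illustrative):

// Fetch up to 3 versions of each cell; the server merges Block Cache,
// MemStore and StoreFile data (steps 4-6) before returning.
HTable htable = new HTable(conf, "student"); // hypothetical table name
Get get = new Get(Bytes.toBytes("row1"));
get.setMaxVersions(3);
Result result = htable.get(get);
for (Cell cell : result.rawCells()) {
    System.out.println(Bytes.toString(CellUtil.cloneValue(cell)) + " @ " + cell.getTimestamp());
}
htable.close();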

StoreFile Compaction

Because MemStore generates a new HFile every time it flushes, and different versions (timestamps) and different types (Put/Delete) of the same cell may end up in different HFiles, a query has to traverse all HFiles. To reduce the number of HFiles and to clean up expired or deleted data, StoreFile compaction is performed.
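
A minor compaction merges a few adjacent HFiles without removing deleted cells, while a major compaction rewrites all HFiles of a store into one and drops expired and deleted data. A major compaction can also be triggered by hand; a sketch with the old-style API (table name illustrative):

// Request a major compaction; the call is asynchronous and the
// RegionServer schedules the actual rewrite.
HBaseAdmin admin = new HBaseAdmin(conf);
admin.majorCompact("student"); // hypothetical table name
admin.close();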

Region Split
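
A region splits once it grows past hbase.hregion.max.filesize (10 GB by default; see the configuration list at the end of this post), and it can also be split manually at a chosen key. A sketch with the old-style API (names illustrative):

// Manually split a table's region at an explicit split point.
HBaseAdmin admin = new HBaseAdmin(conf);
admin.split("student", "row500"); // hypothetical table name and split key
admin.close();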

Installation and deployment

Start ZooKeeper and Hadoop first:

zookeeper-3.4.10/bin/zkServer.sh start
hadoop-2.7.2/sbin/start-dfs.sh
hadoop-2.7.2/sbin/start-yarn.sh

Then in hbase-env.sh:

export JAVA_HOME=/opt/module/jdk1.8.0_144
export HBASE_MANAGES_ZK=false

hbase-site.xml:
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hadoop102:9000/HBase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- Changed after 0.98: the HMaster port; the old default was 60000 -->
    <property>
        <name>hbase.master.port</name>
        <value>16000</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop102,hadoop103,hadoop104</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/opt/module/zookeeper-3.4.10/zkData</value>
    </property>
</configuration>

Then start HBase:

bin/hbase-daemon.sh start master
bin/hbase-daemon.sh start regionserver

HBase API

// Create a connection configuration (the old 0.98-style client API is used throughout)
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "192.168.5.102");
conf.set("hbase.zookeeper.property.clientPort", "2181");

// Check whether a table exists
HBaseAdmin admin = new HBaseAdmin(conf);
admin.tableExists(tableName);

// Create a table with the given column families
HTableDescriptor descriptor = new HTableDescriptor(TableName.valueOf(tableName));
for (String cf : columnFamily) {
    descriptor.addFamily(new HColumnDescriptor(cf));
}
admin.createTable(descriptor);

// Delete a table (it must be disabled first)
admin.disableTable(tableName);
admin.deleteTable(tableName);

// Insert a single cell
HTable htable = new HTable(conf, tableName);
Put put = new Put(Bytes.toBytes(rowKey));
put.add(Bytes.toBytes(columnFamily), Bytes.toBytes(column), Bytes.toBytes(value));
htable.put(put);
htable.close();

// Delete multiple rows in one batch
HTable htable = new HTable(conf, tableName);
List<Delete> deleteList = new ArrayList<Delete>();
for (String row : rows) {
    Delete delete = new Delete(Bytes.toBytes(row));
    deleteList.add(delete);
}
htable.delete(deleteList);
htable.close();

// Scan the whole table and print every cell
HTable htable = new HTable(conf, tableName);
Scan scan = new Scan();
ResultScanner scanner = htable.getScanner(scan);
for (Result result : scanner) {
    Cell[] cells = result.rawCells();
    for (Cell cell : cells) {
        System.out.println("RowKey: " + Bytes.toString(CellUtil.cloneRow(cell)));
        System.out.println("Column Family: " + Bytes.toString(CellUtil.cloneFamily(cell)));
        System.out.println("Column: " + Bytes.toString(CellUtil.cloneQualifier(cell)));
        System.out.println("Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
    }
}
scanner.close();
htable.close();

// Get one row and print every cell
HTable htable = new HTable(conf, tableName);
Get get = new Get(Bytes.toBytes(rowKey));
// get.setMaxVersions();   // return all versions
// get.setTimeStamp(ts);   // return only the version at timestamp ts
Result result = htable.get(get);
for (Cell cell : result.rawCells()) {
    System.out.println("RowKey: " + Bytes.toString(result.getRow()));
    System.out.println("Column Family: " + Bytes.toString(CellUtil.cloneFamily(cell)));
    System.out.println("Column: " + Bytes.toString(CellUtil.cloneQualifier(cell)));
    System.out.println("Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
    System.out.println("Timestamp: " + cell.getTimestamp());
}
htable.close();

Optimization

  1. HMaster high availability
touch conf/backup-masters
echo hadoop102 > conf/backup-masters
  2. Pre-splitting: create the table with predefined split points so data is distributed across regions from the start.
byte[][] splitKeys = { Bytes.toBytes("1000"), Bytes.toBytes("2000"), Bytes.toBytes("3000") }; // illustrative split points (not in the original)
HBaseAdmin hAdmin = new HBaseAdmin(HBaseConfiguration.create());
HTableDescriptor tableDesc = new HTableDescriptor(tableName);
hAdmin.createTable(tableDesc, splitKeys);
  3. RowKey design: data is uniquely identified by its RowKey and is stored in the pre-split region whose key range covers that RowKey, so the goal of RowKey design is to prevent data skew. Common techniques (see the sketch after this list):
  • Salting with random numbers or hash values
  • String reversal
  • String concatenation
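
For example, a hash-derived salt prefix spreads monotonically increasing keys across the pre-split regions. A minimal sketch (the helper name and bucket count are made up for illustration):

// Prefix the key with a bucket computed from its hash so rows with
// sequential keys land in different pre-split regions.
static String saltRowKey(String key, int buckets) {
    int bucket = (key.hashCode() & Integer.MAX_VALUE) % buckets;
    return String.format("%02d_%s", bucket, key);
}
// saltRowKey("20230101_user42", 10) spreads such keys over 10 buckets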
  4. Memory optimization: HBase operation requires a large amount of memory, so generally about 70% of available memory can be allocated to HBase. Do not allocate too much, however, because garbage collection of an oversized heap takes a long time.

  5. Basic configuration optimization

// Allow appending to files in HDFS (hdfs-site.xml and hbase-site.xml)
dfs.support.append = true

// Raise the number of files a DataNode may serve at once (hdfs-site.xml)
dfs.datanode.max.transfer.threads = 4096 (default)

// Raise the wait time for data transfers so sockets do not time out (hdfs-site.xml)
dfs.image.transfer.timeout = 60000 (default, ms)

// Compress map output (mapred-site.xml)
mapreduce.map.output.compress = true
mapreduce.map.output.compress.codec = org.apache.hadoop.io.compress.GzipCodec

// For heavy reads and writes, raise the number of RPC listeners (hbase-site.xml)
hbase.regionserver.handler.count = 30 (default)

// When HBase feeds MR jobs, one region corresponds to one map task,
// so this value can be lowered to increase parallelism (hbase-site.xml)
hbase.hregion.max.filesize = 10737418240 (10 GB, default)

// A larger client write buffer reduces the number of RPC calls but consumes more memory (hbase-site.xml)
hbase.client.write.buffer

// Number of rows fetched by each scanner.next() call (hbase-site.xml)
hbase.client.scanner.caching
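
The last two settings can also be applied per table or per scan on the client side; a sketch with the old-style API (values illustrative):

// Batch puts: disable auto-flush and enlarge the write buffer so many
// small puts travel in one RPC.
HTable htable = new HTable(conf, tableName);
htable.setAutoFlush(false);
htable.setWriteBufferSize(6 * 1024 * 1024); // 6 MB, illustrative
// ... issue puts ...
htable.flushCommits(); // ship anything still buffered
htable.close();

// Fetch more rows per scanner.next() round trip.
Scan scan = new Scan();
scan.setCaching(500); // illustrative value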