HBase

Standalone installation

Environment

CentOS 7

HBase

Install the JDK

yum install java-1.8.0-openjdk* -y

Download HBase

Mirror.bit.edu.cn/apache/hbas…

Copy to Linux and extract

tar -xf hbase-2.2.6-bin.tar.gz
cd hbase-2.2.6

Set JAVA_HOME in the configuration file

vim conf/hbase-env.sh
# Note: this is the Java location on CentOS
export JAVA_HOME=/etc/alternatives/java_sdk_1.8.0/

Start HBase

./bin/start-hbase.sh

Check the web UI

Open http://localhost:16010 to view the HBase web UI and check whether HBase started successfully.

Client

Built-in client (HBase shell)

./bin/hbase shell
# Inside the shell, view the help
help

Create a table

You need to specify the table name and the column family name.

hbase(main):105:0> create 'mytest', 'lt'
Created table mytest
Took 0.7247 seconds
=> Hbase::Table - mytest

View the current table

hbase(main):001:0> list 'mytest'
TABLE
mytest
1 row(s)
Took 0.2895 seconds
=> ["mytest"]

View the table details

hbase(main):002:0> describe 'mytest'
Table mytest is ENABLED
mytest
COLUMN FAMILIES DESCRIPTION
{NAME => 'lt', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
QUOTAS
0 row(s)
Took 0.1686 seconds

Write data

Write four sample values

The values are not arbitrary; see the data model section below for an explanation.

hbase(main):003:0> put 'mytest','row1','lt:a','value1'
Took 0.0392 seconds
hbase(main):004:0> put 'mytest','row2','lt:b','value2'
Took 0.0042 seconds
hbase(main):005:0> put 'mytest','row3','lt:c','value3'
Took 0.0025 seconds
hbase(main):006:0> put 'mytest','row3','lt:d','value4'
Took 0.0054 seconds

View all the data in the table

hbase(main):007:0> scan 'mytest'
ROW    COLUMN+CELL
 row1  column=lt:a, timestamp=1609210379880, value=value1
 row2  column=lt:b, timestamp=1609210387527, value=value2
 row3  column=lt:c, timestamp=1609210396636, value=value3
 row3  column=lt:d, timestamp=1609210494696, value=value4
3 row(s)
Took 0.0188 seconds

View a single row

Case 1

hbase(main):008:0> get 'mytest', 'row1'
COLUMN  CELL
 lt:a   timestamp=1609210379880, value=value1
1 row(s)
Took 0.0202 seconds

Case 2

hbase(main):009:0> get 'mytest', 'row3'
COLUMN  CELL
 lt:c   timestamp=1609210396636, value=value3
 lt:d   timestamp=1609210494696, value=value4
1 row(s)
Took 0.0059 seconds

As you can see, row1, row2, and row3 are row keys. The data returned for row3 differs from row1 because two columns (lt:c and lt:d) were written to row3.

Other

Type help to view more commands.

HBase data model

Table

Corresponds to mytest in the example above.

An HBase table consists of multiple rows.

Row

Corresponds to row1… in the example above.

A row in HBase consists of a row key and one or more columns with values. Rows are sorted lexicographically by row key, so row key design is important. A good design keeps related rows close together. A common pattern is to reverse the domain name of a site, such as org.apache.www, org.apache.mail, and org.apache.jira, so that all Apache domains are stored near each other.
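The reversed-domain pattern can be illustrated with a few lines of Python. This is only a sketch of the sorting behavior, not HBase code: `reverse_domain` is a hypothetical helper, the host names are made up, and sorting UTF-8 bytes stands in for HBase's byte-wise row key ordering.

```python
def reverse_domain(host: str) -> str:
    """Turn 'www.apache.org' into 'org.apache.www'."""
    return ".".join(reversed(host.split(".")))


# Hypothetical host names used only for illustration.
hosts = ["mail.apache.org", "www.example.com", "www.apache.org", "jira.apache.org"]

# Compare keys as bytes, the way row keys are compared.
row_keys = sorted(reverse_domain(h).encode("utf-8") for h in hosts)

for key in row_keys:
    print(key.decode("utf-8"))
```

After sorting, the three apache.org keys are adjacent, while the unreversed host names would have been scattered by their first label (jira, mail, www).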

Column

Corresponds to lt:a, lt:b, … in the example above.

A column consists of a column family and a column qualifier, written as column family:column qualifier. You do not need to specify column qualifiers when creating a table.

Column Family

Corresponds to lt in the example above.

A column family physically groups a set of columns and their values, and each column family has storage properties that can be configured, for example whether its blocks are cached, the compression type, and the number of versions to keep. Every row in a table has the same column families, although a given row may store nothing in some of them.

Column Qualifier

The qualifier within a column family; together with the family it uniquely identifies a column. Qualifiers are not fixed when the table is created, so each row may have different column qualifiers.

Cell

{row key, column (= column family + column qualifier), version} uniquely identifies a cell. A cell is addressed by row, column family, and column qualifier, and it holds a timestamp and a value.

By default, get displays only the latest version. You can also ask for more versions; the following requests the latest two. Because the lt family was created with VERSIONS => '1', only one version per column is stored and returned here.

hbase(main):004:0> get 'mytest','row3',{COLUMNS=>['lt:c','lt:d'],VERSIONS=>2}
COLUMN  CELL
 lt:c   timestamp=1609210396636, value=value3
 lt:d   timestamp=1609210494696, value=value4
1 row(s)
Took 0.0050 seconds

Timestamp

Corresponds to timestamp=1609210494696… in the example above.

The timestamp is stored alongside each value and acts as the version number of that value. By default the timestamp is the time at which the data is written, but you can specify a different timestamp when writing.
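To make the role of the timestamp as a version number concrete, here is a toy in-memory model in Python. It is an assumption-laden sketch, not the HBase API: the class `VersionedCells`, its `max_versions` parameter (loosely mirroring the VERSIONS family property), and the extra values and timestamps are all hypothetical.

```python
from collections import defaultdict


class VersionedCells:
    """Toy model: (row, column) -> {timestamp: value}, newest versions kept."""

    def __init__(self, max_versions: int = 1):
        self.max_versions = max_versions
        self._cells = defaultdict(dict)

    def put(self, row, column, value, timestamp):
        versions = self._cells[(row, column)]
        versions[timestamp] = value
        # Keep only the newest max_versions timestamps.
        for ts in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[ts]

    def get(self, row, column, versions: int = 1):
        stored = self._cells.get((row, column), {})
        # Newest first, like the shell's get output.
        return sorted(stored.items(), reverse=True)[:versions]


cells = VersionedCells(max_versions=2)
cells.put("row3", "lt:c", "value3", 1609210396636)
cells.put("row3", "lt:c", "value3-b", 1609210400000)  # hypothetical later write
cells.put("row3", "lt:c", "value3-c", 1609210500000)  # hypothetical later write

latest = cells.get("row3", "lt:c")
latest_two = cells.get("row3", "lt:c", versions=2)
print(latest)
print(latest_two)
```

With `max_versions=2` the oldest write is discarded on the third put, so asking for two versions returns the two newest values and asking for one returns only the newest.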

About indexing

HBase is a sparse, distributed, persistent, multidimensional sorted map. It is indexed by row key, column key, and timestamp.

About ordering

When HBase stores data, it behaves like two nested sorted maps: data is sorted first by row key and then by column.

hbase(main):009:0> get 'mytest', 'row3'
COLUMN  CELL
 lt:c   timestamp=1609210396636, value=value3
 lt:d   timestamp=1609210494696, value=value4
1 row(s)
Took 0.0059 seconds

Within row3, the columns of the lt family are returned in sorted order: lt:c comes before lt:d.
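The row-then-column ordering can be modeled with two nested maps in Python. This is a rough sketch of the idea only: the `put` and `scan` functions are hypothetical stand-ins for the shell commands of the same name, and real HBase keeps the data sorted on disk rather than sorting at read time.

```python
# Outer map: row key -> inner map; inner map: column -> value.
table = {}


def put(row, column, value):
    table.setdefault(row, {})[column] = value


def scan():
    """Yield cells sorted by row key, then by column."""
    for row in sorted(table):
        for column in sorted(table[row]):
            yield row, column, table[row][column]


# Write in a deliberately shuffled order.
put("row3", "lt:d", "value4")
put("row1", "lt:a", "value1")
put("row3", "lt:c", "value3")
put("row2", "lt:b", "value2")

result = list(scan())
for cell in result:
    print(cell)
```

The scan returns row1, row2, row3 in order, and within row3 it returns lt:c before lt:d, regardless of write order. The model is also sparse: row1 simply has no entry for lt:b.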

Comparison between HBase and relational databases

Attribute          HBase                                          RDBMS
Data types         Strings only (stored as uninterpreted bytes)   Rich data types
Data manipulation  Create, read, update, delete; no JOIN support  Rich functions and table joins
Storage mode       Column-based storage                           Table-structured, row-based storage
Data protection    Old versions are kept after an update          Updates replace the old value
Scalability        Nodes are easy to add                          Requires a middleware layer, at a cost in performance

Considerations for HBase design

Key HBase concepts include tables, row keys, column families, and timestamps.

  • How many column families should the table have?
  • What data does each column family hold?
  • How many columns does each column family have?
  • What are the column names? They do not have to be defined when the table is created, but you need to know them to read and write data.
  • What data should each cell store?
  • How many versions should be stored per cell?
  • What is the row key structure, and what information should it contain?

Design points

Row key design

The row key is the most critical part: it directly affects the access performance of downstream services. If it is poorly designed, query efficiency drops sharply.

  • Avoid monotonically increasing row keys. HBase stores rows in sorted order, so with monotonically increasing keys most writes in a given period land on a single region and the load concentrates on one node. A key can instead be designed as [metric_type][event_timestamp], so that different metric_type values spread the write pressure across different regions.
  • Keep row keys as short as possible while still readable. Querying by a short key is not much faster than querying by a long one, so key length is a design tradeoff.
  • Row keys cannot be changed; the only way to change one is to delete the row and insert it again.
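The effect of the [metric_type][event_timestamp] layout can be sketched in Python. Everything below is illustrative and assumed: the split points in `SPLITS`, the metric names, the zero-padded 13-digit timestamp encoding, and the helper names `row_key` and `region_for` are hypothetical, and real region boundaries are chosen by HBase or by pre-splitting the table.

```python
import bisect


def row_key(metric_type: str, event_ts_ms: int) -> bytes:
    # Zero-pad the timestamp so byte order matches numeric order.
    return f"{metric_type}{event_ts_ms:013d}".encode("utf-8")


# Hypothetical split points: region i holds keys in [SPLITS[i-1], SPLITS[i]).
SPLITS = [b"disk", b"mem", b"net"]


def region_for(key: bytes) -> int:
    return bisect.bisect_right(SPLITS, key)


events = [
    ("cpu", 1609210379880),
    ("disk", 1609210379881),
    ("mem", 1609210379882),
    ("net", 1609210379883),
]

# Timestamp-only keys: every write lands in the same region.
ts_only_regions = [region_for(f"{ts:013d}".encode("utf-8")) for _, ts in events]

# Prefixed keys: writes arriving at nearly the same time are spread out.
prefixed_regions = [region_for(row_key(m, ts)) for m, ts in events]

print(ts_only_regions)
print(prefixed_regions)
```

In this toy setup the timestamp-only keys all map to one region, while the prefixed keys map to four different regions, which is the load-spreading effect the first bullet describes.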

Column family design

A column family is a collection of columns whose members share the same prefix; the prefix is separated from the qualifier by a colon (:).

  • Currently, HBase does not handle more than two or three column families well, so keep the number of column families as small as possible. Also watch the cardinality when a table has multiple column families: if column family A has 1 million rows and column family B has 1 billion rows, A's data is likely to be spread across many regions, which makes large scans of A less efficient.
  • Keep column family names as short as possible, ideally a single character, to save space and improve efficiency. For example, d for data and v for value.

Column family property configuration

HFile data block size: the default is 64 KB. The block size affects the size of the data block index. With larger blocks, more data is loaded into memory at a time, which favors scans. With smaller blocks, random reads perform better.

> create 'mytable',{NAME => 'lt1', BLOCKSIZE => '65536'}
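The relationship between block size and index size can be approximated with simple arithmetic. The Python sketch below assumes roughly one index entry per data block and ignores multi-level indexes, key sizes, and compression; `estimated_index_entries` is a hypothetical helper and the 1 GiB file size is an arbitrary example.

```python
import math


def estimated_index_entries(hfile_bytes: int, block_bytes: int) -> int:
    """Rough estimate: one block index entry per data block."""
    return math.ceil(hfile_bytes / block_bytes)


ONE_GIB = 1024 ** 3

estimates = {
    block_kb: estimated_index_entries(ONE_GIB, block_kb * 1024)
    for block_kb in (16, 64, 256)
}

for block_kb, entries in estimates.items():
    print(f"{block_kb} KB blocks -> about {entries} index entries")
```

Under these assumptions, quartering the block size quadruples the number of index entries, which is the tradeoff between index size and finer-grained random reads.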

Block cache: the block cache is enabled by default and can be disabled for data that is rarely accessed.

> create 'mytable',{NAME => 'lt1', BLOCKCACHE => 'FALSE'}

Data compression: compression improves disk utilization but increases CPU load, so choose it according to your workload.

> create 'mytable',{NAME => 'lt1', COMPRESSION => 'SNAPPY'}
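The space-versus-CPU tradeoff of compression can be felt with a small Python experiment. Note the assumptions: Python's standard library has no Snappy codec, so `zlib` is used here purely as a stand-in, and the sample rows are synthetic and highly repetitive, so the ratio says nothing about real HBase data.

```python
import time
import zlib

# Synthetic, repetitive sample data (hypothetical rows, not an HBase format).
rows = b"".join(
    f"row{i:06d},lt:a,value{i % 10}\n".encode("utf-8") for i in range(10_000)
)

start = time.perf_counter()
compressed = zlib.compress(rows, 6)  # zlib stands in for SNAPPY here
elapsed = time.perf_counter() - start

ratio = len(compressed) / len(rows)
print(f"original:   {len(rows)} bytes")
print(f"compressed: {len(compressed)} bytes ({ratio:.1%} of original)")
print(f"CPU time spent compressing: {elapsed * 1000:.2f} ms")

# Reading the data back costs CPU again for decompression.
restored = zlib.decompress(compressed)
```

The compressed form is smaller, but producing it and reading it back both consume CPU time, which is why the setting should follow the workload.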

HBase table design depends on the requirements. However, following certain firm table design guidelines helps improve performance. This section describes the key points of HBase table design.