HBase
Standalone installation
Environment
CentOS 7
HBase 2.2.6
Install the JDK
yum install java-1.8.0-openjdk* -y
Download HBase
mirror.bit.edu.cn/apache/hbas…
Extract it on the Linux host:
tar -xf hbase-2.2.6-bin.tar.gz
cd hbase-2.2.6
Set JAVA_HOME in the configuration file
vim conf/hbase-env.sh
# Note: this is the Java location on CentOS
export JAVA_HOME=/etc/alternatives/java_sdk_1.8.0/
Start HBase
./bin/start-hbase.sh
Check the web UI
Open http://localhost:16010 to view the HBase web UI and confirm that HBase started successfully.
Client
Built-in shell client
./bin/hbase shell    # once inside, type help to list the available commands
Create a table
You need to specify the table name and the column family name:
hbase(main):105:0> create 'mytest', 'lt'
Created table mytest
Took 0.7247 seconds
=> Hbase::Table - mytest
List the table
hbase(main):001:0> list 'mytest'
TABLE
mytest
1 row(s)
Took 0.2895 seconds
=> ["mytest"]
Describe the table
hbase(main):002:0> describe 'mytest'
Table mytest is ENABLED
mytest
COLUMN FAMILIES DESCRIPTION
{NAME => 'lt', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
QUOTAS
0 row(s)
Took 0.1686 seconds
Write data
Write four cells. The values are not entirely arbitrary; the data model section below explains them.
hbase(main):003:0> put 'mytest','row1','lt:a','value1'
Took 0.0392 seconds
hbase(main):004:0> put 'mytest','row2','lt:b','value2'
Took 0.0042 seconds
hbase(main):005:0> put 'mytest','row3','lt:c','value3'
Took 0.0025 seconds
hbase(main):006:0> put 'mytest','row3','lt:d','value4'
Took 0.0054 seconds
View all the data in the table
hbase(main):007:0> scan 'mytest'
ROW     COLUMN+CELL
 row1   column=lt:a, timestamp=1609210379880, value=value1
 row2   column=lt:b, timestamp=1609210387527, value=value2
 row3   column=lt:c, timestamp=1609210396636, value=value3
 row3   column=lt:d, timestamp=1609210494696, value=value4
3 row(s)
Took 0.0188 seconds
Get a single row
Case 1
get 'mytest', 'row1'
COLUMN  CELL
 lt:a   timestamp=1609210379880, value=value1
1 row(s)
Took 0.0202 seconds
Case 2
hbase(main):009:0> get 'mytest', 'row3'
COLUMN  CELL
 lt:c   timestamp=1609210396636, value=value3
 lt:d   timestamp=1609210494696, value=value4
1 row(s)
Took 0.0059 seconds
As you can see, row1, row2, and row3 are row keys. The two get results differ because two columns (lt:c and lt:d) were written to row3, whereas row1 has only one.
Other commands
Type help in the shell to see more commands.
HBase data model
Table
Corresponds to mytest in the example above.
An HBase table consists of multiple rows.
Row
Corresponds to row1, row2, … in the example above.
A row in HBase consists of a row key and one or more columns with values. Rows are stored sorted by row key in lexicographic (byte) order, so row key design matters: a good design places related rows close together. A common pattern is to reverse a site's domain name, for example org.apache.www, org.apache.mail, org.apache.jira, so that all Apache rows are stored next to each other.
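To illustrate the reversed-domain pattern, here is a small Python sketch; the helper `reverse_domain` and the host list are made up for this example. It sorts keys by their UTF-8 bytes, which approximates how HBase orders row keys, and contrasts reversed names (Apache hosts stay adjacent) with plain host names (they do not).

```python
def reverse_domain(host: str) -> str:
    """Hypothetical helper: 'www.apache.org' -> 'org.apache.www'."""
    return ".".join(reversed(host.split(".")))


hosts = ["www.apache.org", "mail.google.com", "mail.apache.org",
         "jira.apache.org", "www.google.com"]

# HBase compares row keys byte by byte; sorting UTF-8 bytes mimics that.
row_keys = sorted((reverse_domain(h) for h in hosts), key=lambda k: k.encode("utf-8"))
plain = sorted(hosts, key=lambda k: k.encode("utf-8"))

print("reversed:", row_keys)  # all org.apache.* keys end up contiguous
print("plain:   ", plain)     # apache hosts are interleaved with google hosts
```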
Column
Corresponds to lt:a, lt:b, … in the example above.
A column name consists of a column family and a column qualifier, written as family:qualifier. You do not need to specify column qualifiers when creating a table.
Column Family
Corresponds to lt in the example above (the part before the colon in lt:a, lt:b, …).
A column family physically groups many columns and their values, and each column family has configurable storage properties, such as whether to use the block cache, the compression type, and the number of versions to keep. Every row in a table has the same column families, although a row may store nothing in some of them.
Column Qualifier
The column qualifier identifies a column within its column family. Qualifiers are not part of the table schema and can be added freely at write time, so different rows can have different qualifiers.
Cell
A cell is uniquely identified by {row key, column (= column family + column qualifier), version}, and it holds a value together with its timestamp.
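By default, get returns only the latest version of each cell. You can request more versions, for example the latest two, with VERSIONS => 2. Because the lt family was created with VERSIONS => '1' (see the describe output above), only one version per column is stored, so only one is returned here: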
hbase(main):004:0> get 'mytest','row3',{COLUMNS=>['lt:c','lt:d'],VERSIONS=>2}
COLUMN  CELL
 lt:c   timestamp=1609210396636, value=value3
 lt:d   timestamp=1609210494696, value=value4
1 row(s)
Took 0.0050 seconds
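The version behavior can be modeled with a toy in-memory structure. This is a hypothetical Python sketch, not the HBase client API: `ToyFamily` keeps at most `max_versions` timestamps per cell (like a family's VERSIONS setting), and `get` returns the newest ones first. In real HBase, excess versions are physically removed during compaction, but reads never return more than the configured number.

```python
from collections import defaultdict


class ToyFamily:
    """Toy model of one column family's cells (not the HBase client API)."""

    def __init__(self, max_versions=1):
        # Mirrors the family's VERSIONS setting.
        self.max_versions = max_versions
        # (row key, column) -> {timestamp: value}
        self.cells = defaultdict(dict)

    def put(self, row, column, value, ts):
        versions = self.cells[(row, column)]
        versions[ts] = value
        # Drop everything older than the newest max_versions timestamps.
        for old_ts in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old_ts]

    def get(self, row, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        stored = self.cells.get((row, column), {})
        return sorted(stored.items(), reverse=True)[:versions]


one = ToyFamily(max_versions=1)    # like the 'lt' family above (VERSIONS => '1')
three = ToyFamily(max_versions=3)
for fam in (one, three):
    fam.put("row3", "lt:c", "value3", 1609210396636)
    fam.put("row3", "lt:c", "value3-new", 1609210500000)  # hypothetical second write

print(one.get("row3", "lt:c", versions=2))    # only the newest version survives
print(three.get("row3", "lt:c", versions=2))  # both versions, newest first
```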
Timestamp
Corresponds to timestamp=1609210494696 and similar values in the example above.
The timestamp is stored alongside each value and serves as that value's version number. By default it is the time the data was written, in milliseconds since the Unix epoch, but you can supply a different timestamp when writing data.
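Because the default timestamp is epoch milliseconds, a value from the output above can be decoded with a few lines of Python (the function name `hbase_ts_to_utc` is hypothetical). Integer millisecond arithmetic is used instead of float division so no rounding can creep in.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hbase_ts_to_utc(ts_ms: int) -> datetime:
    # The default HBase cell timestamp counts milliseconds since the Unix epoch.
    return EPOCH + timedelta(milliseconds=ts_ms)


print(hbase_ts_to_utc(1609210494696))  # the lt:d write from the example above
```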
Indexing
HBase is a sparse, distributed, persistent, multidimensional sorted map, indexed by row key, column key, and timestamp.
Ordering
When HBase stores data, it conceptually uses two nested SortedMaps: data is sorted first by row key and then by column. For example, the earlier get on row3:
hbase(main):009:0> get 'mytest', 'row3'
COLUMN  CELL
 lt:c   timestamp=1609210396636, value=value3
 lt:d   timestamp=1609210494696, value=value4
1 row(s)
Took 0.0059 seconds
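The cells of row3 come back sorted by column: lt:c before lt:d. In SQL terms, this get is roughly like selecting all lt:* columns from the row whose key is row3.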
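The nested ordering can be sketched in Python as a dict of dicts that is sorted at read time. This is only a model, assuming that comparing UTF-8 bytes matches HBase's byte-wise key comparison for these ASCII keys; HBase itself keeps data sorted in its MemStore and HFiles rather than sorting on every read.

```python
def scan(table):
    """Yield (row, column, value), ordered by row key, then by column (byte order)."""
    for row in sorted(table, key=lambda r: r.encode("utf-8")):
        columns = table[row]
        for column in sorted(columns, key=lambda c: c.encode("utf-8")):
            yield row, column, columns[column]


# The example table, inserted out of order on purpose.
mytest = {
    "row3": {"lt:d": "value4", "lt:c": "value3"},
    "row1": {"lt:a": "value1"},
    "row2": {"lt:b": "value2"},
}

for cell in scan(mytest):
    print(cell)  # same order as the scan 'mytest' output above
```

Because the comparison is byte-wise, a key such as row10 sorts before row2, which is why numeric parts of row keys are often zero-padded.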
Comparison between HBase and relational databases
Attribute | HBase | RDBMS |
---|---|---|
Data types | Uninterpreted byte arrays only | Rich data types |
Data operations | Put, get, scan, and delete; no joins | Rich SQL functions, including joins across tables |
Storage model | Column-family-oriented storage | Row-oriented storage with a fixed table schema |
Updates | Old versions are kept after an update (up to the family's VERSIONS setting) | The old value is replaced |
Scalability | Scales out easily by adding nodes | Requires an intermediate layer, at some cost in performance |
Considerations for HBase design
The key HBase concepts are tables, row keys, column families, and timestamps. When designing a table, consider:
- How many column families the table should have
- What data each column family holds
- How many columns each column family has
- What the column names are: they need not be defined when creating the table, but you must know them to read and write data
- What data each cell should store
- How many versions each cell should keep
- What the row key structure is and what information it should contain
Design points
Row key design
Row key design is critical because it directly determines the access performance of the services built on the table. A poorly designed row key can make later queries dramatically slower.
- Avoid monotonically increasing row keys. HBase stores rows sorted by row key, so with sequential keys (such as timestamps) most writes over a period of time go to the same Region, concentrating the load on one node. A key such as [metric_type][event_timestamp] avoids this: different metric_type values spread the writes across different Regions.
- Keep row keys as short as is reasonable while keeping them readable and useful for the required access patterns. A short key that does not support your gets and scans is no better than a longer key that does, so expect a length tradeoff.
- Row keys cannot be modified. The only way to change a row key is to delete the row and insert it again under the new key.
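The hotspot effect described in the first point can be simulated in Python. Everything here is hypothetical: the metric names, the event stream, and the pre-split Region boundaries. Each Region holds a [start, end) key range, and `bisect_right` finds the Region a key falls into.

```python
from bisect import bisect_right
from collections import Counter

METRICS = ["cpu", "disk", "mem", "net"]  # hypothetical metric_type values


def region_of(row_key, split_keys):
    """Index of the Region whose [start, end) key range contains row_key."""
    return bisect_right(split_keys, row_key)


# Hypothetical pre-split boundaries for each table design.
TS_SPLITS = ["1609000000000", "1609100000000", "1609200000000"]
PREFIX_SPLITS = ["disk", "mem", "net"]

# 100 consecutive events, one per second, cycling through the metrics.
events = [(METRICS[i % 4], 1609210000000 + i * 1000) for i in range(100)]

# Design 1: row key = event_timestamp (monotonically increasing).
ts_load = Counter(region_of(str(ts), TS_SPLITS) for _, ts in events)
# Design 2: row key = [metric_type][event_timestamp].
prefix_load = Counter(region_of(f"{m}{ts}", PREFIX_SPLITS) for m, ts in events)

print("timestamp key, writes per Region:", dict(ts_load))
print("metric_type + timestamp key, writes per Region:", dict(prefix_load))
```

With the timestamp-only key every write lands in the last Region; with the prefixed key the writes spread evenly. The price of the prefixed design is that a time-range query over all metrics needs one scan per metric_type.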
Column family design
A column family is a group of columns whose names share the same prefix, separated from the qualifier by a colon (:), as in lt:a.
- HBase currently does not handle more than two or three column families well, so keep the number of column families as small as possible. Uneven families are a particular problem: if column family A has 1 million rows and column family B has 1 billion rows in the same table, A's data is spread across as many Regions as B's, which makes bulk scans of A inefficient.
- Keep column family names as short as possible, because the family name is stored with every cell; short names save space and speed up processing. For example, use d for data and v for value.
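To see why short names matter, the sketch below estimates the size of one cell using the classic KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type, plus the row, family, qualifier, and value bytes). It is an approximation that ignores tags, MVCC data, block encoding, and compression, all of which change the real on-disk size; the cell count is hypothetical.

```python
def approx_cell_bytes(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    # Fixed fields: key length (4) + value length (4) + row length (2)
    # + family length (1) + timestamp (8) + key type (1) = 20 bytes.
    fixed = 4 + 4 + 2 + 1 + 8 + 1
    return fixed + len(row) + len(family) + len(qualifier) + len(value)


long_name = approx_cell_bytes(b"row1", b"data", b"a", b"value1")
short_name = approx_cell_bytes(b"row1", b"d", b"a", b"value1")

cells = 1_000_000_000  # hypothetical table size
saved = (long_name - short_name) * cells
print(f"'data': {long_name} B/cell, 'd': {short_name} B/cell, "
      f"saved over {cells:,} cells: {saved / 1e9:.0f} GB (uncompressed)")
```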
Column family property configuration
HFile block size (BLOCKSIZE): the default is 64 KB. The block size determines the size of the block index: smaller blocks mean more index entries. Larger blocks load more data into memory per read, which favors scans; smaller blocks favor random reads.
> create 'mytable',{NAME => 'lt1', BLOCKSIZE => '65536'}
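The block size tradeoff is easy to quantify. The Python sketch below, for a hypothetical 1 GiB HFile, estimates how many data blocks (and therefore roughly how many block-index entries) the file has at several BLOCKSIZE values. HBase treats BLOCKSIZE as a target rather than an exact size, so real counts differ slightly.

```python
KIB = 1024
GIB = 1024 ** 3


def data_blocks(hfile_bytes, block_size):
    """Approximate data-block count, roughly one block-index entry per block."""
    return -(-hfile_bytes // block_size)  # ceiling division on integers


for size_kib in (8, 64, 256):
    blocks = data_blocks(1 * GIB, size_kib * KIB)
    print(f"BLOCKSIZE {size_kib:>3} KiB: {blocks:>6} blocks; "
          f"a random get loads a {size_kib} KiB block on a cache miss")
```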
Block cache: the block cache is enabled by default and can be turned off for column families that are rarely read.
> create 'mytable',{NAME => 'lt1', BLOCKCACHE => 'FALSE'}
Data compression: compression improves disk utilization but increases CPU load, so enable it according to your workload.
> create 'mytable',{NAME => 'lt1', COMPRESSION => 'SNAPPY'}
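The disk-versus-CPU tradeoff can be observed with any codec. Python's standard library has no Snappy binding, so this sketch uses zlib as a stand-in; its numbers are not representative of SNAPPY in HBase, which accepts a lower compression ratio in exchange for lower CPU cost. The sample data is synthetic, loosely imitating repetitive cell content.

```python
import time
import zlib

# Synthetic, repetitive data resembling row keys, columns, and values.
data = b"".join(b"row%08d lt:a value%d\n" % (i, i % 10) for i in range(100_000))

start = time.perf_counter()
packed = zlib.compress(data, level=6)
elapsed = time.perf_counter() - start

print(f"raw: {len(data)} bytes, compressed: {len(packed)} bytes")
print(f"ratio: {len(data) / len(packed):.1f}x, CPU time: {elapsed * 1000:.1f} ms")
```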
HBase table design is driven by requirements, but following certain hard rules of table design helps improve performance. The points above are the key ones for HBase table design.