Cloud HBase releases full-text index service to support complex queries

Cloud HBase has released a full-text index service. For cloud HBase instances created after January 25, 2019, the service can be enabled for free on the console. With it, users can build more diverse search services on HBase: they are no longer limited to simple KV queries, no longer need to agonize over rowkey design, and no longer need to worry about changing complex HBase query services. The full-text index service is designed to enhance the query capability of cloud HBase and synchronizes data automatically, so users only need to focus on how to enrich their service architecture with the search function.

Why enhance the HBase search capability

Anyone who uses HBase faces the problem of designing the rowkey. Yet even when excellent engineers sort out every business search requirement and make trade-offs between them, it is still impossible to design a single rowkey that satisfies all of the business query requirements. For example, a logistics management system needs to search by any combination of recipient name/mobile phone/address, sender name/mobile phone/address, waybill number/start time/end time, courier name/mobile phone, and other conditions. The native KV query of HBase cannot handle such complex queries, and no rowkey design can cover arbitrary combinations of conditions. These queries may also involve fuzzy matching on names, addresses, and mobile phone numbers, which an HBase rowkey cannot satisfy either.

As another example, a new retail service may need to query product titles or descriptions by keyword. In HBase this can only be implemented as a fuzzy query, which is inefficient. What the scenario really calls for is a word-segmented keyword query over the title or description, and HBase does not provide that function.

In addition, to improve user experience, new retail businesses often need classified statistics over search results. On an e-commerce site, for instance, a search for the keyword “fashion” shows the matching products and also counts the matches by category, such as clothes, electronics, and daily necessities, so that users can pick a category for a secondary query and quickly find the desired product. HBase cannot provide this function either. To fit the query characteristics of HBase, services end up compromising: only some KV query services are retained, and other query services that could improve user experience are cut.

To sum up, here are several pain points encountered in HBase query service design:

Queries on arbitrary combinations of conditions cannot be satisfied
Fuzzy queries are not supported efficiently
Word-segmented keyword queries are not supported
Multi-dimensional sorting/paging is not supported efficiently
Query result sets cannot be classified for statistics

Cloud HBase full-text index service enhances HBase search capability

The full-text index service is designed to enhance HBase query capability: on top of the powerful KV capability of HBase, it adds query capability under complex conditions. The supported scenarios can be summarized as follows:

Arbitrary queries on complex conditions
Multi-dimensional sorting
Paging under complex conditions
Word-segmented keyword queries
Classified statistics over matching result sets
Common statistics such as min/max/avg/sum
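
As a rough illustration of how these scenarios map onto a search request, the sketch below assembles standard Solr request parameters (q, fq, sort, start, rows, facet, stats) for one page of results. The field names title_t, category_s, and price_d are hypothetical and are not part of the demo collection in this article; whether a tokenized field such as title_t exists depends on the schema of your collection.

```python
from urllib.parse import urlencode, parse_qs

def build_search_params(keyword, category=None, page=0, page_size=10):
    """Assemble Solr request parameters for one page of search results.

    All field names used here are hypothetical examples.
    The keyword is inserted as-is; real code should escape Solr
    query syntax characters in user input.
    """
    params = [
        ("q", f"title_t:{keyword}"),        # keyword query on a tokenized field
        ("sort", "price_d asc, id asc"),    # multi-dimensional sort
        ("start", str(page * page_size)),   # paging: offset of the first hit
        ("rows", str(page_size)),           # paging: hits per page
        ("facet", "true"),                  # classified statistics ...
        ("facet.field", "category_s"),      # ... counted per category
        ("stats", "true"),                  # min/max/mean/sum ...
        ("stats.field", "price_d"),         # ... over the price field
    ]
    if category is not None:
        # Secondary query: narrow the results to one category.
        params.append(("fq", f"category_s:{category}"))
    return params

query_string = urlencode(build_search_params("fashion", category="clothes", page=2))
print(query_string)
```

The resulting query string would be appended to the select endpoint of the collection. The facet counts in the response are what a page could use to show the per-category totals described above.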

The cloud HBase full-text index service is simple to use. The index only needs to be defined in the DDL phase; data is then synchronized to the index automatically. The architecture is as follows:

Comparison with a self-built solution

Feature	Cloud HBase with full-text index enabled	Self-built HBase + Indexer + Solr	HBase
Simple rowkey query	Supported	Supported	Supported
Complex queries	Supported	Supported	Not supported
Index synchronization	Supported	Supported	Not supported
Out-of-order synchronization	Supported	Not supported	N/A
Strong consistency	Supported	Not supported	N/A
Dynamic XML columns	Supported	Not supported	N/A

In addition, HBase + Indexer + Solr has several bugs that have led to data loss reported by many users. Cloud HBase has made a number of bug fixes and improvements in this area.

How to use the cloud HBase full-text index service

Using the cloud HBase full-text index service is straightforward. After the service is enabled, users only need to create an index with simple DDL; subsequent inserts require no synchronization management. Users only need to focus on the queries that follow, using the HBase API or Solr API to build rich service queries. Let’s take a quick look at the process.

Enable the service

The full-text index service is a free extension service of cloud HBase. For a cloud HBase instance created after January 25, 2019, open the full-text index service details page on the left side of the instance console to enable the service, as follows:

After the service is enabled, the Solr access address and WebUI link are displayed as follows:

The Solr ZK address is used for access by the cloud Solr client, which has built-in load balancing. The Solr WebUI is accessed in the same way as the cloud HBase WebUI: on first use, set the user password and whitelist, then click the link to open the Solr WebUI.

Create the index

Download the index management client tool

wget http://public-hbase.oss-cn-hangzhou.aliyuncs.com/installpackage/solr-7.3.1-ali-1.0.tgz
tar zxvf solr-7.3.1-ali-1.0.tgz

Change the ZK_HOST of the solr-7.3.1-ali-1.0/bin/solr.in.sh file to the following:

ZK_HOST=zk1:2181,zk2:2181,zk3:2181/solr

The ZK address is the Solr ZK access address shown on the console after the full-text index service is enabled.

Create an HBase table and enable replication

create 'solrdemo', {NAME => 'info', REPLICATION_SCOPE => '1'}

Step 1, upload the solrconfig.xml/schema configuration. If you do not need to modify solrconfig.xml or the schema, upload the default demo config as follows:

solr-7.3.1-ali-1.0/bin/solr zk upconfig -d _democonfig -n democollection_config -z zk1:2181/solr

Step 2, create the collection democollection using the newly uploaded configuration as follows:

curl "http://hostname:8983/solr/admin/collections?action=CREATE&name=democollection&numShards=1&replicationFactor=1&collection.configName=democollection_config"

Here, hostname can be replaced with the ZK hostname that contains the infix master3-1.

Configure the field mapping between the HBase table solrdemo and the Solr collection democollection. First, edit indexer_conf.xml to define the mapping, for example:

<?xml version="1.0"?>
<indexer table="solrdemo">
<field name="name_s" value="info:q2" type="string"/>
<field name="age_i" value="info:q3" type="int"/>
<param name="update_version_l" value="true"/>
</indexer>

This configuration maps the info:q2 and info:q3 columns of the HBase table solrdemo to the name_s and age_i fields of the Solr collection democollection. It specifies that the info:q2 column is parsed as a string and stored in name_s, and that the info:q3 column is parsed as an int and stored in age_i. The Solr-side types of name_s and age_i are determined by the configuration of the Solr collection. By default, dynamic type inference is used, that is, the storage type is determined by the suffix of the collection field name. The common suffixes _i, _s, _l, _b, _f, and _d correspond to int, string, long, boolean, float, and double. Users can also specify the field type directly. The final update_version_l parameter is a fixed setting that stores the latest update time at the document level.

Second, use the tool to register the index mapping that indexer_conf.xml defines between the HBase table solrdemo and the Solr collection democollection. Run the following command:

solr-7.3.1-ali-1.0/bin/solr-indexer add \
     -n demoindex \
     -f indexer_conf.xml \
     -c democollection
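
For tables with many mapped columns, the mapping file can be generated rather than written by hand. The following is only a sketch using the Python standard library: it reproduces the element and attribute names from the example file above and applies the suffix rule described earlier. The helper names infer_solr_type and build_indexer_conf are hypothetical and are not part of the solr-indexer tool.

```python
import xml.etree.ElementTree as ET

# Suffix-to-type table as listed in the text above.
SUFFIX_TYPES = {
    "_i": "int", "_s": "string", "_l": "long",
    "_b": "boolean", "_f": "float", "_d": "double",
}

def infer_solr_type(field_name):
    """Return the Solr type implied by the field name suffix, or None."""
    for suffix, type_name in SUFFIX_TYPES.items():
        if field_name.endswith(suffix):
            return type_name
    return None

def build_indexer_conf(table, fields):
    """Build the mapping XML as a string.

    fields is an iterable of (solr_field, hbase_column, parse_type)
    tuples, where parse_type says how the HBase column bytes are parsed.
    """
    root = ET.Element("indexer", {"table": table})
    for solr_field, hbase_column, parse_type in fields:
        ET.SubElement(root, "field", {
            "name": solr_field, "value": hbase_column, "type": parse_type,
        })
    # Fixed parameter copied from the example file in this article.
    ET.SubElement(root, "param", {"name": "update_version_l", "value": "true"})
    return ET.tostring(root, encoding="unicode")

fields = [("name_s", "info:q2", "string"), ("age_i", "info:q3", "int")]
print(build_indexer_conf("solrdemo", fields))
for solr_field, _, _ in fields:
    print(solr_field, "->", infer_solr_type(solr_field))
```

The parse type is passed in explicitly because it describes the bytes stored in HBase, while the suffix describes the Solr field; the two happen to match in the example but are configured independently.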

At this point, the index mapping is complete. We can now write to HBase as usual without worrying about index synchronization: the mapped columns of the HBase table solrdemo are automatically synchronized to the corresponding fields of the Solr collection democollection. The mapping in the above example is as follows:

The rowkey of the HBase table maps to the ID field in the Solr collection.

Query and retrieval

Querying is simple and fully compatible with the open-source HBase API and Solr API. Use Solr for conditional queries according to business needs. In the result set, the ID field of each document is the rowkey of a matching HBase row, so you only need to convert each ID back into a rowkey and then use the HBase API to read the original data of that row. The flowchart is roughly as follows:
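
In code, this two-step flow might look like the following minimal sketch. It makes several assumptions: the Solr ID field is named id, the ID string can be used directly as the rowkey, and the HBase read is represented by a caller-supplied function, because the HBase client library is outside the scope of this example. The sample response is hand-written to show the standard shape of a Solr JSON select response; it is not output captured from a real cluster.

```python
import json
from urllib.parse import urlencode

def solr_select_url(base_url, collection, query, rows=10):
    """Build a select URL that returns only the id field of each hit."""
    params = urlencode({"q": query, "fl": "id", "rows": rows, "wt": "json"})
    return f"{base_url}/{collection}/select?{params}"

def extract_rowkeys(response_text):
    """Pull the id of every matching document out of a Solr JSON response."""
    body = json.loads(response_text)
    return [doc["id"] for doc in body["response"]["docs"]]

def fetch_rows(rowkeys, get_row):
    """Read each original row; get_row stands in for an HBase Get call."""
    return {rowkey: get_row(rowkey) for rowkey in rowkeys}

# Step 1: the URL a client would request (hostname is a placeholder).
url = solr_select_url("http://hostname:8983/solr", "democollection",
                      "name_s:zhangsan AND age_i:[18 TO 30]")

# Hand-written sample response and an in-memory stand-in for the HBase table.
sample_response = json.dumps({
    "response": {"numFound": 2, "start": 0,
                 "docs": [{"id": "row1"}, {"id": "row2"}]}
})
fake_table = {
    "row1": {"info:q2": "zhangsan", "info:q3": "20"},
    "row2": {"info:q2": "zhangsan", "info:q3": "25"},
}

# Step 2: turn ids into rowkeys and read the rows.
rowkeys = extract_rowkeys(sample_response)
rows = fetch_rows(rowkeys, fake_table.get)
print(url)
print(rows)
```

Requesting only the id field keeps the Solr response small; the full row is then read from HBase, which remains the source of the original data.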

Looking ahead

Easier-to-use index management
SQL access to the full-text index service
A new-generation, more efficient replica mechanism for the full-text engine
Synchronous indexes, in addition to the current asynchronous indexes


This article is original content of the Yunqi Community and may not be reproduced without permission.
