I. HBase feature applications and kernel improvements
Didi uses basic HBase operations such as Scan and Get, and each fits different scenarios. Scan, for example, serves time-series data and reports. Trajectory data is a typical time-series case: the business ID, timestamp, and trajectory position together form one time-series record. Asset management is similar: an asset's status passes through different stages, and the asset ID, timestamp, asset status, and related information form a time series. Scan is also widely used for reports. There are several ways to build them; the main one is Phoenix, which operates on HBase through standard SQL for online transaction processing. With this approach, the design of the primary key and secondary indexes needs particular attention. Reports cover users' historical behavior, historical events, and historical orders in detail.
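The time-series layout described above can be illustrated with a minimal sketch. It assumes a row key made of the business ID, a separator byte, and a big-endian timestamp; the key format, the `row_key` and `scan` helpers, and the sample data are illustrative assumptions, and a sorted in-memory list stands in for an HBase table.

```python
import struct
from bisect import bisect_left


def row_key(business_id: str, ts_ms: int) -> bytes:
    """Compose a row key: business ID, a separator byte, then the timestamp."""
    # A big-endian unsigned 64-bit integer sorts byte-wise in numeric order,
    # so rows of one business ID end up ordered by time.
    return business_id.encode("utf-8") + b"\x00" + struct.pack(">Q", ts_ms)


def scan(rows, start_key: bytes, stop_key: bytes):
    """Return values with keys in [start_key, stop_key), like a range Scan."""
    keys = [k for k, _ in rows]
    lo = bisect_left(keys, start_key)
    hi = bisect_left(keys, stop_key)
    return [v for _, v in rows[lo:hi]]


# Hypothetical trajectory points; the sorted list mimics HBase's key ordering.
table = sorted([
    (row_key("order-42", 1000), "pos-A"),
    (row_key("order-42", 2000), "pos-B"),
    (row_key("order-42", 3000), "pos-C"),
    (row_key("order-77", 1500), "pos-X"),
])

# Trajectory of order-42 between t=1000 (inclusive) and t=3000 (exclusive).
trajectory = scan(table, row_key("order-42", 1000), row_key("order-42", 3000))
print(trajectory)
```

Because the business ID leads the key, all points of one trajectory are adjacent, and a single range scan returns them in time order.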
Assume the source cluster has three hosts, ReplicationSource01, ReplicationSource02, and ReplicationSource03, and the target cluster has four hosts, RS01, RS02, RS03, and RS04. With the traditional logic, the source cluster sends each replication request to a randomly chosen host in the target cluster. If the tables in the target cluster are stored on the two hosts in GROUP A, random sending may mean that those hosts do not receive the replication requests, which instead go to groups unrelated to the business. The sending policy is therefore optimized: for requests that the source cluster sends to another cluster, it first obtains the target cluster's GROUP assignment and then sends the requests to hosts in the corresponding GROUP, so that other services are not affected.
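The group-aware policy can be sketched as follows. This is not the actual HBase replication code: the `pick_sinks` function, the `ratio` parameter, and the assignment of RS01 and RS02 to GROUP A are assumptions made for illustration.

```python
import random


def pick_sinks(table_group, group_hosts, all_hosts, ratio=0.5, rng=random):
    """Choose replication sinks, restricted to the table's group when known."""
    # Fall back to every host (the traditional random policy) only when the
    # group assignment of the target cluster is unknown.
    candidates = group_hosts.get(table_group) or all_hosts
    count = max(1, int(len(candidates) * ratio))
    return rng.sample(candidates, count)


# Hypothetical group layout of the target cluster.
all_hosts = ["RS01", "RS02", "RS03", "RS04"]
group_hosts = {"A": ["RS01", "RS02"], "B": ["RS03", "RS04"]}

sinks = pick_sinks("A", group_hosts, all_hosts)
print(sinks)  # always a host from GROUP A, never RS03 or RS04
```

With the fallback branch alone, a request for a GROUP A table could land on RS03 or RS04, which is the interference the optimization is meant to avoid.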
With the HBase version used at Didi, users may run into Connection problems. For example, after many connections are established, a large number of ZooKeeper (ZK) connections may exist, so Connections need to be managed. A ClusterConnection is created in the RegionServer (RS) and reused to minimize Connection creation. This applies to the Phoenix secondary index scenario, described in the Phoenix section.
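The idea of reusing one connection instead of creating many can be sketched generically. The `ConnectionCache` class, the `fake_connect` factory, and the cluster key string below are hypothetical and only illustrate the reuse pattern, not the HBase ClusterConnection API.

```python
import threading


class ConnectionCache:
    """Hand out one shared connection per cluster key."""

    def __init__(self, factory):
        self._factory = factory      # callable that opens a real connection
        self._lock = threading.Lock()
        self._connections = {}

    def get(self, cluster_key):
        # The lock keeps concurrent callers from opening duplicate connections.
        with self._lock:
            conn = self._connections.get(cluster_key)
            if conn is None:
                conn = self._factory(cluster_key)
                self._connections[cluster_key] = conn
            return conn


created = []


def fake_connect(cluster_key):
    """Stand-in for opening a connection; records each creation."""
    created.append(cluster_key)
    return object()


cache = ConnectionCache(fake_connect)
first = cache.get("zk1:2181:/hbase")   # hypothetical cluster key
second = cache.get("zk1:2181:/hbase")
print(first is second, len(created))
```

Every caller that asks for the same cluster receives the same object, so the number of underlying ZK sessions stays constant no matter how many requests arrive.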
4. Optimize ACL permission authentication
ACLs match user names and passwords against IP addresses. A userinfo table is created to store user password information: the rowKey is the user name, and the column family (CF) holds the password. Just as the hbase:acl table has a corresponding /acl node in ZK, a userinfo node is created as well.
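A minimal sketch of the check described above, assuming in-memory dictionaries in place of the userinfo table and the IP rules. The user name, password, IP address, and the `authenticate` function are all made up for illustration.

```python
import hmac

# Stand-in for the userinfo table: rowKey = user name, value = password.
USERINFO = {"report_user": "s3cret"}
# Hypothetical rule set: the IP addresses each user may connect from.
ALLOWED_IPS = {"report_user": {"10.0.0.5"}}


def authenticate(user: str, password: str, client_ip: str) -> bool:
    """Accept a request only if user, password, and source IP all match."""
    stored = USERINFO.get(user)
    if stored is None:
        return False
    # Constant-time comparison avoids leaking password contents via timing.
    if not hmac.compare_digest(stored.encode(), password.encode()):
        return False
    return client_ip in ALLOWED_IPS.get(user, set())


print(authenticate("report_user", "s3cret", "10.0.0.5"))
print(authenticate("report_user", "s3cret", "10.0.0.9"))
```

The sketch stores the password in plain form only to mirror the table layout described above; a production system would normally store a salted hash.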
1. Principle and architecture of Phoenix
Phoenix provides a framework for SQL operations on top of HBase and manages the metadata. RegionServer3 stores the SYSTEM.CATALOG table. On each RegionServer, coprocessors perform operations such as query, aggregation, and secondary index building.
1. GeoMesa architecture principles
Didi has recently started researching GeoMesa but has not yet gained extensive production experience. GeoMesa is a large-scale spatio-temporal data query and analysis engine built on distributed storage and computing systems. In other words, you can choose a storage backend and ingest data in a variety of formats, for example through Spark or Kafka. The architecture is shown below:
In order to achieve stability and capacity planning, Didi mainly carried out the following work.
This article is original content of the Yunqi Community and may not be reproduced without permission.