1. HBase Basics and Storage Advantages

HBase has five core advantages: massive storage capacity, column-oriented storage, easy scalability, high concurrency, and support for sparse tables. It can be applied to object storage, user-profile recommendation, real-time chat message streams, indexes, reports, trajectory data, and monitoring data.
2. HBase Application Sharing

This section describes how HBase is used for face recognition. At JD.com, face recognition is mainly used in unmanned supermarkets, dynamic billboards, and AR try-on mirrors. It falls into two categories: offline and online. Offline face recognition covers three cases. The first is face attribute recognition, such as gender, age, and expression. The second is small-scene face recognition, such as face unlock on a mobile phone, where recognition must be relatively fast and data and models are stored locally. The third is face counting, such as analyzing the passenger flow of an area through a camera. Online face recognition also covers three cases. The first is face enrollment, in which face information is entered into the system. The second is face search, as in the common access-control scenario. The third is face verification, such as checking that an ID card matches the person presenting it.
The first scenario is log writing. For offline face recognition, data only needs to be saved locally and is periodically written to the server in batches; no matching is done in the cloud. For face search scenarios such as access control, when a person presents their face, the client captures an image and first checks whether it contains a face. If it does, the request goes to the cloud, where a multi-threaded search runs; the search succeeds as soon as any one thread finds a match, and the door can then be opened. All of the above are batch log-write scenarios.
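The "any thread succeeds" search described above can be sketched as follows. This is only an illustration under assumptions: the names search_any, shards, and match are hypothetical, and the article does not say how the face library is partitioned or how matching is done.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def search_any(face, shards, match):
    """Search every shard in its own thread; succeed on the first match.

    `shards` is a hypothetical partitioning of the face library and
    `match(face, shard)` is a caller-supplied function returning True/False.
    """
    if not shards:
        return False
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(match, face, shard) for shard in shards]
        for future in as_completed(futures):
            if future.result():
                # One hit is enough: cancel searches that have not started yet.
                # Searches already running finish before the pool shuts down.
                for other in futures:
                    other.cancel()
                return True
    return False
```

For example, with shards = [{"alice"}, {"bob"}] and match = lambda face, shard: face in shard, the call search_any("bob", shards, match) returns True, and the door would be opened.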
The second is the column storage scenario. Data from different services may overlap without being identical, as in the six scenarios shown in the figure above: face search, face detection, face verification, face attribute detection, face features, and face liveness detection. In JD.com's HBase-based face recognition practice, the RowKey of the main table uses the format data_time_UUID, where time is the maximum time minus the current time, so that the latest data sorts first, and UUID is a random ID. Because it is difficult to query the main table directly by business attributes, separate index tables are created for the different business dimensions. Each index table stores RowKeys of the main table, so a query first looks up the index table for the relevant business dimension and then fetches the data from the main table.
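The RowKey layout and the index-table lookup can be illustrated with a small in-memory sketch. Several details here are assumptions rather than JD.com's actual design: the literal prefix "data", the 13-digit millisecond value used as the maximum time, the index key layout, and the user-ID dimension. Plain dictionaries stand in for the HBase tables.

```python
import uuid

MAX_TS_MS = 9_999_999_999_999  # assumed "maximum time": largest 13-digit millisecond value


def make_rowkey(ts_ms, prefix="data", uid=None):
    """Build prefix_reversedTime_uuid; newer timestamps yield smaller keys."""
    reversed_ts = MAX_TS_MS - ts_ms
    uid = uid or uuid.uuid4().hex
    # Zero-pad so that lexicographic order (how HBase sorts rows) matches numeric order.
    return f"{prefix}_{reversed_ts:013d}_{uid}"


# In-memory stand-ins for the main table and one index table (illustration only).
main_table = {}
index_by_user = {}  # hypothetical business dimension: user ID


def put_record(user_id, ts_ms, payload):
    """Write the payload to the main table and its RowKey to the index table."""
    rowkey = make_rowkey(ts_ms)
    main_table[rowkey] = payload
    # Assumed index key layout: "<user_id>_<main rowkey>"; the value is the main RowKey.
    index_by_user[f"{user_id}_{rowkey}"] = rowkey
    return rowkey


def find_by_user(user_id):
    """Prefix-scan the index table, then fetch each row from the main table."""
    prefix = f"{user_id}_"
    index_keys = sorted(k for k in index_by_user if k.startswith(prefix))
    return [main_table[index_by_user[k]] for k in index_keys]
```

After put_record("u1", 1000, "old") and put_record("u1", 2000, "new"), find_by_user("u1") returns the newer record first, because its reversed timestamp is smaller.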
3. HBase Data Analysis

As shown in the figure below, statistics were initially computed by type, by time, and by user, with a separate job written for each. The more statistical dimensions there are, the more jobs are needed, and because requirements keep growing, so does the number of jobs. This makes job management very cumbersome.
Therefore, the design was improved as shown in the figure below. Upper-layer applications write data to HBase and also send it through Kafka to the data warehouse in real time. In addition, the data warehouse loads HBase data incrementally on a daily basis.
The following figure shows the design of the data warehouse. After extraction, the data must be cleaned, because some fields serve no purpose. The cleaned data is then divided by dimension and finally organized for the different applications, which provide services as needed.
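The clean-then-divide step might look roughly like the sketch below. The field names in KEEP_FIELDS and the record layout are hypothetical, since the article does not list the actual warehouse schema.

```python
from collections import defaultdict

# Hypothetical fields worth keeping; everything else is treated as noise.
KEEP_FIELDS = {"scene", "user_id", "ts"}


def clean(record):
    """Drop fields that are not needed or that carry no value."""
    return {k: v for k, v in record.items() if k in KEEP_FIELDS and v is not None}


def group_by(records, dimension):
    """Clean each record, then bucket it by the chosen dimension."""
    groups = defaultdict(list)
    for record in records:
        cleaned = clean(record)
        if dimension in cleaned:
            groups[cleaned[dimension]].append(cleaned)
    return dict(groups)
```

Calling group_by(rows, "scene") and group_by(rows, "user_id") on the same extracted rows gives one grouping per dimension without writing a separate extraction job for each.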
HBase data is mainly used for statistical reports. Offline queries are implemented with Hive, which is convenient. Another use is the real-time data dashboard (large screen): data is written to HBase in real time, and API services are exposed externally.
Many problems also arose while using HBase. First, the project is built on Spring, but Spring's HBase integration performed poorly: read and write performance degraded as the data volume grew, mainly because connections had to be re-established frequently. For this reason, the native HBase API was chosen in practice. Second, HBase hotspot problems were solved by pre-partitioning (pre-splitting) tables into regions.
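To make the pre-partitioning idea concrete, the sketch below computes evenly spaced split points and shows which region a key would land in. The two-hex-digit salt prefix derived from a CRC32 checksum is an assumption added for illustration; the article does not say how the split points or key prefixes were chosen in the actual project.

```python
import bisect
import zlib


def split_keys(num_regions):
    """Evenly spaced two-hex-digit split points for `num_regions` regions."""
    if not 1 <= num_regions <= 256:
        raise ValueError("num_regions must be between 1 and 256")
    step = 256 / num_regions
    return [f"{int(i * step):02x}" for i in range(1, num_regions)]


def salted_key(rowkey):
    """Prefix the key with a deterministic two-hex-digit salt (assumed scheme)."""
    salt = zlib.crc32(rowkey.encode("utf-8")) % 256
    return f"{salt:02x}_{rowkey}"


def region_for(key, splits):
    """Index of the region containing `key`; each region covers [start, end)."""
    return bisect.bisect_right(splits, key)
```

With four regions the split points are "40", "80", and "c0", so a key such as "40_x" falls in region 1. Because the salt is derived from the key itself, a point read can recompute it, while a range scan has to cover every salt prefix; that trade-off is why this is only one possible scheme.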
HBase has many advantages that make it well suited to data storage in big data scenarios. Face recognition, a popular example, includes both online and offline variants, each with different service characteristics. HBase still poses challenges in project practice, such as performance and hotspot issues, but these can be addressed, for example by using the native API and pre-partitioning tables.