1. HBase application in Ad Tracking
1.1 Service scenario of Ad Tracking
Ad Tracking is TalkingData’s mobile ad monitoring product, and its core business logic is attribution. Ad Tracking tracks click events (a user clicks on an ad) and activation events (the user then installs the app or game that the ad links to).
The job of attribution is to match each activation event it receives against the click events received earlier. If the activation is attributed to a click, then the activation came from the clicked ad, so it is attributed to the corresponding promotion campaign; the campaign in turn belongs to a channel, so the activation is also attributed to that advertising channel, and so on. All subsequent effect-point events (such as in-app registration and login events) also find the corresponding promotion information through the matching activation event.
The raw information about activations and the various effect-point events, including the corresponding device and the attributed promotion, can be provided to Ad Tracking users for reference; this is the Ad Tracking data export feature.
1.2 HBase and Data Export
As a distributed, column-oriented store, HBase provides high write throughput. Because storage is organized by column family, fields can be added dynamically as requirements grow, which suits the Ad Tracking data export scenario well.
With a properly designed rowkey, HBase also delivers fast queries: after triggering an export from the Ad Tracking back end, users can download their data within seconds, which makes for a good export experience.
The following sections describe the HBase architecture and how it is applied in the Ad Tracking data export feature.
2. HBase architecture
Figure: HBase basic architecture
- Master:
▫ Table operations, such as modifying column family configurations
▫ Region assignment, merging, and splitting
- Zookeeper:
▫ Tracks whether each server is alive and reachable
▫ Master HA
▫ Records where the HBase metadata (the hbase:meta table) is located
- Region Server: handles data writes and queries
- HDFS: stores the data. Regions do not interact with the disk directly; data is flushed to and read from HDFS
3. Data writing
3.1 Data writing process
Figure: Data write overview
- WAL (write-ahead log): data is first written to the log so that it is not lost if the server fails; the log is also stored on HDFS
- MemStore: data is held in memory, sorted by rowkey
- HFile: when the data in a MemStore reaches a certain size or age, it is flushed to disk as an HFile
3.2 Data format
All data stored in HBase is byte arrays. Therefore, data that can be converted into byte arrays can be stored in HBase.
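As a minimal illustrative sketch (not the actual Ad Tracking schema), the snippet below uses the HBase Java client and converts every value to a byte array with Bytes.toBytes before writing; the table name "events", the column family "d", and the column names are assumptions:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("events"))) {
            // Everything stored in HBase is a byte array, so each value is
            // converted with Bytes.toBytes before it is written.
            Put put = new Put(Bytes.toBytes("rowkey-0001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("device_id"), Bytes.toBytes("abc123"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event_time"), Bytes.toBytes(1546300800000L));
            table.put(put);
        }
    }
}
```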
4. Storage model
Figure: HBase storage concept model
- Table: a table consists of one or more column families
- Row: a row contains multiple columns, grouped by column family, and each row has a unique primary key, the rowkey
- Column family: a column family consists of several columns that are physically stored together, so the columns in a column family are typically ones that need to be read together in a query. Data attributes, such as the time to live (TTL) and the compression algorithm, are defined at the column-family level (see the sketch after this list)
- Column: a row contains multiple columns, which are maintained in one or more column families
- Cell: the content of a column is stored in a cell; if the column is updated, multiple versions are kept
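Because attributes such as TTL and compression live on the column family, they are set on the column family descriptor when the table is created. A minimal sketch with the HBase 2.x admin API; the table name, family name, TTL value, and compression codec are illustrative assumptions:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // TTL and compression are per-column-family attributes.
            admin.createTable(
                TableDescriptorBuilder.newBuilder(TableName.valueOf("events"))
                    .setColumnFamily(
                        ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("d"))
                            .setTimeToLive(90 * 24 * 3600)                    // expire cells after ~90 days
                            .setCompressionType(Compression.Algorithm.SNAPPY) // compress HFiles with Snappy
                            .build())
                    .build());
        }
    }
}
```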
5. Storage implementation
Figure: HBase storage structure
5.1 Region
Table data is distributed on all servers in the form of regions. Regions exist to solve the problem of horizontal scaling.
5.1.1 Region Splitting
By distributing data evenly across all machines, every server's capacity can be used fully and queries are faster. As data is written, regions grow; if a region becomes too large, query performance deteriorates, so HBase splits regions automatically.
The following are two common region splitting strategies:
- ConstantSizeRegionSplitPolicy: the split policy used by older versions of HBase. A region is split once it reaches a fixed size, 10 GB by default. Disadvantage: the fixed threshold is too rigid and too simple; the same value is used whether the table receives a large or a small amount of data
- IncreasingToUpperBoundRegionSplitPolicy: the default policy in newer versions. The split threshold grows dynamically as the amount of data grows (a sketch of choosing a policy per table follows this list)
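A minimal sketch of opting a single table into a specific split policy with the HBase 2.x API; the table name and threshold are illustrative, and the cluster-wide default policy can instead be set through the hbase.regionserver.region.split.policy property:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy;

public class SplitPolicyExample {
    // Builds a table descriptor that opts back into the fixed-size split policy.
    static TableDescriptor withFixedSizeSplits(String tableName) {
        return TableDescriptorBuilder.newBuilder(TableName.valueOf(tableName))
                .setRegionSplitPolicyClassName(ConstantSizeRegionSplitPolicy.class.getName())
                // ConstantSizeRegionSplitPolicy splits once a store grows past MAX_FILESIZE.
                .setMaxFileSize(10L * 1024 * 1024 * 1024) // 10 GB, the historical default
                .build();
    }
}
```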
5.1.2 Region merge
Scenario: a large amount of data has been deleted from regions, so keeping many small regions is no longer necessary, and they can be merged to reduce the region count.
5.2 Store
A region contains multiple stores; stores are at the column-family level. If a table has three column families, each of its regions has three stores.
5.2.1 MemStore
Each column family / store has a corresponding independent MemStore, that is, its own block of memory. After data is written, the content of each column family goes into the corresponding MemStore, where it is sorted by rowkey and indexed; together with the files on disk this forms an LSM-tree.

LSM-tree (Log-Structured Merge Tree)

An LSM-tree provides an index that serves the same purpose as a B+ tree, but it avoids random disk writes by writing in batches: incoming data is first sorted and indexed in memory, and after a certain amount of data (or time) has accumulated it is flushed to disk. As the number of small files on disk grows, they are merged automatically in the background; combining many small files into one large file effectively speeds up queries.
Figure: Merging of LSM trees
A flush is triggered when (a sketch of the related settings and a manual flush follows this list):
- a MemStore reaches the flush size threshold
- the total MemStore size on the region server reaches the global threshold
- the MemStore reaches the periodic flush interval
- the number of WAL files exceeds maxLogs
- a flush is triggered manually
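The automatic triggers map to server-side configuration properties, and a flush can also be requested through the admin API. A minimal sketch, assuming the HBase 2.x Java client; the table name is illustrative, and the property names and defaults should be checked against your HBase version:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class FlushExample {
    public static void main(String[] args) throws Exception {
        // Server-side settings (hbase-site.xml) behind the automatic triggers, for example:
        //   hbase.hregion.memstore.flush.size        - per-MemStore flush threshold (commonly 128 MB)
        //   hbase.regionserver.global.memstore.size  - share of the heap all MemStores may use
        //   hbase.regionserver.maxlogs               - WAL file count that forces a flush
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            admin.flush(TableName.valueOf("events")); // manually triggered flush
        }
    }
}
```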
5.2.2 HFile
HFile is the HBase data file format. All HBase data is persisted in HFiles, and queries ultimately read from HFiles.
HFile contains multiple data blocks that store data within a column family, as well as associated indexes:
- Scan Block: the part that is read during a scan query
▫ Data Block: the KV data itself
▫ Leaf Index Block: leaf nodes of the B-tree-like index
▫ Bloom Block: Bloom filter data
- Non Scan Block
▫ Meta Block
▫ Intermediate Index Block: intermediate nodes of the index
- Load-on-open: the part that is loaded into memory when the HFile is opened
▫ Root Index Block: the root node of the index
▫ Meta Index
▫ File Info
▫ Bloom filter metadata: index of the Bloom filter
- Trailer: records the offset of each part above. When an HFile is read, the Trailer is read first, and the locations of the other parts are obtained from it
HFile compaction:

Every MemStore flush generates a new HFile, and an HFile is, after all, stored on disk. Reading anything from disk involves a seek, which on a traditional hard disk means moving the head, a slow operation. The more HFiles there are, the more seeks each read requires and the slower queries become. To keep the number of files that must be addressed small, fragmentation must be minimized, so HBase continuously runs compactions in the background.
Classification of compaction (a sketch of requesting compactions through the admin API follows this list):
- Minor compaction: merges several small HFiles into one larger HFile
- Major compaction: merges all HFiles of a store into one; data marked for deletion is physically removed only when a major compaction runs
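Compactions run automatically in the background, but they can also be requested through the admin API. A minimal sketch (the table name is illustrative):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CompactExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableName table = TableName.valueOf("events");
            admin.compact(table);      // request a (minor) compaction
            admin.majorCompact(table); // request a major compaction
        }
    }
}
```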
The compaction process:
- read the HFiles on the compaction list
- create a scanner to read their data
- write the contents into a temporary file
- the temporary file replaces the multiple HFiles that existed before the compaction
6. Data query
6.1 Query Sequence
1. Query the block cache first: the load-on-open part of every HFile is resident in memory, while data blocks stay on disk; when a data block is located, HBase loads the whole block into the block cache. So HBase first checks whether the block is already in the block cache and, if so, uses it directly. The block cache can be used safely because HFiles are immutable: subsequent updates and deletes do not modify an existing HFile but append new files, so as long as an HFile exists, its cached blocks remain valid.
2. If the block cache misses, search the region (MemStore + HFiles): use the HBase metadata table to find the region server that holds the rowkey being queried, then look in its MemStore and HFiles.
6.2 Region Search Process
Figure: Region lookup process
A table has multiple regions distributed across different machines, so a mechanism is needed to determine which region to search:
- ZooKeeper stores the location of the meta table; the meta table records the rowkey range and location of every region
- The meta table is used to find the server that holds the region to be queried
- The query is then executed on that server
The client caches the meta information to speed up subsequent queries.
6.3 Query API
- Get: queries the columns of a single rowkey
- Scan: scans a specified rowkey range (setStartRow, setStopRow)
- Filter: filters the content during a scan
Specifying a rowkey range is the most effective way to speed up a query; without a rowkey range, a full table scan is required.
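A minimal sketch of the three query APIs with the HBase 2.x Java client; the table name, rowkeys, and prefix are illustrative, and newer clients use withStartRow/withStopRow in place of the older setStartRow/setStopRow:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class QueryExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("events"))) {
            // Get: fetch the columns of a single rowkey.
            Result one = table.get(new Get(Bytes.toBytes("rowkey-0001")));
            System.out.println("cells in row: " + one.size());

            // Scan: restrict the rowkey range so only part of the table is read.
            Scan scan = new Scan();
            scan.withStartRow(Bytes.toBytes("app01-")); // inclusive start
            scan.withStopRow(Bytes.toBytes("app01-~")); // exclusive stop
            scan.setFilter(new PrefixFilter(Bytes.toBytes("app01-"))); // Filter applied while scanning
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```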
7. HBase design for Ad Tracking
Rowkey structure: partition key-pid-eventtime-spreadid-sequence (a sketch of assembling such a key follows the list):
- Partition key: the hash code of a unique key (a random string) modulo the number of HBase regions
- Pid: the unique primary key of the application
- EventTime: the time of the event
- Spreadid: the auto-increment primary key of the promotion campaign
- Sequence: a random sequence that ensures events with identical values for the fields above do not overwrite one another
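A minimal sketch of how a rowkey with this layout could be assembled; the delimiter, field widths, region count, and hashing choice are assumptions for illustration, not the actual Ad Tracking implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadLocalRandom;

public class RowkeyBuilder {
    private static final int REGION_COUNT = 64; // assumed number of regions

    // Layout: partitionKey-pid-eventTime-spreadId-sequence
    static byte[] buildRowkey(String uniqueKey, long pid, long eventTime, long spreadId) {
        int partition = Math.floorMod(uniqueKey.hashCode(), REGION_COUNT); // spread rows across regions
        long sequence = ThreadLocalRandom.current().nextLong(1_000_000);   // avoid overwriting identical events
        String rowkey = String.format("%02d-%d-%013d-%d-%06d",
                partition, pid, eventTime, spreadId, sequence);
        return rowkey.getBytes(StandardCharsets.UTF_8);
    }
}
```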
The Ad Tracking HBase rowkey is designed around business fields. Data of the same application is stored in the same region, which makes queries fast; however, the amount of data varies from user to user, so at present the space usage of the Ad Tracking HBase regions is somewhat uneven, though still within an acceptable range.
In general, an HBase rowkey carries some amount of business information. A completely random rowkey carries none, so every query would have to scan the whole table, which is very inefficient. The key to rowkey design is to balance query speed against how evenly the data is distributed. Below are some suggestions for rowkey design.
7.1 RowKey Length Design Suggestions
- The persistent data file HFile stores data as KeyValues, each of which repeats the rowkey. If the rowkey is too long, say 100 bytes, then 10 million rows spend 100 * 10,000,000 = 1,000,000,000 bytes, nearly 1 GB, on rowkeys alone, which greatly hurts HFile storage efficiency
- The MemStore caches part of the data in memory. If the rowkey is too long, memory utilization drops, less data can be cached, and retrieval efficiency falls
- Most current operating systems are 64-bit and align memory on 8 bytes, so it is recommended to keep the rowkey at 16 bytes (an integer multiple of 8 bytes) to take best advantage of the operating system
7.2 RowKey Design Mode: Salt
Figure: Rowkey design – salt
Use a fixed set of random prefixes (a minimal sketch follows this list):
- Advantage: data is balanced across regions
- Disadvantage: a get cannot be served quickly, because the prefix is random and cannot be recomputed from the query key; scan speed is acceptable
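A minimal sketch of a salted rowkey, assuming a small fixed set of random prefixes; the bucket count and delimiter are assumptions:

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadLocalRandom;

public class SaltedRowkey {
    private static final int SALT_BUCKETS = 8; // assumed fixed number of salt prefixes

    // The prefix is chosen at random, so writes spread evenly across regions,
    // but a reader cannot recompute it from the business key alone.
    static byte[] salted(String businessKey) {
        int salt = ThreadLocalRandom.current().nextInt(SALT_BUCKETS);
        return (salt + "-" + businessKey).getBytes(StandardCharsets.UTF_8);
    }
}
```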
7.3 RowKey Design Mode: Hash
Figure: Rowkey design – hashing
Take the first five characters of the MD5 hash of the rowkey as a prefix (a sketch follows this list):
- Advantage: data is scattered, and because the prefix can be recomputed from the rowkey at query time, a get is still fast
- Disadvantage: rowkeys that originally shared a prefix are scattered, so scans become slow
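A minimal sketch of the hashed-prefix approach; the five-character prefix length follows the example above, while the delimiter and hex encoding are assumptions:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class HashedRowkey {
    // Prefix the rowkey with the first 5 hex characters of its MD5 hash.
    // Unlike a random salt, the prefix is deterministic, so a Get can rebuild it.
    static byte[] hashed(String businessKey) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(businessKey.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return (hex.substring(0, 5) + "-" + businessKey).getBytes(StandardCharsets.UTF_8);
    }
}
```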
7.4 RowKey Design Mode: Reverse
Figure: RowKey design – reverse
Reverse a fixed-length part of the rowkey, or the entire rowkey. The three URLs in the figure above belong to the same domain; because the domain sits at the end of the hostname, without reversal they start with different prefixes and are scattered into different regions, which makes them hard to search together.
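A minimal sketch of whole-key reversal (plain Java, nothing HBase-specific):

```java
public class ReversedRowkey {
    // Hostnames such as "www.example.com" and "mail.example.com" share a suffix
    // (the domain). Reversing the key turns the shared suffix into a shared
    // prefix ("moc.elpmaxe."), so rows from the same domain stay adjacent.
    static String reversed(String rowkey) {
        return new StringBuilder(rowkey).reverse().toString();
    }
}
```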
end
By TalkingData