When I was working on web crawlers at my company, we stored a large volume of article and author data in HBase. HBase is positioned as an OLTP-style real-time database for large data volumes, high concurrency, and millisecond-level responses. However, when a query spans multiple regions, it takes noticeably longer. Here is one solution I used at the time: filters.
First, what is a filter?
HBase provides a variety of filters to make data processing more efficient. Users can filter data with the built-in filters or with filters they write themselves. All filters take effect on the server side, which is a form of predicate push-down: data that is filtered out is never sent to the client, which reduces the load on both the network and the client.
Second, how to use a filter?
All built-in filters in HBase directly or indirectly inherit from the abstract class FilterBase, which in turn builds on Filter (an interface in older releases and an abstract class in newer ones). So how do you use a filter? That part is easy: pass it to the setFilter method of a Scan or a Get.
Third, how are filters classified?
HBase's built-in filters fall into three main categories: comparison filters, dedicated filters, and wrapping (decorating) filters. This article focuses on comparison filters. The other two categories, and how to combine several filters, will be covered in future articles.
Fourth, comparison filters
All comparison filters inherit from the abstract class CompareFilter, and creating one requires only two parameters: a comparison operator and a comparator instance, as in CompareFilter(CompareOp compareOp, ByteArrayComparable comparator).
The parameters are described as follows:
1, CompareOp
The source code is the easiest place to see what is available. CompareOp is defined as an enumeration, which spells out the operators we can use: LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER, and NO_OP.
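To make the operator semantics concrete, here is a minimal plain-Java sketch of how a comparison operator can be applied to a byte-wise comparison. It is a model for illustration, not HBase code: the class CompareOpSketch, the enum Op, and the methods compareBytes and keep are hypothetical names, and NO_OP is left out. The point it illustrates is that the operator reads as "stored value OP reference value", so LESS keeps cells that are smaller than the value given to the comparator.

```java
import java.nio.charset.StandardCharsets;

public class CompareOpSketch {
    // Hypothetical stand-in for CompareFilter.CompareOp (NO_OP omitted).
    public enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    // Lexicographic comparison of two byte arrays, treating each byte as unsigned.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True when "cell OP reference" holds, i.e. the cell would be kept.
    public static boolean keep(byte[] cell, Op op, byte[] reference) {
        int c = compareBytes(cell, reference);
        switch (op) {
            case LESS:             return c < 0;
            case LESS_OR_EQUAL:    return c <= 0;
            case EQUAL:            return c == 0;
            case NOT_EQUAL:        return c != 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case GREATER:          return c > 0;
            default: throw new IllegalArgumentException("unknown operator " + op);
        }
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] reference = utf8("row-100");
        System.out.println(keep(utf8("row-050"), Op.LESS, reference));     // true
        System.out.println(keep(utf8("row-200"), Op.LESS, reference));     // false
        System.out.println(keep(utf8("row-100"), Op.EQUAL, reference));    // true
        System.out.println(keep(utf8("row-200"), Op.GREATER, reference));  // true
    }
}
```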
2, ByteArrayComparable
This is an abstract class. When using a comparison filter, we only need to pass in an implementation that fits our needs, such as BinaryComparator. We can also extend this abstract class to implement a comparator of our own.
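One thing worth checking before choosing a comparator is how the values were encoded when they were written, because a byte-wise comparator such as BinaryComparator orders raw bytes, not numbers. The plain-Java sketch below models that ordering without using HBase itself. The class ByteOrderSketch and its helper methods are hypothetical names. The intBytes helper produces the same 4-byte big-endian layout as Bytes.toBytes(int).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteOrderSketch {
    // Lexicographic comparison of two byte arrays, treating each byte as unsigned.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Fixed-width, big-endian encoding of an int (4 bytes).
    public static byte[] intBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Decimal text encoding of an int, e.g. 111 becomes the three bytes "111".
    public static byte[] textBytes(int v) {
        return Integer.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Fixed-width ints: 20 sorts before 111, as expected.
        System.out.println(compareBytes(intBytes(20), intBytes(111)) < 0);    // true
        // Decimal text: "20" sorts after "111" because '2' > '1'.
        System.out.println(compareBytes(textBytes(20), textBytes(111)) < 0);  // false
        // Negative ints have the high bit set, so they sort after positive ones.
        System.out.println(compareBytes(intBytes(-1), intBytes(111)) < 0);    // false
    }
}
```

In other words, a numeric threshold only behaves numerically when the stored values and the comparator value share the same fixed-width encoding and the values are non-negative.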
Fifth, usage
After all this, some readers may still be wondering which comparison filters exist and what each one actually does. Don't worry, the practical part comes next.
Comparison filters in HBase are classified as follows:
RowFilter: filters data based on row keys
FamilyFilter: filters data based on column families
QualifierFilter: filters data based on column qualifiers
ValueFilter: filters data based on cell content
DependentColumnFilter: takes a reference column and filters the other columns based on the timestamp of that reference column
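As a rough illustration, each of these filters can be constructed and attached to a Scan or Get along the following lines. This is a sketch under assumptions: it targets an HBase 1.x or 2.x client, where CompareFilter.CompareOp is available, and the column family "info", the qualifiers "title" and "author", the row key prefix "article-", and the class name ComparisonFiltersSketch are all hypothetical. Building the filters does not contact a cluster, so nothing here is actually read from a table.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.DependentColumnFilter;
import org.apache.hadoop.hbase.filter.FamilyFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ComparisonFiltersSketch {
    // Hypothetical schema: one column family "info" with "title" and "author" columns.
    static final byte[] FAMILY = Bytes.toBytes("info");

    public static List<Filter> buildFilters() {
        return Arrays.<Filter>asList(
            // Rows whose key starts with "article-".
            new RowFilter(CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes("article-"))),
            // Cells that belong to the "info" column family.
            new FamilyFilter(CompareOp.EQUAL, new BinaryComparator(FAMILY)),
            // Cells whose column qualifier is exactly "title".
            new QualifierFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("title"))),
            // Cells whose value contains the substring "hbase".
            new ValueFilter(CompareOp.EQUAL, new SubstringComparator("hbase")),
            // Cells that share a timestamp with the reference column info:author.
            new DependentColumnFilter(FAMILY, Bytes.toBytes("author"))
        );
    }

    public static void main(String[] args) {
        for (Filter filter : buildFilters()) {
            Scan scan = new Scan();
            scan.setFilter(filter); // a Scan holds one filter at a time
            System.out.println(filter.getClass().getSimpleName() + " attached to a Scan");
        }
        // A Get accepts a filter in the same way.
        Get get = new Get(Bytes.toBytes("article-0001"));
        get.setFilter(buildFilters().get(2));
    }
}
```

SubstringComparator is meant to be used with EQUAL or NOT_EQUAL only, which is why the ValueFilter above does not use an ordering operator.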
For example, suppose we want to query column values in HBase against a given baseline: the value must not be greater than 111. We can write it like this:
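import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;

// ValueFilter compares cell contents; LESS_OR_EQUAL keeps cells whose value is <= 111.
// The stored values must use the same 4-byte int encoding as Bytes.toBytes(111).
Filter filter = new ValueFilter(
    CompareFilter.CompareOp.LESS_OR_EQUAL,
    new BinaryComparator(Bytes.toBytes(111))
);
Scan scan = new Scan();
scan.setFilter(filter);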
Finally, I would like to emphasize that HBase filters take effect on the server side. Used properly, they cut down the amount of data sent over the network during queries and make our work much more efficient. In upcoming articles I will cover dedicated filters, wrapping filters, and how to combine multiple filters.