HyperLogLog is an algorithm for cardinality estimation, i.e., counting the number of distinct elements in a set.
So what is cardinality? Take the dataset {1, 3, 5, 7, 5, 7, 8}: its set of distinct elements is {1, 3, 5, 7, 8}, so its cardinality (the count of non-repeating elements) is 5.
Now, if you need to count the UV (unique visitors) of a web page, that involves de-duplication, which is exactly the kind of scenario HyperLogLog is good at.
Isn't that just a set? I could use a Set to keep only the distinct elements.
Yes, you could, but when the data volume is very large, wouldn't that Set take up far too much memory? HyperLogLog shines here because it uses a fixed amount of space to estimate the cardinality: with only 12 KB it can estimate the cardinality of nearly 2^64 distinct elements.
Note, however, that at this scale the estimate carries a standard error of about 0.81%, so it depends on whether your business can accept that. For a UV scenario like the one above, such an error is negligible.
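As a minimal sketch of that UV scenario (the key name uv:20240501 and the user IDs below are made up for illustration), each visit adds the visitor's ID and the daily count is read back with pfcount:

pfadd uv:20240501 user1001 user1002 user1003
pfadd uv:20240501 user1001
pfcount uv:20240501

The second pfadd is a repeat visit from user1001, so pfcount still reports 3 unique visitors.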
First, pfadd
Adds all the specified elements to the HyperLogLog data structure.
pfadd mypf 1 2 3 a b c 3 4 5 c d a
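One detail worth knowing: pfadd returns 1 when the add altered the HyperLogLog's internal registers (so the estimate may have changed) and 0 otherwise. Repeating the command with elements that are already present simply returns 0:

pfadd mypf 1 2 3

Since 1, 2 and 3 are already represented, this call returns 0 and the estimate is unchanged.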
Second, pfcount
Returns the cardinality estimate for the given HyperLogLog.
pfcount mypf
As you can see, it returns 9, which means there are 9 distinct elements (1, 2, 3, 4, 5, a, b, c, d).
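pfcount can also take several keys at once and returns the estimated cardinality of their union without storing anything (mypf2 below is just a second, hypothetical key):

pfadd mypf2 x y z
pfcount mypf mypf2

Here the union of mypf (9 distinct elements) and mypf2 (3 new ones) gives an estimate of 12.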
Third, pfmerge
Merges multiple HyperLogLogs into a single one: the cardinality of the merged HyperLogLog is the estimated cardinality of the union of all the given HyperLogLogs.
pfmerge mypftotal mypf3 mypf4
This merges mypf3 and mypf4 into mypftotal.
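Assuming mypf3 and mypf4 were previously populated with pfadd, the combined estimate can then be read back from the destination key:

pfcount mypftotal

Compared with passing several keys to pfcount, pfmerge persists the union under mypftotal, which is handy when the merged result will be queried repeatedly.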