The problem
- To record the number of independent IP addresses visited by the website every day, we can use the collection to store the IP addresses of each visitor. Through the nature of the collection, multiple independent IP addresses can be obtained, and then the number of independent IP addresses can be obtained by calling the SCARD command. However, using strings to store each IP address needs a lot of memory
Redis added HyperLogLog in version 2.8.9 to better address issues like independent IP address computation
introduce
- HyperLogLog can accept multiple elements as inputs and give an estimate of the cardinality of the input elements:
• Cardinality: The number of different elements in a set. For example {‘apple’, ‘banana’, ‘cherry’, ‘banana’, ‘apple’} has base 3. • Estimated value: The cardinality given by the algorithm is not exact. It may be slightly more or less than the actual cardinality, but it will be controlled within the range of the synthesis.
- The advantage of HyperLogLog is that even if the number or volume of input elements is very, very large, the space required to calculate the cardinality is always fixed and small
- In Redis, each HyperLogLog key costs only 12 KB of memory to calculate the cardinality of close to 2^64 different elements. This is in stark contrast to collections where the more elements you have, the more memory you consume when calculating cardinality
- Because HyperLogLog only calculates cardinality from the input elements and does not store the input elements themselves, HyperLogLog cannot return individual elements of the input, as collections do
Basic operation
-
PFADD key element [element …] To add any number of elements to a specified HyperLogLog the command may modify the HyperLogLog to reflect the new cardinality estimate. If the HyperLogLog cardinality estimate has changed since the command was executed, the command returns 1, otherwise it returns 0
-
PFCOUNT key [key …] When only one HyperLogLog is given, the command returns the cardinality estimate of the given HyperLogLog. When multiple HyperLogLog are given, the command first computs the union of the given HyperLogLog to obtain a combined HyperLogLog. Then return the cardinality estimate of the merged HyperLogLog as the result of the command (the merged HyperLogLog will not be stored and will be deleted after use).
-
PFMERGE destkey sourcekey [sourcekey …] Multiple HyperLogLog are merged into one HyperLogLog, and the cardinality estimate of the combined HyperLogLog is calculated by union of all the given HyperLogLog