Definition of PV and UV

  • PV(traffic) : Page View, userEvery timeThe visit is counted once, the pageviewsorClick on the quantity
  • UV(Unique visitors) : Unique Visitor, same client within a dayWill onlybeCalculate a

Basic concepts of radix counting

  • Cardinality counting is used to count the number of unique elements ina set, such as the NUMBER of UVs of a website or the number of keywords a user searches for a website.
  • The simplest way to implement cardinality counting is to record all the non-repeating sets of elements in a collection
    • As the amount of statistical data increases, the corresponding storage memory increases linearly
    • As the set gets larger, the cost of determining whether it contains the newly added element becomes larger
  • In fact, I haven’t found a better one yetBig dataEfficient algorithms for calculating cardinality accurately in scenarios, so use without seeking absolute accuracyProbability algorithmIs a good solution.
  • Probability algorithmInstead of storing the data set itself directly, the cardinality value can be estimated by a certain probability and statistics method, which can greatly save memory and ensure the error control within a certain range
  • Note:HyperLogLog uses a probabilistic algorithmic statistical model, which allows some error.

Why is the size of a Hyperloglog key fixed at 12KB?

  • Hyperloglog key structure composition:
struct hllhdr {
    char magic[4];      // Fix 'HYLL' to identify the Hyperloglog key
    uint8_t encoding;   // Coding mode, which is concise and sparse
    uint8_t notused[3]; // Unused field, save for later use
    uint8_t card[8];    // Cardinality cache, which stores the cardinality of the last calculation
    uint8_t registers[]; // Number of buckets, used to store data, Redis size is 16384
};
Copy the code
  • In the Hyperloglog structure of Redis, the card array is 64 bits, which can theoretically store nearly 2^64 values.
  • Redis specifies that there are 2^14 buckets, that is, 16,384 buckets. The data in each bucket is stored in 6 bits. Redis stores the value of the position where 1 appears for the first time.
  • Registers (16384*6+7) /8 = 12288 bytes, plus the HLL header (16 bytes), the total size is 12304 bytes, i.e., a hyperloglog key occupies 12KB. You can calculate the cardinality of up to 2^64 different elements.

HyperLogLog command

  • HyperLogLog supports only three commands: PFADD, PFCOUNT, and PFMERGE

PFADD – add

  • Add the element to the HyperLogLog data structure, and the command returns 1 if the cardinality estimate for the HyperLogLog has changed since the command was executed, or 0 otherwise.

PFCOUNT- Statistical base

  • Returns the cardinality of the given HyperLogLogestimates.
127.0.0.1:6379> pfadd uv 127.0.0.1 127.0.0.2 127.0.0.3 127.0.0.4
(integer) 1
127.0.0.1:6379> pfadd uv 127.0.0.1
(integer) 0
127.0.0.1:6379> pfcount uv
(integer) 4
Copy the code

HLL PFMERGE – merger

  • Combining multiple HyperLogLog into one HyperLogLog, the cardinality of the merged HyperLogLog is obtained by combining all HyperLogLog.
127.0.0.1:6379> pfadd uv1 127.0.0.1 127.0.0.2 127.0.0.3 127.0.0.4
(integer) 1
127.0.0.1:6379> pfcount uv1
(integer) 4
127.0.0.1:6379> pfadd uv2 127.0.0.1 127.0.1.2  127.0.1.3 127.0.1.4
(integer) 1
127.0.0.1:6379> pfadd uv3 127.0.0.1  127.0.2.2 127.0.2.3 127.0.2.4
(integer) 1
127.0.0.1:6379> pfmerge newuv uv1 uv2 uv3
OK
127.0.0.1:6379> pfcount newuv
(integer) 10
Copy the code
  • This command is very important, for example you can countA weekA monthUv can be easily implemented using this command.

Case combat: UV calculation based on Redis

Step 1: Simulate UV access

@Service
@Slf4j
public class TaskService {

    @Autowired
    private RedisTemplate redisTemplate;

    /** * simulates UV access */
    @PostConstruct
    public void init(a){
        log.info("Simulated UV access..........");
        new Thread(()->this.refreshData()).start();
    }

    /** * Refresh the statistics for the day */
    public void refreshDay(a){
        Random rand = new Random();
        String ip=rand.nextInt(999) +"."+
                rand.nextInt(999) +"."+
                rand.nextInt(999) +"."+
                rand.nextInt(999);
        / / pfadd redis command
        long n=this.redisTemplate.opsForHyperLogLog().add(Constants.PV_KEY,ip);
        log.debug("ip={} , returen={}",ip,n);
    }

    /** * Analog UV statistics by day */
    public void refreshData(a){
        while (true) {// Refresh the statistics for the day
            this.refreshDay();
            //TODO In distributed systems, it is recommended to use xxlJob to implement timing
            try {
                Thread.sleep(5000);
            } catch(InterruptedException e) { e.printStackTrace(); }}}}Copy the code

Step 2: implement UV statistics function

@RestController
@Slf4j
public class Controller {

    @Autowired
    private RedisTemplate redisTemplate;
    
    @GetMapping(value = "/uv")
    public long uv(a) {
        / / pfcount redis command
        long size=this.redisTemplate.opsForHyperLogLog().size(Constants.PV_KEY);
        returnsize; }}Copy the code

Redis distributed cache series

Redis Distributed Cache (24) – micro – relational computing solutions