Redis advanced features HyperLoglog

Redis advanced features HyperLoglog

Hyperloglog algorithm, using very little space, to achieve relatively large data statistics; For example, in the process of introducing bitmap, we talked about daily activity statistics. When the amount of data reaches millions, the best storage method is Hyperloglog. This paper will introduce the basic principle of Hyperloglog and the use posture in Redis

I. Basic use

1. The configuration

We used SpringBoot 2.2.1.RELEASE to set up the project environment and added redis dependencies directly to pom.xml

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
Copy the code

If our Redis is the default configuration, we can add no additional configuration. It can also be directly in the application.yml configuration, as follows

spring:
  redis:
    host: 127.0. 01.
    port: 6379
    password:
Copy the code

2. Use posture

Let’s look at the use of posture, the principle is explained later

In Redis, hyperlolog is very simple to use. Generally, there are two operation commands: add pfadd + count pfcount; There is also a less common merge

a. add

Add a record

public boolean add(String key, String obj) {
    // pfadd key obj
    return stringRedisTemplate.opsForHyperLogLog().add(key, obj) > 0;
}
Copy the code

b. pfcount

Imprecise counting statistics

public long count(String key) {
    // pfcount Indicates the count of the inexact statistics key
    return stringRedisTemplate.opsForHyperLogLog().size(key);
}
Copy the code

a. merge

Merge multiple Hyperloglog into a new Hyperloglog; It doesn’t take much to feel the scene

public boolean merge(String out, String... key) {
    // pfmerge out key1 key2 --> merge key1 key2 into a new hyperloglog out
    return stringRedisTemplate.opsForHyperLogLog().union(out, key) > 0;
}
Copy the code

3. Principle description

I won’t go into details about the principle of HyperLogLog here. To be honest, I don’t quite understand the algorithm and harmonic averaging formula myself; Here is my personal simple understanding

HyperLogLog in Redis is divided into 2^14= 16,384 buckets, each of which occupies 6 bits

A piece of data, before stuffing into HyperLogLog, is hashed to produce a 64-bit binary

Take the lower 14 bits to locate the index of the bucket
The top 50 digits, counting from the lowest to the highest, find the first position n where 1 appears
- If the median value of the bucket is greater than n, the bucket is discarded
- Otherwise, set the value in the bucket to n

So how do we count statistics?

Take the values in all the buckets and plug them into the formula below

Where does this formula come from?

Before you see an article, it feels good, interested in principle, to: www.jianshu.com/p/55defda6d…

4. Application scenarios

Hyperloglog is usually used for inexact counting statistics. In the case of daily active statistics, bitmap was used for data statistics at that time. However, it is not applicable when userids are not evenly distributed and small ones are extremely small and large ones are extremely large

Hyperloglog has a huge advantage in the case of a large amount of data. The storage space it takes up is fixed at 2^14.

The design idea of daily activity statistics using HyperLogLog is relatively simple

A key is generated every day
After a user access, executepfadd key userId
Total statistics:pfcount key

II. The other

0. Project

Series of blog posts

【DB series 】Redis advanced features publish subscription
【DB series 】Redis advanced features Bitmap posture and application scenarios
【DB series 】Redis pipeline Pipelined using posture
DB series Redis cluster environment configuration
【DB series 】 Build a simple site statistics service with Redis (Application)
【DB series 】 Using Redis to achieve the ranking function (Application)
Redis ZSet data structure using posture
【DB series 】Redis Set data structure using posture
Redis Hash data structure using posture
Redis List data structures use postures
【DB series 】Redis String data structure read and write
【DB series 】Redis Jedis configuration
DB series basic configuration of Redis

Engineering source

Project: github.com/liuyueyi/sp…
Project source: github.com/liuyueyi/sp…

1. An ashy Blog

As far as the letter is not as good, the above content is purely one’s opinion, due to the limited personal ability, it is inevitable that there are omissions and mistakes, if you find bugs or have better suggestions, welcome criticism and correction, don’t hesitate to appreciate

Below a gray personal blog, record all the study and work of the blog, welcome everyone to go to stroll

A grey Blog Personal Blog blog.hhui.top
A Grey Blog-Spring feature Blog Spring.hhui.top