Redis Advanced Features: HyperLogLog
The HyperLogLog algorithm uses very little space to produce approximate counts of distinct elements over very large data sets. For example, when introducing Bitmap we discussed daily active user statistics; once the data reaches millions of users, HyperLogLog is the better storage choice. This post introduces the basic principle of HyperLogLog and how to use it in Redis.
I. Basic use
1. Configuration
We use Spring Boot 2.2.1.RELEASE for the project environment and add the Redis dependency directly to pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
If Redis runs with the default configuration (localhost:6379, no password), no extra configuration is needed. Otherwise, configure it directly in application.yml, as follows:
spring:
  redis:
    host: 127.0.0.1
    port: 6379
    password:
2. Usage
Let's look at how to use it first; the principle is explained later.
In Redis, HyperLogLog is very simple to use. There are two main commands: pfadd to add elements and pfcount to count them. There is also a less commonly used merge command, pfmerge.
a. add
Add a record
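public boolean add(String key, String obj) {
    // pfadd key obj: returns 1 if an internal register was updated, 0 otherwise
    Long changed = stringRedisTemplate.opsForHyperLogLog().add(key, obj);
    return changed != null && changed > 0;
}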
b. pfcount
Approximate count of distinct elements
public long count(String key) {
    // pfcount key: approximate number of distinct elements added to key
    Long size = stringRedisTemplate.opsForHyperLogLog().size(key);
    return size == null ? 0L : size;
}
c. merge
Merge multiple HyperLogLogs into a new HyperLogLog. I don't think there are many scenarios that need this.
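public boolean merge(String out, String... keys) {
    // pfmerge out key1 key2 ...: merge the source HyperLogLogs into a new one stored at out.
    // Spring's union(...) returns the approximate cardinality of out after the merge.
    Long size = stringRedisTemplate.opsForHyperLogLog().union(out, keys);
    return size != null && size > 0;
}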
3. How it works
I won't go into the details of how HyperLogLog works here. To be honest, I don't fully understand the algorithm or its harmonic-mean formula myself, so what follows is my own simplified understanding.
Redis's HyperLogLog uses 2^14 = 16,384 buckets, each occupying 6 bits.
Before an element is added to the HyperLogLog, it is hashed into a 64-bit value:
- The lower 14 bits locate the bucket index
- In the remaining upper 50 bits, scan from the lowest bit toward the highest and find the position n of the first 1
- If the value already stored in the bucket is greater than n, n is discarded
- Otherwise, the value in the bucket is set to n
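The insertion steps above can be sketched in plain Java. This is a minimal illustration, not Redis's code: it assumes the element has already been hashed to a 64-bit long (Redis uses MurmurHash64A for that; hashing is skipped here), and the class HllRegisters and its methods are hypothetical names. Buckets are kept one per byte for readability instead of Redis's packed 6-bit layout.

```java
/**
 * Hypothetical sketch: how one already-hashed element updates a HyperLogLog
 * with 2^14 buckets (the bucket count Redis uses). Hashing is NOT shown;
 * callers pass a precomputed 64-bit hash.
 */
class HllRegisters {
    static final int P = 14;                 // bits used for the bucket index
    static final int BUCKETS = 1 << P;       // 16384 buckets
    static final int Q = 64 - P;             // 50 remaining bits

    /** The lower 14 bits of the hash select the bucket. */
    static int bucketIndex(long hash) {
        return (int) (hash & (BUCKETS - 1));
    }

    /**
     * Scans the upper 50 bits from their lowest bit upward and returns the
     * 1-based position n of the first 1. A sentinel bit just above those
     * 50 bits caps the result at Q + 1 = 51 when all of them are zero.
     */
    static int runLength(long hash) {
        long rest = hash >>> P;              // drop the 14 index bits
        rest |= 1L << Q;                     // sentinel: a 1 is always found
        return Long.numberOfTrailingZeros(rest) + 1;
    }

    /** Keeps the larger of the bucket's current value and n; a smaller n is discarded. */
    static void update(byte[] registers, long hash) {
        int idx = bucketIndex(hash);
        int n = runLength(hash);
        if (n > registers[idx]) {
            registers[idx] = (byte) n;
        }
    }
}
```

With the sentinel, a bucket never holds more than 51, which fits comfortably in 6 bits (maximum 63).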
So how is the count computed?
- Take the values from all buckets and plug them into the HyperLogLog estimation formula, which is built on the harmonic mean of 2^(bucket value) across all buckets
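For reference, the classic HyperLogLog estimator from the original paper by Flajolet et al. is shown below, where m = 2^14 is the number of buckets, M[j] is the value in bucket j, and V is the number of buckets still at 0. Redis's PFCOUNT builds on this with additional bias corrections, so treat it as the textbook formula rather than exactly what Redis computes.

```latex
E = \alpha_m \, m^2 \left( \sum_{j=1}^{m} 2^{-M[j]} \right)^{-1},
\qquad \alpha_m \approx \frac{0.7213}{1 + 1.079/m}, \qquad m = 2^{14} = 16384

\text{Small-range correction: if } E \le \tfrac{5}{2} m \text{ and } V > 0,
\text{ use } E^{*} = m \ln \frac{m}{V}
```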
Where does this formula come from? I once read an article that explains it well; if you're interested in the principle, see: www.jianshu.com/p/55defda6d…
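The counting step can also be sketched in plain Java. HllEstimate is a hypothetical helper, not a Redis or Spring API; it implements the classic estimate (harmonic-mean raw estimate plus the linear-counting correction for small sets) over one byte per bucket. Redis's own PFCOUNT uses a more refined estimator, so its numbers will differ slightly.

```java
/**
 * Hypothetical sketch of the classic HyperLogLog estimate computed from
 * bucket values (one byte per bucket). Not Redis's exact PFCOUNT algorithm.
 */
class HllEstimate {
    static double estimate(byte[] registers) {
        int m = registers.length;                   // e.g. 16384 = 2^14
        double alpha = 0.7213 / (1 + 1.079 / m);    // bias constant, valid for m >= 128
        double sum = 0;                             // sum of 2^(-M[j])
        int zeros = 0;                              // number of empty buckets (V)
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double e = alpha * m * m / sum;             // raw harmonic-mean estimate
        if (e <= 2.5 * m && zeros > 0) {
            e = m * Math.log((double) m / zeros);   // linear counting for small sets
        }
        return e;
    }
}
```

An empty sketch estimates 0, and with m = 16384 the expected standard error is about 1.04/√m ≈ 0.81%, so a sketch fed with many distinct random hashes should land within a few percent of the true count.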
4. Application scenarios
HyperLogLog is usually used for approximate counting. For daily active user statistics, we used a Bitmap in an earlier post. However, a Bitmap is a poor fit when user IDs are unevenly distributed, with some very small and others extremely large, because the Bitmap has to be as long as the largest user ID.
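HyperLogLog has a huge advantage with large data volumes: no matter how many elements are added, its storage is capped at 2^14 buckets × 6 bits, about 12 KB per key (Redis uses an even more compact sparse encoding while the count is small), at the cost of a standard error of about 0.81%.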
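To make the comparison with a Bitmap concrete, here is the arithmetic for the dense HyperLogLog representation versus a Bitmap indexed by user ID, assuming a hypothetical largest user ID of 10^9:

```latex
\text{HyperLogLog: } 2^{14} \times 6\ \text{bits} = 98{,}304\ \text{bits} = 12{,}288\ \text{bytes} = 12\ \text{KB}

\text{Bitmap: } 10^{9}\ \text{bits} = \frac{10^{9}}{8}\ \text{bytes} = 1.25 \times 10^{8}\ \text{bytes} \approx 125\ \text{MB}
```

The Bitmap costs this much even on a day when only a handful of users with large IDs visit, while the HyperLogLog stays at 12 KB regardless of how many users are added.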
Using HyperLogLog for daily active user statistics is straightforward:
- Generate one key per day
- After each user visit, execute
  pfadd key userId
- To get the day's total, execute
  pfcount key
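The one-key-per-day scheme mostly comes down to naming keys consistently. Below is a minimal sketch; the dau: prefix, the yyyyMMdd date format, and the DauKeys class are hypothetical choices, not something Redis or Spring prescribes. Because PFCOUNT also accepts several keys and returns the approximate distinct count of their union (exposed in Spring as opsForHyperLogLog().size(keys...)), the keys for the last N days can be used to count weekly active users, or passed to the merge method above to store the combined result.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for per-day HyperLogLog key names. */
class DauKeys {
    private static final String PREFIX = "dau:";                                   // hypothetical prefix
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd

    /** Key holding the HyperLogLog for one day, e.g. dau:20200115. */
    static String dailyKey(LocalDate day) {
        return PREFIX + day.format(DAY);
    }

    /** Keys for the n days ending at (and including) end, oldest first. */
    static String[] lastDays(LocalDate end, int n) {
        String[] keys = new String[n];
        for (int i = 0; i < n; i++) {
            keys[i] = dailyKey(end.minusDays(n - 1 - i));
        }
        return keys;
    }
}
```

For example, with the add method from earlier, recording a visit would look like add(DauKeys.dailyKey(LocalDate.now()), String.valueOf(userId)), and the last seven days' keys come from DauKeys.lastDays(LocalDate.now(), 7).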
II. Other
0. Project
Related posts in this series
- 【DB series】Redis advanced features: publish/subscribe
- 【DB series】Redis advanced features: Bitmap usage and application scenarios
- 【DB series】Redis pipelining (Pipelined) usage
- 【DB series】Redis cluster environment configuration
- 【DB series】Building a simple site statistics service with Redis (Application)
- 【DB series】Implementing a ranking function with Redis (Application)
- 【DB series】Redis ZSet data structure usage
- 【DB series】Redis Set data structure usage
- 【DB series】Redis Hash data structure usage
- 【DB series】Redis List data structure usage
- 【DB series】Redis String data structure reads and writes
- 【DB series】Redis Jedis configuration
- 【DB series】Redis basic configuration
Source code
- Project: github.com/liuyueyi/sp…
- Project source: github.com/liuyueyi/sp…
1. A Grey Blog
As the saying goes, believing everything you read is worse than reading nothing at all. The above content is purely my own opinion; given my limited ability, omissions and mistakes are inevitable. If you find bugs or have better suggestions, criticism and corrections are welcome, so please don't hesitate to share them.
Below is A Grey Blog, my personal blog, which records all my study and work notes; you're welcome to visit.
- A Grey Blog personal blog: blog.hhui.top
- A Grey Blog Spring feature blog: Spring.hhui.top