Definition of PV and UV
- PV(
traffic
) : Page View, userEvery time
The visit is counted once, the pageviews
orClick on the quantity
- UV(
Unique visitors
) : Unique Visitor, same client within a dayWill only
beCalculate a
Basic concepts of radix counting
- Cardinality counting is used to count the number of unique elements ina set, such as the NUMBER of UVs of a website or the number of keywords a user searches for a website.
- The simplest way to implement cardinality counting is to record all the non-repeating sets of elements in a collection
- As the amount of statistical data increases, the corresponding storage memory increases linearly
- As the set gets larger, the cost of determining whether it contains the newly added element becomes larger
- In fact, I haven’t found a better one yet
Big data
Efficient algorithms for calculating cardinality accurately in scenarios, so use without seeking absolute accuracyProbability algorithm
Is a good solution. Probability algorithm
Instead of storing the data set itself directly, the cardinality value can be estimated by a certain probability and statistics method, which can greatly save memory and ensure the error control within a certain range- Note:
HyperLogLog uses a probabilistic algorithmic statistical model, which allows some error
.
Why is the size of a Hyperloglog key fixed at 12KB?
- Hyperloglog key structure composition:
struct hllhdr {
char magic[4]; // Fix 'HYLL' to identify the Hyperloglog key
uint8_t encoding; // Coding mode, which is concise and sparse
uint8_t notused[3]; // Unused field, save for later use
uint8_t card[8]; // Cardinality cache, which stores the cardinality of the last calculation
uint8_t registers[]; // Number of buckets, used to store data, Redis size is 16384
};
Copy the code
- In the Hyperloglog structure of Redis, the card array is 64 bits, which can theoretically store nearly 2^64 values.
- Redis specifies that there are 2^14 buckets, that is, 16,384 buckets. The data in each bucket is stored in 6 bits. Redis stores the value of the position where 1 appears for the first time.
- Registers (16384*6+7) /8 = 12288 bytes, plus the HLL header (16 bytes), the total size is 12304 bytes, i.e., a hyperloglog key occupies 12KB. You can calculate the cardinality of up to 2^64 different elements.
HyperLogLog command
- HyperLogLog supports only three commands: PFADD, PFCOUNT, and PFMERGE
PFADD – add
- Add the element to the HyperLogLog data structure, and the command returns 1 if the cardinality estimate for the HyperLogLog has changed since the command was executed, or 0 otherwise.
PFCOUNT- Statistical base
- Returns the cardinality of the given HyperLogLog
estimates
.
127.0.0.1:6379> pfadd uv 127.0.0.1 127.0.0.2 127.0.0.3 127.0.0.4
(integer) 1
127.0.0.1:6379> pfadd uv 127.0.0.1
(integer) 0
127.0.0.1:6379> pfcount uv
(integer) 4
Copy the code
HLL PFMERGE – merger
- Combining multiple HyperLogLog into one HyperLogLog, the cardinality of the merged HyperLogLog is obtained by combining all HyperLogLog.
127.0.0.1:6379> pfadd uv1 127.0.0.1 127.0.0.2 127.0.0.3 127.0.0.4
(integer) 1
127.0.0.1:6379> pfcount uv1
(integer) 4
127.0.0.1:6379> pfadd uv2 127.0.0.1 127.0.1.2 127.0.1.3 127.0.1.4
(integer) 1
127.0.0.1:6379> pfadd uv3 127.0.0.1 127.0.2.2 127.0.2.3 127.0.2.4
(integer) 1
127.0.0.1:6379> pfmerge newuv uv1 uv2 uv3
OK
127.0.0.1:6379> pfcount newuv
(integer) 10
Copy the code
- This command is very important, for example you can count
A week
或A month
Uv can be easily implemented using this command.
Case combat: UV calculation based on Redis
Step 1: Simulate UV access
@Service
@Slf4j
public class TaskService {
@Autowired
private RedisTemplate redisTemplate;
/** * simulates UV access */
@PostConstruct
public void init(a){
log.info("Simulated UV access..........");
new Thread(()->this.refreshData()).start();
}
/** * Refresh the statistics for the day */
public void refreshDay(a){
Random rand = new Random();
String ip=rand.nextInt(999) +"."+
rand.nextInt(999) +"."+
rand.nextInt(999) +"."+
rand.nextInt(999);
/ / pfadd redis command
long n=this.redisTemplate.opsForHyperLogLog().add(Constants.PV_KEY,ip);
log.debug("ip={} , returen={}",ip,n);
}
/** * Analog UV statistics by day */
public void refreshData(a){
while (true) {// Refresh the statistics for the day
this.refreshDay();
//TODO In distributed systems, it is recommended to use xxlJob to implement timing
try {
Thread.sleep(5000);
} catch(InterruptedException e) { e.printStackTrace(); }}}}Copy the code
Step 2: implement UV statistics function
@RestController
@Slf4j
public class Controller {
@Autowired
private RedisTemplate redisTemplate;
@GetMapping(value = "/uv")
public long uv(a) {
/ / pfcount redis command
long size=this.redisTemplate.opsForHyperLogLog().size(Constants.PV_KEY);
returnsize; }}Copy the code
Redis distributed cache series
Redis Distributed Cache (24) – micro – relational computing solutions