In what is a Bloom Filter? In this paper, the false positive probabilistic (FPP) rate was mentioned several times.

So how much is the misidentification rate and how should it be calculated?

Assume that the Bloom filter is m bits in size, stores N elements, and uses k hash functions to calculate where the elements are stored.

  • If 1 element is added, then the probability that any bit is 1 is:1/m, the probability that any bit is 0:1-1/m;
  • The probability that any bit will be zero after k hashes after adding 1 element:(1-1/m)^k, the probability that any bit is 1:1-(1-1/m)^k;
  • The probability that any bit is zero if n elements are added:(1-1/m)^kn, the probability that any bit is 1:1-(1-1/m)^kn;
  • If a new element is added to a Bloom filter that already has n elements, then the probability that any bit is already 1 is the same as above, and the probability is:1-(1-1/m)^kn. Therefore, the probability that all k bits are 1 is:(1-(1-1/m)^kn)^k, which is the false identification rate of newly inserted elements.

When n is large, (1-(1-1/m)kn)kIt’s approximately :(1-e)-kn/m)k

Where e is a mathematical constant and the base of the natural logarithm function. Sometimes called Euler’s number, after the Swiss mathematician Euler. It is an infinite non-repeating decimal with a value of approximately 2.71828182846.

Assuming that the Bloom filter size m is 16 bits, k is 8, and storage element n is 1, the probability of false identification (false positive) is 5 in 10,000 (5/10,000).

At this point, m/n=16, which actually means that one element uses 16 bits of space to store it.

Therefore, when K is the same and M /n is 16/1, 32/2 and 64/4, the false recognition rate is the same.

Plug the numbers into the formula and calculate where the probability of 5 in 10,000 comes from:

Kn =8*1=8 KN /m=8/16=1/2 the -1 power of e is converted to the reciprocal: 1/ E =0.36787944117 and then the 1/2 power of 1/ E is calculated, that is, take the square root of 0.36787944117 and get: 0.60653065971144443045693327628786 to calculate the values in brackets: 1-0.60653065971144443045693327628786 = 0.39346934028855556954306672371214 and then calculate k power: 0.39346934028855556954306672371214 the eighth = 5.7449622219356465294987619912758 e-4Copy the code

Calculated recognition of 5.7449622219356465294987619912758 e-4, namely five over ten thousand.

The following is from pages.cs.wisc.edu/~cao/papers… It is very convenient to query the false recognition rate when m, n and K take different values respectively.

m/
nand
kError recognition rate of bloom filter when taking different values.
m/n k k= 1 k= 2 k= 3 k= 4 k= 5 k= 6 k= 7 k= 8
2 1.39 0.393 0.400            
3 2.08 0.283 0.237 0.253          
4 2.77 0.221 0.155 0.147 0.160        
5 3.46 0.181 0.109 0.092 0.092 0.101      
6 4.16 0.154 0.0804 0.0609 0.0561 0.0578 0.0638    
7 4.85 0.133 0.0618 0.0423 0.0359 0.0347 0.0364    
8 5.55 0.118 0.0489 0.0306 0.024 0.0217 0.0216 0.0229  
9 6.24 0.105 0.0397 0.0228 0.0166 0.0141 0.0133 0.0135 0.0145
10 6.93 0.0952 0.0329 0.0174 0.0118 0.00943 0.00844 0.00819 0.00846
11 7.62 0.0869 0.0276 0.0136 0.00864 0.0065 0.00552 0.00513 0.00509
12 8.32 0.08 0.0236 0.0108 0.00646 0.00459 0.00371 0.00329 0.00314
13 9.01 0.074 0.0203 0.00875 0.00492 0.00332 0.00255 0.00217 0.00199
14 9.7 0.0689 0.0177 0.00718 0.00381 0.00244 0.00179 0.00146 0.00129
15 10.4 0.0645 0.0156 0.00596 0.003 0.00183 0.00128 0.001 0.000852
16 11.1 0.0606 0.0138 0.005 0.00239 0.00139 0.000935 0.000702 0.000574
17 11.8 0.0571 0.0123 0.00423 0.00193 0.00107 0.000692 0.000499 0.000394
18 12.5 0.054 0.0111 0.00362 0.00158 0.000839 0.000519 0.00036 0.000275
19 13.2 0.0513 0.00998 0.00312 0.0013 0.000663 0.000394 0.000264 0.000194
20 13.9 0.0488 0.00906 0.0027 0.00108 0.00053 0.000303 0.000196 0.00014
21 14.6 0.0465 0.00825 0.00236 0.000905 0.000427 0.000236 0.000147 0.000101
22 15.2 0.0444 0.00755 0.00207 0.000764 0.000347 0.000185 0.000112 7.46 e-05
23 15.9 0.0425 0.00694 0.00183 0.000649 0.000285 0.000147 8.56 e-05 5.55 e-05
24 16.6 0.0408 0.00639 0.00162 0.000555 0.000235 0.000117 6.63 e-05 4.17 e-05
25 17.3 0.0392 0.00591 0.00145 0.000478 0.000196 9.44 e-05 5.18 e-05 3.16 e-05
26 18 0.0377 0.00548 0.00129 0.000413 0.000164 7.66 e-05 4.08 e-05 2.42 e-05
27 18.7 0.0364 0.0051 0.00116 0.000359 0.000138 6.26 e-05 3.24 e-05 1.87 e-05
28 19.4 0.0351 0.00475 0.00105 0.000314 0.000117 5.15 e-05 2.59 e-05 1.46 e-05
29 20.1 0.0339 0.00444 0.000949 0.000276 9.96 e-05 4.26 e-05 2.09 e-05 1.14 e-05
30 20.8 0.0328 0.00416 0.000862 0.000243 8.53 e-05 3.55 e-05 1.69 e-05 9.01 e-06
31 21.5 0.0317 0.0039 0.000785 0.000215 7.33 e-05 2.97 e-05 1.38 e-05 7.16 e-06
32 22.2 0.0308 0.00367 0.000717 0.000191 6.33 e-05 2.5 e-05 1.13 e-05 5.73 e-06

reference

The Beauty of Mathematics

Pages.cs.wisc.edu/~cao/papers…

Zh.wikipedia.org/zh-cn/E_ (% E…

En.wikipedia.org/wiki/Bloom_…

About me

Public number: Binary road

Tutorial: 996 geek.com

Blog: binarylife. Icu