In what is a Bloom Filter? In this paper, the false positive probabilistic (FPP) rate was mentioned several times.
So how much is the misidentification rate and how should it be calculated?
Assume that the Bloom filter is m bits in size, stores N elements, and uses k hash functions to calculate where the elements are stored.
- If 1 element is added, then the probability that any bit is 1 is:
1/m
, the probability that any bit is 0:1-1/m
; - The probability that any bit will be zero after k hashes after adding 1 element:
(1-1/m)^k
, the probability that any bit is 1:1-(1-1/m)^k
; - The probability that any bit is zero if n elements are added:
(1-1/m)^kn
, the probability that any bit is 1:1-(1-1/m)^kn
; - If a new element is added to a Bloom filter that already has n elements, then the probability that any bit is already 1 is the same as above, and the probability is:
1-(1-1/m)^kn
. Therefore, the probability that all k bits are 1 is:(1-(1-1/m)^kn)^k
, which is the false identification rate of newly inserted elements.
When n is large, (1-(1-1/m)kn)kIt’s approximately :(1-e)-kn/m)k
Where e is a mathematical constant and the base of the natural logarithm function. Sometimes called Euler’s number, after the Swiss mathematician Euler. It is an infinite non-repeating decimal with a value of approximately 2.71828182846.
Assuming that the Bloom filter size m is 16 bits, k is 8, and storage element n is 1, the probability of false identification (false positive) is 5 in 10,000 (5/10,000).
At this point, m/n=16, which actually means that one element uses 16 bits of space to store it.
Therefore, when K is the same and M /n is 16/1, 32/2 and 64/4, the false recognition rate is the same.
Plug the numbers into the formula and calculate where the probability of 5 in 10,000 comes from:
Kn =8*1=8 KN /m=8/16=1/2 the -1 power of e is converted to the reciprocal: 1/ E =0.36787944117 and then the 1/2 power of 1/ E is calculated, that is, take the square root of 0.36787944117 and get: 0.60653065971144443045693327628786 to calculate the values in brackets: 1-0.60653065971144443045693327628786 = 0.39346934028855556954306672371214 and then calculate k power: 0.39346934028855556954306672371214 the eighth = 5.7449622219356465294987619912758 e-4Copy the code
Calculated recognition of 5.7449622219356465294987619912758 e-4, namely five over ten thousand.
The following is from pages.cs.wisc.edu/~cao/papers… It is very convenient to query the false recognition rate when m, n and K take different values respectively.
m/n | k | k= 1 | k= 2 | k= 3 | k= 4 | k= 5 | k= 6 | k= 7 | k= 8 |
2 | 1.39 | 0.393 | 0.400 | ||||||
3 | 2.08 | 0.283 | 0.237 | 0.253 | |||||
4 | 2.77 | 0.221 | 0.155 | 0.147 | 0.160 | ||||
5 | 3.46 | 0.181 | 0.109 | 0.092 | 0.092 | 0.101 | |||
6 | 4.16 | 0.154 | 0.0804 | 0.0609 | 0.0561 | 0.0578 | 0.0638 | ||
7 | 4.85 | 0.133 | 0.0618 | 0.0423 | 0.0359 | 0.0347 | 0.0364 | ||
8 | 5.55 | 0.118 | 0.0489 | 0.0306 | 0.024 | 0.0217 | 0.0216 | 0.0229 | |
9 | 6.24 | 0.105 | 0.0397 | 0.0228 | 0.0166 | 0.0141 | 0.0133 | 0.0135 | 0.0145 |
10 | 6.93 | 0.0952 | 0.0329 | 0.0174 | 0.0118 | 0.00943 | 0.00844 | 0.00819 | 0.00846 |
11 | 7.62 | 0.0869 | 0.0276 | 0.0136 | 0.00864 | 0.0065 | 0.00552 | 0.00513 | 0.00509 |
12 | 8.32 | 0.08 | 0.0236 | 0.0108 | 0.00646 | 0.00459 | 0.00371 | 0.00329 | 0.00314 |
13 | 9.01 | 0.074 | 0.0203 | 0.00875 | 0.00492 | 0.00332 | 0.00255 | 0.00217 | 0.00199 |
14 | 9.7 | 0.0689 | 0.0177 | 0.00718 | 0.00381 | 0.00244 | 0.00179 | 0.00146 | 0.00129 |
15 | 10.4 | 0.0645 | 0.0156 | 0.00596 | 0.003 | 0.00183 | 0.00128 | 0.001 | 0.000852 |
16 | 11.1 | 0.0606 | 0.0138 | 0.005 | 0.00239 | 0.00139 | 0.000935 | 0.000702 | 0.000574 |
17 | 11.8 | 0.0571 | 0.0123 | 0.00423 | 0.00193 | 0.00107 | 0.000692 | 0.000499 | 0.000394 |
18 | 12.5 | 0.054 | 0.0111 | 0.00362 | 0.00158 | 0.000839 | 0.000519 | 0.00036 | 0.000275 |
19 | 13.2 | 0.0513 | 0.00998 | 0.00312 | 0.0013 | 0.000663 | 0.000394 | 0.000264 | 0.000194 |
20 | 13.9 | 0.0488 | 0.00906 | 0.0027 | 0.00108 | 0.00053 | 0.000303 | 0.000196 | 0.00014 |
21 | 14.6 | 0.0465 | 0.00825 | 0.00236 | 0.000905 | 0.000427 | 0.000236 | 0.000147 | 0.000101 |
22 | 15.2 | 0.0444 | 0.00755 | 0.00207 | 0.000764 | 0.000347 | 0.000185 | 0.000112 | 7.46 e-05 |
23 | 15.9 | 0.0425 | 0.00694 | 0.00183 | 0.000649 | 0.000285 | 0.000147 | 8.56 e-05 | 5.55 e-05 |
24 | 16.6 | 0.0408 | 0.00639 | 0.00162 | 0.000555 | 0.000235 | 0.000117 | 6.63 e-05 | 4.17 e-05 |
25 | 17.3 | 0.0392 | 0.00591 | 0.00145 | 0.000478 | 0.000196 | 9.44 e-05 | 5.18 e-05 | 3.16 e-05 |
26 | 18 | 0.0377 | 0.00548 | 0.00129 | 0.000413 | 0.000164 | 7.66 e-05 | 4.08 e-05 | 2.42 e-05 |
27 | 18.7 | 0.0364 | 0.0051 | 0.00116 | 0.000359 | 0.000138 | 6.26 e-05 | 3.24 e-05 | 1.87 e-05 |
28 | 19.4 | 0.0351 | 0.00475 | 0.00105 | 0.000314 | 0.000117 | 5.15 e-05 | 2.59 e-05 | 1.46 e-05 |
29 | 20.1 | 0.0339 | 0.00444 | 0.000949 | 0.000276 | 9.96 e-05 | 4.26 e-05 | 2.09 e-05 | 1.14 e-05 |
30 | 20.8 | 0.0328 | 0.00416 | 0.000862 | 0.000243 | 8.53 e-05 | 3.55 e-05 | 1.69 e-05 | 9.01 e-06 |
31 | 21.5 | 0.0317 | 0.0039 | 0.000785 | 0.000215 | 7.33 e-05 | 2.97 e-05 | 1.38 e-05 | 7.16 e-06 |
32 | 22.2 | 0.0308 | 0.00367 | 0.000717 | 0.000191 | 6.33 e-05 | 2.5 e-05 | 1.13 e-05 | 5.73 e-06 |
reference
The Beauty of Mathematics
Pages.cs.wisc.edu/~cao/papers…
Zh.wikipedia.org/zh-cn/E_ (% E…
En.wikipedia.org/wiki/Bloom_…
About me
Public number: Binary road
Tutorial: 996 geek.com
Blog: binarylife. Icu