In what is a Bloom Filter? In this paper, the false positive probabilistic (FPP) rate was mentioned several times.

So how much is the misidentification rate and how should it be calculated?

Assume that the Bloom filter is m bits in size, stores N elements, and uses k hash functions to calculate where the elements are stored.

If 1 element is added, then the probability that any bit is 1 is:1/m, the probability that any bit is 0:1-1/m;
The probability that any bit will be zero after k hashes after adding 1 element:(1-1/m)^k, the probability that any bit is 1:1-(1-1/m)^k;
The probability that any bit is zero if n elements are added:(1-1/m)^kn, the probability that any bit is 1:1-(1-1/m)^kn;
If a new element is added to a Bloom filter that already has n elements, then the probability that any bit is already 1 is the same as above, and the probability is:1-(1-1/m)^kn. Therefore, the probability that all k bits are 1 is:(1-(1-1/m)^kn)^k, which is the false identification rate of newly inserted elements.

When n is large, (1-(1-1/m)^kn)^kIt’s approximately :(1-e)^-kn/m)^k

Where e is a mathematical constant and the base of the natural logarithm function. Sometimes called Euler’s number, after the Swiss mathematician Euler. It is an infinite non-repeating decimal with a value of approximately 2.71828182846.

Assuming that the Bloom filter size m is 16 bits, k is 8, and storage element n is 1, the probability of false identification (false positive) is 5 in 10,000 (5/10,000).

At this point, m/n=16, which actually means that one element uses 16 bits of space to store it.

Therefore, when K is the same and M /n is 16/1, 32/2 and 64/4, the false recognition rate is the same.

Plug the numbers into the formula and calculate where the probability of 5 in 10,000 comes from:

Kn =8*1=8 KN /m=8/16=1/2 the -1 power of e is converted to the reciprocal: 1/ E =0.36787944117 and then the 1/2 power of 1/ E is calculated, that is, take the square root of 0.36787944117 and get: 0.60653065971144443045693327628786 to calculate the values in brackets: 1-0.60653065971144443045693327628786 = 0.39346934028855556954306672371214 and then calculate k power: 0.39346934028855556954306672371214 the eighth = 5.7449622219356465294987619912758 e-4Copy the code

Calculated recognition of 5.7449622219356465294987619912758 e-4, namely five over ten thousand.

The following is from pages.cs.wisc.edu/~cao/papers… It is very convenient to query the false recognition rate when m, n and K take different values respectively.

m/
nand
kError recognition rate of bloom filter when taking different values.
m/n	k	k= 1	k= 2	k= 3	k= 4	k= 5	k= 6	k= 7	k= 8
2	1.39	0.393	0.400
3	2.08	0.283	0.237	0.253
4	2.77	0.221	0.155	0.147	0.160
5	3.46	0.181	0.109	0.092	0.092	0.101
6	4.16	0.154	0.0804	0.0609	0.0561	0.0578	0.0638
7	4.85	0.133	0.0618	0.0423	0.0359	0.0347	0.0364
8	5.55	0.118	0.0489	0.0306	0.024	0.0217	0.0216	0.0229
9	6.24	0.105	0.0397	0.0228	0.0166	0.0141	0.0133	0.0135	0.0145
10	6.93	0.0952	0.0329	0.0174	0.0118	0.00943	0.00844	0.00819	0.00846
11	7.62	0.0869	0.0276	0.0136	0.00864	0.0065	0.00552	0.00513	0.00509
12	8.32	0.08	0.0236	0.0108	0.00646	0.00459	0.00371	0.00329	0.00314
13	9.01	0.074	0.0203	0.00875	0.00492	0.00332	0.00255	0.00217	0.00199
14	9.7	0.0689	0.0177	0.00718	0.00381	0.00244	0.00179	0.00146	0.00129
15	10.4	0.0645	0.0156	0.00596	0.003	0.00183	0.00128	0.001	0.000852
16	11.1	0.0606	0.0138	0.005	0.00239	0.00139	0.000935	0.000702	0.000574
17	11.8	0.0571	0.0123	0.00423	0.00193	0.00107	0.000692	0.000499	0.000394
18	12.5	0.054	0.0111	0.00362	0.00158	0.000839	0.000519	0.00036	0.000275
19	13.2	0.0513	0.00998	0.00312	0.0013	0.000663	0.000394	0.000264	0.000194
20	13.9	0.0488	0.00906	0.0027	0.00108	0.00053	0.000303	0.000196	0.00014
21	14.6	0.0465	0.00825	0.00236	0.000905	0.000427	0.000236	0.000147	0.000101
22	15.2	0.0444	0.00755	0.00207	0.000764	0.000347	0.000185	0.000112	7.46 e-05
23	15.9	0.0425	0.00694	0.00183	0.000649	0.000285	0.000147	8.56 e-05	5.55 e-05
24	16.6	0.0408	0.00639	0.00162	0.000555	0.000235	0.000117	6.63 e-05	4.17 e-05
25	17.3	0.0392	0.00591	0.00145	0.000478	0.000196	9.44 e-05	5.18 e-05	3.16 e-05
26	18	0.0377	0.00548	0.00129	0.000413	0.000164	7.66 e-05	4.08 e-05	2.42 e-05
27	18.7	0.0364	0.0051	0.00116	0.000359	0.000138	6.26 e-05	3.24 e-05	1.87 e-05
28	19.4	0.0351	0.00475	0.00105	0.000314	0.000117	5.15 e-05	2.59 e-05	1.46 e-05
29	20.1	0.0339	0.00444	0.000949	0.000276	9.96 e-05	4.26 e-05	2.09 e-05	1.14 e-05
30	20.8	0.0328	0.00416	0.000862	0.000243	8.53 e-05	3.55 e-05	1.69 e-05	9.01 e-06
31	21.5	0.0317	0.0039	0.000785	0.000215	7.33 e-05	2.97 e-05	1.38 e-05	7.16 e-06
32	22.2	0.0308	0.00367	0.000717	0.000191	6.33 e-05	2.5 e-05	1.13 e-05	5.73 e-06

reference

The Beauty of Mathematics

Pages.cs.wisc.edu/~cao/papers…

Zh.wikipedia.org/zh-cn/E_ (% E…

En.wikipedia.org/wiki/Bloom_…

About me

Public number: Binary road

Tutorial: 996 geek.com

Blog: binarylife. Icu

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Derivation of FPP formula for false identification rate of Bloom filter

reference

About me

Derivation of FPP formula for false identification rate of Bloom filter

reference

About me

Related Posts

Java8 makes reading and writing Excel easier and more efficient

Chapter 1: Monitoring based on JDK command line tools

[Linux][go +] Set SSH security. Log in to the server using the key or password-free or pull the Git code