What is a Bloom filter

Use this in the JAVA project, today let’s talk about it!

The name is like every law, you ask why it’s Called Newton’s law, because Newton invented it or discovered it.

What can he do? It maps a binary vector to a function. Bloom filters can be used to detect the presence of elements in a set or for quick retrieval.

Disadvantages: There are certain deletion issues and error recognition rates

Advantages: Query time and space are far more than the ordinary algorithm

When Item or element is added, create a hash function and a KEY to form a mapping, set the data =1, as long as the retrieval judgment =1 to know whether the data exists, with this method, when the query is found to have 0, it will prove that it does not exist. So on the other hand if it’s one that means that the element is likely to exist,

Notice why it is said that it is likely to exist, because it has a certain identification error, but this error can be ignored in the actual production process, after all, the advantages outweigh the disadvantages.

Look at the text dizzy, motionless on the drawing, to see should understand a lot of.

Bloom filter [global aerial view]

People speaking

What exactly does a Bloom filter do?

Special ID not mention ha, database ID is basically self-increasing right! We pass the ID back end to the DB query, which makes perfect sense.

But what if we use negative numbers? One or two doesn’t matter. What if there are thousands? Basically the database will be a lot of pressure to bear, server configuration, not to mention, slow down the system running speed and even downtime are possible, so is not a bit of bloom filter show incisively and vividly. “Dog”

So hanging, also has a price, because it is also uncertain, there is a certain degree of error of judgment!

Q: Why the miscarriage of justice?

A: The search key is not in the container, but the hash key is always 1. If the Bloom filter has a blacklist, then create a whitelist.

Q: Why isn’t it easy to delete?

Counting Bloom Filter Key =1; Counting Bloom Filter Key =1;

How to achieve 1: estimated quantity n and expected error rate FPP

2: size of hash function and bit set

Bit Set Size Size

Function hash select, estimate n and bit array length m to get hash function Key

How does it work? Maven project added com.google.guava Guava 23.0

A piece of test code I wrote

/ * *

  • Bloom filter – used for redis cache penetration cases

*/ public class TestBloomFilterByDZZ {

private static int total = 19999;
private static BloomFilter<Integer> bfilter = BloomFilter.create(Funnels.integerFunnel(), total);
Copy the code

Public static void main(String[] args) {for (int I = 0; i < total; i++) { bfilter.put(i); }

For (int I = 0; i < total; i++) { if (! Bfilter.mightcontain (I)) {system.out.println ("有 关 闭 关 闭 中 文 大 学 学 报 ") {System. ); Int count = 0; for (int i = total; i < total + 10000; i++) { if (bfilter.mightContain(i)) { count++; }} system.out.println (" + count "); }Copy the code

} Applicable service scenario 1: If a large amount of data is imported, the data can be used as a name or unique item for checking. If the data exists, the database is imported

2: filtering spam, which is a calculation you can combine with your own business to understand.

Picture daily attention, quality of a key three. Write at least 3 original articles a week to leave something behind when you’re being tortured by business. Recently like a sentence “Dao no art, art can be sought. If there is art, there is no tao.