Before we have talked about the principle of the Bloom filter [actual combat problem] – cache penetration of the Bloom filter (1), all understand that this is the operation, then generally we use the bloom filter, how to use it? If you do it yourself, how do you do it?
[TOC]
Bloom filter
Say the definition again:
The Bloom Filter, proposed by Burton Howard Bloom in 1970, is actually a long binary vector and a series of random hash mapping functions (in plain English, binary arrays store data features).
Take the following example: if there are three hash functions, the “old six” will be hashed separately by three hash functions, and mod the length of the bit array to three positions.
If you still don’t understand how this works, check out my last article.
Handwritten Bloom filter
So when we write a Bloom filter, we first need a bit array, and in Java we have a bit array wrapped, BitSet.
A brief introduction to bitsets, or bitmaps, which use compact storage space to represent large bits of data. When used, we can specify the size directly, which is equivalent to creating an array of bits of the specified size.
BitSet bits = new BitSet(size);
Copy the code
BitSet also provides a number of apis, including basic operations:
- Empties the bit array of data
- Flip a bit of data
- Sets the data for a certain bit
- Get data for a certain bit
- Get the current
bitSet
Number of bits
Here are the points to consider when writing a simple Bloom filter:
- The size of the bit array needs to be specified. Otherwise, the larger the bit array is,
hash
The less likely there is to be conflict. - multiple
hash
Function, we need to usehash
Array to store,hash
How do you set up the function? To avoid conflict, we should use several different primes as seeds. - Method: the main implementation of two methods, one to add elements to the Bloom filter, the other is to determine whether the bloom filter contains an element.
The following is a concrete implementation, which is only a simple simulation and cannot be used in production environment. The hash function is relatively simple, mainly using hash to perform xor for high and low bits, then multiply the seed, and then take the remainder of the size of the bit array:
import java.util.BitSet;
public class MyBloomFilter {
// Default size
private static final int DEFAULT_SIZE = Integer.MAX_VALUE;
// The smallest size
private static final int MIN_SIZE = 1000;
// The size is the default
private int SIZE = DEFAULT_SIZE;
// Seed of the hash function
private static final int[] HASH_SEEDS = new int[] {3.5.7.11.13.17.19.23.29.31};
// An array of bits, 0/1, representing features
private BitSet bitSet = null;
/ / the hash function
private HashFunction[] hashFunctions = new HashFunction[HASH_SEEDS.length];
// No parameter initialization
public MyBloomFilter(a) {
// Use the default size
init();
}
// Initialize with parameters
public MyBloomFilter(int size) {
// The size is initialized to be less than the minimum size
if (size >= MIN_SIZE) {
SIZE = size;
}
init();
}
private void init(a) {
// Initialize bit size
bitSet = new BitSet(SIZE);
// Initialize the hash function
for (int i = 0; i < HASH_SEEDS.length; i++) {
hashFunctions[i] = newHashFunction(SIZE, HASH_SEEDS[i]); }}// Add an element to an array
public void add(Object value) {
for (HashFunction f : hashFunctions) {
// Set the hash position to true
bitSet.set(f.hash(value), true); }}// Determine if the element's characteristics exist in the bitarray
public boolean contains(Object value) {
boolean result = true;
for (HashFunction f : hashFunctions) {
result = result && bitSet.get(f.hash(value));
// If one of the hash functions evaluates to false, it returns
if(! result) {returnresult; }}return result;
}
/ / the hash function
public static class HashFunction {
// Bit array size
private int size;
/ / hash seeds
private int seed;
public HashFunction(int size, int seed) {
this.size = size;
this.seed = seed;
}
/ / the hash function
public int hash(Object value) {
if (value == null) {
return 0;
} else {
/ / hash value
int hash1 = value.hashCode();
// High hash value
int hash2 = hash1 >>> 16;
// Merge hash values (equivalent to combining high and low features)
int combine = hash1 ^ hash1;
// Multiply the remainder
returnMath.abs(combine * seed) % size; }}}public static void main(String[] args) {
Integer num1 = new Integer(12321);
Integer num2 = new Integer(12345);
MyBloomFilter myBloomFilter =newMyBloomFilter(); System.out.println(myBloomFilter.contains(num1)); System.out.println(myBloomFilter.contains(num2)); myBloomFilter.add(num1); myBloomFilter.add(num2); System.out.println(myBloomFilter.contains(num1)); System.out.println(myBloomFilter.contains(num2)); }}Copy the code
Operating results, in line with expectations:
false
false
true
true
Copy the code
However, this approach does not support the expected error rate, but can specify the size of the bit array.
Of course, we can also provide the amount of data and the expected approximate error rate for initialization. The approximate initialization code is as follows:
// Initialize with parameters
public BloomFilter(int num,double rate) {
// Calculate the size of the bit array
this.size = (int) (-num * Math.log(rate) / Math.pow(Math.log(2), 2));
// Number of hsah functions
this.hashSize = (int) (this.size * Math.log(2) / num);
// Initializes the bit array
this.bitSet = new BitSet(size);
}
Copy the code
Redis implementation
Usually we can choose to use Redis features in bloom filter, why? Because Redis has bitset-like instructions, such as setting the value of the bit array:
setbit key offset value
Copy the code
The key is the key, offset is the offset, and value is either 1 or 0. For example, the following example sets key1’s position 7 to 1.
To obtain a digit, use the following command:
gitbit key offset
Copy the code
We can implement a good Bloom filter with the help of Redis, but we don’t need to write it ourselves. The Redisson client already has a good implementation. To build a project using Maven, you first need to guide the package to POM.xml:
<dependencies>
<dependency>
<groupId>org.redisson</groupId>
<artifactId>redisson</artifactId>
<version>3.11.2</version>
</dependency>
</dependencies>
Copy the code
The code is as follows, I use docker, remember to set the password when starting, change the password when running does not work:
docker run -d --name redis -p 6379:6379 redis --requirepass "password"
Copy the code
The code is as follows, first need to connect to Redis, and then create redission, use redission to create bloom filter, directly use it. (You can specify the expected number and expected miscarriage rate)
import org.redisson.Redisson;
import org.redisson.api.RBloomFilter;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;
public class BloomFilterTest {
public static void main(String[] args) {
Config config = new Config();
config.useSingleServer().setAddress("redis://localhost:6379");
config.useSingleServer().setPassword("password");
// create a redis connection
RedissonClient redisson = Redisson.create(config);
RBloomFilter<String> bloomFilter = redisson.getBloomFilter("myBloomFilter");
// Initialize with an expected number of elements of 100000000 and an expected error rate of 4%
bloomFilter.tryInit(100000000.0.04);
// Insert the number 10086 into the Bloom filter
bloomFilter.add("12345");
System.out.println(bloomFilter.contains("123456"));//false
System.out.println(bloomFilter.contains("12345"));//true}}Copy the code
The results are as follows: It is worth noting that this is a single machineredis
If it isredis
Clusters do things a little differently.
Google GUAVA implementation
Bloem filters are also available in the Guava package provided by Google, introducing POM files:
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>18.0</version>
</dependency>
Copy the code
The specific code to implement the call is as follows, and you can also specify the specific amount of storage and the expected error rate:
import com.google.common.base.Charsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
public class GuavaBloomFilter {
public static void main(String[] args) {
BloomFilter<String> bloomFilter = BloomFilter.create(
Funnels.stringFunnel(Charsets.UTF_8),1000000.0.04);
bloomFilter.put("Sam");
System.out.println(bloomFilter.mightContain("Jane"));
System.out.println(bloomFilter.mightContain("Sam")); }}Copy the code
The results are as follows and are in line with expectations
The above three are handwriting, redis, Guava practice bloom filter, just a simple usage, in fact, the implementation of Redis and Guava can also see, interested can understand, I first set a Flag.
About the author
Qin Huai, author of public number [Qin Huai Grocery store], the road of technology is not at that time, the mountain is high and the water is long, even if slow, and not stop. Personal Writing Direction: Java source code analysis, JDBC, Mybatis, Spring, Redis, distributed, sword Offer, LeetCode, etc., carefully write each article, do not like the title party, do not like the flowery, mostly write a series of articles, can not guarantee that I write are completely correct, But I guarantee that what I write is done through practice or research. We hope to correct any omissions or mistakes.
What did I write about 2020?
Open Source Programming Notes