1. Definition:
A Bloom filter is a set-like data structure that is spatially more efficient than traditional set-like data structures, such as hash tables or trees. A Bloom filter can be 100 percent sure that something is not in a collection, but it cannot be 100 percent sure that something is in a collection.
2. Type:
- Local memory based Bloom filter
- File-based Bloom filter
- Redis based Bloom filter (distributed available) (plug-in (not considered), redis comes with its own BIT)
3. Application Scenarios
Prevent cache penetration
4. Implementation:
|
5. Pressure test report:
Configuration: memory 16GB Tool: JMeter
The test uses 100000 threads
Test results:
The query does not have data
QPS | Error rate | |
The use of Bloom | 13832.4 / SEC | 0% |
Do not use the Bloom | 2470/sec | 0% |
QPS(Bloom)/QPS(Bloom)≈5.60
Conclusion: Using bloom filters improves the response time for user input with no data returned, and also protects the database from cache penetration
6. Question:
- Problem setting the initial bitset size
|
- Hashing algorithm selection problem
Hashing algorithm collision rate (common)
100000 | 500000 | 1 million | 5 million | 10 million | 100 million | Average execution time of 10 million | The average execution time of 100 million | The average length of 100 million times | |
---|---|---|---|---|---|---|---|---|---|
MurmurHash | 0 | 0 | 0 | 0 | 0 | 0 | 0.0066868 | 0.01194736 | 19 |
CityHash | 0 | 0 | 0 | 0 | 0 | 0 | 0.0066179 | 0.01129171 | 19 |
crc64 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0064459 | 0.01242473 | 19 |
This implementation mainly uses Murmur3 algorithm
- Is it better to set it before or after caching
This parameter is selected based on service scenarios. To improve user access efficiency, add it after cache
- How to ensure the consistency of data in bloom filter and database data, and how to prevent leakage
Flush all existing data in the database into the Bloom filter
- How to synchronize new data to bloom filter
Use coroutines to synchronize newly added data to the Bloom filter