Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.
This article has participated in the “Digitalstar Project” and won a creative gift package to challenge the creative incentive money.
Redis is our daily work in the use of a lot of caching solutions, the use of caching, can improve the performance of our applications, while greatly reducing the pressure on the database. However, if used improperly, it can also cause many problems. The three classic problems include cache penetration, cache breakdown, and cache avalanche. Do you sound confused? It doesn’t matter. You’ll understand after reading this.
The cache to penetrate
Cache penetration is when a user looking for a piece of data looks up data that doesn’t exist. According to the cache design process, we first query the Redis cache and find that there is no such data, so we directly query the database and find that there is no such data, so the query result ends in failure.
When there are a large number of such requests or malicious use of non-existent data access attacks, a large number of requests will directly access the database, resulting in database stress or even direct breakdown. Take the e-commerce mall as an example. If a non-existent ID is used to query goods, each attack will access the database.
Here’s what to do:
1. Cache empty objects
Modify the database write back cache logic. For data that does not exist in the cache and does not exist in the database, we still cache it and set a cache expiration time.
As shown in the figure above, an empty object (key, NULL) is still cached with the queried key when the database fails. But there are still problems:
A. A null object is returned when the key value is searched in the cache. Note that this empty object may not be needed by the client, so it needs to process the empty result and then return it to client B, occupying a large amount of memory in Redis. Because empty objects can be cached, Redis will use a large amount of memory to store these empty keys. If the data of this key is stored in the database after writing to the cache, since the cache has not expired, the fetched value is still empty, so there may be a temporary data inconsistency problem
2, Bloom filter
A Bloom filter is a binary vector, or array of binary, or array of bits.
Because it’s a binary vector, each of its bits can only hold a 0 or a 1. When you need to add a data map to the Bloom filter, instead of adding the original data, you use multiple hash functions to generate multiple hashes, and set the subscript position of each generated hash to 1. So, forget pulling data from the Bloom filter, we don’t store raw data at all.
For example, the three hash functions of a” Hydra” generate subscripts 1, 3, and 6, and so on. So what does this data structure do? We can use this bit vector to determine whether the data exists or not.
Specific process:
A. Compute multiple hashes of data;
B. Determine whether these bits are 1. If all bits are 1, data may exist;
C. If one or more bits are not 1, the data does not exist.
It is important to note that the Bloom filter is prone to miscalculation because as the amount of data stored increases, the number of bits set to 1 increases. Therefore, it is possible to query a non-existent data by chance and all bits have already been set to 1 by other data, i.e. a hash collision occurs. Therefore, the Bloom filter can only determine whether the data is likely to exist, but can not achieve 100 percent certainty.
Google’s Guava package provides a stand-alone version of the Bloom filter implementation, so let’s see how it works
First we introduce maven dependencies:
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>27.1 the jre</version>
</dependency>
Copy the code
1 million pieces of data are fed into the Bloom filter simulation, given the misjudgment rate, and then non-existent data are used for judgment:
public class BloomTest {
public static void test(int dataSize,double errorRate){
BloomFilter<Integer> bloomFilter=
BloomFilter.create(Funnels.integerFunnel(), dataSize, errorRate);
for(int i = 0; i< dataSize; i++){
bloomFilter.put(i);
}
int errorCount=0;
for(int i = dataSize; i<2* dataSize; i++){
if(bloomFilter.mightContain(i)){
errorCount++;
}
}
System.out.println("Total error count: "+errorCount);
}
public static void main(String[] args) {
BloomTest.test(1000000.0.01);
BloomTest.test(1000000.0.001); }}Copy the code
Test results:
Total error count: 10314
Total error count: 994
Copy the code
It can be seen that 10314 misjudgments were made when the misjudgment rate was given as 0.01, and 994 when the misjudgment rate was 0.001, which basically met our expectations.
But because Guava’s Bloom filter runs in JVM memory, it only supports singleton applications and does not support microservice distribution. Then there is no support for distributed Bloom filter, then Redis stood out, self-created problems to solve themselves!
Redis’s BitMap supports bitwise operations, where a single bit represents the value or state of an element.
Setbit key offset value Getbit key offset Value GetBit Key offset Value Setbit key offset value Getbit Key offsetCopy the code
Since the Bloom filter is assigned to bits, we can use the setbit and getbit commands provided by BitMap to implement it very simply, and the setbit operation can achieve automatic array expansion, so there is no need to worry about the situation of insufficient array bits in the use process.
/ / source reference https://www.cnblogs.com/CodeBear/p/10911177.html
public class RedisBloomTest {
private static int dataSize = 1000;
private static double errorRate = 0.01;
//bit The length of the array
private static long numBits;
// Number of hash functions
private static int numHashFunctions;
public static void main(String[] args) {
numBits = optimalNumOfBits(dataSize, errorRate);
numHashFunctions = optimalNumOfHashFunctions(dataSize, numBits);
System.out.println("Bits length: "+numBits);
System.out.println("Hash nums: "+numHashFunctions);
Jedis jedis = new Jedis("127.0.0.1".6379);
for (int i = 0; i <= 1000; i++) {
long[] indexs = getIndexs(String.valueOf(i));
for (long index : indexs) {
jedis.setbit("bloom", index, true);
}
}
num:
for (int i = 1000; i < 1100; i++) {
long[] indexs = getIndexs(String.valueOf(i));
for (long index : indexs) {
Boolean isContain = jedis.getbit("bloom", index);
if(! isContain) { System.out.println(i +"Doesn't exist.");
continue num;
}
}
System.out.println(i + "It may exist."); }}// Get the bitmap subscript based on key
private static long[] getIndexs(String key) {
long hash1 = hash(key);
long hash2 = hash1 >>> 16;
long[] result = new long[numHashFunctions];
for (int i = 0; i < numHashFunctions; i++) {
long combinedHash = hash1 + i * hash2;
if (combinedHash < 0) {
combinedHash = ~combinedHash;
}
result[i] = combinedHash % numBits;
}
return result;
}
private static long hash(String key) {
Charset charset = Charset.forName("UTF-8");
return Hashing.murmur3_128().hashObject(key, Funnels.stringFunnel(charset)).asLong();
}
// Calculate the number of hash functions
private static int optimalNumOfHashFunctions(long n, long m) {
return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
}
// Calculate the length of the bit array
private static long optimalNumOfBits(long n, double p) {
if (p == 0) {
p = Double.MIN_VALUE;
}
return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2))); }}Copy the code
The number of hash functions and the length of the bit array are calculated dynamically in bitmap-based distributed Bloom filter implementation. It can be said that the lower the given fault tolerance rate, the more hash functions, the longer the array length, and the higher the memory overhead of Redis used.
The maximum size of the bloem filter array in guava is determined by the upper limit of the int value, which is about 2.1 billion, while redis’s bit array is 512MB, which is 2^32 bits, so the maximum size can reach 4.2 billion, which is twice the size of guava.
Well, this article we first introduced this, the next article will talk about the remaining two questions ~
The last
If you think it is helpful, you can like it and forward it. Thank you very much
Public number agriculture ginseng, add a friend, do a thumbs-up friend ah