Sink or swim: that is a line I came across in a book written by a PhD. It sounds extreme and carries a lot of pressure. Here is the truth I took from it: all inspirational words make you believe there is light ahead, and that the future is bright once you put in the effort. Put into practice, the phrase means you either swim or you sink into the dark. Maybe these are the ups and downs life needs, whether they exist to motivate people or are simply part of life itself.

End of today's rambling.

Today, let's talk about Bloom filters.

A Bloom filter tests whether an element is in a set. Under the hood it is just a long binary (bit) vector plus a series of random mapping (hash) functions.
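To make "bit vector plus hash functions" concrete, here is a minimal sketch in plain Java with no Guava dependency. It is not Google's implementation: the class name `SimpleBloomFilter` is hypothetical, and deriving the k positions from two base hashes (`String.hashCode` and 32-bit FNV-1a, combined as h1 + i * h2) is an assumption chosen to keep the code short.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal Bloom filter sketch: a bit vector of numBits bits plus numHashes hash functions.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;   // m: length of the bit vector
    private final int numHashes; // k: how many positions each value maps to

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a over the UTF-8 bytes, used as the second base hash (assumption).
    private static int fnv1a(String value) {
        int h = 0x811C9DC5;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // i-th position = (h1 + i * h2) mod m, simulating k hash functions from two base hashes.
    private int position(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Add: set all k positions to 1.
    void put(String value) {
        int h1 = value.hashCode();
        int h2 = fnv1a(value);
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(h1, h2, i));
        }
    }

    // Query: any 0 bit means "definitely absent"; all 1 bits mean "maybe present".
    boolean mightContain(String value) {
        int h1 = value.hashCode();
        int h2 = fnv1a(value);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 5);
        for (int i = 0; i < 1000; i++) {
            filter.put("user-" + i);
        }
        System.out.println(filter.mightContain("user-42")); // true: added values are always found
        int falsePositives = 0;
        for (int i = 1000; i < 11000; i++) {
            if (filter.mightContain("user-" + i)) {
                falsePositives++;
            }
        }
        System.out.println("false positives out of 10000: " + falsePositives);
    }
}
```

A value that was added is always reported as present; a value that was never added is usually, but not always, reported as absent. That asymmetry is the whole contract of a Bloom filter.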

Let's look at Google's implementation in the Guava library:

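    import com.google.common.hash.BloomFilter;
    import com.google.common.hash.Funnels;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class Bloom {
        private static final int total = 1000000;
        private static final Logger logger = LoggerFactory.getLogger(Bloom.class);

        /** Create a BloomFilter for `total` expected insertions (Guava's default false-positive rate is 3%). */
        private static final BloomFilter<Integer> bf = BloomFilter.create(Funnels.integerFunnel(), total);

        public static void main(String[] args) {
            // Insert 0 .. total-1
            for (int i = 0; i < total; i++) {
                bf.put(i);
            }
            // Query a value that was inserted
            int j = 5000;
            boolean flag = bf.mightContain(j);
            logger.info("{} result: {}", j, flag);
            // Every inserted value must be found (a Bloom filter has no false negatives)
            for (int i = 0; i < total; i++) {
                if (!bf.mightContain(i)) {
                    logger.info("~~~ {}", i);
                }
            }
            // Count false positives among 10,000 values that were never inserted
            int count = 0;
            for (int i = total; i < total + 10000; i++) {
                if (bf.mightContain(i)) {
                    count++;
                }
            }
            logger.info("Error count: {}", count);
        }
    }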

The code is simple. Now for the real substance:

* One common query optimization is caching: you can put the results of a query in a map.
* A Bloom filter can handle this too. It is an array-like data structure (a bit array). To add a value, the value is run through k hash functions, which map it to k positions in the array, and all k positions are set to 1.
* To look a value up, the same k hash functions compute its k positions in the array. If any of those positions is 0, the value is definitely not there; if all of them are 1, the value may be there. A value that was never added can still hash to k positions that other values have already set to 1.
* The drawbacks: it is not exact (it has false positives), and it does not support deletion, because deleting would mean resetting the element's positions to 0, but those positions may be shared with other elements.
* The number of hash functions to use (and the size of the bit array) can be calculated with a formula from the expected number of elements and the acceptable false-positive rate. With Guava's API you don't need to do this yourself, because the library calculates it for you.
* The Google Guava library used above works within a single process and does not support distributed use.
* For a distributed setup it is better to use a Redis-based Bloom filter, which comes in several forms:
  * a Lua script
  * the ReBloom module, deployed as a service
  * via Python
* The underlying idea is the same in all of them.
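The sizing formula mentioned above can be sketched as follows. These are the standard Bloom filter formulas: for n expected elements and a target false-positive rate p, the bit array needs m = -n * ln(p) / (ln 2)^2 bits, and the best number of hash functions is k = (m / n) * ln 2. The class and method names are hypothetical; Guava does this kind of calculation internally when you call `BloomFilter.create`, so this sketch only shows where its numbers come from.

```java
// Standard Bloom filter sizing formulas (n = expected elements, p = target false-positive rate).
final class BloomSizing {
    private static final double LN2 = Math.log(2);

    // m = -n * ln(p) / (ln 2)^2 : bits needed in the array
    static long optimalNumBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (LN2 * LN2));
    }

    // k = (m / n) * ln 2 : number of hash functions that minimizes false positives (at least 1)
    static int optimalNumHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * LN2));
    }

    // p approx (1 - e^(-k * n / m))^k : expected false-positive rate for a given m, n and k
    static double falsePositiveRate(long n, long m, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long n = 1_000_000; // same as total in the Guava example
        double p = 0.03;    // target rate of 3% (Guava's default)
        long m = optimalNumBits(n, p);
        int k = optimalNumHashes(n, m);
        System.out.printf("m = %d bits (%.2f MB), k = %d, expected p = %.4f%n",
                m, m / 8.0 / 1_000_000, k, falsePositiveRate(n, m, k));
    }
}
```

For one million elements at 3%, this works out to roughly 7.3 million bits (under 1 MB) and k = 5. Lowering p grows m only logarithmically, which is why Bloom filters stay small even for strict error rates.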

Why not just use a HashMap? Because it takes up a lot of memory: it has to store every piece of data itself.

A HashMap has a high memory footprint. Because of the load factor, it is normally never filled to capacity, and once it holds a lot of values, say hundreds of millions, its memory footprint becomes very large. In other words, part of a HashMap's table is always kept empty as headroom for hashing, and yet every value must still be stored in it in full.

And when there are very many elements, a HashMap cannot hold them all in memory at once.
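To put rough numbers on the memory argument, here is a back-of-the-envelope sketch. The 100 million keys and the 1% false-positive rate are assumptions for illustration, and the class name is hypothetical. It compares an optimally sized Bloom filter with a lower bound for just storing the keys as raw 4-byte ints, before any HashMap overhead (entry objects, boxed keys, empty table slots) is counted.

```java
// Back-of-the-envelope memory comparison for a large key set.
final class MemoryEstimate {
    private static final double LN2 = Math.log(2);

    // Bits per element for an optimally sized Bloom filter: -ln(p) / (ln 2)^2
    static double bitsPerElement(double p) {
        return -Math.log(p) / (LN2 * LN2);
    }

    // Total Bloom filter size in megabytes (1 MB = 10^6 bytes)
    static double bloomMegabytes(long n, double p) {
        return n * bitsPerElement(p) / 8 / 1_000_000;
    }

    // Lower bound for storing the keys themselves as raw 4-byte ints,
    // ignoring all HashMap overhead (entry objects, boxing, empty table slots)
    static double rawIntKeyMegabytes(long n) {
        return n * 4.0 / 1_000_000;
    }

    public static void main(String[] args) {
        long n = 100_000_000L; // assumed: 100 million keys
        double p = 0.01;       // assumed: 1% false-positive rate
        System.out.printf("Bloom filter: %.1f bits/element, %.0f MB%n",
                bitsPerElement(p), bloomMegabytes(n, p));
        System.out.printf("Raw int keys alone: %.0f MB%n", rawIntKeyMegabytes(n));
    }
}
```

This prints about 9.6 bits per element and roughly 120 MB for the filter, against 400 MB for the bare keys; a real HashMap needs considerably more than that lower bound. The trade-off is that the filter cannot give the keys back, only answer "maybe present" or "definitely absent".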