This is the 31st day of my participation in the August Text Challenge.More challenges in August
1 HyperLogLog(Cardinal statistics)
The main application scenario of HyperLogLog is cardinality statistics. Instead of actually storing the value of each element, it uses a probabilistic algorithm that counts elements by storing the first 1 of the element’s hash value. HyperLogLog can perform independent statistics in a very small space. The command is as follows:
The command | role |
---|---|
pfadd key element … | Add all elements to the key |
pfcount key | Estimated value of statistics key (imprecise) |
pgmerge new_key key1 key2 … | Merge key to new key |
The application case
How many different accounts visit the Google homepage every day?
For a webpage with huge page views like Google, there is not much difference between the number of page views of one billion or one billion and one million. Therefore, in such a business scenario, in order to save costs, only a approximate value can be calculated, but it is not necessary to calculate an accurate value.
For the above scenario, you can use HashMap, BitMap, and HyperLogLog. Here’s a comparison of these three solutions:
HashMap
: Simple algorithm, high statistical accuracy, recommended for a small amount of data, but for a large amount of data will occupy a large memory spaceBitMap
: bitmap algorithm, the specific content can refer to my article, high statistical accuracy, although the memory footprint thanHashMap
Small, but still takes up a lot of memory for a lot of dataHyperLogLog
: Has a certain error and occupies about 12 KB of memory. 2^64 elements can be counted. You are advised to use this parameter in the application scenario described in the preceding example
2 Geo(Geospatial Information)
Geo is mainly used to store geographic location information and perform operations on the stored information (adding, obtaining, calculating the distance between two locations, obtaining the location set within a specified range, obtaining the location set within a specified range). Redis supports storing Geo information in ordered sets (Zsets) and populating them with Geohash algorithms. The command is as follows:
The command | role |
---|---|
geoadd key latitude longitude member | Add the member location (latitude, longitude, name) to the key |
geopos key member … | Gets the member’s GEO coordinates |
geodist key member1 member2 [unit] | Calculates the distance between member positions. Return null if one of the two positions does not exist |
georadius | Query based on latitude and longitude coordinate range |
georadiusbymember | Query based on member location range |
geohash | Compute the latitude and longitude hash |
GEORADIUS
GEORADIUS key longitude latitude radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [COUNT count]
Copy the code
Returns all of the location elements contained by the key that are within a given maximum distance from the center, centered at a given latitude and longitude. A range can use one of the following units:
- M is in meters
- Km is expressed in kilometers
- Mi means miles
- Ft is in feet
The command returns additional information when given the following options:
WITHDIST
: Returns the location element and the distance between the location element and the center. The units of distance and range should be the sameWITHCOORD
: Returns the longitude and dimension of the positional element as wellWITHHASH
: Returns the original Geohashed ordered set score of the positional element as a 52-bit signed integer. This option is mainly used for low-level applications or debugging, and is not useful in practice
The command returns unsorted positional elements by default. The user can specify the order of the elements of the returned position with the following two parameters:
ASC
: Returns the position element from near to far based on the position of the centerDESC
: Returns the position element from far to near based on the position of the center
By default, the GEORADIUS command returns all matching positional elements. Although the user can use the COUNT < COUNT > option to retrieve the first N matched elements, because the command may internally need to process all matched elements, even using the COUNT option to retrieve a small number of elements when searching a very large area, Commands can also be executed very slowly. On the other hand, using the COUNT option to reduce the number of elements that need to be returned is still very useful for reducing bandwidth.
3 Pub/Sub(Publish and Subscribe)
Publish subscriptions are similar to broadcast functions. Redis publishing subscriptions include publishers, subscribers, and channels. Common commands are as follows:
The command | role | Time complexity |
---|---|---|
subscribe channel | Subscribe to a channel | O(n) |
unsubscribe channel … | Unsubscribe from one or more channels | O(n) |
publish channel msg | Sends the message to the specified channel | O(n+m), where n is the number of subscribers for the channel and M is the number of clients using Subscribed Patterns |
pubsub CHANNELS | Viewing subscription and publication system status (Multi-seed mode) | O(n) |
psubscribe | Subscribe to multiple channels | O(n) |
unsubscribe | Unsubscribe from multiple channels | O(n) |
4 Bitmap
A Bitmap is a Bitmap, which is actually a byte array. It is represented by a series of consecutive binary digits (0 or 1), each of which is offset. A Bitmap uses each binary bit to store or mark the corresponding value of an element. It is usually used to determine whether certain data exists or not. Since bitmaps are stored in bits, they can greatly save storage space. Common commands are as follows:
The command | role | Time complexity |
---|---|---|
setbit key offset val | Assign val to offset of the specified key | O(1) |
getbit key offset | Gets the offset bit of the specified key | O(1) |
bitcount key start end | Returns the number of 1 in [start,end] of the specified key | O(n) |
bitop operation destkey key | Bit operations on different binary storage data (AND, OR, NOT, XOR) | O(n) |
The application case
There are 100 million users, 50 million logins, so count the number of daily logins. Each one identifies a user ID, and when a user visits our site, we set the Bitmap that identifies the user to 1. Using a set and Bitmap store comparison:
The data type | Each userID takes up space | The number of users to be stored | Take up all the memory |
---|---|---|---|
set | 32-bit is 4 bytes (assuming userID is an integer, but many sites use long integers) | 50000000 | 32-bit x 50,000,000 = 200 MB |
Bitmap | 1 bit (bit) | 100000000 | 1 bit * 100,000,000 = 12.5MB |
Application scenarios
- User Online Status
- User check-in status
- Collecting unique Users
5 BloomFilter
When an element is added to the set, K hash functions are used to map the element to K points in a bit array (multiple hash functions are used to hash the element key (bloom does not contain value) to calculate an integer index value, and then modulo the length of the bit array to obtain a position. Each unbiased hash function gets a different position), set them to 1. When we search, we know (roughly) if it is in the set by seeing if all of the points are 1:
- If any of these points is 0, the element being checked must be absent
- If they’re all 1’s, it doesn’t necessarily mean that the element is there, but maybe they’re 1’s because other elements are there, and that’s why bloom’s filter miscalculates
Application scenarios
- Solve cache penetration: put all existing keys into Redis BloomFilter in advance, which is used for existence detection. If BloomFilter does not exist, the data must not exist. If BloomFilter exists, the actual data may not exist
- Blacklist check: assuming that the number of blacklist is hundreds of millions, storage is very expensive storage space, Bloom filter is a better solution. Put all blacklists in the Bloom filter, and then check whether the email address is in the Bloom filter
- Web interceptor: for the first request, the user puts the request parameters into the Bloom filter. For the second request, the user determines whether the request parameters are hit by the Bloom filter, thus improving the cache hit ratio
5.1 Based on Bitmap data structure
import com.google.common.base.Preconditions;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;
import javax.annotation.Resource;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.TimeUnit;
@Service
public class RedisService {
@Resource
private RedisTemplate<String, Object> redisTemplate;
/** * Adds the value */ based on the given Bloom filter
public <T> void addByBloomFilter(BloomFilterHelper<T> bloomFilterHelper, String key, T value) { Preconditions.checkArgument(bloomFilterHelper ! =null."BloomFilterHelper cannot be empty");
int[] offset = bloomFilterHelper.murmurHashOffset(value);
for (int i : offset) {
redisTemplate.opsForValue().setBit(key, i, true); }}/** * Determines the presence of */ based on the given Bloom filter
public <T> boolean includeByBloomFilter(BloomFilterHelper<T> bloomFilterHelper, String key, T value) { Preconditions.checkArgument(bloomFilterHelper ! =null."BloomFilterHelper cannot be empty");
int[] offset = bloomFilterHelper.murmurHashOffset(value);
for (int i : offset) {
if(! redisTemplate.opsForValue().getBit(key, i)) {return false; }}return true; }}Copy the code
5.2 Based on RedisBloom module
The RedisBloom module provides four data types:
- Bloom Filter
- Cuckoo Filter
- Count-Mins-Sketch
- Top-K
Bloom filters and Cuckoo are used to determine (with a given degree of certainty) whether an item exists in a set. Count-Min Sketch is used to estimate the number of items in the sublinear space, and top-K is used to maintain a list of K most frequent items.
# 1. The git download
[root@test ~]# git clone https://github.com/RedisBloom/RedisBloom.git
[root@test ~]# cd redisbloom
[root@test ~]# make
# 2. The wget to download
[root@test ~]Wget # https://github.com/RedisBloom/RedisBloom/archive/v2.0.3.tar.gz
[root@test ~]# tar - ZXVF RedisBloom - 2.0.3. Tar. Gz
[root@test ~]# CD RedisBloom - 2.0.3 /
[root@test ~]# make
# 3. Modify the Redis Conf
[root@test ~]#vim /etc/redis.conf
# Add downlink to fileLoadmodule/root/RedisBloom - 2.0.3 / RedisBloom. SoStart Redis Server
[root@test ~]# /redis-server /etc/redis.conf
Or load the OS file when starting the service
[root@test ~]# /redis-server /etc/redis.conf --loadmodule /root/RedisBloom/redisbloom.so
# 5. Test RedisBloom
[root@test ~]# redis-cli127.0.0.1:6379> bf.add bloomFilter foo 127.0.0.1:6379> bf.exists bloomFilter foo 127.0.0.1:6379> cf. Add cuckooFilter foo 127.0.0.1: > 6379 cf. The exists cuckooFilter fooCopy the code