This is the transcript of my speech at the GIAC Technology Conference in Shenzhen on June 23rd.

Friends in the audience, I am Qian Wenpin, an engineer at Ireader and the author of the Juejin booklet Redis Deep Adventure. The topic I bring today is: Redis optimization practice under massive data and high concurrency. Redis is no stranger to Internet engineers. Almost all large and medium-sized companies use Redis as a cache database, but most of them only use its most basic KV cache function, and many of Redis's advanced features may never have been seriously put into practice. In this one hour, I will focus on Redis and share 9 classic cases from daily business development. I hope this talk helps you apply the advanced features of Redis better in your own work.

Ireader has about 500 million registered users in total, roughly 50 million monthly actives and nearly 20 million daily actives. On the server side there are more than 1000 Redis instances and 100+ clusters, and the memory of each instance is kept under 20GB.

KV cache

The first is the most basic and most commonly used KV function. We can use Redis to cache user information, session information, commodity information and so on. The code below shows the generic cache read and write logic.

def get_user(user_id):
    user = redis.get(user_id)
    if not user:
        user = db.get(user_id)
        redis.setex(user_id, ttl, user)  # set the cache expiration time
    return user

def save_user(user):
    redis.setex(user.id, ttl, user)  # set the cache expiration time
    db.save_async(user)  # write to the database asynchronously

This expiration time is very important and is usually proportional to the length of a single user session, to ensure the user can make use of the cached data for as long as possible within a session. Of course, if your company has deep pockets and cares greatly about the performance experience, you can make the time longer or even skip the expiration time entirely. When the amount of data grows, use Codis or Redis Cluster to expand capacity.

Redis also provides a cache mode in which the Set command does not have to set an expiration time at all; Redis evicts key-value pairs according to a configured policy. The cache mode is turned on with config set maxmemory 20gb, so when memory reaches 20GB, Redis starts the eviction policy to make room for new key-value pairs. Redis offers several such policies. To summarize, a policy has two parts: delimiting the eviction scope and choosing the eviction algorithm. For example, the policy we use online is allkeys-lru. allkeys means any key inside Redis may be evicted, regardless of whether it has an expiration time, whereas volatile only evicts keys that do have one. Redis eviction is like a company that needs to tighten its belt during an economic winter and run a cruel round of "talent optimization": does it only lay off temporary workers, or is everyone equally at risk? Once that scope is defined, a number of victims must be picked, and how to pick them is the eviction algorithm. The most commonly used is LRU, which has a weakness: people who merely look busy can escape the layoff, so if you put on a good show for your boss at the right moment, you are safe. Therefore Redis 4.0 introduced the LFU algorithm, which assesses your usual performance; superficial effort is no longer enough, it also looks at how frequently you actually work. Finally there is a rarely used one, the random algorithm, which might lay off the CEO by lottery, so it is generally not used.
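Below is a minimal sketch, assuming the redis-py client, of turning on the cache mode described above: cap memory at 20GB and evict arbitrary keys by LRU once the cap is reached.

import redis

r = redis.StrictRedis()
r.config_set("maxmemory", "20gb")                # start evicting once memory reaches 20GB
r.config_set("maxmemory-policy", "allkeys-lru")  # scope: all keys; algorithm: LRU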

Distributed lock

Let’s look at the second feature, distributed locking, which after the KV cache is the most commonly used one. Imagine a very capable senior engineer who develops fast and writes high-quality code: he is the star of the team, so many product managers come to bother him and push their own requirements. If a bunch of product managers descend on him at the same time, his thinking falls into chaos; even the best programmer's brain does not handle concurrency well. So he hangs a Do Not Disturb sign on his office door knob. When a product manager comes over, he first checks whether the sign is hanging there; if not, he may go in, hang the sign before talking about his requirements, and take it off when he leaves. Other product managers who come by and see the sign can either wait or go do something else first. Thus the star engineer regains his peace.

Distributed locks are simple to use, relying on the extended arguments of the Set command, as follows:

# lock
set lock:$user_id owner_id nx ex 5
# releases the lock
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
# is equivalent to
del_if_equals lock:$user_id owner_id

Be sure to set the expiration time, because under abnormal circumstances, say an earthquake (the process is killed with -9, or the machine goes down), the product manager may jump out of the window without ever taking the sign down, leading to deadlock and starvation and turning the good engineer into an expensive idle resource. Also note the owner_id, which records who added the lock, i.e. the product manager's id, in case your lock gets removed by someone else by mistake: the owner_id must match before the lock may be released. The owner_id is usually a random number stored in a ThreadLocal variable (a stack variable).
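A minimal sketch of this lock pattern in Python follows, assuming the redis-py client; the function names and the uuid-based owner_id are illustrative, not part of any library.

import uuid
import redis

r = redis.StrictRedis()

# atomic "delete only if the value still matches" (the release script shown above)
DEL_IF_EQUALS = r.register_script("""
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
""")

def lock(user_id, ttl=5):
    owner_id = uuid.uuid4().hex  # random token identifying the lock holder
    ok = r.set("lock:%s" % user_id, owner_id, nx=True, ex=ttl)
    return owner_id if ok else None

def unlock(user_id, owner_id):
    return DEL_IF_EQUALS(keys=["lock:%s" % user_id], args=[owner_id]) == 1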

This simplest form of distributed lock is not officially recommended, because it can lose locks in cluster mode when a master/slave switchover occurs. The officially recommended distributed lock is called RedLock; its author believes the algorithm is safer and recommends using it. However, Ireader has always used the simple distributed lock above. Why don't we use RedLock? Because its operation and maintenance cost is higher: it requires 3 or more independent Redis instances and is more complicated to use. Besides, the probability of a master/slave switchover in a Redis cluster is not high, and even when one happens the probability of losing a lock is very low, because the switchover usually takes longer than the lock's expiration time, so an abnormal lock loss does not occur. Also, lock conflicts are not that frequent in practice; just as a company has only a limited number of star programmers, if you constantly run into lock queues it means the architecture needs optimizing.

Delayed queue

Let’s move on to the third feature, the delayed queue. As mentioned earlier, a product manager who runs into a Do Not Disturb sign has several strategies: 1. keep waiting at the door; 2. sleep for a while and come back; 3. give up; 4. the strategy I'm about to describe. Waiting at the door is a spinlock, which burns CPU and drives up Redis QPS. Sleeping means dozing for a while and retrying, which wastes thread resources and increases response time. Giving up means telling the front-end user that the system is under pressure right now, please try again later, which hurts the user experience. The last strategy is the one most commonly used in practice, and it is typically applied to message-queue consumption: when a consumer hits a lock conflict while processing a message, it cannot simply discard the message, and an immediate retry (a spinlock) is not suitable either, so the message is thrown into a delayed queue and processed again later.

Many dedicated message middlewares support delayed messages, such as RabbitMQ and NSQ. Redis can do it too: we can use a zset to implement the delayed queue. The zset stores value/score pairs, where the value is the serialized task message and the score is the deadline at which the task should next run. The consumer then polls the zset for task messages whose score is less than or equal to the current time and processes them.

# produce a delayed message
zadd(queue_key, now_ts + 5, task_json)
# consume delayed messages
while True:
  task_json = zrevrangebyscore(queue_key, now_ts, 0, 0, 1)
  if task_json:
    grabbed_ok = zrem(queue_key, task_json)
    if grabbed_ok:
      process_task(task_json)
  else:
    sleep(1)  # nothing due yet, rest for 1s

When there are multiple consumer threads or processes, a race-and-waste problem appears: the current thread polls a task_json out of the zset, but when it tries to grab it with zrem another consumer may already have taken it, so the poll was wasted. A Lua script solves this by making the poll and the grab atomic, avoiding the wasted races.

local res = nil
local tasks = redis.pcall("zrevrangebyscore", KEYS[1], ARGV[1], 0, "LIMIT", 0, 1)
if #tasks > 0 then
  local ok = redis.pcall("zrem", KEYS[1], tasks[1])
  if ok > 0 then
    res = tasks[1]
  end
end
return res
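Here is a minimal sketch, assuming the redis-py client, of wiring that Lua script into the consumer loop so that polling and grabbing happen atomically in a single round trip; process_task stands for the business handler from the earlier snippet.

import time
import redis

r = redis.StrictRedis()

POLL_AND_GRAB = r.register_script("""
local res = nil
local tasks = redis.pcall("zrevrangebyscore", KEYS[1], ARGV[1], 0, "LIMIT", 0, 1)
if #tasks > 0 then
  local ok = redis.pcall("zrem", KEYS[1], tasks[1])
  if ok > 0 then
    res = tasks[1]
  end
end
return res
""")

def consume(queue_key):
    while True:
        task_json = POLL_AND_GRAB(keys=[queue_key], args=[time.time()])
        if task_json:
            process_task(task_json)  # business handler, as in the earlier snippet
        else:
            time.sleep(1)            # nothing due yet, rest for 1s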

The reason why I talk about distributed locks and delayed queues together is an early production failure. When it happened, the length of a Redis queue in production had grown abnormally large, so many asynchronous tasks were not executed and business data went wrong. We later found the cause: a badly written distributed lock led to a deadlock, and the lock-failure handling slept and retried indefinitely, so the asynchronous task threads all drifted into sleep and stopped processing tasks altogether. How was that distributed lock written? Using setnx + expire. During a service upgrade, stopping the process caused a few requests to execute setnx but never reach expire, leaving those individual users deadlocked. Threads that later failed to acquire those users' locks slept and retried endlessly, until every queue-processing thread had stopped working. Out of that incident came the correct form of the distributed lock, the delayed queue described today, and also graceful shutdown, because with graceful shutdown logic a service upgrade would not interrupt requests halfway through execution, unless the process is killed with -9 or crashes outright.

Scheduled tasks

There are many ways to implement distributed scheduled tasks, and the most common is the master-workers model. The master manages the clock, and when a task is due it drops the task message into a message middleware; the workers listen on the message queue and consume the messages. The well-known Python task framework Celery works this way. Celery has a problem, though: the master is a single point, and if the master dies the whole scheduling system stops working.

Another implementation is the multi-master model. What does it mean? It is similar to Java's Quartz framework, which uses a database lock to control task concurrency. There are multiple processes, each keeping track of time; when a task is due they compete for the database lock, and the process that grabs it gets to execute the task. This solves the master's single-point problem. One drawback of the model is the race-and-waste problem, but in general most business systems do not have that many scheduled tasks, so the waste is insignificant. Another problem is that it depends on the clocks of the distributed machines being consistent; if the clocks on different machines diverge, a task may be executed multiple times, which can be mitigated by increasing the hold time of the database lock.

Now that we have Redis distributed locks, we can implement a simple timed task framework on top of Redis.

# register a scheduled task
hset tasks name trigger_rule
# get the scheduled task list
hgetall tasks
# scramble for the task
set lock:${name} true nx ex 5
# task list changed (rolling upgrade)
# poll the version number; if it changed, reload the task list and reschedule tasks whose trigger time changed
set tasks_version $new_version
get tasks_version
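A minimal sketch of the multi-master loop built on those commands is shown below; load_tasks, rule_is_due and the task registry are illustrative placeholders for business code, not part of any framework.

import time
import redis

r = redis.StrictRedis()

def scheduler_loop(tasks):
    # tasks: {name: (trigger_rule, task_func)}, loaded from the "tasks" hash
    version = r.get("tasks_version")
    while True:
        new_version = r.get("tasks_version")
        if new_version != version:      # task list changed (rolling upgrade)
            tasks = load_tasks()        # placeholder: reload from the tasks hash
            version = new_version
        for name, (rule, task_func) in tasks.items():
            if rule_is_due(rule):       # placeholder: check the trigger rule
                # scramble for the task; only the process that grabs the lock runs it
                if r.set("lock:%s" % name, "true", nx=True, ex=5):
                    task_func()
        time.sleep(1)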

If you find the code inside Quartz unreadable and its documentation on distributed mode almost nonexistent, try Redis; it will help you lose less hair.

Life is Short, I use Redis: https://github.com/pyloque/taskino

Frequency control

If you have ever worked on a community product, you will inevitably run into spam. You wake up one day to find some inexplicable ad posts on your home page. Without proper mechanisms to control them, the user experience suffers badly.

There are many strategies for controlling ad spam. The advanced one is AI; the simplest is keyword scanning. Another common method is frequency control, which limits how fast a single user can produce content, with different frequency-control parameters for different user levels.

Frequency control can be implemented with Redis. We model the user's behavior as a time series and require that, within a given time window, the length of a single user's time series stays under a limit; beyond that limit the behavior is rejected. This can be implemented with Redis's zset.

The green section in the figure is the time series information of a time period that we want to keep, and the gray section will be cut off. Counting the number of time series records in the green segment will tell whether the frequency threshold has been exceeded.

The following code controls the user’s UGC behavior to a maximum of N times per hour

hist_key = "ugc:${user_id}"
with redis.pipeline() as pipe:
  # Record current behavior
  pipe.zadd(hist_key, ts, uuid)
  Keep the behavior sequence for 1 hour
  pipe.zremrangebyscore(hist_key, 0, now_ts - 3600)
  Get the number of actions within the hour
  pipe.zcard(hist_key)
  Set expiration time to save memory
  pipe.expire(hist_key, 3600)
  # batch execution
  _, _, count, _ = pipe.exec()
  return count > N
Copy the code

Service discovery

A slightly more technologically sophisticated enterprise will have an infrastructure for service discovery. Distributed configuration databases such as ZooKeeper, ETCD, Consul are usually used to store service lists. They have very timely notification mechanisms to notify service consumers that the list of services has changed. So how do we use Redis for service discovery?

Here we use the zset data structure again: one zset holds one service list, and multiple service lists use multiple zsets. The value and score of a zset entry hold the service address and the heartbeat time respectively. A service provider reports its own liveness via heartbeat, calling zadd every few seconds, and removes itself with zrem when it shuts down.

zadd service_key heartbeat_ts addr
zrem service_key addr

This is not enough, because a service may terminate unexpectedly without a chance to execute its shutdown hook, so an extra thread is needed to clean up expired entries in the service list:

zremrangebyscore service_key 0 now_ts-30  # no heartbeat for 30 seconds

The next important question is how to notify consumers that the service list has changed. Again we use a version-number polling mechanism: increment the version number whenever the service list changes, and let consumers poll the version number and reload the service list when it changes.

if zadd() > 0 || zrem() > 0 || zremrangebyscore() > 0:
  incr service_version_key

However, there is another problem. If the consumer relies on a large list of services, it will need to poll for many version numbers, which makes IO inefficient. In this case, we can add a global version number, which is incremented when the version number of any service list changes. In normal circumstances, consumers only need to poll for the global version number. When the global version number changes, it compares the sub-version numbers of the dependent service list one by one, and then loads the changed service list.
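A minimal sketch of the heartbeat and version-polling scheme follows, assuming the redis-py client; the ":version" key naming and function names are illustrative.

import time
import redis

r = redis.StrictRedis()

def report_alive(service_key, addr):
    # provider heartbeat: refresh our score every few seconds
    if r.zadd(service_key, {addr: time.time()}) > 0:  # > 0 only for a brand-new provider
        r.incr(service_key + ":version")

def clean_expired(service_key):
    # reap providers with no heartbeat for 30 seconds
    if r.zremrangebyscore(service_key, 0, time.time() - 30) > 0:
        r.incr(service_key + ":version")

def poll_services(service_key, last_version, current_list):
    # consumer: reload the service list only when the version number changes
    version = int(r.get(service_key + ":version") or 0)
    if version != last_version:
        return version, r.zrange(service_key, 0, -1)
    return last_version, current_list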

Github.com/pyloque/cap…

Bitmap

Ireader's check-in system was developed early on, when the user base was still small. The design was fairly simple: store the user's check-in state in a Redis hash, with one entry recorded per check-in. A check-in has three states: not signed, signed, and re-signed, represented by 0, 1 and 2 respectively.

hset sign:${user_id} 2019-01-01 1
hset sign:${user_id} 2019-01-02 1
hset sign:${user_id} 2019-01-03 2
...

This wastes a great deal of space. Later, when the number of check-in users passed ten million, the Redis storage problem became prominent: memory climbed past 30GB. Our online instances normally start alarming above 20GB, so 30GB was far over the line.

That is when we set about optimizing the storage. We chose a bitmap to record the check-in information: one check-in state needs two bits, so a whole month needs only 8 bytes of storage. A month's worth of check-in records can thus be kept in a very short string.
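A minimal sketch of this 2-bits-per-day encoding is given below, assuming the redis-py client; the key format and helper names are illustrative.

import redis

r = redis.StrictRedis()

# states: 0 = not signed, 1 = signed, 2 = re-signed; day d uses bits 2*d and 2*d+1
def set_sign_state(user_id, month, day, state):
    key = "sign:%s:%s" % (user_id, month)     # e.g. sign:42:2019-01
    r.setbit(key, day * 2, (state >> 1) & 1)  # high bit
    r.setbit(key, day * 2 + 1, state & 1)     # low bit

def get_sign_state(user_id, month, day):
    key = "sign:%s:%s" % (user_id, month)
    high = r.getbit(key, day * 2)
    low = r.getbit(key, day * 2 + 1)
    return (high << 1) | low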

After optimization, the effect is very obvious, the memory is directly reduced to 10 GB. Because the API is called frequently to query the check-in status for the entire month, the traffic on the interface is also much lower.

However, the bitmap also has a drawback. Its underlying storage is a string, which is one contiguous block of memory, and it expands automatically: if you set a single bit at a large offset, say the 8-millionth bit, Redis still has to allocate the entire 1MB string below it even though almost all of it is zeros.

Hence the Roaring Bitmap data structure, which splits a large bitmap into segments so that all-zero segments need not be stored at all. In addition, each segment has a sparse storage format: if a segment contains only a few 1 bits, it can store just their integer offsets. This compresses bitmap storage very significantly.

Roaring Bitmap is very valuable in the field of exact counting over big data.

juejin.cn/post/684490…

Fuzzy count

Back to the check-in system: what if the product manager wants to know the daily and monthly active users of check-in? Usually we just pass the buck and send them to the data department. But the data department's numbers are rarely real-time; yesterday's figures typically only come out today, since offline computation is usually scheduled once a day. So how do we get a real-time active-user count?

The simplest solution is to maintain a set in Redis: sadd every active user into it, and the final size of the set is the UV number we want. But this is a huge waste of space; storing such a huge collection just to get one number hardly seems worth it. So what can we do?

This is where Redis's HyperLogLog fuzzy counting comes in. It is a probabilistic count with an error of roughly 0.81%, but its footprint is tiny: the underlying bitmap takes at most 12KB of storage, and when the count is still small it uses a sparse representation that takes even less.

# log a user
pfadd sign_uv_${day} user_id
# get the number of records
pfcount sign_uv_${day}
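The same counting in Python might look like the following minimal sketch, assuming the redis-py client; key names are illustrative.

import redis

r = redis.StrictRedis()

def record_sign(day, user_id):
    r.pfadd("sign_uv_%s" % day, user_id)   # duplicates of the same user are absorbed

def sign_uv(day):
    return r.pfcount("sign_uv_%s" % day)   # approximate distinct count, ~0.81% error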

It can be used to count the readers of a WeChat official-account article or the UV of a web page. But if a product manager cares deeply about numerical accuracy, for example a statistic that is directly tied to money, then consider the Roaring Bitmap mentioned earlier. It is a bit more complicated to use, because user IDs must first be serialized into integers. Redis does not offer Roaring Bitmaps natively, but there is an open-source Redis module you can use.

github.com/aviggiano/r…

Bloom filter

Finally, Bloom filters. They are very valuable when a system is about to be flooded with new users: they can significantly reduce cache penetration and relieve database pressure. The influx of new users is not necessarily a large-scale rollout of the business system; it may also come from external cache-penetration attacks.

def get_user_state0(user_id):
  state = cache.get(user_id)
  if not state:
    state = db.get(user_id) or {}
    cache.set(user_id, state)
  return state

def save_user_state0(user_id, state):
  cache.set(user_id, state)
  db.set_async(user_id, state)

The above is, for example, the user-state query code of this business system. When a new user arrives, it first looks for the user's state in the cache; being a new user, the state is certainly not there. It then falls through to the database, where there is nothing either. If a large number of new users flood in at once, you can expect heavy database pressure and a lot of empty queries.

What we really want is a Redis set holding all user IDs, so we could query the set to tell whether an arrival is a new user. But when the user base is very large, maintaining such a set costs a lot of storage. This is where a Bloom filter comes in: it behaves like a set, but needs far less storage. For example, where a set might spend 64 bytes per user ID, a Bloom filter needs only a bit more than 1 byte. The catch is that it does not store the user ID itself but a fingerprint of it, so there is a small probability of misjudgment; it is a container with fuzzy filtering ability.

When it says that the user ID is not in the container, it is definitely not. When it says the user ID is in the container, 99% of the time it’s correct, and 1% of the time it makes a misjudgment. However, in this case, this misjudgment is not a problem, the cost of misjudgment is only cache penetration, equivalent to 1% of the new users are not protected by the Bloom filter directly through the database query, while the remaining 99% of the new users can be effectively blocked by the Bloom filter, avoiding cache penetration.

def get_user_state(user_id):
  exists = bloomfilter.is_user_exists(user_id)
  if not exists:
    return {}
  return get_user_state0(user_id)

def save_user_state(user_id, state):
  bloomfilter.set_user_exists(user_id)
  save_user_state0(user_id, state)
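The bloomfilter helpers used above could be backed by the open-source RedisBloom module, as in the minimal sketch below; the key name "users" and the helper names are illustrative, and the module must be loaded into Redis for BF.ADD / BF.EXISTS to work.

import redis

r = redis.StrictRedis()

def set_user_exists(user_id):
    r.execute_command("BF.ADD", "users", user_id)       # record the user's fingerprint

def is_user_exists(user_id):
    # may return a false positive with small probability, never a false negative
    return r.execute_command("BF.EXISTS", "users", user_id) == 1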

A good metaphor for how a Bloom filter works is that if you walk on a snow-covered surface in winter, you leave footprints. If there are footprints on the ground, it is highly likely that you have been to this place, but not necessarily, perhaps someone else’s shoes are exactly the same as yours. But if there are no footprints on the ground, it is 100% certain that you have not been to this place.