Everyone is busy. Get busy living or get busy dying. There are only two choices.

The Shawshank Redemption

Preface

The interview questions come from my earlier article: Sharing One and a Half Years of Social Recruitment Interview Experience (covering Alibaba, Meituan, Toutiao, JD, and Didi)

Today I'd like to share some caching techniques that come up often in my daily work. I hope you find them helpful!

This article was first published on my official account (Moon with Flying Fish) and then synchronized to my personal website: xiaoflyfish.cn

If you find this helpful, a like and a share would be much appreciated. Thank you!

Why use a cache

Caching comes up constantly when discussing high-concurrency systems, most directly because disk I/O and network overhead are hundreds or thousands of times more expensive than reading directly from memory.

As a simple calculation: if reading a piece of data from the database disk takes 0.0045 s and transferring it over the network takes 0.0005 s, each request takes at least 0.005 s, so the data server can respond to only 200 requests per second. If the same data sits in local memory, reading it takes only 100 µs, so the server can respond to 10,000 requests per second. Storing data closer to the CPU cuts transfer time and improves processing efficiency; that is the essence of caching.

In what scenarios is caching appropriate

  • Read-heavy applications
  • Applications with hot data
  • Applications with demanding response-time requirements
  • Scenarios without strict consistency requirements
  • Scenarios that need distributed locking

In what scenarios is caching not appropriate

  • Scenarios with strict consistency requirements
  • Write-heavy scenarios: when data is updated frequently, the cost of constantly keeping the cache in sync may exceed the benefit of having a cache at all
  • Read-light scenarios: for a system with few reads, a cache adds little value

Benefits and costs of caching

Benefits

  1. Faster reads and writes. Caches are usually full in-memory systems, so they speed up user access and improve the user experience.
  2. Reduced back-end load. With a well-designed cache and a good hit ratio, caching cuts the number of back-end accesses and complex computations, reducing back-end load.

Costs

  1. Data inconsistency. No matter how good the design, there is always a window during which the cached data is inconsistent with the real data source.
  2. Code maintenance cost. Introducing a cache adds cache-handling logic around the original data-source code. What used to be a simple SQL query now carries extra code to maintain.
  3. Architecture complexity. A cache tier needs dedicated people to maintain the master and slave cache systems, which adds complexity and maintenance cost to the architecture.

Common problems in high concurrency scenarios

In high-concurrency scenarios, caching introduces the following problems:

1. Cache consistency

2. Cache breakdown (hot-key concurrency)

3. Cache penetration

4. Cache avalanche (mass cache failure)

Let's say you're rich: you have Baidu Cloud, you have Tencent Video, you have every kind of membership, but you don't have a Netflix membership. You post all those accounts and passwords on a website you built, and a friend of yours checks your website every ten seconds, finds there is no Netflix membership, and then calls you to ask for one. You are the database; the website is Redis. That is cache penetration.

Everyone likes to watch Stephen Chow's The King of Comedy, which went online on Tencent Video last week. But your membership suddenly expired, nobody could use the Tencent Video account on your website anymore, and they all called you at once to ask about it. That is cache breakdown.

And when all of your memberships suddenly expire at the same time, that's a cache avalanche.

Now let's go through each one.

Cache consistency issues

When data timeliness matters, you must keep the data in the cache consistent with the database, and keep the cache nodes consistent with their replicas. This depends on the cache's expiration and update policies. Generally, when the data changes, you proactively update the cached value or remove the corresponding cache entry.
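As a minimal sketch of the remove-on-change approach, in the same pseudocode style as the examples later in this article (the cache and db objects are illustrative):

public void update(String key, String newValue) {
        db.update(key, newValue);  // write the source of truth first
        cache.delete(key);         // invalidate so the next read reloads fresh data
    }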

Several common read/write orderings can lead to data consistency issues.

For a detailed introduction and solutions, see my earlier article: Meituan Interview Question: Cache Consistency, Here Is My Answer!

Cache breakdown problem

Some keys with an expiration time set may be hit by extremely heavy concurrent access at the moment they expire; this is very "hot" data. That is the cache "breakdown" problem. The difference from a cache avalanche is that breakdown concerns a single key, whereas an avalanche concerns many keys.


Solution:

The common industry practice is to use a mutex key.

1. Use a mutex key

The idea is simple: let one thread rebuild the cache while the other threads wait for it to finish, then fetch the data from the cache again.

In a single-node environment you can use synchronized or Lock; in a distributed environment, use a distributed lock (Redis SETNX, or creating a ZooKeeper node).

Redis pseudocode is as follows:

public String get(String key) {
        String value = storeClient.get(key);
        StoreKey key_mutex = new MutexStoreKey(key);
        if (value == null) {  // the cached value has expired
            // Give the mutex a 2-minute timeout so that even if the delete below
            // fails, the DB can still be loaded the next time the cache expires
            if (storeClient.setnx(key_mutex, 1, 2 * 60)) {  // lock acquired
                value = db.get(key);
                storeClient.set(key, value, 3 * 3600);
                storeClient.delete(key_mutex);
            } else {
                // another thread is loading from the DB and refilling the cache
                sleep(1000);
                return get(key);  // retry
            }
        }
        return value;
    }

2. Use mutex keys “ahead of time”

Inside the cached value, store a logical timeout (timeout1) that is smaller than the actual Redis timeout (timeout2).

When a read finds that timeout1 has passed, it immediately extends timeout1 and writes it back to the cache, then loads the latest data from the database and sets it into the cache.

The difference between option 2 and option 1: when the cache still holds data but it has logically expired, the mutex is taken in advance and the DB is queried for the latest data before the cache is refreshed.

The pseudocode is as follows; note the logic in the else branch.

public String get(String key) {
        MutexDTO value = storeClient.get(key);
        StoreKey key_mutex = new MutexStoreKey(key);
        if (value == null) {
            if (storeClient.setnx(key_mutex, 3 * 60 * 1000)) {
                value = db.get(key);
                storeClient.set(key, value);
                storeClient.delete(key_mutex);
            } else {
                sleep(50);
                return get(key);  // retry
            }
        } else {
            if (value.getTimeout() <= System.currentTimeMillis()) {  // logically expired
                if (storeClient.setnx(key_mutex, 3 * 60 * 1000)) {
                    // Extend the stale value's logical timeout and write it back first,
                    // so other threads keep serving the old value instead of rebuilding
                    value.setTimeout(value.getTimeout() + 3 * 60 * 1000);
                    storeClient.set(key, value, 3 * 3600 * 2);

                    value = db.get(key);  // then load the latest DB data
                    value.setTimeout(value.getTimeout() + 3 * 60 * 1000);
                    storeClient.set(key, value, 3 * 3600 * 2);
                    storeClient.delete(key_mutex);
                } else {
                    sleep(50);
                    return get(key);  // retry
                }
            }
        }
        return value.getValue();
    }

3. Cache “never expires”

The term “never expires” has two meanings:

  • "Physically" never expires: in Redis, no expiration time is set on the hot key, so it can never actually expire.
  • "Logically" expires: if nothing ever expired, wouldn't the data go stale? So we store an expiration time inside the value for each key; when a read finds that time has passed, a background asynchronous thread rebuilds the cache.

The pseudocode is as follows:
public String get(String key) {
        MutexDTO mutexDTO = storeClient.get(key);
        String value = mutexDTO.getValue();
        if (mutexDTO.getTimeout() <= System.currentTimeMillis()) {  // logically expired
            // Rebuild the cache asynchronously in a background thread
            ExecutorService singleThreadExecutor = Executors.newSingleThreadExecutor();
            singleThreadExecutor.execute(new Runnable() {
                public void run() {
                    StoreKey mutexKey = new MutexStoreKey(key);
                    if (storeClient.setnx(mutexKey, "1")) {
                        storeClient.expire(mutexKey, 3 * 60);
                        String dbValue = db.get(key);
                        storeClient.set(key, dbValue);
                        storeClient.delete(mutexKey);
                    }
                }
            });
        }
        return value;  // the stale value is served while the rebuild runs
    }

The three methods are compared as follows:

Solution | Advantages | Disadvantages
--- | --- | ---
Use a mutex key | 1. Simple idea 2. Guarantees consistency | 1. Increased code complexity 2. Risk of deadlock
Use a mutex key "ahead of time" | 1. Guarantees consistency | Same as above
Cache "never expires" | 1. Builds the cache asynchronously without blocking threads | 1. Does not guarantee consistency 2. Increased code complexity (each value maintains a logical timeout) 3. Uses extra memory (each value stores that timeout)

Cache penetration problem

Cache penetration means querying data that does not exist at all. The cache is written passively on a miss, and, for fault-tolerance reasons, if the storage layer returns nothing, nothing is written to the cache. As a result, every request for the nonexistent data goes through to the storage layer, and the cache loses its purpose.

Under heavy traffic, if the DB cannot withstand the instantaneous load, it may be overwhelmed and crash.


Solution:

There are several effective ways to solve cache penetration. A rough but easy one is to cache empty results: if a query returns no data (the data does not exist in the database), cache the empty result anyway, usually with a short expiration time.

Another approach is the well-known Bloom filter: hash all possibly existing data into a sufficiently large bitmap, so that a query for nonexistent data is intercepted by the bitmap, sparing the underlying storage system the query pressure.

1. Cache empty data

After a client request misses, the empty object is still saved in the cache (for a period of time that depends on the situation). The next request for the same key is served from the cache, protecting the back-end DB. The pseudocode is as follows:

public class CacheNullService {
    private Cache cache = new Cache();
    private Storage storage = new Storage();

    /**
     * Normal mode (no penetration protection)
     * @param key
     * @return
     */
    public String getNormal(String key) {
        // Get the data from the cache
        String cacheValue = cache.get(key);
        // The cache is empty
        if (StringUtils.isBlank(cacheValue)) {
            // Get it from storage
            String storageValue = storage.get(key);
            // If the stored data is not empty, set it into the cache
            if (StringUtils.isNotBlank(storageValue)) {
                cache.set(key, storageValue);
            }
            return storageValue;
        } else {
            // The cache is not empty
            return cacheValue;
        }
    }

    /**
     * Anti-penetration mode (empty results are cached too)
     * @param key
     * @return
     */
    public String getPassThrough(String key) {
        // Get the data from the cache
        String cacheValue = cache.get(key);
        // The cache is empty
        if (StringUtils.isBlank(cacheValue)) {
            // Get it from storage
            String storageValue = storage.get(key);
            cache.set(key, storageValue);
            // If the stored data is empty, set a short expiration time (300 seconds)
            if (StringUtils.isBlank(storageValue)) {
                cache.expire(key, 60 * 5);
            }
            return storageValue;
        } else {
            // The cache is not empty
            return cacheValue;
        }
    }
}

2. Bloom filter

BloomFilter is a very interesting data structure. It can not only intercept illegal keys, but also answer membership queries over massive data sets at low cost and high performance; for example, in a system with hundreds of millions of users and tens of billions of feed items, a BloomFilter can tell whether a user has already read a particular feed.

Before any resource (cache, DB) is accessed, the set of existing keys is loaded into a Bloom filter in advance as a first layer of interception. A simple sketch of the algorithm:

In fact, the Bloom filter only needs to decide whether the userCode sent by the client exists. Hash1, Hash2, and Hash3 stand for three different hash functions; different userCodes map to different bit positions. To query, check the bit at each hashed position: if any one of those bits is not set, the userCode definitely does not exist.

A BloomFilter generally has to hold the full set of keys, so the key count should not be too large; ideally under 1 billion entries, since 1 billion keys take up roughly 1.2 GB of memory. A BloomFilter can also be used the other way around, to record illegal keys: every nonexistent, illegal key that is requested gets recorded in the filter. With this scheme the recorded key set keeps growing quickly, so to keep the false-positive rate from rising, the filter must be cleared periodically.
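For intuition, the 1.2 GB figure is consistent with the standard Bloom filter sizing formula; this back-of-the-envelope check is mine, assuming a 1% false-positive rate:

public class BloomSizing {
    public static void main(String[] args) {
        long n = 1_000_000_000L;  // expected number of keys
        double p = 0.01;          // target false-positive rate (1%)
        // Standard sizing formula: m = -n * ln(p) / (ln 2)^2 bits
        double mBits = -n * Math.log(p) / (Math.log(2) * Math.log(2));
        System.out.printf("%.2f GB%n", mBits / 8 / 1e9);  // prints roughly 1.20 GB
    }
}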

An example using Guava's BloomFilter:

import com.google.common.base.Charsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.util.*;

public class BFDemo {

    private static final int insertions = 1000000;  // 1 million

    public static void main(String[] args) {
        BloomFilter<String> bf = BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_8), insertions);
        Set<String> sets = new HashSet<>(insertions);
        List<String> lists = new ArrayList<>(insertions);
        for (int i = 0; i < insertions; i++) {
            String uuid = UUID.randomUUID().toString();
            bf.put(uuid);
            sets.add(uuid);
            lists.add(uuid);
        }
        int wrong = 0;
        int right = 0;
        for (int i = 0; i < 10000; i++) {
            // Every 100th iteration tests a known key; the rest are random strings
            String test = i % 100 == 0 ? lists.get(i / 100) : UUID.randomUUID().toString();
            if (bf.mightContain(test)) {
                if (sets.contains(test)) {
                    right++;   // true positive
                } else {
                    wrong++;   // false positive
                }
            }
        }
        System.out.println("========right=======" + right);
        System.out.println("========wrong=======" + wrong);
    }
}

Compare the two methods mentioned above:

Solution | Applicable scenarios | Pros and cons
--- | --- | ---
Cache empty data | 1. Low data hit ratio 2. Data changes frequently, high real-time requirements | 1. Simple to maintain 2. Needs more cache space 3. Short window of inconsistency
Bloom filter | 1. Low data hit ratio 2. Data relatively static, low real-time requirements | 1. More complex code to maintain 2. Small cache footprint

Cache avalanche problem

Cache avalanches generally happen for one of two reasons.

The first reason: a large amount of cached data expires at the same time, so a flood of requests can no longer be served from the cache and falls onto the database.


Solution:

Design different expiration times

Avoid setting the same expiration time for a large batch of keys. If the business layer really does require some data to expire around the same time, use EXPIRE to give each key its own expiration time and add a small random offset (for example, an extra 1 to 3 minutes). The expiration times then differ slightly from key to key: no large batch expires at exactly the same moment, yet all of the keys still expire at roughly the time the business requires.
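A minimal sketch of the random offset, reusing the storeClient style from the pseudocode above (the base TTL and jitter range are illustrative):

import java.util.concurrent.ThreadLocalRandom;

public class JitteredTtl {
    private static final int BASE_TTL_SECONDS = 3600;  // base expiration: 1 hour

    // Give each key its own TTL so a batch written together does not expire together
    public void setWithJitter(StoreClient storeClient, String key, String value) {
        int jitterSeconds = ThreadLocalRandom.current().nextInt(60, 181);  // extra 1-3 minutes
        storeClient.set(key, value, BASE_TTL_SECONDS + jitterSeconds);
    }
}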

Service degradation

When a cache avalanche occurs, handle different data in different ways (a sketch follows the list).

  • When the business application accesses non-core data, temporarily stop querying the cache and directly return predefined information, a null value, or an error message.
  • When the business application accesses core data, still query the cache, and fall through to the database if the cache misses.
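A hedged sketch of that split; isCoreData, DEFAULT_VALUE, and the degraded flag are illustrative assumptions, in the same pseudocode style as the earlier examples:

public String getWithDegradation(String key) {
        if (degraded && !isCoreData(key)) {
            return DEFAULT_VALUE;       // non-core data: predefined value, skip cache and DB
        }
        String value = cache.get(key);  // core data: normal read path
        if (value == null) {
            value = db.get(key);        // fall through to the database on a cache miss
        }
        return value;
    }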

Add multiple copies to the cache

When a request misses or the cache errors out, read from another cache replica; deploy multiple cache replicas across different racks so the cache system can keep serving in any situation.
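A minimal sketch of replica fallback; the CacheClient type and the replica list are illustrative assumptions:

public String getFromReplicas(List<CacheClient> replicas, String key) {
        for (CacheClient replica : replicas) {  // replicas deployed across different racks
            try {
                String value = replica.get(key);
                if (value != null) {
                    return value;  // serve from the first replica that answers
                }
            } catch (Exception e) {
                // this replica is down or slow; try the next one
            }
        }
        return db.get(key);  // every replica missed or failed
    }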

Monitor the cache system in real time

When the proportion of slow requests exceeds a threshold, alert promptly and recover quickly by replacing machines or services. Automatic failover policies can also shut down abnormal interfaces, edge services, and some non-core functions, so that core functions keep running in extreme scenarios.

The second reason is:

If the cache layer crashes for some reason (machine down, cache service crashed or unresponsive), all requests reach the DB layer. DB call volume surges, and the database struggles and may even go down.


Once the avalanche happens, the cache has already crashed, and the only remedy is to let the DB absorb the extra load. We can scale out the DB to cope, but that is not a long-term solution.

How do we prevent it? We can start from the following aspects.

Ensure high availability of the Cache service.

If the cache itself is highly available, the failure of individual instances has limited impact (a master/slave switchover, or perhaps some traffic reaching the back end), and operations can be automated.

For example: memcached's consistent hashing, or Redis's Sentinel and Cluster mechanisms.

Isolate dependencies and rate-limit back-end calls.

In practice, any dependency can fail: the cache, MySQL, HBase, even someone else's RPC service. Treat them all as resources. In a system with heavy concurrency, if one resource becomes unreachable, then even with timeouts configured, threads can pile up waiting on it until other resources and interfaces become unreachable too.
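A minimal sketch of per-resource thread-pool isolation with a bounded wait; the pool size, timeout, and null fallback are illustrative (production systems often use a circuit-breaker component for this):

import java.util.concurrent.*;

public class IsolatedCacheAccess {
    // A dedicated pool for the cache resource, so a sick cache
    // cannot exhaust the caller's threads
    private final ExecutorService cachePool = Executors.newFixedThreadPool(16);

    public String get(StoreClient storeClient, String key) {
        Future<String> future = cachePool.submit(() -> storeClient.get(key));
        try {
            return future.get(200, TimeUnit.MILLISECONDS);  // bounded wait
        } catch (TimeoutException e) {
            future.cancel(true);  // stop waiting on the sick resource
            return null;          // degrade: caller falls back to the DB or a default
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            return null;          // the cache call itself failed
        }
    }
}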

Drill and estimate in advance.

Before the project goes live, run drills: simulate a cache crash, observe the load on the overall system and the DB, and prepare contingency plans in advance.

Finally

If you found this helpful, a like and a share would be much appreciated. Thank you!

WeChat search: Moon with Flying Fish. Let's be friends.

Reply 666 to the official account to receive free e-books.