1. What is caching

A cache is essentially a temporary container for data. Data in this container can be read much faster than from the original source.

2. Cache classification

Caches can be divided into hardware caches and software caches.

A hardware cache is the temporary storage that sits between the CPU and main memory.

Software caches can be divided into memory caches, database caches, and network caches.

Memory caches. The scope of memory caches is very broad, but here I only discuss RAM caches: data written in advance into containers (lists, maps, sets) or other data-storage structures and held in RAM. One of the memory caches most commonly used in application design is Redis.

Database caches. A database cache is the database's own cache, not an external cache such as Redis or Memcached. Database data is divided into cold data and hot data: in plain terms, cold data is infrequently queried and stored on disk, while hot data is frequently queried and cached in memory.

Network caches. A network cache is a networking technique that reduces Internet traffic and improves end-user response times; typical examples are browser caches, CDNs, and DNS caches. The same idea appears at other layers: the cache in the CPU improves the rate of memory access, operating systems use caches to speed up disk access, and distributed file systems use caches to speed up transfers between client and server.
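To make the memory-cache idea concrete, here is a minimal sketch of an in-process RAM cache built on a plain dict with per-key expiration. The names (`SimpleCache`, `ttl_seconds`) are illustrative, not from any particular library:

```python
import time

class SimpleCache:
    """A minimal in-process RAM cache: a dict with a per-key TTL."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds=60):
        # Store the value together with its absolute expiry time.
        self._store[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            # Entry has expired; evict it lazily on read.
            del self._store[key]
            return None
        return value

cache = SimpleCache()
cache.set("user:42", {"name": "Alice"}, ttl_seconds=30)
print(cache.get("user:42"))  # {'name': 'Alice'} until the TTL elapses
```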

3. The purpose of using caching

The main purpose of using caching is to achieve high performance and high concurrency.

For example, suppose you have a query function: a query request comes in, the application connects to the database, runs the query, and the database returns the result. The whole round trip from query to response takes 100 ms, and the queried data is constant or rarely updated over a short period. When 10,000 identical queries arrive in that period, the total processing time for all requests is 100 × 10,000 ms.

In the cache scheme, the result of the first query is stored in memory as a key-value (K-V) pair. When the same query request arrives again, the result is fetched from memory directly by key, which takes about 1 ms. For 10,000 identical queries, the total processing time is 100 + 1 × 9,999 ms, roughly 100 times better than without caching.
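A minimal sketch of this read-through (cache-aside) pattern, assuming a local Redis server and the redis-py client; `fetch_user_from_db` is a hypothetical stand-in for the real database query:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def fetch_user_from_db(user_id):
    # Placeholder for the real (~100 ms) database query.
    return {"id": user_id, "name": "Alice"}

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        # Cache hit: ~1 ms, no database round trip.
        return json.loads(cached)
    # Cache miss: query the database, then populate the cache.
    user = fetch_user_from_db(user_id)
    r.set(key, json.dumps(user), ex=300)  # expire after 5 minutes
    return user
```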

4. Potential problems with caching

Problem 1: Consistency between the cache and the database

Using a cache inevitably raises the question of read/write consistency between cached data and database data. Serializing all reads and writes to the cache and the database can guarantee strong consistency, but serialized read/write is inefficient and slow, and cannot support scenarios with high concurrent requests.

Why cache and database data become inconsistent

Inconsistency between the cache and the database arises when data is updated, typically via one of the following two methods.

Method 1: Update the data in the database first, then delete the corresponding data from the cache after the update succeeds.

Problem: if the database update succeeds but the cache delete fails, subsequent queries will read stale data from the cache.

Solution: update the database first; on success, send a message to a message queue, and delete the cache entry when the message is consumed. The message queue's retry mechanism guarantees eventual consistency.
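A minimal sketch of this update-then-delete flow; Python's in-process queue.Queue stands in for a real message queue (Kafka, RabbitMQ, etc.), and `update_user_in_db` is a hypothetical database write:

```python
import queue
import redis

r = redis.Redis()
mq = queue.Queue()  # stand-in for a real message queue with retries

def update_user_in_db(user_id, fields):
    # Placeholder for the real database UPDATE.
    pass

def update_user(user_id, fields):
    update_user_in_db(user_id, fields)  # 1. update the database first
    mq.put(f"user:{user_id}")           # 2. on success, enqueue a delete message

def consume_delete_messages():
    while True:
        key = mq.get()
        try:
            r.delete(key)               # 3. delete the stale cache entry
        except redis.RedisError:
            # Delete failed: requeue and retry later, which is what
            # gives the scheme its eventual consistency.
            mq.put(key)
```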

Method 2: Delete the data from the cache first; after the delete succeeds, update the corresponding data in the database.

Problem: in a high-concurrency scenario, two requests can arrive almost simultaneously. Request A is an update: it deletes the cache entry first, then starts updating the database. Request B is a query: it misses the cache (the entry was just deleted) and goes to the database while request A's update is still in progress, so it reads the old value and writes it back into the cache. All subsequent queries then read stale data.

Solution: delayed double delete. To prevent other threads from reading stale data from the database and repopulating the cache while the update is in flight, delete the cache, update the database, sleep for a period of time, and then check the cache and delete the entry again if it has reappeared (a sketch follows the formula below).

Sleep time = the time another thread needs to read the data from the database + the time it needs to write it into the cache.
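A minimal sketch of delayed double delete, assuming redis-py; the 500 ms sleep is an illustrative value for (read time + cache-write time), and `update_user_in_db` is again a hypothetical database write:

```python
import time
import redis

r = redis.Redis()

def update_user_in_db(user_id, fields):
    # Placeholder for the real database UPDATE.
    pass

def update_user_with_double_delete(user_id, fields):
    key = f"user:{user_id}"
    r.delete(key)                        # 1. delete the cache entry
    update_user_in_db(user_id, fields)   # 2. update the database
    # 3. sleep long enough for any concurrent reader to finish its
    #    database read and its cache write (read time + write time).
    time.sleep(0.5)
    r.delete(key)                        # 4. delete again to evict stale data
```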

Problem 2: Cache penetration

Cache penetration generally occurs when the application is under attack: an attacker deliberately requests data that does not exist in the cache, so every request falls through to the database; since the database has no such data either, nothing can be added to the cache. When an attacker floods the system with such requests, the resulting highly concurrent queries can crash the database.

Solutions: 1) Use a mutex: when the cache misses, acquire a lock first; the thread that gets the lock queries the database, while other threads sleep briefly and retry. 2) When the database also finds no data, write a null value into the cache, but with a short expiration time, to blunt malicious attacks.
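A minimal sketch of the null-value approach, assuming redis-py; the sentinel string `"__null__"` and the 60-second TTL are illustrative choices:

```python
import json
import redis

r = redis.Redis()
NULL_SENTINEL = "__null__"  # illustrative marker for "known to be absent"

def fetch_item_from_db(item_id):
    # Placeholder for the real database query; returns None if absent.
    return None

def get_item(item_id):
    key = f"item:{item_id}"
    cached = r.get(key)
    if cached is not None:
        # A cached sentinel means "confirmed missing": answer from the
        # cache instead of hitting the database again.
        return None if cached == NULL_SENTINEL.encode() else json.loads(cached)
    item = fetch_item_from_db(item_id)
    if item is None:
        # Cache the miss with a short TTL to absorb repeated attacks.
        r.set(key, NULL_SENTINEL, ex=60)
        return None
    r.set(key, json.dumps(item), ex=300)
    return item
```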

Problem 3: Cache breakdown

Cache breakdown means that hotspot data in the cache expires (becomes invalid) just as a large number of concurrent query requests arrive. With nothing to read from the cache, every request queries the database directly, and the instantaneous spike in load can overwhelm and crash the database.

Solutions: 1) Use a mutex: when multiple threads query the same missing key at the same time, only the first request that acquires the lock queries the database; the other threads cannot get the lock and wait until the first thread has queried the data and repopulated the cache. 2) Set the cache expiration time of the hotspot data to permanent.
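A minimal sketch of the mutex approach, using Redis SET NX as a simple lock via redis-py; the lock key name, TTLs, and retry interval are illustrative:

```python
import json
import time
import redis

r = redis.Redis()

def fetch_product_from_db(product_id):
    # Placeholder for the real database query.
    return {"id": product_id, "price": 9.99}

def get_product(product_id):
    key = f"product:{product_id}"
    lock_key = f"lock:{key}"
    while True:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        # Cache miss: only the thread that wins the lock rebuilds the entry.
        if r.set(lock_key, "1", nx=True, ex=10):
            try:
                value = fetch_product_from_db(product_id)
                r.set(key, json.dumps(value), ex=300)
                return value
            finally:
                r.delete(lock_key)
        # Lost the race: sleep briefly, then re-check the cache.
        time.sleep(0.05)
```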

Problem 4: Cache avalanche

A cache avalanche occurs when a cache component in normal use suddenly becomes unavailable, drastically reducing the system's processing capacity; when a large number of requests then arrive, the pressure on the database becomes too high and the system crashes. Many situations can make the cache suddenly unavailable, such as a large batch of hotspot data expiring at once or a cache node failing.

Solutions: 1) Cache warm-up: load the relevant data into the cache right after the system goes online, so the first user requests do not all have to query the database and then populate the cache. 2) Set different expiration times so that cache entries expire as evenly as possible. 3) Set hotspot data to never expire.
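A minimal sketch of spreading out expiration times by adding random jitter to a base TTL, assuming redis-py; the base TTL, jitter range, and warm-up data are illustrative:

```python
import json
import random
import redis

r = redis.Redis()

def cache_with_jitter(key, value, base_ttl=300, jitter=60):
    # Randomize each key's TTL within [base_ttl, base_ttl + jitter]
    # so entries written together do not all expire together.
    ttl = base_ttl + random.randint(0, jitter)
    r.set(key, json.dumps(value), ex=ttl)

# Warm-up: preload hot data right after deployment.
for product_id, value in [("1", {"price": 9.99}), ("2", {"price": 19.99})]:
    cache_with_jitter(f"product:{product_id}", value)
```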

5. Cache test points

The problems above are unavoidable in any system that uses a caching scheme. Before testing, you need to understand clearly which cache implementation and update strategy the system under test adopts, and write test cases targeted at the characteristics of that scheme and strategy.

Cache problems are not easy to expose during ordinary testing, because the only users of the test environment are testers and the load on the system is very small; cache problems are most likely to surface under load-test scenarios.

Test scenario:

1. The impact on the business when cache and database data are inconsistent.

2. Whether the system has considered a handling scheme for cache breakdown.

3. Whether the system has considered a handling scheme for cache penetration.

4. Whether the system can hold up under normal business volume when the cache component fails or all cached entries are invalidated.

Welcome to follow my subscription account, where I regularly share testing-related articles. If you have questions, you are also welcome to discuss and learn together!