Previously

The first two articles (Cache stability and Cache correctness) discussed the “stability” and “correctness” of caches. The remaining common cache problems are “observability” and “standardization and tooling”.

  • Stability
  • Correctness
  • Observability
  • Standardization and tooling

After last week’s post, many readers discussed the question I left at the end in depth. I believe that after thinking it through, you will have a deeper understanding of cache consistency!

First of all, there were plenty of discussions in the various Go and go-zero groups, but no one came up with a fully satisfactory answer.

Let’s examine several possible solutions to this problem:

  • Use distributed locks to make each update an atomic operation. This is the least desirable option: it throws away our high-concurrency capability in exchange for strong consistency. Don’t forget that I emphasized in the previous article that “this series only targets high-concurrency scenarios that do not require strong consistency; readers working on financial payments and the like should judge for themselves,” so we rule this solution out first.

  • Have A delay its cache deletion, for example by one second. The downside is that, to handle this extremely low-probability case, every update leaves stale data readable for up to one second. That is not ideal either, and we don’t want to use it.

  • Instead of deleting the key, have A set a special placeholder, and have B set the cache with the Redis SETNX command so it cannot overwrite the placeholder; any later request that encounters the placeholder falls back to the DB. This method is equivalent to introducing a new state when deleting the cache, as shown in the figure below (a minimal code sketch follows this list).

    But doesn’t this just bring the problem back around? Whenever a request encounters the placeholder, it must either go to the DB anyway or first check whether the content is a placeholder. So this doesn’t really solve the problem either.
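For concreteness, here is a minimal sketch of the placeholder idea using the go-redis client. The helper names are hypothetical, and this is an illustration of the approach discussed above, not how go-zero implements anything:

```go
package placeholder

import (
	"context"
	"errors"
	"time"

	"github.com/redis/go-redis/v9"
)

const placeholder = "*" // special marker written instead of deleting

var rdb = redis.NewClient(&redis.Options{Addr: "localhost:6379"})

// deleteCache: request A writes a short-lived placeholder instead of DEL.
func deleteCache(ctx context.Context, key string) error {
	return rdb.Set(ctx, key, placeholder, time.Second).Err()
}

// setCacheNX: request B writes with SETNX, so it cannot overwrite the
// placeholder; ok is false when the placeholder blocked the write.
func setCacheNX(ctx context.Context, key, val string, ttl time.Duration) (bool, error) {
	return rdb.SetNX(ctx, key, val, ttl).Result()
}

// getCache: a reader that sees the placeholder must go back to the DB,
// which is exactly how the problem circles back.
func getCache(ctx context.Context, key string) (string, error) {
	v, err := rdb.Get(ctx, key).Result()
	if err != nil {
		return "", err // redis.Nil on a plain miss
	}
	if v == placeholder {
		return "", errors.New("placeholder hit: reload from DB")
	}
	return v, nil
}
```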

Let’s take a look at how go-zero handles this situation. Surprisingly, we chose not to handle it at all. So let’s go back to the beginning and analyze how this scenario arises:

  • A read request finds the data not cached (it was never loaded, or the cached copy has expired), which triggers a DB read
  • An update to the same data happens at the same time
  • The operations interleave in exactly this order: B reads the DB, A writes the DB, A deletes the cache, and B sets the cache (replayed in the toy program below)
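To make the interleaving concrete, here is a self-contained toy program that replays the four steps in exactly that order, with in-memory maps standing in for the DB and the cache (an illustration only, not go-zero code):

```go
package main

import "fmt"

func main() {
	db := map[string]string{"k": "old"}
	cache := map[string]string{}

	bValue := db["k"]   // 1. B misses the cache and reads the old row
	db["k"] = "new"     // 2. A writes the new value to the DB
	delete(cache, "k")  // 3. A deletes the cache key (nothing cached yet)
	cache["k"] = bValue // 4. B sets the cache with its stale read

	fmt.Println("db:", db["k"], "cache:", cache["k"]) // db: new cache: old
}
```

After step 4, the cache keeps serving “old” until the key expires, which is exactly the inconsistency in question.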

As we all know, a DB write needs to lock the row record, which makes it a slow operation, while reads do not, so the probability of this interleaving is quite low. On top of that, we set an expiration time, so this situation is extremely rare in practice. To eliminate it entirely, we would need a protocol like 2PC or Paxos to guarantee consistency, and I don’t think that is what anyone wants here: it’s far too complicated!

In my opinion, the hardest part of architecture is knowing how to make trade-offs; finding the point of best return is a test of overall ability. Of course, if you have any good ideas, you can reach me through the group or the official account. Thank you!

This article is the third in the series, covering cache observability and code automation.

Cache observability

In the previous two articles, we solved the problems of cache stability and data consistency; at that point we were already fully enjoying the value a cache system brings, having gone from zero to one. What we need to consider next is how to further reduce cost. Which caches bring real business value, and which can be removed to lower server costs? Which caches need more server resources? What is the QPS of each cache, and what is its hit ratio?

The graph above shows the cache monitoring log of one service. You can see that the cache service handles 5057 requests per minute, 99.7% of which hit the cache; only 13 fall through to the DB. This monitoring shows that the cache reduces DB pressure by nearly three orders of magnitude (a 90% hit ratio is one order of magnitude, 99% is two, and 99.7% is almost three), which means the cache delivers quite a good benefit.
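The “orders of magnitude” claim is easy to verify: with hit ratio h, the DB sees only the fraction 1−h of the traffic, i.e. a 1/(1−h)× reduction. A quick check with the numbers above:

```go
package main

import "fmt"

func main() {
	// DB load reduction factor is 1/(1-h) for hit ratio h.
	for _, h := range []float64{0.90, 0.99, 0.997} {
		fmt.Printf("hit ratio %.1f%% -> DB load reduced ~%.0fx\n", h*100, 1/(1-h))
	}
	// The measured numbers from the monitoring log above:
	fmt.Printf("measured: 5057 req/min, 13 DB hits -> ~%.0fx\n", 5057.0/13.0)
}
```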

If, on the other hand, the cache hit ratio were only 0.3%, the cache would bring no benefit, and we should remove it, both to reduce system complexity (do not add entities unless necessary) and to cut server costs.

If the service’s QPS is particularly high (enough to put a lot of strain on the DB) but the cache hit ratio is only 50%, meaning we cut the pressure only in half, then depending on the business we should consider increasing the expiration time to raise the hit ratio.

If the service’s QPS is particularly high (enough to stress the cache itself) and the hit ratio is already high, then we can consider scaling up the QPS the cache can carry, or adding an in-process cache to take pressure off the shared cache, as sketched below.
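go-zero ships its own in-process cache, but to keep the idea framework-neutral, here is a minimal TTL cache sketch using only the standard library. In a layered setup, reads would check this local copy before hitting Redis, and a short local TTL bounds the extra staleness:

```go
package local

import (
	"sync"
	"time"
)

type entry struct {
	val      string
	expireAt time.Time
}

// Cache is a tiny TTL-bounded in-process cache guarded by an RWMutex.
type Cache struct {
	mu   sync.RWMutex
	data map[string]entry
	ttl  time.Duration
}

func New(ttl time.Duration) *Cache {
	return &Cache{data: make(map[string]entry), ttl: ttl}
}

// Get returns the cached value, treating expired entries as misses.
func (c *Cache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.data[key]
	if !ok || time.Now().After(e.expireAt) {
		return "", false
	}
	return e.val, true
}

// Set stores a value with the cache-wide TTL.
func (c *Cache) Set(key, val string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = entry{val: val, expireAt: time.Now().Add(c.ttl)}
}
```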

All of this is based on cache monitoring. Once the cache is observable, we can do further targeted tuning and simplification. As I always say: no measurement, no optimization.

How do you get the cache to be used properly?

Those of you who are familiar with the go-zero design, or who have watched my shared videos, may remember that I often say “tools over conventions and documentation.”

Caching involves a great many knowledge points, everyone writes cache code in their own style, and it is very hard to get every detail right. Even after all my years of writing programs, getting every point right from scratch is still difficult. So how does go-zero solve this problem?

  • Encapsulate as many generic solutions as possible into the framework. You then don’t need to worry about the whole cache control flow; as long as you call the right methods, there is no room for error.
  • Generate everything from SQL to CRUD + cache code with tools, which avoids hand-writing piles of structures and control logic from the table schema (see the goctl example below).

Here is the CRUD + cache generation flow, taken from go-zero’s official bookstore example. We can feed the required schema to goctl via a SQL file or a live datasource, and the goctl model subcommand then generates the required CRUD + cache code in one click.
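For reference, the one-click generation looks roughly like this (the file name and DSN are hypothetical; the -c flag asks goctl to generate the cached version of the model):

```bash
# generate from a DDL file
goctl model mysql ddl -src book.sql -dir . -c

# or generate straight from a live database
goctl model mysql datasource -url="user:pass@tcp(127.0.0.1:3306)/bookstore" -table="book" -dir . -c
```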

This ensures that everyone writes the same cache code. Could tool-generated code turn out any different? 😛

To be continued

In this article we discussed cache observability and code automation together. In the next article I will share how we refine and abstract the general cache solution. As a preview, that design revolves around the clustered index; think first about how you would cache it yourself, because after deep thinking, your understanding will be more profound!

The solutions to all of these problems are included in the go-zero microservices framework. If you want a better understanding of the go-zero project, please visit the official website for detailed examples.

Video Playback Address

ArchSummit Architect Summit – Cache architecture design for massive concurrency

The project address

Github.com/tal-tech/go…

You are welcome to use go-zero, and please star the project to support us!

WeChat communication group

Follow the “Microservice Practice” official account and tap “communication group” to get the QR code for the community group.