High Concurrency System Design (3): Caching

Caches are standard in many applications and can provide significant performance gains. However, few people use them well. As requirements iterate, a service that relies on caching almost inevitably falls into a vicious circle.

Business side

  1. Add caching to optimize interface performance
  2. The same interface then becomes hard to extend with new requirements because of its high complexity, poor performance, and low maintainability

Ops side

  1. Increase storage and query capacity to cope with growing load
  2. Cost pressure forces the business to optimize again to improve cache utilization

After repeated rounds of this, the system becomes hard to maintain and eventually has to be rebuilt as a whole.

For example

Take Weibo as an example. When you open Weibo, the main data on the page consists of: the category list (left); the list of microblog posts (middle); and personal information together with the counts of followed accounts, followers, and posts (far right).

Following general backend design practice, assume the data required by the page is broken down as follows:

A few points need to be highlighted:

  1. A microblog is not the same as its body text

    "Microblog" refers to the microblog ID; "microblog body" refers to the microblog content.

  2. The microblog list is also data

    The microblog list represents a collection of microblog IDs.

In How to Design RPC Interfaces, we introduced the idea that all data can be expressed as ID + Content, and that IDs can exist as collections.

Assume that querying the microblog list is a separate API, and that the data is represented in the system as post (microblog), profile (personal information), and stat (counts of likes, comments, etc.). The usual implementation is then:

SELECT ... FROM post LEFT JOIN profile LEFT JOIN stat ...

The large tables are joined directly. If that does not perform well enough, a cache is added to hold the query results.
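
A minimal Python sketch of this join-then-cache approach may help show why it tends to cache poorly. Everything here is hypothetical: `page_cache`, `list_posts_joined`, `run_join_query`, and `on_post_data_changed` are illustrative names, and a plain dict stands in for a real cache service.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical application cache: one entry per (user_id, page),
# holding the whole joined result for that page.
page_cache: Dict[Tuple[int, int], List[dict]] = {}


def list_posts_joined(
    user_id: int,
    page: int,
    run_join_query: Callable[[int, int], List[dict]],
) -> List[dict]:
    """Return one page of joined post/profile/stat rows, caching the full page."""
    key = (user_id, page)
    if key not in page_cache:
        # Stands in for: SELECT ... FROM post LEFT JOIN profile LEFT JOIN stat ...
        page_cache[key] = run_join_query(user_id, page)
    return page_cache[key]


def on_post_data_changed() -> None:
    """Invalidate after any post, profile, or stat row changes.

    The cache is keyed by page, not by row, so it cannot tell which pages
    contain the changed row; this sketch simply drops everything.
    """
    page_cache.clear()
```

Because each entry bundles post, profile, and stat data for a whole page, a change to any one of them makes the entry stale, which is one plausible reason such a cache ends up with a low hit ratio.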

Is there really no way out of this cycle during development? Of course there is. This article follows on from the earlier articles on the concepts and relations of system design and How to Design RPC Interfaces, and discusses how to design a system so that its data fits naturally into a cache.

Structured cache

To see how this works, you can actually split the above query:

SELECT * FROM post WHERE ...
SELECT * FROM profile WHERE id IN (...)
SELECT * FROM stat WHERE id IN (...)

Many people will shake their heads at this: a single API call now triggers N database queries. How can splitting the query up like this be acceptable?

In fact, it is not as bad as it looks. First, with a cache in place, the last two queries are served from the application cache, so only one simple query reaches the database. Even in the worst case, when the application cache misses, the last two queries are still very likely to hit the database's own cache. Second, as the business iterates, you can safely use the composite pattern to keep combining in other data without worrying about growing complexity. Third, since posts by KOLs (key opinion leaders) receive heavy traffic, caches can be added further up the structure tree (for example, at the level of the microblog list). Finally, if the distributed cache comes under too much pressure, local caches can be layered in as well.
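
The split query can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: `BatchCache`, `list_posts`, and the `author_id` column are hypothetical names, an in-process dict stands in for the application cache, and the loader callables stand in for the `WHERE id IN (...)` queries.

```python
from typing import Callable, Dict, Iterable, List


class BatchCache:
    """Cache-aside lookup keyed by ID, with one batch load for all misses."""

    def __init__(self, load_many: Callable[[List[int]], Dict[int, dict]]):
        # load_many stands in for e.g. SELECT * FROM profile WHERE id IN (...)
        self._load_many = load_many
        self._store: Dict[int, dict] = {}  # stand-in for an application cache
        self.loader_calls = 0              # counts trips to the backing store

    def get_many(self, ids: Iterable[int]) -> Dict[int, dict]:
        unique_ids = list(dict.fromkeys(ids))  # de-duplicate, keep order
        missing = [i for i in unique_ids if i not in self._store]
        if missing:
            self.loader_calls += 1
            self._store.update(self._load_many(missing))
        # IDs absent from the backing store are simply omitted (not cached).
        return {i: self._store[i] for i in unique_ids if i in self._store}


def list_posts(post_rows: List[dict], profiles: BatchCache, stats: BatchCache) -> List[dict]:
    """Assemble a page from the post rows plus two cached batch lookups."""
    by_author = profiles.get_many(r["author_id"] for r in post_rows)
    by_post = stats.get_many(r["id"] for r in post_rows)
    return [
        {**r, "profile": by_author.get(r["author_id"]), "stat": by_post.get(r["id"])}
        for r in post_rows
    ]
```

Once the profile and stat entries are warm, a later page that shares authors or posts with an earlier one needs no further loads for them; only the post query itself still goes to the database.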

Best of all, every optimization mentioned above can be implemented with the composite pattern, without drastic changes to the code, which avoids the cycle of development, optimization, and refactoring.
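
One way to read "composing" here is that every layer exposes the same batch-lookup interface, so a cache can be wrapped around any source without touching the callers. The Python sketch below assumes that reading; `Source`, `DictSource`, and `Cached` are hypothetical names, and dicts stand in for the database, the distributed cache, and the local cache. In classic design-pattern terms this kind of wrapping is closer to a decorator.

```python
from typing import Dict, List, Protocol


class Source(Protocol):
    """Anything that can return content for a list of IDs."""

    def get_many(self, ids: List[int]) -> Dict[int, dict]: ...


class DictSource:
    """Leaf layer: stands in for a database table."""

    def __init__(self, rows: Dict[int, dict]):
        self.rows = dict(rows)
        self.calls = 0  # how many times this layer was actually queried

    def get_many(self, ids: List[int]) -> Dict[int, dict]:
        self.calls += 1
        return {i: self.rows[i] for i in ids if i in self.rows}


class Cached:
    """Wraps any Source with a cache and exposes the same interface."""

    def __init__(self, inner: Source):
        self.inner = inner
        self.store: Dict[int, dict] = {}

    def get_many(self, ids: List[int]) -> Dict[int, dict]:
        missing = [i for i in ids if i not in self.store]
        if missing:
            # Only the misses are passed down to the next layer.
            self.store.update(self.inner.get_many(missing))
        return {i: self.store[i] for i in ids if i in self.store}


# Layering: local cache -> distributed cache -> database.
database = DictSource({1: {"id": 1, "name": "alice"}})
profile_source = Cached(Cached(database))
```

Adding or removing a layer changes only the line that wires the sources together; code that calls `get_many` stays the same.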

Before the split, because the query relies on SQL joins, business development keeps running into the same problems:

  1. Join queries can cause large table scans, frequent disk I/O, and poor performance
  2. The cache hit ratio is low
  3. Each business iteration adds more queries for more data, and complexity piles up

Conclusion

To close, here is a summary of the key points made in this article:

  1. There are three, and only three, modes of data query

    1. Query a paginated list of IDs by condition
    2. Query content by ID
    3. Batch query content by a list of IDs
  2. All structures have only two kinds of relationship

    1. Sibling (side-by-side) relations
    2. Parent-child relations
  3. Structuring and the composite pattern are effective ways to deal with complexity

    Caching is just one form of complexity
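
The three query modes listed above can be sketched as a small Python interface. This is only an illustration: `PostStore` and its method names are hypothetical, and an in-memory dict stands in for the real table.

```python
from typing import Callable, Dict, List, Optional


class PostStore:
    """In-memory stand-in for a table, exposing the three query modes."""

    def __init__(self, rows: Dict[int, dict]):
        self._rows = rows

    def list_ids(self, condition: Callable[[dict], bool], offset: int, limit: int) -> List[int]:
        """Mode 1: query one page of IDs that match a condition."""
        ids = sorted(i for i, row in self._rows.items() if condition(row))
        return ids[offset:offset + limit]

    def get(self, post_id: int) -> Optional[dict]:
        """Mode 2: query content by a single ID."""
        return self._rows.get(post_id)

    def get_many(self, post_ids: List[int]) -> Dict[int, dict]:
        """Mode 3: batch query content by a list of IDs."""
        return {i: self._rows[i] for i in post_ids if i in self._rows}
```

A page is then built by calling `list_ids` once and feeding the result to `get_many`; since modes 2 and 3 are keyed purely by ID, they are the natural places to put a cache.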

Of course, this is not without cost. If the data is accessed so infrequently that very few requests hit the cache, the benefit will be minimal. Sorting out data relationships and structure, and implementing the code in split form, can take a lot of time, and in today's fast-paced world of "withstand the load first, optimize later", it is easy to end up treating the two approaches as an either-or choice.

Author: Cyningsun. Source: www.cyningsun.com/02-18-2021/… Copyright notice: All articles on this blog are licensed under CC BY-NC-ND 3.0 CN unless otherwise stated. Please credit the source when reprinting.