A word of caution before diving in: the author has done little (read: no) real high-concurrency work. These waters run deep; high concurrency and heavy traffic all sound rather abstract, and the author is too young and inexperienced to truly have a handle on them. My systems only see a few QPS, and that's a happy enough life. So no picking fights, and if we must PK, let's PK politely.

What is high concurrency

High concurrency is when the system processes many requests at once.

High concurrency is a results-oriented thing. Common high-concurrency scenarios include Taobao's Double 11, ticket grabbing during the Spring Festival travel rush, and trending posts from Weibo celebrities. These scenarios do not appear out of nowhere; they emerge gradually as the business grows. For example, during the 2020 Taobao Double 11 global shopping festival, order creation peaked at a staggering 583,000 transactions per second. Four years earlier, in 2016, the figure was roughly a quarter of that. Go back another four years and the number is impossible to verify, but it was certainly nowhere near as dramatic.

A high-concurrency business scenario emerges, and with it the architecture to support it: technology serves the business, and the business forces technology to evolve. Highly concurrent architectures are not the brainwave of some genius; they evolve as the business evolves. To borrow a meme: first there was Mount Akina, then came the old driver.

So how much concurrency is high?

There is no fixed standard for this. Looking at the numbers alone is not enough; they have to be combined with the specific scenario. You cannot say that 100,000 QPS counts as high concurrency while 10,000 QPS does not. An information-feed scenario involves complex recommendation models and all kinds of manual policies, and its business logic can easily be more than ten times as complex as a seckill (flash sale) scenario. Business scenarios and execution complexity differ, so staring at the raw concurrency number by itself is meaningless.

The conclusion: high concurrency cannot be pinned down by a single number and must be judged against the specific business scenario. No high-concurrency scenario, no high-concurrency architecture.

High concurrency goals

Macro goals

High concurrency does not mean high performance alone. From a macro point of view, high-concurrency system design has three goals: high performance, high availability, and high scalability, the so-called "three highs". The three are not isolated; they support one another.

1. High performance: performance reflects the system's parallel processing capability. With a fixed hardware budget, improving performance means saving money. Performance also shapes the user experience: a response time of 100 milliseconds versus 1 second gives the user a completely different feel.

2. High availability: the proportion of time the system can provide normal service. One system runs all year without interruption or failure; another has an incident or outage every few days. Users will obviously choose the former. Moreover, if a system is only 90% available, it becomes a serious drag on the business.

3. High scalability: the system's ability to expand. Can capacity be added on short notice during traffic peaks to absorb spikes more smoothly, for example during Double 11 or hot events such as a celebrity divorce?

These three goals need to be considered as a whole because they are interrelated and even affect each other.

For example, if you want to scale your system, you need to design services to be stateless. This cluster design ensures high scalability, but also indirectly improves the performance and availability of the system.

Another example: to ensure availability, it is common to set timeouts on service interfaces so that slow requests do not pile up threads and trigger an avalanche. What is a reasonable timeout? In general, we set it based on the measured performance of the dependent downstream service.

Specific goals

Performance indicators

Performance indicators are used to measure existing performance problems and are the key metrics for high concurrency. Common performance and traffic indicators are as follows:

  1. QPS/TPS/HPS: QPS is the number of queries per second, TPS is the number of transactions per second, HPS is the number of HTTP requests per second. The most commonly used metric is QPS.

It is important to note that concurrency and QPS are different concepts. Concurrency refers to the number of requests the system can handle at the same time, reflecting the load capacity of the system.

  2. Response time: the time from when a request is sent to when the response is received. For example, if a system takes 100 ms to process an HTTP request, those 100 ms are the system's response time.

  3. Average response time: the most commonly used metric, but its flaw is obvious: it is insensitive to slow requests. For example, with 10,000 requests, of which 9,900 take 1 ms and 100 take 100 ms, the average response time is 1.99 ms. The average rises by only 0.99 ms, yet the response time of 1% of the requests has grown a hundredfold.

  4. TP90 and TP99: sort the response times from smallest to largest; TP90 is the response time at the 90th percentile. The higher the percentile, the more sensitive the metric is to slow requests.

  5. Throughput (RPS): the number of requests processed per unit of time, usually determined by QPS and concurrency. Performance goals are typically set with both throughput and response time in mind, for example AVG under 50 ms and TP99 under 100 ms at 10,000 requests per second. For high-concurrency systems, AVG and TP percentiles must be considered together.

    In addition, from a user experience perspective, 200 milliseconds is considered the first cut-off point where the user will not feel the delay, and 1 second is the second cut-off point where the user will feel the delay but will accept it.

Therefore, for a healthy high concurrency system, TP99 should be controlled within 200 ms, and TP999 or TP9999 should be controlled within 1 second.
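To make the difference between the average and the tail concrete, here is a minimal, illustrative sketch (not from the original article) that computes the average, TP99, and TP999 for the 10,000-request example above:

```java
import java.util.Arrays;

public class LatencyStats {
    // Value at the given percentile (e.g. 99.0 for TP99) after sorting
    // the samples from smallest to largest.
    static long percentile(long[] sortedMillis, double p) {
        int idx = (int) Math.ceil(p / 100.0 * sortedMillis.length) - 1;
        return sortedMillis[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // 9,900 fast requests (1 ms) and 100 slow ones (100 ms), as in the example above.
        long[] samples = new long[10_000];
        Arrays.fill(samples, 0, 9_900, 1);
        Arrays.fill(samples, 9_900, 10_000, 100);
        Arrays.sort(samples);

        double avg = Arrays.stream(samples).average().orElse(0);
        System.out.printf("avg=%.2fms TP99=%dms TP999=%dms%n",
                avg, percentile(samples, 99), percentile(samples, 99.9));
        // Prints avg=1.99ms TP99=1ms TP999=100ms: the average and even TP99 hide
        // the slow 1% tail, while TP999 exposes it.
    }
}
```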

  6. PV: page views, i.e. the number of pages viewed or clicked; the number of pages a visitor views within a 24-hour period.

  7. UV: unique visitors; a visitor who accesses the site several times within a given period is counted only once.

  8. Bandwidth: to estimate bandwidth you need two metrics: peak traffic and average page size.

    Daily website bandwidth can be roughly calculated using the following formula:

    daily bandwidth (kbps) = PV / statistical period (seconds) × average page size (KB) × 8

The peak is usually a multiple of the average.
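As a rough worked example (the numbers are invented purely for illustration): a site with 10 million PV per day and an average page size of 100 KB needs about 10,000,000 / 86,400 × 100 × 8 ≈ 92,600 kbps, or roughly 92.6 Mbps of average bandwidth; if the peak is, say, five times the average, you would plan for roughly 460 Mbps.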

QPS is not the same as the number of concurrent connections. QPS is the number of requests handled per second, while the number of concurrent connections is the number of requests the system is handling at the same moment.
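The two are related, though. By Little's Law (a standard queueing result, not something specific to this article), they are roughly connected as:

concurrency ≈ QPS × average response time (in seconds)

For example, 1,000 QPS with an average response time of 200 ms means the system holds about 1,000 × 0.2 = 200 requests in flight at any given moment.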

Availability metrics

High availability means the system has a strong ability to run without failure. Availability = uptime / total running time.

For most systems, two nines means basically usable (if you can't even reach that, development and ops will be run ragged), three nines means high availability, and four nines means high availability with automatic recovery. Getting to three or four nines is hard: many of the factors that affect availability are difficult to control, and it takes solid technology, a lot of money for equipment, a strong sense of responsibility from engineers, and even a bit of luck.
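For intuition, the allowed downtime per year at each level (a straightforward calculation from the definition) is roughly:

  • 99% (two nines): about 3.65 days per year
  • 99.9% (three nines): about 8.76 hours per year
  • 99.99% (four nines): about 52.6 minutes per year
  • 99.999% (five nines): about 5.3 minutes per year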

Scalability metrics

In the face of sudden traffic there is no time to rework the architecture; the fastest response is to add machines so the system's processing capacity grows linearly.

For a business cluster or basic component, scalability = performance growth ratio / machine growth ratio. Ideally, adding N times the resources yields N times the performance. In practice, scalability should be kept above about 70%.
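A quick worked example with invented numbers: if a cluster grows from 10 machines to 20 (a 100% increase in machines) and throughput rises from 10,000 QPS to 18,000 QPS (an 80% increase in performance), then scalability = 80% / 100% = 80%, which clears the 70% guideline.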

But from the overall architecture perspective of a high-concurrency system, the goal of scaling is not just to design services to be stateless, because when traffic increases by 10 times, business services can be rapidly expanded by 10 times, but the database may become a new bottleneck.

Stateful storage services like MySQL are often technical difficulties for scaling, and if the architecture is not planned in advance (vertical and horizontal split), it will involve the migration of large amounts of data.

We need to think about system scalability in terms of the overall architecture, not just the business server. So databases, caches, dependent third parties, load balancing, switch bandwidth, and so on are all factors that need to be taken into account when scaling systems. We need to know which factors are going to be our bottlenecks once the system reaches a certain level of concurrency, so we can scale accordingly.

Highly concurrent architecture evolution

No one is born an old driver, and no architecture is built for high concurrency from day one. Let's look at a classic example of architecture evolution, Taobao, which truly demonstrates that "good architecture is evolved, not designed".

The following is the evolution of Taobao’s architecture from 2003 to 2012 described in The Decade of Taobao Technology.

Personal website

When Taobao started, the team had only about ten members and was facing a once-in-a-lifetime business opportunity, so the site had to go live as soon as possible (it actually took less than a month). How did Taobao's people pull that off?

— Buy one.

The original Taobao was a purchased website built on the classic LAMP stack (Linux + Apache + MySQL + PHP). The overall architecture looked like this:

The resulting website looks like this:

Because product search was eating up database resources, Alibaba's search engine iSearch was introduced later on.

Oracle / Alipay / Wangwang

Taobao's rapid growth in traffic and transaction volume brought a new problem for the technology team: MySQL could no longer cope. What to do? Build something in-house? No, Taobao bought an Oracle database, partly because the team already had Oracle experts on board.

The schema after replacing the database:

Unable to afford a commercial connection pool, the team used SQLRelay, an open-source connection-pool proxy. This proxy service deadlocked frequently. The solution? Human ops: engineers were on call 24 hours a day and quickly restarted the SQLRelay service whenever it got stuck. 😂 😂

Later, to improve storage, NetApp NAS (Network Attached Storage) devices were purchased as the database storage, and Oracle RAC (Real Application Clusters) was introduced to achieve load balancing.

The Java 1.0 era

By 2004, Taobao had been running for a year, but the SQLRelay problem mentioned above still could not be solved. Since the database had to stay on Oracle, the team decided to change the development language instead.

Smoothly replacing the overall architecture without slowing down ongoing business development was still a real challenge for Taobao. The solution? Hire the big guns from Sun.

At the time, because of the shortcomings of Struts 1.x, Taobao built its own MVC framework. Sun was pushing EJB, so EJB also made it into the architecture.

The Java 2.0 era

Until then, the main idea behind Taobao's architecture had been "buy it". As the business grew, by 2005 buying could no longer solve the problems; the whole architecture had to be adjusted and optimized, weighing capacity, performance, and cost together.

In the Java 2.0 era, the main changes were splitting up the data storage, abandoning EJB, introducing Spring, adding caching, adding a CDN, and so on.

The Java 3.0 era

The defining feature of the Java 3.0 era is that Taobao began to move from commercial products to self-developed technology and started building its own core components, such as the cache storage engine Tair and the distributed storage system TFS. The search engine iSearch was also upgraded. The Taobao architecture with self-developed technologies:

The distributed era 1.0

By 2008, Taobao’s business had grown further.

The capacity of the whole main site had hit a bottleneck: more than 100 million products, more than 250 million PV, and more than 50 million members. At this point Oracle's connection pool was no longer enough and the database was at its limit; even adding machines to the upper-layer systems could not help. The underlying basic services had to be split so the system could expand from the bottom up, the upper layers could then scale as well, and there would be room for the next three to five years of growth.

Taobao began to gradually split its business modules and move toward servitization, for example splitting out a commodity (product) center and other service centers. At the same time, some self-developed middleware was introduced, such as distributed database middleware and distributed message middleware.

The book “The Decade of Taobao Technology” only covers up to 2012, which is the distributed era. The figure above is drawn according to reference [8].

In the blink of an eye, another ten years have passed since 2012. In this decade Alibaba gradually reached its heyday, and its technology surged forward with no shortage of talent: finer-grained microservices, container technology for isolation, cloud platform technology that scales rapidly… If the author of “The Decade of Taobao Technology” could write about another decade, it would surely be fascinating.

According to reference [10], after servitization Taobao gradually evolved toward a cloud platform architecture. Public information is genuinely hard to find, and by this point Taobao's internal architecture was complex enough to fill a book on its own. So the architecture evolution below refers to the server-side evolution of a high-concurrency distributed architecture described in reference [10], written by a capable author using Taobao as a simulated subject. Although it is not the real evolution of Taobao's architecture, it is still well worth referring to.

Here we will skim over the microservice architecture, the distributed 2.0 era, in which services become even finer-grained and more lightweight. A fun aside about microservices: people told brother Martin that some older designs did not conform to the service-oriented philosophy, so he went and created the microservice theory; later someone said, "Mr. Martin, this doesn't follow the principles of microservice architecture design either." Fine, if you say it doesn't fit, I'll just revise the microservice theory.

The containerization era

The most popular container technology is Docker, and the most popular container orchestration service is Kubernetes (K8s). Applications/services can be packaged as Docker images and dynamically distributed and deployed through K8s. A Docker image can be thought of as a minimal operating system capable of running your application/service: it contains the service's code, with the runtime environment set up according to actual needs. Once this whole "operating system" is packaged as an image, it can be distributed to whichever machines need to run the service; starting the image brings the service up, which makes deployment and operations much simpler.

Before a big promotion, part of the existing machine cluster can be carved out to launch extra Docker images and boost the service's capacity; after the promotion, those images can simply be shut down without affecting other services on the machines.

The cloud platform era

Following servitization, Taobao evolved further toward a cloud platform architecture.

A so-called cloud platform takes a huge pool of machine resources and, through unified resource management, abstracts them into a single whole. On top of it, hardware resources (CPU, memory, network, and so on) can be requested dynamically on demand, a common operating system is provided, and commonly used technology components (such as the Hadoop stack or MPP databases) are offered ready to use. It even provides finished applications, so users can meet a need (audio/video transcoding, email services, a personal blog, and so on) without caring what technology is used inside the application.

To summarize: high-concurrency architecture is, to some extent, forced into existence. Who would have guessed that Taobao abandoned PHP because it could not solve a database connection-pool problem? Architecture evolution is like the water of West Lake: the water of West Lake is the tears of engineers. It sounds easy in hindsight, but who knows how many fires were put out and how many holes were filled along the way. We outsiders just see the lake, and the water runs deep 🐶.

High concurrency architecture implementation

There are two main directions for a system to withstand more concurrency:

  • Vertical expansion:

    1. Improve the hardware performance of a single machine: add memory, CPU cores, or storage capacity, or upgrade disks to SSDs; in short, pile on hardware to lift performance.

    2. Improve the software performance of a single machine: use cache to reduce I/O times, and use concurrent or asynchronous methods to increase throughput.

  • Scale-out: There is always a limit to the performance of a single machine, so eventually you need to introduce scale-out and cluster deployment to further improve concurrent processing capabilities.

    1. Layered architecture: this is the prerequisite for horizontal scaling, because high-concurrency systems tend to have complex business logic. Handling it in layers simplifies complex problems and makes horizontal scaling easier.

    2. Horizontal scaling of each layer: stateless horizontal scaling for services, and shard routing for stateful storage. Service clusters can be designed to be stateless, while databases and caches are stateful, so partition keys must be designed to shard the storage. Read performance can also be improved through primary-secondary replication and read/write separation.

To use an analogy: suppose you have to fight ten big guys. You probably can't beat them (the best outcome is that they can't beat you either), so you have to do something. The first way is to train hard and arm yourself to the teeth and hope that's enough; that's vertical scaling. The second way: seeing how many people are on the other side, you call up nineteen brothers, and twenty of you take on ten of them; now it looks winnable; that's horizontal scaling. There is a third, less common method: you find a doorway and let the big guys through one at a time, knocking one down before letting in the next; that's peak shaving and rate limiting.

Let's take a look at a rough but typical architecture that supports the "three highs":

Now, let’s look at some of the key technologies at each layer, from the top down.

The network layer

Many machines

Piling on machines isn't everything, but without piling on machines you can't do anything at all.

We work hard on the architecture so that our services can scale out quickly, but the foundation of horizontal scaling is still having a sufficient number of machines with adequate performance.

Back to the analogy: you have to fight ten big guys, and by the time you have trained yourself into Ip Man, you suddenly find the kids on the other side have grown up too and their numbers have doubled. At that point you still have to call in your brothers.

Big (dog) companies generally have machine rooms all over the country; there may be two in Beijing alone. Requests from different regions go to different machine rooms, then to different clusters, then to different machines. Spread out evenly like that, the load stays within what the services can carry. Let's get a rough idea of how to estimate the number of machines needed.

  • Calculate the number of deployed servers based on QPS and PV

Daily PV that a single server can handle:


Number of servers needed:

  • Peak QPS and machine calculation formula

Principle: 80% of daily visits are concentrated in 20% of the time, and that 20% is called peak time.

Formula: (total PV × 80%) / (seconds per day × 20%) = peak requests per second (peak QPS)

Machines: peak QPS / QPS per machine = number of machines required.
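A quick worked example with made-up numbers: for 100 million PV per day, peak QPS ≈ (100,000,000 × 80%) / (86,400 × 20%) ≈ 4,630. If a single machine can sustain about 500 QPS, you need roughly 4,630 / 500 ≈ 10 machines at peak, before adding any redundancy.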

Companies with heavy traffic have generally gone multi-machine-room: multiple rooms in the same city, across cities, or across the country. To ensure availability, companies with deep pockets keep plenty of redundancy, typically provisioning about twice the number of machines calculated for the peak. To save cost, the currently popular cloud platforms are also an option; before major hot events, Weibo has rented large numbers of cloud servers from Alibaba Cloud.

DNS

DNS is the first gateway for request distribution and implements geographic-level load balancing. The DNS server is configured with multiple resolution IPs for a domain name; each DNS resolution request hits the DNS server, which usually returns an IP address close to the user, and the user then accesses that IP. For example, a user in Beijing accesses the Beijing machine room, while a user in Nanjing accesses resources in Nanjing.

Generally, DNS is not used for machine-level load balancing because public IP addresses are too precious. For example, Baidu search may need tens of thousands of machines, and it is impossible to give every one of them a public IP. Usually only a limited number of nodes get public IPs, machine-level load balancing happens behind those nodes, and the machines in each machine room only need LAN IPs.

The advantages of DNS load balancing are that it is generic (works for any service) and cheap (just apply for a domain name and register the DNS records).

Disadvantages are also obvious, mainly reflected in:

  • DNS caching lasts a long time. Even after a machine is removed from the DNS records, many users keep accessing the removed machine because of cached entries.

  • DNS is not flexible enough. It cannot detect the state of back-end servers and can only distribute load according to the configured policy, not adapt dynamically. For example, if one machine is far better configured than the others, it should in theory receive more requests, but DNS cannot do that.

Therefore, for latency- and failure-sensitive services, capable companies may implement HTTPDNS, a private DNS system built on top of HTTP. HTTPDNS is mainly used for services delivered through native apps, because flexible server-selection policies can be implemented on the app side; it is harder for web services, since URL resolution is done by the browser and only JavaScript-initiated requests can be as flexible as an app.

CDN

A CDN solves the "last mile" problem of user access to the network. In essence it is a "trade space for time" acceleration strategy: content is cached as close to users as possible, and users read the cached copy instead of fetching the content from the origin site in real time.

CDN nodes are deployed in the machine rooms of network carriers, which are also the end users' network providers, so the first hop of a user's request reaches a CDN server. When the resource the browser asks for exists on the CDN, it is returned directly from the CDN over the shortest path, speeding up the user's access.

The following is a simple diagram of the CDN request flow:

A CDN generally caches static resources such as images, files, CSS, scripts, and static pages, which also happen to be accessed very frequently, so a CDN can greatly improve page load speed.

Reverse proxy layer

We call this layer the reverse proxy layer, or the access layer, or the load layer. This layer is the entry point of traffic and is a critical layer of the system against concurrency.

In the brawl analogy, the ideal situation is that you take them on in an orderly way, a couple of you on each of them. But if you get carried away and charge to the front alone, the ten big guys will flatten you in an instant.

The reverse proxy distributes the traffic to ensure that the traffic that ends up on each service is within the range that the service can handle.

Nginx, LVS, F5

DNS handles load balancing at the geographic level, while Nginx, LVS, and F5 handle machine-level load balancing within the same location. Nginx is software layer-7 load balancing, LVS is kernel-level layer-4 load balancing, and F5 is hardware layer-4 load balancing.

The difference between software and hardware is performance: hardware far outperforms software. Nginx performance is on the order of tens of thousands of requests per second; an ordinary Linux server running Nginx can handle roughly 50,000 requests/second. LVS is on the order of hundreds of thousands and is said to reach 800,000/second. F5 performance is in the millions, from 2 million to 8 million requests per second.

Hardware delivers high performance, but a single device is also expensive: even the cheapest F5 costs several hundred thousand. If you work out the cost for the same order of magnitude of requests, though, the hardware load balancer may actually be cheaper; traffic that one F5 can take might need around 20 Nginx servers, in which case the F5 works out cheaper. So, in general, if performance requirements are modest, software load balancing is fine; if performance requirements are high, hardware load balancers are recommended.

The difference between layer 4 and layer 7 lies in protocol and flexibility. Nginx supports the HTTP and e-mail protocols, while LVS and F5 are layer-4 load balancers that are protocol-agnostic and can front almost any application, such as chat or databases. These days many cloud providers offer load-balancing products, such as Alibaba Cloud's SLB and UCloud's ULB, which small and medium companies can simply buy.

For development, Nginx is usually the only layer to focus on.

Typical architecture of load balancing

The load balancing mechanisms described above are typically used in combination.

DNS load balancing is used to implement load balancing at the geographical level and hardware load balancing is used to implement load balancing at the cluster level. Software load balancing is used to implement machine-level load balancing.

The load balancing of the whole system is divided into three layers.

  • Geographic-level load balancing: www.xxx.com is deployed in three machine rooms in Beijing, Guangzhou, and Shanghai. When a user resolves the domain, DNS returns the IP of one machine room based on the user's location; in the figure it returns the Guangzhou machine room's IP, so the user accesses the Guangzhou room.
  • Cluster-level load balancing: the load balancing device in the Guangzhou machine room is an F5. After receiving the user's request, the F5 performs cluster-level load balancing and sends the request to one of the three local clusters. Assume it sends the request to "Guangzhou cluster 2".
  • Machine-level load balancing: Guangzhou cluster 2 uses Nginx for load balancing. After receiving the request, Nginx sends it to one server in the cluster; that server processes the request and returns the response.

Nginx load balancing

Our main concern is the load on the Nginx layer. LVS and F5 layers are usually managed by network operations engineers.

Our main concerns about load balancing are as follows:

  • Upstream server configuration: configure the upstream servers with the upstream directive.

  • Load balancing algorithm: load balancing mechanism when multiple upstream servers are configured.

  • Failure retry mechanism: configure whether to retry other upstream servers when a request times out or an upstream server is down.

  • Server heartbeat check: Health check or heartbeat check of upstream servers.

An upstream server is a back-end server that the Nginx proxy forwards (and load balances) traffic to.

Load balancing algorithm

There are many load balancing algorithms. Nginx mainly supports the following load balancing algorithms:

1. Round-robin (default)

Each request is assigned to a different back-end server in turn. If a back-end server goes down, it is automatically removed, so user access is not affected.

2. weight (weighted round-robin)

The larger the weight value, the higher the probability of being chosen. This is mainly used when back-end servers have uneven performance, or to give primary and secondary servers different weights so that host resources are used fully and sensibly.

3. ip_hash

Requests are assigned according to a hash of the client IP, so visitors from the same IP always hit the same back-end server, which effectively solves the session-sharing problem of dynamic pages.

4. fair

fair is a smarter algorithm than weight and ip_hash: it balances load intelligently according to page size and load time, that is, requests are assigned according to the back-end server's response time, with shorter response times served first. Nginx does not support fair natively; to use it you must install the upstream_fair module.

5. url_hash

Requests are assigned according to a hash of the requested URL, so each URL goes to the same back-end server, which further improves the efficiency of back-end cache servers. Nginx does not support url_hash natively; to use it you must install Nginx's hash package.

Failure retry

Nginx's failure-retry configuration has two main parts: the upstream server configuration and proxy_pass.

Configure max_fails and fail_timeout for each upstream server: if max_fails requests fail within the fail_timeout window, the upstream server is considered unavailable (not live) and is taken out of rotation; after fail_timeout has elapsed, it is added back to the list of live upstream servers and tried again.

Health check

By default, Nginx checks the health of upstream servers passively (lazily). The commercial edition of Nginx provides health_check for active health checks; you can also integrate the nginx_upstream_check_module (github.com/yaoweibin/n… module) to perform active health checks.

Nginx_upstream_check_module supports TCP and HTTP heartbeats for health checks.

Flow control

Traffic distribution

Traffic distribution is the basic function of the access layer.

Traffic switching

A friend told me an amusing story: their company was switching traffic from one machine room to another and it blew up; every engineer's ops dashboard went red, the whole company gathered around to watch, and the ops team lost a lot of face.

Traffic switching means redirecting traffic to different machine rooms or servers in certain situations, such as machine room failures, fiber cuts, server failures, or operational scenarios like gray (canary) releases and A/B testing.

As in the typical load balancing architecture above, each load balancing layer is responsible for switching traffic at its own level.

  1. DNS: switches traffic between machine room entrances.
  2. HTTPDNS: mainly for app scenarios; the traffic entry point is chosen on the client, bypassing the carrier's LocalDNS for more accurate traffic scheduling.
  3. LVS/HAProxy: switches traffic away from a failed Nginx access layer.
  4. Nginx: switches traffic away from failed application servers.

In addition, for convenience some applications do their switching at the Nginx access layer, handling certain traffic switches in Nginx rather than going through LVS/HAProxy.

Traffic limiting

Traffic limiting is an important means to ensure the availability of the system and prevent overloaded traffic from directly hitting the service. Traffic limiting algorithms mainly include token buckets and leaky buckets.

Traffic limiting can be implemented at many levels, such as gateway limiting at the service layer, message queue limiting, and Redis-based limiting; those mainly protect the services themselves.

Here we mainly discuss the flow limiting at the access layer, directly limiting the flow at the traffic inlet.

Nginx has two built-in traffic limiting modules: ngx_http_limit_conn_module for limiting the number of connections, and ngx_http_limit_req_module, a leaky-bucket implementation for limiting the request rate.

You can also use the Lua traffic limiting module lua-resty-limit-traffic provided by OpenResty for more complex limiting scenarios.

limit_conn limits the total number of connections for a given key, for example per IP address or per domain name. limit_req limits the average request rate for a given key; it can run in smoothing mode (delay) or burst mode (nodelay).

Traffic filtering

Much of the time, a large share of a site's traffic is crawler traffic, or outright malicious traffic.

The request parameters can be verified at the access layer, and if the parameter verification is invalid, the request can be rejected directly, or the request can be routed to a service dedicated to processing illegal requests.

The simplest option is plain Nginx, but in practice OpenResty is more common: filter out crawler User-Agents and malicious IPs (set thresholds based on per-IP traffic statistics) and divert them to a fixed server group. This inevitably causes some collateral damage, because a company usually has a single public IP and everyone behind it accesses the site from the same address. In that case an IP + Cookie approach can be considered: plant a unique cookie in the user's browser to identify them. The cookie is planted before they access the service and verified when they do; if the cookie is missing or invalid, the request can be diverted to a fixed group or the user can be asked to pass a CAPTCHA before continuing.

Degradation

Degradation is another sharp sword for high availability. The idea is to "sacrifice the chariot to protect the general": when overall availability cannot be guaranteed, discard or limit the non-essential services.

For example, in the application layer, the configuration center sets the degradation threshold. Once the threshold is reached, the degradation is performed based on different degradation policies.

The degradation switch can also be moved forward to the access layer: configure feature degradation switches there and trigger degradation automatically or manually as the situation requires. If a back-end application service fails, degrading at the access layer stops useless traffic from being sent to it and gives the application service enough time to recover.

The Web tier

After a series of load balancing, the user finally requests the Web layer service. Web services are developed, deployed and run in web servers to provide services to users.

The cluster

Generally, different services are divided according to service modules. Multiple instances of a service are deployed to form a cluster.

To isolate failures, clusters can be grouped so that a problem in one group does not affect the others. Take the frequently discussed seckill (flash sale) scenario: the seckill service cluster is usually isolated from the regular service clusters.

The three keys to cluster deployment are stateless, split, and servitization.

  • Stateless: Applications designed to be stateless are easier to scale horizontally.
  • Split: the early design need not be split up, but later, as traffic grows, you can consider splitting the system by function. The split dimensions are flexible and depend on the actual situation: system dimension, functional dimension, read/write dimension, AOP dimension, module dimension, and so on.
  • Servitization: splitting is more about design, while servitization is about making it land, and it is largely a matter of service governance. Beyond basic remote invocation, you need to consider load balancing, service discovery, service isolation, rate limiting, service access blacklists, and so on, down to details such as timeouts, retry mechanisms, service routing, and failure compensation.

The Web server

The cost of building a mature web server from scratch is very high, and the industry already has plenty of mature open-source web servers, so Internet companies basically practice "take-ism": pick a popular open-source server. Larger companies may do secondary development on top of an open-source server to fit their own business, such as Taobao's Tengine, but for most companies it is enough to understand the open-source server well and tune its parameters and configuration.

The choice of server mainly depends on the development language: Tomcat, JBoss, Resin, and the like for Java; Nginx for PHP/Python.

For example, Tomcat, the most popular Java web server, handles a maximum of 150 concurrent requests by default per instance, but that is fine once it is deployed as a cluster.

The container

Containers have only caught fire in recent years, with Docker as the flagship; they are already widely used in BAT-level companies.

Containerization has revolutionized operations. Docker starts quickly, takes up almost no extra resources, and can be started and stopped at any time; building automated, intelligent operations on top of Docker has gradually become the mainstream approach.

Containerization is also a natural fit for the current microservices trend: containers isolate microservice processes and applications into smaller instances that use fewer resources and can be deployed faster. Combined with container orchestration, highly available clusters can be built more conveniently and quickly.

The service layer

Development framework

Typically, Internet companies settle on a broad technical direction and then use a unified development framework: for Java, SSH or Spring Boot; for Ruby, Ruby on Rails; for PHP, ThinkPHP; for Python, Django; and so on.

There is a general principle for framework selection: prefer mature frameworks and avoid blindly chasing new technology!

For the average screw-tightening coder, most of the day-to-day work happens inside this development framework. When using a language and framework, it is important to fully understand and exploit their characteristics.

Taking Java as an example: one of the author's projects involved calling an encryption/decryption service whose provider used JNI; simply put, the code is written in C and exposed as an API for Java to call, which makes up for Java's comparatively lower raw speed and greatly improves execution speed.

At the day-to-day level of service development, there are several things you can do to improve performance:

  • Concurrent processing: parallelize serial logic with multiple threads.
  • Reduce the number of I/O operations, for example batch reads and writes to databases and caches, batch interfaces for RPC, or eliminating RPC calls altogether through redundant data.
  • Reduce the size of data packets during I/O, including using lightweight communication protocols, appropriate data structures, removing redundant fields from interfaces, shrinking cache keys, and compressing cache values.
  • Optimize program logic, for example moving up front the checks most likely to short-circuit the execution flow, optimizing the computation inside for loops, or adopting more efficient algorithms.
  • Use the various pooling techniques and set pool sizes sensibly, including HTTP request pools, thread pools (set the core parameters according to whether the work is CPU-intensive or IO-intensive; see the sketch after this list), and database and Redis connection pools.
  • JVM tuning, including generation sizes, GC algorithm selection, and so on, to minimize GC frequency and pause time.
  • Lock selection: use optimistic locking for read-heavy, write-light scenarios, or consider segmented locking to reduce lock contention.
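As an illustration of the thread pool point above, here is a minimal sketch (the sizes are common rules of thumb, not hard requirements): CPU-intensive pools are usually sized close to the number of cores, while IO-intensive pools can be considerably larger because their threads spend most of their time waiting.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizing {
    static final int CORES = Runtime.getRuntime().availableProcessors();

    // CPU-intensive work: roughly one thread per core (plus one) keeps the cores
    // busy without paying for excessive context switching.
    static ThreadPoolExecutor cpuBoundPool() {
        return new ThreadPoolExecutor(CORES + 1, CORES + 1,
                60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1_000));
    }

    // IO-intensive work: threads mostly wait on the network or disk, so a larger
    // pool (often around 2x cores, tuned by measurement) is reasonable.
    static ThreadPoolExecutor ioBoundPool() {
        return new ThreadPoolExecutor(CORES * 2, CORES * 2,
                60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(5_000),
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure instead of dropping tasks
    }
}
```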

You can improve usability by:

  • Set appropriate timeouts, retry counts and retry mechanisms; degrade promptly when necessary and return fallback data, so as not to drag the service provider down.
  • Duplicate-request prevention: use dedup keys, dedup tables, and similar means to stop the same request from being processed twice.
  • Idempotent design: implement idempotency at the interface level (see the sketch after this list).
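As a rough sketch of interface-level idempotency (the Redis-backed dedup key and the StringRedisTemplate usage here are one possible choice, not a prescription from the original article): the caller sends a unique request ID, and the service only processes a request the first time it sees that ID.

```java
import java.time.Duration;
import org.springframework.data.redis.core.StringRedisTemplate;

public class IdempotentOrderService {
    private final StringRedisTemplate redis;

    public IdempotentOrderService(StringRedisTemplate redis) {
        this.redis = redis;
    }

    // requestId is generated by the caller and repeated unchanged on retries.
    public String createOrder(String requestId, String orderPayload) {
        // SET key value NX EX: only the first request with this id wins the slot.
        Boolean first = redis.opsForValue()
                .setIfAbsent("dedup:order:" + requestId, "1", Duration.ofHours(24));
        if (Boolean.FALSE.equals(first)) {
            // Duplicate or retried request: return the previously stored result
            // instead of creating a second order.
            return redis.opsForValue().get("result:order:" + requestId);
        }
        String orderId = doCreateOrder(orderPayload); // the actual business logic
        redis.opsForValue().set("result:order:" + requestId, orderId, Duration.ofHours(24));
        return orderId;
    }

    private String doCreateOrder(String payload) {
        return "order-" + System.nanoTime(); // placeholder for real order creation
    }
}
```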

Service center

When the number of systems is small, inter-system calls are usually recorded directly within each system through configuration files, but when the number of systems is large, this approach becomes problematic.

For example, suppose 10 systems depend on interface X of system A, and system A implements a new interface Y that provides the original X functionality better. To switch the 10 existing systems over to interface Y, the configuration on dozens or hundreds of machines across those systems would have to be modified and the systems restarted. You can imagine how inefficient that is.

A service center is mainly implemented as a service name system.

  • Service Name System

Seeing this name, you probably think of DNS immediately, the Domain Name System. Indeed, the two are similar in nature.

DNS resolves domain names to IP addresses mainly because we cannot remember lots of numeric IPs, while domain names are easy to remember. A service name system resolves a service name to "host + port + interface name"; as with DNS, the actual request is still initiated by the requester itself.

In a microservice architecture, the component that implements this function is called the registry. In the Java ecosystem, open-source registries include Nacos and Eureka.

Configuration center

The configuration center centrally manages the configuration of each service.

When there are only a few services, each can manage its own configuration without any problem, but with hundreds of services, having each manage its own configuration becomes a headache.

Therefore, the configuration center is abstracted into a common component to configure multiple systems in a centralized manner.

In the microservice world, open-source configuration center options include Spring Cloud Config and Alibaba's Nacos, among others.

Service framework

Service consumer A needs to query the address of service provider B through the registry and then initiate the call. This seemingly simple process may encounter the following situations, such as:

  • The registry is down;
  • Service provider B has a node down.
  • Network failure between service consumer A and the registry;
  • Network failure between service provider B and the registry;
  • The network between service consumer A and service provider B is abnormal.
  • Some nodes of service provider B perform slowly;
  • Service provider B has a problem for a short time.

How to ensure that the service consumer successfully invokes the service producer? This is what the service governance framework addresses.

Under the Java language system, the current popular service governance frameworks are SpringCloud and Dubbo.

Take SpringCloud as an example:

  • Feign encapsulates RestTemplate to implement HTTP request remote calls
  • Feign encapsulates the Ribbon to implement client load balancing
  • Eureka cluster deployment enables registry high availability
  • Registry heartbeat monitoring to update service availability status
  • Integrate Hystrix to implement circuit breaker mechanism
  • Zuul acts as an API gateway, providing functions such as route forwarding and request filtering
  • Config Implements distributed configuration management
  • Sleuth implements call link tracing
  • ELK is integrated and logs are written asynchronously to Elasticsearch via Kafka queue and viewed visually through Kibana
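As a small taste of the Feign piece, a declarative client might look like the sketch below (the service name, path, and DTO are hypothetical, not from the original article): Feign generates the HTTP call, the Ribbon picks a registered instance, and the fallback supplies a degraded response when the circuit breaker opens.

```java
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

// "order-service" is the logical name registered in Eureka/Nacos; the client-side
// load balancer resolves it to a concrete instance at call time.
@FeignClient(name = "order-service", fallback = OrderClientFallback.class)
interface OrderClient {

    @GetMapping("/orders/{id}")
    OrderDto getOrder(@PathVariable("id") long id);
}

// Degraded response returned when the call fails or the circuit breaker is open
// (the fallback class must be registered as a Spring bean).
class OrderClientFallback implements OrderClient {
    @Override
    public OrderDto getOrder(long id) {
        return OrderDto.unavailable(id);
    }
}

class OrderDto {
    long id;
    String status;

    static OrderDto unavailable(long id) {
        OrderDto dto = new OrderDto();
        dto.id = id;
        dto.status = "UNAVAILABLE";
        return dto;
    }
}
```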

SpringCloud is a complete suite of microservices solutions known as the “SpringCloud family bucket”. Here is just a brief introduction.

Dubbo mainly provides the most basic RPC functionality.

However, SpringCloud's remote calls go over HTTP, which may offer lower performance.

The good news is that "SpringCloud 2.0", SpringCloud Alibaba, is gaining popularity, and Dubbo fits perfectly into the SpringCloud ecosystem.

The message queue

Message queues play an important role in high performance, high scale, and high availability architectures.

Message queues are used to decouple services that do not need to be called synchronously or to subscribe to changes that your system cares about. Using message queues enables service decoupling (one-to-many consumption), asynchronous processing, traffic peak-cutting/buffering, and so on.

Service decoupling

Service decoupling can reduce coupling between services and improve system scalability.

For example, if an order service has multiple downstream consumers and no message queue is used, the order service must call each downstream itself. Every time a new downstream requirement appears, the order service has to add code to invoke it, which is a nuisance.

Once a message queue is introduced, the order service simply publishes order-related messages to the queue, and downstream systems just subscribe to them.
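A minimal sketch of this idea with a Kafka producer (Kafka is just one possible queue here, and the topic name and payload are made up): the order service publishes an "order created" event and neither knows nor cares who consumes it.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderEventPublisher {
    private final KafkaProducer<String, String> producer;

    public OrderEventPublisher(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    // The order service only publishes the event; points, coupons, logistics and any
    // future downstream each subscribe to the topic on their own, so adding a new
    // downstream no longer requires touching the order service.
    public void publishOrderCreated(String orderId, String orderJson) {
        producer.send(new ProducerRecord<>("order-created", orderId, orderJson));
    }
}
```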

Asynchronous processing

Asynchronous processing can reduce response time and improve system performance.

As the business grows, a project's request chains get longer and response times rise accordingly, yet some of the operations in the chain do not actually need to be handled synchronously.

Traffic peak shaving / buffering

Peak shaving / buffering can improve system availability.

We covered traffic limiting at the access layer earlier; at the service layer, limiting can be done with a message queue. Gateway requests are placed into the queue, and the back-end services do their best to consume them. Requests that time out can either return an error directly or keep waiting in the queue.

The basic functionality of a message queue system is fairly simple; what is hard is achieving high performance, high availability, message ordering, and message transactionality. There are many mature open-source implementations in the industry that are good enough when requirements are not extreme, such as RocketMQ, Kafka, and ActiveMQ.

However, if the business has high requirements for message reliability, ordering, and transactionality, it is important to dig into these open-source solutions and think ahead about potential problems such as duplicate consumption, message loss, and message backlog.

The platform layer

When the service scale is small and the system complexity is not high, support functions such as o&M, testing, data analysis, and management are performed independently by each system or team. As businesses get larger, systems get more complex, and subsystems get more numerous, continuing to implement these supporting functions in a separate way can lead to a lot of duplication. Therefore, it is natural to pull out relevant functions as a public service, avoiding duplication of wheels and reducing communication and collaboration costs caused by irregularities.

The platform layer is a product of service-oriented thinking: common functions are pulled out so that business services can focus purely on their own business. Responsibilities become clearer and services easier to extend.

At the same time, some shared platforms also tie the individual services together. Take a data platform: it can aggregate data, so a service that previously had to call several upstream services and stitch their data together can now simply fetch what it needs from the data platform, which reduces the service's response time.

Operational platform

The core responsibilities of the operation and maintenance platform are divided into four parts: configuration, deployment, monitoring, and emergency response. Each responsibility corresponds to a stage of the system life cycle, as shown in the figure below:

  • Deployment: Mainly responsible for releasing the system online. For example, package management, gray release management, rollback, etc.
  • Monitoring: Mainly responsible for collecting and monitoring related data after the system goes online, so as to discover problems in time.
  • Emergency response: mainly responsible for handling system failures, for example stopping a program, taking a faulty machine offline, or switching IPs.

The core design elements of the operations platform are the "four modernizations": standardization, platformization, automation, and visualization.

  • Standardization: O&M standards must be defined covering configuration management, deployment processes, monitoring metrics, and emergency response capabilities, and every system must implement them, so that different systems do not end up being handled in different ad-hoc ways.

  • Platformization: traditional manual operations require a lot of manpower, are inefficient, and are error-prone. On top of standardization, all O&M operations should be folded into the operations platform and carried out through it.

  • Automation: a major reason traditional manual operations are inefficient is the sheer number of repetitive actions. The operations platform can codify these repetitive operations so the system completes them automatically.

  • Visualization: the operations platform holds a great deal of data, and querying it all by hand is slow. The main purpose of visualization is to make data faster to inspect.

The test platform

The core responsibility of the test platform is, of course, testing: unit tests, integration tests, interface tests, performance tests, and so on can all be carried out on it.

The core purpose of the test platform is to improve testing efficiency and thereby product quality, and the key to its design is automation.

Data platform

The core responsibilities of data platform mainly include three parts: data management, data analysis and data application. Each part contains more segments, and the detailed data platform architecture is shown in the figure below:

  1. Data management

Data management includes four core responsibilities: data collection, data storage, data access and data security. It is the basic function of data platform.

  • Data collection: Collect all kinds of data from the service system. For example, logs, user behavior, business data, etc., are delivered to the data platform.
  • Data storage: Stores the data collected from service systems to a data platform for subsequent data analysis.
  • Data access: Responsible for providing various protocols for reading and writing data. For example, read and write protocols such as SQL, Hive, and key-value.
  • Data security: Data platforms are usually shared by multiple services. Some service-sensitive data needs to be protected to prevent other services from reading or modifying it. Therefore, data security policies need to be designed to protect data.
  2. Data analysis

Data analysis includes data statistics, data mining, machine learning, deep learning and other subdivisions.

  • Data mining: The concept of data mining itself can be very broad. In order to distinguish it from machine learning and deep learning, data mining here mainly refers to traditional data mining methods. For example, experienced data analysts construct a series of rules based on data warehouse to analyze data and find some hidden laws, phenomena and problems, etc. A classic case of data mining is the discovery of the correlation between beer and diapers in Wal-Mart.

  • Machine learning and deep learning: Machine learning and deep learning are specific implementation methods of data mining. Since their implementation methods are quite different from traditional data mining methods, data platforms need to be designed independently for machine learning and deep learning.

  3. Data application

Data has a wide range of applications, both online and offline. For example, recommendations and advertising are online applications, while reports, risk control, and anomaly detection are offline applications. "Big data" is the prerequisite for data applications to deliver value: only when data reaches a certain scale can analysis and mining reveal valuable patterns, phenomena, and problems. If the data has not reached that scale, doing plain data statistics well is usually enough; in particular, many startups do not need to copy BAT and build their own data platform from day one.

Management platform

The core responsibility of the management platform is permission management. Whether it is a business system (such as Taobao), a middleware system (such as the message queue Kafka), or a platform system (such as the operations platform), everything needs permission management. If every system implemented its own, efficiency would be low and much work duplicated, so a unified management platform is needed to manage permissions for all systems.

Speaking of platforms, I can't help thinking of the "middle platform" (zhongtai) that has been alternately hyped and trashed in recent years. In fact, the data platform described here is already close to the so-called "data middle platform". The middle platform is a conceptual thing; there is no unified, standard way to build one. The company I work for has also built one. Taking the data middle platform as an example, ours was built mainly for data sharing and data visualization; in short, it gathers data from the various business modules. Easy to say, hard to do: timely data aggregation, fast responses to data-sharing requests... In the end we bought some of Alibaba's commercial components, spent a lot of money, and the result was, let's just say, so-so.

The cache layer

There are various ways to improve the performance of the storage system itself, but in some complex business scenarios that alone is not enough.

Most online businesses read far more than they write. For Internet services such as Weibo, Taobao, and WeChat, reads account for more than 90% of overall traffic. Take Weibo: when a celebrity posts, tens of millions of people may view it.

If the data is read directly from the DB, there are two problems: DB query speed becomes a bottleneck and drives up the system's response time, and the database itself has a concurrency bottleneck. Caching exists precisely to make up for the storage system's weakness in these read-heavy, write-light scenarios.

The CDN mentioned above is a kind of cache, which caches static resources.

In terms of the overall architecture, multi-level caching is generally adopted to cache data at different levels to improve access efficiency.

To give a sense of the architecture and flow: the cache is read level by level, and on a miss the next level is consulted. First the local cache, then the distributed cache, and only when the distributed cache also misses do we read the DB.
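A minimal read-through sketch of that flow (the local map, the RedisClient interface, and the loadFromDb function are simplified placeholders, not a real client): check the local cache, then the distributed cache, then the DB, back-filling each level on the way out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class MultiLevelCache {
    // Level 1: in-process cache (in real systems usually Caffeine/Guava with TTL and size limits).
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final RedisClient redis;                    // level 2: distributed cache
    private final Function<String, String> loadFromDb;  // level 3: the database

    public MultiLevelCache(RedisClient redis, Function<String, String> loadFromDb) {
        this.redis = redis;
        this.loadFromDb = loadFromDb;
    }

    public String get(String key) {
        String value = localCache.get(key);   // 1. local cache
        if (value != null) return value;

        value = redis.get(key);               // 2. distributed cache
        if (value == null) {
            value = loadFromDb.apply(key);    // 3. database
            if (value != null) redis.set(key, value);
        }
        if (value != null) localCache.put(key, value);
        return value;
    }

    /** Minimal stand-in for a real Redis client such as Jedis or Lettuce. */
    public interface RedisClient {
        String get(String key);
        void set(String key, String value);
    }
}
```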

Distributed cache

To improve cache availability, a distributed cache is generally used. Distributed caching is usually implemented by sharding, that is, distributing the data across multiple instances or servers, typically using modulo hashing or consistent hashing.

For cached data that never expires, modulo sharding can be considered; when expanding capacity, a new cluster is usually built.

For cached data that can tolerate loss, consider consistent hashing: even if one instance goes down, only a small fraction of the data is lost.

Sharding can be implemented on the client side, or through middleware such as Twemproxy acting as a proxy (which makes the sharding transparent to the client).

If Redis is used, redis-cluster distributed cluster solution can be considered.
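As an illustration of the two sharding algorithms mentioned above, here is a small Java sketch; the node names, the virtual-node count, and the MD5-based ring hash are illustrative assumptions, not a specific library's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the two common sharding strategies for a distributed cache.
public class CacheSharding {
    // Modulo sharding: simple, but almost all keys move when the node count changes.
    public static int moduloShard(String key, int nodeCount) {
        return Math.abs(key.hashCode()) % nodeCount;
    }

    // Consistent hashing: each node is mapped to many points on a ring; a key goes to
    // the first node point clockwise, so losing one node only remaps that node's keys.
    private final SortedMap<Long, String> ring = new TreeMap<>();

    public CacheSharding(List<String> nodes, int virtualNodes) {
        for (String node : nodes) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    public String nodeFor(String key) {
        long h = hash(key);
        SortedMap<Long, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16) | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```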

Hotspot local cache

If frequently accessed hotspot data is fetched from the remote cache system on every request, the remote cache may suffer from excessive request volume, high load, or saturated bandwidth, causing slow cache responses and client timeouts.

One solution is to hang more replicas off the cache and have clients read from the cache system through a load balancer. Another is to keep a copy in the local application or proxy layer where the client resides, avoiding the trip to the remote cache entirely. Even data such as inventory can, in some applications, be cached locally for a few seconds to reduce the pressure on the remote system.

Although the introduction of cache improves the performance of the system, it also increases the complexity of the system and brings some operation and maintenance costs.

Cache penetration

Cache penetration means the cache does not play its role: the business system queries the cache, finds no data, goes on to query the storage system, and the storage system has no data either.

Schematic diagram of cache penetration:

Generally, if a certain data does not exist in the storage system, the corresponding data is not stored in the cache. As a result, users cannot find the corresponding data in the cache and have to query the data again in the storage system each time. Caching in this scenario does not take the load off the storage system.

Normally, the volume of requests for non-existent data is not large. However, under abnormal conditions (for example, a hacker deliberately sending a large number of requests for data that does not exist), the storage system may be overwhelmed.

There are two solutions to this situation:

A simple method is to set a default value (either a null value or a specific value) to the cache if the data in the storage system is not found. In this way, the system obtains the default value when it reads the cache for the second time and does not continue to access the storage system.

The other is to introduce a Bloom filter. Its principle is simple: an efficient data structure and algorithm are used to quickly determine whether the queried key exists in the database at all. If it does not, return empty directly; if it might, query the DB, refresh the cache, and return the value.
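A rough Java sketch of both defences, using a map as a stand-in for Redis and a tiny hand-rolled Bloom filter (the filter size and hash seeds are arbitrary assumptions; in practice the cached default value would also get a short expiration time).

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the two cache-penetration defences: caching a default value for
// missing keys, plus a simplified Bloom filter checked before touching the DB.
public class PenetrationGuard {
    private static final String NULL_PLACEHOLDER = "<null>";   // default value meaning "no such data"

    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for Redis
    private final Function<String, String> db;                           // stand-in for a DB query
    private final BitSet bloom = new BitSet(1 << 20);                    // small, illustrative Bloom filter

    public PenetrationGuard(Function<String, String> db, Iterable<String> existingKeys) {
        this.db = db;
        for (String key : existingKeys) {                     // pre-load the filter with keys known to exist
            for (int seed : new int[]{7, 31, 131}) bloom.set(index(key, seed));
        }
    }

    public String get(String key) {
        String cached = cache.get(key);
        if (cached != null) return NULL_PLACEHOLDER.equals(cached) ? null : cached;

        if (!mightExist(key)) return null;                    // filter says "definitely absent": skip the DB

        String value = db.apply(key);
        cache.put(key, value == null ? NULL_PLACEHOLDER : value); // cache the default value too
        return value;
    }

    private boolean mightExist(String key) {
        for (int seed : new int[]{7, 31, 131}) {
            if (!bloom.get(index(key, seed))) return false;
        }
        return true;
    }

    private static int index(String key, int seed) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) h = h * seed + key.charAt(i);
        return (h & 0x7FFFFFFF) % (1 << 20);
    }
}
```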

Cache breakdown

Cache penetration and cache breakdown are easy to confuse. Cache penetration means the data is in neither the cache nor the database. Cache breakdown means the data is not in the cache but is in the database: a hot key expires, a large number of concurrent requests for that key all miss the cache at once, and they all go to the database, causing a sharp spike in database pressure. This phenomenon is called cache breakdown.

Schematic diagram of cache breakdown:

A hot key expiring causes a burst of concurrent queries against the database, so the problem can be attacked from two directions: first, consider whether the hot key can simply never expire; second, reduce the number of requests that actually reach the database.

There are two main solutions:

  • Mutex: a mutual-exclusion lock ensures that only one client queries the underlying database for the key at a time; once the data is found, it is written back to Redis so that the other waiting requests hit the cache instead of the database. The drawback is that the other threads are blocked and system throughput drops (a sketch follows this list).

  • The hot data cache never expires.

There are two ways to never expire:

  • The physical key does not expire, and the expiration time is not set for the hotspot key
  • Logical expiration: store the expiration time in the value corresponding to the key. If it is found to be about to expire, build the cache through an asynchronous thread in the background
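A minimal single-process sketch of the mutex idea in Java. In a real cluster the per-key lock would be a distributed lock (for example based on Redis SETNX), and the map here only stands in for the cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Sketch of the mutex approach to cache breakdown: when a hot key expires,
// only one thread rebuilds it from the DB; the others wait for the lock and
// then re-check the cache instead of hitting the DB again.
public class BreakdownGuard {
    private final Map<String, String> cache = new ConcurrentHashMap<>();        // stand-in for Redis
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();  // one lock per key
    private final Function<String, String> db;                                   // stand-in for a DB query

    public BreakdownGuard(Function<String, String> db) { this.db = db; }

    public String get(String key) {
        String value = cache.get(key);
        if (value != null) return value;

        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            value = cache.get(key);          // double-check: another thread may have rebuilt it already
            if (value == null) {
                value = db.apply(key);       // only this thread queries the DB
                if (value != null) cache.put(key, value);
            }
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```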

Cache avalanche

Cache avalanche is when the cache becomes unavailable or a large number of hot keys fail at the same time.

Both cases have the same result: a large number of requests land directly on the database. For a high-concurrency system, that can mean hundreds of thousands of requests within a few hundred milliseconds. In the worst case the database goes down, which can trigger a chain reaction and bring the whole system down.

The cache avalanche solution can be divided into three dimensions:

  • In advance:

① Uniform expiration: set different expiration times so that cache entries expire as evenly as possible, avoiding a large batch of keys expiring at the same moment and flooding the database (a small sketch of this follows the list).

② Hierarchical cache: when the first-level cache misses, fall back to the second-level cache, with each level using a different expiration time.

③ The hot data cache never expires.

④ Ensure the high availability of the Redis cache to prevent an avalanche caused by Redis going down, for example by using a Redis cluster.

  • During the event:

① Mutex: after a cache entry expires, a mutex or queue controls the number of threads that read the database and rebuild the cache. For example, only one thread is allowed to query the data and write the cache for a given key while the others wait. This approach blocks the other threads and reduces system throughput.

② Circuit breaking, rate limiting, and degradation: when traffic reaches a certain threshold, return a message such as “the system is congested” to keep excessive requests from hitting and collapsing the database. At least some users can then be served normally, and the rest can get results after a few retries.

  • After the event:

① Enable the Redis persistence mechanism so that cached data can be recovered as soon as possible: after a restart, data is automatically loaded from disk to rebuild the in-memory data.
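The “uniform expiration” idea from the list above boils down to adding a random offset to the TTL when writing the cache. A tiny sketch, with the base TTL and jitter values chosen arbitrarily:

```java
import java.util.concurrent.ThreadLocalRandom;

// Uniform expiration: add a random offset to the base TTL so that keys
// written in the same batch do not all expire at the same moment.
public final class ExpireJitter {
    public static long ttlWithJitter(long baseSeconds, long maxJitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(maxJitterSeconds + 1);
    }
    // Usage sketch (with whatever cache client is in use):
    // cache.setex(key, ttlWithJitter(3600, 300), value);
}
```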

Storage layer

To meet business needs and stay competitive, relational database vendors (Oracle, DB2, MySQL, etc.) have invested heavily in optimizing the performance of a single database server. However, business growth and data growth far outpace the vendors' optimizations. Especially since the rise of Internet business, with its massive numbers of users and massive data volumes, a single database server can no longer keep up, and database clusters must be considered to improve performance.

Read/write separation

The basic principle of read/write separation is to distribute database read/write operations to different nodes. The following is the basic architecture diagram:

The basic implementation of read/write separation is:

  • The database servers form a master/slave cluster, either one master with one slave or one master with multiple slaves.

  • The master handles both read and write operations; the slaves handle only reads.

  • The master synchronizes data to the slaves through replication, and every database server stores the complete business data.

  • The business server sends write operations to the master and read operations to the slaves.

The implementation logic for read/write separation is not complex, but two details introduce design complexity: master-slave replication latency and allocation mechanisms.

Replication delay

In the case of MySQL, the master/slave replication delay may reach 1 second, or 1 minute if there is a large amount of data synchronization.

Master/slave replication delay causes a problem: if the business server reads data immediately (within about a second) after writing it to the master, the read goes to a slave that has not yet received the replicated data, the latest data cannot be read, and the business may misbehave.

For example, suppose Weibo posts must be synchronized to an auditing system. After updating the main database, the post ID is written to a message queue, and a queue processor then fetches the post from the database by ID and sends it to the auditing system. If there is master/slave delay at that moment, the post cannot be found in the slave database and the whole flow breaks.

Common solutions to master/slave replication delay:

  1. Data redundancy

When writing to the message queue, we can send not just the post ID but the entire post content the queue processor needs, so it does not have to query the database again.

  2. Use the cache

We can write the post to the cache at the same time as writing it to the database; the queue processor then queries the cache first, which guarantees data consistency for this flow.

  3. Second read

We can encapsulate the underlying database access API so that, if the first read from the slave turns out not to be up to date, a second read goes to the master. For example, if the post cannot be found in the slave by its ID, it is read again directly from the master (a small sketch of this second read follows this list).

  4. Read from the master for key services

We can direct all reads and writes of critical, real-time-sensitive services to the master, and apply read/write separation only to non-critical services or services without strong real-time requirements.
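A small sketch of the second-read idea, with the slave and master reads passed in as plain functions (stand-ins for the real DAO calls):

```java
import java.util.Optional;
import java.util.function.Function;

// Sketch of the "second read" approach: read from a slave first; if the row is
// not there yet (replication lag), read it again from the master.
public class SecondRead<T> {
    private final Function<Long, Optional<T>> readFromSlave;   // stand-ins for the two data sources
    private final Function<Long, Optional<T>> readFromMaster;

    public SecondRead(Function<Long, Optional<T>> slave, Function<Long, Optional<T>> master) {
        this.readFromSlave = slave;
        this.readFromMaster = master;
    }

    public Optional<T> findById(long id) {
        Optional<T> row = readFromSlave.apply(id);
        if (row.isPresent()) return row;
        return readFromMaster.apply(id);    // fall back to the master for just-written data
    }
}
```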

Allocation mechanism

There are generally two ways to separate read and write operations and then access different database servers: program code encapsulation and middleware encapsulation.

  1. Program code encapsulation

Program code encapsulation means abstracting a data access layer in the application code (some articles call this “middle layer encapsulation”) that implements read/write separation and manages connections to the database servers. For example, a simple wrapper around Hibernate can implement read/write separation. The basic architecture is as follows:
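As a toy illustration of what such a data access layer does at its core, the sketch below routes statements by their leading SQL keyword; the “master”/“slave” names are placeholders, and a real implementation would also keep statements inside an explicit transaction pinned to the master.

```java
import java.util.Locale;

// Sketch of read/write routing inside a hand-rolled data access layer:
// SELECT statements go to a slave, everything else goes to the master.
// A real project would return a javax.sql.DataSource; here a logical
// name keeps the example self-contained.
public final class ReadWriteRouter {
    public static String route(String sql) {
        String head = sql.trim().toLowerCase(Locale.ROOT);
        return head.startsWith("select") ? "slave" : "master";
    }

    public static void main(String[] args) {
        System.out.println(route("SELECT * FROM orders WHERE id = 1")); // slave
        System.out.println(route("UPDATE orders SET status = 2"));      // master
    }
}
```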

The way program code is encapsulated has several characteristics:

  • Simple implementation, and can do more customized functions according to the business.

  • The encapsulation has to be implemented separately for each programming language and cannot be shared across them. If a business has multiple subsystems written in different languages, the duplicated development effort is significant.

  • If a primary/secondary switchover occurs, all systems may need to modify their configurations and restart.

If you don’t want to reinvent the wheel, you can also use an open-source solution; Taobao’s TDDL is a well-known one.

  2. Middleware encapsulation

Middleware encapsulation refers to an independent set of systems to achieve read and write operation separation and database server connection management. The middleware provides SQL-compatible protocols for the service server, and the service server does not need to perform read/write separation. To a business server, there is no difference between accessing middleware and accessing a database; in fact, middleware is a database server from a business server’s point of view.

Its basic structure is as follows:

The characteristics of the database middleware approach are:

  • Multiple programming languages are supported because the database middleware provides a standard SQL interface to the business server.
  • Database middleware needs to support complete SQL syntax and database server protocol (for example, the connection protocol between MySQL client and server), which is complicated to implement and has many details. It is prone to bugs and takes a long time to be stable.
  • The database middleware itself does not perform real read and write operations, but all database operation requests go through the middleware, which has high performance requirements.
  • The database middleware can detect the master/slave status of the database servers, for example by writing a piece of data to a test table: the machine where the write succeeds is the master, the one where it fails is the slave.

At present, open source database middleware includes Atlas of Qihoo 360 developed based on MySQL Proxy, Cobar of Ali, Mycat developed based on Cobar, etc.

Sharding (separate databases and tables)

Read/write separation spreads the pressure of database read and write operations, but it does not spread the storage pressure. When the amount of data reaches the tens of millions or even hundreds of millions of rows, the storage capacity of a single database server becomes the system bottleneck, mainly in the following respects:

  • If the amount of data is too large, the read/write performance deteriorates. Even if indexes exist, the indexes become too large and the performance deteriorates.

  • Data files can become large, and database backups and recoveries can take a long time.

  • The larger the data files, the higher the risk of data loss in extreme cases (for example, a machine-room fire that takes out both the primary and standby database machines).

For the above reasons, the amount of data stored by a single database server should not be too large and should be controlled within a certain range. To meet the requirements of business data storage, the storage needs to be spread across multiple database servers.

Business-level database split

A business-level database split distributes data to different database servers by business module. For example, a simple e-commerce site has three business modules: users, commodities, and orders. We can split user data, commodity data, and order data onto three different database servers instead of putting everything on one.

While a business-level split spreads storage and access pressure, it also introduces new problems, which we'll examine in turn.

  1. Join operation problem

After the split, tables that originally lived in the same database are scattered across different databases, so a single SQL join query can no longer be used.

For example, “query the list of female users who have bought cosmetics”: the order data contains the user ID, but the user's gender lives in the user database. In a single database this is a simple join; now that the data is spread across two databases, we can only query the order database for the IDs of users who bought cosmetics, then query the user database for the female users among those IDs, which is clearly more complicated than a simple join.

  2. Transaction issues

Different tables in the same database can be modified in the same transaction. However, after services are divided into different databases, the tables cannot be modified in a unified manner through transactions. Although database vendors offer some solutions for distributed transactions (for example, XA for MySQL), performance is too low to meet the goals of high-performance storage.

For example, placing an order requires deducting commodity inventory. If the order data and commodity data are in the same database, a transaction guarantees that the order insert and the inventory deduction succeed or fail together. After the split, if the order fails to be created because of an exception in the order database, the business program has to add the inventory back itself; and if the business program crashes before doing so, the inventory cannot be restored automatically and has to be repaired manually from logs and other records.

  3. Cost problem

A business-level split also brings cost: what one server used to handle now needs three servers, and with backups that goes from two machines to six.

For these reasons, small start-ups are not advised to split their databases by business at the very beginning, mainly because: a start-up's business is highly uncertain and may not take off; there is no real storage or access pressure early on, so the split brings no value; and after the split, joins across tables and database transactions can no longer be done simply.

After the split, different data must be read from and written to different databases, so the code needs extra logic to map each data type to its database, which adds work. Yet what matters most in the start-up phase is building and validating quickly, and a business-level split slows the business down.

Single table split

Storing data of different services on different database servers can support services with a scale of millions or even tens of millions of users. However, if services continue to develop, single table data of the same service will reach the processing bottleneck of a single database server. For example, if the hundreds of millions of user data of Taobao are stored in a single table on a database server, it cannot meet performance requirements. At this time, it is necessary to split the single table data.

There are two ways to split the data of a single table: vertical splitting and horizontal splitting. The schematic diagram is as follows:

Splitting tables can effectively spread storage pressure and improve performance, but like splitting databases, it introduces new complexity.

A handy image for the two directions of splitting: many of us have read the article about how to cut an apple so that it shows a star, and the answer is to cut it horizontally.

  1. Vertical splitting

Vertical splitting is good for moving out columns that are rarely used but take up a lot of space. Take the nickname and description fields in the diagram above and assume we are a dating website: when filtering other users, people mainly query by age and sex, while nickname and description are only displayed and rarely appear in query conditions, and description can be quite long. We can therefore split those two fields into a separate table, which gives a performance improvement when querying by age and sex. The complexity introduced by vertical splitting is mainly the extra table operations: previously one query returned name, age, sex, nickname, and description; now it takes two queries, one for name, age, and sex and another for nickname and description.

But this complexity pales in comparison with that of the horizontal splitting we discuss next.

  2. Horizontal splitting

Horizontal splitting suits tables with a very large number of rows. Some companies require a table to be split once it exceeds 50 million rows; that figure can serve as a reference, but it is not an absolute standard. A complex table may need splitting above 10 million rows, while a simple table may be fine even beyond 100 million. Either way, when you see a table whose row count has reached the tens of millions, it is likely to be a performance bottleneck or a hidden risk in the architecture.

Compared with vertical table, horizontal table introduces more complexity, which is mainly reflected in the following aspects:

  • Routing

After the horizontal partition table, the routing algorithm needs to be added to calculate which sub-table a certain piece of data belongs to, which will introduce some complexity.

Common routing algorithms include:

Range routing: select an ordered data column (for example, an integer or timestamp) as the routing condition and place different segments in different tables. Taking order IDs as an example, the routing could use segments of 10 million IDs each. The design complexity of range routing lies mainly in choosing the segment size: too small and there are too many sub-tables to maintain; too large and a single table may still have performance problems. A segment size somewhere between 1 million and 20 million is generally recommended, chosen according to the business.

The advantage of range routing is that new tables can be smoothly expanded as data increases. For example, if the current number of users is 1 million, and the number increases to 10 million, all you need to do is add a new table and leave the old data untouched. One of the hidden disadvantages of range routing is that it is not evenly distributed. If the table is divided according to 10 million, it is possible that the actual amount of data stored in one segment is only 1000, while the actual amount of data stored in another segment is 9 million.

Hash routing: hash the value of a column (or combination of columns) and use the result to spread data across different tables. Again taking order IDs as an example, if we plan four sub-tables from the start, the routing can simply be ID % 4: the order with ID 12 goes to sub-table 0 and the order with ID 13 goes to sub-table 1.

The design complexity of hash routing lies mainly in choosing the initial number of tables: too many tables are hard to maintain, too few may leave single-table performance problems. And once hash routing is in use, adding sub-tables is very painful because all the data has to be redistributed.

The advantages and disadvantages of Hash routes are basically the opposite of range routes. The advantages of Hash routes are that the tables are evenly distributed, but the disadvantages are that it is difficult to expand new tables and all data must be redistributed.

Configuration routing: the routing information is recorded in an independent routing table.

Again with order IDs as the example, we add an order_router table with two columns, order_id and table_id; the table_id for an order can then be looked up by its order_id.

The configuration route is simple in design and flexible in use, especially when expanding the table. You only need to migrate the specified data and modify the routing table.

The disadvantage of configuration routing is the extra lookup on every query, which hurts overall performance; and if the routing table itself becomes huge (say hundreds of millions of rows), it can become a bottleneck of its own. If we then shard the routing table as well, we are back to choosing a routing algorithm, in an endless loop.
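A compact sketch of the three routing styles discussed above; the segment size, table count, and the order_router lookup are illustrative assumptions.

```java
import java.util.Map;

// Sketch of the three routing styles for horizontal sharding.
public final class ShardRouting {
    // Range routing: e.g. one sub-table per 10 million IDs.
    public static long rangeRoute(long orderId, long segmentSize) {
        return orderId / segmentSize;               // orderId 25_000_000 with segment 10_000_000 -> table 2
    }

    // Hash routing: e.g. 4 sub-tables, table number = id % 4.
    public static long hashRoute(long orderId, int tableCount) {
        return orderId % tableCount;                // orderId 13 with 4 tables -> table 1
    }

    // Configuration routing: look the table number up in a routing table (order_router).
    public static Long configRoute(long orderId, Map<Long, Long> orderRouter) {
        return orderRouter.get(orderId);            // the map stands in for a SELECT on order_router
    }
}
```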

  • Join operation

After horizontal table division, data is scattered in multiple tables. If join query is required with other tables, multiple join queries need to be performed in business code or database middleware, and then the results are merged.

  • The count() operation

A count() that is trivial on a single table is no longer so easy once the table is split. There are two common approaches:

Summing count(): the business code or database middleware runs count() on every sub-table and adds the results. This is simple to implement but performs poorly; for example, with 20 sub-tables it takes 20 count() operations, which can take several seconds if run serially.

Record table: Create a new table, if the name of the table is “record table”, containing two columns table_name, row_count, after each successful insert or delete subtable data, update the “record table”. Getting the number of table records in this way is much better than adding count() because you can get the data in a single query. The disadvantage is that the complexity increases a lot. The operation of the sub-table needs to synchronize the operation of the “record table”. If a business logic is missed, the data will be inconsistent. In addition, the operation on “record table” and the operation on sub-table cannot be processed in the same transaction. In abnormal cases, the operation on sub-table will succeed but the operation on record table will fail, which will also lead to data inconsistency.

The record table also adds write pressure, since every insert and delete on a sub-table must update it. So for businesses that do not need the count to be accurate in real time, the record table can instead be refreshed periodically in the background. The periodic refresh is really a combination of “summing count()” and the “record table”: count() is summed on a schedule and the result written into the record table.

  • The order by operation

After the horizontal table is divided, the data is scattered into multiple sub-tables, and the sorting operation cannot be completed in the database. The business code or database middleware can only query the data in each sub-table, and then summarize the data for sorting.
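For the count() and order by cases above, here is a small sketch of how business code (or middleware) aggregates partial results from the sub-tables; the per-table count and the partial result lists are passed in as stand-ins for real queries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToLongFunction;

// Sketch of aggregating across sub-tables in business code: a total count() is the
// sum of per-table counts, and ORDER BY is done by querying every sub-table and
// merging the partial results in memory.
public final class ShardAggregation {
    public static <T> long totalCount(List<T> subTables, ToLongFunction<T> countOne) {
        return subTables.stream().mapToLong(countOne).sum();    // one count() per sub-table, then add up
    }

    public static <R> List<R> orderByAcrossShards(List<List<R>> partialResults, Comparator<R> order) {
        List<R> merged = new ArrayList<>();
        partialResults.forEach(merged::addAll);                  // gather rows from every sub-table
        merged.sort(order);                                      // sort the combined result in the application
        return merged;
    }
}
```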

Implementation method

As with read/write separation, sharding can be implemented either by “program code encapsulation” or by “middleware encapsulation”, only it is more complex. Read/write separation only needs to identify whether a SQL statement is a read or a write, which a simple check for SELECT, UPDATE, INSERT, or DELETE can do. Sharding must additionally determine the target table, handle aggregate functions (such as count()), and handle order by and group by, then process each case differently. For example, an order by has to query every sub-library and sort the combined result again to produce the final answer.

Heterogeneous data

After sharding the databases and tables, some problems remain. Besides “program code encapsulation” and “middleware encapsulation”, there is a third approach: data heterogeneity. Data heterogeneity means storing the same data in different kinds of storage, for example writing a copy of MySQL data into Redis as part of the business flow.

Data heterogeneity works very well when both the data volume and the access volume are high, but it increases architectural complexity. It can be implemented by double writing, by subscribing to MQ messages, or by subscribing to and parsing the binlog.

  • Double write: the data is written to MySQL and to the heterogeneous storage system at the same time.
  • MQ: after the MySQL write succeeds, an MQ message is sent; a consumer reads the message and writes the change to the heterogeneous storage system.
  • Binlog: after the MySQL write, a sync component subscribes to the binlog and replays the changes into the heterogeneous storage system.
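A toy sketch of the MQ route, with a BlockingQueue standing in for the real message queue and maps standing in for MySQL and the heterogeneous store; a production version would use a real broker and handle retries and ordering.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the MQ approach to data heterogeneity: after the MySQL write succeeds,
// a message is published; an independent consumer writes the change into the
// heterogeneous store (e.g. Redis or ES).
public class HeterogeneousSync {
    private final Map<Long, String> mysql = new ConcurrentHashMap<>();              // stand-in for MySQL
    private final Map<Long, String> heterogeneousStore = new ConcurrentHashMap<>();  // stand-in for Redis/ES
    private final BlockingQueue<Long> mq = new LinkedBlockingQueue<>();              // stand-in for the MQ

    // Write path: update the source of truth first, then emit a message with the changed ID.
    public void write(long id, String value) {
        mysql.put(id, value);
        mq.offer(id);
    }

    // Consumer side: take the ID from the queue, load the row, write it to the other store.
    public void consumeOne() throws InterruptedException {
        long id = mq.take();
        String value = mysql.get(id);
        if (value != null) heterogeneousStore.put(id, value);
    }
}
```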

This is a heterogeneous data architecture diagram:

The ES search cluster is used in the figure to handle the search business, as well as the cross-library join problem we mentioned earlier.

When designing heterogeneity, we can take advantage of some popular NoSQL databases. NoSQL, while proven not to be a substitute for relational databases, is a powerful complement to relational databases in many scenarios.

For example, KV storage like Redis, which we are familiar with, has very high read/write performance and can be used in scenarios where read/write performance is required;

Column-oriented databases such as HBase and Cassandra: instead of storing data row by row like a traditional database, they store it by column, which suits some offline data-statistics scenarios.

Document databases such as MongoDB and CouchDB have the feature of Schema Free. Fields in data tables can be expanded arbitrarily and can be used in scenarios where data fields are not fixed.

Query-dimension heterogeneity

Take the order library as an example: once it has been sharded, querying by merchant or by user dimension becomes difficult, and a heterogeneous store can solve this. The architecture shown below can be used.

Or use ES-based heterogeneity as shown below:

The heterogeneous store can hold only the mapping between dimensions and data, with the actual rows still fetched from the source database; sometimes, however, the data is stored redundantly in the heterogeneous store as well, to cut source-library queries or improve query performance.

Aggregated data heterogeneity

A commodity detail page generally includes basic commodity information, commodity attributes, and commodity pictures. When the front end renders the page it queries by commodity ID, and the data to be displayed spans three or more libraries; if any one of them is unstable, the page breaks. So we aggregate the data and store it in a heterogeneous KV cluster (for example as JSON), and a single query then returns everything the page needs. This approach is also only worth considering once the system has a certain amount of data and traffic.

High concurrency architecture essentials

Now that you have a pretty good idea of what a high-concurrency architecture looks like, here are some summaries and additions.

High performance essentials

High availability points

In addition to technical considerations, ensuring high availability also requires a good organizational system to ensure rapid recovery of service problems.

High expansion points

1. Reasonable layered architecture: for example, the most common layered architecture of the Internet mentioned above, in addition, micro-services can be further stratified in finer granularity according to the data access layer and business logic layer (but the performance needs to be evaluated, there may be one more hop in the network).

2. Split storage layer: split vertically according to the business dimension and further split horizontally according to the data characteristic dimension (divided into databases and tables).

3. Business layer split: the most common is splitting by business dimension (such as an e-commerce mall's goods service, order service, etc.); it can also be split by core versus non-core requests, or by request source (such as To C and To B, App and H5).


Well, this article is finally finished. For more in-depth study, I suggest reading the book in reference [11]. I wish every architect the chance to face the real rivers and seas, get a grip on high concurrency, and earn more.









Reference and thanks:

[1] : Geek Time “Learning Architecture from Scratch”

[2] : Zhihu Q&A: I have no experience in high concurrency projects, but I am often asked questions about high concurrency and performance tuning in interviews. Is there any solution?

[3] : What is high concurrency, explain in detail

[4] : [High concurrency] How to design a system to support high concurrency and large flow? This time I will share the design ideas with you!

[5] : The Decade of Taobao Technology

[6] : Zhihu Q&A: How to get high concurrency experience?

[7] : Evolution of server-side high-concurrency distributed architecture

[8] : Seven years of grinding a sword, exclusive reveal taobao technology development process and architecture experience

[9] : Core Principles and Case Analysis of Technical Architecture of Large Websites

[10] : Ali technical expert: Share the development history and architecture experience of Taobao technology with 500 million daily life! 18 PPT details

[11] : Hundred-million-level Traffic Website Architecture Technology

[12] : Geek Time “Learning Micro Services from Scratch”

[13] : Interview question: How to ensure that information is not lost? Handling duplicate messages? Message ordering? Message stack processing?

[14] : Geek Time “high Concurrency System Design 40 Questions”

[15] : Redis Deep Adventure: Core Principles and Application Practice

[16] : Redis cache avalanche, cache breakdown, cache penetration and cache preheating, cache degradation

[17] : How to design and use cache elegantly?

[18] : Cache penetration, cache breakdown, cache avalanche, just read this article

[19] : Heterogeneous data

[20] : Sub-database sub-table? How to never migrate data and avoid hot spots?