
Author: Brother Tom | WeChat official account: Micro Technology

Hi, I’m Tom

In software development we often hear the term "three highs": high concurrency, high performance, and high availability.

Each has concrete target metrics, for example: high concurrency means QPS greater than 100,000; high performance means request latency under 100 ms; high availability means availability above 99.99%.

Let's look at each of the three in turn.

High concurrency

We use QPS (Queries Per Second) to measure system capacity. What architectural strategies help us carry high QPS?

1. Load balancing

As the saying goes, two fists are no match for four hands. The preferred solution for high-concurrency scenarios is clustered deployment: a single server can only carry so many QPS, but stacking multiple servers is a different story.

To spread traffic across the server cluster we need load balancing, for example with LVS or Nginx.

Common load-balancing algorithms include round robin, random, source-address hashing, weighted round robin, weighted random, least connections, and so on.
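As a small illustration, here is a minimal Go sketch of two of these strategies, plain round robin and weighted random; the node addresses and weights are made up for the example.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// Node is a hypothetical upstream server; Weight is used by the weighted-random strategy.
type Node struct {
	Addr   string
	Weight int
}

var nodes = []Node{
	{"10.0.0.1:8080", 5},
	{"10.0.0.2:8080", 3},
	{"10.0.0.3:8080", 2},
}

// roundRobin cycles through the nodes in order.
var rrCounter uint64

func roundRobin() Node {
	i := atomic.AddUint64(&rrCounter, 1)
	return nodes[int(i)%len(nodes)]
}

// weightedRandom picks a node with probability proportional to its weight.
func weightedRandom() Node {
	total := 0
	for _, n := range nodes {
		total += n.Weight
	}
	r := rand.Intn(total)
	for _, n := range nodes {
		if r < n.Weight {
			return n
		}
		r -= n.Weight
	}
	return nodes[len(nodes)-1] // not reached; keeps the compiler happy
}

func main() {
	for i := 0; i < 5; i++ {
		fmt.Println("round robin ->", roundRobin().Addr, "| weighted random ->", weightedRandom().Addr)
	}
}
```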

In practice: a single LVS cannot withstand traffic peaks in the tens of millions, so roughly 10 LVS instances are typically deployed, with traffic spread across them at the domain-name (DNS) resolution layer. Combined with high-performance NICs, a single LVS can provide more than a million concurrent connections.

Note that LVS forwards at layer 4 of the network stack and cannot balance load based on the HTTP request path, so Nginx is also required for that.

2. Pooling technology

Reusing a single connection cannot support high concurrency, yet creating and closing a connection for every request wastes time on TCP's three-way handshake and four-way teardown. The core of pooling is pre-allocating resources and recycling them after use. Common pooling techniques include thread pools, process pools, object pools, memory pools, connection pools, and coroutine pools.

A connection pool has several important parameters: the minimum number of connections, the number of idle connections, and the maximum number of connections.
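For reference, Go's standard database/sql package exposes pool knobs that map onto these parameters (it has no explicit "minimum connections" setting, so only the idle and maximum limits are shown; the DSN and limits below are placeholders):

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql" // MySQL driver, registered via its side-effect import
)

func main() {
	// Placeholder DSN; replace with a real user/password/host.
	db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/demo")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Connection-pool parameters: upper bound on open connections,
	// number of idle connections kept ready for reuse, and how long a
	// connection may live before it is recycled.
	db.SetMaxOpenConns(100)
	db.SetMaxIdleConns(10)
	db.SetConnMaxLifetime(30 * time.Minute)
}
```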

The Linux kernel schedules resources in units of processes (a thread is just a lightweight process), so processes and threads are created and scheduled by the kernel. A coroutine, by contrast, is a task execution unit created by the application itself, such as a goroutine in Go. Coroutines run on top of threads, are scheduled by the application, and are an even lighter unit of execution than threads.

In Go, a goroutine starts with a 2 KB stack (the default thread stack size on Linux is 8 MB), far smaller than a thread or process. Creating, destroying, and switching coroutines happens entirely in user mode, with no user/kernel mode switches, no thread state to manage, and very little context to save, which is why coroutine switches are so fast.

There are two common ways to implement a coroutine pool in Go: preemptive and scheduling-based.

  • In a preemptive coroutine pool, all tasks go into one shared channel and multiple coroutines consume from that channel concurrently, which introduces contention on the channel.

  • In a scheduling-based coroutine pool, each coroutine has its own channel and only consumes from it. When a task is submitted, a load-balancing algorithm picks an appropriate coroutine to run it, for example the coroutine with the fewest queued tasks, or simply round robin. A sketch of the first style follows.
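Below is a minimal sketch of the preemptive style: a fixed number of worker goroutines all pulling tasks from one shared channel. The pool size, queue size, and task type are illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a minimal preemptive-style goroutine pool: every worker
// competes to receive tasks from the same shared channel.
type Pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

func NewPool(workers, queueSize int) *Pool {
	p := &Pool{tasks: make(chan func(), queueSize)}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			for task := range p.tasks { // blocks until a task arrives or the channel closes
				task()
			}
		}()
	}
	return p
}

// Submit enqueues a task; it blocks if the queue is full (simple back-pressure).
func (p *Pool) Submit(task func()) { p.tasks <- task }

// Shutdown stops accepting tasks and waits for in-flight ones to finish.
func (p *Pool) Shutdown() {
	close(p.tasks)
	p.wg.Wait()
}

func main() {
	pool := NewPool(4, 16)
	for i := 0; i < 10; i++ {
		i := i
		pool.Submit(func() { fmt.Println("handled task", i) })
	}
	pool.Shutdown()
}
```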


3. Traffic funnel

The approaches above improve system QPS head-on. We can also think in reverse and subtract: intercept illegitimate requests so that capacity is reserved for normal business traffic.

Not all high-concurrency traffic on the Internet is benign; there is plenty of malicious traffic (hacker attacks, malicious crawlers, scalpers, seckill bots, and so on). We need to design traffic interceptors that filter out illegal, unqualified, and low-priority traffic to reduce the concurrency pressure on the system.

The interceptors are layered:

  • Gateway and WAF (Web Application Firewall)

Typical measures include blocking attackers' source IP addresses, rejecting requests with invalid parameters, and rate limiting by source IP address or by user ID.

  • Risk control analysis. Use big-data capabilities to analyze historical business data such as orders, effectively identify behaviors such as many accounts ordering from the same IP or payments completed suspiciously fast after ordering, and flag those accounts for the business team to act on.

  • Local in-memory caching in each downstream Tomcat instance, which keeps a local copy of part of the inventory for pre-verification. To keep the data as consistent as possible, a scheduled task periodically pulls the latest inventory from Redis and refreshes the local cache (a rough sketch follows this list).
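Here is a rough sketch of that last layer, assuming a hypothetical loadStockFromRedis helper and an illustrative refresh interval: a local counter is pre-checked so that requests doomed to fail are rejected cheaply in memory before they ever reach Redis or the DB.

```go
package main

import (
	"errors"
	"sync/atomic"
	"time"
)

// localStock holds a copy of the inventory in process memory for pre-verification.
var localStock int64

// loadStockFromRedis is a hypothetical helper; in a real system it would read
// the authoritative stock value from Redis.
func loadStockFromRedis() int64 { return 100 }

// refreshLoop periodically pulls the latest stock into the local cache.
func refreshLoop(interval time.Duration) {
	for range time.Tick(interval) {
		atomic.StoreInt64(&localStock, loadStockFromRedis())
	}
}

var ErrSoldOut = errors.New("sold out (rejected by local pre-check)")

// preCheck filters requests before they reach Redis/DB: if the local copy is
// already exhausted, the request is rejected without touching remote storage.
func preCheck() error {
	if atomic.AddInt64(&localStock, -1) < 0 {
		return ErrSoldOut
	}
	return nil
}

func main() {
	atomic.StoreInt64(&localStock, loadStockFromRedis())
	go refreshLoop(time.Second)

	if err := preCheck(); err == nil {
		// forward the request to the real inventory deduction (Redis/DB)
	}
}
```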

High performance

Performance directly shapes the user's experience: if nothing responds within 5 seconds, most users will leave.

So what are the factors that affect system performance?

  • The user's network environment

  • Request/response packet size

  • CPU, memory, and disk performance of the service system

  • Length of the service call chain

  • Performance of downstream systems

  • Whether the algorithm implementation is efficient

Of course, as the number of concurrent requests increases, so does the stress on the system and the average request latency.

1. High-performance caching

If hot data is read from the DB on every request, it puts great pressure on the DB and drags performance down. We therefore use caching to speed up access to hot data, for example caching activity data in the browser for a period of time.

Ordered from highest to lowest performance, caches include: registers, L1 cache, L2 cache, L3 cache, local memory, and distributed cache.

Registers, L1 cache, and L2 cache live inside the CPU core, with access latency under 10 nanoseconds. The L3 cache sits outside the core but on the same chip and is shared, with access latency typically around ten nanoseconds. CPU caches are expensive and small; even the L3 cache, the largest of them, is usually only tens of MB.

Local memory is the computer's main memory. It is much cheaper than the caches on the CPU chip, usually comes in gigabytes, and has access latency in the tens to hundreds of nanoseconds.

Both main memory and CPU caches are volatile storage: if the machine loses power or fails, the data held in them is lost.

Note: when using caches, watch out for cache penetration, cache avalanche, cache hot spots, and cache-data consistency. In practice a multi-level cache combination (browser cache + server local in-memory cache + server-side networked memory cache) is usually used to improve overall performance.
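As a small illustration of the server-side local-memory layer, here is a cache-aside sketch with a TTL; the loadFromDB function and TTL value are stand-ins for the real query and policy.

```go
package main

import (
	"sync"
	"time"
)

type entry struct {
	value    string
	expireAt time.Time
}

// LocalCache is a minimal TTL cache for hot data (cache-aside pattern).
type LocalCache struct {
	mu   sync.RWMutex
	data map[string]entry
	ttl  time.Duration
	load func(key string) string // fallback loader, e.g. a DB query
}

func NewLocalCache(ttl time.Duration, load func(string) string) *LocalCache {
	return &LocalCache{data: make(map[string]entry), ttl: ttl, load: load}
}

func (c *LocalCache) Get(key string) string {
	c.mu.RLock()
	e, ok := c.data[key]
	c.mu.RUnlock()
	if ok && time.Now().Before(e.expireAt) {
		return e.value // cache hit: served from local memory
	}
	// Cache miss or expired: fall back to the slow path and refill the cache.
	v := c.load(key)
	c.mu.Lock()
	c.data[key] = entry{value: v, expireAt: time.Now().Add(c.ttl)}
	c.mu.Unlock()
	return v
}

func main() {
	loadFromDB := func(key string) string { return "value-of-" + key } // stand-in for a DB read
	cache := NewLocalCache(5*time.Minute, loadFromDB)
	_ = cache.Get("activity:2021-08")
}
```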

2. Log optimization to avoid I/O bottlenecks

When the system performs a large number of disk I/O operations, the CPU and memory are far faster than the disk, so the CPU may spend too much time waiting for the disk to return results. This portion of CPU time spent waiting on I/O is reported as "iowait".

While a task is blocked on I/O, if other threads are runnable the CPU schedules them and the time shows up as usr or sys. If the system is otherwise idle with nothing to schedule, the time is reported as iowait (which is effectively the same as idle).

Disks have a key performance indicator, IOPS: the number of read/write operations per second. A solid-state disk delivers roughly 30,000 IOPS. For a seckill system with a single-node QPS of 100,000 and three log lines per request, log writes would hit 300,000 per second, which the disk cannot sustain.

Linux has a special file system, tmpfs (temporary file system), a memory-backed file system managed by the operating system. Writes that look like disk writes actually land in memory. When the log file reaches a threshold, it is flushed to disk in one batch and removed from tmpfs.

This batched, sequential writing greatly improves disk throughput!
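A small sketch of the idea (the path, buffer size, and flush policy are illustrative): log lines are appended to a file on tmpfs through an in-process buffer, and only periodically moved to durable disk in large sequential writes.

```go
package main

import (
	"bufio"
	"log"
	"os"
)

func main() {
	// /dev/shm is a tmpfs mount on most Linux distributions, so appends here
	// land in memory rather than on disk; the path is illustrative.
	f, err := os.OpenFile("/dev/shm/app.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// A 4 MB in-process buffer batches small writes; data only leaves the
	// process when the buffer fills or Flush is called.
	w := bufio.NewWriterSize(f, 4<<20)
	defer w.Flush()

	for i := 0; i < 100000; i++ {
		w.WriteString("order accepted\n") // each request appends a small log line
	}
	// A separate scheduled job would later move the accumulated file from
	// tmpfs to durable disk storage in one sequential write.
}
```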

High availability

High availability is measured with a few standard indicators:

  • MTBF (Mean Time Between Failures): the average time the system runs normally between faults

  • MTTR (Mean Time To Repair): the average time the system needs to recover from a fault

  • SLA (Service-Level Agreement): used to evaluate service availability; the formula is SLA = MTBF / (MTBF + MTTR).
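A quick worked example: over a 30-day month (43,200 minutes), an availability target of 99.99% leaves a downtime budget of only 43,200 × 0.0001 ≈ 4.3 minutes.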

Generally when we say availability is higher than 99.99%, we mean SLA is higher than 99.99%.

At the architecture level, what strategies support high availability?

  • Multi-cloud architecture, multi-site active-active deployment, and remote backup

  • Active/standby switchover: for example, Redis caches and MySQL databases replicate data between active and standby nodes in real time; if the active node becomes unavailable, the system automatically switches to the standby node

  • Microservices with a stateless architecture, clustered deployment of business services, and heartbeat detection to spot unavailable services as quickly as possible

  • Circuit breaking and rate limiting to handle traffic overload and provide overload protection

  • Attention to web security, defending against attacks such as XSS

1. Active/standby switchover to reduce downtime

When a fault occurs, the first priority is not to find the root cause immediately. Given how complex faults can be, locating and troubleshooting them takes time, and by the time the problem is fixed the SLA has already dropped several levels. Is there a faster way to handle it? Yes: failover.

When a failed node is detected, instead of trying to repair it, we immediately isolate it and divert traffic to healthy nodes. This not only reduces MTTR and improves the SLA through failover, it also buys enough time to repair the failed node.

An active/standby switchover roughly consists of three steps (sketched in code after the list):

  • Step 1, auto-detect: use health checks, heartbeats, and similar techniques to automatically detect faulty nodes.

  • Step 2, failover: once a faulty node is detected, isolate it by removing it from the traffic cluster and shift its traffic to healthy nodes.

  • Step 3, failback: after the faulty node recovers, automatically add it back to the cluster so that cluster capacity returns to what it was before the fault.
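A bare-bones sketch of the three steps, assuming a hypothetical ping function as the health check; a real system would use proper heartbeats, quorum, and configurable failure thresholds.

```go
package main

import (
	"sync"
	"time"
)

// node tracks whether a backend is currently taking traffic.
type node struct {
	addr    string
	healthy bool
}

type cluster struct {
	mu    sync.Mutex
	nodes []*node
}

// ping is a hypothetical health check (e.g. a TCP connect or HTTP /health probe).
func ping(addr string) bool { return true }

// healthLoop implements the three steps: detect, fail over, fail back.
func (c *cluster) healthLoop(interval time.Duration) {
	for range time.Tick(interval) {
		c.mu.Lock()
		for _, n := range c.nodes {
			ok := ping(n.addr) // step 1: auto-detect
			if n.healthy && !ok {
				n.healthy = false // step 2: failover - stop routing traffic here
			} else if !n.healthy && ok {
				n.healthy = true // step 3: failback - node rejoins the cluster
			}
		}
		c.mu.Unlock()
	}
}

// pick returns a healthy node; traffic is only routed to healthy ones.
func (c *cluster) pick() *node {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, n := range c.nodes {
		if n.healthy {
			return n
		}
	}
	return nil
}

func main() {
	c := &cluster{nodes: []*node{
		{addr: "10.0.0.1:6379", healthy: true},
		{addr: "10.0.0.2:6379", healthy: true},
	}}
	go c.healthLoop(time.Second)
	_ = c.pick()
}
```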

2. Circuit breaking for overload protection

Overload protection means that when the load exceeds the system's capacity, the system automatically takes protective measures to avoid being overwhelmed.

When the system is on the verge of collapse, the circuit breaker immediately interrupts service to keep the system stable and avoid a crash. It works like the fuse in an electrical appliance: when the current is too high, the fuse blows first, cutting off the current so the circuit does not overheat, burn out the appliance, or start a fire.

For example, the breaker might trip when CPU usage exceeds 90%, the request error rate exceeds 5%, or request latency exceeds 500 ms.
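Here is a minimal sketch of a circuit breaker driven only by the error rate; the thresholds and cool-down period are illustrative, and production code would also track latency and use a proper half-open state.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

var ErrCircuitOpen = errors.New("circuit breaker is open, request rejected")

// Breaker trips when the observed error rate exceeds a threshold and
// rejects calls until a cool-down period has passed.
type Breaker struct {
	mu        sync.Mutex
	total     int
	failures  int
	openUntil time.Time

	errorRate float64       // e.g. 0.05 = trip above a 5% error rate
	minCalls  int           // don't judge on tiny samples
	coolDown  time.Duration // how long to stay open
}

func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if time.Now().Before(b.openUntil) {
		b.mu.Unlock()
		return ErrCircuitOpen // fail fast while the breaker is open
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	b.total++
	if err != nil {
		b.failures++
	}
	if b.total >= b.minCalls && float64(b.failures)/float64(b.total) > b.errorRate {
		b.openUntil = time.Now().Add(b.coolDown) // trip: interrupt service temporarily
		b.total, b.failures = 0, 0
	}
	return err
}

func main() {
	b := &Breaker{errorRate: 0.05, minCalls: 20, coolDown: 10 * time.Second}
	_ = b.Call(func() error { return nil }) // wrap the real downstream call
}
```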

3. Rate limiting for overload protection

Rate limiting works on a similar principle to circuit breaking: a condition is evaluated to decide whether to apply a protective policy. The difference is that a circuit breaker, once tripped, suspends the node's service until it recovers, whereas rate limiting keeps processing requests within the system's capacity and only rejects the excess.

The main rate-limiting algorithms are: fixed-window counter, sliding window, token bucket, and leaky bucket. There is plenty of material online, so I won't go over them here.
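To make one of them concrete, here is a hand-rolled token-bucket limiter (the rate and burst values are illustrative; in practice Go's golang.org/x/time/rate package offers a ready-made version):

```go
package main

import (
	"sync"
	"time"
)

// TokenBucket allows roughly `rate` requests per second with bursts up to `capacity`.
type TokenBucket struct {
	mu       sync.Mutex
	tokens   float64
	capacity float64
	rate     float64 // tokens added per second
	last     time.Time
}

func NewTokenBucket(rate, capacity float64) *TokenBucket {
	return &TokenBucket{tokens: capacity, capacity: capacity, rate: rate, last: time.Now()}
}

// Allow returns true if the request may proceed, false if it should be throttled.
func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	now := time.Now()
	// Refill tokens based on how much time has passed since the last call.
	tb.tokens += now.Sub(tb.last).Seconds() * tb.rate
	if tb.tokens > tb.capacity {
		tb.tokens = tb.capacity
	}
	tb.last = now

	if tb.tokens >= 1 {
		tb.tokens--
		return true
	}
	return false
}

func main() {
	limiter := NewTokenBucket(100, 200) // ~100 req/s, bursts of up to 200
	if limiter.Allow() {
		// handle the request
	}
	// otherwise return HTTP 429 or a friendly "system busy" response
}
```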

4. Degradation

Take an e-commerce promotion: when the system cannot handle all the traffic at the business peak and system load and CPU usage have passed the warning line, some non-core functions can be degraded to relieve the pressure, such as temporarily disabling product reviews and transaction-history queries. We sacrifice the less important features to keep core functions such as order creation and payment working normally.

Of course, different businesses and companies handle this differently, so a unified degradation plan should be agreed with the business side based on the actual scenario.

To summarize: degradation protects the availability of the core system by temporarily shutting down non-core services or components.
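A degradation switch can be as simple as an atomic flag checked before non-core work; in practice the flag would be driven by a config center or an ops command, and the names here are made up for illustration.

```go
package main

import "sync/atomic"

// reviewsDegraded is flipped to 1 (e.g. by a config-center listener or an ops
// command) when the system is under pressure and product reviews should be skipped.
var reviewsDegraded int32

func loadProductReviews(productID int64) []string {
	if atomic.LoadInt32(&reviewsDegraded) == 1 {
		return nil // degraded: return an empty list instead of calling the review service
	}
	// normal path: call the review service / DB
	return []string{"great product"}
}

func main() {
	atomic.StoreInt32(&reviewsDegraded, 1) // simulate ops turning the switch on
	_ = loadProductReviews(42)
}
```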

More:

Github.com/aalansehaiy…

About the author: Brother Tom, master's degree in computer science, joined Alibaba through campus recruitment as a P7 technical expert, holds a patent, and is a CSDN blog expert. He has worked on e-commerce transactions, community fresh-food, traffic marketing, Internet finance, and other businesses, with many years of first-line team management experience.