
High concurrency is achieved from two directions: increasing throughput and reducing response latency.

The goals of performance optimization

  1. Reduce response time
  2. Increase concurrency
  3. Keep the system in a reasonable, stable state

Optimization techniques

  1. When time is the bottleneck

When running time is the system's main bottleneck and space is plentiful, trade space for time: store computation results in a cache and reuse them, since reading from the cache is far cheaper than recomputing.
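A minimal sketch of this space-for-time trade, caching computed results so they are never recomputed (Fibonacci and all names here are just an illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class FibCache {
    // Previously computed results; trades memory for CPU time.
    private static final Map<Integer, Long> CACHE = new HashMap<>();

    public static long fib(int n) {
        if (n <= 1) return n;
        Long cached = CACHE.get(n);        // reuse an earlier result if present
        if (cached != null) return cached;
        long value = fib(n - 1) + fib(n - 2);
        CACHE.put(n, value);               // store for later reuse
        return value;
    }
}
```

Without the cache this computation is exponential; with it, each value is computed once.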

  2. When space is the bottleneck

Here the system runs fast enough but is constrained by space, so we can trade time for space. Two examples:

  1. When network transmission is the bottleneck, spend CPU time on HTTP gzip compression, trading time for bandwidth.

  2. An app's category interface can use a version number to determine which data has changed, so the client downloads only the data that needs updating.
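The gzip trade-off can be sketched with `java.util.zip` (in a real service the web container or a filter usually does this; the class name here is illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

public class Gzip {
    // Compress a byte array: costs CPU time, saves bytes on the wire.
    public static byte[] compress(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory stream: not expected
        }
        return bos.toByteArray(); // gz.close() has flushed everything into bos
    }
}
```

Text payloads (HTML, JSON) typically compress to a fraction of their original size.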

  3. Finding system bottlenecks

    Analyze the system's business processes, find the critical path, and decompose it for optimization.

    For example, in a service cluster the top five interfaces may account for 80 percent of the traffic; those interfaces are the critical path.

    Code optimization on the critical path usually yields the largest return; other paths can be optimized afterwards.

The overall optimization checklist: how many RPC interfaces are called, how much data is loaded, which algorithm is used, whether non-core steps can be made asynchronous, and whether logic with no data dependencies can run in parallel.
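The last point, running data-independent logic in parallel, can be sketched with CompletableFuture (the suppliers stand in for real RPC calls; names are illustrative):

```java
import java.util.concurrent.CompletableFuture;

public class ParallelCalls {
    // Two pieces of logic with no data dependency run concurrently, so
    // total latency approaches max(a, b) instead of a + b.
    public static int loadBoth() {
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 1); // e.g. RPC call A
        CompletableFuture<Integer> b = CompletableFuture.supplyAsync(() -> 2); // e.g. RPC call B
        return a.join() + b.join(); // wait for both, then combine
    }
}
```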

Levels of optimization

  1. Architectural design level

    Focus on the system's control flow and data flow.

    How to split the system so that load is balanced across its parts and internal overhead is reduced.

  2. Algorithmic logic level

Focus on choosing efficient algorithms, optimizing algorithmic logic, processing tasks in parallel, trading space against time, and using lock-free data structures (CAS) to avoid locking.
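A minimal sketch of lock-free updating with CAS, using `java.util.concurrent.atomic` (class name is illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Lock-free increment: retry compareAndSet until no other thread
    // changed the value between our read and our write.
    public int increment() {
        int current;
        do {
            current = value.get();
        } while (!value.compareAndSet(current, current + 1));
        return current + 1;
    }

    public int get() {
        return value.get();
    }
}
```

No thread ever blocks holding a lock; under contention a thread simply retries.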

Space for time

ThreadLocal: each thread keeps its own copy of a resource, spending memory to avoid contention.
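A classic ThreadLocal example: SimpleDateFormat is not thread-safe, so each thread keeps its own instance instead of synchronizing on a shared one (class name is illustrative):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateFormatter {
    // SimpleDateFormat is mutable and not thread-safe. One instance per
    // thread trades memory for lock-free access.
    private static final ThreadLocal<SimpleDateFormat> FMT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date date) {
        return FMT.get().format(date); // each thread sees its own formatter
    }
}
```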

Time for space

Compress data with a compression algorithm, or use more complex logic, to reduce the amount of data stored and transmitted.

  3. Code optimization level

Focus on code-level details: whether the implementation is reasonable, whether too many objects are created, whether loops are efficient, whether caches are used properly, and whether computation results are reused.

Code-level optimizations include the following:

Whether loop traversal is efficient:

Do not call RPC interfaces, query distributed caches, or execute SQL inside a loop. Call a batch interface first to assemble the data, then process it in the loop.
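A sketch of the batch-then-loop pattern; `batchGetNames` is a hypothetical stand-in for a real batch RPC, so the only remote round trip happens once, before the loop:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchLookup {
    // Hypothetical stand-in for a real batch RPC: one round trip for all
    // ids, instead of one remote call per id inside the loop.
    static Map<Long, String> batchGetNames(List<Long> ids) {
        Map<Long, String> result = new HashMap<>();
        for (Long id : ids) result.put(id, "user-" + id);
        return result;
    }

    public static List<String> greetAll(List<Long> ids) {
        Map<Long, String> names = batchGetNames(ids); // single batched call
        List<String> greetings = new ArrayList<>(ids.size());
        for (Long id : ids) {
            greetings.add("hello " + names.get(id));  // pure in-memory loop
        }
        return greetings;
    }
}
```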

Avoid creating too many objects or useless objects; for example, check the log level before building a log message, so no object is constructed for a line that will never be output.

Check whether the initial capacity of ArrayList and HashMap is set properly.
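A sketch of capacity pre-sizing (class and method names are my own). HashMap resizes once its size exceeds capacity × load factor (0.75 by default), so divide by the load factor when the expected entry count is known:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Presized {
    // Pre-sizing avoids repeated internal resizing and rehashing while
    // the map grows to its expected size.
    public static Map<String, Integer> indexOf(List<String> items) {
        Map<String, Integer> index = new HashMap<>((int) (items.size() / 0.75f) + 1);
        for (int i = 0; i < items.size(); i++) {
            index.put(items.get(i), i);
        }
        return index;
    }
}
```

The same idea applies to `new ArrayList<>(expectedSize)`, which avoids repeated array copies.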

Whether data objects can be reused properly; for example, data already fetched via RPC should be reused rather than queried again.

Choose an appropriate data structure based on access patterns; for data that is read often and written rarely, consider CopyOnWrite collections.

Whether strings are appended with String concatenation or StringBuilder (with its capacity pre-allocated, StringBuilder can perform roughly 15 times better than repeated String concatenation).
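A sketch of pre-allocating StringBuilder capacity (the ~15x figure will vary by workload; names here are illustrative):

```java
public class Joiner {
    // String "+" in a loop creates a brand-new String (a full copy) per
    // iteration: O(n^2) work overall. A pre-sized StringBuilder appends
    // into one buffer with no intermediate copies.
    public static String join(String[] parts) {
        int total = 0;
        for (String p : parts) total += p.length();
        StringBuilder sb = new StringBuilder(total); // capacity pre-allocated
        for (String p : parts) sb.append(p);
        return sb.toString();
    }
}
```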

Whether data is initialized at the right time: globally shared data can use eager ("hungry-style") initialization so that it is ready before any user accesses it.
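A minimal sketch of eager ("hungry-style") initialization, the eager singleton pattern:

```java
public class GlobalConfig {
    // Eager initialization: the instance is created at class-load time,
    // before any user request touches it, so no request pays the
    // initialization cost and no synchronization is needed on reads.
    private static final GlobalConfig INSTANCE = new GlobalConfig();

    private GlobalConfig() {
        // load shared, read-only data here
    }

    public static GlobalConfig getInstance() {
        return INSTANCE;
    }
}
```

The trade-off versus lazy initialization: memory is spent even if the data is never used, but request latency and locking are avoided.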

Database fields should use the smallest adequate type. For example, a state field whose values fit within 0-255 can use unsigned tinyint, and an IPv4 address can be stored as an int instead of a varchar.

Prefer tinyint over the enum column type for enumerated values: extending an enum requires altering the table.
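An illustrative sketch of the int-vs-varchar point (class and method names are my own), converting an IPv4 address to and from a 32-bit int; MySQL provides INET_ATON/INET_NTOA for the same conversion in SQL:

```java
public class IpCodec {
    // An IPv4 address fits in 32 bits, so it can be stored as an int
    // (4 bytes) instead of a varchar of up to 15 characters.
    public static int toInt(String ip) {
        String[] parts = ip.split("\\.");
        int value = 0;
        for (String part : parts) {
            value = (value << 8) | Integer.parseInt(part); // shift in each octet
        }
        return value;
    }

    public static String toString(int value) {
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
             + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }
}
```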

Do not use select * to query data; select only the required fields, to avoid wasting disk IO, memory, CPU, and network transfer.

Analyze query scenarios to build appropriate indexes: consider field selectivity and index length, and use prefix indexes for long varchar columns.

The MySQL manual notes that nullable columns require extra storage to represent NULL and make queries harder to optimize.

Conclusion:

All of the above aim to reduce server CPU usage, IO traffic, memory usage, network consumption, response time, and so on.

The CPU Cache structure

From slowest to fastest: main memory -> L3 -> L2 -> L1, a multi-level cache hierarchy.

What data is suitable for caching?

Data with highly concentrated, high-frequency access and low freshness requirements is a good fit for caching. For example, business data such as channels, columns, and advertising slots is accessed very frequently, and its updates do not need to be reflected in real time, so caching it improves performance.

If you have high requirements for data timeliness, you need to consider the consistency of updating the cache.

There is a tension between freshness and caching. For example, a product service caches product data; since updating the cache and updating the product are not one transaction, when the freshness requirement is transaction-like, product information can only be queried from the database.
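A minimal cache-aside sketch (in-memory maps stand in for the real cache and database; all names are illustrative), showing why the cache and the database are not updated in one transaction:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProductCache {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final Map<Long, String> db = new ConcurrentHashMap<>(); // stand-in for the database

    // Cache-aside read: try the cache first; on a miss, load from the
    // database and populate the cache.
    public String get(long id) {
        String v = cache.get(id);
        if (v == null) {
            v = db.get(id);                 // cache miss: go to the database
            if (v != null) cache.put(id, v);
        }
        return v;
    }

    // Update: write the source of truth first, then invalidate (not
    // update) the cache entry. The two steps are separate operations,
    // so a small staleness window remains.
    public void update(long id, String value) {
        db.put(id, value);
        cache.remove(id);
    }
}
```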

Architectural design optimization techniques:

  1. Splitting distributed systems into microservices
  2. Database and table sharding, read-write separation, data partitioning
  3. Stateless design and horizontal scaling
  4. Sorting out call links and placing hotspot data as close to the user as possible
  5. Distributed caches; multi-level and multi-type caches
  6. Capacity planning
  7. Rejecting excess requests early to keep the system flexibly available

Case analysis

Flash-sale ("seckill") system

Scenario analysis:
  • Extremely high concurrency: 99% of user requests arrive at the server within a short window
  • Few requests are actually valid, on the order of the remaining inventory in the database
  • Strict inventory consistency: the item must not be oversold
Architectural design

The data needs to be layered.

Requests are checked layer by layer, with upper layers filtering out as many invalid requests as possible.

Upper layers may filter imprecisely.

The final layer performs the data consistency check and deducts inventory.

Concrete techniques

  1. Cache static resources such as HTML, JS, and CSS on the client (app or browser).

  2. Non-real-time dynamic data (product title, description, images, URL list, store information, flash-sale activity information, etc.) is cached at points on the access link close to the user and used for coarse filtering, such as whether the user is eligible for the sale or whether the sale has ended. Freshness requirements for this data are low; it sits in the first-level cache and is synchronized via MQ when the database changes.

  3. Real-time data such as user marketing data (red packets, discounts) and merchandise inventory filters out a further batch of users. This data sits in the second-level cache and is also synchronized via MQ.

  4. After multi-layer filtering, very little traffic finally reaches the database, where transactions guarantee that inventory deduction is accurate.
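The no-oversell guarantee at the final layer can be sketched as follows, with an in-memory CAS loop standing in for the database-level conditional update (e.g. an UPDATE guarded by `stock > 0`); the class is illustrative, not the real storage layer:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class Inventory {
    private final AtomicInteger stock;

    public Inventory(int initial) {
        this.stock = new AtomicInteger(initial);
    }

    // Deduction succeeds only while stock is positive, so concurrent
    // buyers can never drive the count below zero (no overselling).
    public boolean tryDeduct() {
        int current;
        do {
            current = stock.get();
            if (current <= 0) return false; // sold out: reject the request
        } while (!stock.compareAndSet(current, current - 1));
        return true;
    }

    public int remaining() {
        return stock.get();
    }
}
```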

Feed system

Example: WeChat Moments

Features:

  1. Read-heavy, write-light: roughly 100 reads per write
  2. Hot and cold data are clearly separated: about 80% of accesses hit the current day's data, and about 20% of users are active
  3. Obvious hotspot effects during breaking events, major holidays, and daily peaks
  4. High traffic

Points to note

Reads far outnumber writes and hot/cold data are clearly separated, so cache hot data closer to the user on the call link.

The L1 cache is small and holds only the hottest data. The L2 cache is larger and covers a wider range, such as ordinary users' timelines. Very hot data, such as whitelisted or big-V (high-follower) users' data, is cached separately.

Analyze the access proportions of the first few pages of each timeline (the following timeline, topic timelines, and some operational timelines). If, say, the first three pages account for 97 percent of accesses, put those pages into the L1 cache as hot data.

Push or pull

When a user publishes content, decide whether to push the data to followers (friends) or have followers pull messages themselves: this is the push-versus-pull choice.

For example, on Weibo the most-followed celebrities have tens of millions of followers. When such a user posts a message, think about how the message fans out.

Implementation

  1. Fan messages out on write through a unified push channel.

  2. Push strategy: split the data and push in parallel, pushing to active users first and to inactive users gradually. For example, with 10,000 followers, split them into 100 batches and push the batches in parallel; if 2,000 of the followers are active, push to them first and to the inactive users gradually.

    Maintain a list of active users; a user who comes online is marked active.

  3. Standardized message format.

  4. Unified data flow with clear responsibilities.
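The push strategy in step 2 can be sketched as follows; the names, batch size, and thread-pool size are illustrative, and a set of delivered ids stands in for the real push (an RPC or MQ send):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FanOut {
    // Stand-in for each follower's inbox: the set of user ids reached.
    static final Set<Long> DELIVERED = ConcurrentHashMap.newKeySet();

    // Split followers into fixed-size batches and push each batch on a
    // thread pool; active users' batches are submitted first.
    public static void push(List<Long> activeFollowers, List<Long> inactiveFollowers,
                            int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Long> ordered = new ArrayList<>(activeFollowers); // active first
        ordered.addAll(inactiveFollowers);                     // then the rest
        for (int i = 0; i < ordered.size(); i += batchSize) {
            List<Long> batch =
                new ArrayList<>(ordered.subList(i, Math.min(i + batchSize, ordered.size())));
            pool.submit(() -> batch.forEach(DELIVERED::add));  // stand-in for the real push
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Batching bounds the work per task, and ordering active users first gets the message to the users most likely to read it soonest.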

Data Store Selection