What solutions can we apply to real performance problems? Answering that requires an understanding of the specific optimization strategies.

Real performance issues and specific domains vary, and I can’t cover them all. But to help you understand, I’ve put together a list of 10 commonly used optimization strategies.

I’ve divided the ten strategies into five categories, each with two related strategies, to help you master them. The five categories are: space-time interconversion, advance/delayed processing, parallel/asynchronous operation, caching/batch merging, and more advanced algorithms and data structures. Let’s go through them one by one.

Category 1: Space-Time Interconversion

The first category is space-time interconversion. Science fiction movies and novels often play with transformations between time and space; in performance optimization, two strategies fit this theme: the seemingly contradictory “trade time for space” and “trade space for time.”

1. Trade time for space

The time-for-space strategy starts from the observation that “space” resources such as memory and storage are sometimes the scarcest, so you should minimize the space you occupy. For example, if a system’s biggest performance bottleneck is memory usage, then reducing memory usage is the most important optimization.

There are several concrete ways to apply this strategy (a sketch of the second follows the list):

  • Change the application’s data structures or data format to reduce the size of the data that must be stored;
  • Compress the data in memory, for example with a compression algorithm, and decompress it only when it is actually used;
  • Move some in-memory data to external, cheaper storage and retrieve it when needed.

All of these memory-saving methods generally pay a price in time.
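As a minimal sketch of the second approach, here is a Java example using only the standard java.util.zip package. The class name CompressedBlob and the 4 KiB buffer size are my own illustrative choices: the data stays compressed in memory and is decompressed only on access, paying CPU time to save space.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// A sketch of "trade time for space": keep data compressed in memory,
// paying CPU time on each access to save resident memory.
public class CompressedBlob {
    private final byte[] compressed;
    private final int originalLength;

    public CompressedBlob(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        this.compressed = out.toByteArray();
        this.originalLength = data.length;
    }

    // Decompress on demand: this is the "time" paid for the saved "space".
    public byte[] get() throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        int off = 0;
        while (!inflater.finished() && off < result.length) {
            off += inflater.inflate(result, off, result.length - off);
        }
        inflater.end();
        return result;
    }
}
```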

Beyond memory, another common scenario is reducing the size of data for network transfer and external storage. There are many compression methods and algorithms, such as the popular Zstandard (zstd) and LZ4, and they make different space/time trade-offs.

Any compression algorithm is basically measured by three metrics: compression ratio, compression speed, and memory usage.

If the system’s bottleneck is network transmission speed or storage capacity, prefer an algorithm with a high compression ratio: you spend extra CPU time to save space, which in turn saves transfer time or storage cost.

2. Trade space for time

“Trade space for time” is the opposite strategy. In some scenarios, time and speed matter more, and space is relatively plentiful, so we can consider trading space for time.

One thing to note: we will discuss a caching strategy later. Although caching is, strictly speaking, also a “space for time” technique, I treat it separately because its “space” differs from the general case: a cache usually does not sit at the same level as the original storage, but one level above it. In the “space for time” strategy discussed here, the extra space is of the same kind as the original space.

Internet services often run at national or even global scale. Users are geographically dispersed and demand fast access, which requires placing data and services as close to them as possible. “Space for time” here means making multiple copies of data and services to cover as many users as possible; the CDN (content delivery network) technology we covered earlier belongs in this category.

In fact, any large-scale system we deploy uses a space-for-time strategy to some degree. In load balancing across clusters and servers, for example, many servers (space) work simultaneously in exchange for reduced latency (time).
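The same trade also works inside a single process. Here is a minimal Java sketch (my own example): spend 256 ints of extra memory on a precomputed table so that counting the set bits of each byte becomes a table lookup instead of a recomputation.

```java
// "Trade space for time": precompute answers once, then answer each
// query with a table lookup instead of recomputation.
public class PopCount {
    private static final int[] BITS = new int[256];
    static {
        for (int i = 0; i < 256; i++) {
            BITS[i] = Integer.bitCount(i); // fill the table once at startup
        }
    }

    // O(1) per byte: memory (the table) is exchanged for CPU time.
    public static int countSetBits(byte[] data) {
        int total = 0;
        for (byte b : data) {
            total += BITS[b & 0xFF];
        }
        return total;
    }
}
```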

Category 2: Advance and Delayed Processing

The second category is advance and delayed processing, which also contains two opposing strategies: one is advance (pre-) processing, and the other is deferred (lazy) processing.

3. Advance/pre-processing

The advance/pre-processing strategy applies in many areas, such as the early loading of web page resources. Web standards specify at least two preloading mechanisms, preload and prefetch, which load resources at different priorities and can significantly improve page load performance.

Many file systems provide read-ahead (prefetch), reading extra data from disk in advance of the upper-layer application’s next read. This is very effective for sequential reads and can significantly reduce the number of disk requests, improving read performance.
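Kernel read-ahead is transparent to applications, but the same idea can be sketched at the application level in Java: BufferedInputStream fetches data in large blocks ahead of what the caller has consumed, so sequential byte reads mostly hit the in-memory buffer. The file name data.bin is a placeholder.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadAheadDemo {
    public static void main(String[] args) throws IOException {
        // The 1 MiB buffer is filled in large chunks, an application-level
        // analogue of read-ahead: most read() calls never touch the disk.
        try (BufferedInputStream in = new BufferedInputStream(
                new FileInputStream("data.bin"), 1 << 20)) {
            long checksum = 0;
            int b;
            while ((b = in.read()) != -1) {
                checksum += b; // sequential access benefits most from read-ahead
            }
            System.out.println("checksum = " + checksum);
        }
    }
}
```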

The CPU and memory also prefetch, loading instructions and data from memory into the cache ahead of time to speed up execution. Cache prefetching can be implemented in hardware or in software, known as hardware prefetch and software prefetch respectively.

Hardware prefetching is implemented by logic in the processor: the hardware continuously monitors the instructions and data requested by the running program and, following established rules, predicts and prefetches what the program will need next.

Software prefetching means inserting prefetch instructions at compile time, either by the compiler itself or by our own code, so that a prefetch operation runs at the specified point during execution.

4. Delayed/lazy processing

The deferred/lazy processing strategy is the opposite of the advance processing just described: postpone operations (such as computations) as late as possible, until they actually must be performed. There is a good chance that some operations will turn out to be unnecessary and can be skipped entirely.

The most famous example of this strategy is COW (copy-on-write). Suppose multiple threads want to operate on a piece of data. Normally, each thread would make its own copy, but copying takes time. With lazy processing, the copy is deferred: as long as every thread only reads the data, they can all share a single copy, because reads do not change it. Only when a thread needs to modify the data (a write) does the system copy the resource for that thread, so the modification does not affect the others.

COW is best known in two scenarios. One is the child process created by the Unix fork call, which shares the parent’s address space and copies pages only when it needs to write. The other is classes and containers in high-level languages, such as the CopyOnWrite containers in Java, which provide efficient access under multithreaded concurrency. Some implementations of C++ STL classes, such as the string class, have also used copy-on-write.
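Here is a minimal sketch of the Java container mentioned above. CopyOnWriteArrayList lets reader threads iterate without locking over an immutable snapshot, while each mutation copies the backing array: exactly the COW trade-off of cheap reads and expensive writes.

```java
import java.util.concurrent.CopyOnWriteArrayList;

public class CowDemo {
    public static void main(String[] args) throws InterruptedException {
        CopyOnWriteArrayList<String> listeners = new CopyOnWriteArrayList<>();
        listeners.add("listener-A");
        listeners.add("listener-B");

        // Readers iterate over an immutable snapshot: no locks, no
        // ConcurrentModificationException even if a writer runs concurrently.
        Thread reader = new Thread(() -> {
            for (String l : listeners) {
                System.out.println("notify " + l);
            }
        });

        // The writer pays the cost: add() copies the whole backing array.
        Thread writer = new Thread(() -> listeners.add("listener-C"));

        reader.start();
        writer.start();
        reader.join();
        writer.join();
    }
}
```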

Category 3: Parallel/Asynchronous Operation

The third category is parallel/asynchronous operation. Although parallel and asynchronous operations differ greatly, they are similar in spirit: both split a single line of processing into several, one physically and the other logically.

5. Parallel operations

Parallel operation physically splits one line of processing into several. Intuitively: if one person cannot finish all the work, find more people to do it. Parallel operation not only increases system throughput but also reduces users’ average wait time. Modern CPUs, for example, have many cores, and threads can run on them independently; this is parallel operation.

Parallel operation requires our programs to be scalable; a program that cannot scale cannot be parallelized. Parallelism comes at different granularities: across servers (so-called horizontal scaling), across threads, and even at the instruction level.

Most Internet servers use multiple processes or multiple threads to handle user requests, taking full advantage of multi-core CPUs. Multithreading is also a great fit when requests block on IO: the CPU is largely idle while waiting, and multiple threads let it do more useful work.
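A minimal sketch of that pattern, assuming independent requests with no shared state: a fixed Java thread pool sized to the core count handles simulated requests in parallel. The handle method is a stand-in for real per-request work.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelDemo {
    public static void main(String[] args) throws Exception {
        // One worker per core: the classic way to exploit a multi-core CPU.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // Simulate independent requests that can be handled in parallel.
        List<Callable<Long>> requests = IntStream.range(0, 100)
                .mapToObj(i -> (Callable<Long>) () -> handle(i))
                .collect(Collectors.toList());

        long total = 0;
        for (Future<Long> f : pool.invokeAll(requests)) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println("total = " + total);
    }

    private static long handle(int requestId) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += (long) requestId * i;
        return sum;
    }
}
```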

6. Asynchronous operation

Asynchronous operation, unlike parallel operation, logically splits one line of processing into several.

Let’s first clarify the concepts of synchronous and asynchronous in programming. The difference is whether a function call returns the result directly: if the call blocks and does not return until the result is available, it is synchronous; if the function returns immediately and the caller is notified later, when the data arrives, it is asynchronous.

Unix file operations have blocking and non-blocking modes, and some system calls, such as select on sockets, block. If our program always operates synchronously, performance suffers badly. Asynchronous operation adds a little complexity to the program, but greatly improves throughput.

Modern languages tend to have good support for asynchronous operations, making asynchronous programs easier to write and to read.
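Java’s CompletableFuture is one example. In this sketch (slowRemoteCall is a stand-in for real IO), supplyAsync returns immediately, the work runs on another thread, and a callback processes the result when it arrives: the asynchronous shape described above.

```java
import java.util.concurrent.CompletableFuture;

public class AsyncDemo {
    public static void main(String[] args) {
        // supplyAsync returns immediately; the slow work runs elsewhere.
        CompletableFuture<String> response = CompletableFuture
                .supplyAsync(AsyncDemo::slowRemoteCall)   // non-blocking submit
                .thenApply(body -> "parsed: " + body);    // callback on arrival

        System.out.println("caller keeps working while IO is in flight...");

        // join() only at the point where the result is truly needed.
        System.out.println(response.join());
    }

    private static String slowRemoteCall() {
        try { Thread.sleep(200); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "raw response";
    }
}
```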

Category 4: Caching and Batch Merging

The fourth category is caching and batch merging. These two strategies often cooperate in certain scenarios, so I discuss them together.

7. Cache data

The essence of caching is to speed up access. This is a very common strategy that is followed by almost every module and domain in a computer system: CPU, memory, file system, storage system, content distribution, database, etc.

We are probably most familiar with the CPU’s multiple levels of cache. File systems, storage systems, and database systems likewise keep fast caches of frequently accessed data, the goal being to maximize the cache hit ratio and avoid touching slower storage media.

For a web application service, the front end has the browser cache, CDN content stored on edge servers, and static-content caches provided by reverse proxies; the back end has server-local caches as well.

In programming, objects that are repeatedly created and destroyed at significant cost (such as sockets and threads) can also be cached, in the form of connection pools and thread pools.

For costly calculations, the results can also be cached so that they can be read back directly the next time. An effective way to optimize recursive code, for example, is to cache intermediate results, as the sketch below shows.
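A sketch of that last point: a memoized Fibonacci in Java (my own example), where a HashMap caches intermediate results so each value is computed only once instead of exponentially many times.

```java
import java.util.HashMap;
import java.util.Map;

public class Memoized {
    private static final Map<Integer, Long> cache = new HashMap<>();

    // Without the cache this recursion is exponential; with it, each
    // fib(n) is computed once and then read back directly.
    static long fib(int n) {
        if (n < 2) return n;
        Long cached = cache.get(n);
        if (cached != null) return cached;
        long result = fib(n - 1) + fib(n - 2);
        cache.put(n, result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fib(80)); // fast thanks to cached intermediate results
    }
}
```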

8. Batch merge processing

When IO is involved (such as network IO and disk IO), merging and batching operations tends to improve throughput and performance.

The most common case is batching IO reads and writes: multiple small IOs can be merged into one larger read or write, reducing both IO time and protocol overhead. For example, when writing files to GFS, clients try to write in batches to reduce IO overhead.
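A minimal Java illustration of merging small writes (the file name log.bin is a placeholder): BufferedOutputStream accumulates many tiny writes in memory and flushes them to the OS in large blocks, cutting the number of underlying write calls.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class BatchedWrites {
    public static void main(String[] args) throws IOException {
        // 64 KiB buffer: thousands of 16-byte writes become a handful
        // of large writes to the file system.
        try (BufferedOutputStream out = new BufferedOutputStream(
                new FileOutputStream("log.bin"), 64 * 1024)) {
            byte[] record = new byte[16];
            for (int i = 0; i < 100_000; i++) {
                out.write(record); // buffered, not yet an actual IO
            }
        } // close() flushes the final batch
    }
}
```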

Database reads and writes can also be merged wherever possible. For example, it is better to query a key-value database for multiple keys at once than to break them into separate queries.
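For relational databases, JDBC’s standard batching API expresses the same idea. This sketch assumes an existing Connection and a hypothetical table kv(k, v); one executeBatch round trip replaces a round trip per row.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public class DbBatch {
    // One round trip for many rows instead of one round trip per row.
    static void insertAll(Connection conn, Map<String, String> rows)
            throws SQLException {
        String sql = "INSERT INTO kv (k, v) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Map.Entry<String, String> e : rows.entrySet()) {
                ps.setString(1, e.getKey());
                ps.setString(2, e.getValue());
                ps.addBatch();          // queue the row locally
            }
            ps.executeBatch();          // send them all at once
        }
    }
}
```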

When network requests are involved, transmission time can far exceed request processing time, so network requests should be consolidated as well. At the application-protocol level, avoid overly chatty end-to-end conversations; otherwise overall application-layer throughput will suffer.

Category 5: More Advanced Algorithms and Data Structures

The final category is more advanced algorithms and data structures. These two strategies work closely together: advanced algorithms sometimes require advanced data structures, and both bear directly on how a program’s code is designed, so I discuss them together.

9. Advanced algorithms

For the same problem there are always different algorithmic implementations, and therefore different performance. Sorting algorithms are a good example: some implementations trade time for space, others space for time, so you need to choose based on your own situation.

For each specific scenario (the size of the input set, the time and space requirements, the distribution of the data, and so on), some algorithm will be the best fit. We need to consider the actual situation and choose accordingly.
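A concrete Java illustration of such a trade-off (my own example, not from the original text): detecting duplicates with nested loops needs no extra memory but quadratic time, while a HashSet spends linear extra space to get linear time.

```java
import java.util.HashSet;
import java.util.Set;

public class DuplicateCheck {
    // O(n^2) time, O(1) extra space: fine for tiny inputs.
    static boolean hasDuplicateNaive(int[] a) {
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] == a[j]) return true;
        return false;
    }

    // O(n) time, O(n) extra space: the better algorithm for large inputs.
    static boolean hasDuplicateHashed(int[] a) {
        Set<Integer> seen = new HashSet<>();
        for (int x : a)
            if (!seen.add(x)) return true; // add() returns false on a repeat
        return false;
    }
}
```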

10. Efficient data structures

As with algorithms, the properties of different data structures vary greatly.

There is no single data structure that is best for every situation. Take the list implementations you likely use a lot in Java, such as ArrayList, Vector, and LinkedList: the right choice depends on several metrics, including the cost of adding elements, deleting elements, querying elements, and traversing. We must weigh these against the actual workload to find the most efficient structure, as the sketch below illustrates.
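A small sketch of why the choice matters, as a rough timing rather than a rigorous benchmark: indexed access is constant-time on ArrayList but linear on LinkedList, so the same loop behaves very differently on the two.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListChoice {
    public static void main(String[] args) {
        int n = 100_000;
        List<Integer> array = new ArrayList<>();
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < n; i++) { array.add(i); linked.add(i); }

        // Random indexed access: O(1) on ArrayList, O(n) on LinkedList.
        long t0 = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i += 37) sum += array.get(i);
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i += 37) sum += linked.get(i);
        long t2 = System.nanoTime();

        System.out.printf("ArrayList get: %d us, LinkedList get: %d us (sum=%d)%n",
                (t1 - t0) / 1000, (t2 - t1) / 1000, sum);
    }
}
```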

Summary

Solving performance problems requires strategy, and different people in different scenarios will sometimes use the same strategies and sometimes different ones, just as in Han Yu’s lines: “The grass and trees know spring will soon depart; a hundred shades of red and purple compete in bloom.” The flowers and trees bloom in all their different ways, but ultimately because they all “know spring will soon depart.”

Similarly, these performance optimization strategies are sometimes easy to think of and sometimes not so intuitive, so we need to think about them systematically and hierarchically; this lecture is meant to help you build that framework.

By summarizing these ten strategies, I hope you can think about the same problem from different angles. Sometimes a problem seems to have no solution, but by thinking about it in multiple directions, you may suddenly find a very good solution.
