Background

One day I received a notice: a popular anchor was going to run an event in the voice chat room, and we needed to do some quality assurance work for it.

Since the room already benefited from our earlier IM pre-layout work, and iPhones perform well in general, I assumed this would be a walk in the park…

However, once the load test pushed the message rate to several hundred per second, the iPhone 6 Plus test device froze and the screen stopped responding entirely.

Locating the problem

I’m sure many developers have read articles on performance optimization; now it was time to actually put them into practice.

Following those articles, I made a flurry of changes as fierce as a tiger, then looked down at the scoreboard: zero to five. Lots of effort, no results.

Pinpointing the key problem is not something you can do by rote, especially once the usual general optimizations have already been applied.

CPU or GPU

First, determine which category the stutter falls into:

  • CPU-bound stutter

  • GPU-bound stutter

We can use tools to see what happens to the CPU/GPU when stuttering occurs, and Xcode comes with these features as well.

Here I’d recommend DoraemonKit from the Didi team, which integrates a number of development tools to improve efficiency, such as view hierarchy inspection, network request monitoring, and CPU usage tracking.

Because the project already had it integrated, I used it directly to observe CPU changes, which was very convenient.

Running the load test again, the CPU curve shot straight up to 100%, visibly spiking within seconds.

Admittedly, the device heating up is also a telltale sign of CPU consumption, but we need to let the data speak. What does the business run on? Data :)

As for why the stutter happens, the article iOS Tips for Keeping the Interface Smooth is highly recommended and well worth rereading; it’s a classic.

What operations cause CPU stutter?

Now that the problem has been traced to the CPU, does the answer follow immediately?

The answer is no; we are still some distance from the actual cause.

Look up the operations that typically drive CPU consumption and you will find the following seven items:

  • Object creation
  • Object adjustment
  • Object destruction
  • Text calculation
  • Text rendering
  • Image decoding
  • Image drawing

In fact, if you go through the code line by line, you’ll find that almost every line falls into one of these seven categories: object creation/adjustment/destruction, text calculation/rendering, and image decoding/drawing.

Where exactly is the line of code in question, and what do you do next? An experienced engineer might be able to spot the problem code at a glance.

Well, if you’re not that kind of expert, you clearly need a method.

Sorting out the business flow

It’s like fighting a war. You can only win if you know your enemy and you know yourself.

First, sort out the business process of the problem scenario and understand the whole process in detail.

A recommended way to sort out the business flow is to draw a diagram, which lets you see the relationships between the parts of the business, the processing flow, and the data flow at a glance.

Large, complex modules are usually built or maintained by many people. Here you have to cooperate with your teammates: find the people who know each part best and finish the sorting-out as quickly as possible.

For the voice room / live room scenario that went wrong this time, we can first group the messages into broad categories: gift messages (with banner animations, full-screen animations, etc.), picture messages, text messages, and so on.

Then sort out each category in turn, for example the general flow of a text message:

Of course, the diagram above is only a rough example 🌰; in the real scenario there are many processing steps inside RoomChatHandle alone.

Sorting out the process is also a great opportunity to straighten out the business and to spot design issues, such as mutual dependencies where there should be a clean top-down relationship. Drawing the business relationships out makes them easy to expose.

In short, get the business logic straight in your head first.

Control variable method

Once you’ve sorted out the business flow, you can get down to business.

I recommend the controlled-variable method, or to put it more plainly, the process of elimination!

This is the simplest way to debug… Anyone who has run a small experiment in middle school, or debugged anything at all, will understand it.

For example:

    a = white
    b = white
    c = blue

    // expected: d = white
    d = a + b + c = blue

Then remove the influence of a/b/c one at a time, and you finally find:

    d = a + b = white
    d = a + c = blue
    d = b + c = blue

It’s easy to see that something is wrong with this variable c, which is affecting our expected outcome.

The method is fairly primitive, but after all, the finest ingredients often call for the simplest cooking, and high-end problems often only need the simplest debugging!

Using controlled variables to find problems really is a lot like cooking: how fast you find the problem depends on mastering the heat, that is, on choosing the right granularity:

Do I eliminate one line of code at a time? One method call? Or several method calls at once?

Gradually narrow the scope of the problem and keep moving forward.

It’s easy to say, but the whole process is actually quite boring. The more complex the module, the harder it is.

Eventually, the key offending code was found:

    [tableView reloadData];

In the original approach, every single piece of data triggered a UI refresh, so 100 pieces of data per second meant 100 UI refreshes per second. The refresh frequency was far too high: the server pushed a lot of data, the UI refreshed just as often, and the resulting load directly caused the freeze.
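For illustration, the original pattern amounted to roughly the following sketch. This is a reconstruction, not the project's actual code, and the class and property names are assumptions.

    #import <UIKit/UIKit.h>

    @interface RoomChatViewController : UIViewController
    @property (nonatomic, strong) UITableView *tableView;
    @property (nonatomic, strong) NSMutableArray *messages;
    @end

    @implementation RoomChatViewController

    // Every incoming message triggers a full reload:
    // 100 messages per second means 100 reloadData calls per second.
    - (void)onMessageReceived:(id)message {
        [self.messages addObject:message];
        [self.tableView reloadData];
    }

    @end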

Solving the problem

Once the problem has been located, it can be solved in a targeted way. The core problem is:

Too many UI refreshes in a short time put excessive pressure on the CPU.

So the first step is to control the frequency.

Controlling the frequency

The obvious goal is to minimize the number of refreshes, but two issues are still involved:

  • The frequency-control strategy
  • Choosing the threshold

The frequency-control strategies are the ones covered in articles on iOS function throttling and debouncing, namely Throttle and Debounce:

  • Throttle: execute the method at most once per unit of time.
  • Debounce: every call within the unit of time postpones execution by another full cycle, so the method only runs once no new calls arrive.

Because this is a refresh method, if messages keep coming in, Debounce would be postponed indefinitely and the UI would never refresh. Throttle is therefore the right way to control the frequency: execute at most once per unit of time.
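A minimal Throttle sketch for the reload call might look like this; TableReloadThrottler is a hypothetical helper, not the project's actual code. It schedules at most one reloadData per interval and swallows the calls in between.

    #import <UIKit/UIKit.h>

    @interface TableReloadThrottler : NSObject
    @property (nonatomic, weak) UITableView *tableView;
    @property (nonatomic, assign) NSTimeInterval interval;   // e.g. 1.0 s
    @property (nonatomic, assign) BOOL pending;
    @end

    @implementation TableReloadThrottler

    // Call on the main thread whenever new data arrives.
    - (void)setNeedsReload {
        if (self.pending) return;                  // a reload is already scheduled
        self.pending = YES;
        __weak typeof(self) weakSelf = self;
        dispatch_after(dispatch_time(DISPATCH_TIME_NOW,
                                     (int64_t)(self.interval * NSEC_PER_SEC)),
                       dispatch_get_main_queue(), ^{
            weakSelf.pending = NO;
            [weakSelf.tableView reloadData];       // at most one reload per interval
        });
    }

    @end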

In experiments, limiting ordinary text messages to one refresh per second cut CPU usage by 30-40% straight away.

Differentiating high- and low-performance devices

Controlling the refresh rate has been very effective, but the granularity is still too coarse.

The devices that stutter are the low-performance ones; applying the same strategy to high-performance devices is not good enough.

For high-performance devices, the restrictions on data processing can be relaxed appropriately to make full use of the hardware and improve the user experience.

Therefore, finer-grained thresholds can be tested and suitable values chosen for high-performance devices. For particularly powerful CPUs, the restriction can even be lifted entirely and the control strategy only enabled once CPU usage crosses a critical point.
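As a rough illustration, the interval could be picked by hardware tier. The cut-offs below are assumptions for the sketch, not values tuned against the real app.

    #import <Foundation/Foundation.h>

    // Pick a refresh interval per device tier (thresholds are assumed, not tuned).
    static NSTimeInterval PreferredReloadInterval(void) {
        NSProcessInfo *info = [NSProcessInfo processInfo];
        NSUInteger cores = info.activeProcessorCount;
        double memoryGB = info.physicalMemory / (1024.0 * 1024.0 * 1024.0);
        if (cores >= 6 && memoryGB > 3.5) {
            return 0.25;    // high-end device: up to 4 refreshes per second
        } else if (cores >= 4 && memoryGB > 1.5) {
            return 0.5;     // mid-range device
        }
        return 1.0;         // low-end device such as the iPhone 6 Plus: once per second
    }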

Data distribution & processing

Batch data distribution

While the main issue has been addressed, the original distribution strategy also turns out to be rather crude:

When one piece of data arrives, it is dispatched immediately; when 100 pieces arrive, they are dispatched 100 separate times.

So we create a consolidated, or batched, distribution strategy for the data, for example:

Within any 0.5-second window, data is dispatched only once; if 100 messages arrive in a row, they are held until the end of the window and then dispatched all at once.

This greatly reduces distribution overhead by merging many dispatches into one. It is the same idea as the UI refresh throttling above, only applied at the data level.
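A minimal batching sketch, assuming a MessageBatcher helper used from the main thread; the names and the 0.5 s window are illustrative, not the project's real code.

    #import <Foundation/Foundation.h>

    @interface MessageBatcher : NSObject
    @property (nonatomic, strong) NSMutableArray *buffer;
    @property (nonatomic, strong) NSTimer *timer;
    @property (nonatomic, copy) void (^flushHandler)(NSArray *batch);
    @end

    @implementation MessageBatcher

    - (instancetype)init {
        if (self = [super init]) {
            _buffer = [NSMutableArray array];
            __weak typeof(self) weakSelf = self;
            // Flush at most once every 0.5 s; the main run loop drives the timer.
            _timer = [NSTimer scheduledTimerWithTimeInterval:0.5 repeats:YES
                                                       block:^(NSTimer *t) {
                [weakSelf flush];
            }];
        }
        return self;
    }

    - (void)append:(id)message {
        [self.buffer addObject:message];     // just enqueue, no UI work here
    }

    - (void)flush {
        if (self.buffer.count == 0) return;
        NSArray *batch = [self.buffer copy];
        [self.buffer removeAllObjects];
        if (self.flushHandler) self.flushHandler(batch);   // hand over the whole batch
    }

    - (void)dealloc {
        [_timer invalidate];
    }

    @end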

Discard strategy

There is another problem with bulk distribution:

Because data is dispatched in batches, incoming data has to be buffered first.

If too much data comes in, say we can process 300 entries per second but 500 arrive, messages will pile up over time and may even cause a new memory problem (OOM).

To avoid a backlog of messages, we also need a discard strategy, which simply means dropping some of the data.

Only unnecessary messages should be discarded, and that is a product question: confirm with the product or operations colleagues which messages are unimportant, for example enter/leave-room messages, ordinary chat messages…
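Extending the MessageBatcher sketch above, a discard step might look like the following; the cap and the droppable message types are assumptions that would need confirming with the product team.

    static const NSUInteger kMaxBufferedMessages = 300;   // assumed cap

    @implementation MessageBatcher (Discard)

    - (void)trimBufferIfNeeded {
        if (self.buffer.count <= kMaxBufferedMessages) return;
        // First pass: drop the unimportant types (assumed message shape:
        // a dictionary with a "type" field).
        NSIndexSet *droppable = [self.buffer indexesOfObjectsPassingTest:
            ^BOOL(NSDictionary *msg, NSUInteger idx, BOOL *stop) {
                NSString *type = msg[@"type"];
                return [type isEqualToString:@"enter_room"] ||
                       [type isEqualToString:@"leave_room"] ||
                       [type isEqualToString:@"plain_chat"];
        }];
        [self.buffer removeObjectsAtIndexes:droppable];
        // Second pass: if still over the cap, drop the oldest messages.
        if (self.buffer.count > kMaxBufferedMessages) {
            NSUInteger excess = self.buffer.count - kMaxBufferedMessages;
            [self.buffer removeObjectsInRange:NSMakeRange(0, excess)];
        }
    }

    @end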

Of course, there will also be product folks who refuse to give anything up: render every last one for me!

Technical progress is often driven by the business; demanding product requirements push the technology to a higher standard.

Asynchronous multithreading

The original data processing did not use asynchronous multithreading, so another optimization was to move the data processing off the main thread and handle it asynchronously.

For asynchronous multithreaded processing, GCD is enough.

It looks simple, but there are pitfalls: constantly spawning threads can lead to thread explosion, and more threads are not necessarily better.

The machine only has so many cores, so the number of threads that can truly run at once is limited. Simply put, on a dual-core machine, two threads can process data comfortably in parallel; with four threads, the system has to keep switching tasks back and forth through thread scheduling, which is not true simultaneous execution. That is the difference between concurrency and parallelism.

We can stand on the shoulders of our predecessors: use or imitate YYDispatchQueuePool from YYKit, and submit data-processing tasks to a queue of the desired priority.
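A plain GCD sketch under those constraints (this is not YYDispatchQueuePool itself; parseMessages: and appendModels: are hypothetical stand-ins for the real processing and UI code): one dedicated serial queue keeps the thread count bounded, and UI updates hop back to the main queue.

    // One fixed serial queue for message processing instead of spawning threads ad hoc.
    static dispatch_queue_t RoomMessageProcessingQueue(void) {
        static dispatch_queue_t queue;
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            queue = dispatch_queue_create("com.example.room.messages",
                                          DISPATCH_QUEUE_SERIAL);
        });
        return queue;
    }

    - (void)handleIncomingBatch:(NSArray *)batch {
        dispatch_async(RoomMessageProcessingQueue(), ^{
            NSArray *models = [self parseMessages:batch];     // heavy work off the main thread
            dispatch_async(dispatch_get_main_queue(), ^{
                [self appendModels:models];                   // UI update back on the main thread
            });
        });
    }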

Besides limiting the number of threads, you also need to watch out for deadlocks.

Think further

After the above optimizations, not only the iPhone 6 Plus but even the iPhone 6 shrugs off the server's message storms.

Building on this, let's keep thinking: what if the data volume keeps growing and we hit a tricky situation where even the remaining important messages exceed our limit? Say they are all gift messages worth more than 500 yuan, none can be discarded, and every one must be displayed. How should we optimize then?

Coroutines

Is there a better way to handle the data than plain multithreading? Coroutines might be it.

The performance advantages of coroutines over threads:

  • Faster scheduling: coroutines do not require kernel-level thread switches, so scheduling is cheap; creating even thousands of coroutines is no pressure.
  • Lower latency: coroutines help reduce the abuse of locks and semaphores. By wrapping blocking interfaces such as I/O in coroutine-style APIs, latency can be cut at the root and the overall performance of the application improved.

Asynchronous rendering

Go one step further and use asynchronous drawing methods like YYAsyncLayer or Texture.

Since we also use YYLabel, once asynchronous drawing is enabled the content stays blank for a moment before it appears, and that visible delay is not a great experience. It is better to enable it selectively, only in extreme cases.
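For example, async drawing could be toggled only when the room is under pressure; the underHeavyLoad flag below is an assumed signal (say, derived from the CPU monitoring mentioned earlier), not part of YYLabel's API.

    #import <YYKit/YYKit.h>

    // Enable YYLabel's asynchronous drawing only in extreme cases.
    YYLabel *label = [YYLabel new];
    label.displaysAsynchronously = underHeavyLoad;   // BOOL flag from our own monitoring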

It’s a trade-off between performance and experience.

Conclusion

Although this is an article about performance optimization, it is still worth repeating that premature optimization is the root of all evil.

Multithreading and asynchronous drawing, for example, are optimizations that can introduce experience problems of their own. One of the best ways to avoid multithreading problems is simply not to use multithreading.

What matters most in performance optimization is accumulating experience in diagnosing and fixing problems, and picking up some methods and ways of thinking along the way.

The hardest part is usually finding the problem: often one or a few lines of code cause the whole impact, yet locating them takes most of the time.

If you have better ideas or experience, I’d love to hear and exchange them!