This article was translated from “Overlooked Essentials For Optimizing Code” by Walter Bright, a Dr. Dobb’s blogger.

In short:

  1. Use a profiler.
  2. Look at the assembly code your compiler generates.

People who use these two techniques will succeed in writing fast code; people who don’t, won’t. Let me explain.

Use a Profiler

We have all heard that a program spends 90% of its running time in 10% of its code. I have found that to be inaccurate: over and over again, almost every program I have measured spends 99% of its running time in 1% of its code. But which 1%? A good profiler can tell you. Even spending 100 hours optimizing that 1% pays off far more than spending 100 hours on the other 99%.

So what’s the problem? That people don’t own profilers? No. One place I worked had bought a flashy, expensive profiler, and three years later its box was still shrink-wrapped. Why don’t people use profilers? I honestly don’t know. Once, a colleague and I were sent to an overloaded telephone exchange, and he insisted he knew where the bottleneck was — after all, he was an experienced expert. When I finally ran my profiler on his project, the bottleneck turned out to be somewhere entirely unexpected.

Think of a race car. Winning teams instrument everything with sensors and logs. You can adjust the driver’s pants to make him more comfortable during the race, but that won’t win the race, nor will it make you more competitive. If you don’t know whether you are losing speed because of the engine, the exhaust, the aerodynamics, the tire pressure, or the driver, you won’t win. Why should programming be any different? You can never improve what you don’t measure.

There are plenty of profilers available. Pick any one and you can see the call hierarchy of your functions, the number of calls, and the time breakdown for each line of code, even down to the assembly level. I have seen too many programmers shy away from profilers and instead spend their time on useless, misguided “optimizations” while being humiliated by their competitors.
When using a profiler, look for two things: 1) functions that consume a lot of time themselves — improve their algorithms; and 2) functions that are called an enormous number of times — if a function is called 300,000 times per second, shaving just 0.001 milliseconds off each call saves 0.3 seconds of CPU time every second, which is a big win. This is what the author means by 1% of the code using 99% of the CPU time.


View the Assembly Code

A few years ago I had a colleague, Mary Bailey, who taught remedial algebra at the University of Washington. She once wrote x + 3 = 5 on the blackboard and asked her students to “solve for x”, and they didn’t know the answer. Then she wrote __ + 3 = 5 and asked them to “fill in the blank”, and every student could answer. The unknown x is like a magic letter that makes people think, “x means algebra, I didn’t take algebra, so I don’t know how to do this.”

Assembly is the algebra of the programming world. If someone asks me “Will the compiler inline this function?” or “If I write i * 4, will the compiler optimize it into a left shift?”, I recommend they look at the compiler’s assembly output. Rude and useless as that answer may seem, the questioner usually replies, “Sorry, I don’t know what assembly is!” Even C++ experts say so. Yet assembly is the simplest of programming languages (certainly compared to C++). For example:

ADD ESI,x

is, in C-style code:

ESI += x;

And:

CALL foo

is:

foo();

The details vary with the type of CPU, but that is the flavor of it. Sometimes you don’t even need the details; just looking at the shape of the assembly and comparing it against the source — seeing how much code the compiler generated — tells you a lot.

How does this help with optimization? A few years ago I met a programmer who believed he had discovered a new, faster algorithm. He had a benchmark proving it, and he wrote a really nice article about his algorithm. But then someone looked at the assembly of both the original and the improved version, and found that the improved version merely allowed the compiler to replace two division operations with one. It really had nothing to do with the algorithm. Division is an expensive operation, and those two divisions sat in the inner loop, so of course his “improved” algorithm was faster. With one small change to the old algorithm — eliminating a division — the old algorithm became just as fast as the new one, and his new discovery was nothing.

In another example, a D user posted a benchmark showing that DMD (the Digital Mars D compiler) was bad at integer arithmetic, while LDC (the LLVM D compiler) was much better. A result like that deserves skepticism. I took a quick look at the assembly and saw that the two compilers produced essentially the same code, with nothing obvious to explain a 2:1 difference. But there was a division on a long integer that compiled to a call into the runtime library, and that call was the time killer — all the other additions and subtractions had no effect on the speed. The benchmark had nothing to do with the quality of the generated algorithm code; it was entirely about long-integer division, and it exposed a poor implementation of long division in DMD’s runtime library. Once that was fixed, the speed gap disappeared. So the result had nothing to do with the compiler’s code generation — but you would never see that without looking at the assembly.
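The division trick in the story above is easy to reproduce. In this sketch (illustrative code, not the benchmark from the article), the slow version divides on every iteration of a loop, while the fast version hoists a single division out of the loop and multiplies by the reciprocal:

```c
/* Slow: one division per element -- n divisions in total. */
void scale_slow(double *v, int n, double s) {
    for (int i = 0; i < n; i++)
        v[i] = v[i] / s;
}

/* Fast: one division total, then n (much cheaper) multiplications. */
void scale_fast(double *v, int n, double s) {
    double inv = 1.0 / s;          /* the division, hoisted out */
    for (int i = 0; i < n; i++)
        v[i] = v[i] * inv;
}
```

On most CPUs a floating-point divide costs several times as much as a multiply, so in a hot loop this small rewrite — not a new algorithm — can account for a large measured speedup. (Note the two versions can differ in the last bit of precision, since `x / s` and `x * (1.0 / s)` round differently in general.)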
Looking at assembly code often gives you unexpected insight into why a program behaves the way it does: unexpected function calls, unexpected overhead, things that shouldn’t be there at all. And you don’t need to be an assembly hacker to do it.
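Getting at the assembly takes one compiler flag. The sketch below (assuming gcc or a gcc-compatible compiler is installed) answers the earlier “does i * 4 become a shift?” question directly:

```shell
# Write a one-line function and ask the compiler for assembly (-S).
echo 'long mul4(long i) { return i * 4; }' > mul4.c
gcc -O2 -S -o mul4.s mul4.c

# Inspect the result. At -O2 there is no multiply instruction;
# on x86-64 the compiler typically emits a shift or an lea instead.
grep -v '^[[:space:]]*\.' mul4.s    # hide assembler directives, show the code
```

The same `-S` flag works with clang, and sites like Compiler Explorer do the equivalent interactively.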

Conclusion

If you feel you need better performance, the basic approach is to use a profiler, and to be willing to look at the assembly code, in order to find the bottlenecks. Only once a bottleneck is found is it time to really think about how to improve things — a better algorithm, faster language constructs, and so on.

The conventional wisdom is to pick the best algorithm rather than to micro-optimize. There is nothing wrong with that, but there are two things school didn’t teach you. First, and most important: if the algorithm you are optimizing is not involved in your program’s performance profile, then optimizing it is a waste of time and energy, and it pulls your attention away from the parts that should be optimized. Second, an algorithm’s performance is closely tied to the data it processes. Even though bubble sort is the butt of so many jokes, if the data is already mostly sorted, with only a few elements out of place, bubble sort can outperform every other sorting algorithm.

So worrying about using the best algorithm while failing to measure is a waste of time — yours or your computer’s. Just as mail-ordering the best racing parts won’t get you any closer to the championship (even if you install them correctly), without a profiler you won’t know where the problem is, and without looking at the assembly you may know where the problem is, but often not why.
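The bubble-sort point is worth seeing in code. This sketch (illustrative, not from the article) is the standard early-exit variant: on data that is already mostly sorted it finishes after one or two passes — close to O(n) — which is why it can beat asymptotically better algorithms on nearly-sorted input:

```c
#include <stdbool.h>

/* Bubble sort with early exit: if a full pass makes no swaps, the
   array is sorted and we stop immediately. On nearly-sorted data
   this terminates after very few passes. */
void bubble_sort(int *a, int n) {
    bool swapped = true;
    while (swapped && n > 1) {
        swapped = false;
        for (int i = 1; i < n; i++) {
            if (a[i - 1] > a[i]) {
                int t = a[i - 1];
                a[i - 1] = a[i];
                a[i] = t;
                swapped = true;
            }
        }
        n--;   /* the largest element has bubbled to position n-1 */
    }
}
```

On an already-sorted array the `while` loop exits after a single pass with zero swaps — exactly the data-dependent behavior the paragraph above describes, and exactly the kind of thing only a measurement, not an asymptotic argument, will reveal.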

Translated by Chen Hao; blog: https://coolshell.cn/articles/18190.html