Qin Di, a technical expert at the Weibo R&D Center, joined Weibo in 2013. She is responsible for designing and developing the Weibo platform's communication system, developing and maintaining the platform's basic tools, and improving the platform's architecture. She is good at troubleshooting all kinds of hard-to-diagnose problems in complex systems, and in recent years has focused on the architecture design and performance optimization of large-scale systems. She calls herself a serious code-cleanliness addict who treats code review as her responsibility, and a serious tool addict: she solves problems with ready-made tools where they exist, and writes her own tools for problems that existing tools cannot solve. In her spare time, she occasionally relaxes by writing code in another language.

“There are two types of code duplication: intra-module duplication and inter-module duplication. Whatever the type, duplication says something about the skill of the programmer.” – Qin Di

In this second installment of my series on bad code, I'll discuss how to evaluate code as efficiently and objectively as possible. My earlier article on bad code (see the link at the end of this article) turned out to be surprisingly popular, and many people wrote about problems they had found in their own code. Recently, our department organized a bootcamp, and I was in charge of the code quality training. During the course, we spent a lot of time discussing, improving, and polishing our own code. Although the new graduates were very careful about code quality, the code they finally presented still did not reach the “very good” level. The main reason was a lack of understanding of what good code “should” look like.

1. What is good code

The first step in writing good code is to understand what good code is. While preparing for the bootcamp, I struggled with this question. I tried to distinguish between “excellent”, “good” and “bad” code using precise definitions. But as I summarized them, I found that most descriptions of “what makes good code” cannot be put into practice.

1.1. Definition of good code

A casual Internet search for “elegant code” turned up the following definition:

Bjarne Stroustrup, father of C++ :

The logic should be clear and bugs hard to hide;

Minimal dependencies, easy to maintain;

Error handling follows a clearly articulated strategy;

Near-optimal performance, so that nobody is tempted to clutter the code with unprincipled optimizations;

Clean code does one thing well.

Grady Booch, author of Object-Oriented Analysis and Design with Applications:

Clean code is simple and straightforward;

Clean code reads like well-written prose;

Clean code never obscures the designer's intent; it is full of crisp abstractions and straightforward lines of control.

Michael Feathers, author of Working Effectively with Legacy Code:

Clean code always looks like it was written by someone who cares about code quality;

There are no obvious areas for improvement;

The author of the code seems to have thought of everything.

These definitions all sound reasonable, but they are hard to apply when actually judging code, especially for newcomers. What exactly counts as “simple, straightforward code” or “no obvious areas for improvement”?

In practice, many students face this problem. They never feel confident about their own code. Or they think their code is very good while others think it is bad. A few times, new colleagues and I argued about code quality standards for several days in a row, and neither side could persuade the other: each of us insisted that our own standard for good code was the correct one.

After countless code reviews, I think this chart sums it up best:

In a sense, code quality is judged much like a literary work. The quality of a novel, for example, is judged mainly by its readers: many individual, subjective opinions add up to a relatively objective evaluation. It does not depend on the word count or on which figures of speech the author uses. Such measures look completely objective but are actually meaningless.

But unlike fiction, code has two readers: computers and programmers. As I said in the previous article, a computer can understand and run code that no programmer can read.

Therefore, I analyze code quality along two dimensions:

- the subjective one: how well people can understand the code;
- the objective one: how the code behaves when the computer runs it.

Since part of the evaluation is subjective, there will be individual differences. The same piece of code will be judged differently depending on the experience of the person reading it. This is also the problem most newcomers face: they lack an actionable standard for evaluation, so the quality of the code they write is hard to improve.

Many articles on code quality discuss trends or principles that are true but offer little practical guidance. So in this article I try to make the evaluation criteria as concrete and actionable as I can.

1.2. Readable code

After much deliberation, I decided to put readability first. Would you rather work on a project that has bugs but can be understood, or one that has no bugs but cannot be understood? If you chose the latter, just close this page and do something more meaningful to you.

1.2.1. Word-for-word translation

Many books on code quality stress that programs are written first for people to read and only second for machines to execute, and I largely agree. To judge whether a piece of code can be understood, I usually do the following:

1. Ask the author to translate the code word for word into Chinese and form sentences.
2. Have the author read those sentences to someone who has not seen the code.
3. If the listener understands what the code does, its readability is basically acceptable.

The reason for this is simple: this is what other people do when they understand a piece of code. The person reading the code will read the sentence word by word and infer the meaning of the sentence. If the sentence doesn’t make sense, they will need to understand the code in context. If the context doesn’t make sense, they may need to know more details about other parts of the code to help them infer the meaning. In most cases, the more context you need to understand what a sentence of code is doing, the worse the quality of the code.

The benefit of word-for-word translation is that it helps the author spot assumptions and readability traps that the code does not express and that only the author knows about. Most code that cannot be translated literally is bad code. Typical examples are “ms” for messageService, “ms.proc()” for sending a message, or “tmp” for the current file.
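To make the word-for-word test concrete, here is a minimal Java sketch. It contrasts the kind of names mentioned above with names that read as a sentence. The classes `Ms` and `MessageService` and their methods are hypothetical, invented only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hard to translate word for word: "ms", "proc" and "tmp" only make sense to the author.
class Ms {
    final List<String> q = new ArrayList<>();

    void proc(String tmp) {
        q.add(tmp);
    }
}

// Reads almost like a sentence: "message service sends message".
class MessageService {
    private final List<String> outbox = new ArrayList<>();

    void send(String message) {
        outbox.add(message);
    }

    List<String> sentMessages() {
        return List.copyOf(outbox); // read-only snapshot for callers
    }
}

class NamingDemo {
    public static void main(String[] args) {
        Ms ms = new Ms();
        ms.proc("hello"); // translates to "ms proc hello": meaningless without context

        MessageService messageService = new MessageService();
        messageService.send("hello"); // translates to "message service send hello"
        System.out.println(messageService.sentMessages()); // prints [hello]
    }
}
```

Read `messageService.send(...)` aloud and a listener already knows what happens. Read `ms.proc(...)` aloud and they do not.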

1.2.2. Follow the conventions

Conventions cover how code and documentation are organized, how comments are written, coding style, and so on. They matter a great deal for future maintenance. There is no hard and fast rule about which conventions to follow, but I prefer the ones that most people already follow.

Staying consistent with open source projects is generally a good idea, and so is following your company's internal coding style. However, if the company's internal style conflicts with the style of today's open source projects, that is often a sign that the company's technology culture is rather closed or somewhat out of date.

Either way, it is better to follow an existing convention than to invent your own rules, because that reduces the cost of understanding, communication, and maintenance. If a project invents strange rules of its own, the author probably has not read enough code.

Judging whether a project follows a convention usually takes some experience on the reader's part, or a static checking tool such as Checkstyle. If you don't know where to start, following Google's conventions will rarely steer you wrong: see the Google style guides (Google Code Style), some of which have been translated into Chinese.

Also, there is no need to debate the merits of a particular convention. It is like arguing whether it is better to walk on the left or on the right: even if you reach a conclusion, it changes nothing. Most conventions simply need to be followed.

1.2.3. Documentation and comments

Documentation and comments are an important part of a program and one of the main ways to understand a project. In some cases their roles overlap (Javadoc, for example, is really documentation).

The standard for documentation is simple: it can be found, and it can be understood. Generally speaking, I care most about these kinds of documents:

A project introduction covering features, authors, directory structure, and so on. After 3 minutes, a reader should have a general idea of what the project does.

A QuickStart for newcomers. By following it, a reader should be able to build and use the code within an hour.

Detailed user documentation, such as interface definitions, parameter meanings, and design, which helps readers understand how to use these features (or interfaces).

Some comments are actually documentation, such as the Javadoc mentioned earlier. Javadoc keeps the source code and its documentation together, which makes things clearer for the reader and greatly simplifies maintaining the documentation.
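As an illustration of comments that double as documentation, here is a small method with a Javadoc comment. The `Backoff` class and its capped exponential backoff rule are hypothetical, chosen only for this sketch, and are not part of any project mentioned here. The point is that `@param`, `@return`, and `@throws` put the interface contract next to the code it describes.

```java
/**
 * Computes retry delays using capped exponential backoff.
 *
 * <p>Hypothetical example: the class name and parameters are illustrative.
 */
final class Backoff {
    private Backoff() {
    }

    /**
     * Returns how long to wait before the given retry attempt.
     *
     * @param attempt    zero-based retry attempt; must not be negative
     * @param baseMillis delay before the first retry, in milliseconds; must be positive
     * @param maxMillis  upper bound on any single delay, in milliseconds
     * @return {@code min(baseMillis * 2^attempt, maxMillis)}
     * @throws IllegalArgumentException if {@code attempt < 0} or {@code baseMillis <= 0}
     */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 0 || baseMillis <= 0) {
            throw new IllegalArgumentException(
                    "attempt=" + attempt + ", baseMillis=" + baseMillis);
        }
        long delay = baseMillis;
        // Stop doubling once the cap is reached, so a huge attempt value
        // neither loops for long nor keeps growing the delay.
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println(attempt + " -> " + delayMillis(attempt, 100, 1_000));
        }
    }
}
```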

There is also a class of comments that are not part of the documentation, such as comments inside functions, whose job is to explain what the author was thinking while coding that the code itself cannot express, such as “why is XXX not done here?” or “Note the XXX problem here.”

The first thing I look at is the number of comments. Inside a function there should be neither many nor none. My rule of thumb is that seeing one or two every few screens is normal. Too many may mean the code itself is not readable. None at all may mean some hidden logic is left unexplained, and you should consider adding a short comment.

Second, consider the quality of comments. Assuming the code is already readable enough, a comment should provide information that the code itself does not. More documentation and comments are not always better, since they also increase maintenance costs. For more on this, see section 1.4.3 on simplicity.
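Here is a minimal sketch of the difference between a comment that restates the code and one that explains something the code cannot say. The `Pages.firstPage` helper is hypothetical. The fact it relies on is standard `java.util.List` behavior: `subList` returns a view backed by the original list.

```java
import java.util.ArrayList;
import java.util.List;

class Pages {
    // A comment like the one below only restates the code and adds nothing:
    //     // take the smaller of pageSize and items.size()
    //     int end = Math.min(pageSize, items.size());

    static List<String> firstPage(List<String> items, int pageSize) {
        int end = Math.min(pageSize, items.size());
        // Copy rather than return items.subList(0, end) directly: subList is a view
        // backed by items, so later changes to items would show through the page
        // (or break it, if items is structurally modified).
        return new ArrayList<>(items.subList(0, end));
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>(List.of("a", "b", "c"));
        List<String> page = firstPage(items, 2);
        items.set(0, "z");
        System.out.println(page); // prints [a, b]
    }
}
```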

1.2.4. Recommended reading

Clean Code

1.3. Releasable code

A typical trait of rookie code is that it has many gaps caused by a lack of experience maintaining projects. For example, the code seems fine in testing, but many unexpected problems appear after the project is released. Once a problem occurs, the author does not know where to start troubleshooting. At best the system is left in an unstable state and keeps limping along by sheer luck.

1.3.1. Handling exceptions

Novice programmers generally lack the habit of handling exceptions, but the real environment your code runs in is full of them. Servers crash, networks time out, users do all sorts of unexpected things, and malicious people attack your system.

My first impression of how well a piece of code handles exceptions comes from its unit test coverage. Most exceptions are hard to reproduce in a development or test environment. Even a professional test team will struggle to simulate all of them in an integration test environment.

Unit tests, by contrast, can simulate all kinds of abnormal situations fairly easily. If a module's unit test coverage is below 50%, it is hard to believe the code handles exceptional cases. Even if it does, those exception-handling branches have never been verified, so how can you count on them to work in production?
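To show why unit tests make exception paths cheap to verify, here is a self-contained sketch that replaces a remote dependency with stubs that fail on purpose. `ProfileClient`, `ProfileService`, and the fallback to `"anonymous"` are hypothetical. A real project would more likely use a test framework such as JUnit; it is left out here to keep the example dependency-free.

```java
import java.io.IOException;

// Hypothetical remote dependency; in production this would make a network call.
interface ProfileClient {
    String fetchNickname(long userId) throws IOException;
}

// Code under test: falls back to a default nickname when the remote call fails.
class ProfileService {
    static final String DEFAULT_NICKNAME = "anonymous";

    private final ProfileClient client;

    ProfileService(ProfileClient client) {
        this.client = client;
    }

    String nickname(long userId) {
        try {
            String name = client.fetchNickname(userId);
            // A malformed (null or empty) response is treated like a missing one.
            return (name == null || name.isEmpty()) ? DEFAULT_NICKNAME : name;
        } catch (IOException e) {
            // Network failure path: hard to trigger in an integration environment,
            // trivial to trigger with a stub in a unit test.
            return DEFAULT_NICKNAME;
        }
    }
}

class ProfileServiceTest {
    public static void main(String[] args) {
        // Stub that simulates a timeout on every call.
        ProfileClient timingOut = userId -> {
            throw new IOException("simulated timeout");
        };
        check(new ProfileService(timingOut).nickname(42).equals("anonymous"), "timeout");

        // Stub that simulates an empty response.
        check(new ProfileService(userId -> "").nickname(42).equals("anonymous"), "empty");

        // Normal path, for comparison.
        check(new ProfileService(userId -> "alice").nickname(42).equals("alice"), "happy path");
        System.out.println("all exception-path checks passed");
    }

    private static void check(boolean condition, String caseName) {
        if (!condition) {
            throw new AssertionError("failed: " + caseName);
        }
    }
}
```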

1.3.2. Handling concurrency

I get a lot of resumes that say “I'm good at concurrent programming” or “I'm good at multithreading”. The candidates can talk about locks, mutexes, thread pools, synchronization, semaphores, and so on. But give them a real scenario and ask them to write a very simple concurrent program, and few can write it well.

Concurrent programming really is hard. If writing good sequential code has a difficulty of 5, concurrent programming has a difficulty of 100. This is not alarmism: many programs that seem stable can still fail under concurrency. For example, we recently hit a Linux kernel crash caused by a synchronization problem when calling a system function.

The key to high-quality concurrent programming is not whether a synchronization strategy is applied, but whether the code protects shared resources:

Accessing any memory other than local variables carries concurrency risk (object fields, static variables, and so on).

Accessing shared resources also carries concurrency risk (caches, databases, and so on).

Calling classes or methods that are not declared thread-safe is likely to cause concurrency problems (Java's HashMap, for example).

Any time-dependent sequence of operations can still have concurrency problems even if each step is thread-safe (for example, deleting a record and then decrementing the record count).
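The last case in the list, a time-dependent sequence of individually safe steps, can be sketched with the article's own example: delete a record, then decrement a count. Suppose the map were a `ConcurrentHashMap` and the counter an `AtomicInteger`. Another thread could still observe the record already gone while the count has not yet been decremented. `RecordStore` is a hypothetical class. The sketch assumes a single JVM, where making the two steps one `synchronized` method is enough. Across processes or machines you would need a database transaction or some other shared lock instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store in which every record removal must be paired with a count update.
class RecordStore {
    private final Map<Long, String> records = new HashMap<>(); // guarded by "this"
    private int count;                                        // guarded by "this"

    synchronized void add(long id, String value) {
        if (records.put(id, value) == null) {
            count++;
        }
    }

    // "Delete, then decrement" as a single atomic step: no other thread can
    // run between the removal and the count update, or see one without the other.
    synchronized boolean delete(long id) {
        if (records.remove(id) == null) {
            return false; // already deleted by another thread: do not decrement twice
        }
        count--;
        return true;
    }

    synchronized int count() {
        return count;
    }

    synchronized int size() {
        return records.size();
    }
}

class RecordStoreDemo {
    public static void main(String[] args) throws InterruptedException {
        RecordStore store = new RecordStore();
        for (long id = 0; id < 1_000; id++) {
            store.add(id, "record-" + id);
        }
        // Several threads race to delete the same records.
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (long id = 0; id < 1_000; id++) {
                    store.delete(id);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println(store.count() + " " + store.size()); // prints 0 0
    }
}
```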

The first three cases can be spotted in the code itself once you develop a sensitivity to every access to a shared resource.

The last case, however, is often hard to spot by reading the code alone. This is especially true when the two concurrent operations are not even in the same program, for example two systems reading and writing the same database, or concurrent calls into different modules of the same program. Still, any “do A first, then B” logic in your code that accesses shared resources without locking deserves suspicion.
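A common form of “do A first, then B” is check-then-act on a shared cache. In the hypothetical `UserCache` below, `getRacy` uses only thread-safe calls, yet it can load the same key more than once, because another thread can run between the `get` and the `put`. `get` instead relies on `ConcurrentHashMap.computeIfAbsent`, whose documented contract applies the mapping function at most once per key. `loadFromDatabase` is a stand-in for a real query.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class UserCache {
    private final ConcurrentHashMap<Long, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger(); // how many times the "database" was queried

    // Hypothetical stand-in for an expensive database query.
    private String loadFromDatabase(long userId) {
        loads.incrementAndGet();
        return "user-" + userId;
    }

    // Racy: "check first, then put". Each call is thread-safe on its own, but
    // two threads can both see the key missing and both query the database.
    String getRacy(long userId) {
        String cached = cache.get(userId);
        if (cached == null) {
            cached = loadFromDatabase(userId);
            cache.put(userId, cached);
        }
        return cached;
    }

    // Atomic: computeIfAbsent runs the loader at most once per absent key;
    // other callers for the same key may block until the value is present.
    String get(long userId) {
        return cache.computeIfAbsent(userId, this::loadFromDatabase);
    }
}

class UserCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        UserCache cache = new UserCache();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> cache.get(1L));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("loads: " + cache.loads.get()); // prints loads: 1
    }
}
```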

1.3.3. Optimize performance

Performance is an important measure of a programmer's ability, and many programmers care a lot about it. However, performance is hard to judge directly from the code. It usually takes performance testing tools or measurements in the real environment.

From a code standpoint alone, there are two ways to evaluate execution efficiency:

Time complexity of the algorithm: a program with high time complexity will inevitably run slowly.

Cost of individual steps: do fewer expensive operations, such as database access and I/O.

In practice, we also see programmers who are too keen on optimization. This makes the program less readable, more complex, or slower to deliver. In that case, a simple check is to ask the author three things: where the bottleneck is, why it exists, and what the optimization actually gains.

Of course, the best way to judge performance, whether code is under-optimized or over-optimized, is with data rather than by reading code. Performance testing is beyond the scope of this article.
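As a sketch of the time-complexity point, the two hypothetical methods below compute the same result. The first calls `List.contains` inside a loop, which is O(n*m). The second builds a `HashSet` once, which is O(n+m) on average. Which one matters in practice still depends on the input sizes, so measure before rewriting.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CommonIds {
    // O(n * m): List.contains scans b from the start for every element of a.
    static List<Long> commonQuadratic(List<Long> a, List<Long> b) {
        List<Long> result = new ArrayList<>();
        for (Long id : a) {
            if (b.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    // O(n + m) on average: build a hash set from b once, then probe it in O(1).
    static List<Long> commonLinear(List<Long> a, List<Long> b) {
        Set<Long> lookup = new HashSet<>(b);
        List<Long> result = new ArrayList<>();
        for (Long id : a) {
            if (lookup.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> a = List.of(1L, 2L, 3L, 2L);
        List<Long> b = List.of(2L, 3L, 4L);
        System.out.println(commonQuadratic(a, b)); // prints [2, 3, 2]
        System.out.println(commonLinear(a, b));    // prints [2, 3, 2]
    }
}
```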

1.3.4. Logging

Logs determine how hard it is to troubleshoot a program when something goes wrong. Experienced programmers (the kind who have fallen into plenty of pits) have probably all met this scenario: while troubleshooting, there is no log at all, nobody knows what value some variable had, and there is no way to tell where the problem lies.

There are three criteria for evaluating logs:

Sufficiency: there should be logs for all exceptions, for all external calls, and for the key points of a call chain (its entry, its exit, and important steps along the path).

Clarity: the logs can be understood and follow a consistent style. This is judged by the same criteria as code readability.

Information: each log contains enough to support analysis. That includes the call context, the return values of external calls, and keywords that can be used for searching.
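The three criteria can be illustrated with `java.util.logging` from the JDK. `OrderClient` and its `Remote` interface are hypothetical. The sketch logs the entry of the call, the external return value on failure, and exceptions with their stack traces. Every message uses one consistent `key=value` style, so that `requestId` and `orderId` can be searched for later.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class OrderClient {
    private static final Logger LOG = Logger.getLogger(OrderClient.class.getName());

    // Hypothetical external service; returns an HTTP-style status code.
    interface Remote {
        int submit(String orderId);
    }

    private final Remote remote;

    OrderClient(Remote remote) {
        this.remote = remote;
    }

    boolean submit(String requestId, String orderId) {
        // Entry point of the call chain, with the keys you would search for later.
        LOG.info("order.submit start requestId=" + requestId + " orderId=" + orderId);
        long start = System.nanoTime();
        try {
            int status = remote.submit(orderId);
            long costMs = (System.nanoTime() - start) / 1_000_000;
            if (status != 200) {
                // External return value plus call context, in the same key=value style.
                LOG.warning("order.submit failed requestId=" + requestId + " orderId=" + orderId
                        + " status=" + status + " costMs=" + costMs);
                return false;
            }
            LOG.info("order.submit ok requestId=" + requestId + " orderId=" + orderId
                    + " costMs=" + costMs);
            return true;
        } catch (RuntimeException e) {
            // Exceptions are logged with context and stack trace, not swallowed silently.
            LOG.log(Level.SEVERE, "order.submit error requestId=" + requestId
                    + " orderId=" + orderId, e);
            return false;
        }
    }

    public static void main(String[] args) {
        new OrderClient(orderId -> 200).submit("req-1", "order-42");
        new OrderClient(orderId -> 503).submit("req-2", "order-43");
    }
}
```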

For online systems, the number of logs can generally be controlled by adjusting the log level, so the code that prints logs is generally acceptable as long as it does not create a barrier to reading.

1.3.5. Recommended reading

Release It!: Design and Deploy Production-Ready Software

Numbers Everyone Should Know

1.4. Maintainable code

Maintainability is vaguer than the first two qualities because it concerns the future. It is hard for most newcomers to imagine how a choice made today will play out later. In my experience, though, it is usually enough to keep asking two questions:

What if the author leaves?

What if the author stays?

1.4.1. Avoid duplication

Almost all programmers know they should avoid copying code, yet copied code remains a major killer of maintainability.

There are two types of code duplication: intra-module duplication and inter-module duplication. Either kind suggests, to some extent, a problem with the programmer's skill. Intra-module duplication is the more serious problem: if a single file contains large stretches of repeated code, its author is capable of writing almost any kind of unbelievable code.

Detecting duplication does not require reading through the code by hand. Modern IDEs generally provide tools that find repeated code in a few clicks.

Besides duplicated code, many new programmers who care about code quality fall into another kind of duplication: duplicated information.

I've seen newcomers who like to put a comment before every line of code, for example:
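Here is a minimal sketch of removing intra-module duplication by extracting the repeated logic into one method. `NicknameRules`, `AccountService`, and the 30-character limit are hypothetical. The “before” version is shown in comments.

```java
// Before (duplicated in register() and rename()):
//     if (nickname == null || nickname.isBlank() || nickname.strip().length() > 30) {
//         throw new IllegalArgumentException("invalid nickname: " + nickname);
//     }

final class NicknameRules {
    static final int MAX_LENGTH = 30; // hypothetical limit

    private NicknameRules() {
    }

    // After: the rule lives in exactly one place, so a fix is made once.
    static String requireValid(String nickname) {
        if (nickname == null || nickname.isBlank() || nickname.strip().length() > MAX_LENGTH) {
            throw new IllegalArgumentException("invalid nickname: " + nickname);
        }
        return nickname.strip();
    }
}

class AccountService {
    String register(String nickname) {
        return "registered:" + NicknameRules.requireValid(nickname);
    }

    String rename(String oldNickname, String newNickname) {
        return "renamed:" + oldNickname + "->" + NicknameRules.requireValid(newNickname);
    }

    public static void main(String[] args) {
        AccountService service = new AccountService();
        System.out.println(service.register("  bob "));       // prints registered:bob
        System.out.println(service.rename("bob", "alice"));   // prints renamed:bob->alice
    }
}
```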

    // length of memberList > 0 and < 200
    if (memberList.size() > 0 && memberList.size() < 200) {
        // Returns the current member list
        return memberList;
    }

This may look harmless, but a few years later the code becomes:

    // length of memberList > 0 and < 200
    if (memberList.size() > 0 && memberList.size() < 200 || (tmp.isOpen() && flag)) {
        // Returns the current member list
        return memberList;
    }

After that it might look something like this:

    // length of memberList > 0 and < 200
    // if (memberList.size() > 0 && memberList.size() < 200 || (tmp.isOpen() && flag)) {
    //     // Returns the current member list
    //     return memberList;
    // }
    if (tmp.isOpen() && flag) {
        return memberList;
    }

As the project progresses, useless information accumulates and eventually it becomes impossible to tell what is valid and what is not.

Sometimes a project uses several mechanisms to carry the same information: comments that restate what the code does, or commented-out code standing in for version control. Code like that is not good code either.
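One way to avoid the drifting comment in the example above is to give the condition a name instead of narrating it, and to let version control keep the old code. `MemberListPolicy`, `isWithinDisplayLimit`, and `MAX_MEMBERS` are hypothetical names. The condition itself is the first version of the `memberList` check above.

```java
import java.util.List;

class MemberListPolicy {
    static final int MAX_MEMBERS = 200; // hypothetical limit, taken from the example above

    // The name states the intent, so no comment is needed to repeat the condition,
    // and there is no comment left behind to drift when the rule changes.
    static boolean isWithinDisplayLimit(List<String> memberList) {
        return !memberList.isEmpty() && memberList.size() < MAX_MEMBERS;
    }

    public static void main(String[] args) {
        List<String> memberList = List.of("alice", "bob");
        if (isWithinDisplayLimit(memberList)) {
            System.out.println(memberList); // prints [alice, bob]
        }
    }
}
```

When the rule changes, change the method, and its name if needed. Version control keeps the old version, so there is no reason to leave commented-out code behind.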

1.4.2. Module division

High cohesion within modules and low coupling between modules are the standards followed by most designs. Through reasonable module division, complex functions can be broken down into smaller function points that are easier to maintain.

Generally speaking, code length gives a preliminary measure of module partitioning. A class longer than 2,000 lines or a function longer than two screens is a warning sign.

Dependencies are another indicator of how well modules are partitioned. A module with a great many dependencies, or even circular dependencies, suggests the author planned the modules poorly. Future changes to the project are then likely to ripple through the whole codebase.

Many tools provide dependency analysis, such as the dependency analysis features in IntelliJ IDEA. Learning to use these tools helps a great deal when evaluating code quality.
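To make circular dependencies concrete, here is a minimal sketch of the kind of check a dependency-analysis tool performs: a depth-first search over a module graph that reports the first cycle it finds. This is not how IntelliJ IDEA implements its analysis. The module names and the `Map<String, List<String>>` representation are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DependencyCycles {
    private static final int VISITING = 1; // module is on the current DFS path
    private static final int DONE = 2;     // module and its dependencies are fully explored

    // Returns one cycle as a path such as [a, b, a], or an empty list if there is none.
    static List<String> findCycle(Map<String, List<String>> deps) {
        Map<String, Integer> state = new HashMap<>();
        List<String> path = new ArrayList<>();
        for (String module : deps.keySet()) {
            List<String> cycle = visit(module, deps, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return List.of();
    }

    private static List<String> visit(String module, Map<String, List<String>> deps,
                                      Map<String, Integer> state, List<String> path) {
        Integer s = state.get(module);
        if (s != null && s == DONE) {
            return List.of();
        }
        if (s != null && s == VISITING) {
            // Back edge: the cycle runs from the earlier occurrence of module to here.
            List<String> cycle = new ArrayList<>(path.subList(path.indexOf(module), path.size()));
            cycle.add(module);
            return cycle;
        }
        state.put(module, VISITING);
        path.add(module);
        for (String dep : deps.getOrDefault(module, List.of())) {
            List<String> cycle = visit(dep, deps, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        state.put(module, DONE);
        return List.of();
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("web", List.of("service"));
        deps.put("service", List.of("dao"));
        deps.put("dao", List.of("service")); // dao should not depend back on service
        System.out.println(findCycle(deps)); // prints [service, dao, service]
    }
}
```

With a `LinkedHashMap` the search order is deterministic, so the demo always reports the same cycle. With a plain `HashMap` the cycle may be reported starting from a different module.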

It is worth noting that in most cases, poor module partitioning is accompanied by extremely low unit test coverage: unit tests for complex modules are very difficult to write, or even impossible. So looking directly at unit test coverage is also a reliable way to measure it.

1.4.3. Simplicity and abstraction

Whenever you talk about code quality, you’re bound to get adjectives like simplicity and elegance. The word “simplicity” actually covers a lot of things. Code that avoids duplication is simplicity, design that is abstract enough is simplicity, and any attempt to improve maintainability is really an attempt to subtract.

Inexperienced programmers often fail to see the importance of simplicity and enjoy tinkering with complicated things. But complexity is the enemy of maintainability and a threshold that tests a programmer's ability.

A programmer who has crossed that threshold can keep ever-growing complexity under control. They can distill the essence of a problem and build that understanding into their design and code. After all, over its life cycle a program tends to grow from simple to complex.

It is hard for me to boil this part down to a simple, easy-to-apply standard. It is more a way of thinking that has to be understood and practiced. Read more, think more, and discuss more. More often than not, things can be simplified far more than you first expected.

1.4.4. Recommended reading

Refactoring: Improving the Design of Existing Code

Design Patterns: Elements of Reusable Object-Oriented Software

Software Architecture Patterns: Understanding Common Architecture Patterns and When to Use Them

2. Endnotes

This article focuses on tools and criteria for evaluating code quality, some more objective than others. As mentioned earlier, evaluating code quality is partly a subjective matter. Although this article lists many criteria, a lot of code that I consider fine still gets ridiculed by others. So this article is only a first draft, and more content will need to be added and refined in the future. People lean different ways when evaluating code quality. On the whole, though, the ability to evaluate code can be likened to a programmer's “taste”, which grows more accurate with experience. Along the way, you need to keep thinking, keep learning, and keep a critical spirit.

In the next article, I’ll talk about how to improve the quality of your code.


Original: http://www.open-open.com/lib/view/open1454117276261.html