Reprinted from Wang Yin.
Programming is creative work; it is an art. Mastering any art requires a lot of practice and reflection. So the “wisdom” offered here is not a diet pill that promises you will lose ten catties a day, and it cannot replace your own diligence. However, since the software industry likes to be clever and make simple things complicated, I hope these words can point people in the right direction, so that they take fewer detours and get out roughly what they put in.
Iterate over the code
Some people like to brag about how many thousands of lines of code they have written, as if the number of lines of code is the measure of good programming. However, it is impossible to improve your programming skills if you always write code in a hurry and never go back to review, modify, and refine it. You’ll produce more and more mediocre or even bad code. In this sense, what many people call “work experience” is not necessarily proportional to the quality of their code. If you have decades of experience and never go back to refine and reflect on your code, you’re probably not as good as someone with a year or two of experience who likes to iterate and understand.
A great writer once said, “The quality of a writer is not measured by how many words he publishes, but by how many he throws away in his wastebasket.” I think the same theory applies to programming. Good programmers delete far more code than they leave behind. If you see a person who writes a lot of code and doesn’t delete much of it, their code must have a lot of junk.
Like literature, code can’t be created overnight. Inspiration seems to come in dribs and drabs. It’s impossible for anyone to write in one stroke, and even the most talented programmer takes a while to discover the simplest and most elegant way to write. Sometimes you refine a piece of code over and over again and think you’re at the top of your game, and then you look back a few months later and see a lot of things you could have improved and simplified. It’s the same with writing articles. You can always look back at something you wrote months or years ago and see some improvements.
So if refining code over and over is no longer going anywhere, you can put it down for a while. Look back in a few weeks or months with a fresh idea. Over and over again, you’ll have the inspiration and wisdom to move directly in the right direction, or close to the right direction, when faced with a new problem.
Write elegant code
People hate spaghetti code because it goes round and round like a noodle. So what does elegant code look like? After years of observation, I found that elegant code has some distinct features in shape.
If we ignore the specifics, elegant code in general looks like neat boxes that fit together. It’s easy to understand if you draw an analogy with tidying your room. If you throw everything in a big drawer, they all get mixed up. It’s harder to organize and find what you need quickly. But if you put a few smaller boxes in the drawer and put things in different categories, they won’t wander around and you can find and manage them more easily.
Another characteristic of elegant code is that its logic, by and large, looks like a tree with distinct branches. That’s because almost everything a program does comes down to passing information along and branching on it. You can think of code as a circuit in which current flows through wires, splitting or merging along the way. If you think about it this way, your code will have fewer if statements with only one branch, and it will look something like this:
if (...) {
    if (...) {
        ...
    } else {
        ...
    }
} else if (...) {
    ...
} else {
    ...
}
Notice that? In my code, if statements almost always have two branches. They can be nested, have multiple levels of indentation, and the else branches can contain a small amount of duplicate code. However, the logic of such a structure is tight and clear. I’ll tell you later why it’s best for an if statement to have two branches.
Write modular code
Some people clamor for making programs “modular” and end up splitting their code into multiple files and directories, which they then call “modules”. Some even put these directories in different VCS repos. This approach does not lead to smooth cooperation, only to a lot of trouble. These people don’t really understand what a “module” is: superficially slicing up the code and placing it in different places not only fails to achieve modularity, but also creates unnecessary problems.
True modularity is not in a textual sense, but in a logical sense. A module should be like a circuit chip, with well-defined inputs and outputs. In fact, a good modularity approach already exists, and its name is “function”. Each function has a specific input (parameter) and output (return value). Multiple functions can be contained in the same file, so you don’t need to separate your code into multiple files or directories. I can write it all in the same file and still have very modular code.
To achieve good modularity, you need to do the following:
- Avoid writing long functions. If you find that a function is too large, you should break it up into smaller ones. Usually I don’t write functions longer than 40 lines. For comparison, the average laptop screen can hold 50 lines of code. I can see a 40-line function at a glance without scrolling. The reason it’s only 40 lines instead of 50 is that 40 lines of code is the maximum I can see without moving my eyes.
If I look at the code without blinking, I can map the entire piece of code to my optic nerve, so that I can see the code even when I suddenly close my eyes. I’ve found that when you close your eyes, your brain processes code more efficiently, and you can imagine what other shapes that piece of code can take. 40 lines is not a big limit, because the more complex parts of a function are often extracted and made into smaller functions and then called from the original function.
- Write small utility functions. If you look closely at your code, you will find a lot of repetition. Such common code, however short, is worth extracting into a function. Some helper functions may be only two lines long, but they greatly simplify the logic in the main function.
Some people don’t like using small functions because they want to avoid the overhead of function calls and end up writing functions that are hundreds of lines long. This is an outdated concept. Modern compilers automatically inline a small function to the place where it is called, so no function call is made at all and no extra overhead is incurred.
Similarly, some people prefer to use macros instead of small functions, which is also an outdated concept. In early C compilers, only macros were statically “inlined,” so they used macros for the purpose of inlining. Inlining, however, is not the fundamental difference between macros and functions. Macros are very different from functions (which I’ll cover later) and should be avoided as much as possible. Using macros for inline purposes is an abuse of macros, which can cause all sorts of problems, such as making programs hard to understand, difficult to debug, error-prone, and so on.
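As a minimal sketch of the small-helper idea above (the names `isBlank` and `displayName` are hypothetical and chosen only for illustration), a two-line helper can remove a repeated null-and-whitespace check from the function that calls it:

```java
final class HelperDemo {
    // Hypothetical two-line helper: true when the string is null or whitespace only.
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // The caller reads as a plain decision tree because the check has a name.
    static String displayName(String nickname, String fullName) {
        if (!isBlank(nickname)) {
            return nickname.trim();
        } else if (!isBlank(fullName)) {
            return fullName.trim();
        } else {
            return "anonymous";
        }
    }

    public static void main(String[] args) {
        System.out.println(displayName("  ", "Ada Lovelace"));
    }
}
```

Whether a particular runtime actually inlines `isBlank` depends on that runtime, but a method this small is a typical inlining candidate.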
- Make each function do one simple thing. Some people like to write “generic” functions that can do both this and that, internally “choosing” what to do based on certain variables and conditions. For example, you might write a function like this:
void foo() {
    if (getOS().equals("MacOS")) {
        a();
    } else {
        b();
    }
    c();
    if (getOS().equals("MacOS")) {
        d();
    } else {
        e();
    }
}
The author of this function wants it to do different things depending on whether the system is “MacOS.” You can see that in this function only c() is common to both systems, while a(), b(), d(), and e() each belong to one branch or the other.
This kind of “reuse” is actually harmful. If a function can do two things, and they differ more than they have in common, you’d better write two different functions; otherwise the logic of the function will be unclear and error-prone. In fact, the function above can be rewritten as two functions:
void fooMacOS() {
    a();
    c();
    d();
}
and
void fooOther() {
    b();
    c();
    e();
}
If you find that two things are mostly the same with only a few differences, more often than not you can extract the same parts and make an auxiliary function. For example, if you have a function that looks like this:
void foo() {
    a();
    b();
    c();
    if (getOS().equals("MacOS")) {
        d();
    } else {
        e();
    }
}
Here a(), b(), and c() are common to both systems, and only d() and e() differ according to the system. So you can extract a(), b(), and c():
void preFoo() {
    a();
    b();
    c();
}
Then make two functions:
void fooMacOS() {
    preFoo();
    d();
}
and
void fooOther() {
    preFoo();
    e();
}
This way, we share code and still have each function do one simple thing. The logic of this code is much clearer.
- Avoid using global variables and class members to pass information. Use local variables and parameters instead. Some people often use class members to pass information, like this:
class A {
    String x;

    void findX() {
        ...
        x = ...;
    }

    void foo() {
        findX();
        ...
        print(x);
    }
}
First, findX() writes a value to the member x. Then foo() uses the value of x. In this way, x becomes the data channel between findX and print. Since x belongs to class A, the program loses its modular structure. Because these two functions depend on the member x, they no longer have explicit inputs and outputs, but rely on what is effectively global data. findX and foo can no longer exist without class A, and since class members can be changed by other code, the code becomes hard to understand and hard to verify.
If you use local variables instead of class members to pass information, the two functions don’t need to depend on a class, and are easier to understand and less error-prone:
String findX() {
    ...
    x = ...;
    return x;
}

void foo() {
    String x = findX();
    print(x);
}
Write readable code
Some people think that writing lots of comments will make their code more readable, but find that it doesn’t. Instead of making the code readable, the flood of comments makes the program harder to read. And once the logic of the code changes, many comments become outdated and need to be updated. Updating comments is quite a burden, so a large number of comments becomes an obstacle to improving the code.
In fact, truly elegant and readable code requires almost no comments. If you find yourself writing a lot of comments, your code is probably obscure and its logic unclear. Programming languages are in fact more powerful and rigorous than natural language, and they have the main elements of natural language: subject, predicate, object, noun, verb, if, then, otherwise, yes, no, and so on. So if you take full advantage of the expressive power of a programming language, you can let the program itself express what it is doing, without the help of natural language.
On rare occasions, you might do something counterintuitive to get around design issues in other code. You can use a very short comment to explain why it’s written that way. This should happen less often, otherwise it means that the entire code design is flawed.
If you don’t take advantage of what programming languages offer, you’ll find that your programs are still so difficult to understand that you need to write comments. So here are some tips that may help you greatly reduce the need for comments:
- Use meaningful function and variable names. If the names of your functions and variables actually describe their logic, you don’t need comments to explain what they’re doing. For example:
// put elephant1 into fridge2
put(elephant1, fridge2);
Since my function name, put, plus the two meaningful variable names elephant1 and fridge2, already say what this is about (putting the elephant in the fridge), that comment is completely unnecessary.
- Local variables should be as close as possible to where they are used. Some people like to define a lot of local variables at the beginning of a function and then use them far below, like this:
void foo() {
    int index = ...;
    ...
    ...
    bar(index);
    ...
}
Since index is not used in between, and the data it depends on does not change in between, the variable definition can be moved closer to where it is used:
void foo() {
    ...
    ...
    int index = ...;
    bar(index);
    ...
}
This allows the reader to look at bar(index) and see how index is computed without looking far up. And this short distance can strengthen the reader’s understanding of the “order of calculation” here. Otherwise, if index is at the top, the reader might suspect that it actually holds some kind of data that changes, or that it has been modified later. If index is placed below, the reader knows clearly that index does not hold a variable value and that it has not changed since it was calculated.
If you see the local variables for what they are — the wires in the circuit — you can better understand the benefits of proximity. The closer the variable definition is to where it is used, the shorter the length of the wire. You don’t have to touch a wire and look far around to find the port that receives it, so the circuit is easier to understand.
- Local variable names should be short. This seems to conflict with the first point: how can short variable names be meaningful? Note that I’m talking about local variables here. Because they are local, and the previous tip has already placed them as close as possible to where they are used, you can easily tell what they mean from the context.
For example, you have a local variable that indicates whether an operation succeeded:
boolean successInDeleteFile = deleteFile("foo.txt");
if (successInDeleteFile) {
    ...
} else {
    ...
}
The local variable successInDeleteFile doesn’t have to be so verbose. Because it is used only once and is used on the next line, the reader can easily see that it is the result returned by deleteFile. If you rename it success, the reader will know from a little context that it means “success in deleteFile”. So you can change it to this:
boolean success = deleteFile("foo.txt");
if (success) {
    ...
} else {
    ...
}
Not only does this not omit any useful semantic information, it is also easier to read. A camelCase name such as successInDeleteFile becomes an eyesore once it strings together more than three words. So if a single word can mean the same thing, all the better.
- Do not reuse local variables. Many people don’t like to define new local variables when they write code. Instead, they like to “reuse” the same local variable, assigning to it over and over to mean completely different things. For example:
String msg;
if (...) {
    msg = "succeed";
    log.info(msg);
} else {
    msg = "failed";
    log.info(msg);
}
While this is logically fine, it is confusing and hard to understand. The variable msg is assigned twice, representing two completely different values. They are immediately used by log.info and not passed anywhere else. Declaring msg outside the if unnecessarily enlarges the scope of a local variable, giving the impression that it may change later and be used elsewhere. It would be better to define two variables:
if (...) {
    String msg = "succeed";
    log.info(msg);
} else {
    String msg = "failed";
    log.info(msg);
}
Since the scope of each msg variable is limited to the branch of the if statement it is in, you can clearly see the scope of both and know that there is no relationship between them.
- Extract complex logic into a “helper function.” Some people write functions so long that they can’t see what’s going on inside, so they mistakenly think they need to write comments. If you look closely, the unclear pieces of code can often be extracted into a function and then called from the original place. Since functions have names, you can use meaningful function names instead of comments. Here’s an example:
...
// put elephant1 into fridge2
openDoor(fridge2);
if (elephant1.alive()) {
    ...
} else {
    ...
}
closeDoor(fridge2);
...
If you take this code out and define it as a function:
void put(Elephant elephant, Fridge fridge) {
    openDoor(fridge);
    if (elephant.alive()) {
        ...
    } else {
        ...
    }
    closeDoor(fridge);
}
So the original code can be changed to:
...
put(elephant1, fridge2);
...
It’s clearer, and there’s no need for comments.
- Extract complex expressions into intermediate variables. Some people hear that “functional programming” is a good thing, but don’t understand what it really means, and fill their code with deeply nested function calls. Like this:
Pizza pizza = makePizza(crust(salt(), butter()),
                        topping(onion(), tomato(), sausage()));
Such a line of code is too long and nested to be easy to read. Well trained functional programmers know the benefits of intermediate variables and do not blindly use nested functions. They would change the code to something like this:
Crust crust = crust(salt(), butter());
Topping topping = topping(onion(), tomato(), sausage());
Pizza pizza = makePizza(crust, topping);
This not only effectively controls the length of a single line of code, but also makes the steps clear and easy to understand due to the “meaning” of the introduced intermediate variables.
- Break lines where it makes sense. In most programming languages, the logic of the code is independent of whitespace, so you can break a line almost anywhere, or not at all. Such language design is a good thing because it gives programmers the freedom to control the formatting of their code. However, it also causes some problems, because many people do not know how to break lines sensibly.
Some people like to rely on the IDE’s automatic line wrapping. After editing, they reformat the entire code with a hotkey, and the IDE automatically wraps any code that exceeds the line width limit. But this automatic wrapping often does not follow the logic of the code, so it does not help you understand the code. Automatic wrapping might produce code like this:
if (someLongCondition1() && someLongCondition2() && someLongCondition3() &&
    someLongCondition4()) {
    ...
}
Because someLongCondition4() exceeds the line width limit, the editor automatically moves it to the next line. Although the line width limit is met, the position of the line break is quite arbitrary and does not help you understand the logic of the code. These Boolean expressions are all joined with &&, so they are on an equal footing. To express this, when you need to wrap, you should place each expression on its own line, like this:
if (someLongCondition1() &&
    someLongCondition2() &&
    someLongCondition3() &&
    someLongCondition4()) {
    ...
}
So everything is aligned, and the logic is clear. Here’s another example:
log.info("failed to find file {} for command {}, with exception {}", file, command,
         exception);
Because this line is too long, it was wrapped automatically like this. file, command, and exception are the same kind of thing, but two of them are left on the first line and the last one is pushed onto the second line. It would have been better to break the line manually, like this:
log.info("failed to find file {} for command {}, with exception {}",
         file, command, exception);
The logic is clearer by placing the format string on its own line and its arguments on a separate line.
To prevent the IDE from messing up manually adjusted line breaks, many IDEs (such as IntelliJ) have a “keep line breaks” option in their auto-formatting settings. If you find that the IDE’s line breaks are illogical, you can change these settings and keep your own manual line breaks in some places.
At this point, I must warn you that when I say “let the code speak for itself without comments,” I don’t mean to make the code look like some kind of natural language. There is a JavaScript test tool called Chai that lets you write code like this:
expect(foo).to.be.a('string');
expect(foo).to.equal('bar');
expect(foo).to.have.length(3);
expect(tea).to.have.property('flavors').with.length(3);
This is profoundly wrong. Programming languages are simpler and clearer than natural language; making code look like natural language only makes it more complex and harder to understand.
Write simple code
Programming languages like to stand out by offering all kinds of “features”, and some of them are not such a good thing. Many features have not stood the test of time and ended up causing more problems than they solved. Many people blindly pursue “short” and “concise” code, or want to show how smart they are and how fast they learn, so they like to use special constructs in the language and write code that is too “clever” and hard to understand.
Just because a language offers something doesn’t mean you have to use it. In fact, you only need a fraction of these features to write good code. I’ve always been against “taking full advantage” of every feature of a programming language. I have my own set of preferred constructs in mind. No matter how “magical” or “new” the features a language offers, I mostly use the ones I have tested over time and found trustworthy.
Now, for some problematic language features, I’ll introduce the coding conventions I use myself and explain why they make code simpler.
- Avoid increment and decrement expressions (i++, ++i, i--, --i). These expressions are a design error left over from history. They have strange meanings and are very easy to get wrong. They mix together two very different operations, reading and writing, and make a mess of the semantics. An expression that contains them may depend on the order of evaluation, so it may run correctly under one compiler and fail mysteriously under another.
In fact, each of these expressions can be decomposed into two steps that separate reading from writing: one updates the value of i, and the other uses the value of i. For example, if you want to write foo(i++), you can split it into int t = i; i += 1; foo(t);. If you want to write foo(++i), you can split it into i += 1; foo(i);. The decomposed code means exactly the same thing but is much clearer: it is obvious whether the update happens before or after the value is used.
You might think i++ or ++i is more efficient than the decomposed version, but that’s just an illusion. After basic compiler optimization, the resulting machine code is indistinguishable. Increment and decrement expressions are safe to use in only two situations. One is the update part of a for loop, such as for (int i = 0; i < 5; i++). The other is as a statement on its own line, like i++;. There is no ambiguity in these two cases. Avoid everything else, such as using them inside complex expressions like foo(i++) or foo(++i) + foo(i). No one should know, or seek to know, what those mean.
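The decomposition described above can be checked with a small sketch. The class and method names here (`IncrementDemo`, `square`, `withPostIncrement`, `decomposed`) are hypothetical; each method returns both the call result and the final value of the counter so the two forms can be compared:

```java
import java.util.Arrays;

final class IncrementDemo {
    static int square(int n) {
        return n * n;
    }

    // Post-increment inside an argument: the read and the write are fused.
    static int[] withPostIncrement(int i) {
        int result = square(i++);
        return new int[] { result, i };
    }

    // Same behavior, with the read and the write as separate statements.
    static int[] decomposed(int i) {
        int t = i;
        i += 1;
        int result = square(t);
        return new int[] { result, i };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withPostIncrement(3))); // [9, 4]
        System.out.println(Arrays.toString(decomposed(3)));        // [9, 4]
    }
}
```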
- Never omit curly braces. Many languages allow you to omit curly braces in certain situations. For example, C and Java allow you to omit the curly braces when the body of an if statement is a single statement:
if (...)
    action1();
At first glance, you save two characters. How nice. But this often causes strange problems. For example, if you later want to add action2() to the if, you change the code to:
if (...)
    action1();
    action2();
For aesthetics, you carefully indent action2() to match action1(). At first glance they belong together, so you subconsciously assume that both will only be executed when the if condition is true, whereas in fact action2() is outside the if and will be executed unconditionally. I call this an “optical illusion”: a mistake that every programmer should notice in theory, but that is easily overlooked in practice.
So you ask, who would be so stupid as not to add curly braces when adding action2()? But from a design point of view, this is not a reasonable approach. First, you might want to remove action2() later, and then you have to remove the curly braces again for consistency. Isn’t that a nuisance? Second, it makes the code style inconsistent, with some ifs having curly braces and some not. Besides, why should you have to remember this rule at all? If you simply always write the curly braces, you never have to think about it; just pretend that C and Java don’t offer this special notation. This way you maintain complete consistency and avoid unnecessary thinking.
Some people may say: wrapping a single statement in curly braces, what an eyesore! However, after following this convention for a few years, I don’t find it an eyesore at all. Instead, the curly braces make the boundaries of the code more distinct and reduce the burden on my eyes.
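To make the “optical illusion” concrete, here is a minimal runnable sketch. The class and method names (`BraceDemo`, `withoutBraces`, `withBraces`) are hypothetical; each method records which actions actually ran so the difference is visible:

```java
import java.util.ArrayList;
import java.util.List;

final class BraceDemo {
    // Misleading layout: only the first statement belongs to the if.
    static List<String> withoutBraces(boolean condition) {
        List<String> executed = new ArrayList<>();
        if (condition)
            executed.add("action1");
            executed.add("action2"); // indented like the if body, but always runs
        return executed;
    }

    // With braces, the layout and the behavior agree.
    static List<String> withBraces(boolean condition) {
        List<String> executed = new ArrayList<>();
        if (condition) {
            executed.add("action1");
            executed.add("action2");
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(withoutBraces(false)); // [action2]
        System.out.println(withBraces(false));    // []
    }
}
```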
- Use parentheses sensibly and do not blindly rely on operator precedence. Relying on operator precedence to reduce parentheses is fine for common arithmetic expressions like 1 + 2 * 3. Yet some people hate parentheses so much that they write expressions like 2 << 7 - 2 * 3 without any parentheses at all.
The problem here is that the precedence of the shift operator << is unfamiliar to many people, and it is counterintuitive. Since x << 1 is the same as multiplying x by 2, many people mistakenly think this expression is the same as (2 << 7) - (2 * 3), so that it equals 250. However, << has lower precedence than addition and subtraction, so the expression is actually equal to 2 << (7 - 2 * 3), which is 4!
The solution is not for everyone to memorize the operator precedence table, but to add parentheses where appropriate. In the example above, it’s better to write 2 << (7 - 2 * 3) with the parentheses. Although the meaning is the same without them, the parentheses make it clearer, and the reader no longer needs to remember the precedence of << to understand the code.
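The arithmetic above is easy to verify in Java, where the shift operators also bind more loosely than + and -. This is only a sketch; the class and method names are hypothetical:

```java
final class ShiftDemo {
    // The expression as written without parentheses.
    static int withoutParentheses() {
        return 2 << 7 - 2 * 3;
    }

    // The same expression with its real grouping made explicit.
    static int withParentheses() {
        return 2 << (7 - 2 * 3);
    }

    // The grouping many readers wrongly assume.
    static int commonMisreading() {
        return (2 << 7) - (2 * 3);
    }

    public static void main(String[] args) {
        System.out.println(withoutParentheses()); // 4
        System.out.println(withParentheses());    // 4
        System.out.println(commonMisreading());   // 250
    }
}
```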
- Avoid using continue and break. A return inside a loop (for, while) is fine, but if you use continue or break, you complicate the logic and the termination conditions of the loop.
A “continue” or “break” occurs because the logic of the loop has not been clearly thought through. If you’re careful, there should be little need for a continue or break. If a “continue” or “break” occurs in your loop, you should consider rewriting the loop. There are several ways to rewrite loops:
- If a continue occurs, you can often eliminate the continue by simply reversing the conditions of the continue.
- If there is a break, you can often combine the break condition with the termination condition in the head of the loop to get rid of the break.
- Sometimes you can get rid of “break” by replacing “break” with “return”.
- If all else fails, you might be able to extract the complex part of the loop into a function, after which the continue or break can be removed.
Let me give you some examples of these situations.
Case 1: The following code contains a continue:
List<String> goodNames = new ArrayList<>();
for (String name: names) {
    if (name.contains("bad")) {
        continue;
    }
    goodNames.add(name);
    ...
}
It says: “If name contains the word ‘bad’, skip the rest of this iteration…” Note that this is a “negative” description: it tells you not when to “do” something, but when “not to do” something. To know what the code is doing, you have to figure out which statements the continue causes to be skipped, and then mentally reverse the logic to see what it is really trying to do. This is why loops with continue and break are hard to understand. They rely on “control flow” to describe “what not to do” and “what to skip”, and in the end you have no idea what the loop actually “does”.
In fact, we can easily convert this code into equivalent code without continue by simply reversing the continue condition:
List<String> goodNames = new ArrayList<>();
for (String name: names) {
    if (!name.contains("bad")) {
        goodNames.add(name);
        ...
    }
}
goodNames.add(name); and all the code after it are placed inside the if, with one extra level of indentation, and the continue is gone. If you read the code again, you’ll find it clearer, because it is a more “positive” description. It says: “Add name to the list goodNames when it doesn’t contain the word ‘bad’…”
Case 2: The for and while headers both have a loop “termination condition” that should be the only exit condition for the loop. If you have a break in the middle of the loop, it actually adds an exit condition to the loop. You often just need to incorporate this condition into the head of the loop to get rid of the break.
Take this code for example:
while (condition1) {
    ...
    if (condition2) {
        break;
    }
}
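When condition2 is true, break exits the loop. You can get rid of the break by negating condition2 and adding it to the termination condition in the head of the while. (This assumes condition2 is false when the loop is first entered; otherwise the original would still run the body once.) The rewritten code looks like this: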
while (condition1 && !condition2) {
    ...
}
This may seem to only apply when a break occurs at the beginning or end of the loop, but most of the time, the break can somehow move to the beginning or end of the loop. I don’t have specific examples yet, but I’ll add them when they come up.
Case 3: Many breaks exit the loop, followed by a return. This break can often be replaced by a return. Take this example:
public boolean hasBadName(List<String> names) {
    boolean result = false;
    for (String name: names) {
        if (name.contains("bad")) {
            result = true;
            break;
        }
    }
    return result;
}
This function checks for the presence of a name in the names list that contains the word “bad”. Its loop contains a break statement. This function can be rewritten as:
public boolean hasBadName(List<String> names) {
    for (String name: names) {
        if (name.contains("bad")) {
            return true;
        }
    }
    return false;
}
Instead of assigning to the result variable, breaking out of the loop, and returning at the end, the improved code uses return true as soon as it finds a name containing “bad”. If the loop finishes without returning, the function returns false to indicate that no such name was found. Using return instead of break eliminates both the break statement and the result variable.
I’ve seen many other examples of use of continue and break, and almost invariably they can be eliminated, resulting in much cleaner code. My experience is that 99% of breaks and continues can be eliminated by replacing them with return statements or flipping the if condition. The remaining 1% contains complex logic, but can also be eliminated by extracting a helper function. The modified code becomes easy to understand and easy to ensure is correct.
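The last rewriting option listed earlier, extracting part of the loop into a function, has no example in the text, so here is a minimal sketch of one way it might look. The class `LoopExtractionDemo` and the method `countGroupsWithBadName` are hypothetical; `hasBadName` is the function from Case 3. Written inline, the inner search would need a flag plus a break; once it is a helper, the early exit becomes a plain return and the outer loop contains no break at all:

```java
import java.util.Arrays;
import java.util.List;

final class LoopExtractionDemo {
    // Extracted inner loop: the early exit is a return, not a break.
    static boolean hasBadName(List<String> names) {
        for (String name : names) {
            if (name.contains("bad")) {
                return true;
            }
        }
        return false;
    }

    // The outer loop now reads as a positive description of what it counts.
    static int countGroupsWithBadName(List<List<String>> groups) {
        int count = 0;
        for (List<String> group : groups) {
            if (hasBadName(group)) {
                count += 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
            Arrays.asList("good", "bad1"),
            Arrays.asList("fine"),
            Arrays.asList("badger"));
        System.out.println(countGroupsWithBadName(groups)); // 2
    }
}
```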
Write intuitive code
I have an important rule for writing code: if there’s a more direct, clear way to write it, go for it, even if it looks longer and dumber. For example, the Unix command line has a “clever” way of writing:
command1 && command2 && command3
Since the logical operation a && b in the shell language is “short-circuited”, if a is false, b is not executed. This is why command2 is executed only when command1 succeeds, and command3 only when command2 succeeds. In the same way,
command1 || command2 || command3
The operator || has a similar property. In the command line above, if command1 succeeds, neither command2 nor command3 will be executed. If command1 fails and command2 succeeds, command3 will not be executed.
This seems more subtle and concise than using an if statement to determine failure, so someone borrowed it and used it in their program code. For example, they might write code like this:
if (action1() || action2() && action3()) {
    ...
}
Can you see what this code is trying to do? Under what conditions are action2 and action3 executed, and under what conditions are they not? If you think about it a bit, you may figure out what it’s doing: “If action1 fails, execute action2; if action2 succeeds, execute action3.” That meaning, however, is not directly “mapped” onto the code. For example, which word in the code corresponds to “fails”? You can’t find one, because it is hidden in the semantics of ||. You need to know the short-circuit behavior of ||, as well as the semantics of logical or, to see that this says “if action1 fails…”. Every time you look at this line of code you have to think it through again, and that accumulated load is very tiring.
Actually, this kind of writing was an abuse of the logical operation && and | | short circuit characteristics. These two operators may not execute the expression on the right for the sake of machine efficiency, not for the sake of providing such “clever” usage for humans. These two operators are intended only as logical operations, and they are not meant to replace if statements. That is, they just happen to do what some if statements do, but you shouldn’t use them instead of if statements because of that. If you do that, you’ll make your code obscure.
The above code would be much clearer if it were written a little more clumsily:
if (!action1()) {
    if (action2()) {
        action3();
    }
}
Here I can see what the code is saying without thinking: if action1() fails, execute action2(); if action2() succeeds, execute action3(). Do you see the one-to-one correspondence? if means “if”, and ! means “fails”… You don’t need any knowledge of logic to see what it says.
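To make the comparison concrete, here is a minimal sketch that runs both forms against every combination of outcomes and checks that they call the same actions in the same order. The class ShortCircuitDemo and the recording versions of action1, action2 and action3 are hypothetical stand-ins, not code from any real project, and the check only compares which actions run, not what the elided if body does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: each action records that it ran and returns a preset result.
class ShortCircuitDemo {
    private final boolean[] results;               // preset outcome of each action
    final List<String> calls = new ArrayList<>();  // actions that actually ran, in order

    ShortCircuitDemo(boolean r1, boolean r2, boolean r3) {
        results = new boolean[] {r1, r2, r3};
    }

    boolean action1() { calls.add("action1"); return results[0]; }
    boolean action2() { calls.add("action2"); return results[1]; }
    boolean action3() { calls.add("action3"); return results[2]; }

    // The "clever" form: which actions run is implied by operator
    // precedence and short-circuit evaluation.
    void clever() {
        if (action1() || action2() && action3()) {
            // ...
        }
    }

    // The explicit form: every condition is spelled out.
    void explicit() {
        if (!action1()) {
            if (action2()) {
                action3();
            }
        }
    }

    // Returns true if both forms call the same actions in the same order
    // for all eight combinations of outcomes.
    static boolean sameCallsForAllOutcomes() {
        for (int bits = 0; bits < 8; bits++) {
            boolean r1 = (bits & 1) != 0, r2 = (bits & 2) != 0, r3 = (bits & 4) != 0;
            ShortCircuitDemo a = new ShortCircuitDemo(r1, r2, r3);
            ShortCircuitDemo b = new ShortCircuitDemo(r1, r2, r3);
            a.clever();
            b.explicit();
            if (!a.calls.equals(b.calls)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameCallsForAllOutcomes());  // prints true
    }
}
```

Both forms behave the same here; the difference is only in how much the reader has to reconstruct in their head.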
Write impeccable code
In the previous section, I mentioned that my code rarely has a single branch if statement. Most of the if statements I write have two branches, so a lot of my code looks like this:
if (...) {
    if (...) {
        ...
        return false;
    } else {
        return true;
    }
} else if (...) {
    ...
    return false;
} else {
    return true;
}
The point of writing it this way is to handle every possible case without gaps and to avoid missing corner cases. Every if statement has two branches because if the condition is true you do one thing, but if it is not true you should also know what to do instead. Whether or not your if has an else, you have to think about that case.
Many people who write if statements like to omit the else branch because they feel that some else branches are duplicated. For example, in my code, both else branches return true. To avoid the duplication, they omit the two else branches and put a single return true at the end. An if statement without an else branch then lets control flow “fall through” to the final return true. Their code looks something like this:
if (...) {
    if (...) {
        ...
        return false;
    }
} else if (...) {
    ...
    return false;
}
return true;
This style looks more concise and avoids repetition, but it invites oversights and holes. Nested if statements that omit some else branches rely on control flow falling through to handle the else cases, which is hard to analyze and reason about correctly. If the if conditions also use the logical operators && and ||, it is even harder to see whether every case is covered.
Any branch that is inadvertently missed will “fall through” to the end and return an unexpected result. Even if you check it once and are sure it is correct, every time you reread the code you can no longer be sure it covers all the cases, and you have to reason it through again. This brevity brings repeated, heavy mental overhead. This style is also called “spaghetti code”, because the program’s logical branches look not like a tree but like noodles.
Another way to omit the else branch is like this:
String s = "";
if (x < 5) {
    s = "ok";
}
The people who write this code like to think in terms of a “default”. s defaults to the empty string, and if x<5 it is mutated to “ok”. The downside is that when x<5 does not hold, you have to look further up to find out what s is. And that is when you’re lucky, because here the initialization is not far above. Many people write this kind of code with the initial value of s some distance from the if statement, possibly with other logic and assignments in between. In such code, variables are changed again and again, which is dizzying to read and easy to get wrong.
Now compare how I wrote it:
String s;
if (x < 5) {
    s = "ok";
} else {
    s = "";
}
This looks like a few more characters, but it is much clearer, because we state explicitly what s is when x<5 does not hold. It’s right there: "" (the empty string). Note that although I also use assignment, I never “change” the value of s. s starts out with no value, is assigned once, and never changes afterwards. This style is often described as more “functional”, because each variable is assigned only once.
If I omit the else branch, the Java compiler will not let me off the hook. When s is used afterwards, it complains that s might not have been initialized on some branch. This forces me to set the value of s explicitly under every condition, without missing any.
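The same compiler check scales to more branches. The sketch below is a hypothetical three-way variant (the class name, the method classify, and the "warning" case are made up for illustration). Marking s as final is an optional extra that also makes the compiler reject any second assignment.

```java
// Hypothetical example of single assignment across several branches.
class SingleAssignmentDemo {
    static String classify(int x) {
        // Declared without a value: the compiler checks that every path
        // through the if/else chain assigns s exactly once before it is read.
        final String s;
        if (x < 5) {
            s = "ok";
        } else if (x < 10) {
            s = "warning";
        } else {
            s = "";
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(classify(3));  // prints ok
    }
}
```

Deleting any one of the branches turns the missing case into a compile error at the return statement instead of a silent default.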
Of course, since the case is relatively simple, you could also write it like this:
String s = x < 5 ? "ok" : "";
For more complex cases, I recommend writing the if statement instead.
Correct handling of errors
Using if statements with two branches is only one of the reasons my code is watertight. The way of thinking behind writing if statements like this embodies a general idea for making code reliable: enumerate all the cases and leave none out.
The vast majority of what programs do is information processing: picking the piece you need out of a mass of complex, ambiguous information. Correctly reasoning about every “possibility” is the core idea behind writing code that holds up. In this section I want to show how to apply this idea to error handling.
Error handling is an old problem, but after decades many people still don’t understand it. The Unix system API manuals generally tell you the possible return values and error conditions. For example, the Linux manual page for the read system call contains the following:
RETURN VALUE
On success, the number of bytes read is returned...
On error, -1 is returned, and errno is set appropriately.
ERRORS
EAGAIN, EBADF, EFAULT, EINTR, EINVAL, ...
Many beginners forget to check whether read returned -1, or decide that checking on every call is too tedious. This way of thinking is actually very dangerous. If the function’s documentation tells you that it returns either a positive number, which is the length of the data read, or -1, then you have to do something meaningful with that -1. Don’t assume you can ignore this special return value: it is one of the “possibilities”. Missing any of the possible cases in your code can have catastrophic consequences.
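The manual excerpt above describes the C API. As an analogous sketch in Java, InputStream.read(byte[]) also has a special return value: it returns the number of bytes read, or -1 at end of stream, and reports I/O errors through IOException. The class ReadAllDemo and its method countBytes are hypothetical; the point is only that each documented outcome has a visible branch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical example: count the bytes in a stream, handling every outcome of read().
class ReadAllDemo {
    static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[4];
        long total = 0;
        while (true) {
            int n = in.read(buffer);
            if (n == -1) {
                // End of stream: the special return value that must not be ignored.
                return total;
            } else {
                // n bytes were stored in the buffer; n may be less than buffer.length.
                total += n;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countBytes(new ByteArrayInputStream(new byte[10])));  // prints 10
    }
}
```

If the -1 case were dropped, total would be decremented on every read after the end of the stream and the loop would never finish.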
In Java, this is relatively convenient. When something goes wrong in a Java function, it is usually signaled by an exception. You can think of the exception together with the function’s return value as a “union type”. For example:
String foo() throws MyException {
    ...
}
Here MyException is an error return. You can think of this function as returning a union type: {String, MyException}. Any code that calls foo must handle MyException sensibly before it can be sure the program runs correctly. Union types are a fairly advanced kind of type that only very few languages (such as Typed Racket) currently have; I mention them here only to explain the concept. Once you have the concept, you can in effect implement a union type system in your head, which lets you write reliable code in ordinary languages.
Because Java’s type system forces functions to declare the exceptions they may throw, and forces callers to handle them, it is almost impossible to miss one by accident. But some Java programmers have a bad habit that makes this safety mechanism almost completely useless. Whenever the compiler reports an error along the lines of “you did not catch the exception that foo may throw,” they don’t think twice and just change the code to this:
try {
    foo();
} catch (Exception e) {}
At most they put a log statement in the catch block, or they add throws Exception to their own function’s signature so that the compiler stops complaining. These approaches may seem easy, but they are wrong, and you will eventually pay for them.
If you catch the exception and ignore it, you will not know that foo actually failed. It is like seeing a sign that says “Road closed for construction ahead” and driving on anyway. Of course something will go wrong, because you have no idea what you are doing.
When you catch an exception, you shouldn’t use a broad type like Exception. You should catch exactly the exception type A that may occur. A broad exception type is a problem because it also catches other exceptions (say, B) without your noticing. Your code’s logic is based on whether A occurred, but you catch everything, so when some other exception B occurs, your code misbehaves in puzzling ways: it believes A occurred when in fact it did not. Such bugs are sometimes hard to find even with a debugger.
If you add throws Exception to your own function’s signature, then the exception inevitably has to be handled wherever your function is called. If the calling function also declares throws Exception, the problem spreads even further. My rule of thumb is to handle exceptions as close as possible to where they occur. If you pass an exception up to your caller, the caller may have no idea what to do with it.
Also, a try { ... } catch block should contain as little code as possible. For example, if foo and bar can both throw exception A, your code should, as far as possible, be written like this:
try {
    foo();
} catch (A e) {
    ...
}

try {
    bar();
} catch (A e) {
    ...
}
rather than like this:
try {
    foo();
    bar();
} catch (A e) {
    ...
}
The first one tells you exactly which function is going wrong, whereas the second one mixes it all up. There are many advantages to being able to tell which function is going wrong. For example, if your catch code includes a log, it can give you more accurate error information, which can greatly speed up your debugging process.
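A runnable sketch of the narrow-try style follows. The exception class A and the functions foo and bar are hypothetical placeholders that fail on request, and the returned list stands in for a real log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: one try block per call, each catching only A.
class NarrowTryDemo {
    // Stand-in for the specific checked exception "A" from the text.
    static class A extends Exception {
        A(String message) { super(message); }
    }

    static void foo(boolean fail) throws A {
        if (fail) { throw new A("foo failed"); }
    }

    static void bar(boolean fail) throws A {
        if (fail) { throw new A("bar failed"); }
    }

    // Each handler knows exactly which call failed, so the log entry can say so.
    static List<String> run(boolean fooFails, boolean barFails) {
        List<String> log = new ArrayList<>();
        try {
            foo(fooFails);
        } catch (A e) {
            log.add("foo: " + e.getMessage());
        }
        try {
            bar(barFails);
        } catch (A e) {
            log.add("bar: " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false, true));  // prints [bar: bar failed]
    }
}
```

Because the handlers catch A and nothing broader, an unrelated runtime exception thrown inside foo or bar would still propagate instead of being mistaken for A.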
Handle null pointers correctly
The idea of exhaustive enumeration is so useful that from it we can derive some basic principles that let you handle null pointers without gaps.
First, you should know that the way many languages (C, C++, Java, C#, …) treat null in their type systems is completely wrong. This mistake goes back to one of Tony Hoare’s earliest designs. Hoare has called it his “billion-dollar mistake”, because the losses in property and manpower it has caused far exceed one billion dollars.
The type systems of these languages allow null to appear anywhere an object (pointer) type can appear, but null is not a legal object at all. It is not a String, not an Integer, and not an instance of any custom class. The type of null should really be NULL, that is, null itself. From this basic idea, we derive the following principles:
- Try not to produce null pointers. Try not to initialize variables with null, and try not to return null from functions. If your function needs to signal “nothing found”, “an error occurred” and so on, try to use Java’s exception mechanism. It is a little awkward to write, but Java exceptions combined with a function’s return value can basically be used as a union type. For example, if you have a function find that either finds a String or finds nothing, you could write:
public String find() throws NotFoundException {
    if (...) {
        return ...;
    } else {
        throw new NotFoundException();
    }
}
Java’s type system forces you to catch NotFoundException, so you can’t miss it the way you can miss a null check. Java exceptions are also easy to abuse, but the previous section explained how to use them properly.
Java’s try...catch syntax is cumbersome and clumsy, so if you are careful enough, a function like find can also return null to mean “not found”. That reads a little better because the caller doesn’t need try...catch. Many people write functions that return null to mean “something went wrong”, which is a misuse of null. “Something went wrong” and “there is none” are two very different things. “None” is a very common, normal situation: looking something up in a hash table and not finding it is perfectly normal. “Went wrong” is the rare case: normally a meaningful value should exist, but occasionally something breaks. If your function needs to say “something went wrong”, use an exception, not null.
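As a sketch of keeping “none” and “went wrong” apart, the hypothetical find below returns null only for the ordinary “not found” case and throws a checked exception for a malformed request. FindDemo, PHONE_BOOK and LookupException are names invented for this illustration.

```java
import java.util.Map;

// Hypothetical example: null means "none", an exception means "went wrong".
class FindDemo {
    // Stand-in checked exception for a genuine failure.
    static class LookupException extends Exception {
        LookupException(String message) { super(message); }
    }

    // Made-up data; Map.of builds an immutable map that cannot hold nulls.
    static final Map<String, String> PHONE_BOOK = Map.of("alice", "555-0100");

    // Returns the number, or null when the name is simply absent.
    // Throws LookupException when the request itself is malformed.
    static String find(String name) throws LookupException {
        if (name.isEmpty()) {
            throw new LookupException("empty name");
        } else {
            return PHONE_BOOK.get(name);
        }
    }

    // The caller handles both outcomes right where find is called.
    static String describe(String name) {
        String found;
        try {
            found = find(name);
        } catch (LookupException e) {
            return "error: " + e.getMessage();
        }
        if (found != null) {
            return name + ": " + found;
        } else {
            return name + ": not found";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("alice"));  // prints alice: 555-0100
        System.out.println(describe("bob"));    // prints bob: not found
        System.out.println(describe(""));       // prints error: empty name
    }
}
```

The try block wraps only the call to find, and neither the null nor the exception leaves describe.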
- Do not catch NullPointerException. Some people are very “nice” when they write code: they like “fault tolerance”. First, they write functions that carelessly skip the null check:
void foo() {
    String found = find();
    int len = found.length();
    ...
}
When they see that a call to foo throws an exception, they change the call site, without a second thought, to something like this:
try {
    foo();
} catch (Exception e) {
    ...
}
Now when found is null, the NullPointerException is caught and handled. This is actually very wrong. First, as mentioned in the previous section, catch (Exception e) is an absolute no-no because it catches all exceptions, including NullPointerException. You will end up accidentally catching NullPointerExceptions thrown inside the try block and scrambling your code’s logic.
Even writing catch (NullPointerException e) is not acceptable. The NullPointerException occurs because foo lacks an internal null check. Instead of treating the actual cause, you are adding a catch at every call site, and your life will only get harder. The right thing to do is to change foo, not the code that calls it. foo should be changed to this:
void foo() {
    String found = find();
    if (found != null) {
        int len = found.length();
        ...
    } else {
        ...
    }
}
Check for null at the point where it can first appear, and handle it accordingly.
- Do not put null inside a container data structure. A container is a data structure that groups objects together in some way, so null should not appear in an Array, List, Set and so on, nor as a key or value of a Map. Putting null into containers is a source of puzzling errors. Because an object’s position in a container is generally determined dynamically, once a null gets in through some entry point it is hard to figure out where it went, and you are forced to check for null on every value you take out of the container. It is also hard to know who put it there, and once there is a lot of code, debugging becomes extremely difficult.
The solution: if you really need to express “none”, either leave it out (the Array, List or Set has no such element, the Map has no such entry at all), or designate a special, genuinely legal object to mean “none”.
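A small sketch of the “leave it out” option for a Map follows. The class NoNullInMapDemo and its nickname data are hypothetical. Because no null is ever stored, containsKey is a complete answer to “is there a value?”.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: "no nickname" is expressed by the absence of an entry.
class NoNullInMapDemo {
    static Map<String, String> nicknames() {
        Map<String, String> m = new HashMap<>();
        m.put("alice", "Al");
        // "bob" has no nickname, so nothing is stored for "bob"
        // (rather than m.put("bob", null)).
        return m;
    }

    static String greeting(Map<String, String> nicknames, String user) {
        if (nicknames.containsKey(user)) {
            // No null values are ever stored, so this get cannot return null.
            return "Hi, " + nicknames.get(user);
        } else {
            return "Hi, " + user;
        }
    }

    public static void main(String[] args) {
        Map<String, String> m = nicknames();
        System.out.println(greeting(m, "alice"));  // prints Hi, Al
        System.out.println(greeting(m, "bob"));    // prints Hi, bob
    }
}
```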
Note that class objects are not containers. So, when necessary, null can be used as the value of an object member to indicate that it is absent. For example:
class A {
    String name = null;
    ...
}
This is acceptable because null can only appear in the name member of an A object, so you have no reason to suspect the other members of being null. Whenever you access the name member, check it for null; you don’t need to do the same for the other members.
- Function callers: understand clearly what null means, check for and handle null return values as early as possible, and limit their propagation. The annoying thing about null is that it can mean different things in different places. Sometimes it means “none” or “not found”. Sometimes it means “something went wrong” or “failed”. Sometimes it can even mean “success”… There are many misuses here, but in any case you must understand what each null means and not mix them up.
If you call a function that may return null, you should handle the null “meaningfully” at the first opportunity. For example, if find returns null to mean “not found”, the code that calls find should check for null as soon as find returns and deal with the “not found” case in a meaningful way.
What does “meaningful” mean? I mean that whoever calls the function should know exactly what to do on getting null, and take responsibility for it. They should not just “report upward” and push the responsibility onto their own callers. If you violate this, you may end up writing in an irresponsible, dangerous way:
public String foo() {
    String found = find();
    if (found == null) {
        return null;
    }
}
When foo sees that find() returned null, it returns null too. So null travels from one place to another, where it means something else. If you write code like this without thinking, you will end up with null showing up anywhere in your code at any time. Eventually, to protect yourself, every one of your functions will look like this:
public void foo(A a, B b, C c) {
    if (a == null) { ... }
    if (b == null) { ... }
    if (c == null) { ... }
    ...
}
- Function authors: state explicitly that null arguments are not accepted, and crash immediately when an argument is null. Do not try to be “fault-tolerant” about null, and do not let the program carry on. If a caller passes null as an argument, then the caller (not the function author) bears full responsibility for the crash.
What makes the example above problematic is people’s “tolerant attitude” towards null. This “protective” style, which tries to be “fault-tolerant” and to “handle null gracefully”, only encourages callers to pass null to your functions even more recklessly. In the end, your code is full of nonsense, and null can appear anywhere without anyone knowing where it came from. Nobody knows what a given null means or what to do about it, and everyone kicks the null over to someone else. Eventually null spreads like a plague, everywhere, and becomes a nightmare.
The right approach is in fact a tough attitude. You tell the users of your function: none of my arguments may be null, and if you give me null, it is your fault when the program crashes. How to handle null in the caller’s code (see the previous item) is for the caller to know, not for the function author to worry about.
An easy way to adopt this tough attitude is to use Objects.requireNonNull(). Its definition is simple:
public static <T> T requireNonNull(T obj) {
    if (obj == null) {
        throw new NullPointerException();
    } else {
        return obj;
    }
}
You can use this function to check every argument for which you do not accept null. As soon as a null is passed in, the program crashes with a NullPointerException, which effectively prevents the null pointer from being passed on elsewhere unnoticed.
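A typical place for this check is a constructor, so that a null can never be stored in the object in the first place. The Greeter class below is a hypothetical illustration; it uses the two-argument overload Objects.requireNonNull(obj, message), which attaches a message to the NullPointerException.

```java
import java.util.Objects;

// Hypothetical example: refuse null at the boundary instead of tolerating it.
class Greeter {
    private final String name;

    Greeter(String name) {
        // Throws NullPointerException immediately if the caller passes null.
        this.name = Objects.requireNonNull(name, "name must not be null");
    }

    String greet() {
        // No null check needed here: name was validated at construction.
        return "Hello, " + name;
    }

    public static void main(String[] args) {
        System.out.println(new Greeter("world").greet());  // prints Hello, world
    }
}
```

With the check in the constructor, a stack trace for a bad call points at the caller that passed null, not at some later use of the field.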
- Use the @NotNull and @Nullable annotations. IntelliJ provides the @NotNull and @Nullable annotations, which you put in front of types to get concise and reliable protection against null pointers. IntelliJ itself statically analyzes code that carries these annotations and points out where a NullPointerException might occur at runtime. At runtime, an IllegalArgumentException is thrown wherever a null pointer appears where it should not, even if you never dereference that null pointer. This way you can detect and prevent null pointers as early as possible.
- Use the Optional type. Languages such as Java 8 and Swift provide a type called Optional. Used correctly, it can largely avoid the null problem. The problem with null pointers is that you can “access” an object’s members without first “checking” for null.
The Optional type is designed to combine the “check” and “access” operations into one “atomic operation”, so you cannot access without checking. This is a special case of pattern matching in ML, Haskell, and other languages. Pattern matching fuses the two operations of determining the type and accessing the members, so you cannot get it wrong.
For example, in Swift, you could write:
let found = find()
if let content = found {
    print("found: " + content)
}
You get a value found of an Optional type from find(). Suppose its type is String?. The question mark means it may contain a String or it may be nil. You can then use a special if statement to perform the null check and access the contents at the same time. This if statement differs from the ordinary one: instead of a Bool, its condition is a variable binding, let content = found.
I don’t really like this syntax, but the meaning of the whole statement is: if found is nil, the whole if statement is skipped. If it is not nil, the variable content is bound to the value inside found (unwrap), and then print("found: " + content) is executed. Because this form combines checking and accessing, you cannot access without checking.
Java 8’s approach is rather clumsy. If you get a value found of type Optional<String>, you must use a “functional programming” style to write the code that follows:
Optional<String> found = find();
found.ifPresent(content -> System.out.println("found: " + content));
This Java code is equivalent to the Swift code above: it combines a “check” and a “get the value” operation. ifPresent first checks whether found has a value (the equivalent of checking that it is not null). If it does, ifPresent “binds” the content to the lambda’s content parameter (the unwrap operation) and then executes the lambda’s body. Otherwise, if found has no content, the lambda passed to ifPresent is not executed.
There is a problem with this design in Java. Everything in the branch after the null check must be written inside the lambda. In functional programming this lambda is called a “continuation”, and Java calls it a “Consumer”; it expresses “if found is not null, take its value, and then do this”. Since the lambda is a function in its own right, you cannot write a return statement inside it to return from the outer function. For example, suppose you want to rewrite the following function (which uses null):
public static String foo() {
    String found = find();
    if (found != null) {
        return found;
    } else {
        return "";
    }
}
That turns out to be troublesome. If you write it like this:
public static String foo() {
    Optional<String> found = find();
    found.ifPresent(content -> {
        return content;    // can't return from foo here
    });
    return "";
}
the return content inside does not return from foo. It only returns from the lambda, and since the return type of that lambda (Consumer.accept) must be void, the compiler reports an error saying that you returned a String. And since the free variables captured by a Java closure are read-only, you cannot assign to variables outside the lambda, so you can’t write it like this either:
public static String foo() {
    Optional<String> found = find();
    String result = "";
    found.ifPresent(content -> {
        result = content;    // can't assign to result
    });
    return result;
}
So although you get hold of found’s content inside the lambda, how to use that value and how to return something from it is baffling. Your usual Java programming techniques are almost useless here. In fact, after the null check you have to use the odd collection of functional programming operations that Java 8 provides (map, flatMap, orElse and so on) and work out how to combine them to express what the original code meant. For example, the previous code can only be rewritten like this:
public static String foo() {
    Optional<String> found = find();
    return found.orElse("");
}
This simple case is fine. I don’t really know how to express more complex code, and I doubt whether the handful of methods on Java 8’s Optional are expressive enough. They are not very expressive, yet explaining how they work drags in functor, continuation, and even monad… It is as if, once you use Optional, the language you are writing is no longer Java.
So although Java provides Optional, I think its usability is rather low and it is hard to accept. Swift’s design, by contrast, is simpler and more intuitive, and closer to ordinary procedural programming. You only need to remember one special syntax, if let content = found {...}, and the code inside is written no differently from an ordinary procedural language.
In short, just remember that the key to using the Optional type is the “atomic operation” that makes the null check and the value access happen together. That requires you to use the special forms I have just introduced. If you violate this principle and split the check and the access into two steps, you can still make mistakes. In Java 8, for example, you can access the content of found directly with found.get(). In Swift you can also write found! to access the value directly without checking.
You can write Java code like this to use the Optional type:
Optional<String> found = find();
if (found.isPresent()) {
    System.out.println("found: " + found.get());
}
If you take this approach and split the check and the access into two steps, you may get runtime errors. if (found.isPresent()) is essentially the same as an ordinary null check. If you forget to test found.isPresent() and call found.get() directly, you get a NoSuchElementException, which is essentially the same thing as a NullPointerException. So this style is no better than the ordinary way of using null. If you want the benefits of the Optional type, be sure to follow the “atomic operation” style described above.
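For a sense of what combining these operations looks like in a slightly larger case, here is a minimal sketch using only methods that exist on java.util.Optional (ofNullable, map, filter, orElse). The class OptionalDemo, the EMAILS data and the normalization rules are hypothetical, and whether the chained form reads better than a plain if statement is a judgment call.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical example: check and access stay fused through the whole chain.
class OptionalDemo {
    // Made-up data with stray whitespace and mixed case.
    static final Map<String, String> EMAILS = Map.of("alice", " Alice@Example.com ");

    // Wraps the possibly-absent lookup result in an Optional.
    static Optional<String> find(String user) {
        return Optional.ofNullable(EMAILS.get(user));
    }

    // Each step only runs if a value is still present; orElse supplies
    // the result for the "none" case at the end.
    static String normalizedEmail(String user) {
        return find(user)
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(String::toLowerCase)
                .orElse("unknown");
    }

    public static void main(String[] args) {
        System.out.println(normalizedEmail("alice"));  // prints alice@example.com
        System.out.println(normalizedEmail("bob"));    // prints unknown
    }
}
```

Nowhere in this chain is get() called, so there is no point at which the value can be accessed without the presence check having happened.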
Prevention of over-engineering
The human brain is a curious thing. Although everyone knows that over-engineering is bad, in actual engineering it keeps happening almost involuntarily. I have made this mistake many times myself, so I feel it is worth analyzing the warning signs of over-engineering, so that it can be detected and avoided early.
An important sign of impending overengineering is when you think too much about the “future,” thinking about things that haven’t happened yet, needs that haven’t come up yet. For example, “If we have millions of lines of code and thousands of people in the future, such a tool won’t work”, “I may need this feature in the future, so I’ll write the code and put it there now”, “many people will expand this code in the future, so let’s make it reusable now”…
This is why many software projects are so complex. Not much has actually been done, but a lot of unnecessary complexity has been added for the sake of a so-called “future”. The present problem hasn’t been solved yet, and the project is already being dragged down by the “future”. People don’t like the short-sighted, but in real engineering you sometimes just have to look at what is close at hand and get the problem in front of you done before thinking about extensions.
Another source of overengineering is an excessive concern with “code reuse.” Many people are concerned with “reuse” before their “usable” code has even been written. You end up getting tied up in all the frameworks you’ve created to make your code reusable, and you end up not even writing usable code. If the usable code can’t be written well, how can you reuse it? A lot of projects that start out with too much reuse are abandoned because the code is so hard to understand that it saves a lot of work to write from scratch.
Too much focus on “testing” can also lead to over-engineering. Some people change simple code to “test-friendly” for testing purposes, introducing so much complexity that code that should have been written right at first turns out to be very complex and buggy.
There are two kinds of “bug-free” code. One is “code with no obvious bugs”, and the other is “code that obviously has no bugs”. In the first case, the code is so complicated, with so many tests and so much coverage, that the tests passing leads you to assume the code is correct. In the second case, the code is so simple and direct that even without many tests you can tell at a glance that it cannot have bugs. Which kind of “bug-free” code do you prefer?
Based on these, I have summarized the principles to prevent overengineering as follows:
- Solve the immediate problem first, solve it, and then consider the expansion problem in the future.
- Write usable code first, iterate over it, and then consider whether you need to reuse it.
- Write usable, simple, and obviously bug-free code first, and then worry about testing.
The end.