Programming is creative work. It’s an art. Mastering any art requires a lot of practice and understanding. So the “wisdom” offered here is not a diet pill that promises to shed ten jin (about five kilograms) a day. It cannot replace your own diligence. I do hope, however, that it points people in the right direction, so that they take fewer detours and get back roughly what they put in.
Refine your code repeatedly
Since “genius is one percent inspiration and ninety-nine percent perspiration,” let me talk about the perspiration part first. Someone asked me, what is the most effective way to improve your programming? I thought about it for a long time and realized that the most effective way to do this was to revise and refine the code over and over again.
At IU, thanks to Dan Friedman’s strict guidance, we were ashamed to write long and complex code. If you wrote a few lines too many, the old rascal would laugh and say, “I solved this problem in five lines of code. Go back and think about it…” Of course, sometimes he was just exaggerating to needle you, and in fact nobody could have done it in five lines. But the habit of refining code to reduce redundancy became ingrained in me.
Some people like to brag about how many thousands of lines of code they have written, as if the number of lines of code is the measure of good programming. However, it is impossible to improve your programming skills if you always write code in a hurry and never go back to review, modify, and refine it. You’ll produce more and more mediocre or even bad code. In this sense, what many people call “work experience” is not necessarily proportional to the quality of their code. If you have decades of experience and never go back to refine and reflect on your code, you’re probably not as good as someone with a year or two of experience who likes to iterate and understand.
A great writer once said, “The quality of a writer is not measured by how many words he publishes, but by how many he throws away in his wastebasket.” I think the same theory applies to programming. Good programmers delete far more code than they leave behind. If you see a person who writes a lot of code and doesn’t delete much of it, their code must have a lot of junk.
Like literature, code can’t be created overnight. Inspiration seems to come in dribs and drabs. It’s impossible for anyone to write in one stroke, and even the most talented programmer takes a while to discover the simplest and most elegant way to write. Sometimes you refine a piece of code over and over again and think you’re at the top of your game, and then you look back a few months later and see a lot of things you could have improved and simplified. It’s the same with writing articles. You can always look back at something you wrote months or years ago and see some improvements.
So if refining a piece of code over and over no longer gets you anywhere, put it down for a while. Look at it again in a few weeks or months with a fresh mind. After many rounds of this, you accumulate the inspiration and wisdom to move directly in the right direction, or close to it, when you face a new problem.
Write elegant code
People hate spaghetti code because it goes round and round like a noodle. So what does elegant code look like? After years of observation, I found that elegant code has some distinct features in shape.
If we ignore the specifics, elegant code in general looks like neat boxes that fit together. It’s easy to understand if you draw an analogy with tidying your room. If you throw everything in a big drawer, they all get mixed up. It’s harder to organize and find what you need quickly. But if you put a few smaller boxes in the drawer and put things in different categories, they won’t wander around and you can find and manage them more easily.
Another characteristic of elegant code is that its logic, by and large, looks like a tree with distinct branches. That’s because almost everything a program does is passing information along and branching on it. You can think of code as a circuit in which current flows through wires, splitting in some places and merging in others. If you think about it this way, your code will have fewer if statements with only one branch, and it will look something like this:
    if (...) {
        if (...) {
            ...
        } else {
            ...
        }
    } else if (...) {
        ...
    } else {
        ...
    }
Notice that? In my code, if statements almost always have two branches. They can be nested, with several levels of indentation, and the else branches may contain a small amount of duplicated code. Yet the logic of such a structure is very tight and clear. I’ll explain later why it’s best for an if statement to have two branches.
Write modular code
Some people clamor for “modular” programs and end up merely splitting the code into multiple files and directories, which they then call “modules”. They even put these directories in different VCS repos. This does not lead to smooth cooperation; it leads to a lot of trouble. These people don’t really understand what a “module” is. Superficially slicing up the code and placing the pieces in different locations not only fails to achieve modularity, but also creates unnecessary problems.
True modularity is not in a textual sense, but in a logical sense. A module should be like a circuit chip, with well-defined inputs and outputs. In fact, a good modularity approach already exists, and its name is “function”. Each function has a specific input (parameter) and output (return value). Multiple functions can be contained in the same file, so you don’t need to separate your code into multiple files or directories. I can write it all in the same file and still have very modular code.
To achieve good modularity, you need to do the following:
- Avoid writing long functions. If you find that a function is too large, break it up into smaller ones. I usually don’t write functions longer than 40 lines. For comparison, the average laptop screen can show about 50 lines of code, so I can see a 40-line function at a glance without scrolling. The limit is 40 rather than 50 because 40 lines is the most I can take in without moving my eyes.
If I look at the code without blinking, I can map the entire piece of code to my optic nerve, so that I can see the code even when I suddenly close my eyes. I’ve found that when you close your eyes, your brain processes code more efficiently, and you can imagine what other shapes that piece of code can take. 40 lines is not a big limit, because the more complex parts of a function are often extracted and made into smaller functions and then called from the original function.
- Write small utility functions. If you look closely at your code, you will find a lot of repetition. Such common code, however short, is worth extracting into a function. Some helper functions are only two lines long, yet they greatly simplify the logic of the main function.
Some people don’t like small functions because they want to avoid the overhead of function calls, and they end up writing functions that are hundreds of lines long. This is an outdated belief. Modern optimizing compilers can usually inline a small function at the place where it is called, so in most cases no call is made and no extra overhead is incurred.
The same people like to use macros instead of small functions, which is also an outdated habit. In early C compilers, only macros were expanded inline at compile time, so people used macros to get inlining. Inlining, however, is not the fundamental difference between macros and functions. Macros are very different from functions (I’ll cover this later) and should be avoided as much as possible. Using macros for the sake of inlining is an abuse of macros, and it causes all sorts of problems: programs become hard to understand, hard to debug, and error-prone.
- Make each function do one simple thing. Some people like to write “generic” functions that can do both this and that, choosing internally what to do based on certain variables and conditions. For example, you might write a function like this:
    void foo() {
        if (getOS().equals("MacOS")) {
            a();
        } else {
            b();
        }
        c();
        if (getOS().equals("MacOS")) {
            d();
        } else {
            e();
        }
    }
Whoever wrote this function wants it to do different things depending on whether the system is “MacOS”. You can see that only c() is common to both systems; a(), b(), d(), and e() each belong to one branch or the other.
This kind of “reuse” is actually harmful. If a function may do two different things, and the two have less in common than they have differences, you’d better write two separate functions. Otherwise the logic of the function will be unclear and error-prone. In fact, the function above can be rewritten as two functions:
    void fooMacOS() {
        a();
        c();
        d();
    }

    void fooOther() {
        b();
        c();
        e();
    }
If you find that two things are mostly the same with only a few differences, more often than not you can extract the same parts and make an auxiliary function. For example, if you have a function that looks like this:
    void foo() {
        a();
        b();
        c();
        if (getOS().equals("MacOS")) {
            d();
        } else {
            e();
        }
    }
Here a(), b(), and c() are shared, and only d() and e() differ by system. So you can extract a(), b(), and c():
    void preFoo() {
        a();
        b();
        c();
    }
Then make two functions:
    void fooMacOS() {
        preFoo();
        d();
    }

    void fooOther() {
        preFoo();
        e();
    }
This way, we share code and still have each function do one simple thing. The logic of this code is much clearer.
Write readable code
Some people think that writing lots of comments makes their code more readable, only to find that it does the opposite. Instead of making the code readable, a flood of comments makes the program harder to read. And once the logic of the code changes, many comments become outdated and need to be updated. Updating comments is a considerable burden, so a large number of comments becomes an obstacle to improving the code.
In fact, truly elegant and readable code needs almost no comments. If you find yourself writing a lot of comments, your code is probably vague and its logic unclear. Programming languages are actually more powerful and rigorous than natural language, and they have the main elements of natural language: subject, predicate, object, noun, verb, if, then, else, is, is not… So if you take full advantage of the expressive power of your programming language, the program itself can say what it is doing, without the help of natural language.
On rare occasions, you might do something counterintuitive to get around design issues in other code. You can use a very short comment to explain why it’s written that way. This should happen less often, otherwise it means that the entire code design is flawed.
If you don’t take advantage of what the programming language offers, your programs will remain hard to understand, and you will feel the need to write comments. So here are some tips that may greatly reduce that need:
- Use meaningful function and variable names. If the names of your functions and variables accurately describe their logic, you don’t need comments to explain what they do. For example:
    // put elephant1 into fridge2
    put(elephant1, fridge2);
Since the function name put, together with the two meaningful variable names elephant1 and fridge2, already says what is going on (putting the elephant into the fridge), the comment is completely unnecessary.
- Extract complex logic into “helper functions”. Some people write functions so long that it is hard to see what the statements inside are doing, so they mistakenly think they need comments. If you look closely, the unclear pieces of code can often be extracted into a function and called from the original place. Since a function has a name, a meaningful function name can take the place of the comment. Here’s an example:
    ...
    // put elephant1 into fridge2
    openDoor(fridge2);
    if (elephant1.isDead()) {
        ...
    } else {
        ...
    }
    closeDoor(fridge2);
    ...
If you take this code out and define it as a function:
    void put(Elephant elephant, Fridge fridge) {
        openDoor(fridge);
        if (elephant.isDead()) {
            ...
        } else {
            ...
        }
        closeDoor(fridge);
    }
So the original code can be changed to:
    ...
    put(elephant1, fridge2);
    ...
It’s clearer, and there’s no need for comments.
- Extract complex expressions into intermediate variables. Some people hear that “functional programming” is a good thing without understanding what it really means, and they fill their code with deeply nested function calls. Like this:
    Pizza pizza = makePizza(crust(salt(), butter()), topping(onion(), tomato(), sausage()));
Such a line of code is too long and nested to be easy to read. Well trained functional programmers know the benefits of intermediate variables and do not blindly use nested functions. They would change the code to something like this:
    Crust crust = crust(salt(), butter());
    Topping topping = topping(onion(), tomato(), sausage());
    Pizza pizza = makePizza(crust, topping);
This not only effectively controls the length of a single line of code, but also makes the steps clear and easy to understand due to the “meaning” of the introduced intermediate variables.
At this point, I must warn you that when I say “let the code speak for itself without comments,” I don’t mean to make the code look like some kind of natural language. There is a JavaScript test tool called Chai that lets you write code like this:
    expect(foo).to.be.a('string');
    expect(foo).to.equal('bar');
    expect(foo).to.have.length(3);
    expect(tea).to.have.property('flavors').with.length(3);
This is profoundly wrong. A programming language is inherently simpler and clearer than natural language; making code look like natural language only makes it more complex and harder to understand.
Write simple code
Programming languages like to stand out by offering their own “features”, and some of those features are not such a good thing. Many did not stand the test of time and ended up causing more problems than they solved. Many people blindly pursue “short” and “concise” code, or want to show off how clever and quick to learn they are, so they like to use the special constructs of a language and write code that is too “clever” and hard to understand.
Just because a language offers something doesn’t mean you have to use it. In fact, you need only a small fraction of the features to write good code. I’ve always been against “taking full advantage” of every feature of a programming language. I have in mind a set of constructs that I consider the best. No matter how “magical” or “new” the features a language offers, I mostly use only that well-tested set, which I have come to trust.
Now, for some problematic language features, I’ll introduce the coding conventions I use myself and explain why they make code simpler.
- Avoid increment and decrement expressions (i++, ++i, i--, --i). These expressions are a design mistake left over from history. Their meaning is odd and they are very easy to get wrong. They tangle together two completely different operations, reading and writing, and make a mess of the semantics. An expression that contains them may depend on evaluation order, so it may run correctly under one compiler and go mysteriously wrong under another.
In fact, each of these expressions can be decomposed into two steps that separate reading from writing: one step updates the value of i, and the other uses the value of i. For example, instead of foo(i++), you can write int t = i; i += 1; foo(t);. Instead of foo(++i), you can write i += 1; foo(i);. The decomposed code means exactly the same thing but is much clearer: it is obvious whether the update happens before or after the value is used.
You might think i++ or ++i is more efficient than the decomposed form, but that is an illusion. After basic compiler optimization, the generated machine code is indistinguishable. Increment and decrement expressions are safe in only two situations. One is the update part of a for loop, such as for (int i = 0; i < 5; i++). The other is a statement on a line of its own, such as i++;. Neither case is ambiguous. Avoid every other use, especially inside complex expressions such as foo(i++) or foo(++i) + foo(i)… Nobody should know, or need to find out, what those mean.
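As a minimal sketch of this decomposition, the Java code below walks an array twice: once with the index update buried inside an expression, and once with the read and the update as separate statements. The class and method names (IncrementDemo, sumFused, sumSeparated) are hypothetical and exist only for this illustration.

```java
// Hypothetical illustration: both methods sum the first `count` elements.
class IncrementDemo {
    // Reading and updating i are fused into one expression: values[i++].
    static int sumFused(int[] values, int count) {
        int sum = 0;
        int i = 0;
        while (i < count) {
            sum += values[i++];
        }
        return sum;
    }

    // Reading and updating i are separate statements, so the order is explicit.
    static int sumSeparated(int[] values, int count) {
        int sum = 0;
        int i = 0;
        while (i < count) {
            int current = i;        // use the old value of i
            i += 1;                 // then update i
            sum += values[current];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = {3, 5, 8};
        System.out.println(sumFused(values, 3));     // prints 16
        System.out.println(sumSeparated(values, 3)); // prints 16
    }
}
```

Both versions compute the same result; the second one simply makes it visible which value of i each statement sees.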
- Never omit curly braces. Many languages allow you to omit curly braces in certain situations. For example, C and Java allow you to omit them when the body of an if statement is a single statement:
    if (...)
        action1();
At first glance, you save two characters. How nice. But this often causes strange problems. For example, if you later want to add action2() to the if, you change the code to:
    if (...)
        action1();
        action2();
For the sake of appearance, you carefully align action2() with the indentation of action1(). At a glance they belong together, so you subconsciously assume that both run only when the if condition is true. In fact, action2() is outside the if and runs unconditionally. I call this an “optical illusion”: a mistake that every programmer should notice in theory but that is easily overlooked in practice.
So you ask: who would be that careless? I would simply add the curly braces when I add action2(). But from a design point of view, that is not a sound approach. First, you might want to remove action2() later, and then you would have to remove the braces again for consistency. Isn’t that a nuisance? Second, it makes the code style inconsistent, with braces on some ifs and not on others. Besides, why should you have to remember this rule at all? If you put braces on every if without exception, you never have to think about it; just pretend that C and Java don’t offer this special syntax. That way you keep complete consistency and avoid unnecessary thinking.
Some people may say: wrapping a single statement in curly braces, what an eyesore! But after following this convention for a few years, I don’t find it an eyesore at all. On the contrary, the braces give the code clearer boundaries and put less strain on my eyes.
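The “optical illusion” can be reproduced in a few lines. The following is a small sketch with hypothetical names (BraceDemo, withoutBraces, withBraces); each method records which actions ran so the difference is observable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the misleading-indentation mistake.
class BraceDemo {
    // Without braces: only the first statement belongs to the if.
    static List<String> withoutBraces(boolean condition) {
        List<String> log = new ArrayList<>();
        if (condition)
            log.add("action1");
            log.add("action2");   // indented like the if body, but always runs
        return log;
    }

    // With braces: both statements are guarded by the condition.
    static List<String> withBraces(boolean condition) {
        List<String> log = new ArrayList<>();
        if (condition) {
            log.add("action1");
            log.add("action2");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(withoutBraces(false)); // prints [action2]
        System.out.println(withBraces(false));    // prints []
    }
}
```

The compiler accepts both methods, which is why the mistake survives until someone notices the extra action at run time.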
- Use parentheses sensibly and do not blindly rely on operator precedence. Relying on precedence to save parentheses is fine for common arithmetic expressions like 1 + 2 * 3. Yet some people hate parentheses so much that they write expressions like 2 << 7 - 2 * 3 with no parentheses at all.
The problem is that the precedence of the shift operator << is unfamiliar to many people, and it is counterintuitive. Since x << 1 is the same as multiplying x by 2, many people mistakenly think this expression means (2 << 7) - (2 * 3), which equals 250. However, << has lower precedence than + and -, so the expression actually means 2 << (7 - 2 * 3), which equals 4!
The solution is not for everyone to memorize the operator precedence table, but to use parentheses where they help. In the example above, it is better to write 2 << (7 - 2 * 3). The meaning is the same without the parentheses, but with them the reader no longer needs to remember the precedence of << to understand the code.
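The arithmetic above is easy to check. Here is a short Java sketch (the class and method names are hypothetical) that evaluates the expression three ways: as written, as the compiler groups it, and as many readers expect it to be grouped.

```java
// Hypothetical illustration: << binds less tightly than + and -.
class ShiftPrecedenceDemo {
    static int asWritten() {
        return 2 << 7 - 2 * 3;          // no parentheses at all
    }

    static int asCompilerGroupsIt() {
        return 2 << (7 - (2 * 3));      // 2 << 1
    }

    static int asManyReadersExpect() {
        return (2 << 7) - (2 * 3);      // 256 - 6
    }

    public static void main(String[] args) {
        System.out.println(asWritten());            // prints 4
        System.out.println(asCompilerGroupsIt());   // prints 4
        System.out.println(asManyReadersExpect());  // prints 250
    }
}
```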
- Avoid continue and break. A return inside a loop (for, while) is fine, but continue and break complicate the loop’s logic and its termination conditions.
A continue or break usually appears because the logic of the loop has not been thought through clearly. If you think carefully, you should rarely need either. If a continue or break shows up in your loop, consider rewriting the loop. There are several ways to do this:
- If a continue occurs, you can often eliminate the continue by simply reversing the conditions of the continue.
- If there is a break, you can often combine the break condition with the termination condition in the head of the loop to get rid of the break.
- Sometimes you can get rid of “break” by replacing “break” with “return”.
- If all else fails, you may be able to extract the complex part of the loop into a helper function, after which the continue or break can be eliminated.
Let me give you some examples of these situations.
Case 1: The following code contains a continue:
    List<String> goodNames = new ArrayList<>();
    for (String name: names) {
        if (name.contains("bad")) {
            continue;
        }
        goodNames.add(name);
        ...
    }
It says: “If name contains the word ‘bad’, skip the rest of this iteration…” Note that this is a “negative” description: it tells you not when to “do” something, but when to “not do” something. To know what the loop is actually doing, you have to figure out which statements the continue skips and then mentally invert the logic. This is why loops with continue and break are hard to understand. They rely on control flow to describe “what not to do” and “what to skip”, and in the end you still don’t know what the loop “does”.
In fact, we can easily convert this code into equivalent code without continue by simply reversing the continue condition:
    List<String> goodNames = new ArrayList<>();
    for (String name: names) {
        if (!name.contains("bad")) {
            goodNames.add(name);
            ...
        }
    }
goodNames.add(name); and all the code after it now sit inside the if, one level deeper, and the continue is gone. Read the code again and you will find it clearer, because it is a more “positive” description. It says: “When name doesn’t contain the word ‘bad’, add it to the goodNames list…”
Case 2: The for and while headers both have a loop “termination condition” that should be the only exit condition for the loop. If you have a break in the middle of the loop, it actually adds an exit condition to the loop. You often just need to incorporate this condition into the head of the loop to get rid of the break.
Take this code for example:
    while (condition1) {
        ...
        if (condition2) {
            break;
        }
    }
When condition2 is true, the break exits the loop. You can get rid of the break by negating condition2 and adding it to the termination condition in the head of the while. The rewritten code looks like this:
    while (condition1 && !condition2) {
        ...
    }
This may seem to apply only when the break sits at the very beginning or end of the loop body, but most of the time the break can somehow be moved to one of those positions. I don’t have specific examples at hand, but I’ll add them when they come up.
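One possible concrete instance of Case 2, offered only as a sketch: counting how many positive numbers appear at the start of an array. The names LoopExitDemo, countLeadingPositivesWithBreak, and countLeadingPositives are hypothetical. Here the break is the first thing in the loop body, so moving its negated condition into the loop head does not change the behavior.

```java
// Hypothetical illustration: merging a break condition into the loop head.
class LoopExitDemo {
    // With break: the loop has two exits, one in the head and one in the body.
    static int countLeadingPositivesWithBreak(int[] values) {
        int i = 0;
        while (i < values.length) {
            if (values[i] <= 0) {
                break;
            }
            i += 1;
        }
        return i;
    }

    // Without break: the negated break condition joins the head,
    // so the head states every reason the loop can end.
    static int countLeadingPositives(int[] values) {
        int i = 0;
        while (i < values.length && values[i] > 0) {
            i += 1;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] values = {4, 7, 0, 9};
        System.out.println(countLeadingPositivesWithBreak(values)); // prints 2
        System.out.println(countLeadingPositives(values));          // prints 2
    }
}
```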
Case 3: Many breaks exit the loop, followed by a return. This break can often be replaced by a return. Take this example:
    public boolean hasBadName(List<String> names) {
        boolean result = false;
        for (String name: names) {
            if (name.contains("bad")) {
                result = true;
                break;
            }
        }
        return result;
    }
This function checks for the presence of a name in the names list that contains the word “bad”. Its loop contains a break statement. This function can be rewritten as:
    public boolean hasBadName(List<String> names) {
        for (String name: names) {
            if (name.contains("bad")) {
                return true;
            }
        }
        return false;
    }
When the improved code finds a name containing “bad”, it returns true directly instead of assigning to the result variable, breaking out, and returning at the end. If the loop finishes without returning, the function returns false, meaning no such name was found. Using return instead of break eliminates both the break statement and the result variable.
I’ve seen many other uses of continue and break, and almost without exception they can be eliminated, resulting in much cleaner code. In my experience, 99% of breaks and continues can be eliminated by replacing them with return statements or by flipping the if condition. The remaining 1% involve complex logic, but they too can be eliminated by extracting a helper function. The modified code is easier to understand and easier to verify.
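The helper-function route has no example above, so here is one hedged sketch of what it might look like. Suppose a loop skips blank lines and comment lines with two separate continue statements; the skipping logic can be extracted into a helper whose name states the positive condition. The names ConfigLinesDemo, isDataLine, and dataLines, and the “#” comment convention, are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a helper function replaces several continues.
class ConfigLinesDemo {
    // Names the condition the loop cares about: the line carries data.
    static boolean isDataLine(String line) {
        String trimmed = line.trim();
        return !trimmed.isEmpty() && !trimmed.startsWith("#");
    }

    // The loop now says what it does, not what it skips.
    static List<String> dataLines(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (isDataLine(line)) {
                result.add(line.trim());
            }
        }
        return result;
    }
}
```

Calling dataLines on the lines "# comment", "", "port = 80", and "  host = a  " yields a list containing "port = 80" and "host = a".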
Write intuitive code
I have an important rule for writing code: if there’s a more direct, clear way to write it, go for it, even if it looks longer and dumber. For example, the Unix command line has a “clever” way of writing:
    command1 && command2 && command3
Since the shell’s logical operation a && b “short-circuits”, b is not executed when a is false. This is why command2 runs only if command1 succeeds, and command3 only if command2 succeeds. Similarly, consider:
    command1 || command2 || command3
The || operator has a similar property. In the command line above, if command1 succeeds, neither command2 nor command3 is executed. If command1 fails and command2 succeeds, command3 is not executed.
This seems more subtle and concise than using an if statement to determine failure, so someone borrowed it and used it in their program code. For example, they might write code like this:
    if (action1() || action2() && action3()) {
        ...
    }
Can you see what this code is trying to do? Under what conditions are action2 and action3 executed, and under what conditions are they not? If you think about it for a while, you can work it out: “If action1 fails, execute action2; if action2 succeeds, execute action3.” That meaning, however, is not directly “mapped” onto the code. For example, which token in the code corresponds to the word “fails”? You can’t point to one, because it is implied by the semantics of ||. You need to know both the short-circuit behavior of || and the meaning of logical OR to see that the code says “if action1 fails…”. Every time you read this line you have to think it through again, and that accumulated load is tiring.
In fact, this way of writing abuses the short-circuit behavior of the logical operators && and ||. These operators may skip the expression on the right for the sake of machine efficiency, not to offer humans this kind of “clever” usage. They are intended only as logical operations, not as replacements for if statements. They happen to do what some if statements do, but that is no reason to use them in place of if statements. If you do, your code becomes obscure.
The above code would be much clearer if it were written a little more clumsily:
    if (!action1()) {
        if (action2()) {
            action3();
        }
    }
Here I can see at a glance what the code says: if action1() fails, execute action2(); if action2() succeeds, execute action3(). Do you see the one-to-one correspondence? if means “if”, and ! means “fails”… You don’t need any knowledge of logic to see what it says.
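To make the comparison concrete, here is a small sketch that records which actions actually run in each style. Everything in it (ShortCircuitDemo, run, clever, plain) is a hypothetical name invented for the illustration; each action simply logs its name and returns a preset success flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: the same calls, hidden in operators or spelled out.
class ShortCircuitDemo {
    // Stand-in for an action: logs its name and reports success or failure.
    static boolean run(List<String> trace, String name, boolean succeeds) {
        trace.add(name);
        return succeeds;
    }

    // "Clever" style: which actions run is decided inside || and &&.
    static List<String> clever(boolean r1, boolean r2, boolean r3) {
        List<String> trace = new ArrayList<>();
        if (run(trace, "action1", r1) || run(trace, "action2", r2) && run(trace, "action3", r3)) {
            // body left empty on purpose: only the calls above matter here
        }
        return trace;
    }

    // Plain style: every decision is a visible if statement.
    static List<String> plain(boolean r1, boolean r2, boolean r3) {
        List<String> trace = new ArrayList<>();
        if (!run(trace, "action1", r1)) {
            if (run(trace, "action2", r2)) {
                run(trace, "action3", r3);
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(clever(false, true, true)); // prints [action1, action2, action3]
        System.out.println(plain(false, true, true));  // prints [action1, action2, action3]
        System.out.println(plain(true, true, true));   // prints [action1]
    }
}
```

Both methods call the same actions in the same order for every combination of results; the difference lies only in how much the reader has to decode.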
Write impeccable code
In the previous section, I mentioned that my code rarely has a single branch if statement. Most of the if statements I write have two branches, so a lot of my code looks like this:
    if (...) {
        if (...) {
            ...
            return false;
        } else {
            return true;
        }
    } else if (...) {
        ...
        return false;
    } else {
        return true;
    }
The point of writing this way is to handle every possible case without gaps and to avoid missing corner cases. Every if statement has two branches because if the condition holds, you do one thing, and if it does not hold, you should know what else to do. Whether or not your if has an else, you cannot escape this question; you have to think about it.
Many people who write if statements like to omit the else branch because they feel that some else branches are duplicated. For example, in my code, both else branches return true. To avoid duplication, they omit the two else branches and only use a return true at the end. In this case, the if statement missing the else branch “drops” the control flow to the final return true. Their code looks something like this:
    if (...) {
        if (...) {
            ...
            return false;
        }
    } else if (...) {
        ...
        return false;
    }
    return true;
This style looks more concise and avoids repetition, but it invites oversights and loopholes. When nested if statements omit some of their else branches and rely on control flow to handle the else cases, it is hard to analyze and reason about them correctly. If the if conditions also use logical operators such as && and ||, it becomes even harder to see whether the code covers all cases.
Any branch that is missed by accident “drops” all the way down and returns an unexpected result. Even if you have read the code once and convinced yourself that it is correct, each time you read it again you cannot be sure it covers all the cases, and you have to reason it through from scratch. This brevity brings repeated, heavy mental overhead. This is what is commonly called “spaghetti code”: the logical branches of the program look not like a tree but like noodles tangled together.
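Here is a deliberately flawed sketch of how a missed branch can “drop” to the final return. The rule being modeled is an assumption made for this example: a document may be edited only by a logged-in user, and only when it is not locked. The names ElseDemo, mayEditFallThrough, and mayEdit are hypothetical.

```java
// Hypothetical illustration: a missing else lets a case fall through.
class ElseDemo {
    // Flawed on purpose: nobody decided what happens when loggedIn is false,
    // so that case drops to the final return and is allowed.
    static boolean mayEditFallThrough(boolean loggedIn, boolean locked) {
        if (loggedIn) {
            if (locked) {
                return false;
            }
        }
        return true;
    }

    // Every if has an else, so each case is decided explicitly.
    static boolean mayEdit(boolean loggedIn, boolean locked) {
        if (loggedIn) {
            if (locked) {
                return false;
            } else {
                return true;
            }
        } else {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(mayEditFallThrough(false, false)); // prints true (the bug)
        System.out.println(mayEdit(false, false));            // prints false
    }
}
```

In the second version the missing case cannot hide: the outer else has to be written, and writing it forces the decision.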
Another way to omit the else branch is like this:
    String s = "";
    if (x < 5) {
        s = "ok";
    }
People who write code like this tend to think in terms of a “default”. s defaults to the empty string, and if x < 5, it is mutated to "ok". The downside is that when x < 5 does not hold, you have to look further up to find out what s is. And that is if you are lucky and the initialization is close by. Many people write code like this with the initial value of s some distance away from the if, possibly with other logic and assignments in between. In such code, variables are changed back and forth, which is dizzying to read and easy to get wrong.
Now compare how I wrote it:
    String s;
    if (x < 5) {
        s = "ok";
    } else {
        s = "";
    }
This may take a few more characters, but it is much clearer, because we state explicitly what s is when x < 5 does not hold. It is right there: "" (the empty string). Note that although I also use assignment, I never "change" the value of s. s starts out with no value, is assigned once, and never changes afterwards. This way of writing is often described as more "functional", because each variable is assigned only once.
If I omit the else branch, the Java compiler will not let me off the hook. It complains that in some branch s has not been initialized. This forces me to set the value of s explicitly under every condition, without missing any case.
Of course, since the case is relatively simple, you could also write it like this:
    String s = x < 5 ? "ok" : "";
For more complex cases, I recommend writing the if statement instead.
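If you want the compiler to enforce the "assign exactly once" discipline, one option in Java is to declare the local variable final. This is a sketch of that idea rather than something prescribed above; the class and method names (AssignOnceDemo, label) are hypothetical.

```java
// Hypothetical illustration: a final local must be assigned exactly once on every path.
class AssignOnceDemo {
    static String label(int x) {
        final String s;   // declared without a value
        if (x < 5) {
            s = "ok";
        } else {
            s = "";       // deleting this branch makes the return below a compile error
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(label(3).isEmpty()); // prints false
        System.out.println(label(7).isEmpty()); // prints true
    }
}
```

With final in place, a second assignment to s anywhere in the method is also rejected at compile time, so the "default then mutate" pattern cannot creep back in.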
Correct handling of errors
Using if statements with two branches is only one of the reasons my code is watertight. This way of writing if statements embodies a general idea for making code reliable: enumerate all the cases and leave none out.
The vast majority of what programs do is information processing: finding what you need among a mass of complicated, ambiguous information. Correctly reasoning about all the "possibilities" is the core idea of writing watertight code. In this section I will show how to apply this idea to error handling.
Error handling is an old problem, yet after decades many people still don't understand it. The Unix system API manuals generally tell you which return values and error conditions may occur. For example, the Linux manual page for the read system call says:
    RETURN VALUE
        On success, the number of bytes read is returned...
        On error, -1 is returned, and errno is set appropriately.

    ERRORS
        EAGAIN, EBADF, EFAULT, EINTR, EINVAL, ...
Many beginners forget to check whether read returned -1, or feel that checking the return value on every call is not worth the trouble. This kind of thinking is actually very dangerous. If the documentation says the function returns either the number of bytes read or -1, then you have to do something meaningful with that -1. Don't assume you can ignore this special return value; it is one of the "possibilities". Missing any possible case in your code can have catastrophic consequences.
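The same discipline applies to any API with a special return value. As a stand-in for read (which is a C system call), the sketch below uses Java's String.indexOf, which is documented to return -1 when the character is not found. The surrounding names (ReturnValueDemo, domainCareless, domain) and the e-mail scenario are hypothetical.

```java
// Hypothetical illustration: handling a documented special return value.
class ReturnValueDemo {
    // Careless: ignores that indexOf may return -1.
    static String domainCareless(String email) {
        int at = email.indexOf('@');
        return email.substring(at + 1);   // when at == -1, this quietly returns the whole input
    }

    // Careful: both documented outcomes of indexOf are handled explicitly.
    static String domain(String email) {
        int at = email.indexOf('@');
        if (at == -1) {
            throw new IllegalArgumentException("missing '@' in: " + email);
        } else {
            return email.substring(at + 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(domain("alice@example.org")); // prints example.org
        System.out.println(domainCareless("nobody"));    // prints nobody
    }
}
```

The careless version does not crash on bad input; it returns a wrong answer that looks plausible, which is usually worse.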
For Java, this is relatively convenient. When a Java function runs into a problem, it usually signals it with an exception. You can think of the exception together with the function's return value as a "union type." For example:
String foo() throws MyException {
    ...
}
Here MyException is an error return. You can think of this function as returning a union type: {String, MyException}. Any code that calls foo must handle MyException sensibly to be sure the program runs correctly. Union types are a fairly advanced feature that only a few languages (such as Typed Racket) currently have; I mention them here only to explain the concept. Once you understand the concept, you can run a union type system in your head and write solid code in an ordinary language.
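As a minimal sketch of this "return value plus exception" view, the caller below handles both members of {String, MyException}. Only the signature of foo comes from the text; the body of foo, its fail flag, and the describe helper are assumptions made up for illustration.

```java
// Hypothetical checked exception, named as in the text.
class MyException extends Exception {
    MyException(String message) {
        super(message);
    }
}

class UnionDemo {
    // Either returns a String or throws MyException: conceptually {String, MyException}.
    static String foo(boolean fail) throws MyException {
        if (fail) {
            throw new MyException("foo failed");
        }
        return "result";
    }

    // The caller covers both possibilities and leaves none out.
    static String describe(boolean fail) {
        String s;
        try {
            s = foo(fail);
        } catch (MyException e) {
            return "handled: " + e.getMessage();
        }
        return "got " + s;
    }

    public static void main(String[] args) {
        System.out.println(describe(false)); // prints "got result"
        System.out.println(describe(true));  // prints "handled: foo failed"
    }
}
```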
Because Java's type system forces functions to declare the (checked) exceptions they may throw, and forces callers to handle them, it is almost impossible to miss one by accident. But some Java programmers have a bad habit that makes this safety mechanism almost completely ineffective. Whenever the compiler reports an error saying "you did not catch the exception that foo may throw," some people do not think twice and just change their code to something like this:
try {
    foo();
} catch (Exception e) {}
At most they put a log statement in the catch block, or they add throws Exception to their own function's signature so that the compiler stops complaining. These approaches look convenient, but they are all wrong, and you will eventually pay for them.
If you swallow the exception in an empty catch, you never learn that foo actually failed. It is like seeing a sign that says "Road closed for construction ahead" and driving on anyway. Of course something will go wrong, because you have no idea what you are doing.
When you catch an exception, you should not use a broad type like Exception. You should catch exactly the exception type A that can actually occur. A broad type is problematic because it inadvertently catches other exceptions too (say, B). Your logic is built on the assumption that A occurred, but because you catch everything, when B occurs your code behaves in baffling ways: it thinks A happened when it did not. Such bugs are sometimes hard to find even with a debugger.
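The difference can be sketched as follows, with NumberFormatException playing the role of A and NullPointerException the role of B. The class CatchNarrow and both methods are hypothetical; Integer.parseInt and String.trim are the standard library calls.

```java
class CatchNarrow {
    // Too broad: any failure, including an unrelated bug, is reported as "not a number".
    static String parseBroad(String text) {
        int n;
        try {
            n = Integer.parseInt(text.trim());
        } catch (Exception e) {
            return "not a number";
        }
        return "number " + n;
    }

    // Narrow: only the failure we actually reasoned about is handled here.
    static String parseNarrow(String text) {
        int n;
        try {
            n = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return "not a number";
        }
        return "number " + n;
    }

    public static void main(String[] args) {
        System.out.println(parseNarrow("42"));  // prints "number 42"
        System.out.println(parseNarrow("abc")); // prints "not a number"
        // The NullPointerException from text.trim() is swallowed and mislabeled:
        System.out.println(parseBroad(null));   // prints "not a number"
        // parseNarrow(null) would instead fail loudly with a NullPointerException.
    }
}
```

With the broad catch, a null argument (a bug in the caller) is silently reported as bad input; with the narrow catch, the same bug surfaces immediately.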
If you add throws Exception to your function's signature, then the exception has to be handled wherever your function is called. If the caller also writes throws Exception, the problem spreads even further. My rule of thumb is to handle an exception as close as possible to where it occurs. If you pass it up to your caller instead, the caller may have no idea what to do with it.
Also, a try { ... } catch block should contain as little code as possible. For example, if foo and bar can both throw exception A, you should, where possible, write:
try {
    foo();
} catch (A e) {...}

try {
    bar();
} catch (A e) {...}
rather than:
try {
    foo();
    bar();
} catch (A e) {...}
The first one tells you exactly which function is going wrong, whereas the second one mixes it all up. There are many advantages to being able to tell which function is going wrong. For example, if your catch code includes a log, it can give you more accurate error information, which can greatly speed up your debugging process.
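A small runnable sketch of the logging benefit, under stated assumptions: the exception class A mirrors the placeholder name in the text, while the bodies of foo and bar, their fail flags, and the run method are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checked exception standing in for "A" in the text.
class A extends Exception {
    A(String message) {
        super(message);
    }
}

class NarrowTry {
    static void foo(boolean fail) throws A {
        if (fail) {
            throw new A("disk full");
        }
    }

    static void bar(boolean fail) throws A {
        if (fail) {
            throw new A("disk full");
        }
    }

    // Each call gets its own try block, so every log line names the culprit.
    static List<String> run(boolean fooFails, boolean barFails) {
        List<String> log = new ArrayList<>();
        try {
            foo(fooFails);
        } catch (A e) {
            log.add("foo failed: " + e.getMessage());
        }
        try {
            bar(barFails);
        } catch (A e) {
            log.add("bar failed: " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false, true)); // prints [bar failed: disk full]
    }
}
```

Both functions fail with the same message here, so a single shared catch block could not tell them apart; the separate blocks can.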
Handle null pointers correctly
The idea of exhaustive enumeration is so useful that we can derive from it some basic principles for handling null pointers flawlessly.
First, you should know that the type systems of many languages (C, C++, Java, C#, ...) get null completely wrong. This mistake goes back to an early design by Tony Hoare, who called it his "billion-dollar mistake," because the losses in property and labor it caused far exceed a billion dollars!
The type systems of these languages allow null to appear anywhere an object (pointer) type can appear, but null is not a legal object at all. It is not a String, not an Integer, and not an instance of any user-defined class. The type of null should be NULL, a type of its own. Based on this basic idea, we derive the following principles:
- Try not to produce null pointers. Try not to initialize variables with null, and try not to return null from functions. If your function needs to return "nothing" or "an error," use Java's exception mechanism where you can. Although a little awkward to write, Java exceptions combined with a function's return value can basically serve as a union type. For example, if you have a function find that either finds a String for you or finds nothing, you could write:
public String find() throws NotFoundException {
    if (...) {
        return "found";
    } else {
        throw new NotFoundException();
    }
}
Java's type system forces you to catch NotFoundException, so you cannot miss it the way you can miss a null check. Java exceptions are also easy to abuse, but the previous section showed how to use them properly.
- Do not put null inside container data structures. A container is a collection of objects organized in some way, so null should not appear in an Array, List, Set, and so on, nor as a key or value of a Map. Putting null into containers is the source of some baffling errors. Because an object's position in a container is usually determined dynamically, once null gets in through some entry point it is hard to work out where it went, and you are forced to check for null on every value you take out of the container. It is also hard to know who put it there, and with a lot of code this makes debugging extremely difficult.
The solution: if you really want to express "nothing," either leave it out (the Array, List, or Set has no such element; the Map has no such entry at all), or designate a special, truly legal object to stand for "nothing."
Note that class objects are not containers. So null may, when necessary, be used as the value of an object's member to indicate that it is absent. For example:
class A {
    String name = null;
    ...
}
This is acceptable because null can only appear in the name member of an A object, and you need not suspect the other members of being null because of it. So check name for null every time you access it; the other members need no such check.
- Function callers: understand exactly what null means, check for and handle a null return value as early as possible, and limit its spread. The annoying thing about null is that it can mean different things in different places. Sometimes it means "none" or "not found"; sometimes it means "error" or "failure." You must understand what each null means and not mix them up.
If you call a function that may return null, you should handle null "meaningfully" at the first opportunity. For example, if find returns null to indicate "not found," the code calling find should check whether the result is null as soon as it returns, and handle the "not found" case in a meaningful way.
What does "meaningful" mean? I mean that whoever uses this function should know exactly what to do when they get null, and take responsibility for it. They should not just "report upward" and pass the buck to their own callers. If you violate this, you risk writing code in an irresponsible and dangerous way:
public String foo() {
    String found = find();
    if (found == null) {
        return null;
    }
}
Here foo returns null when it sees that find() returned null. Thus null travels from one place to another. If you write code like this without thinking, you end up with null showing up anywhere, at any time. Eventually, to protect yourself, every one of your functions will look like this:
public void foo(A a, B b, C c) {
    if (a == null) { ... }
    if (b == null) { ... }
    if (c == null) { ... }
    ...
}
- Function authors: state explicitly that null arguments are not accepted, and crash immediately when an argument is null. Do not try to "tolerate" null, and do not let the program continue. If a caller passes null as an argument, then the caller (not the function author) bears full responsibility for the crash. What makes the example above problematic is people's "tolerant attitude" toward null.
This "defensive" style, which tries to be "fault-tolerant" and to "handle null gracefully," only makes callers more careless about passing null to your function. In the end, your code fills up with nonsense, and null can appear anywhere without anyone knowing where it came from. Nobody knows what the null means or what to do with it, and everyone kicks it over to someone else. Eventually null spreads everywhere like a plague and becomes a nightmare.
The right approach, in fact, is to be tough. If you give me null, the program crashes and you are responsible for the consequences! Handling null correctly (see the previous item) is the caller's job, not the function author's.
- Use the @NotNull and @Nullable annotations. IntelliJ provides @NotNull and @Nullable annotations that you place in front of types to help prevent null pointers more reliably. IntelliJ itself statically analyzes code that carries these annotations and points out places where a NullPointerException might occur at runtime. At runtime, an IllegalArgumentException is raised wherever a null appears where it is not allowed, even if you never dereference that null pointer. This lets you detect and stop null pointers as early as possible.
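Several of the principles in this list can be sketched together in plain Java. This is only an illustration under assumptions: the PhoneBook class, its add, lookup, and describe methods, and the sample data are hypothetical, and Objects.requireNonNull from the standard library stands in for "crash immediately on a null argument."

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical checked exception, named as in the text.
class NotFoundException extends Exception {
    NotFoundException(String message) {
        super(message);
    }
}

class PhoneBook {
    private final Map<String, String> numbers = new HashMap<>();

    // Function author: reject null arguments immediately instead of tolerating them.
    void add(String name, String number) {
        Objects.requireNonNull(name, "name must not be null");
        Objects.requireNonNull(number, "number must not be null");
        numbers.put(name, number); // so the map never holds a null key or value
    }

    // Never returns null: "not found" is reported through the exception instead.
    String lookup(String name) throws NotFoundException {
        Objects.requireNonNull(name, "name must not be null");
        String number = numbers.get(name);
        if (number == null) {
            // Since add rejects null values, null here can only mean "no entry".
            throw new NotFoundException(name);
        }
        return number;
    }

    // Function caller: handle "not found" right away instead of passing it upward.
    String describe(String name) {
        String number;
        try {
            number = lookup(name);
        } catch (NotFoundException e) {
            return name + " has no number on file";
        }
        return name + ": " + number;
    }

    public static void main(String[] args) {
        PhoneBook book = new PhoneBook();
        book.add("alice", "555-0100");
        System.out.println(book.describe("alice")); // prints "alice: 555-0100"
        System.out.println(book.describe("bob"));   // prints "bob has no number on file"
    }
}
```

Because null is stopped at add and converted to an exception at lookup, describe and everything above it never has to wonder what a null might mean.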
Extended topic: Optional and Union types
Some languages, such as Java 8 and Swift, offer something called an "Optional type." In Java 8, for example, you can use Optional<String> to say "maybe a String, maybe nothing." Many people think the Optional type is the perfect solution to the null pointer problem, but it is not as perfect as you might think.
Because what you see is the type Optional<String> rather than String, the type system does not let you use it as a String. This extra layer stops you from grabbing the value "without asking"; you always have to think about it first. However, this does not fundamentally solve the problem. Optional cannot completely prevent runtime errors equivalent to a NullPointerException, because you can still write code like this:
Optional<String> x = Optional.empty();
String y = x.get();
Here x.get() is called without first checking x.isPresent(), which raises a NoSuchElementException. This is really equivalent to dereferencing a pointer without checking it for null, except that a NoSuchElementException appears instead of a NullPointerException. Both are runtime errors, and the program crashes just the same. So you see, Optional is merely a friendly "reminder" that keeps you from making the mistake unknowingly. If you ignore it, you can make the same mistake. Optional has no coercive power.
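To make the point concrete, here is a small sketch contrasting an unchecked get() with a call that forces the empty case to be written down. The class OptionalDemo and its two methods are hypothetical; Optional.empty, get, and orElse are the standard java.util.Optional methods.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

class OptionalDemo {
    // Unchecked access: compiles fine, but throws at runtime when x is empty.
    static String unsafe(Optional<String> x) {
        return x.get();
    }

    // Both cases are spelled out: the value if present, "" otherwise.
    static String safe(Optional<String> x) {
        return x.orElse("");
    }

    public static void main(String[] args) {
        Optional<String> empty = Optional.empty();
        System.out.println("[" + safe(empty) + "]"); // prints "[]"
        try {
            unsafe(empty);
        } catch (NoSuchElementException e) {
            // Caught here only to demonstrate that the failure happens at runtime.
            System.out.println("get() on an empty Optional failed at runtime");
        }
    }
}
```

Nothing in the compiler distinguishes unsafe from safe; choosing the second is still up to the programmer.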
Swift's Optional type has the same problem as Java's. The Swift manual states: "Using the ! operator to unwrap an optional that has a value of nil results in a runtime error." In other words, Swift does not statically prevent you from applying the ! operator to an Optional whose value is nil. If you do, you get a "runtime error."
In addition, the Optional type can complicate your program. Optional and null pointers are structurally quite different: Optional has one more layer of data structure than a null pointer, because it wraps the value you need inside another object. You have to call x.get() to take the inner value out, which is very different from null. With a nullable String, once you have determined that it is not null, you do not need any get to extract the content. For example:
String found = find();
if (found != null) {
    total += found.length();
}
Once found is known not to be null, we can call found.length() directly to get its length, without first calling found.get(). This example may look trivial, but when an Optional is placed inside another structure or container, or wraps another type, you will see how tedious and painful it becomes. Optional's problem, like that of Haskell's Maybe, is that it often leads to too many nested types.
By contrast, a union type system can prevent NullPointerExceptions completely statically without causing excessive nesting of types. Union types fully cover what Optional does, are very simple, and bring many other benefits. Such a type system already exists in Typed Racket (a descendant of Scheme), and the unpublished Yin language also implements union types. PySonar's type inference system has union types as well. A union type system is powerful enough not only to eliminate NullPointerExceptions statically, but also to replace the exception mechanism of languages like Java. It makes error handling very tight, yet very convenient.
Note, however, that even with a union type system that statically rules out NullPointerExceptions, the principles above for dealing with null are still useful. An easy mistake in languages with union types is to widen union types without thinking, throwing in every possibility until they become large. Then many variables and arguments carry union types, each of which could be so many things that you need several checks to get past the type checker. This is not fundamentally different from null pointers spreading everywhere, because you have not controlled the "possibilities" effectively. Programming languages may not be much help with this "explosion of possibilities." Only by following the principles above yourself can you eliminate union types early or reduce the number of possibilities.
Prevention of over-engineering
The human brain is a wonderful thing. Although we all know that over-engineering is bad, in real projects it often creeps in involuntarily. I have made this mistake so many times myself that I feel the need to analyze the warning signs of over-engineering so that it can be detected and avoided early.
An important sign of impending overengineering is when you think too much about the "future," thinking about things that haven't happened yet, needs that haven't come up yet. For example, "If we have millions of lines of code and thousands of people in the future, such a tool won't work", "I may need this feature in the future, so I'll write the code and put it there now", "many people will expand this code in the future, so let's make it reusable now"...
This is why many software projects are so complex. Not much has actually been accomplished, yet a lot of unnecessary complexity has been added for the sake of the so-called "future." The problem at hand is still unsolved, and the project is already dragged down by the "future." People do not like short-sightedness, but in real engineering you sometimes have to look closer to home and finish the problem at hand before thinking about expansion.
Another source of overengineering is an excessive concern with "code reuse." Many people are concerned with "reuse" before their "usable" code has even been written. You end up getting tied up in all the frameworks you've created to make your code reusable, and you end up not even writing usable code. If the usable code can't be written well, how can you reuse it? A lot of projects that start out with too much reuse are abandoned because the code is so hard to understand that it saves a lot of work to write from scratch.
Too much focus on "testing" can also lead to over-engineering. Some people rewrite simple code to be "test-friendly," introducing so much complexity that code which could have been written correctly on the first try ends up complicated and buggy.
There are two kinds of "bug-free" code. One is "code with no obvious bugs"; the other is "code that obviously has no bugs." In the first case, the code is complicated, and because there are many tests and a lot of coverage and the tests all pass, you assume the code is correct. In the second case, the code is so simple and direct that, even without many tests, you can tell at a glance that it cannot have bugs. Which kind of "bug-free" code do you prefer?
Based on these, I have summarized the principles to prevent overengineering as follows:
- Solve the problem at hand first, solve it well, and only then consider future extension.
- Write usable code first, iterate over it, and then consider whether you need to reuse it.
- Write usable, simple, and obviously bug-free code first, and then worry about testing.