Why read other people’s code?
When we read other people’s code, we usually do so with some purpose. A complete “read” of a system’s code requires a great deal of energy. So it’s important to have a clear goal for reading code, because it determines how much effort, or cost, you end up putting into it.
Broadly speaking, we can divide goals into the following types:
- I want to evaluate whether to adopt a third-party module;
- I need to fix a bug in one part of a module (either because a third-party module has a problem or because my supervisor has temporarily assigned the bug to me);
- I want to learn from the design of an open source module;
- I have to take over a module and maintain it for a long time.
Why do we need to be clear about our goals?
Because reading source code is genuinely hard: it is essentially architecture design in reverse. It resembles decompilation, except that instead of working at the instruction level, you have to work back from the code to the higher-level ideas behind it.
We know that a disassembler can translate software back into assembly exactly, because that step is lossless: it is only an equivalent transformation. But getting a decompiler to accurately restore high-level language code is much harder, because compilation is lossy: the names of most software entities are removed in the process. Of course, most compilers can generate symbol files at compile time, mainly for debugging purposes. Without them, a debugger cannot display variable names when we step through the code.
Even with the symbol files in hand, accurately recovering the original high-level code would still be very difficult. The decompiler would need some model of reasoning: it has to recognize the familiar “routines” the compiler produces and reconstruct the source from them. We can imagine how “an intelligent decompiler that restores code exactly” would work.
First, it needs to identify the programming language and compiler used. This is usually relatively easy and can be done with a very crude classifier. In particular, many compilers leave a “signature” in the software they produce. If we assume that all software carries such a signature, this step doesn’t even require training or learning.
Second, it decompiles the binary using the symbol file, if one is available, together with its knowledge of the compiler’s routines. (Without a symbol file, many software entities, such as classes and functions, end up with arbitrarily assigned names.)
A compiler’s routines, like a person’s habits, can be learned through continued observation. All the decompiler needs is to keep learning from enough samples produced by that compiler.
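The first step above, a crude signature-based classifier, might look like the following minimal sketch. The marker strings in the table are illustrative assumptions rather than an authoritative list of what each toolchain emits, and guess_toolchain is a hypothetical helper name.

```python
# Hypothetical signature table: these marker strings are illustrative
# assumptions, not a complete or authoritative list for any toolchain.
SIGNATURES = {
    "gcc": [b"GCC: ("],
    "go": [b"Go build ID:"],
    "rustc": [b"rustc version"],
}


def guess_toolchain(binary: bytes) -> str:
    """Return the first toolchain whose marker appears in the bytes, else 'unknown'."""
    for toolchain, markers in SIGNATURES.items():
        if any(marker in binary for marker in markers):
            return toolchain
    return "unknown"


# Usage with made-up byte strings standing in for real binaries.
print(guess_toolchain(b"\x7fELF...GCC: (GNU) 12.2.0\x00"))
print(guess_toolchain(b"no marker here"))
```

No training is involved: the classifier is just a lookup, which is the point the paragraph makes about signatures.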
I use the decompilation analogy to make the point that reading source code is difficult, but it is also productive: it is a kind of learning that produces an output.
Productive learning is the best way to learn.
So what should be the output of reading the source code? The answer is the way the program is built: its architecture.
Understand the core context of the architecture
How do you do that?
First of all, if you have documentation, be sure to look at documentation first. It would be foolish to insist on working backwards through code to understand an architecture that has already been documented.
However, it is important to keep in mind that documentation and code easily drift out of sync. So we may well be looking at a design from a previous version, or even the original version.
Even if the design has changed, reading outdated architectural design ideas can be a great help in understanding the source code. With that as a basis, we then read the source code, and the two can be checked against each other. Of course, if they conflict, we need to update the documentation so that it matches the code.
When reading the source code, the first thing we need to do is understand the outline design of the system. The outline design focuses on the business scope of each software entity and the relationships between them. With this in hand, we can understand the core context of the system’s architectural design.
Specifically, what are the steps to look at the source code?
First, sort out the specifications of exposed software entities (modules, classes, functions, constants, global variables, and so on).
There are often tools available for this step. For example, for the Go language, running go doc prints an auto-generated summary of a package’s exported entities. Open source tools such as Doxygen can do similar things, and Doxygen supports many mainstream languages.
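For a language without such a tool at hand, the same idea can be sketched in a few lines. The following is a minimal illustration for Python source using the standard ast module; public_api and the SAMPLE module are hypothetical names introduced here, and treating all-uppercase names as constants is an assumed convention.

```python
import ast


def public_api(source: str) -> list[str]:
    """List top-level public classes, functions, and constants in Python source."""
    entries = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not node.name.startswith("_"):
                args = ", ".join(a.arg for a in node.args.args)
                entries.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            if not node.name.startswith("_"):
                entries.append(f"class {node.name}")
        elif isinstance(node, ast.Assign):
            # Assumed convention: ALL_CAPS names are exposed constants.
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id.isupper():
                    entries.append(f"const {target.id}")
    return entries


# Hypothetical module used only to demonstrate the extraction.
SAMPLE = '''
MAX_RETRIES = 3

class Client:
    pass

def fetch(url, timeout):
    return None

def _helper():
    return None
'''

print(public_api(SAMPLE))
# prints ['const MAX_RETRIES', 'class Client', 'def fetch(url, timeout)']
```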
Of course, this step only tells us which software entities exist and what their specifications are. What business is each entity responsible for, and how do the entities relate to each other? That requires further analysis.
Generally speaking, I look at examples, unit tests, and the like first. They are the customers, the users, of the code we are studying, and they help us understand the semantics of the various software entities.
From the specifications, explanatory documents, examples, unit tests, and other known information about the software entities, and even from the meaning implied by their names, we can make a preliminary inference about the business scope of each software entity and the relationships between them.
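A first guess at the relationships between entities can sometimes be produced mechanically. The sketch below, again for Python source and under simplifying assumptions, records which top-level functions call which others by plain name; it ignores methods, imports, and dynamic dispatch, so its output is a starting hypothesis to be checked, not a conclusion. call_graph and SAMPLE_SOURCE are hypothetical names.

```python
import ast


def call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the other top-level functions it calls by name."""
    tree = ast.parse(source)
    functions = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph = {}
    for name, node in functions.items():
        callees = set()
        for child in ast.walk(node):
            # Only direct calls such as parse(...) are seen; obj.method(...) is skipped.
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                if child.func.id in functions and child.func.id != name:
                    callees.add(child.func.id)
        graph[name] = callees
    return graph


# Hypothetical module used only for demonstration.
SAMPLE_SOURCE = '''
def load(path):
    return parse(read(path))

def read(path):
    return ""

def parse(text):
    return validate(text)

def validate(text):
    return text
'''

for caller, callees in call_graph(SAMPLE_SOURCE).items():
    print(caller, "->", sorted(callees))
```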
Next, we need to confirm or falsify these conclusions. If they are falsified, we need to rework our picture of the relationships between the software entities. How do we confirm or falsify them? We select the most important classes or functions and read their source code to understand the business processes they implement.
Of course, if you can find someone who has worked on this business before, don’t hesitate: prepare a list of the questions that confuse you in advance, then talk to them for an hour or so. This dramatically shortens the time it takes to understand the entire system.
Finally, to make sure we have understood the system correctly, we need to write down our conclusions as documentation. That way, the next person who takes over the system doesn’t have to “decompile” it all over again.
Understand the business implementation mechanism
Once the outline design of the business system and its interfaces are clear, we generally have a preliminary picture of the system. If our goal is a relatively light one, such as evaluating whether to adopt a third-party module, we are basically done at this point.
We only investigate implementation mechanisms when necessary. Sorting out the system architecture, as described above, already involved reading some source code. However, it is important to be clear that we read that core code in the previous section to confirm our guesses about how the business is divided up, not to study the implementation mechanism itself.
Researching the implementation can be very time-consuming, because a system has many user stories, and studying and writing down the specific business process of each one takes a long time. If the business system is not what we are going to focus on next, there is no need to overinvest in it.
That’s when goals are important.
If we are fixing a bug in passing, whether it is in third-party code or a temporary task assigned by a supervisor, we naturally focus on the business process related to that bug.
If we are taking over a new business system, we won’t have the energy to figure out all the details at once. In that case we need to sort out the key business processes first.
How do you figure out the business process?
Program = data structure + algorithm
This is the familiar basic formula. To understand business processes, the first thing to do is understand the data structures associated with them.
Data structures are usually easy to sort out: class member variables and database table structures can usually be extracted quickly. MongoDB is an exception and can be a little harder: because its schema is weak, we need to read the code to understand the schema. To make matters worse, we can’t be sure how many rounds of schema changes have happened historically, and that is probably not apparent from the latest version of the source code. If we’re not careful, we may run into data in an unexpected schema. Figure out the data structures, and half the job is done.
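One way to guard against unexpected historical schemas is to look at the data itself as well as the code. The sketch below counts the distinct top-level “shapes” found in a sample of documents. It works on plain dictionaries; the sample documents are made up, and reading them from a real collection (and handling nested fields) is left out as an assumption.

```python
from collections import Counter


def shape(doc: dict) -> tuple:
    """Describe a document as a sorted tuple of (field, type name) pairs."""
    return tuple(sorted((key, type(value).__name__) for key, value in doc.items()))


def schema_variants(docs: list[dict]) -> Counter:
    """Count how many documents share each distinct top-level shape."""
    return Counter(shape(doc) for doc in docs)


# Hypothetical sample standing in for documents read from a collection.
docs = [
    {"_id": 1, "name": "a", "price": 10},
    {"_id": 2, "name": "b", "price": 12},
    {"_id": 3, "name": "c", "price": "12.50", "currency": "USD"},
]

for variant, count in schema_variants(docs).most_common():
    print(count, variant)
```

A shape that appears only a handful of times is a hint that an older or newer schema version is still present in the data.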
What remains is to walk through the business process of each user story and draw its UML sequence diagram. This work can be added to at any time, so we start with the user stories most relevant to our current job.
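Sequence diagrams are easier to keep up to date when they are stored as text. As a minimal sketch, the function below renders a list of calls in PlantUML’s sequence diagram syntax. The “place order” user story and all participant names are hypothetical, and PlantUML is only one possible choice of notation.

```python
def to_plantuml(calls: list[tuple[str, str, str]]) -> str:
    """Render (caller, callee, message) triples as a PlantUML sequence diagram."""
    lines = ["@startuml"]
    for caller, callee, message in calls:
        lines.append(f"{caller} -> {callee}: {message}")
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical "place order" user story; the participants are made up.
place_order = [
    ("Client", "OrderService", "place_order(cart)"),
    ("OrderService", "Inventory", "reserve(items)"),
    ("OrderService", "Payment", "charge(total)"),
    ("OrderService", "Client", "order_id"),
]

print(to_plantuml(place_order))
```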
Finally, once again, we need to write down what we have worked out promptly, as part of the architecture document. As more and more people add to it, a complete architectural design document takes shape, and it becomes possible to pull our project out of chaos.
Conclusion
The ability to read code is extremely important for any project team. Even if you think your team has good consensus management, good team chemistry, good engineering habits, and a willingness to document, none of that replaces the basic activity of reading code.
Reading code is an essential skill.
Why do I say so? Because code is documentation, and it is the documentation most consistent with how the system actually behaves.
In addition, as a small aside, the result of reading code is not always just an addition to the architectural design documentation. Sometimes we also change a few lines of code.
This is normal and should be encouraged. Why encourage code changes? Because we encourage eliminating code smells wherever and whenever we can. Changing a few lines of code that are obviously in poor style is a good thing.
But we also have to have principles.
- First, don’t make big changes; for example, limit changes to no more than 10 lines in a single function.
- Second, make sure the semantics are exactly the same before and after the change. This consistency has to cover all corner cases, such as error codes and the boundaries of conditional statements.
- Third, no matter how confident you are, you need to complete the relevant unit tests to ensure that the conditional boundaries of the modified code are covered.
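The second and third principles can be illustrated with a small sketch. Both functions below are hypothetical: parse_port_old stands for the code as found, and parse_port_new for a cleanup of under 10 lines. The check at the end compares the two on inputs chosen to sit on and around every conditional boundary, including the error code.

```python
def parse_port_old(text):
    # Version as found: nested conditionals, returns -1 as an error code.
    if text.isdecimal():
        value = int(text)
        if value >= 1:
            if value <= 65535:
                return value
            else:
                return -1
        else:
            return -1
    else:
        return -1


def parse_port_new(text):
    # Cleaned-up version: same error code and the same inclusive bounds.
    if not text.isdecimal():
        return -1
    value = int(text)
    return value if 1 <= value <= 65535 else -1


# Inputs on and around each boundary, plus malformed input.
BOUNDARY_CASES = ["", "abc", "-1", "0", "1", "2", "65534", "65535", "65536", " 80"]

for case in BOUNDARY_CASES:
    assert parse_port_old(case) == parse_port_new(case), case
```

In a real project the loop would live in the module’s unit tests, so the boundary coverage stays in place after the old version is deleted.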
This article is shared from the WeChat official account “May Heaven have no BUG” (MA214617).