By Xu Lun, F(X) Team, Ali Tao Department

As the overview chart shows, there are roughly three routes:

  • The first starts from the bug itself: first predict bugs, then understand them, and finally try to fix them.
  • The second starts from the program, because a bug is essentially a problem in a program. To understand a bug, understanding the program itself is unavoidable. There are four main directions: program understanding, program analysis, automatic program repair, and intelligent program synthesis.
  • The third is a more general direction that borrows the power of other tools, such as using knowledge graph technology to build a software knowledge graph.

With the full picture in mind, let's look at each branch in detail.


Defect prediction

For bugs, the first step is prediction: are there bugs, how many are there likely to be, and how serious are they? It also covers estimating how much time and effort it will take to resolve them.


At this point we do not yet have a deep understanding of the bug itself; that is what the next part, defect understanding, will tackle. Here we mainly use machine learning and statistical methods to make predictions. Although defect prediction is relatively simple compared with the work that follows, it still has plenty of subtleties. Let's take a look at what is needed to make just-in-time predictions of software defects:


In short, the main approach is to monitor code changes, extract features along several dimensions from them, and use those features for statistical analysis and model training.
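To make the idea concrete, here is a minimal sketch of change-level defect prediction, assuming three hypothetical features per commit (lines added, lines deleted, files touched) and a tiny hand-made history. The feature set, the data, and the names `HISTORY`, `train`, and `predict_risk` are illustrative assumptions, not something taken from the surveys; real systems use many more dimensions and far more data.

```python
import math

# Hypothetical history of code changes:
# ((lines_added, lines_deleted, files_touched), introduced_defect)
HISTORY = [
    ((5, 1, 1), 0),
    ((12, 3, 1), 0),
    ((8, 0, 2), 0),
    ((20, 5, 2), 0),
    ((150, 40, 9), 1),
    ((300, 120, 14), 1),
    ((90, 60, 7), 1),
    ((210, 15, 11), 1),
]


def scale(features):
    """Log-scale raw counts so that huge commits do not dominate training."""
    return [math.log1p(value) for value in features]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(history, epochs=2000, learning_rate=0.1):
    """Fit a logistic regression model with plain stochastic gradient descent."""
    weights = [0.0] * len(history[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in history:
            x = scale(features)
            predicted = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = predicted - label
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict_risk(model, features):
    """Return the estimated probability that a change introduces a defect."""
    weights, bias = model
    x = scale(features)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


MODEL = train(HISTORY)
print("large change risk:", round(predict_risk(MODEL, (250, 80, 12)), 3))
print("small change risk:", round(predict_risk(MODEL, (4, 1, 1)), 3))
```

The model is deliberately simple so that the pipeline (collect changes, extract features, train, score new changes) stays visible; swapping in a different learner does not change that shape.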

Defect understanding

Defect prediction may or may not give good results, but either way it lacks interpretability. So we try to understand the root of the problem from the perspective of the defect itself, using test cases, static analysis, and the correlation between them.


Defect localization and repair

After defect prediction and defect understanding, we finally face the problem head-on: locating the defect and fixing it. Defect localization is a technical direction in its own right. Traditional techniques include logging, assertions, breakpoint debugging, and profiling. More advanced techniques include test-coverage-based approaches such as program spectrum analysis, approaches based on program analysis, and approaches based on machine learning and data mining.
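As an illustration of the program spectrum idea, the sketch below ranks lines by the Ochiai suspiciousness score, one commonly used spectrum formula: a line covered mostly by failing tests is more suspicious. The coverage data and test names are made up for the example.

```python
import math


def ochiai_scores(coverage, outcomes):
    """Compute an Ochiai suspiciousness score for every covered line.

    coverage: {test_name: set of line numbers the test executed}
    outcomes: {test_name: True if the test passed, False if it failed}
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    all_lines = set().union(*coverage.values())
    scores = {}
    for line in all_lines:
        failed_cover = sum(
            1 for test, lines in coverage.items()
            if line in lines and not outcomes[test]
        )
        passed_cover = sum(
            1 for test, lines in coverage.items()
            if line in lines and outcomes[test]
        )
        denominator = math.sqrt(total_failed * (failed_cover + passed_cover))
        scores[line] = failed_cover / denominator if denominator else 0.0
    return scores


def rank_lines(scores):
    """Most suspicious line first; ties are broken by line number."""
    return sorted(scores, key=lambda line: (-scores[line], line))


# Hypothetical coverage for three tests; t2 and t3 fail.
COVERAGE = {"t1": {1, 2, 3}, "t2": {1, 2, 4}, "t3": {1, 4, 5}}
OUTCOMES = {"t1": True, "t2": False, "t3": False}

SCORES = ochiai_scores(COVERAGE, OUTCOMES)
print(rank_lines(SCORES))  # [4, 1, 5, 2, 3]
```

Line 4 comes out on top because both failing tests execute it and no passing test does.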


Some of the older localization techniques are relatively mature, but new ones that take advantage of recent machine learning advances keep emerging. Take information-retrieval-based localization as an example: it matches bug reports against source code by semantic similarity, which can be computed with text similarity methods, deep neural networks, or machine translation techniques.
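Of those three options, plain text similarity is the easiest to sketch. The example below ranks source files against a bug report with TF-IDF weights and cosine similarity. The file names, file contents, and bug report are hypothetical, and the tokenizer is a simplification (no stemming, no stop-word list).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split identifiers such as parse_date or parseDate into lowercase words."""
    words = re.findall(r"[A-Za-z][a-z]*", text)
    return [word.lower() for word in words if len(word) > 2]


def tfidf_vectors(documents):
    """Return ({name: {term: weight}}, idf) for a {name: text} mapping."""
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    doc_frequency = Counter()
    for term_counts in counts.values():
        doc_frequency.update(term_counts.keys())
    total = len(counts)
    idf = {
        term: math.log((1 + total) / (1 + freq)) + 1.0
        for term, freq in doc_frequency.items()
    }
    vectors = {
        name: {term: count * idf[term] for term, count in term_counts.items()}
        for name, term_counts in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(bug_report, files):
    """Rank file names by similarity to the bug report, best match first."""
    vectors, idf = tfidf_vectors(files)
    query_counts = Counter(tokenize(bug_report))
    query = {t: c * idf[t] for t, c in query_counts.items() if t in idf}
    scored = [(cosine(query, vector), name) for name, vector in vectors.items()]
    return [name for _, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical project files and bug report.
FILES = {
    "date_parser.py": "def parse_date(text): parse the date string and return year month day",
    "user_login.py": "def login(user, password): check password and create session for user",
    "report_export.py": "def export_report(rows): write rows to csv file and return path",
}
BUG_REPORT = "Crash when parsing date string with missing month"

print(rank_files(BUG_REPORT, FILES)[0])  # date_parser.py
```

The neural and translation-based variants replace the similarity function, not the overall retrieve-and-rank structure.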


Once the defect is located, we can use automatic software repair techniques to find or generate patches.


Automatic repair techniques fall broadly into four categories: those based on heuristic search, on manual templates, on semantic constraints, and on statistical analysis. Everything above approaches the problem from the defect side. But defects originate in code, so we need to understand the code as well as the defects.

Program understanding

Program understanding resembles defect localization: it also relies on static analysis, dynamic analysis, and machine learning.


Program analysis

For code, program analysis is the core technical tool. We will cover its main directions in more detail later; some of them require background knowledge and academic training.


Automatic program repair

Automatic program repair requires some knowledge of program analysis and is guided by a specification. With a complete specification, meaning the problem is clearly understood and defined, repair can target each class of defect directly. Without a complete specification, you can use contracts as the specification or write a program specification by hand. If neither is available, the repair has to be driven by a test suite.
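The test-driven case can be sketched as a tiny generate-and-validate loop: mutate the program, rerun the tests, and keep the first candidate that passes. The buggy function, the test cases, and the single mutation operator (swapping a comparison operator) are all assumptions chosen for illustration; `ast.unparse` requires Python 3.9 or later.

```python
import ast
import copy

# Hypothetical buggy program: it returns the smaller value instead of the larger.
BUGGY_SOURCE = """
def max_of(a, b):
    if a < b:
        return a
    return b
"""

# The test suite acts as the (partial) specification: (arguments, expected result).
TESTS = [((1, 2), 2), ((5, 3), 5), ((4, 4), 4)]

CANDIDATE_OPS = [ast.Lt, ast.LtE, ast.Gt, ast.GtE, ast.Eq, ast.NotEq]


def passes_all_tests(tree, tests, function_name="max_of"):
    """Compile a candidate program and check it against every test case."""
    namespace = {}
    exec(compile(tree, "<candidate>", "exec"), namespace)
    function = namespace[function_name]
    return all(function(*args) == expected for args, expected in tests)


def candidate_patches(source):
    """Yield copies of the program with one comparison operator replaced."""
    tree = ast.parse(source)
    compare_count = sum(isinstance(n, ast.Compare) for n in ast.walk(tree))
    for index in range(compare_count):
        for op_type in CANDIDATE_OPS:
            patched = copy.deepcopy(tree)
            compares = [n for n in ast.walk(patched) if isinstance(n, ast.Compare)]
            node = compares[index]
            if isinstance(node.ops[0], op_type):
                continue  # same operator as the original, not a real patch
            node.ops[0] = op_type()
            yield ast.fix_missing_locations(patched)


def repair(source, tests):
    """Return the source of the first candidate that passes all tests, or None."""
    for patched in candidate_patches(source):
        if passes_all_tests(patched, tests):
            return ast.unparse(patched)
    return None


print(repair(BUGGY_SOURCE, TESTS))
```

A patch found this way is only as good as the tests: it is guaranteed to pass them, not to be correct in general, which is why a fuller specification is preferable when one exists.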


With prediction as the starting point, understanding as the basis, repair as the core, and intelligent synthesis as the ultimate goal, this brief guided tour ends here.

Intelligent program synthesis

Now that bugs can be repaired automatically, we should not sit idle and write new bugs ourselves just for fun. Programs can go one step further: program synthesis techniques generate code intelligently. Synthesis can learn from examples, be driven by code frameworks or rules, or use natural language processing techniques.
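Learning from examples can be sketched as enumerative search: try every short composition of known building blocks until one reproduces all input/output pairs. The four primitives and the examples below are hypothetical, and practical synthesizers prune the search space far more aggressively.

```python
from itertools import product

# A hypothetical mini-language: a program is a sequence of these primitives.
PRIMITIVES = {
    "add1": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "negate": lambda x: -x,
}


def run(program, value):
    """Apply each primitive in order to the input value."""
    for name in program:
        value = PRIMITIVES[name](value)
    return value


def synthesize(examples, max_length=3):
    """Return the shortest primitive sequence consistent with all examples."""
    for length in range(1, max_length + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return list(program)
    return None


# Input/output examples for f(x) = 2 * (x + 1).
EXAMPLES = [(1, 4), (2, 6), (3, 8)]
print(synthesize(EXAMPLES))  # ['add1', 'double']
```

Searching shorter programs first means the result is the simplest explanation of the examples within this mini-language.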


References

The good news is that this area has been a hot one in recent years, so there are plenty of review articles in Chinese. Only the survey on defect localization has to be read in English. Of course, most of the individual papers are in English.

  1. Gong Lina, Jiang Shujuan, Jiang Li. Research progress of software defect prediction technology. Journal of Software, 2019, 30(10):3090-3114. (in Chinese) www.jos.org.cn/1000-9825/5…
  2. Cai Liang, Fan Yuanrui, Yan Meng, Xia Xin. Advances in just-in-time software defect prediction. Journal of Software, 2019, 30(5):1288-1307. (in Chinese) www.jos.org.cn/1000-9825/5…
  3. Li Bin, He Yeping, Ma Hengtai. Automatic program repair: key issues and techniques. Journal of Software, 2019, 30(2):244-265. (in Chinese) www.jos.org.cn/1000-9825/5…
  4. Jin Zhi, Liu Fang, Li Ge. Program understanding: present and future. Journal of Software, 2019, 30(1):110-126. (in Chinese) www.jos.org.cn/1000-9825/5…
  5. Zhang Jian, Zhang Chao, Xuan Jifeng, Xiong Yingfei, Wang Qianxiang, Liang Bin, Li Lian, Dou Wensheng, Chen Zhenbang, Chen Liqian, Cai Yan. Progress in program analysis. Journal of Software, 2019, 30(1):80-109. (in Chinese) www.jos.org.cn/1000-9825/5…
  6. Li Xiaozhuo, He Yeping, Ma Hengtai. Research on understanding defects: current status, problems and development. Journal of Software, 2020, 31(1):20-46. (in Chinese) www.jos.org.cn/1000-9825/5…
  7. Gu Bin, Yu Bo, Dong Xiaogang, Li Xiaofeng, Zhong Ruiming, Yang Mengfei. Research progress of intelligent program synthesis technology. Journal of Software. (in Chinese) www.jos.org.cn/1000-9825/6…
  8. Li Zhengliang, Chen Xiang, Jiang Zhiwei, Gu Qing. Overview of software defect localization methods based on information retrieval. Journal of Software, 2021, 32(2):247-276. (in Chinese) www.jos.org.cn/1000-9825/6…
  9. Wong WE, Gao RZ, Li YH, Abreu R, Wotawa F. A survey on software fault localization. IEEE Transactions on Software Engineering, 2016, 42(8):707-740. [doi: 10.1109/TSE.2016.2521368]


