Planning: Natalie | Author: Johannes Bader | Translator: Nuclear Coke | Editor: Vincent
Facebook has developed a tool called Getafix that automatically finds bug fixes and provides them to engineers for approval, greatly improving their productivity and overall code quality. Getafix not only uses powerful clustering algorithms to analyze the context of the problem code to find a more appropriate fix, but also provides solutions that are easy for human engineers to understand. Getafix is the first automated repair tool to be deployed on a large scale in Facebook’s production environment, further improving the stability and performance of Facebook’s applications with billions of users.
Follow the WeChat public account "AI Front" (ID: AI-front)
Modern production codebases are complex and constantly updated. To create a system that automatically finds bug fixes, without help from an engineer, we built a tool that learns how to fix bugs from engineers' previous changes to the codebase. It finds hidden patterns and uses them to identify the fixes most likely to resolve new bugs.
The tool, called Getafix, has been deployed to Facebook’s production environment to further improve the stability of applications used by billions of people. Getafix is generally used in conjunction with two other Facebook tools, though the technology can be used elsewhere. It is currently able to suggest fixes for bugs found by Infer, our static analysis tool that can identify issues such as null pointer exceptions in Android and Java code. It also provides fix suggestions via SapFix — for bugs detected by Sapienz, our intelligent automated test system. Now, we’ll take a closer look at how Getafix learns to fix bugs (any code problem, not just one that causes an application to crash).
Getafix aims to let computers handle routine work under human supervision, because a human still has to decide whether a bug needs a more involved fix. The tool applies a new hierarchical clustering approach to thousands of previous code changes, examining both the changes themselves and their context. It detects underlying bug patterns and produces fixes that previous autofix tools could not.
Getafix also significantly narrows down the specific areas of an application that might need to change during a bug fix, making it easier to select the appropriate fix. In addition, it does not require as much computing time as previous brute-force and logic-based techniques. This more efficient approach is what allowed Getafix to be deployed successfully into production. At the same time, Getafix's ability to learn from past code changes lets it produce fixes that human engineers find easy to understand.
Getafix has been deployed in the Facebook production environment, where it automatically fixes null dereference bugs reported by Infer and suggests fixes for null-dereference-related crash bugs flagged by Sapienz. In addition, Getafix has been used to address code quality issues discovered when re-analyzing existing code with newer versions of Infer.
How Getafix differs from traditional simple automatic repair tools
In current industry practice, automatic fixing is mainly applied to simple, rule-based problems. For example, a linter might warn that a developer probably forgot the throw keyword before new Exception(…), and the autofix tool can insert it directly. How such a fix is applied can be defined by the lint rule itself; in other words, the tool does not need to understand the specific context in which it operates.
Getafix is quite different in that it provides more general, context-dependent fixes. In the following code example, Getafix suggests the following fix for the Infer error on line 22:
Note that this fix depends not only on the variable ctx, but also on the return type of the enclosing method. Unlike simple lint fixes, such fixes cannot be hardcoded into Infer itself.
The following figure shows fixes that Getafix suggested for Infer bugs. Although the bug type from Infer is always the same (a method call on a potentially null value, risking a NullPointerException), each specific fix is unique. It is also worth noting that Getafix's fixes look just like fixes human developers would normally write.
A closer look at Getafix's key technical details
Getafix is organized as shown in the tool chain below. In this section, we describe the three main components of Getafix and their respective capabilities and challenges.
Tree differencer: detecting changes at the tree level
The abstract-syntax-tree-based tree differencer is first responsible for identifying the actual edits between two source files, such as successive revisions of the same file. It detects fine-grained edits such as wrapping statements in an if, adding @Nullable annotations or imports, or adding an early return to an existing method. In the following example, inserting the "if dog is null, return early" condition, changing public to private, and moving a method are all detected as actual edits. Where a line-based diffing tool would mark the moved method as a complete removal and insertion, the tree differencer detects the move and treats only the insertion inside the moved method as an actual edit.
The main challenge for the tree differencer is aligning the "before" and "after" trees efficiently and accurately, so that it identifies the correct actual edits and node mappings.
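To make the idea concrete, here is a minimal, illustrative sketch, not Getafix's actual implementation, of why a tree-level diff can recognize an inserted guard statement while keeping the unchanged statement aligned. The tuple-based AST encoding and node names are invented for this example, and the differ handles insertions only.

```python
# Minimal sketch (not Getafix's implementation): ASTs as nested tuples
# ("node_kind", child, child, ...). A tiny insertion-only differ that
# aligns the statements of a block and reports inserted subtrees, the
# way a tree-level diff can see "insert if-return" instead of the
# whole-method rewrite a line-based diff might report.

def diff_block(before, after):
    """Return subtrees present in `after` with no aligned counterpart
    in `before` (naive, insertion-only tree diff)."""
    inserted = []
    i = 0  # cursor into `before`
    for node in after:
        if i < len(before) and node == before[i]:
            i += 1                 # aligns with an unchanged statement
        else:
            inserted.append(node)  # no counterpart: treat as an insert
    return inserted

before = [("call", "dog", "drink")]
after = [("if", ("eq", "dog", "null"), ("return",)),
         ("call", "dog", "drink")]

# The unchanged dog.drink() call aligns, so only the guard is reported.
print(diff_block(before, after))
```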
A new method for mining fix patterns
Getafix mines patterns by combining a new hierarchical clustering technique with anti-unification, an existing approach for generalizing across different symbolic expressions. It groups potentially related tree differences into sets, then derives from each set the fix pattern the edits have in common. These patterns may be abstract and contain "holes" that the program transformation fills in based on context.
The following example image shows a hierarchy, or dendrogram, generated from a set of edits. (In this example, we use the edits from the previous example.) Each line shows an edit pattern (purple for "before" and blue for "after") and some metadata. Each vertical black bar corresponds to a level in the hierarchy: the edit pattern at the top of the bar is the pattern obtained by anti-unifying all the other edits at that level, which are connected by thinner black lines. Anti-unification combines the "return early if dog is null" edit from the previous example with another edit whose only difference is that the dog drinks water. The result is an abstract fix pattern that captures their commonality. The symbol h0, introduced by anti-unification, represents a "hole" that can be instantiated based on context.
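The anti-unification step can be sketched in a few lines. The toy implementation below conveys the general idea rather than Facebook's actual code; the tuple-based AST encoding and the hole-naming scheme are invented for illustration. Where two trees agree, the structure is kept; where they disagree, a fresh hole is introduced, reusing the same hole for a repeated mismatching pair.

```python
# Sketch of anti-unification over expression trees (illustrative only):
# keep agreeing structure, replace disagreements with holes h0, h1, ...

def anti_unify(a, b, holes=None):
    if holes is None:
        holes = {}
    if a == b:
        return a
    if isinstance(a, tuple) and isinstance(b, tuple) \
            and len(a) == len(b) and a[0] == b[0]:
        # same node kind and arity: generalize child-wise
        return (a[0],) + tuple(anti_unify(x, y, holes)
                               for x, y in zip(a[1:], b[1:]))
    # structures disagree: abstract the pair into a (shared) hole
    key = (a, b)
    if key not in holes:
        holes[key] = f"h{len(holes)}"
    return holes[key]

dog = ("if", ("eq", "dog", "null"), ("return",))
cat = ("if", ("eq", "cat", "null"), ("return",))

# "dog" vs "cat" is the only disagreement, so it becomes hole h0:
# ("if", ("eq", "h0", "null"), ("return",))
print(anti_unify(dog, cat))
```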
This edit pattern can then be combined with other edit patterns that have different variable names but the same overall structure. As it moves up the tree, the process produces increasingly abstract edit patterns; for example, combining this edit with the cat-related edits yields the abstract edit at the top of the chart.
More importantly, this hierarchical process gives Getafix a powerful framework for discovering reusable patterns in code changes. As shown in the following figure, a total of 2,288 edits that fixed null pointer errors reported by Infer in our codebase were summarized into a forest of such trees (horizontal layout, minified). The fix patterns we want to extract are hidden somewhere in these trees.
Pattern mining based on anti-unification is nothing new, but to fix new bugs with as few suggestions as possible, we needed to further enhance the mined patterns.
One of these enhancements is the inclusion of surrounding code, the part of the edit that did not change. This lets us capture not only the change people made, but also the context in which the change was applied. For example, in the first tree above, we see two different edits that insert if (dog == null) return; before dog.drink(…);. Although dog.drink(…); itself does not change, it should be kept as context in the "before" and "after" parts of the pattern to indicate where the fix applies. At a higher level of the tree, the dog.drink() context is merged with other contexts into the abstract context h0.h1(), which restricts where the pattern can be applied. We will see another, more realistic example in the next section.
Greedy clustering algorithms, as described in the prior auto-repair literature, are unlikely to learn this. A greedy algorithm maintains a single representation of each cluster, so if a piece of context is not present in every edit in the training data, the algorithm never introduces that context. For example, if an edit that inserts if (list == null) return; before do(list.get()); is merged with the dog.drink() edit mentioned above, a greedy algorithm discards all context about where the early return is inserted. Getafix's hierarchical clustering, in contrast, preserves context at every level as far as possible while still generalizing the overall structure. Some of the useful context we want to learn may be lost at the most abstract levels, but it still survives somewhere lower in the hierarchy.
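The contrast can be sketched as follows. This is an illustrative toy, not either algorithm's real implementation: a greedy cluster keeps only one merged representative, erasing the concrete surrounding statements, while a hierarchical cluster keeps every intermediate pattern, so the concrete context survives at a lower level.

```python
# Sketch: greedy vs. hierarchical clustering of edit contexts
# (illustrative; the contexts and encoding are invented).

def generalize(a, b):
    """Anti-unify two flat contexts: keep what agrees, hole the rest."""
    return tuple(x if x == y else "h0" for x, y in zip(a, b))

edit1_ctx = ("stmt", "dog.drink()")     # early return inserted before this
edit2_ctx = ("stmt", "do(list.get())")  # and before this

# Greedy: a single merged representative; concrete statements are gone.
greedy = generalize(edit1_ctx, edit2_ctx)

# Hierarchical: the merged pattern sits above its children, which are kept,
# so the concrete dog.drink() context still exists lower in the tree.
hierarchy = {"top": greedy, "children": [edit1_ctx, edit2_ctx]}

print(greedy)                    # the context has been abstracted away
print(hierarchy["children"][0])  # but survives in the hierarchy
```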
In addition to the surrounding code, we also associate each edit with the Infer bug report that prompted it, so that we learn the mapping between the edit pattern and the corresponding report. In the first tree diagram above, Infer blamed the variable errorVar in its bug report, and anti-unification abstracted that variable into the hole h0. When a new Infer bug report arrives, we can then bind h0 to the variable it blames, making the overall fix pattern more specific.
How Getafix creates patches
As a final step, given buggy source code and the mined fix patterns, Getafix must generate a patch for that code. Often there are multiple fix patterns to choose from (as the trees above show), so the challenge is choosing the right pattern for a specific bug. If a pattern applies in more than one location, Getafix also needs to select the correct match. The following examples illustrate our general approach and how Getafix actually addresses this challenge.
Example 1: Consider the pattern we mined earlier: h0.h1(); → if (h0 == null) return; h0.h1();
Below, we’ll briefly describe how to generate the following patches for completely unfamiliar code.
Getafix creates the patch through the following steps:
1. Find a sub-AST that matches the "before" part: mListView.clearListeners();
2. Instantiate the holes h0 and h1 (here h0 = mListView, h1 = clearListeners);
3. Replace the sub-AST with the instantiated "after" part.
Note that h0 in the "after" part is bound because the pattern contains the unchanged context h0.h1();, and this context limits the number of locations where the pattern applies. Without it, the pattern would simply be → if (h0 == null) return;, which could be applied in many places that have nothing to do with the intent, such as before mListView.clearListeners(); or even after mListView = null;.
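The three steps above can be sketched as hole matching followed by substitution. This is an illustrative toy with invented node shapes, not Getafix's internals; the pattern is the one from Example 1, and in this sketch any identifier starting with "h" is treated as a hole.

```python
# Sketch of patch creation: match the "before" pattern against a
# sub-AST to bind holes, then substitute bindings into "after".

def match(pattern, tree, env):
    """Bind holes in `pattern` to subtrees of `tree`; return success."""
    if isinstance(pattern, str) and pattern.startswith("h"):
        if pattern in env:
            return env[pattern] == tree  # a hole must bind consistently
        env[pattern] = tree
        return True
    if isinstance(pattern, tuple) and isinstance(tree, tuple) \
            and len(pattern) == len(tree):
        return all(match(p, t, env) for p, t in zip(pattern, tree))
    return pattern == tree

def substitute(pattern, env):
    """Replace holes in `pattern` with their bound subtrees."""
    if isinstance(pattern, str):
        return env.get(pattern, pattern)
    return tuple(substitute(p, env) for p in pattern)

# Example 1 pattern:  h0.h1();  ->  if (h0 == null) return; h0.h1();
before_pat = ("call", "h0", "h1")
after_pat = [("if", ("eq", "h0", "null"), ("return",)),
             ("call", "h0", "h1")]

buggy = ("call", "mListView", "clearListeners")
env = {}
patch = None
if match(before_pat, buggy, env):
    patch = [substitute(p, env) for p in after_pat]
    print(patch)  # guarded call, with h0 and h1 instantiated
```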
In fact, such insert-only patterns do appear higher up in the tree, where the h0.h1(); context has been anti-unified away by merging with patterns that insert a return before a different kind of statement. The following example shows how Getafix handles patterns that are too broad.
Example 2: Consider the following pattern: h0.h1() → h0 != null && h0.h1()
Typically, this pattern comes from fixes to if conditions or return expressions, so we want it to apply in those contexts. However, it could also be applied elsewhere, such as to mListView.clearListeners();. Getafix's ranking strategy estimates how likely each pattern is to produce the intended fix and applies it in the context where it is most likely to be correct. This strategy lets the system run without relying on a validation step, significantly reducing computation time.
The above pattern competes with others, such as the more specific if (h0.h1()) { … } → if (h0 != null && h0.h1()) { … }, or the pattern in Example 1, which applies only to call statements rather than expressions. Because more specific patterns tend to match fewer locations, Getafix treats them as a better fit for the situation and ranks them higher.
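This specificity heuristic can be illustrated with a toy ranking: among competing patterns, prefer the one that matches fewer locations in the codebase. The patterns are those discussed above, but the match counts below are hypothetical, chosen only to show the ordering.

```python
# Sketch of the specificity heuristic (illustrative; not Facebook's
# actual scoring): a pattern that fires in fewer places is less likely
# to apply somewhere unrelated, so it ranks higher.

def rank(patterns, match_counts):
    """Order patterns by ascending number of matched locations."""
    return sorted(patterns, key=lambda p: match_counts[p])

# Hypothetical counts: the bare expression pattern matches every call
# site, while the if-condition variant matches far fewer locations.
counts = {
    "h0.h1() -> h0 != null && h0.h1()": 500,
    "if (h0.h1()) {...} -> if (h0 != null && h0.h1()) {...}": 40,
}

best = rank(list(counts), counts)[0]
print(best)  # the more specific if-condition pattern ranks first
```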
Getafix application and performance
Getafix has been deployed in Facebook's production environment, where it suggests automatic fixes for null dereference bugs reported by Infer, our static analysis tool, and provides fixes via SapFix for null-dereference-related crash bugs found by Sapienz. In addition, Getafix has been used to resolve a backlog of bugs that Infer raised previously.
In one experiment, we compared the fixes Getafix computed with previous human-made fixes on a dataset of about 200 small edits (each modifying fewer than five lines) that fixed various Infer null method call bugs. In about a quarter of the cases, Getafix's top-ranked fix exactly matched the fix a human had created.
In another experiment, we looked at a subset of the Instagram codebase and tried to batch-fix about 2,000 null method call bugs in it. Getafix attempted a patch for about 60 percent of the bugs, and 90 percent of those patches validated automatically, meaning the patched code compiles and Infer no longer issues a warning. Overall, Getafix automatically fixed 1,077 (approximately 53 percent) of the null method call errors.
In addition to suggesting fixes for new Infer bugs, we used the same approach to clear the backlog of old Infer bugs that predated code review integration. We cleaned up hundreds of return-not-nullable and field-not-nullable Infer bugs. Interestingly, after this work, Getafix became better at suggesting autofixes for these bug types, with the fraction of successful fixes rising from 56 percent to 62 percent for return-not-nullable bugs and from 51 percent to 59 percent for field-not-nullable bugs. Overall, Getafix's suggestions helped us fix hundreds of additional bugs over the past three months.
Getafix also generates fix suggestions for SapFix to handle crashes detected by Sapienz. Over the past few months, about half of the fixes adopted by SapFix, all of which are fully tested, came from Getafix. About 80 percent of the fixes Getafix provided to SapFix passed all tests.
Increasing Getafix's impact
Getafix helped us achieve our big goal of getting computers to handle routine bug fixes. As we improve our testing and validation tools, we anticipate that Getafix will be able to better protect against all types of post-deployment failures in the future.
We also note that the fix patterns Getafix mines are not only responses to bugs reported by Infer; it can also suggest fixes for issues raised in manual code reviews. This additional source of fix patterns opens up exciting possibilities for automating repetitive code review work. In other words, bugs that have been flagged and fixed many times in the codebase could be handed directly to automated tools, without any need for human screening.
Getafix is part of our overall effort to build intelligent tools that statistically analyze large code corpora and their associated metadata. The advent of such tools promises to improve all aspects of the software development life cycle, including code discovery, code quality, and execution efficiency. The valuable insights we gained from Getafix will help us build and deploy many other similarly important tools in this area.