Yesterday zhihu was once again busy with wang Yin’s new article “Why I stopped being a PL person”, I did not participate in his comments before. Because to be honest, I feel really sorry that Wang Yin has developed to where she is today, and his words once made sense to me. A few years ago, I also learned something from his articles and learned some profound nouns. Then I made these nouns no longer profound in my mind through a series of study and training. But while I was working my way from tool slave to tool master, the man who had proclaimed himself to destroy all superstitious beliefs about authority was now establishing himself as an even more unquestionable authority.
As can be seen from several articles (including this one), wang Yin is most proud of making “the world’s most advanced profiler Pysonar2”. Then, in keeping with the indisputably advanced nature of the Three Representatives, he has repeatedly rubbished Control flow analysis in different places:
So when you look through all of this, you see that the PL academia, over and over again, is solving problems that have already been solved. Sometimes several sub-areas actually solve the same problem. Yet people in each subfield claim that their problems are intrinsically different and claim to be the progenitor of that subfield. Some have even created generations of PHDS and professorships for more than 20 years. Their theories are updated from generation to generation, but in the end they fail to solve practical problems. One such subfield is so-called Control-flow Analysis (CFA).
People who don’t understand may just brush off the CFA slur without wondering why control flow is the focus of so much static analysis. The PL field does have a big problem, is too few people! Layman is no one dare to doubt this “advanced”. Until now many people have assumed that his ability is a matter of character (it’s all over Zhihu’s discussion of his new article). So I’m going to show you how Pysonar2 is; And why Wang is so crazy about vilification of control flow analysis technology in order to maintain Pysonar2’s position.
The problem of concrete type inference in dynamic languages can be easily realized by controlled flow analysis. However, there was a flaw in the old control flow analysis, so Wang invented a “wrong” method to partially solve this flaw in Pysonar2. But Pysonar2 is too low for today’s control-flow analysis techniques.
Pysonar2 is officially Python’s Type inferencer and indexer. How do I systematically learn dynamic language type derivation, type system and other knowledge? – Peng Fei stopped in the middle of his answer at the most critical point (because static analysis was not well learned at that time…) . It’s easy to do pysonar2-type reasoning for dynamic languages using Control flow analysis. The core idea of Control Flow analysis is to trace the flow of functions (closures) through the program in the way that the simulation program runs, for example (lambda (x) x) (lambda (y) y)). The analysis process of the program is as follows: We know that this is a function call (application), so the argument (id function with y) flows into the parameter (x), and then the function body with id function with x runs knowing x -> (lambda (y) y), Since the function body is simply a variable extraction, (lambda (y) y) flows out of the function call expression and continues to flow to the peripheral program until it stops, resulting in x pointing to (lambda (y) y). Here, the “function (closure)” we are tracing is called an Absract Value. This abstract Value can also be something else, such as simply substituting “type” to basically get a type derivator. So type inference in dynamic languages is about tracking the flow of types through programs. In other words, type flow analysis is a sub-problem of control flow analysis. The research on this topic has been around for A long time, and anyone interested can just Google it. There are many papers (A Type System Equivalent to Flow Analysis by my advisor’s advisor, Jens Palsberg, and A series of subsequent papers, For example, the inventor of CFA2 made Javascript Type Inference, which is much better than Pysonar2, using control flow analysis method during his internship in Mozilla.
One might ask why Wang Yin makes another Pysonar2 when control flow analysis can solve type analysis. The reason is that
At the time ofThe control flow analysis method was too weak (this is a bit unfair because CFA2 was already around, and although CFA2 is a hot chicken by today’s standards, it was at least sound compared to Pysonar2).
Here is a slideshow of Pysonar by Yin Wang, in which Recursion Detection is what he calls Pysonar’s most distinctive core technology:
When a wise man sees it,The call/return mismatch (Wang Yin is doing the same thing herself while spitting out other people’s reinventing ideas). But his solution is a specific approach to type derivation that looks efficient, but in reality the code is long, ugly and does not guarantee correct results (unsound, as Wang himself concedes).
So why is the call/return mismatch problem so difficult to solve in control flow analysis? Because the traditional K-CFA control flow analysis method is to abstract the infinite state space of the program into a finite “abstract state space” and then use a finite state automata to process. However, due to the existence of functions, exceptions, call/ CC and other cross-process control flow structures, the actual program state has recursion (infinite but regular), so the correct approach is to use a push-down automaton to deal with it. This problem has been noted in data flow analysis for many years, and Reps’s IFDS/IDE uses push-down automata to solve a few cross-process data flow problems that call/return mismatch. Then the research on the method of Pushdown Control Flow Analysis, or context-free Language Reachability, has gradually begun. From CFA2 to PDCFA to AAC to the awesome P4F. For control flow analysis, type derivation has long been a piece of cake. Not only is the theory simple and elegant, but the program is easy to implement, and the operating efficiency has been breaking through the barrier of progressive complexity (big O).
That’s the theory, but let’s take a look at some of the drawbacks of Pysonar2 through a few examples (the first two are the current key to dynamic language typing, and Pysonar2 doesn’t consider them at all) :
1. Pysonar2 claims to derive the “most powerful” Intersection Type. In fact, it only uses the Union Type. In fact, Union Type is a very intuitive and helpless choice, because the Type deduction made by control flow analysis method is to infer the definition of Type from the use of Type. Therefore, Pysonar2 made concrete type inference rather than the polymorphic type inference of Haskell. Meaning that it can give {int – > int} | {bool – > bool} this type, but can not accurately guess it is forall a. a – > a. As shown in figure
2. Another disadvantage of this “infer a type definition from its use” approach is that you cannot detect all type errors, because the use of the wrong type will eventually propagate to the wrong type definition. Take this example:
The first call to function f, f(A()), is an obvious type error, but Pysonar2 has no power to detect it.
3. In the previous example, you can also see the error that the x in “a.x” refers to Class A instead of Class B. The reason for this is that the author simply does not understand the nature of static analysis. It is mentioned above that Abstract values can be functions, closures, types, etc., when it is more accurate to say that Abstract values are collections of functions, closures, and types. Because static analysis can approximate the running behavior of a program, that is, it can’t find the results of the program (because it can’t), but it can find all the possibilities of the results of the program.
4. I also found a seemingly serious Bug in Pysonar2, but it was really just a design glitch. Finally, I will tell you how to fix this Bug. Examples of the following programs:
The fib function is derived from the type int -> int, but the function does not know the type of the argument n (because this function is not called), and the result is different every time n occurs. . This is because Pysonar2 treats each local variable independently,The correct approach is to do a semantic analysis across the AST prior to analysis, associating each variable’s access to its definition so that the analysis results of the same variable in different locations can be naturally propagated to each other.