An article this week accusing Vue of copying Angular caused a stir. In response, the author, a neutral React user, analyzed the naming of tens of thousands of variables across 13 versions of mainstream front-end frameworks and applied a text similarity algorithm from natural language processing to evaluate the claim objectively.

Approach

Comparing text similarity algorithmically is an effective technique for detecting plagiarism in books and duplication in papers. So how can we use it to analyze plagiarism at the source-code level?

When comparing the similarity of two sentences such as “I like to write code and don’t like to quarrel” and “I don’t like to quarrel and like to write code”, the general idea is to split each sentence into words, count the word frequencies, turn the frequencies into vectors, and finally compare the angle between the two high-dimensional vectors. The smaller the angle, the more similar the sentences.

For the question of whether Vue copies Angular, the object of analysis changes from sentences to code. There are two main differences:
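As a minimal sketch of those four steps (split, count, vectorize, compare), the snippet below runs them on the two example sentences. The helper names `tokenize`, `wordFrequency`, and `cosine` are illustrative assumptions, and the tokenizer is deliberately naive.

```javascript
// Split a sentence into lowercase word tokens (a naive tokenizer for this sketch).
function tokenize (sentence) {
  return sentence.toLowerCase().match(/[a-z']+/g) || []
}

// Count how often each token occurs. A Map avoids clashes with
// inherited object keys such as "constructor".
function wordFrequency (tokens) {
  const freq = new Map()
  for (const token of tokens) freq.set(token, (freq.get(token) || 0) + 1)
  return freq
}

// Cosine of the angle between two frequency vectors.
function cosine (freqA, freqB) {
  let dot = 0
  for (const [token, count] of freqA) dot += count * (freqB.get(token) || 0)
  const norm = freq => Math.sqrt([...freq.values()].reduce((sum, n) => sum + n * n, 0))
  return dot / (norm(freqA) * norm(freqB))
}

const s1 = wordFrequency(tokenize("I like to write code and don't like to quarrel"))
const s2 = wordFrequency(tokenize("I don't like to quarrel and like to write code"))
const s3 = wordFrequency(tokenize('Frameworks are just tools'))

console.log(cosine(s1, s2)) // same words in a different order: approximately 1
console.log(cosine(s1, s3)) // no shared words: 0
```

Because only word counts enter the vectors, word order has no effect on the result.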

  1. Code is highly structured text, and a lexical analyzer already does the word segmentation.
  2. Code in any programming language is full of that language’s keywords, such as var and function. Repetition of these keywords says nothing about similarity.
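To illustrate the second point, here is a small sketch with two made-up snippets that share no identifiers. The `KEYWORDS` list is deliberately partial and the regex tokenizer is a stand-in for a real lexer; both are assumptions for illustration only.

```javascript
// A deliberately partial list of JavaScript keywords, for illustration only.
const KEYWORDS = new Set(['var', 'let', 'const', 'function', 'return', 'if', 'else', 'for', 'new'])

// Naive tokenizer: every identifier-like word, keywords included.
function words (source) {
  return source.match(/[A-Za-z_$][\w$]*/g) || []
}

// Two hypothetical snippets with unrelated variable names.
const snippetA = 'var total = 0; function add (value) { var next = total + value; return next }'
const snippetB = 'var node = null; function mount (el) { var parent = el.parentNode; return parent }'

// Distinct words of `a` that also occur in `b`.
const shared = (a, b) => [...new Set(a)].filter(word => b.includes(word))
// Drop keywords from a word list.
const strip = list => list.filter(word => !KEYWORDS.has(word))

console.log(shared(words(snippetA), words(snippetB)))
// [ 'var', 'function', 'return' ]
console.log(shared(strip(words(snippetA)), strip(words(snippetB))))
// []
```

With keywords left in, the two unrelated snippets appear to overlap; once they are dropped, nothing is shared.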

As for the relationship between plagiarism and similarity, we make the following hypotheses:

  1. When the same open problem is solved independently, without plagiarism, the resulting code usually differs greatly in coding style, as reflected in its variable names, so the similarity is low.
  2. Code that has been plagiarized on a large scale resembles different versions of the same framework with no large-scale refactoring in between: the coding style is similar, so the similarity is high.
  3. Changing the order in which modules are declared does not affect the similarity.

Given these premises, we can determine the following analysis strategy:

  1. Take the unminified source code of each major framework as input and parse it into a syntax tree.
  2. Discard the irrelevant parts of the syntax tree and extract the variable declarations to represent its coding style.
  3. Use the text similarity algorithm to compute the similarity between the sets of variable names, then analyze the results.

Variable name extraction

When you pull in a framework dependency through Webpack, what you typically import is the framework source bundled into a single, unminified file. This is very convenient. I wrote a simple Webpack loader that extracts variable names during this step:

// loader/index.js
// The content passed to the loader is the JS source code
module.exports = function (content) {
  return demo(content)
}

Once the demo function has the framework source, parsing the syntax tree is not hard. With the acorn parser, we can do this:

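const fs = require('fs')
const { resolve } = require('path')
const acorn = require('acorn')
const walk = require('acorn-walk') // acorn/dist/walk in acorn 5 and earlier

function demo (content) {
  const ast = acorn.parse(content, { sourceType: 'module' })
  walk.simple(ast, {
    // During the walk, extract every variable name from each variable declaration statement
    VariableDeclaration (node) {
      for (const { id } of node.declarations) {
        // Skip destructuring patterns, which have no single name
        if (id.type !== 'Identifier') continue
        fs.appendFileSync(resolve('./result.txt'), id.name + '\n')
      }
    }
  })
  return content
}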

At this point, result.txt contains all the variable names in a front-end framework, like this:

p
i
resolved
c
segs
i
...

What on earth is all this… The text at this stage has had no preprocessing at all, and what we really care about is the frequency of each variable name in each framework. Counting word frequencies makes a good interview question, but here we simply use the Wordclouds service. This step also includes basic cleaning to remove meaningless variable names such as i/a/b. The result looks something like this:

29    value
19    arg
18    result
16    key
14    index
...
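The article delegates counting to Wordclouds, but the same step could be sketched locally as below. This is not the author’s code: the function name `countNames`, the `minLength` cutoff used as a stand-in for the cleaning rule, and the toy input are all assumptions.

```javascript
// Count names from newline-separated text (one extracted name per line),
// drop names shorter than `minLength` (an assumed stand-in for the
// "meaningless names such as i/a/b" cleaning rule), and return
// [name, count] pairs sorted by descending count.
function countNames (text, minLength = 2) {
  const freq = new Map()
  for (const line of text.split('\n')) {
    const name = line.trim()
    if (name.length < minLength) continue
    freq.set(name, (freq.get(name) || 0) + 1)
  }
  return [...freq.entries()].sort((a, b) => b[1] - a[1])
}

// Toy input for illustration, not real framework data.
const sample = ['p', 'i', 'resolved', 'c', 'segs', 'i', 'value', 'segs'].join('\n')

console.log(countNames(sample))
// [ [ 'segs', 2 ], [ 'resolved', 1 ], [ 'value', 1 ] ]
```

A length cutoff is the crudest possible filter; a stop list of specific names would be a closer match to manual cleaning.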

These are the top 5 variable names of one of React, Vue, and Angular. Guess which one? Okay, these really are bad variable names. For now, nothing stands out. Let’s move on to the similarity comparison and find out later.

Similarity algorithm

From the steps above, we now have objects like these:

const a = {
  'foo': 5, 'bar': 4, 'baz': 3
}
const b = {
  'foo': 4, 'bar': 6, 'baz': 0
}

We can treat each variable name as an independent dimension, so the set of all variable names that occur in a framework forms a vector in a high-dimensional space. Our problem then reduces to comparing the similarity of a and b. Ruan Yifeng’s introduction is quoted here:

We can think of them as two line segments in space, both starting from the origin ([0, 0, …]) and pointing in different directions. The two segments form an angle. If the angle is 0 degrees, the directions are the same and the segments coincide; if it is 90 degrees, they form a right angle and the directions are completely different; if it is 180 degrees, the directions are exactly opposite. Therefore, we can judge the similarity of the vectors by the size of the angle. The smaller the angle, the more similar they are.

[Figure: the angle θ between two vectors a and b]

Assuming vector a is [x1, y1] and vector b is [x2, y2], the law of cosines can be rewritten in the following form, where the computed cos θ represents the similarity:

$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$$

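Generalizing to n-dimensional vectors A = [A1, A2, …, An] and B = [B1, B2, …, Bn]:

$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$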

Simplified sample code for the algorithm:

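// dictA and dictB map variable names to frequencies; dictAll is their union.
// getY is assumed to return the Euclidean norm (vector length) of a dictionary.
function getTheta () {
  let x = 0
  // Dot product: only names present in both dictionaries contribute
  Object.keys(dictAll).forEach(key => {
    if (dictA[key] && dictB[key]) x += dictA[key] * dictB[key]
  })
  const yA = getY(dictA)
  const yB = getY(dictB)
  // cos θ = dot product / product of the two lengths
  const result = x / (yA * yB)
  console.log(result)
}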
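For readers who want something runnable, here is a self-contained sketch of the same calculation applied to the example objects a and b from earlier. The names `norm` and `cosineSimilarity` are assumptions, not the author’s helpers.

```javascript
// Euclidean norm of a name -> frequency object.
function norm (dict) {
  return Math.sqrt(Object.values(dict).reduce((sum, n) => sum + n * n, 0))
}

// Cosine similarity of two name -> frequency objects.
function cosineSimilarity (dictA, dictB) {
  let dot = 0
  for (const key of Object.keys(dictA)) {
    // hasOwnProperty guards against inherited keys such as "constructor"
    if (Object.prototype.hasOwnProperty.call(dictB, key)) dot += dictA[key] * dictB[key]
  }
  return dot / (norm(dictA) * norm(dictB))
}

const a = { foo: 5, bar: 4, baz: 3 }
const b = { foo: 4, bar: 6, baz: 0 }

// dot = 5*4 + 4*6 + 3*0 = 44, |a| = sqrt(50), |b| = sqrt(52)
console.log(cosineSimilarity(a, b)) // roughly 0.863
```

Names missing from one side simply contribute nothing to the dot product, so there is no need to build the union of keys first.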

Finally, run the analysis on the variable names from the previous step:

➜ node analyse [email protected] [email protected]
0.9436438155995188

Experimental results and summary

After all that groundwork, it is finally time to put the claim to the test. First, we check whether Vue fits the hypothesis that nearby versions are highly similar:

➜ node analyse [email protected] [email protected]
0.9436438155995188

As you can see, the latest Vue 2.4.2 is indeed highly similar to 2.4.1. Now compare Vue’s latest version with 2.0.0, the start of the same major version:

➜ node analyse [email protected] [email protected]
0.8838059164881868

The drop in similarity shows that the latest version has changed considerably since last year’s v2. Next, compare the v2 and v1 series:

➜ node analyse [email protected] [email protected]
0.5883193867742227

The similarity drops significantly, so v2 clearly was a rewrite. Finally, compare the latest version of Vue with the first version:

➜ node analyse [email protected] [email protected]
0.4590386014371645

This is the lowest similarity within the Vue family, at about 0.46. Now for the main event: comparing the latest version of Vue with Angular:

➜ node analyse [email protected] [email protected]
0.19322280449484375

A mere 0.19 similarity! Then again, Angular’s latest version has also been rewritten. We can compare the original Vue directly with the Angular 1.x series:

➜ node analyse [email protected] [email protected]
0.294527560626686

This is also significantly lower than any comparison within the Vue series! For a more meaningful comparison, let’s drag in React, the bystander next door (a package without a version number means the latest version):

➜ node analyse [email protected] react
0.27592736925848194

Comparing roughly 0.28 with 0.29 shows that even the earliest Vue (the one most similar to Angular) was only about as similar to Angular as Vue is to React today! To be fair, let’s let jQuery join in the fun:

➜ node analyse jquery [email protected]
0.2508302720623658

This is also below 0.3, so we can even draw a bold conclusion: Vue is about as similar to Angular as Angular is to jQuery! Nobody thinks jQuery and Angular are clones, right?

Of course, Vue and Angular do have similarities. The front-end world offers another such pair: jQuery vs Zepto. How similar are they?

➜ node analyse jquery zepto
0.25994377334635854

This is almost the same as Angular vs jQuery, which shows that even when design concepts are similar, independently written frameworks with different implementations have very low similarity. The same is true of Vue vs Angular.

Hmm, the case is already fairly solid. As a last comparison, how similar are independently written frameworks with completely different design concepts? Let’s bring in jQuery and React:

➜ node analyse jquery react
0.1007248324385447

The lowest similarity of all… So you can see why front-end developers from the jQuery era found React so hard to get used to 😂

So far, our conclusions are as follows:

  • Similarity between successive Vue releases is high.
  • Even the original Vue bears little resemblance to classic Angular.
  • The similarity between the latest Vue and the latest Angular is even lower, indicating that the two have diverged further as they evolved.
  • Even when design concepts are similar, independently written frameworks with different implementations have very low similarity.
  • React is very, very unlike jQuery (a digression).

Therefore, the argument that Vue copies Angular is not valid.

The experimental data for this article is hosted on GitHub. Interested readers are welcome to verify and improve these conclusions. In the end, frameworks are just tools, and fighting over them does the community no good. To quote our company’s boss: “First-rate people do things, second-rate people comment on things, and third-rate people comment on people.” We hope the time spent bickering can instead go to more practical things that raise the level of technology, the community atmosphere, and the average salary…