I read a blog post titled “Why We’re Going from X to Y” (I forget which programming languages it was about), and I started wondering: could we put together an N-by-N contingency table of people moving from X to Y?

So I wrote a little script to run queries on Google, plus a little bit of code to get the number of results. I tried searching with several different phrasings, like “move from X to Y”, “switch to Y from X” and so on. You end up with an N-by-N contingency table covering all languages.
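As a rough sketch of how the N-by-N set of queries could be generated (the language list, the templates and the `build_queries` helper are illustrative assumptions, not the author's actual script):

```python
from itertools import permutations

# Hypothetical language list and query templates, for illustration only.
LANGUAGES = ["c", "c++", "go", "java", "python"]
TEMPLATES = ["move from {src} to {dst}", "switch to {dst} from {src}"]


def build_queries(languages, templates):
    """Return {(src, dst): [query, ...]} for every ordered pair with src != dst."""
    queries = {}
    for src, dst in permutations(languages, 2):
        # Wrap each phrase in double quotes so it is searched as an exact phrase.
        queries[(src, dst)] = ['"' + t.format(src=src, dst=dst) + '"'
                               for t in templates]
    return queries


queries = build_queries(LANGUAGES, TEMPLATES)
print(len(queries))  # one entry per ordered pair, i.e. N * (N - 1)
print(queries[("go", "c++")])
```

Summing the result counts of all templates for a pair would then give one cell of the table; the diagonal (X to X) is left out here.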



The table is very large, so here is how to read it:

  • The languages down the left side (the rows) are the source languages that people switch from;
  • The languages along the top (the columns) are the target languages that people switch to;
  • For example, there are 3619 results for switching from C to C#, and 37229 for switching from C# to C.

 


Here’s the fun part. We can treat the number of search results as proportional to the probability of switching between programming languages, and use that to draw some conclusions about the future popularity of programming languages. A key point is that the stationary distribution of this process does not depend on the initial distribution: it turns out to be just the first eigenvector of the transition matrix. So there’s no need to make assumptions about which programming languages are popular today; the predicted future stationary distribution is independent of the initial state.

We need to convert the table above into a transition matrix that describes the probability of going from state i to state j. This is very simple: to interpret the contingency matrix as transition probabilities, normalize each row so that it sums to 1. That gives a rough approximation of the probability of going from X to Y.
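Dividing each row by its sum can be sketched as follows. The counts are made-up numbers for three hypothetical languages, and `to_transition_matrix` is a name chosen for illustration:

```python
import numpy as np

# Made-up counts (rows: source language, columns: target language).
counts = np.array([
    [0.0, 30.0, 10.0],
    [20.0, 0.0, 20.0],
    [5.0, 15.0, 0.0],
])


def to_transition_matrix(counts):
    """Divide each row by its sum so that every row sums to 1."""
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        # A source language with no results has no defined transition row.
        raise ValueError("every source language needs at least one result")
    return counts / row_sums


P = to_transition_matrix(counts)
print(P)
```

Each row of `P` can then be read as "given that someone leaves this language, where do they go?".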

Finding the first eigenvector is trivial: we just multiply a vector by the transition matrix many times, and it converges to the first eigenvector. By the way, see the notes below for more discussion of how I do this.
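A minimal sketch of that repeated multiplication, assuming a row-stochastic matrix `P` (each row sums to 1) and a uniform starting vector. The matrix is made up and the function name, iteration cap and tolerance are arbitrary choices, not taken from the author's code:

```python
import numpy as np


def stationary_distribution(P, n_iter=1000, tol=1e-12):
    """Repeatedly apply v <- v P until v stops changing."""
    n = P.shape[0]
    v = np.full(n, 1.0 / n)  # uniform start; an arbitrary choice
    for _ in range(n_iter):
        v_next = v @ P
        v_next /= v_next.sum()  # guard against numerical drift
        if np.abs(v_next - v).sum() < tol:
            return v_next
        v = v_next
    return v


# Made-up 3-state transition matrix (each row sums to 1).
P = np.array([
    [0.0, 0.75, 0.25],
    [0.5, 0.0, 0.5],
    [0.25, 0.75, 0.0],
])
pi = stationary_distribution(P)
print(pi)
```

The vector is multiplied from the left (`v @ P`) because the rows of `P` are the source states; the result satisfies `pi @ P ≈ pi`.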

Go is the future of programming languages

Without further ado, here are the top languages in the stationary distribution:

16.41% Go
14.26% C
13.21% Java
11.51% C++
9.45% Python

I sorted the transition matrix by each language’s predicted future popularity (as given by the first eigenvector).
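One way to do this kind of reordering is to apply the same permutation to the rows and the columns. The sketch below uses hypothetical labels and made-up numbers; `pi` is a placeholder for the stationary distribution and is not computed from `P`:

```python
import numpy as np

# Hypothetical labels and made-up numbers, for illustration only.
languages = ["A", "B", "C"]
P = np.array([
    [0.0, 0.6, 0.4],
    [0.7, 0.0, 0.3],
    [0.2, 0.8, 0.0],
])
pi = np.array([0.25, 0.45, 0.30])  # placeholder popularity scores

# Indices of the languages from most to least popular.
order = np.argsort(-pi)
sorted_languages = [languages[i] for i in order]

# Apply the same permutation to rows (sources) and columns (targets).
P_sorted = P[np.ix_(order, order)]

print(sorted_languages)
print(P_sorted)
```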


Surprisingly (to me at least), Go is the big winner. There are a lot of search results for people switching to Go from other languages. I’m not even sure how I feel about it (I have mixed feelings about Go), but my analysis points to the inevitable conclusion that Go deserves attention.

C turns 45 this year and is still doing well. I did some searches by hand, and many of the results were people writing that they had optimized specific tight loops by migrating them from another language to C. Is the result wrong? I don’t think so. C is the lingua franca of how computers work, so it is not surprising that people are still actively converting bits of code from other languages into C. Seriously, I think C will be even stronger by its 100th birthday in 2072. With my endorsements for C on LinkedIn, I’m hoping recruiters will offer me some C jobs in the 2050s (I take that back: hopefully C will outlive LinkedIn).

Other than that, the results are in line with my expectations. Java is here to stay, Perl is dead, Rust is doing pretty well.

By the way, this analysis reminds me of the following tweet

The graph in it is very interesting: it shows the rate at which people switch between R and Python for data analysis.

JavaScript frameworks

I did the same analysis for front-end frameworks.

I expected React to come out on top, but interestingly, Vue also did very well. I was surprised by how Angular fared: the results are consistent with the rumors of a mass exodus away from it.

Databases

I started looking at bike-sharing apps, deep learning frameworks and other things, but the data was sparse and unreliable. I’ll post an update if I get any results!

Notes

  • See Hacker News and /r/programming for discussion of this post;
  • This post gave me some insight into why I switched from programming language 1 to programming language 2.
  • Here’s how to scrape Google and get the number of results:

 

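```python
import re

import bs4
import requests


def get_n_results_dumb(q):
    # tbs=li:1 asks Google for verbatim matching of the query
    r = requests.get('http://www.google.com/search',
                     params={'q': q, 'tbs': 'li:1'})
    r.raise_for_status()
    soup = bs4.BeautifulSoup(r.text, 'html.parser')
    # The result count is shown in the div with id "resultStats"
    stats = soup.find('div', {'id': 'resultStats'})
    if stats is None or not stats.text:
        return 0
    # Take the first run of digits and commas, e.g. "37,229" -> 37229
    m = re.search(r'([0-9,]+)', stats.text)
    return int(m.groups()[0].replace(',', ''))
```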

 

  • Unfortunately, Google rate-limits queries by IP, but I ended up using Proxymesh to grab the data I needed for all N × N combinations.
  • Note: I put the exact query in double quotes when searching, such as “switch from go to c++”.
  • Careful readers may ask why JavaScript is not included in the analysis. The reasons are: (a) if you use it in front-end development, you’ll stick with it and there will be no transition, unless you’re crazy enough to do transpiling (compiling from one programming language to another), which is rare; (b) people refer to JavaScript on the back end as “Node”.
  • What about the diagonal elements? Of course, some people will stick with just one programming language. But I chose to ignore this because: (a) it turns out that 99% of search results for things like “stay with Swift” are about Taylor Swift; (b) the stationary distribution is unaffected by adding a constant multiple of the identity matrix to the diagonal; (c) it’s my blog, so I can do what I want [smirk].
  • For (b), e(αS + (1−α)I) = e(S) holds, where e(…) is the first eigenvector and I is the identity matrix. This assumption may not be entirely realistic, since people may not be equally likely to stick with every programming language.
  • The method of multiplying repeatedly to get the first eigenvector is called power iteration.
  • Can this eigenvector-based model be a super accurate description of reality? Probably not. A quote from George Box comes to mind: “All models are wrong, but some are useful”, meaning that no model describes reality completely accurately, but some models are still useful for solving problems. (George Box was a famous British statistician.)
  • I know there are other constraints that need to be considered, but that’s pretty much what happens.
  • The code can be found on GitHub.
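The identity in note (b) on the diagonal elements can be checked in one line. This assumes that the first eigenvector is the left eigenvector π of S with eigenvalue 1 (so πS = π) and that 0 < α ≤ 1; the symbol π is introduced here for illustration:

```latex
\pi \bigl( \alpha S + (1 - \alpha) I \bigr)
  = \alpha \, \pi S + (1 - \alpha) \, \pi I
  = \alpha \, \pi + (1 - \alpha) \, \pi
  = \pi
```

So π is also a stationary distribution of the mixed matrix, whatever the value of α.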