Why is Python Growing So Quickly?


By David Robinson

A recent analysis showed that Python is the fastest-growing major programming language, based on Stack Overflow question visits, and that it has become the most visited tag on Stack Overflow in high-income countries.

Why is Python growing so fast?

Python is used for a variety of purposes, from web development to data science to DevOps, so it’s worth understanding whether particular applications of Python have recently become more common. I’m a data scientist who uses R, so I’m naturally interested in how much of Python’s growth has come from my own field. In this article, I’ll take another look at Stack Overflow data to understand which types of Python development are growing and which kinds of companies and organizations use it most.

These analyses suggest two conclusions. First, the fastest-growing use of Python is in data science, machine learning, and academic research. This is especially visible in the growth of the pandas package, which is the fastest-growing Python-related tag on the site.

Second, as to which industries use Python, we find that it is visited more from a few industries: electronics, manufacturing, software, government, and especially universities. Python’s growth, however, is spread fairly evenly across industries. Taken together, this tells a story of data science and machine learning becoming more common in many types of companies, with Python becoming a common choice for that purpose.

As in the previous article, all of this analysis is limited to World Bank high-income countries.

Types of Python development

Python is used for a wide range of tasks, such as web development and data science. How can we disentangle Python’s recent growth across these fields?

For starters, we can examine the growth in traffic to tags that represent notable Python packages in each field. We can compare the web frameworks Django and Flask to the data science packages NumPy, Matplotlib, and pandas. (You can also use Stack Overflow Trends to compare the rate of questions asked, rather than questions visited.)

In Stack Overflow traffic from high-income countries, pandas is clearly the fastest-growing Python package:

It had barely been introduced in 2011, but now accounts for nearly 1% of Stack Overflow question views. Questions about NumPy and Matplotlib have also grown in their share of visits over time. By contrast, traffic to Django questions has remained fairly steady over this period, and while Flask is growing, its share remains small. This suggests that much of Python’s growth may be due to data science rather than web development.

However, this gives us only a partial picture, since it can measure only widely used Python-specific packages. Python is also popular among system administrators and DevOps engineers, who may visit Linux, Bash, and Docker questions as well as Python ones. Similarly, a lot of Python web development is done without Django or Flask, and such developers are likely to visit JavaScript, HTML, and CSS as “supporting” tags. We cannot simply measure the growth of tags like linux, bash, and javascript and assume it is associated with Python. Instead, we want to measure the tags that are visited alongside Python.

We will consider only visits during the summer of 2017 (June to August), which reduces the impact of students, focuses on recent traffic, and eases the computational challenge of summarizing traffic over long periods. We consider only registered users who visited at least 50 Stack Overflow questions during this period. We consider someone a Python user only if Python is the tag they visit most often and it accounts for at least 20% of the questions they visit.
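As a rough illustration, the definition of a Python user could be sketched as follows. This is only a minimal sketch: the function name, the input shape (a per-user count of visited questions plus a Counter of visits per tag), and the tie-handling are assumptions made for this example; only the two thresholds come from the text.

```python
from collections import Counter

# Thresholds taken from the text; everything else here is illustrative.
MIN_QUESTIONS = 50        # minimum questions visited during the period
MIN_PYTHON_SHARE = 0.20   # minimum share of visited questions tagged python


def is_python_user(questions_visited, tag_visits):
    """Return True if a user counts as a Python user.

    questions_visited: number of questions the user visited in the period.
    tag_visits: Counter mapping tag -> number of visited questions with that tag
                (hypothetical input format).
    """
    if questions_visited < MIN_QUESTIONS or not tag_visits:
        return False
    python_visits = tag_visits["python"]
    # Python must be the most visited tag (a tie for first is accepted here,
    # which is an assumption) ...
    if python_visits < max(tag_visits.values()):
        return False
    # ... and must account for at least 20% of the questions visited.
    return python_visits / questions_visited >= MIN_PYTHON_SHARE


example = Counter({"python": 40, "pandas": 30, "javascript": 10})
print(is_python_user(100, example))
```

A question can carry several tags, so the per-tag counts may sum to more than the number of questions visited; that is why the share is computed against the question count rather than the sum of tag counts.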

Which tags are often visited by people who tend to visit Python?

Pandas is by a wide margin the most visited tag among Python developers, which is not surprising given the growth we saw earlier. The second most visited tag among Python visitors is JavaScript, which probably represents the group of Python web developers (as does Django, a few slots lower). This confirms our suspicion that we should consider which tags are visited alongside Python, not just the growth of Python-related tags.

In the list, we can see other “clusters” of technologies. We can explore their relationships by considering which pairs of tags tend to be correlated: that is, whether Python users who visit one tag are disproportionately likely to visit the other. By keeping only tag pairs with a high Pearson correlation, we can display these relationships in a network graph (see here for more information on this type of visualization).
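To make the correlation filter concrete, here is a minimal sketch using only the standard library. It assumes each tag is represented as a list with one entry per Python user (1 if that user visited the tag, 0 otherwise); the toy data, the function names, and the 0.5 threshold are all made up for illustration and are not the values used in the analysis.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def correlated_pairs(tag_columns, threshold):
    """Return (tag_a, tag_b, r) for every tag pair with r >= threshold."""
    edges = []
    for a, b in combinations(sorted(tag_columns), 2):
        r = pearson(tag_columns[a], tag_columns[b])
        if r >= threshold:
            edges.append((a, b, r))
    return edges


# Made-up toy data: six Python users, 1 = visited the tag, 0 = did not.
visits = {
    "pandas": [1, 1, 1, 0, 0, 0],
    "numpy":  [1, 1, 0, 0, 0, 0],
    "django": [0, 0, 0, 1, 1, 0],
    "html":   [0, 0, 0, 1, 1, 1],
}
edges = correlated_pairs(visits, threshold=0.5)
for tag_a, tag_b, r in edges:
    print(f"{tag_a} -- {tag_b}: r = {r:.2f}")
```

Each surviving pair becomes an edge in the network graph, so tags that the same users tend to visit end up drawn close together.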

We can see several broad clusters of technologies that roughly outline the kinds of problems Python is typically used to solve. In the upper center, we see a cluster for data science and machine learning: it has pandas, NumPy, and Matplotlib at its center, and is closely linked to technologies like R, Keras, and TensorFlow. The cluster below it describes web development, with tags including JavaScript, HTML, CSS, Django, Flask, and jQuery. The other two clusters are systems administration/DevOps (centered on Linux and Bash) on the left, and data engineering (Spark, Hadoop, and Scala) on the right.

Growth by topic

We’ve seen that Python-related Stack Overflow traffic can generally be broken down into several topics. This allows us to examine which topics are responsible for most of Python’s growth in Stack Overflow visits.

Imagine that we are looking at a user’s history and see that Python is their most visited tag. How could we guess whether they are a web developer, a data scientist, a system administrator, or something else? We can look at their second most visited tag, then their third, and so on down the list of their most visited tags until we find one that is recognizable from one of the clusters above.

We therefore propose the following simple method for categorizing users into topics. For each user, we find which of the nine tags listed below they visit most often and use it to categorize them.

  • Data scientists: pandas, NumPy, or Matplotlib
  • Web developers: JavaScript, Django, or HTML
  • Sysadmin/DevOps: Linux, Bash, or Windows
  • None: none of the nine tags accounts for more than 5% of the user’s traffic

This is not very sophisticated, but it lets us quickly estimate how much each major category contributes to Python’s growth. We also tried a more rigorous approach using latent Dirichlet allocation and obtained qualitatively similar results.
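The tag-based rule described above can be sketched in a few lines. This is an illustrative version only: the lowercase tag names, the category labels, and the choice to measure a tag’s share against the sum of the user’s tag visits are assumptions for this example.

```python
from collections import Counter

# The nine tags from the list above, mapped to a category label.
TOPIC_TAGS = {
    "pandas": "data scientist",
    "numpy": "data scientist",
    "matplotlib": "data scientist",
    "javascript": "web developer",
    "django": "web developer",
    "html": "web developer",
    "linux": "sysadmin/devops",
    "bash": "sysadmin/devops",
    "windows": "sysadmin/devops",
}
MIN_SHARE = 0.05  # a tag must exceed 5% of the user's traffic to count


def categorize(tag_visits):
    """Assign a user to a category based on their most visited topic tag.

    tag_visits: Counter mapping tag -> visits (hypothetical input format).
    """
    total = sum(tag_visits.values())
    topic_visits = {t: n for t, n in tag_visits.items() if t in TOPIC_TAGS}
    if total == 0 or not topic_visits:
        return "none"
    best_tag = max(topic_visits, key=topic_visits.get)
    if topic_visits[best_tag] / total <= MIN_SHARE:
        return "none"
    return TOPIC_TAGS[best_tag]


print(categorize(Counter({"python": 60, "pandas": 20, "javascript": 10})))
print(categorize(Counter({"python": 97, "linux": 3})))
```

Because only the single most visited of the nine tags matters, a user who visits both pandas and JavaScript questions lands in exactly one category, which keeps the category shares from overlapping.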

Which categories of Python developers have become more common over time? Note that since we are categorizing users rather than question visits, we show each category as a percentage of all registered Stack Overflow visitors (whether or not they visit Python).

We can see that over the past three years, the share of Python visitors who use web technologies or do system administration has grown at a slow to moderate rate among all Stack Overflow visitors. But the share of Python developers visiting data science technologies has grown very rapidly. This suggests that Python’s popularity in data science and machine learning may be a major driver of its rapid growth.

We can also consider growth at the level of individual tags, by calculating what fraction of Python developers’ traffic went to each tag in 2016 and in 2017. For example, JavaScript traffic might be stable overall while shrinking as a percentage of visits from Python developers. Once we have these growth rates for each tag, we can show them on our network to understand which topics are growing and which are shrinking.
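A small sketch of this per-tag calculation, assuming we already have the number of visits by Python developers to each tag in each year. The function name and the visit counts are hypothetical and chosen only to show how a tag with flat traffic can still lose share.

```python
def tag_share_growth(visits_before, visits_after):
    """Ratio of each tag's share of traffic in the later year to the earlier year.

    Both arguments map tag -> visits by Python developers (hypothetical inputs).
    A result above 1 means the tag makes up a larger part of the traffic.
    """
    total_before = sum(visits_before.values())
    total_after = sum(visits_after.values())
    growth = {}
    for tag in visits_before.keys() & visits_after.keys():
        if visits_before[tag] == 0:
            continue  # growth is undefined for a tag with no earlier traffic
        share_before = visits_before[tag] / total_before
        share_after = visits_after[tag] / total_after
        growth[tag] = share_after / share_before
    return growth


# Made-up counts: JavaScript visits are flat, pandas visits double.
visits_2016 = {"pandas": 100, "javascript": 200, "django": 100}
visits_2017 = {"pandas": 200, "javascript": 200, "django": 100}

growth = tag_share_growth(visits_2016, visits_2017)
for tag, ratio in sorted(growth.items()):
    print(f"{tag}: {ratio:.2f}x")
```

In this toy data JavaScript receives the same number of visits in both years, yet its share falls from 50% to 40% because the total grew, which is the situation the paragraph above describes.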

This helps confirm our suspicion that much of Python’s growth is related to data science and machine learning. Most of that cluster has shifted toward orange, which means that those tags are starting to make up a larger part of the Python ecosystem.

Industry

Another way to understand the growth of Python is to consider what types of companies it is visited from. This is a separate question from what type of developer is visiting: both retail companies and media companies can hire data scientists or web developers.

We’ll focus on two countries where Python has shown notable growth: the United States and the United Kingdom. In these countries, we are able to break down traffic by industry (just as we did when comparing AWS to Azure).

The industry with the most Python traffic (and substantial growth) is academia, which is made up of universities. Is this because Python is often taught in undergraduate programming classes?

Partly, but not entirely. As we saw in the previous article, Python traffic from universities is common in the summer, not just in the fall and spring. For example, Python and Java are the two most visited tags from universities, and we can see differences in their seasonal trends.

As a share of traffic, Java drops more sharply each summer, because Java is a more common subject in undergraduate courses. (We’ll explore the programming languages most commonly used at universities in future posts.) Python, by contrast, takes up a larger share of traffic each summer. Thus, the high volume of Python traffic from universities is partly attributable to academic researchers, who typically work throughout the year. This provides more evidence that Python’s growth is due to its scientific computing and data analysis capabilities.

As for other industries, we have already seen that Python is popular and growing rapidly in government, but we can also see that it is widely used in electronics and manufacturing. I’m not familiar with these industries and would be interested to know why. The language is not yet as heavily adopted by retail or insurance companies (some investigation suggests that Java remains dominant there).

This article is investigating Python’s growth. Is Python traffic growing faster in some industries than in others?

Python’s growth over the past year has been spread quite evenly across industries, at least in the United States and the United Kingdom. In each industry, Python traffic increased by about 2-3 absolute percentage points. (Note that this means the relative growth was greater in industries where Python had been less common, such as insurance and retail.)

In many of these industries, Java is still the most visited tag based on traffic so far in 2017, but Python has been gaining ground. For example, in finance, one of the largest contributors to Stack Overflow traffic among these industries, Python rose from the fourth most visited tag in 2016 to the second in 2017.
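To see why the same absolute increase implies larger relative growth where Python started out less common, consider this small worked example. The starting shares below are hypothetical, picked only to make the arithmetic easy; they are not figures from the analysis.

```python
def growth(before_pct, after_pct):
    """Return (absolute growth in percentage points, relative growth as a fraction)."""
    absolute = after_pct - before_pct
    relative = absolute / before_pct
    return absolute, relative


# Hypothetical shares of an industry's traffic that goes to Python questions.
common = growth(10.0, 12.5)    # an industry where Python was already common
uncommon = growth(4.0, 6.5)    # an industry where Python was less common

print(common)    # (2.5, 0.25)
print(uncommon)  # (2.5, 0.625)
```

Both hypothetical industries gain 2.5 percentage points, but that is a 25% relative increase in the first case and a 62.5% relative increase in the second.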

Conclusion

As a data scientist who used to work in Python but now works in R, should this push me back to Python?

I don’t think so. For one thing, R has been growing rapidly too, and as we saw in the last article, it is the fastest-growing major programming language after Python. Second, the reasons I prefer R for data analysis have nothing to do with its relative popularity. (I plan to write a personal blog post about my own journey from Python to R, my love for both languages, and why I don’t feel compelled to switch back.)

In any case, data science is an exciting and growing field, and there is plenty of room for multiple languages to thrive. My main takeaway is to encourage developers to think about building data science skills early in their careers. We see here that it is one of the fastest growing components in the software development ecosystem and is becoming relevant across many industries.

If you work in Python, whether in web development, data science, or another field, and are looking for the next step in your career, some companies are hiring Python developers on Stack Overflow Jobs right now.