Python is among the most widely used and the fastest-growing programming languages in the world. Its elegant, concise syntax and strong third-party library support are among the reasons it has become popular across so many industries. What you may not know, however, is that there is more to the story than that.
To explain the real reason for Python’s popularity, let’s start with the rise of big data in recent years.
Frustrated big data programmers
As big data took off, most industries found themselves in a state of panic: they spent a lot of time and money building big data pipelines, but the return on that investment was low. In the never-ending race to scoop up ever more data, most companies had no clear plan for what to do with the data they collected. At the time, almost everyone assumed that once they had a large enough data store, analysis would become easier and the business value of the data would become obvious. It sounds silly today, but most people genuinely believed that with enough data, patterns and insights would simply emerge.
The “data scientist” the times called for
Then, almost all at once, the industry realized that the insights it wanted and the questions it wanted answered required rigorous mathematical analysis and validation. SQL queries can surface the most obvious patterns and trends in the data, but extracting the most useful information requires an entirely different set of techniques: a skill set firmly rooted in mathematics and applied mathematics. At the time, people with these skills seemed to exist only in academia. Moreover, the people responsible for analyzing these huge data sets needed not only a very strong mathematical background but also the ability to write software. That is why the job title “data scientist” started popping up so often on job boards.
Ruby vs Python: The “Web Development Language War”
Taking a step back: before big data really took off, Ruby and Python were locked in a fierce battle to become the most popular “Web development language.” Both are well suited to building Web applications. Ruby’s popularity was closely tied to the Rails framework; back then, most people who called themselves “Ruby programmers” could just as accurately have called themselves “Rails programmers.” Python, by contrast, was already well established in academia and in several industries. Its closest equivalent to Rails was Django, which lagged far behind Rails in popularity even though the two appeared at around the same time.
Many people believed that Python and Ruby were so similar that, in the end, only one of them would win the “Web development language war.” In reality, however, Ruby’s popularity was tied closely to Rails, while Django was only a small part of an already thriving Python ecosystem. The “Web development language war” also proved far less important than people expected. Even though Ruby, through Rails, won that battle in many ways, it is Python that is the most popular language today. Why is that?
Oliphant’s big move
To solve this mystery, we have to bring up one big name: Travis Oliphant. Let’s go all the way back to 2006. At the time, Oliphant was an assistant professor at BYU and had yet to start Anaconda, one of the most successful commercial data science platforms built entirely around Python. A year earlier, he had created NumPy, building on the earlier scientific computing library Numeric. He was also one of the original creators of SciPy and later served as a director of the Python Software Foundation (PSF).
In 2006, along with Carl Banks, he submitted PEP 3118, a revision of Python’s “buffer protocol.” This foreshadowed the rise of Python.
Python’s buffer protocol: the underlying reason Python became so popular around the world
The buffer protocol was (and still is) a very low-level API that other libraries use to manipulate memory buffers directly. These are buffers that Python objects use to store certain types of data in contiguous memory (initially, mainly “array-like” data structures whose element type and size are known in advance).
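To make the idea concrete, here is a minimal sketch using only the standard library. It assumes CPython, where `array.array` exports its storage through the buffer protocol and the built-in `memoryview` is a Python-level consumer of that protocol; the sample values are made up for illustration. The view reports the same metadata a C consumer would read from a `Py_buffer`: element format, item size, shape, strides, and whether the memory is contiguous.

```python
from array import array

# Illustrative data: a contiguous block of three C doubles.
samples = array("d", [1.5, 2.5, 3.5])

# memoryview consumes the buffer the array exports; nothing is copied.
view = memoryview(samples)

# The exporter describes its memory up front: element type and size,
# number of dimensions, shape, and the byte step between elements.
print(view.format)      # d  (struct-module code for a C double)
print(view.itemsize)    # bytes per element, from the platform's C double
print(view.ndim, view.shape, view.strides)
print(view.contiguous)  # True: the elements sit back to back in memory
print(view.nbytes == len(samples) * samples.itemsize)  # True
```

Because the type and size of every element are declared in advance, a C library can walk this memory with plain pointer arithmetic instead of inspecting Python objects one by one.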
The main motivations for such an API were to eliminate copying when data only needs to be read, to clarify who owns a buffer when it is shared, and to keep data in contiguous memory (even for multidimensional data structures), where read access is very fast. The “other libraries” that use the API are generally written in C and are performance-sensitive. With the new protocol, if I create a NumPy int array, other libraries can access its underlying memory buffer directly, rather than going through indirection or copying the data before using it.
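The zero-copy and ownership points can be sketched the same way. This is again a standard-library stand-in, assuming a recent CPython: an `array.array` plays the role of the NumPy int array and `memoryview` plays the role of the consuming library. It shows that writes through a view land in the original object, that slices are views rather than copies, that the same bytes can be reinterpreted as a 2 x 3 matrix, and that an object refuses to resize while a view of its memory is still exported.

```python
from array import array

# Six C ints in one contiguous block, standing in for a NumPy int array.
data = array("i", range(6))

# The view shares the array's memory; no bytes are copied.
view = memoryview(data)
view[0] = 100          # writing through the view changes `data` itself

# Slicing a memoryview gives another view onto the same memory.
tail = view[3:]
tail[0] = -1           # this element is data[3]

# Reinterpret the same six ints as a 2 x 3 C-contiguous matrix.
# CPython only casts to or from a byte format, so go through "B" first.
matrix = view.cast("B").cast("i", [2, 3])

print(data.tolist())    # [100, 1, 2, -1, 4, 5]
print(matrix.shape)     # (2, 3)
print(matrix.tolist())  # [[100, 1, 2], [-1, 4, 5]]

# Ownership: while a view is exported, the owner may not resize its memory,
# because that could leave the view pointing at freed storage.
buf = bytearray(b"abc")
with memoryview(buf) as m:
    try:
        buf.append(ord("d"))
        resized_while_exported = True
    except BufferError:
        resized_while_exported = False

buf.append(ord("d"))    # allowed once the view has been released
print(resized_while_exported, bytes(buf))  # False b'abcd'
```

The `with` block matters: leaving it calls `release()` on the view, which is how a consumer explicitly hands the buffer back to its owner.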
So the question is: what kind of programmer would benefit from fast, zero-copy access to data?
Data scientists, of course!
Let’s go over the course of events:
- Oliphant and Banks proposed a revision of Python’s buffer protocol, driven by the early NumPy work, to simplify direct access to the underlying memory of certain data structures.
- PEP 3118 (https://www.python.org/dev/peps/pep-3118/) was submitted, approved, and implemented.
- Thanks to PEP 3118, Python quietly became a very attractive language for writing compiled extensions. Many numerical computing libraries built as C extensions were developed on this foundation (note: C extensions can easily share and manipulate data through the protocol).
- Python and Ruby were slugging it out on the Web, and most people thought the “Web development language war” would soon be decided.
- As the price of magnetic storage plummeted, it became feasible to store vast amounts of data for later analysis (storage became so cheap that it made sense to keep everything first, without even thinking about what to analyze).
- The demand for programmers changed: people with a background in statistics, preferably applied mathematics, and some prior programming experience were snapped up. The age of the data scientist had arrived!
- Data scientists were looking for a language that was both expressive and fast (with good numerical libraries to boot), and all of these requirements pointed to Python.
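The PEP 3118 point about C extensions sharing data can be sketched with `ctypes`, used here purely as a stand-in for a real C extension (an assumption for illustration; libraries such as NumPy use the C-level buffer API directly). `from_buffer()` overlays a C `int` array type on memory that a Python `array.array` owns, so both sides read and write the same bytes at the same address.

```python
import ctypes
from array import array

# Memory owned by Python: six C ints in one contiguous block.
data = array("i", range(6))

# Stand-in for a C extension: overlay a C "int[6]" type onto the same bytes.
# from_buffer() obtains the memory through the buffer protocol; nothing is copied.
IntArray6 = ctypes.c_int * len(data)
c_view = IntArray6.from_buffer(data)

# Both sides see a single block of memory at a single address.
same_address = ctypes.addressof(c_view) == data.buffer_info()[0]

# A write from the "C side" is immediately visible on the Python side.
c_view[2] = 42

print(same_address)    # True
print(data.tolist())   # [0, 1, 42, 3, 4, 5]
```

This is the kind of sharing that made Python attractive to numerical library authors: Python code owns and passes around the data, while compiled code does the heavy lifting on the very same memory.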
Later, as we have seen, Python became the most popular programming language.
By Jeff Knupp
Wu Lei, Huo Jing
Source: 51CTO