Plan & edit | Natalie
Solicitation | Natalie, Vincent
AI Front introduction: Yes, you read that right. Two of the most popular programming languages in data science, Python and R, are joining forces. In what may turn out to be the most ambitious crossover event of the year, Ursa Labs announced a collaboration meant to make it easier for data scientists who use different programming languages to work together, and to reduce duplicated effort among the developers of those languages. One netizen commented that "this is really a historic moment", while another joked that "the two slowest languages are working together". What do you think?







Ladies and gentlemen, the two heavyweights in the contest for best data analysis programming language, Python and R, have announced that they are making peace!

If you’re in the field of data analysis, you’ve probably encountered or heard of this dilemma: which language is better for data analysis, R or Python?

The “best programming language” debate is a long-running one, and data science is no exception. Questions like “I want to learn machine learning, which programming language should I use?” or “I want to solve problems quickly, should I use R or Python?” pop up all the time. Although both languages are currently leaders in the data analysis community, they remain mired in a debate over which is the preferred programming language for data scientists.

But from now on, folks, you can forget about it.

The most powerful programming languages in data science

Before we talk about the collaboration, let’s take a look at where the two languages came from and how they earned their places in the programming world.

Origins

Ross Ihaka and Robert Gentleman created the open source language R, based on the S language, in 1995 to provide a better and more user-friendly way to do data analysis, statistics, and graphical modeling.

Initially R was primarily used in academia and research, but more recently industry has also discovered its benefits, making it one of the world’s fastest growing statistical languages used in business.

Python, on the other hand, was created by Guido van Rossum in 1991 with an emphasis on productivity and code readability. In data science, its main users are programmers who want to do in-depth data analysis or apply statistical techniques.

The more you need to work in an engineering environment, the more you’ll like Python. It is a flexible language that adapts well to new tasks, values readability and simplicity, and has a gentle learning curve.

Rivalry

How did these two programming languages come to be rivals? Let’s start with machine learning and data analysis.

The differences between machine learning and data analysis are a little hard to pin down, but the main one is that machine learning places more emphasis on prediction accuracy than on model interpretability, whereas data analysis emphasizes interpretability and statistical inference.

As a result, Python is a great tool for machine learning because of its emphasis on predicting results accurately. As a statistical inference-oriented programming language, R has been widely used in data analysis.

However, these are tendencies in where each language is used, not limits on what it can do. In fact, Python can also do data analysis efficiently, and R has some flexibility in machine learning. Each has a number of libraries that cover the other’s specialty. Python, for example, has many libraries that improve its statistical inference ability, and R has many packages that improve prediction accuracy.

Because of this overlap, the online debate over whether Python or R is better has never been settled, and in both academia and industry the two have been chasing each other in usage.

However, with the popularity of artificial intelligence in recent years, more and more schools have introduced Python programming courses, and companies moving into artificial intelligence have increased their investment in Python. According to Stack Overflow’s 2018 developer survey, Python has risen in the rankings, passing C# this year just as it passed PHP last year. Python keeps its claim to being “the fastest growing major programming language”, while R is far behind.

The popularity gap seems to be getting bigger and bigger, and some people jokingly describe the two languages in terms of internet celebrities, with Python as the star of the season. But some tech bigwigs were fed up with the debate over which programming language is best, and came up with an idea: why not let them work together?

Who initiated the collaboration?

Tech gurus Hadley Wickham and Wes McKinney are behind the collaboration.

Hadley Wickham is arguably the most important developer in the R ecosystem for data science, and Wes McKinney is one of the most important developers in the Python ecosystem.

Wes McKinney founded the pandas project in 2008. pandas is an open source, BSD-licensed library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language (yes, the library you use all the time in Python). In addition, he wrote Python for Data Analysis, which helped popularize the use of Python in data science. He is a member of the Apache Software Foundation, a PMC member of Apache Arrow and Apache Parquet, and a former CEO and co-founder of DataPad.

Hadley Wickham is the creator of many of the R packages most widely used in data science, such as ggplot2 and dplyr. He has written several books on the R language, such as R for Data Science and Advanced R. Hadley is chief scientist at RStudio and a technical advisor to Ursa Labs on R language support and on general API design and usability.

Last month, McKinney announced the creation of Ursa Labs, an innovation group dedicated to improving data science tools. The Python/R effort is a partnership between Ursa Labs and RStudio, the company where Wickham works, which currently maintains R’s most popular user interface. The main goal of Ursa Labs is to make it easier for data scientists using different programming languages to collaborate, and to spare developers of different languages from redoing the same work. In addition to improving R and Python, they hope their work will improve the user experience of other open source programming languages, such as Java and Julia.

Both Python and R are free to use and are often considered competitors in data science. But Wickham and McKinney agree that this kind of competition is unnecessary. In fact, they think they can work together to make the two programming languages more useful to millions of users.

Python and R go hand-in-hand

Python and R are essential tools for data scientists, researchers, and data journalists working at tech companies such as Google and Facebook, among others. One of the biggest headaches for these users is that collaboration is often difficult when colleagues use a different programming language. Ursa Labs aims to make it easier to share data and code with users of other data science languages by creating new standards that work across programming languages. Developers describe the move as an improvement in “interoperability.” Wickham and McKinney have already worked together to create a file format that works in both Python and R.

Wickham and McKinney say that in addition to making collaboration easier, there was another reason they embarked on this collaborative project: developers using different programming languages repeatedly solve the same problem without sharing the lessons they learned.

Wickham gives an example: every programming language has to let its users calculate an average. For the user this is very simple and takes only one line of code in both Python and R. But finding the best way to perform the calculation behind that single line is a tricky problem for language developers. The developers of both R and Python tend to solve it in the lower-level languages C and C++, which are fast but difficult for the average user to work with. Wickham argues that, ideally, if a developer of one programming language has found the best way to do something, that method should be made available to all other languages. This is the core mission of Ursa Labs.
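To make the point about averages concrete, here is a minimal Python sketch. It is our own illustration, not Wickham’s example or the actual implementation inside Python or R. The function names `naive_mean` and `careful_mean` and the sample data are hypothetical, and floating-point rounding is only one of the issues an implementer has to think about. The sketch contrasts the one-line call a user writes with two ways a developer might implement it underneath.

```python
import math
import statistics


def naive_mean(values):
    """Average computed with a plain running sum (the 'obvious' implementation)."""
    total = 0.0
    for v in values:
        total += v  # each addition can round away terms that are tiny by comparison
    return total / len(values)


def careful_mean(values):
    """Average computed with math.fsum, which compensates for rounding error."""
    return math.fsum(values) / len(values)


# Hypothetical data chosen to expose the rounding problem:
# 1.0 is far too small to register when added to 1e20 in floating point.
data = [1e20, 1.0, -1e20]

one_liner = statistics.mean(data)  # what the user types: a single call
naive = naive_mean(data)           # the 1.0 is lost along the way
careful = careful_mean(data)       # the 1.0 survives the summation

print("statistics.mean:", one_liner)
print("naive_mean:     ", naive)
print("careful_mean:   ", careful)
```

The user-facing call is the same single line either way. The difference lies entirely in the work done underneath, which is the kind of solution that, in Wickham’s view, should only have to be found once and then shared across languages.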

Wickham and McKinney add that, beyond addressing technical issues, the collaboration also aims to bring the different programming language communities together peacefully. They argue that the more the users of these languages collaborate, the better the field of data science will develop.

“I hope this collaboration ends the pointless battle between R and Python,” Wickham says, “because both languages are great.”

References:

http://wesmckinney.com/blog/announcing-ursalabs/

https://qz.com/1270139/r-and-python-are-joining-forces-in-the-most-ambitious-crossover-event-of-the-year-for-programmers/