R or Python?

What a conundrum!

If you specialize in data analysis, statistical modeling, and visualization, R is probably the right choice for you. But if you want to do deep learning or natural language processing, you really need Python.

If your work sits at the intersection of the two, you’ll probably need to switch between the languages. As a result, it is not uncommon to write a for loop in the wrong language’s syntax. Help!

You’re not the only one facing this dilemma! The most recent KDnuggets Analytics software poll ranked Python and R as the top two tools for data science and machine learning.

If you really want to improve your data science skills, you should learn both.

But here’s the good news!

RStudio has developed a package called reticulate. Once it is installed, you can run Python packages and functions from R.

Today’s article will teach you how to use the reticulate package.

Install and load the Reticulate package

Run the following commands to install the package and load it into your R session.

# Install the reticulate package
install.packages("reticulate")

# Load the reticulate package
library(reticulate)

Check if Python is installed on your system

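py_available()

The return value is TRUE or FALSE. If TRUE is returned, congratulations, your system already has Python. If FALSE, you need to install Python first. Note that by default py_available() does not initialize Python, so it can return FALSE in a fresh session even when Python is installed; py_available(initialize = TRUE) attempts to initialize it first.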

Import a Python module in R

You can use the import() function to import specific packages or modules.

os <- import("os")
os$getcwd()

The command above returns the current working directory.

[1] "C:\\Users\\DELL\\Documents"

You can use the listdir() function in the os module to see all the files in the working directory.

os$listdir()
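For comparison, here is a minimal sketch of the same two calls in plain Python. The `$` operator in reticulate plays the role of Python's `.` attribute access, so `os$getcwd()` in R corresponds to `os.getcwd()` below. The variable names `cwd` and `entries` are only illustrative.

```python
import os

# Counterpart of os$getcwd() in R: the current working directory as a string.
cwd = os.getcwd()
print(cwd)

# Counterpart of os$listdir() in R: with no argument, listdir() lists
# the entries of the current working directory.
entries = os.listdir()
print(entries)
```

The printed values depend on where the session was started, so they will not match the Windows path shown above.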



Install Python package

Step 1: Create a new conda environment;

conda_create("r-reticulate")

Step 2: Install the “numpy” package into the “r-reticulate” conda environment;

conda_install("r-reticulate", "numpy")

If NumPy is already installed, you do not have to install it again. The above code is just an example.

Step 3: Load the package.

numpy <- import("numpy")

Use numpy arrays

Start by creating a simple NumPy array

y <- array(1:4, c(2, 2))
x <- numpy$array(y)
x

     [,1] [,2]
[1,]    1    3
[2,]    2    4

Transpose the array

numpy$transpose(x)

     [,1] [,2]
[1,]    1    2
[2,]    3    4
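The first output may look surprising if you expect 1 and 2 on the first row. R fills arrays column by column (column-major order), whereas a nested Python list is usually written row by row. The pure-Python sketch below uses no NumPy, and its helper names are hypothetical; it simply reproduces the fill order and the transpose shown above.

```python
def fill_column_major(values, nrow, ncol):
    """Arrange a flat sequence into rows the way R's array() fills a matrix."""
    # Element (i, j) comes from position j * nrow + i of the flat input.
    return [[values[j * nrow + i] for j in range(ncol)] for i in range(nrow)]


def transpose(matrix):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(column) for column in zip(*matrix)]


x = fill_column_major([1, 2, 3, 4], nrow=2, ncol=2)
print(x)             # [[1, 3], [2, 4]]
print(transpose(x))  # [[1, 2], [3, 4]]
```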

Find the eigenvalues and eigenvectors

numpy$linalg$eig(x)

Some mathematical functions

numpy$sqrt(x)

numpy$exp(x)

Use Python interactively

You can create an interactive Python console in R. Objects you create in Python can be used in R (and vice versa). You can make Python and R interact by using the repl_python() function. First, download the dataset (air.xlsx) used in the following program:

repl_python()

# Load the pandas package
import pandas as pd

# Import the dataset
travel = pd.read_excel("air.xlsx")

# Show the number of rows and columns
travel.shape

# Select 10 random rows
travel.sample(n = 10)

# Mean of AIR grouped by year
travel.groupby("Year").AIR.mean()

# Filter rows and store the result in t
t = travel.loc[(travel.Month >= 6) & (travel.Year >= 1955), :]

# Return to R
exit

Note: You need to type “exit” to return to the R session
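If you do not have the spreadsheet at hand, the group-by and filter steps can still be tried out in plain Python. The sketch below uses only the standard library instead of pandas, and the four records are hypothetical values made up for illustration; they are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records standing in for rows of the spreadsheet;
# the numbers are made up for illustration only.
rows = [
    {"Year": 1954, "Month": 7, "AIR": 100},
    {"Year": 1955, "Month": 5, "AIR": 110},
    {"Year": 1955, "Month": 6, "AIR": 130},
    {"Year": 1956, "Month": 8, "AIR": 150},
]

# Counterpart of the group-by step: average AIR per year.
air_by_year = defaultdict(list)
for row in rows:
    air_by_year[row["Year"]].append(row["AIR"])
mean_air = {year: mean(values) for year, values in air_by_year.items()}
print(mean_air)

# Counterpart of the filter step: keep rows from June onward, 1955 or later.
t = [row for row in rows if row["Month"] >= 6 and row["Year"] >= 1955]
print(t)
```

Both conditions must hold for a row to be kept, which is what the `&` between the two comparisons expresses in the pandas version.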

How do I get objects created in Python from R

You can use the py object to access objects created in Python.

summary(py$t)

In this case, I use R’s summary() function to access the dataset t created in Python. In addition, you can use the ggplot2 package to draw line charts.

library(ggplot2)
ggplot(py$t, aes(AIR, Year)) + geom_line()

How do I get objects created in R from Python

You can use the r object to do this.

Create an object in R:

mydata = head(cars, n=15)

Call the object previously created in R in the Python REPL:

repl_python()

import pandas as pd

r.mydata.describe()

pd.isnull(r.mydata.speed)

exit

Use the sklearn package to build a logistic regression model

The sklearn package is one of the most popular machine learning packages in Python, supporting a variety of statistical and machine learning algorithms.

repl_python()

# Load packages
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression

# Load the iris dataset
iris = datasets.load_iris()

# Fit the logistic regression model
model = LogisticRegression()
model.fit(iris.data, iris.target)

# Score the training data
actual = iris.target
predicted = model.predict(iris.data)

# Model performance: classification report and confusion matrix
print(metrics.classification_report(actual, predicted))
print(metrics.confusion_matrix(actual, predicted))
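To make the confusion matrix less of a black box, here is a minimal pure-Python sketch of how such a matrix is tallied. It follows the convention that rows are actual classes and columns are predicted classes. The function and the toy labels are hypothetical, and this is not sklearn's implementation.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Count how often each actual class (row) was predicted as each class (column)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Toy labels for three classes, made up for illustration.
actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]

cm = confusion_matrix(actual, predicted, n_classes=3)
for row in cm:
    print(row)

# Accuracy is the share of counts that fall on the diagonal.
accuracy = sum(cm[i][i] for i in range(3)) / len(actual)
print(accuracy)
```

Off-diagonal counts are the misclassifications: here one observation of class 1 was predicted as class 2.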

Other useful functions

Check the python configuration

Run the py_config() command to see which version of Python is being used on your system. It also displays details about Anaconda and NumPy.

py_config()
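When what R reports does not match what you expect, it can help to ask Python itself which interpreter is running. A minimal sketch using only the standard library, which you could type in the Python REPL; the values reported inside an embedded session may differ from what a standalone interpreter on the command line shows.

```python
import sys

# Path of the interpreter executable that is actually running.
print(sys.executable)

# Major and minor version numbers of the running interpreter.
major, minor = sys.version_info[:2]
print(f"Python {major}.{minor}")

# Root directory of the installation or environment in use.
print(sys.prefix)
```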




Check whether a package is installed

To check whether pandas is installed, run the following command:

py_module_available("pandas")
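A similar check can be made from the Python side. The sketch below is an assumption about how one might do it with the standard library; `module_available` is a hypothetical helper, and reticulate's own implementation may work differently.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


print(module_available("os"))                     # True: part of the standard library
print(module_available("no_such_module_abc123"))  # False, assuming no such module exists
print(module_available("pandas"))                 # depends on your environment
```

The check only locates the module without importing it, so it is cheap to run before deciding whether to install a package.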


The original article was published on April 19, 2018

Author: Abstract bacteria

This article is from “Big Data Digest”, a partner of the cloud community. You can follow “Big Data Digest” for related information.