R or Python?
What a conundrum!
If you specialize in data analysis, statistical modeling, and visualization, R is probably the right choice for you. But if you want to do deep learning or natural language processing, you really need Python.
If your work sits at the intersection, you’ll probably need to switch between the two languages. As a result, writing a for loop with a bug is not uncommon. Call the police!
You’re not the only one facing this dilemma! The most recent KDnuggets Analytics software survey ranked Python and R as the top two tools for data science and machine learning.
If you really want to improve your data science skills, you should learn both.
But here’s the good news!
RStudio has developed a package called reticulate. With it installed, you can run Python packages and functions from R.
Today’s guide to the reticulate package will show you how to use it.
Install and load the reticulate package
Run the following commands to install the package and load it into your session.
install.packages("reticulate")
library(reticulate)
Check if Python is installed on your system
py_available()
The return value is TRUE or FALSE. If TRUE is returned, congratulations, your system already has Python. If FALSE, you need to install Python first.
Import a Python module in R
You can use the import() function to import a specific package or module.
os <- import("os")
os$getcwd()
The command above returns the working directory.
[1] "C:\\Users\\DELL\\Documents"
You can use the listdir() function in the os module to see all the files in the working directory.
os$listdir()
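For comparison, here is a minimal sketch of what those two calls look like on the Python side, since reticulate simply forwards them to Python's standard os module. The variable names cwd and entries are only illustrative.

```python
import os

# The same functions that os$getcwd() and os$listdir() reach from R
cwd = os.getcwd()
entries = os.listdir(cwd)  # with no argument, listdir() uses the current directory

print(cwd)
print(sorted(entries)[:5])  # show a few names, sorted for a stable display
```

The output depends on where the script is run, so none is shown here.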
Install Python packages
Step 1: Create a new conda environment;
conda_create("r-reticulate")
Step 2: Install numpy into the “r-reticulate” conda environment;
conda_install("r-reticulate", "numpy")
If numpy is already installed, you do not have to install the package again. The above code is just an example.
Step 3: Load the package.
numpy <- import("numpy")
Use NumPy arrays
Start by creating a simple NumPy array
y <- array(1:4, c(2, 2))
x <- numpy$array(y)
x
     [,1] [,2]
[1,]    1    3
[2,]    2    4
Transpose the array
numpy$transpose(x)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
Find the eigenvalues and eigenvectors
numpy$linalg$eig(x)
Some mathematical functions
numpy$sqrt(x)
numpy$exp(x)
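If you want to see what these calls compute without going through R, the sketch below runs the same NumPy functions directly in Python. It assumes NumPy is installed. The matrix is written out by hand to match the R array above, which is filled column by column.

```python
import numpy as np

# R's array(1:4, c(2, 2)) fills by column, so the rows are [1, 3] and [2, 4]
x = np.array([[1, 3], [2, 4]])

transposed = np.transpose(x)                   # rows become [1, 2] and [3, 4]
eigenvalues, eigenvectors = np.linalg.eig(x)   # one eigenvector per column
roots = np.sqrt(x)                             # element-wise square root
exponentials = np.exp(x)                       # element-wise e**x

print(transposed)
print(eigenvalues)
print(roots)
```

Each eigenvector is a column of eigenvectors, so x @ eigenvectors should match the columns scaled by their eigenvalues.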
Use Python interactively
You can open an interactive Python console inside R. Objects you create in Python can be used in R (and vice versa). The repl_python() function starts the console. First, download the data set used in the following program:
repl_python()
# Load the pandas package
import pandas as pd
# Import the dataset
travel = pd.read_excel("air.xlsx")
# Show the number of rows and columns
travel.shape
# Select 10 random rows
travel.sample(n = 10)
# Group by year and take the mean
travel.groupby("Year").AIR.mean()
# Filter rows and keep the result as t
t = travel.loc[(travel.Month >= 6) & (travel.Year >= 1955), :]
# Return to R
exit
Note: You need to type “exit” to return to the R session
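The pandas steps in that session can also be tried without the spreadsheet. The sketch below uses a small made-up table with the same column names (Year, Month, AIR). The values are invented purely for illustration and are not the real data set. It assumes pandas is installed.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: these numbers are invented
travel = pd.DataFrame({
    "Year":  [1954, 1954, 1955, 1955, 1956, 1956],
    "Month": [5, 7, 5, 7, 6, 12],
    "AIR":   [100, 120, 130, 150, 160, 180],
})

print(travel.shape)  # (number of rows, number of columns)

# Mean of AIR within each year
yearly_mean = travel.groupby("Year").AIR.mean()

# Keep rows from June onward in 1955 or later
t = travel.loc[(travel.Month >= 6) & (travel.Year >= 1955), :]

print(yearly_mean)
print(t)
```

Inside a repl_python() session, a data frame named t like this one is what R later sees as py$t.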
How do I get objects created in Python from R
You can use the py object to access objects created in Python.
summary(py$t)
In this case, I use R’s summary() function on the dataset t created in Python. In addition, you can use the ggplot2 package to draw a line chart.
library(ggplot2)
ggplot(py$t, aes(AIR, Year)) + geom_line()
How do I get objects created in R from Python
You can use the r object to do this.
Create an object in R:
mydata = head(cars, n=15)
Call the object previously created in R in the Python REPL:
repl_python()
import pandas as pd
r.mydata.describe()
pd.isnull(r.mydata.speed)
exit
Use the sklearn package to build a logistic regression model
The sklearn package is one of the most popular machine learning packages in Python, supporting a variety of statistical and machine learning algorithms.
repl_python()
# Load packages
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression
# Load the dataset
iris = datasets.load_iris()
# Fit the model
model = LogisticRegression()
model.fit(iris.data, iris.target)
# Predict
actual = iris.target
predicted = model.predict(iris.data)
# Model performance and confusion matrix
print(metrics.classification_report(actual, predicted))
print(metrics.confusion_matrix(actual, predicted))
Other useful functions
Check the python configuration
Run py_config() to see which version of Python is installed on your system. It also displays details about Anaconda and NumPy.
py_config()
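If you are already inside a Python session, you can inspect similar details from Python itself. This is only a rough sketch using the standard sys module. It does not reproduce everything py_config() reports, and the variable names are illustrative.

```python
import sys

# Path of the interpreter that is currently running
interpreter = sys.executable

# Version as a short string such as "3.x.y"
version = sys.version.split()[0]

print(interpreter)
print(version)
```

Comparing this path with the one shown by py_config() is a quick way to confirm which Python installation reticulate picked up.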
Check whether a package is installed
To check whether pandas is installed, run the following command:
py_module_available("pandas")
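A similar check can be written in plain Python. The helper below, module_available, is a hypothetical name introduced for this sketch. It relies on importlib.util.find_spec from the standard library and is not necessarily how reticulate performs the check internally.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    # find_spec() returns None when no such top-level module is installed
    return importlib.util.find_spec(name) is not None


print(module_available("json"))    # json ships with Python
print(module_available("pandas"))  # depends on your environment
```

The sketch is meant for top-level names such as "pandas". For dotted names, find_spec() imports the parent package first and may raise an error if the parent is missing.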
The original article was published on April 19, 2018
Author: Abstract bacteria
This article is from “Big Data Digest”, a partner of the cloud community.