If you plan to work in data science, in areas such as data analysis or data mining, or if, like me, you already work in a related field, then chain calls are something you need to learn.
Why chain calls?
Chain invocation, or method chaining, literally means stringing a series of operations or function calls together in code.
I first sensed the “beauty” of chain calls using the R language’s pipe operators.
library(tidyverse)
mtcars %>%
  group_by(cyl) %>%
  summarise(meanOfdisp = mean(disp)) %>%
  ggplot(aes(x = as.factor(cyl), y = meanOfdisp, fill = as.factor(seq(1, 3)))) +
  geom_bar(stat = 'identity') +
  guides(fill = FALSE)
For R users, the steps in this code are easy to follow. It all starts with the %>% symbol (the pipe operator).
With the pipe operator, we pass whatever is on the left to the function on the right. Here I pass the mtcars data set to the group_by function, pass the result to the summarise function, and finally pass that to the ggplot function for plotting.
If I hadn’t learned how to make chain calls, I would have written something like this when I first learned R:
library(tidyverse)
cyl4 <- mtcars[which(mtcars$cyl == 4), ]
cyl6 <- mtcars[which(mtcars$cyl == 6), ]
cyl8 <- mtcars[which(mtcars$cyl == 8), ]
data <- data.frame(
  cyl = c(4, 6, 8),
  meanOfdisp = c(mean(cyl4$disp), mean(cyl6$disp), mean(cyl8$disp))
)
graph <- ggplot(data = data, aes(x = factor(cyl), y = meanOfdisp,
                                 fill = as.factor(seq(1, 3))))
graph <- graph + geom_bar(stat = 'identity') + guides(fill = FALSE)
graph
Without the pipe operator, I end up making unnecessary assignments and overwriting existing objects. The resulting cyl4, cyl6, cyl8 and data objects exist only to serve graph, so the code becomes redundant.
Chain calls improve the readability of the code while simplifying it to a great extent, allowing you to quickly see what you’re doing at each step. This approach is very useful for data analysis or processing, reducing the need to create unnecessary variables and allowing a quick and easy way to explore.
You can see chain calls or pipe operations in a lot of places, but I’ll give you two examples outside of R.
One is a Shell statement:
echo "`seq 1 100`" | grep -e "^[3-4].*" | tr "3" "*"
In a shell statement, the "|" pipe operator gives you chain calls quickly. Here I first print all the integers from 1 to 100, pass them to grep to extract every line that begins with 3 or 4, and then pass those lines to tr, which replaces every digit 3 with an asterisk. For example, 3 becomes * and 43 becomes 4*.
The other is the Scala language:
object Test {
  def main(args: Array[String]): Unit = {
    val numOfseq = (1 to 100).toList
    val chain = numOfseq.filter(_ % 2 == 0).map(_ * 2)
      .take(10)
  }
}
In this example, the variable numOfseq contains all integers from 1 to 100. In the chain part, I first call the filter method on numOfseq to keep the even numbers, then call the map method to multiply these numbers by 2, and finally use the take method to extract the first 10 of the resulting numbers. The final result is assigned to the chain variable.
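For comparison, here is a rough Python sketch of the same three steps written as nested built-in calls instead of a chain. It is not part of the Scala example, and the variable names are only illustrative. Python lists have no filter, map or take methods, so the steps have to be read from the inside out, which is the readability problem that chaining avoids.

```python
from itertools import islice

num_of_seq = range(1, 101)  # integers 1 to 100, like (1 to 100) in Scala

# Read inside-out: keep even numbers -> double them -> take the first 10
chain = list(
    islice(
        map(lambda n: n * 2, filter(lambda n: n % 2 == 0, num_of_seq)),
        10,
    )
)
print(chain)
```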
These examples give you a first idea of what chain calls are like. Once you get the hang of them, they will change not only your code style but also the way you think about programming.
Chain calls in Python
A simple chain call in Python is built by writing a class whose methods return either the object itself (self) or, for a @classmethod, the owning class.
class Chain:

    def __init__(self, name):
        self.name = name

    def introduce(self):
        print("hello, my name is %s" % self.name)
        return self

    def talk(self):
        print("Can we make a friend?")
        return self

    def greet(self):
        print("Hey! How are you?")
        return self


if __name__ == '__main__':
    chain = Chain(name="jobs")
    chain.introduce()
    print("-" * 20)
    chain.introduce().talk()
    print("-" * 20)
    chain.introduce().talk().greet()
Here we create a Chain class that takes a name string when an instance is created. The class has three methods: introduce, talk, and greet.
Since each method returns self, we can keep calling the methods of the object's class one after another. The result is:
hello, my name is jobs
--------------------
hello, my name is jobs
Can we make a friend?
--------------------
hello, my name is jobs
Can we make a friend?
Hey! How are you?
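The @classmethod variant mentioned earlier is not shown in the example, so here is a minimal sketch of it. The class name Config and its methods put and show are hypothetical. The point is only that a class method returning cls can be chained in the same way as an instance method returning self.

```python
class Config:
    """Hypothetical class whose class-level setters can be chained."""

    settings = {}

    @classmethod
    def put(cls, key, value):
        cls.settings[key] = value
        return cls  # return the owning class so the next call can chain

    @classmethod
    def show(cls):
        print(cls.settings)
        return cls


# No instance is created; every call goes through the class itself
Config.put("name", "jobs").put("lang", "python").show()
```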
Use chain calls in Pandas
Most Series and DataFrame methods in the pandas API return a Series or DataFrame, so we can simply chain them. As an example, I use the video data of the Huanong Brothers (华农兄弟) channel on Bilibili (Station B), which I prepared for a case demonstration around February this year. You can get it from the link.
The data set contains 300 records and 20 fields. The field information is as follows:
Field information
Before using this data, we still need to do some preliminary cleaning. Here I mainly select the following fields:
- aid: the AV number of the video
- comment: the number of comments
- play: the play count
- title: the title
- video_review: the number of bullet comments (danmaku)
- created: the upload date
- length: the video length
1. Data cleaning
The corresponding values of each field are as follows:
The field values
From the data, we can see:
- title: every title is prefixed with the four-character channel name "Huanong Brothers" (华农兄弟); if we want to count the characters in the title, this prefix has to be removed first;
- created: the upload date appears as a long number, but it is actually a timestamp counted from 1970, which we need to convert into a readable year, month and day;
- length: the playback length is shown only as minutes and seconds, without a "00" for the hours, so we need to add the hours on the one hand and convert the value to a proper time format on the other.
The chain call operation is as follows:
import re
import pandas as pd


def word_count(text):
    # count the Chinese characters in the range \u4e00-\u9fa5
    return len(re.findall(r"[\u4e00-\u9fa5]", text))


tidy_data = (
    pd.read_csv('~/Desktop/huanong.csv')
    .loc[:, ['aid', 'title', 'created', 'length', 'play', 'comment', 'video_review']]
    .assign(title = lambda df: df['title'].str.replace("华农兄弟：", ""),  # strip the "Huanong Brothers:" prefix
            title_count = lambda df: df['title'].apply(word_count),
            created = lambda df: df['created'].pipe(pd.to_datetime, unit='s'),
            created_date = lambda df: df['created'].dt.date,
            length = lambda df: "00:" + df['length'],
            video_length = lambda df: df['length'].pipe(pd.to_timedelta).dt.seconds
            )
)
Here loc selects the columns and assign creates the new columns. If a new column has the same name as an existing column, assign overwrites the existing one.
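To make the overwrite behaviour concrete, here is a small sketch on a made-up two-row frame (the data and the "AB:" prefix are hypothetical). It also shows that the keyword arguments of assign are evaluated in order, so a later lambda sees the column produced by an earlier one, while the original frame stays unchanged.

```python
import pandas as pd

# Hypothetical two-row frame, only for illustration
df = pd.DataFrame({"title": ["AB:x", "AB:yz"], "play": [10, 20]})

out = df.assign(
    # same name as an existing column -> that column is replaced in the result
    title=lambda d: d["title"].str.replace("AB:", "", regex=False),
    # evaluated after the line above, so it sees the cleaned title
    title_len=lambda d: d["title"].str.len(),
)

print(out)
print(df["title"].tolist())  # the original frame is left untouched
```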
1. title and title_count:
- The original title field holds strings, so the str.* methods can be called on it directly; here I call replace to strip the "Huanong Brothers:" channel-name prefix
- On the cleaned title field we then use the apply method, passing in the word_count function defined earlier; for each title, it extracts all Chinese characters in the Unicode range \u4e00 to \u9fa5 and counts them
2. created and created_date:
- The original created field calls the pipe method, which passes the field to pd.to_datetime; we need to set the unit parameter to s (seconds) to get the correct time, otherwise the numbers are read in the wrong unit and the result is a date right after the 1970 Unix epoch
- The processed created field has the datetime64 type, so we can use the convenient dt.* accessor that pandas provides to get the attribute we want
3. length and video_length:
- For the original length field, we simply prepend the string 00: to prepare it for the next conversion
- On the completed length time string, we call pipe again to pass the field implicitly to pd.to_timedelta, and then, as with created_date, read the attribute we need; here I take the number of seconds.
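The conversions described in this list can be tried on a tiny made-up frame. The rows below are hypothetical and only mimic the shape of the real data. The sketch also shows what happens when unit='s' is left out: pandas then reads the integers as nanoseconds, so the dates land in 1970.

```python
import re
import pandas as pd


def word_count(text):
    # count characters in the range \u4e00-\u9fa5
    return len(re.findall(r"[\u4e00-\u9fa5]", text))


# Hypothetical rows: timestamps are Unix seconds, length is minutes:seconds
raw = pd.DataFrame({
    "title": ["竹鼠 ep1", "hello"],
    "created": [1577836800, 1580515200],
    "length": ["03:25", "12:00"],
})

demo = raw.assign(
    title_count=lambda d: d["title"].apply(word_count),
    created=lambda d: d["created"].pipe(pd.to_datetime, unit="s"),
    created_date=lambda d: d["created"].dt.date,
    length=lambda d: "00:" + d["length"],
    video_length=lambda d: d["length"].pipe(pd.to_timedelta).dt.seconds,
)

# Without unit="s" the integers are treated as nanoseconds since the epoch
wrong = pd.to_datetime(raw["created"])

print(demo)
print(wrong)
```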
2. Viewing trend chart
With the tidy_data obtained after this bit of cleaning, we can quickly explore the trend of the play count. Here we use the created field, which has the datetime64 type, as the x-axis and the play field as the y-axis.
# In a Jupyter notebook, also run:
# %matplotlib inline
# %config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt

(tidy_data[['created', 'play']]
 .set_index('created')
 .resample('1M')  # monthly bins; newer pandas versions prefer 'ME'
 .sum()
 .plot(
     kind='line',
     figsize=(16, 8),
     title='Video Play Trend (2018-2020)',
     grid=True,
     legend=False
 )
)
plt.xlabel("")
plt.ylabel('The Number Of Playing')
Here, after selecting the upload date and the play count, we need to set created as the index before we can use the resample method for aggregation. We use a monthly granularity, sum the play count of each month, and then call the plot interface to draw the chart.
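The set_index, resample and sum steps can be checked on a few made-up rows. The data below is hypothetical. As an assumption for portability, the sketch uses the "MS" (month start) alias instead of "1M", because the month-end alias is spelled differently across pandas versions; the only visible difference is that each bin is labelled with the first day of the month.

```python
import pandas as pd

# Hypothetical play counts on four upload dates
plays = pd.DataFrame({
    "created": pd.to_datetime(
        ["2020-01-03", "2020-01-20", "2020-02-05", "2020-02-28"]
    ),
    "play": [100, 50, 30, 70],
})

monthly = (
    plays
    .set_index("created")  # resample needs a datetime index
    .resample("MS")        # one bin per calendar month, labelled by month start
    .sum()
)
print(monthly)
```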
One trick when writing a chain call is to wrap the whole chain in a pair of parentheses: everything inside the parentheses is treated as one expression, so the chain can be split across lines without errors. If you don't like this, you can also manually append a \ symbol to each line, so the whole operation looks like this:
tidy_data[['created', 'play']] \
    .set_index('created') \
    .resample('1M') \
    .sum() \
    .plot( \
        kind='line', \
        figsize=(16, 8), \
        title='Video Play Trend (2018-2020)', \
        grid=True, \
        legend=False \
    )
Compared with adding a pair of parentheses, however, appending a \ symbol to every line is neither recommended nor elegant.
And if you use neither the parentheses nor the \ symbol, the Python interpreter will reject the code with a syntax error.
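The three cases can be compared in a small sketch that does not need pandas. The same two-step chain is kept as source text in three variants so that the failing one can be compiled without stopping the script; the variable names are only illustrative.

```python
# The same chain written three ways, stored as source strings
with_parens = """
result = (
    ' abc '
    .strip()
    .upper()
)
"""

with_backslash = """
result = ' abc ' \\
    .strip() \\
    .upper()
"""

bare = """
result = ' abc '
    .strip()
    .upper()
"""

outcomes = {}
for label, source in [("parentheses", with_parens),
                      ("backslashes", with_backslash),
                      ("bare", bare)]:
    try:
        namespace = {}
        exec(compile(source, "<chain>", "exec"), namespace)
        outcomes[label] = namespace["result"]
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        outcomes[label] = "rejected: " + type(err).__name__

for label, outcome in outcomes.items():
    print(label, "->", outcome)
```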
3. Chain call performance
The first two cases show that chain calls are an elegant and fast way to implement a data processing flow, but different ways of writing the same operation can also differ in performance.
Here we continue to work with tidy_data: we group by created_date, sum the play, comment, and video_review values, and then take the base-10 logarithm of the sums. The final result should look like this:
Statistical table
Method 1: the general approach
General approach
This approach works on a copy of tidy_data, and the result of each step overwrites the previous data object.
Method 2: the chained approach
Chained approach
As can be seen, the chained version is a little faster than the general version, although with so little data the difference between the two is small. Chained calls also have less memory overhead, because they need no extra intermediate variables that are overwritten at each step.
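The two versions compared here were shown as screenshots, which are not reproduced in the text. The following is only a sketch of what the two styles could look like, using a small made-up frame (tidy_demo) in place of tidy_data. It does not reproduce the author's code or timings.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for tidy_data
tidy_demo = pd.DataFrame({
    "created_date": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "play": [600, 400, 100],
    "comment": [70, 30, 10],
    "video_review": [5, 5, 1],
})
cols = ["play", "comment", "video_review"]

# Style 1: step by step, reassigning an intermediate variable
step = tidy_demo.copy()
step = step.groupby("created_date")[cols].sum()
step = np.log10(step)

# Style 2: one chained expression, no intermediate names
chained = (
    tidy_demo
    .groupby("created_date")[cols]
    .sum()
    .pipe(np.log10)
)

print(chained)
```

Both styles produce the same table; to compare their speed on real data, each one can be timed separately, for example with the timeit module.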
Conclusion: the pros and cons of chain calls
As the snippets in this article show, chain calls make code much more readable while doing as much as possible with as little code as possible.
Of course, chain calls are not perfect, and they have some drawbacks. For example, when a chain has more than 10 steps, the probability of an error increases greatly and debugging becomes difficult. Like this:
(data
 .method1(...)
 .method2(...)
 .method3(...)
 .method4(...)
 .method5(...)
 .method6(...)
 .method7(...)  # something goes wrong here
 .method8(...)
 .method9(...)
 .method10(...)
 .method11(...)
)
To find where the problem occurred, you have to work backwards step by step from the tail of the chain.
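One common way to soften this problem is to insert a pass-through step with pipe that reports the intermediate state and returns the frame unchanged. The helper name peek and the sample data below are hypothetical; this is a sketch of the idea, not part of the workflow described in this article.

```python
import pandas as pd


def peek(df, label):
    """Print the shape at this point in the chain and pass the frame on."""
    print(f"{label}: shape={df.shape}")
    return df


# Hypothetical data, only for illustration
data = pd.DataFrame({"play": [10, 0, 30], "comment": [1, 0, 3]})

result = (
    data
    .pipe(peek, "raw")
    .query("play > 0")
    .pipe(peek, "after filter")
    .assign(ratio=lambda d: d["comment"] / d["play"])
    .pipe(peek, "after assign")
)
```

Once the chain works, the peek steps can be deleted without touching the other lines.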
Therefore, the following questions must be considered when using chained calls:
- Whether intermediate variables are needed
- Whether steps in the operation data need to be decomposed
- Whether the result is still a DataFrame after each operation
If you don’t need intermediate variables, you don’t need to decompose the steps, and you’re guaranteed to return a DataFrame at the end, then happily use the chain call method to complete your data flow.
Author: 100gle, a not-so-serious liberal arts student who has been practicing for less than two years and likes to write code, write articles, and try all kinds of new things. He currently works in big data analysis and mining.