• Use Pipe Operations in Python for More Readable and Faster Coding
  • Thuwarakesh Murallie
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: Z Zhaojin
  • Proofread: Jaredliw, executing cent0

How to code efficiently with Python Pipe

Pipe is a handy Python package that saves a lot of coding time and improves code readability with shell-style Pipe operations.

Python is already an elegant programming language, but that doesn’t mean there isn’t room for improvement.

The Pipe package takes Python’s ability to handle data to a new level. It takes an SQL-like declarative approach to manipulating elements in a collection. With very little code, you can filter, transform, sort, deduplicate, group, and more.

In this article, we’ll discuss how to use Pipe to simplify Python code. Most importantly, we will build custom pipe operations that can be reused. These custom methods can be used in future projects.

We will cover the following:

  • An illuminating example;
  • Some out-of-the-box pipe operations;
  • Create a custom pipe operation.

If you haven’t set it up yet, you can easily install Pipe from PyPI. Here is the installation command:

pip install pipe

Use pipes in Python

Here is an example using Pipe. Given a list of numbers, we want to do the following:

  • Remove all duplicate values;
  • Keep only the odd numbers;
  • Square each number in the list;
  • Sort values in ascending order.

Here’s how we would typically do this in plain Python:

num_list_with_duplicates = [1, 4, 2, 27, 6, 8, 10, 7, 13, 19, 21, 20, 7, 18, 27]

# Remove duplicate numbers (dict keys preserve insertion order)
num_list = list(dict.fromkeys(num_list_with_duplicates))

# Keep only the odd numbers
odd_list = [num for num in num_list if num % 2 == 1]

# Square each number
odd_square = list(map(lambda x: x**2, odd_list))

# Sort the values in ascending order
odd_square.sort()

print(odd_square)

Output:

[1, 49, 169, 361, 441, 729]

The above code is readable enough, but pipes offer a cleaner approach.

from pipe import dedup, where, select, sort

num_list_with_duplicates = [1, 4, 2, 27, 6, 8, 10, 7, 13, 19, 21, 20, 7, 18, 27]

# Perform the pipe operations, left to right
results = list(num_list_with_duplicates
                | dedup
                | where(lambda x: x % 2 == 1)
                | select(lambda x: x**2)
                | sort
            )

print(results)

Output:

[1, 49, 169, 361, 441, 729]

Both approaches produce the same result. However, the second one is more intuitive than the first one and requires less code.

This is how Pipe helps us simplify our code. We can operate continuously on a collection without having to write separate code.
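For comparison, here is a sketch of the same four steps written as a single plain-Python expression, using only built-ins. It is not part of the original example; it is included only to show that, without pipes, a chained version has to be read from the inside out rather than from left to right.

```python
num_list_with_duplicates = [1, 4, 2, 27, 6, 8, 10, 7, 13, 19, 21, 20, 7, 18, 27]

# Read from the inside out: dedup -> keep odd -> square -> sort
results = sorted(
    x ** 2                                            # square each number
    for x in dict.fromkeys(num_list_with_duplicates)  # dedup, keeps first-seen order
    if x % 2 == 1                                     # keep only odd numbers
)

print(results)  # [1, 49, 169, 361, 441, 729]
```

The result is identical, but the order in which the steps appear in the source no longer matches the order in which they run.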

Pipe has more handy operations like the ones used in the example above. Custom pipes can also be created if we need something unique. Let’s start with some classic, practical operations.

Classic, practical pipe operations

We have learned a few simple pipe operations. In this section, we continue to discuss some classic and useful pipe operations to process data.

This section does not cover every operation that ships with Pipe; consult Pipe’s GitHub repository for the complete list.

Group By method

I believe this is the most helpful pipe operation for data scientists. Data scientists tend to use Pandas, but converting a list to a DataFrame can sometimes feel like overkill. In most such cases, this pipe operation is enough to work with the data.

from pipe import groupby

num_list = [1, 4, 2, 27, 6, 8, 10, 7, 13, 19, 21, 20, 7, 18, 27]

results = list(num_list
               | groupby(lambda x: "Odd" if x % 2 == 1 else "Even")
               )

print(results)

The above code divides the data into odd and even groups, creating a list of two tuples. Each tuple holds the name returned by the lambda function and an iterator over the elements in that group. The above code produces the following grouping:

[('Even', <itertools._grouper object at 0x7fdc05ed0310>), ('Odd', <itertools._grouper object at 0x7fdc05f88700>)]
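The grouped values print as iterator objects rather than lists, which hides their contents. The sketch below reproduces the grouping with the standard library only. It assumes that Pipe's groupby sorts by key and then delegates to itertools.groupby; the 'Even' before 'Odd' order of the output suggests this, but it is an assumption here, not something the article confirms. Each group is turned into a list so its contents are visible.

```python
from itertools import groupby

num_list = [1, 4, 2, 27, 6, 8, 10, 7, 13, 19, 21, 20, 7, 18, 27]


def parity(x):
    """Return the group name for a number."""
    return "Odd" if x % 2 == 1 else "Even"


# itertools.groupby only merges adjacent items, so sort by the key first.
# sorted() is stable, so items keep their original order within each group.
grouped = {
    key: list(group)  # consume the group iterator right away
    for key, group in groupby(sorted(num_list, key=parity), key=parity)
}

print(grouped)
# {'Even': [4, 2, 6, 8, 10, 20, 18], 'Odd': [1, 27, 7, 13, 19, 21, 7, 27]}
```

Consuming each group immediately matters: an itertools group iterator is only valid until the outer iterator advances to the next group.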

You can now operate separately on each group you create. Here is an example of extracting elements from each group and squaring the elements.

from pipe import groupby, select

num_list = [1, 4, 2, 27, 6, 8, 10, 7, 13, 19, 21, 20, 7, 18, 27]

results = list(num_list
               | groupby(lambda x: "Odd" if x % 2 == 1 else "Even")
               | select(lambda x: {x[0]: [y**2 for y in x[1]]})
               )

print(results)

Output:

[
    {'Even': [16, 4, 36, 64, 100, 400, 324]},
    {'Odd': [1, 729, 49, 169, 361, 441, 49, 729]},
]

Chain and Traverse methods

These two methods make it easy to flatten a nested list. chain flattens a single level of nesting, while traverse flattens recursively until no nested lists remain.

Here are the results of using chain:

from pipe import chain

nested_list = [[1, 2, 3], [4, 5, 6], [7, [8, 9]]]

unfolded_list = list(nested_list
                     | chain
                     )

print(unfolded_list)

Output:

[1, 2, 3, 4, 5, 6, 7, [8, 9]]

You can see that the outermost layer of the list has been expanded, but 8 and 9 are still in a nested list because they sit one level deeper.

Here are the results with traverse:

from pipe import traverse

nested_list = [[1, 2, 3], [4, 5, 6], [7, [8, 9]]]

unfolded_list = list(nested_list
                     | traverse
                     )

print(unfolded_list)

Output:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

traverse unfolds everything it can.

I mostly use a list comprehension to flatten a list, but the code becomes increasingly difficult to read as the nesting grows. It is also hard to flatten recursively when we don’t know how many nesting layers there are, which the traverse operation handled in the example above.

nested_list = [[1, 2, 3], [4, 5, 6], [7, [8, 9]]]

# Flattens only one level: [8, 9] stays nested
unfolded_list = [num for item in nested_list for num in item]

print(unfolded_list)
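For nesting of unknown depth, plain Python needs a recursive helper. The generator below is a minimal sketch of that idea using only the standard library. The name flatten is hypothetical, and this is not necessarily how Pipe's traverse is implemented.

```python
from collections.abc import Iterable


def flatten(items):
    """Yield the leaves of an arbitrarily nested iterable, depth first."""
    for item in items:
        # Strings are iterable too, but we treat them as leaves
        # to avoid recursing forever on single characters.
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)
        else:
            yield item


nested_list = [[1, 2, 3], [4, 5, 6], [7, [8, 9]]]

print(list(flatten(nested_list)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

This is the extra code a one-word traverse saves: the recursion, the type check, and the special case for strings.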

Take_while and Skip_while methods

These two operations work similarly to the where operation we used earlier. The key difference is that take_while and skip_while stop testing the remaining elements once the condition first fails, whereas where tests every element in the list.

Here’s how take_while and where perform in the simple task of filtering values less than 5:

from pipe import take_while, where

result = list([3, 4, 5, 3] | take_while(lambda x: x < 5))
print(f"take_while: {result}")

result2 = list([3, 4, 5, 3] | where(lambda x: x < 5))
print(f"where: {result2}")

The above code results in the following:

take_while: [3, 4]
where: [3, 4, 3]

Note that the take_while operation stops at 5 and therefore never reaches the last 3, while the where operation includes it.

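skip_while works much like take_while, except that it skips elements while the condition holds and then includes everything from the first element that fails it.

from pipe import skip_while

result = list([3, 4, 5, 3] | skip_while(lambda x: x < 5))
print(result)

Output:

[5, 3]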
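The standard library offers the same three behaviours, which makes the difference easy to check side by side. This sketch uses itertools.takewhile, itertools.dropwhile, and the built-in filter as stand-ins for take_while, skip_while, and where. It illustrates the semantics only and says nothing about how Pipe implements them.

```python
from itertools import dropwhile, takewhile

data = [3, 4, 5, 3]


def below_five(x):
    return x < 5


# Stops at the first element that fails the test (5);
# the trailing 3 is never reached.
taken = list(takewhile(below_five, data))

# Drops elements until the first failure, then yields everything that remains.
skipped = list(dropwhile(below_five, data))

# Tests every element independently.
filtered = list(filter(below_five, data))

print(taken)     # [3, 4]
print(skipped)   # [5, 3]
print(filtered)  # [3, 4, 3]
```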

As mentioned earlier, this is not all you can do with the Pipe library. See the repository for more built-in operations and examples.

Create a custom pipe operation

Creating a new pipe operation is relatively easy: just decorate a function with the Pipe class.

In the following example, we convert a Python function into a pipe operation. It takes an integer as input and returns its square.

from pipe import Pipe


@Pipe
def sqr(n: int = 1):
    return n ** 2


result = 10 | sqr
print(result)

When we decorate the function with the @Pipe class, it becomes a pipe operation. In line 9 (result = 10 | sqr), we use it to square the number.

Pipe operations can also take additional parameters. The first parameter is always the output of the previous operation in the chain. Any additional parameters are specified when the operation is used in the chain.
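To see why a decorated function can sit on the right-hand side of | and still accept extra parameters, it helps to look at Python's reflected-operator hook. The class below is a simplified sketch, not the library's source: MiniPipe is a hypothetical name, and the real Pipe class may differ in its details. When Python evaluates value | op and the type of value does not handle |, it falls back to op.__ror__(value).

```python
class MiniPipe:
    """Hypothetical, simplified stand-in for a pipe decorator."""

    def __init__(self, function):
        self.function = function

    def __ror__(self, other):
        # Called for `other | self`: feed the left operand to the function.
        return self.function(other)

    def __call__(self, *args, **kwargs):
        # Bind the extra arguments now; the piped value still arrives
        # later as the first parameter.
        return MiniPipe(lambda value: self.function(value, *args, **kwargs))


@MiniPipe
def sqr(n):
    return n ** 2


@MiniPipe
def add(n, amount):
    return n + amount


print(10 | sqr)           # 100
print(10 | sqr | add(5))  # 105
```

In this sketch, add(5) returns a new MiniPipe that remembers amount=5, which is why the piped value always ends up as the first parameter and the extra ones follow it.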

The extra argument can even be a function.

In the following example, we create a pipe operation that accepts an additional parameter, which is a function. Our pipe operation is to transform each element in the list using a function.

from typing import List
from pipe import Pipe, select


def fib(n: int = 1):
    """Recursively compute the n-th Fibonacci number."""
    return n if n < 2 else fib(n-1)+fib(n-2)


@Pipe
def apply_fun(nums: List[int], fun):
    """Apply any function to each list element and create a new list."""
    return list(nums | select(fun))


result = [5, 10, 15] | apply_fun(fib)


print(result)

Summary

It is impressive to see how Python can be further improved.

As a practicing data scientist, I find Pipe useful for a wide range of everyday tasks. We can also use Pandas for most of these tasks. Pipe, however, does a great job of improving code readability so that even novice programmers can understand these data transformations.

It’s important to note here that I haven’t used Pipe in large-scale projects or explored its performance on large-scale datasets and pipelines, but I believe the package will play an important role in offline data analysis.
