Introduction: I previously wrote a series of three consecutive articles on slicing, and this article summarizes them. Why merge a series of articles? I should explain that this article is by no means a simple combination of them. It mainly corrects some serious mistakes (such as in the part on custom sequence slicing), and also makes many changes to the structure and to the links between chapters, so that the structure is complete and the quality of the content is well ensured.

As is well known, we can retrieve elements of sequence types (such as strings, lists, tuples…) by index (or subscript). So what if you want to get all the elements within a range of indices?

Slicing is a technique for extracting a range of indices, and it lets us handle sequence objects very flexibly. In general, slicing is used to extract parts of sequence objects. However, is there any way to slice non-sequence objects? What important points and underlying principles should we pay attention to when using slicing? This article discusses these topics, and I hope we can learn and progress together.

1. Basic usage of slicing

Lists are one of Python’s most fundamental and important data structures, and the one that makes the most of slicing, so in the first two sections, I’ll use lists as examples of some common uses of slicing.

First, the general form of a slice is [i:i+n:m], where i is the start index of the slice and can be omitted if it is the start of the list; i+n is the end position of the slice (not included), which can be omitted if it is the end of the list; m is the step, which may be omitted, defaults to 1, and cannot be 0. When m is negative, the list is traversed in reverse (flipped). Note: the start and end values may exceed the list length without raising an out-of-range error.

The basic meaning of slicing is: starting from index i of the sequence, take the n elements up to (but not including) index i+n, then keep every m-th of them.

li = [1, 4, 5, 6, 7, 9, 11, 14, 16]

# Each of these expressions is the whole list, for any X >= len(li)
X = len(li) + 5
li[0:X] == li[0:] == li[:X] == li[:] == li[::] == li[-X:X] == li[-X:]

li[1:5] == [4, 5, 6, 7]  # from index 1, take 5-1=4 elements
li[1:5:2] == [4, 6]  # from index 1, take 5-1=4 elements, then keep every 2nd one
li[-1:] == [16]  # take the last element
li[-4:-2] == [9, 11]  # from the 4th-to-last, take -2-(-4)=2 elements
li[:-2] == li[-len(li):-2] == [1, 4, 5, 6, 7, 9, 11]  # from the start, take -2-(-len(li))=7 elements

# When the step is negative, the list is flipped first and then sliced
li[::-1] == [16, 14, 11, 9, 7, 6, 5, 4, 1]  # flip the entire list
li[::-2] == [16, 11, 7, 5, 1]  # flip the entire list, then keep every 2nd element
li[:-5:-1] == [16, 14, 11, 9]  # flip the entire list, take -5-(-len(li))=4 elements
li[:-5:-3] == [16, 9]  # flip, take -5-(-len(li))=4 elements, then keep every 3rd one

# The step of a slice cannot be 0
li[::0]  # ValueError: slice step cannot be zero

Some of the examples above may be hard for beginners (and even many experienced developers) to understand, but none of them goes beyond the basic syntax of slicing, so I've included them in the basics for convenience.

Personally, I draw two lessons from these examples:

(1) Keep the formula [i:i+n:m] in mind. When some values are omitted, fill in their defaults mentally to complete the formula;

(2) When the index is negative and the step is positive, positions are counted from the end of the list; when both the index and the step are negative, the list is conceptually flipped and positions are counted in reverse.
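Lesson (1) can be checked directly in Python: the built-in `slice` object has an `indices(length)` method that fills in the omitted values and returns the normalized `(start, stop, step)` triple, and passing that triple to `range()` lists the exact positions a slice visits. A minimal sketch (the helper name `slice_positions` is purely illustrative):

```python
li = [1, 4, 5, 6, 7, 9, 11, 14, 16]

def slice_positions(seq, s):
    """Illustrative helper: list the indices that seq[s] would visit."""
    start, stop, step = s.indices(len(seq))   # defaults and negative values resolved
    return list(range(start, stop, step))

print(slice(None, None, -1).indices(len(li)))      # (8, -1, -1)
print(slice_positions(li, slice(-4, -2)))          # [5, 6]
print(slice_positions(li, slice(None, -5, -3)))    # [8, 5]
print([li[i] for i in slice_positions(li, slice(None, -5, -3))])  # [16, 9]
```

Note that the stop value -1 in the first triple means "before index 0" for `range()`; written back as `li[8:-1:-1]` it would instead mean "up to the last element" and give an empty list, which is why `range()` is used here rather than a new slice.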

2. Advanced use of slicing

In general, slicing returns a new, independent sequence (PS: there are exceptions; see “Does Python support copying strings?”). For a list, the result of slicing is still a list, occupying a new memory address.

Once the slice result has been taken out, it is a stand-alone object, so it can be used in assignments and anywhere else values are passed around. However, slicing makes only a shallow copy: it copies references to the elements of the original list, so if the list contains mutable elements (such as nested lists), changes made to them through the original list also show up in the new list.

li = [1, 2, 3, 4]
ls = li[::]

li == ls # True
id(li) == id(ls) # False
li.append(li[2:4]) # [1, 2, 3, 4, [3, 4]]
ls.extend(ls[2:4]) # [1, 2, 3, 4, 3, 4]

# The following is equivalent to checking whether len(li) is greater than 8
if li[8:]:
    print("not empty")
else:
    print("empty")

# The sliced list is affected by changes to the original list
lo = [1, [1, 1], 2, 3]
lp = lo[:2]  # [1, [1, 1]]
lo[1].append(1)  # [1, [1, 1, 1], 2, 3]
lp  # [1, [1, 1, 1]]

As you can see, a slice result can be taken out and used as a stand-alone object, but be careful when it contains mutable elements.
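If you need a slice whose nested elements are independent of the original list, one option is to deep-copy it with the standard library's `copy.deepcopy()`. A short sketch contrasting the two:

```python
import copy

lo = [1, [1, 1], 2, 3]
shallow = lo[:2]                # new list, but lo[1] is shared by reference
deep = copy.deepcopy(lo[:2])    # the nested list is copied as well

lo[1].append(1)                 # mutate the nested list through the original
print(shallow)                  # [1, [1, 1, 1]]
print(deep)                     # [1, [1, 1]]
print(shallow[1] is lo[1], deep[1] is lo[1])  # True False
```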

Slices can either be “taken out” of the original sequence as standalone objects, or they can be left in the original sequence and used as placeholders.

A while ago, I described several ways to concatenate strings (see the link at the end of this article). Three of them, the formatting-based methods (%, format(), and Template), are built on the idea of placeholders. For lists, using slices as placeholders can likewise splice lists together. Note in particular that the value assigned to a slice must be an iterable.

li = [1, 2, 3, 4]

# Splice at the head
li[:0] = [0]  # [0, 1, 2, 3, 4]
# Splice at the end
li[len(li):] = [5, 7]  # [0, 1, 2, 3, 4, 5, 7]
# Splice in the middle
li[6:6] = [6]  # [0, 1, 2, 3, 4, 5, 6, 7]

# The value assigned to a slice must be an iterable
li[-1:-1] = 6  # TypeError: can only assign an iterable
li[:0] = (9,) # [9, 0, 1, 2, 3, 4, 5, 6, 7]
li[:0] = range(3) # [0, 1, 2, 9, 0, 1, 2, 3, 4, 5, 6, 7]

In the example above, if you take the slices out as separate objects, you will find that they are all empty lists: li[:0] == li[len(li):] == li[6:6] == []. I call this kind of placeholder a “pure placeholder”. Assigning to a pure placeholder does not destroy any existing elements; the new elements are simply spliced in at the given index position. Deleting a pure placeholder likewise does not affect the elements of the list.

In contrast, the slice of an impure placeholder is a non-empty list, and operations on it (assignment and deletion) change the original list. If pure placeholders splice lists together, impure placeholders replace parts of lists.

li = [1, 2, 3, 4]

# Replacement at different positions
li[:3] = [7, 8, 9]  # [7, 8, 9, 4]
li[3:] = [5, 6, 7]  # [7, 8, 9, 5, 6, 7]
li[2:4] = ['a', 'b']  # [7, 8, 'a', 'b', 6, 7]

# Non-equal-length replacement
li[2:4] = [1, 2, 3, 4]  # [7, 8, 1, 2, 3, 4, 6, 7]
li[2:6] = ['a']  # [7, 8, 'a', 6, 7]

# delete element
del li[2:3] # [7, 8, 6, 7]
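One way to reason about an impure placeholder: for a step-1 slice with 0 <= i <= j <= len(li), the assignment `li[i:j] = new` leaves `li` with the same contents as `li[:i] + list(new) + li[j:]`; the difference is that the assignment modifies the existing list in place instead of building a new one. A quick check with a non-equal-length replacement:

```python
li = [7, 8, 9, 5, 6, 7]
original_id = id(li)

expected = li[:2] + [1, 2, 3, 4] + li[4:]   # builds a brand-new list
li[2:4] = [1, 2, 3, 4]                      # replaces li[2:4] in place

print(li)                     # [7, 8, 1, 2, 3, 4, 6, 7]
print(li == expected)         # True
print(id(li) == original_id)  # True: still the same list object
```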

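Slice placeholders can also take a step, which allows replacing or deleting elements at regular intervals (skipping over the ones in between). Note that this usage only supports equal-length replacement: the number of new elements must match the number of positions selected.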

li = [1, 2, 3, 4, 5, 6]

li[::2] = ['a', 'b', 'c']  # ['a', 2, 'b', 4, 'c', 6]
li[::2] = [0]*3  # [0, 2, 0, 4, 0, 6]
li[::2] = ['w']  # ValueError: attempt to assign sequence of size 1 to extended slice of size 3

del li[::2] # [2, 4, 6]
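Why does `li[::2] = ['w']` fail while `[0]*3` works? For an extended slice (a step other than 1), the right-hand side must have exactly as many items as the slice selects. That count can be computed in advance with `slice.indices()` and `range()`, as in this sketch:

```python
li = [1, 2, 3, 4, 5, 6]
s = slice(None, None, 2)                  # the same slice as li[::2]

needed = len(range(*s.indices(len(li))))  # number of positions li[::2] covers
print(needed)                             # 3

li[s] = [0] * needed                      # lengths match, so this succeeds
print(li)                                 # [0, 2, 0, 4, 0, 6]

try:
    li[s] = ['w']                         # 1 item for 3 positions
except ValueError as exc:
    message = str(exc)
    print(message)  # attempt to assign sequence of size 1 to extended slice of size 3
```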

3. Slicing custom objects

Slicing is one of the most fascinating, powerful, and amazing language features in Python (arguably the most). Although the two sections above cover both its basic and advanced uses, they are not enough to fully show its power, so the next two sections focus on its more advanced uses.

The first two sections are based on native sequence types (strings, lists, tuples…). So can we define our own sequence type and make it support slicing syntax? Going further, can we make other custom objects (such as dictionaries) support slicing?

3.1. Magic method: __getitem__()

Making custom objects support slicing syntax is not difficult: you just need to implement the magic method __getitem__() when defining the class. Let's start with this method.

Syntax: object.__getitem__(self, key)

Official documentation: Called to implement evaluation of self[key]. For sequence types, the accepted keys should be integers and slice objects. Note that the special interpretation of negative indexes (if the class wishes to emulate a sequence type) is up to the __getitem__() method. If key is of an inappropriate type, TypeError may be raised; if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), IndexError should be raised. For mapping types, if key is missing (not in the container), KeyError should be raised.

In short, __getitem__() returns the value for the key argument. For sequence types, key can be an integer or a slice object, and the method itself decides how negative indexes are interpreted. If key is of neither type, TypeError should be raised; if the index is out of range, IndexError should be raised; for a mapping type, KeyError should be raised if key is not one of the object's keys.
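To see exactly what `key` receives, a tiny class that simply returns its key is handy (the class name `Probe` is just for illustration). It shows that `a:b:c` inside square brackets is turned into a built-in `slice` object before `__getitem__()` is called, and that comma-separated subscripts arrive as a tuple:

```python
class Probe:
    """Illustrative class: __getitem__ echoes back the key it receives."""
    def __getitem__(self, key):
        return key

p = Probe()
print(p[3])        # 3
print(p[1:5])      # slice(1, 5, None)
print(p[::-1])     # slice(None, None, -1)
print(p[1:2, 3])   # (slice(1, 2, None), 3)
```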

3.2. Slicing for a custom sequence

Next, we define a simple MyList and add slicing to it. (PS: for demonstration only, the completeness of other functions is not guaranteed).

import numbers

class MyList:
    def __init__(self, anylist):
        self.data = anylist
    def __len__(self):
        return len(self.data)
    def __getitem__(self, index):
        print("key is : " + str(index))
        cls = type(self)
        if isinstance(index, slice):
            # Delegate the slice to the underlying list, then wrap the result
            print("data is : " + str(self.data[index]))
            return cls(self.data[index])
        elif isinstance(index, numbers.Integral):
            return self.data[index]
        else:
            msg = "{cls.__name__} indices must be integers or slices"
            raise TypeError(msg.format(cls=cls))

l = MyList(["My", "name", "is", "Python cat"])
print(l[3])
print(l[:2])
print(l['hi'])

###
key is : 3
Python cat
key is : slice(None, 2, None)
data is : ['My', 'name']
<__main__.MyList object at 0x0000019CD83A7A90>
key is : hi
Traceback (most recent call last):
...
TypeError: MyList indices must be integers or slices

From the output, the custom MyList supports both lookups by index and slicing operations, which is exactly what we want.

3.3. Slicing for a custom dictionary

Because MyList wraps a built-in list, which already supports slicing, the example above did not need any slicing logic of its own; it simply passed the slice object on to self.data. However, for custom objects that are not sequence types, you have to implement the slicing logic yourself. Take a custom dictionary as an example (PS: for demonstration only, the completeness of other features is not guaranteed):

class MyDict:
    def __init__(self):
        self.data = {}
    def __len__(self):
        return len(self.data)
    def append(self, item):
        self.data[len(self)] = item
    def __getitem__(self, key):
        if isinstance(key, int):
            return self.data[key]
        if isinstance(key, slice):
            slicedkeys = list(self.data.keys())[key]
            return {k: self.data[k] for k in slicedkeys}
        else:
            raise TypeError

d = MyDict()
d.append("My")
d.append("name")
d.append("is")
d.append("Python cat")
print(d[2])
print(d[:2])
print(d[-4:-2])
print(d['hi'])

###
is
{0: 'My', 1: 'name'}
{0: 'My', 1: 'name'}
Traceback (most recent call last):
...
TypeError

The key point of the example above is to take out the dictionary's keys and slice that list of keys. The beauty of it is that we don't have to worry about out-of-range or negative indexes ourselves: slicing the dictionary is converted into slicing the list of its keys, which handles both cases for us. (This relies on dictionaries preserving insertion order, which Python guarantees since 3.7.)

4. Slicing iterators

Now that we've seen how to make ordinary custom objects support slicing, let's turn to a rather unusual class of objects.

Iterators are a distinctive, advanced kind of object in Python. They cannot be sliced on their own, but giving them slicing ability is like icing on the cake. So this section takes a close look at how to slice iterators.

4.1. Iteration and iterators

First, a few basic concepts need to be clarified: iteration, iterable, iterator.

Iteration is a way of traversing container-type objects (such as strings, lists, dictionaries, and so on). For example, iterating over the string “abc” means taking out its characters one by one from left to right. (PS: the Chinese word for iteration suggests going round and round, but in Python it means a one-way, linear pass; if the term is unfamiliar, I suggest simply thinking of it as traversal.)

So how do we express iteration in code? The most common syntax is the for loop.

# for loop implements iterative process
for char in "abc":
    print(char, end="")
# output: abc

However, not every object can be used in a for loop. For example, if the string “abc” is replaced by an integer, an error is raised: TypeError: 'int' object is not iterable.

The word “iterable” in this error message means “can be iterated over”: int is not iterable, while strings are, as are lists, tuples, dictionaries, and so on.

So how can we tell whether an object is iterable? Why are these types iterable? And how do we make an object iterable?

To make an object iterable, it must implement the iterable protocol, that is, the __iter__() magic method. In other words, any object that implements this magic method is an iterable.

So how can we tell whether an object implements this method? Besides the for loop above, I know of four other ways:

# Method 1: look for __iter__ in dir()
'__iter__' in dir(2)      # False
'__iter__' in dir("abc")  # True

# Method 2: isinstance() check
from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10
isinstance(2, Iterable)      # False
isinstance("abc", Iterable)  # True

# Method 3: hasattr() check
hasattr(2, "__iter__")      # False
hasattr("abc", "__iter__")  # True

# Method 4: call iter() and see whether it raises an error
iter(2)      # TypeError: 'int' object is not iterable
iter("abc")  # <str_iterator at 0x1e2396d8f28>

# PS: objects that only implement __getitem__ can also be iterable; that case is omitted here for simplicity.

The most notable of these is iter(), Python's built-in function that turns an iterable into an iterator. This tells us two things: (1) iterables and iterators are two different things; (2) an iterable can be turned into an iterator.

In fact, every iterator is an iterable, but an iterable is not necessarily an iterator. So how do they differ?

As shown in the blue circle in the figure above, the key difference between ordinary iterables and iterators can be summarized as follows: when an iterable is converted into an iterator, it loses some attributes (notably __getitem__) and gains others (notably __next__).
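The comparison can be reproduced with `dir()` and set differences. A small sketch using a list (other built-in sequences behave similarly); the full set of extra attributes varies between Python versions, so only the two attributes discussed here are checked:

```python
lst = [1, 2, 3]
it = iter(lst)

lost = set(dir(lst)) - set(dir(it))    # attributes the list has but its iterator lacks
gained = set(dir(it)) - set(dir(lst))  # attributes only the iterator has

print("__getitem__" in lost)   # True
print("__next__" in gained)    # True
print("__iter__" in dir(lst) and "__iter__" in dir(it))  # True: both are iterable
```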

Let's first look at the added attribute __next__. It is the key to what makes an iterator an iterator: in fact, we define iterators as objects that implement both the __iter__ and __next__ methods.
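As a minimal sketch of that definition, here is a hypothetical `Countdown` class: `__iter__()` returns the object itself, and `__next__()` produces the next value or raises `StopIteration` when nothing is left, which is the signal that for loops and `next()` rely on:

```python
class Countdown:
    """Hypothetical iterator that counts down from n to 1."""
    def __init__(self, n):
        self.current = n

    def __iter__(self):
        return self               # an iterator is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration   # nothing left: end the iteration
        value = self.current
        self.current -= 1
        return value

c = Countdown(3)
print(next(c))   # 3  (next() calls c.__next__())
print(list(c))   # [2, 1]  (list() keeps calling __next__ until StopIteration)
```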

With this extra method, an iterator can traverse itself without relying on external syntax such as the for loop. I coined two terms to describe these two kinds of traversal (PS: I call it traversal for ease of understanding; strictly speaking it is iteration): external traversal means traversal driven by external syntax, and self-traversal means traversal through the object's own method.

With these two terms, we can say that an iterable is an object that supports “external traversal”, and an iterator is an object that additionally supports “self-traversal”.

ob1 = "abc"
ob2 = iter("abc")
ob3 = iter("abc")

# ob1: external traversal
for i in ob1:
    print(i, end = "")   # abc
for i in ob1:
    print(i, end = "")   # abc
# ob1: self-traversal
ob1.__next__()  # AttributeError: 'str' object has no attribute '__next__'

# ob2: external traversal
for i in ob2:
    print(i, end = "")   # abc
for i in ob2:
    print(i, end = "")   # no output
# ob2: self-traversal
ob2.__next__()  # Error: StopIteration

# ob3: self-traversal
ob3.__next__()  # a
ob3.__next__()  # b
ob3.__next__()  # c
ob3.__next__()  # Error: StopIteration

As the examples above show, the advantage of an iterator is that it supports self-traversal. At the same time, it is one-way and non-repeatable: once the traversal is complete, calling __next__() again raises StopIteration, and a further for loop produces no output.

An analogy comes to mind: an ordinary iterable is like an ammunition magazine. Traversing it is like taking the bullets out and putting them back, so it can be traversed again and again (that is, running a for loop over it several times gives the same result). An iterator, on the other hand, is like a gun loaded with a magazine that cannot be removed: traversing or self-traversing it fires the bullets, a consuming traversal that cannot be repeated (that is, the traversal has an end).

To sum up: iteration is a way of traversing elements, and it comes in two forms, external iteration and internal iteration. Objects that support external iteration (external traversal) are iterables, and objects that also support internal iteration (self-traversal) are iterators. By how they are consumed, iteration can be divided into reusable iteration and one-time iteration: ordinary iterables are reusable, while iterators are one-time.
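The reusable versus one-time distinction can be sketched in code: an iterable's `__iter__()` hands out a fresh iterator on every traversal, while an iterator hands out itself and is therefore used up. The `Letters` class below is a hypothetical example:

```python
class Letters:
    """Hypothetical iterable (not an iterator): each traversal gets a fresh iterator."""
    def __init__(self, text):
        self.text = text

    def __iter__(self):
        return iter(self.text)   # a new str_iterator every time

reusable = Letters("abc")
print(list(reusable), list(reusable))   # ['a', 'b', 'c'] ['a', 'b', 'c']

one_time = iter("abc")
print(list(one_time), list(one_time))   # ['a', 'b', 'c'] []

print(hasattr(reusable, "__next__"))    # False: no self-traversal
```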

4.2. Iterator slicing

Now for the remaining difference: ordinary iterables lose some attributes when they are converted to iterators, the key one being __getitem__. In the previous section I introduced this magic method and used it to implement slicing for custom objects.

So the question is: why don’t iterators inherit this attribute?

First, an iterator uses consuming traversal, which makes it full of uncertainty: its length and its index-to-value mapping shrink dynamically as it is consumed, so looking up an item by index is hard to define and __getitem__ no longer makes sense. Second, forcing this attribute onto iterators would be unreasonable; as the Chinese saying goes, a melon picked by force is not sweet…

This raises a new question: if such an important attribute (along with others not discussed here) is lost, why use iterators at all?

The answer is that iterators offer powerful and useful capabilities that nothing else can replace, which is why Python designs them this way. For lack of space I won't expand on this topic here, but I will cover it later.

That's not all. The nagging question remains: can we give iterators this ability, so that they support slicing too?

hi = "Welcome to the public account: Python Cat"
it = iter(hi)

# Ordinary slicing
hi[-10:]  # 'Python Cat'

# Counter example: slicing an iterator
it[-10:]  # TypeError: 'str_iterator' object is not subscriptable

Iterators cannot use ordinary slicing syntax because they lack __getitem__. To slice them, there are essentially two options: build our own wheel and write the implementation logic, or find a ready-made wheel.
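For the first option, a bare-bones hand-written version might look like the generator below. The name `iter_slice` is hypothetical, and it deliberately handles only non-negative `start`/`stop` with a step of 1:

```python
def iter_slice(iterable, start, stop):
    """Hypothetical helper: yield the items at positions start .. stop-1.

    Only non-negative start/stop and a step of 1 are handled. It stops
    pulling items once position stop-1 is reached, so later items stay
    in the underlying iterator.
    """
    if stop <= start:
        return
    for index, item in enumerate(iterable):
        if index >= start:
            yield item
        if index + 1 >= stop:
            return

it = iter("Welcome to the public account: Python Cat")
print("".join(iter_slice(it, 0, 7)))   # Welcome
print(repr(next(it)))                  # ' '  (position 7 was left in the iterator)
print("".join(iter_slice(it, 0, 2)))   # to
```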

Python's itertools module is the wheel we're looking for: its islice() function makes iterator slicing easy.

import itertools

# Example 1: a simple iterator
s = iter("123456789")
for x in itertools.islice(s, 2, 6):
    print(x, end = " ")   # output: 3 4 5 6
for x in itertools.islice(s, 2, 6):
    print(x, end = " ")   # output: 9

# Example 2: a Fibonacci sequence iterator
class Fib:
    def __init__(self):
        self.a, self.b = 1, 1

    def __iter__(self):
        while True:
            yield self.a
            self.a, self.b = self.b, self.a + self.b
f = iter(Fib())
for x in itertools.islice(f, 2, 6):
    print(x, end = " ")  # output: 2 3 5 8
for x in itertools.islice(f, 2, 6):
    print(x, end = " ")  # output: 34 55 89 144

The islice() function of the itertools module answers the earlier question by neatly combining iterators with slicing. However, iterator slicing has many limitations compared with ordinary slicing. First, it is not a “pure function” (a pure function obeys the principle “same input, same output”): as Example 1 shows, the same call returns different results because the iterator has been consumed. Second, it only supports forward slicing: negative indexes and negative steps are not supported. Both limitations stem from the consuming nature of iterators.
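The second limitation is easy to observe. In the sketch below, `islice()` rejects a negative start with a `ValueError`. If you really need the last few items, one common workaround (not part of `islice()` itself) is `collections.deque` with `maxlen`, which keeps only the most recent items but still has to consume the whole iterator:

```python
from collections import deque
from itertools import islice

rejected = False
try:
    islice(iter(range(10)), -3, None)   # a negative start is not allowed
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)

# Workaround sketch: the whole iterator is consumed, only the last 3 items are kept
last_three = list(deque(iter(range(10)), maxlen=3))
print(last_three)   # [7, 8, 9]
```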

So I can't help asking: what logic does the slicing function of the itertools module use? Below is the pure-Python equivalent given in the official documentation (in CPython the real islice() is implemented in C):

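import sys

def islice(iterable, *args):
    # islice('ABCDEFG', 2) --> A B
    # islice('ABCDEFG', 2, 4) --> C D
    # islice('ABCDEFG', 2, None) --> C D E F G
    # islice('ABCDEFG', 0, None, 2) --> A C E G
    s = slice(*args)
    # The index range is [0, sys.maxsize] and the default step is 1.
    # stop is tested against None explicitly (rather than "s.stop or sys.maxsize"),
    # so that islice(iterable, 0) yields nothing instead of everything.
    start, step = s.start or 0, s.step or 1
    stop = sys.maxsize if s.stop is None else s.stop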
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        # Consume *iterable* up to the *start* position.
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        # Consume to *stop*.
        for i, element in zip(range(i + 1, stop), iterable):
            pass

Although islice() can only move forward through the indexes, it makes it possible to slice an infinite iterator (within what the system supports, i.e. positions up to sys.maxsize). This is the most imaginative use of iterator slicing.

In addition, iterator slicing has a very practical application: reading a given range of lines from a file object.

As we know, there are two main ways to read content from a file (see the previous article on reading and writing files): read() is suitable for reading small amounts of content, or for processing everything at once; iterating over the file object line by line (for example, for line in f) is more useful for large files, because it reads lazily, reducing memory pressure and making line-by-line processing easy. (Note that readlines(), despite its name, reads all lines into a list at once.)

Although line-by-line iteration has this advantage, it still goes from the first line to the last, which is wasteful if the file has thousands of lines and we only want a few specific ones (lines 1000-1009, for example). Since file objects are iterators by nature, we can use iterator slicing to cut out the lines we want first and process them afterwards, which is much more efficient than counting lines ourselves in Python code. (The skipped lines are still read from the file, but we no longer have to handle them.)

# test.txt: a text file whose 3rd and 4th lines are:
# python is a cat.
# this is the end.

from itertools import islice
with open('test.txt', 'r', encoding='utf-8') as f:
    print(hasattr(f, "__next__"))  # check whether f is an iterator
    content = islice(f, 2, 4)      # lines 3-4 (0-based positions 2 and 3)
    for line in content:
        print(line.strip())
###
True
python is a cat.
this is the end.
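Since the snippet above depends on a local test.txt, here is a self-contained sketch of the “lines 1000-1009” scenario. It uses `io.StringIO`, an in-memory text stream that, like a file object, is its own iterator; the generated content is made up for the demo:

```python
import io
from itertools import islice

# An in-memory stand-in for a large file: 5000 numbered lines
fake_file = io.StringIO("".join(f"line {n}\n" for n in range(1, 5001)))

# Lines 1000-1009 (1-based) are 0-based positions 999..1008
wanted = [line.rstrip("\n") for line in islice(fake_file, 999, 1009)]
print(wanted[0], "...", wanted[-1])   # line 1000 ... line 1009
print(len(wanted))                    # 10
```

The 999 skipped lines are still read from the stream; islice() simply discards them for you.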

An iterator is a special kind of iterable that supports both external traversal and self-traversal. However, its traversal is consuming and cannot be repeated, so iterators do not support slicing by themselves. With the itertools module we can slice iterators and combine the advantages of both. The main purpose is to take fragments from large iterators (such as infinite sequences or large files) for precise processing, which can greatly improve performance and efficiency.

5. Summary

To conclude, slicing is an advanced Python feature that is commonly, but not only, used to extract elements of sequence types. This article covered its basic usage, advanced usage (such as using slices as placeholders), slicing custom objects, and slicing iterators. Slicing has many other use cases as well, such as multidimensional slicing in NumPy, memoryview slicing, and slicing asynchronous iterators, all worth exploring. Space is limited today, so I can't go into detail; follow the public account “Python Cat” and we will learn about them together later.

Slicing series (the original standalone articles):

Python Advanced: Pitfalls and advanced uses of slicing

Python Advanced: Slicing custom objects

Python Advanced: iterators and iterator slices

Related links:

__getitem__ usage: t.cn/EbzoZyp

Slice assignment source code analysis: t.cn/EbzSaoZ

Official itertools module documentation: t.cn/EbNc0ot

Does Python support copying strings?

Advice from Kenneth Reitz: Avoid unnecessary object-oriented programming

A guide to reading and writing files for Python learners

Seven ways to concatenate strings in Python, explained

—————–

This article was originally published on the WeChat public account [Python Cat]. Reply “Love learning” to the account to get 20+ selected e-books for free.