For loops, iterables, iterators, and generators in Python

Question:

  • While learning about lists and dicts, I ran into a common problem: how do you delete elements from a list or dict correctly while iterating over it? For example, deleting from a dict during iteration raises RuntimeError: dictionary changed size during iteration, and deleting from a list during iteration leaves some elements behind. This led me to another question: what actually happens to a list or dict when we traverse it with a for loop? What is the nature of the for loop?
  • After reading up on it, I realized that this is an iterator-related problem, so I took the opportunity to look more closely at Python’s for loops, iterables, iterators, and generators.

1 Iteration

“Iteration is the activity of repeating a feedback process, often to get closer to a desired goal or result.” In Python, iterables, iterators, and for loops are all closely related to iteration.

1.1 Iterable objects (Iterable)

  • In Python, an object that can be iterated over is called an iterable. To check whether an object is iterable, we can test whether it is an instance of Iterable:

    >>> from collections.abc import Iterable
    >>> isinstance([], Iterable)
    True
    >>> isinstance(123, Iterable)
    False
  • This gives us a way to check whether an object is iterable, but what makes an object iterable in the first place? Its class implements __iter__():

    >>> class A:
            pass
    >>> isinstance(A(), Iterable)
    False
    >>> class B:
            def __iter__(self):
                pass
    >>> isinstance(B(), Iterable)
    True

    Thus, instances of a class that implements an __iter__() method pass the Iterable check. Note that the check only looks for the presence of __iter__(); the method body here can be empty, although such an object cannot actually be looped over until __iter__() returns an iterator.
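
  • As a side note to the check above, here is a minimal sketch of its limits. The class names OnlyGetItem and EmptyIter and the helper can_iterate are made up for this illustration. The isinstance test only looks for __iter__(), so it can disagree with what iter() actually accepts:

```python
from collections.abc import Iterable


class OnlyGetItem:
    """Hypothetical class: indexable, but defines no __iter__()."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * index


class EmptyIter:
    """Same shape as class B above: __iter__() exists but returns None."""

    def __iter__(self):
        pass


def can_iterate(obj):
    """Return True if iter(obj) succeeds, False if it raises TypeError."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


# isinstance() says "not iterable", yet iter() falls back to __getitem__().
print(isinstance(OnlyGetItem(), Iterable), can_iterate(OnlyGetItem()))  # False True
# isinstance() says "iterable", yet iter() rejects the None return value.
print(isinstance(EmptyIter(), Iterable), can_iterate(EmptyIter()))      # True False
print(list(OnlyGetItem()))                                              # [0, 1, 4]
```

    In other words, the isinstance check is a quick test, while calling iter() on the object is the more reliable one.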

1.2 Iterators (Iterator)

  • In Python, iterators are represented by the Iterator class. Compared with an iterable, an iterator implements one more method, __next__():

    >>> from collections.abc import Iterator
    >>> class C:
            def __iter__(self):
                pass

            def __next__(self):
                pass
    >>> isinstance(C(), Iterator)
    True

    Clearly, every iterator is also an iterable (an iterator implements both __iter__() and __next__()), but an iterable is not necessarily an iterator.

  • Let’s see whether the iterables among the built-in types are also iterators:

    >>> isinstance(C(), Iterator)
    True
    >>> isinstance([], Iterable)
    True
    >>> isinstance([], Iterator)
    False
    >>> isinstance('123', Iterable)
    True
    >>> isinstance('123', Iterator)
    False
    >>> isinstance({}, Iterable)
    True
    >>> isinstance({}, Iterator)
    False

    It follows that str, list, and dict objects are all iterable, but none of them are iterators.

  • At this point, we have a basic conceptual understanding of iterables and iterators, as well as the __iter__() and __next__() methods. But how do these two magic methods work? What do they have to do with the for loop?
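
  • Before moving on, the practical difference between an iterable and an iterator from this section can be sketched as follows. This is only an illustration with made-up variable names: a list can hand out a fresh iterator every time, while an iterator can be consumed only once.

```python
numbers = [1, 2, 3]

it = iter(numbers)
same_object = iter(it) is it      # an iterator's __iter__() returns itself

first_pass = list(it)             # consumes the iterator
second_pass = list(it)            # nothing is left the second time

fresh_pass = list(iter(numbers))  # the list hands out a new iterator

print(same_object)   # True
print(first_pass)    # [1, 2, 3]
print(second_pass)   # []
print(fresh_pass)    # [1, 2, 3]
```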

1.3 The for loop

1.3.1 The iter() and next() functions
  • iter() and next() are both built-in functions provided by Python. Calling iter() on an object calls its __iter__() method, and calling next() on an object calls its __next__() method. Now let’s look at each relationship in turn.
1.3.2 iter() and __iter__()
  • The __iter__() method returns an iterator. We can call an object’s __iter__() method using the built-in iter() function.

  • The example below calls iter() on instances of two classes to show which __iter__() method runs:

    >>> class A:
            def __iter__(self):
                print('Execute the __iter__() method of class A')
                return B()

    >>> class B:
            def __iter__(self):
                print('Execute the __iter__() method of class B')
                return self
            def __next__(self):
                pass

    >>> a = A()
    >>> a1 = iter(a)
    Execute the __iter__() method of class A
    >>> b = B()
    >>> b1 = iter(b)
    Execute the __iter__() method of class B

    As you can see, the __iter__() method of class A returns B(), which is an iterator. The __iter__() method of class B returns self directly, because an instance of B is itself an iterator. We could return other iterators here, but if __iter__() returns something that is not an iterator, calling iter() raises an error:

    >>> class C:
            def __iter__(self):
                pass

    >>> iter(C())
    Traceback (most recent call last):
      File "<pyshell#4>", line 1, in <module>
        iter(C())
    TypeError: iter() returned non-iterator of type 'NoneType'


    >>> class D:
            def __iter__(self):
                return []

    >>> iter(D())
    Traceback (most recent call last):
      File "<pyshell#8>", line 1, in <module>
        iter(D())
    TypeError: iter() returned non-iterator of type 'list'
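  • One possible way to repair class D, shown here only as a sketch: return the list’s iterator rather than the list itself. The class name FixedD is hypothetical.

```python
class FixedD:
    """Hypothetical variant of class D whose __iter__() returns an iterator."""

    def __iter__(self):
        # iter([...]) is a list_iterator, which implements __next__().
        return iter([1, 2, 3])


values = list(FixedD())
print(values)  # [1, 2, 3]

for item in FixedD():
    print(item)
```

    The list is an iterable, not an iterator, so it has to be wrapped in iter() before __iter__() can return it.
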
1.3.3 next() and __next__()
  • The __next__() method returns the next element of the iteration. If there is no next element, it raises a StopIteration exception. Normally, we call an object’s __next__() method through the built-in function next().

  • Let’s use the list object as an example to see how next iterates:

    >>> l1 = [1, 2, 3]
    >>> next(l1)
    Traceback (most recent call last):
      File "<pyshell#1>", line 1, in <module>
        next(l1)
    TypeError: 'list' object is not an iterator

    The error message says that a 'list' object is not an iterator: a list does not implement __next__(). So how can we use next() with a list? From the __iter__() method introduced earlier, we know that it returns an iterator, and an iterator implements __next__(). So we can first call iter() on the list to get its iterator, and then call next() on that iterator:

    >>> l1 = [1, 2, 3]
    >>> l1_iter = iter(l1)
    >>> type(l1_iter)
    <class 'list_iterator'>
    >>> next(l1_iter)
    1
    >>> next(l1_iter)
    2
    >>> next(l1_iter)
    3
    >>> next(l1_iter)
    Traceback (most recent call last):
      File "<pyshell#6>", line 1, in <module>
        next(l1_iter)
    StopIteration
  • Consider: why does __next__() keep fetching elements and raise an exception at the end, instead of the caller counting calls based on the length of the object?

    I think it’s because we can call an object’s __next__() method manually through next(), and next() knows nothing about the length of the object, so the end of the data has to be signaled from inside __next__().
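
  • To illustrate the point above, here is a small sketch of the two usual ways to handle the end of an iterator by hand. The helper name read_all is made up for this example. next() accepts an optional second argument that is returned instead of raising StopIteration:

```python
letters = iter("ab")

# With a default value, next() never raises at the end.
first = next(letters, None)
second = next(letters, None)
third = next(letters, None)
print(first, second, third)  # a b None


def read_all(iterator):
    """Collect every remaining item, stopping when StopIteration is raised."""
    items = []
    while True:
        try:
            items.append(next(iterator))
        except StopIteration:
            return items


print(read_all(iter([10, 20, 30])))  # [10, 20, 30]
```

    Neither version needs to know the length in advance, which matters for iterators whose length is unknown.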

1.3.4 Custom classes implementing __iter__() and __next__()

Let’s reproduce the list iteration process with our own classes:

  1. First we define a class A, which is an iterable. It has a member variable m_Lst, and its __iter__() method returns an iterator, B():

    >>> class A:
            def __init__(self, lst):
                self.m_Lst = lst
            def __iter__(self):
                return B(self.m_Lst)
  2. For the iterator class B, we implement both __iter__() and __next__(). Note that __next__() must raise StopIteration when the elements run out. The class has two member variables, self.m_Lst and self.m_Index, which track the iteration:

    >>> class B:
            def __init__(self, lst):
                self.m_Lst = lst
                self.m_Index = 0
            def __iter__(self):
                return self
            def __next__(self):
                try:
                    value = self.m_Lst[self.m_Index]
                    self.m_Index += 1
                    return value
                except IndexError:
                    raise StopIteration()
  3. Now that we have the iterator ready, let’s practice iterating. To better illustrate the process, we can add some printing:

    >>> class A:
            def __init__(self, lst):
                self.m_Lst = lst
            def __iter__(self):
                print('call A().__iter__()')
                return B(self.m_Lst)

    >>> class B:
            def __init__(self, lst):
                self.m_Lst = lst
                self.m_Index = 0
            def __iter__(self):
                print('call B().__iter__()')
                return self
            def __next__(self):
                print('call B().__next__()')
                try:
                    value = self.m_Lst[self.m_Index]
                    self.m_Index += 1
                    return value
                except IndexError:
                    print('call B().__next__() except IndexError')
                    raise StopIteration()

    >>> l = [1, 2, 3]
    >>> a = A(l)
    >>> a_iter = iter(a)
    call A().__iter__()
    >>> next(a_iter)
    call B().__next__()
    1
    >>> next(a_iter)
    call B().__next__()
    2
    >>> next(a_iter)
    call B().__next__()
    3
    >>> next(a_iter)
    call B().__next__()
    call B().__next__() except IndexError
    Traceback (most recent call last):
      File "<pyshell#5>", line 11, in __next__
        value = self.m_Lst[self.m_Index]
    IndexError: list index out of range

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<pyshell#12>", line 1, in <module>
        next(a_iter)
      File "<pyshell#5>", line 16, in __next__
        raise StopIteration()
    StopIteration
  4. As you can see, calling iter() and next() by hand shows the entire traversal process step by step. Now that we know about iterables, iterators, and __iter__() and __next__(), what does a for loop have to do with them?
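
  • One consequence of splitting the work between an iterable and a separate iterator, as classes A and B do, can be sketched like this. The names NumberList and NumberListIterator are hypothetical stand-ins for A and B, written without the print calls. Each call to iter() creates a new iterator with its own index, so several traversals of the same data do not interfere with each other:

```python
class NumberList:
    """Plays the role of class A: an iterable that wraps a list."""

    def __init__(self, items):
        self._items = items

    def __iter__(self):
        # A new iterator (with its own index) for every traversal.
        return NumberListIterator(self._items)


class NumberListIterator:
    """Plays the role of class B: an iterator that tracks a position."""

    def __init__(self, items):
        self._items = items
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._items):
            raise StopIteration
        value = self._items[self._index]
        self._index += 1
        return value


nums = NumberList([1, 2, 3])
first = iter(nums)
second = iter(nums)

print(next(first), next(first))  # 1 2
print(next(second))              # 1  (second has its own index)
print(list(nums), list(nums))    # [1, 2, 3] [1, 2, 3]
```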

1.3.5 Exploring the for loop
  • The for loop is one of the operations we use most often. It traverses containers (lists, dictionaries, and so on) that have one thing in common: they are all iterable. So an instance a of our custom class A can also be traversed with a for loop:

    >>> for i in a:
            print(i)
    call A().__iter__()
    call B().__next__()
    1
    call B().__next__()
    2
    call B().__next__()
    3
    call B().__next__()
    call B().__next__() except IndexError

    >>> for i in a:
            pass
    call A().__iter__()
    call B().__next__()
    call B().__next__()
    call B().__next__()
    call B().__next__()
    call B().__next__() except IndexError
  • The printed output makes the process clear: when a for loop iterates over an iterable, it first calls the object’s __iter__() method to get an iterator, then repeatedly calls the iterator’s __next__() method to get the next element, and finally catches StopIteration to end the loop. (Try catching IndexError in class B’s __next__() method without raising StopIteration: the method then returns None every time, and the for loop runs forever.)

  • Since the for loop catches StopIteration automatically in order to stop, can we simulate this process with a while loop?

    >>> a_iter = iter(a)
    call A().__iter__()
    >>> while True:
            try:
                i = next(a_iter)
                print(i)
            except StopIteration:
                print('except StopIteration')
                break

    call B().__next__()
    1
    call B().__next__()
    2
    call B().__next__()
    3
    call B().__next__()
    call B().__next__() except IndexError
    except StopIteration

    By now you should have a reasonable understanding of how a for loop traverses an iterable. For a deeper understanding, you can study the source code alongside this; this article explains the concepts through working code and does not cover the source itself.
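
  • The process described in this section can be wrapped up in one function. This is only a sketch of the idea, not how the interpreter implements the for statement, and the name simulate_for is made up. It calls iter() once, then calls next() until StopIteration appears:

```python
def simulate_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)          # step 1: get an iterator
    while True:
        try:
            item = next(iterator)      # step 2: ask for the next element
        except StopIteration:
            break                      # step 3: the iterator is exhausted
        body(item)


seen = []
simulate_for([1, 2, 3], seen.append)   # a list
simulate_for("ab", seen.append)        # a str
simulate_for({"k": 1}, seen.append)    # a dict yields its keys
print(seen)  # [1, 2, 3, 'a', 'b', 'k']
```

    Only the next() call sits inside the try block, so a StopIteration raised by the loop body itself is not mistaken for the end of the iteration.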

2 Generators

Iterators and generators are often mentioned together, so how are they related? A generator is a special kind of iterator.

2.1 Obtaining a generator

  • When a function uses the yield keyword in its body, we call it a generator function. Calling a generator function does not run its body; instead it returns a generator object, which Python automatically provides with __iter__() and __next__() methods.

  • Code examples:

    >>> from collections.abc import Iterator
    >>> def generator():
            print('first')
            yield 1
            print('second')
            yield 2
            print('third')
            yield 3

    >>> gen = generator()
    >>> isinstance(gen, Iterator)
    True

2.2 next(generator)

  • Since a generator is a special kind of iterator, let’s call next() on it:

    >>> next(gen)
    first
    1
    >>> next(gen)
    second
    2
    >>> next(gen)
    third
    3
    >>> next(gen)
    Traceback (most recent call last):
      File "<pyshell#19>", line 1, in <module>
        next(gen)
    StopIteration

    If we add a return statement to the generator() function, the return value is attached to the StopIteration exception and appears in the traceback:

    >>> from collections.abc import Iterator
    >>> def generator():
            print('first')
            yield 1
            print('second')
            yield 2
            print('third')
            yield 3
            return 'end'
    
    >>> gen = generator()
    >>> isinstance(gen, Iterator)
    True
    >>> next(gen)
    first
    1
    >>> next(gen)
    second
    2
    >>> next(gen)
    third
    3
    >>> next(gen)
    Traceback (most recent call last):
      File "<pyshell#7>", line 1, in <module>
        next(gen)
    StopIteration: end
  • As you can see, when we call next() on a generator, the function body runs until it reaches a yield and returns the value that follows it. When we call next() again, execution resumes from that point and continues to the next yield statement. When execution reaches the end of the function without hitting another yield, StopIteration is raised.
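
  • The return value shown in the traceback above can also be read in code. The sketch below uses a shortened version of generator() without the print calls; the variable names are illustrative. The value passed to return is available as the value attribute of the StopIteration exception:

```python
def generator():
    yield 1
    yield 2
    return 'end'


gen = generator()
values = []
while True:
    try:
        values.append(next(gen))
    except StopIteration as stop:
        result = stop.value   # the value given to `return`
        break

print(values)  # [1, 2]
print(result)  # end

# A for loop (or list()) swallows StopIteration, so the return value is lost.
print(list(generator()))  # [1, 2]
```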

2.3 Generators and iterators

  • We saw above that generators are essentially iterators, but apart from yield, what is special about them? Lazy evaluation.

  • Lazy evaluation here means that each call to next() on the generator produces only one value. The advantage is that when iterating over a large number of elements, we don’t need to build all of them at once but only one at a time, which can save a lot of memory. (Personal understanding: note the difference from the iterator we wrote above. Although that iterator also returns only one value per next() call, all of its values are already stored in memory (self.m_Lst in classes A and B). A generator does not store all the values first; it computes one each time next() is called. Iterators in general can also be written to compute values on demand; a generator is simply the most convenient way to do so.)

  • Let’s look at a practical example: output all even numbers up to 10000000. (Whether the values need to be stored in a real business setting depends on the situation; here we only discuss the difference.)

    First, the version that stores everything (here we use a list directly):

    >>> def iterator():
            lst = []
            index = 0
            while index <= 10000000:
                if index % 2 == 0:
                    print(index)
                    lst.append(index)
                index += 1
            return lst

    >>> result = iterator()

    Then use the generator to implement:

    >>> def generator():
            index = 0
            while index <= 10000000:
                if index % 2 == 0:
                    yield index
                index += 1

    >>> gen = generator()
    >>> next(gen)
    0
    >>> next(gen)
    2
    >>> next(gen)
    4
    >>> next(gen)
    6
    >>> next(gen)
    8
  • Lazy evaluation also has a drawback: for containers such as lists and dicts, len() returns the length directly, but a generator only knows how to produce its next element, so it cannot report a length.

  • Let’s summarize the similarities and differences between generators and the iterators written above:

    1. A generator is a special kind of iterator;
    2. A hand-written iterator returns each value from __next__() with return, while a generator function produces each value with yield. Calling next() on a generator runs it up to the next yield statement and pauses there;
    3. The list-backed iterator above keeps all elements in memory, while a generator evaluates lazily and only knows the current element.
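
  • The memory difference and the missing len() can be checked with a rough sketch like the one below. The function name even_numbers and the smaller LIMIT are choices made for this illustration. Note that sys.getsizeof() reports only the size of the container object itself, not of the integers it refers to, so the real gap is even larger:

```python
import sys
from itertools import islice

LIMIT = 100_000  # smaller than in the article so the sketch runs quickly


def even_numbers(limit):
    """Yield the even numbers from 0 to limit, one at a time."""
    index = 0
    while index <= limit:
        if index % 2 == 0:
            yield index
        index += 1


as_list = list(even_numbers(LIMIT))   # every value is built and stored
as_gen = even_numbers(LIMIT)          # nothing is computed yet

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(list_bytes > gen_bytes)  # True

# islice() takes just the first few values without computing the rest.
first_five = list(islice(as_gen, 5))
print(first_five)  # [0, 2, 4, 6, 8]

# A generator has no len().
try:
    len(as_gen)
    has_len = True
except TypeError:
    has_len = False
print(has_len)  # False
```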

2.4 Generator expressions

  • List comprehensions are the most common kind of comprehension (there are also dict comprehensions and set comprehensions):

    >>> lst = [i for i in range(10) if i % 2 == 1]
    >>> lst
    [1, 3, 5, 7, 9]
  • Generator expressions are similar to list comprehensions; we simply replace [] with ():

    >>> gen = (i for i in range(10) if i % 2 == 1)
    >>> gen
    <generator object <genexpr> at 0x00000193E2945A80>
    >>> next(gen)
    1
    >>> next(gen)
    3
    >>> next(gen)
    5
    >>> next(gen)
    7
    >>> next(gen)
    9
    >>> next(gen)
    Traceback (most recent call last):
      File "<pyshell#11>", line 1, in <module>
        next(gen)
    StopIteration
  • At this point, we have two ways of creating generators:

    1. A generator function (using yield) returns a generator when called
    2. A generator expression evaluates to a generator
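
  • A short sketch of how generator expressions are commonly used: they can be passed directly to functions that consume an iterable, such as sum(), without building a list first. Like any generator, they can be consumed only once. The variable names are illustrative.

```python
# The extra parentheses can be dropped when the expression is the only argument.
total = sum(i for i in range(10) if i % 2 == 1)
print(total)  # 25

squares = (i * i for i in range(4))
first_pass = list(squares)
second_pass = list(squares)   # the generator is already exhausted
print(first_pass)   # [0, 1, 4, 9]
print(second_pass)  # []
```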

3 Problem Solving

  • Finally, back to our original question: How do I delete properly while iterating through a list or dict?

  • First let’s explore the cause of the error, using the list object as an example:

    >>> lst = [1, 2, 3]
    >>> for i in lst:
            print(i)
            lst.remove(i)

    1
    3

    As you can see, we delete the current element while iterating over the list and printing it, and the actual output (1 and 3, with 2 skipped) is not what we wanted. Here is my personal understanding (a more precise answer may require reading the source code):

    1. When an element is removed from a list, the indices of the elements after it shift down by one. (This is because Python’s list is implemented as an array underneath.)

    2. By analogy with our custom implementation of iterators, we can see that we increment the index in the __next__() method:

      >>> class A:
              def __init__(self, lst) :
                  self.m_Lst = lst
              def __iter__(self) :
                  print('call A().__iter__()')
                  return B(self.m_Lst)
              
      >>> class B:
              def __init__(self, lst) :
                  self.m_Lst = lst
                  self.m_Index= 0
              def __iter__(self) :
                  print('call B().__iter__()')
                  return self
              def __next__(self) :
                  print('call B().__next__()')
                  try:
                      value = self.m_Lst[self.m_Index]
                      self.m_Index += 1
                      return value
                  except IndexError:
                      print('call B().__next__() except IndexError')
                      raise StopIteration()

      From this we can guess that the iterator of a list object also keeps an index member variable, which its __next__() method uses to locate the next element.

    3. When we traverse a list with a for loop, the loop actually calls next() on the list’s iterator. After the first pass prints 1 and removes it, the list is [2, 3] but the iterator’s index has already advanced to 1, so the next call returns 3 and element 2 is skipped. After 3 is removed, the list is [2] and index 2 is out of range, so the loop ends. The final output is therefore 1, 3, and the list still contains 2.

  • A dict behaves similarly, except that deleting an entry while iterating over it is detected and reported as RuntimeError: dictionary changed size during iteration.
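
  • To come back to the opening question, here are a few common ways to delete safely. They are a sketch of typical approaches rather than the only correct ones, and the variable names are illustrative. The idea in each case is to iterate over one object and modify another:

```python
# 1. Iterate over a shallow copy of the list and remove from the original.
lst = [1, 2, 3, 4]
for item in lst[:]:
    if item % 2 == 0:
        lst.remove(item)
print(lst)  # [1, 3]

# 2. Build a new list that keeps only the wanted elements.
kept = [item for item in [1, 2, 3, 4] if item % 2 != 0]
print(kept)  # [1, 3]

# 3. For a dict, iterate over a snapshot of its keys.
d = {'a': 1, 'b': 2, 'c': 3}
for key in list(d):
    if d[key] % 2 == 1:
        del d[key]
print(d)  # {'b': 2}

# For comparison: deleting while iterating over the dict itself fails.
d2 = {'a': 1, 'b': 2}
try:
    for key in d2:
        del d2[key]
except RuntimeError as exc:
    message = str(exc)
print(message)  # dictionary changed size during iteration
```

    The copy in approaches 1 and 3 costs extra memory, so for large containers building a new filtered container (approach 2) is often the simpler choice.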