This article introduces yield and generators in detail: what a generator is, how to create one, its characteristics, basic and advanced use cases, and points to watch out for when using generators. It does not cover the enhanced generators of PEP 342, which will be covered in a later blog post.

Generator basics

In a Python function definition, the presence of a yield expression makes the function a generator function. Calling a generator function returns a generator. For example:

def gen_generator():
    yield 1

def gen_value():
    return 1

if __name__ == '__main__':
    ret = gen_generator()
    print ret, type(ret)    # <generator object gen_generator at 0x02645648> <type 'generator'>
    ret = gen_value()
    print ret, type(ret)    # 1 <type 'int'>

As the code above shows, gen_generator returns a generator instance, which has the following properties:

  • It follows the iterator protocol, that is, it implements the __iter__ and next methods
  • It can be entered and can return multiple times, and it can suspend execution of the code in the function body

Let’s look at the test code:

>>> def gen_example():
...     print 'before any yield'
...     yield 'first yield'
...     print 'between yields'
...     yield 'second yield'
...     print 'no yield anymore'
...
>>> gen = gen_example()
>>> gen.next()  # first call to next
before any yield
'first yield'
>>> gen.next()  # second call to next
between yields
'second yield'
>>> gen.next()  # third call to next
no yield anymore
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Calling gen_example prints nothing, which shows that the function body has not started executing yet. When the generator’s next method is called, the generator runs up to the yield expression, returns the value of that expression, and then suspends there, so the first call to next prints the first message and returns 'first yield'. Suspending means that the function's local variables, instruction pointer, and execution environment are saved until next is called again. The second call to next suspends at the last yield, and the third call to next() raises StopIteration.
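The point about saved local state can be checked with a small sketch. It is written for Python 3 (the listings in this article are Python 2), and the countdown function is a hypothetical example, not part of the original code. It uses the built-in next(), which is also available in Python 2.6 and later.

```python
def countdown(start):
    # 'remaining' is a local variable; its value survives each suspension
    remaining = start
    while remaining > 0:
        yield remaining
        remaining -= 1

gen = countdown(3)
same_object = iter(gen) is gen  # a generator is its own iterator
first = next(gen)               # runs the body up to the first yield
second = next(gen)              # resumes after the yield with 'remaining' intact
rest = list(gen)                # drains whatever is left
print(first, second, rest)      # 3 2 [1]
```

Each call resumes exactly where the previous one stopped, which is why the second value is 2 rather than 3.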

Because the for statement catches StopIteration automatically, the more common way to consume a generator (or any iterator) is in a loop:

def generator_example():
    yield 1
    yield 2

if __name__ == '__main__':
    for e in generator_example():
        print e
        # output 1 2

What is the difference between a generator function and a regular function?

  • A function starts from its first line on every call, whereas a generator resumes from the point where the last yield suspended it
  • A function call returns one value (or one set of values) once, whereas a generator can "return" many times
  • A function can be called any number of times, whereas a generator instance cannot be resumed after it has yielded its last value or executed return
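The last difference is easy to trip over, so here is a minimal sketch of it. It is written for Python 3, and two_values is a hypothetical generator function used only for illustration.

```python
def two_values():
    yield 'a'
    yield 'b'

gen = two_values()
first_pass = list(gen)           # consumes both values
second_pass = list(gen)          # the same instance is now exhausted
fresh_pass = list(two_values())  # calling the function again creates a new generator
print(first_pass, second_pass, fresh_pass)  # ['a', 'b'] [] ['a', 'b']
```

An exhausted generator does not raise an error when iterated again; it simply yields nothing, which can hide bugs when the same instance is passed to two consumers.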

One way to create a generator is to use yield in a function and then call that function. Another common way is a generator expression, which looks like a list comprehension but uses () instead of []. For example:

>>> gen = (x * x for x in xrange(5))
>>> print gen
<generator object <genexpr> at 0x02655710>

Generator applications

Basic applications

The most important reason to use generators is that they produce and “return” results on demand instead of building all the return values at once; sometimes the full set of return values is not even known in advance. Consider the following code:

RANGE_NUM = 100
for i in [x*x for x in range(RANGE_NUM)]: # First method: Iterate over the list
    # do sth for example
    print i

for i in (x*x for x in range(RANGE_NUM)): # Second method: Iterate over the Generator
    # do sth for example
    print i

In the code above, both for statements produce the same output, and the code differs only in square brackets versus parentheses. The difference matters, however: the first form builds a list, while the second creates a generator object. As RANGE_NUM grows, the list built by the first form grows and consumes more memory, whereas the generator in the second form produces one value at a time and its size does not grow.
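A rough way to see the memory difference is sys.getsizeof. The sketch below is written for Python 3; the variable names are illustrative, and the exact byte counts depend on the interpreter, so none are shown. Note that getsizeof on a list measures only the list's own pointer array, not the integers it refers to, so the real gap is larger than what is measured here.

```python
import sys

RANGE_NUM = 100000
squares_list = [x * x for x in range(RANGE_NUM)]  # all values built up front
squares_gen = (x * x for x in range(RANGE_NUM))   # nothing computed yet

list_size = sys.getsizeof(squares_list)  # grows with RANGE_NUM
gen_size = sys.getsizeof(squares_gen)    # small and independent of RANGE_NUM
print(list_size, gen_size)

# Both forms still produce the same values when consumed
same_total = sum(squares_gen) == sum(squares_list)
```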

Let’s look at another example that can “return” an infinite number of times:

def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a+b

A generator can produce an unlimited number of “return values”, so it is up to the caller to decide when to stop iterating.
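One way for the caller to stop is itertools.islice from the standard library, which takes a fixed number of items from any iterator. The sketch below is written for Python 3 and repeats the fib generator so that it is self-contained; the name first_ten is illustrative.

```python
from itertools import islice

def fib():
    a, b = 1, 1
    while True:        # never terminates on its own
        yield a
        a, b = b, a + b

# Take only the first ten values; fib() is never run past that point
first_ten = list(islice(fib(), 10))
print(first_ten)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

A plain for loop with a break condition works just as well when the stopping rule depends on the values rather than on a count.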

Advanced applications

Usage Scenario 1:

A generator can be used to produce a stream of data. It does not produce a value immediately; it produces one only when the consumer asks for it, so the consumer actively pulls data from the producer. This gives us a general, on-demand data flow.

def gen_data_from_file(file_name):
    # Producer: yield the file's lines one at a time
    with open(file_name) as f:
        for line in f:
            yield line

def gen_words(line):
    # Producer: yield the non-empty words of a line
    for word in (w for w in line.split() if w.strip()):
        yield word

def count_words(file_name):
    # Consumer: count how often each word occurs
    word_map = {}
    for line in gen_data_from_file(file_name):
        for word in gen_words(line):
            if word not in word_map:
                word_map[word] = 0
            word_map[word] += 1
    return word_map

def count_total_chars(file_name):
    # Consumer: sum the lengths of all lines
    total = 0
    for line in gen_data_from_file(file_name):
        total += len(line)
    return total

if __name__ == '__main__':
    print count_words('test.txt'), count_total_chars('test.txt')

The example above comes from a talk at PyCon 2008. gen_words and gen_data_from_file are the data producers, and count_words and count_total_chars are the data consumers. As you can see, data is pulled only when it is needed rather than prepared in advance. Note also that the expression (w for w in line.split() if w.strip()) inside gen_words is itself a generator.
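The pull behaviour can be made visible by recording the order in which producer and consumer run. This is a sketch for Python 3 under simplifying assumptions: an in-memory list stands in for the file, and gen_lines, consume, and events are hypothetical names that do not appear in the original example.

```python
events = []  # records the order in which producer and consumer run

def gen_lines(lines):
    # Producer: hands out one line each time the consumer asks
    for line in lines:
        events.append('produce ' + line)
        yield line

def consume(lines):
    # Consumer: each loop iteration pulls exactly one line
    for line in gen_lines(lines):
        events.append('consume ' + line)

consume(['a', 'b'])
print(events)  # ['produce a', 'consume a', 'produce b', 'consume b']
```

The events interleave: the second line is not produced until the first has been consumed, which is what lets the same pattern handle files too large to hold in memory.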

Usage Scenario 2:

In some programming scenarios, a task runs part of its logic, then waits for some time, for an asynchronous result, or for some state, and then runs the rest of its logic. For example, in a microservice architecture, service A runs a piece of logic, requests some data from service B, and then continues executing on service A. Or in game programming, a skill may be split into several stages: perform an action (effect), wait a while, then continue. For this kind of waiting without blocking, we usually use callbacks. Here’s a simple example:

def do(a):
    print 'do', a
    CallBackMgr.callback(5, lambda a=a: post_do(a))

def post_do(a):
    print 'post_do', a

Here, CallBackMgr registers a 5-second timer and calls the lambda when it expires. One piece of logic is split into two functions, and the context has to be passed along explicitly (the argument a here). Let’s rewrite this example with yield, where the yielded value is the time to wait.

@yield_dec
def do(a):
    print 'do', a
    yield 5
    print 'post_do', a

How to read this:

  • Callback mode: when do runs, it registers a 5-second timer with CallBackMgr; after 5 seconds the callback (the anonymous function) runs post_do, so one function is executed in two stages.
  • Yield mode: when do runs and reaches the yield, it is suspended; yield_dec registers the generator, which also starts a 5-second timer; after 5 seconds the suspended code resumes and runs the post_do part, giving the same staged execution.

For this to work, you need to implement a YieldManager, use the yield_dec decorator to register the generator returned by do with the YieldManager, and call its next method 5 seconds later. The yield version does the same thing as the callback version but reads much more cleanly. Here is a simple implementation for reference:

# -*- coding:utf-8 -*-
import sys
# import Timer
import types
import time

class YieldManager(object):
    def __init__(self, tick_delta=0.01):
        # Maps each suspended generator to the time at which it should resume
        self.generator_dict = {}
        # self._tick_timer = Timer.addRepeatTimer(tick_delta, lambda: self.tick())

    def tick(self):
        # Resume every generator whose deadline has passed
        cur = time.time()
        for gene, t in self.generator_dict.items():
            if cur >= t:
                self._do_resume_genetator(gene, cur)

    def _do_resume_genetator(self, gene, cur):
        try:
            self.on_generator_excute(gene, cur)
        except StopIteration, e:
            # The generator has finished
            self.remove_generator(gene)
        except Exception, e:
            print 'unexpected error', type(e)
            self.remove_generator(gene)

    def add_generator(self, gen, deadline):
        self.generator_dict[gen] = deadline

    def remove_generator(self, gene):
        del self.generator_dict[gene]

    def on_generator_excute(self, gen, cur_time=None):
        # Run the generator to its next yield; the yielded value is the delay
        t = gen.next()
        cur_time = cur_time or time.time()
        self.add_generator(gen, t + cur_time)

g_yield_mgr = YieldManager()

def yield_dec(func):
    def _inner_func(*args, **kwargs):
        gen = func(*args, **kwargs)
        if type(gen) is types.GeneratorType:
            g_yield_mgr.on_generator_excute(gen)

        return gen
    return _inner_func

@yield_dec
def do(a):
    print 'do', a
    yield 2.5
    print 'post_do', a
    yield 3
    print 'post_do again', a

if __name__ == '__main__':
    do(1)
    for i in range(1, 10):
        print 'simulate a timer, %s seconds passed' % i
        time.sleep(1)
        g_yield_mgr.tick()

How to read this:

We built a yield manager by hand. Its methods work as follows:

  • __init__: creates the generator dictionary
  • tick: gets the current time and walks the generator dictionary; when the current time has reached a generator’s deadline, that generator’s suspension is over, so it is resumed through _do_resume_genetator
  • _do_resume_genetator: calls on_generator_excute to run the generator. On StopIteration the generator is removed from the dictionary; on any other exception the error is printed and the generator is removed as well
  • add_generator: adds a generator to the dictionary as the key, with its deadline as the value
  • remove_generator: removes a generator from the dictionary
  • on_generator_excute: runs the generator to its next yield and records its new deadline

As for the decorator, it first calls the wrapped function and checks whether the result is a generator; if it is, it runs the generator right away through the yield manager’s on_generator_excute method.

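The program executes as follows: calling do(1) runs the generator to its first yield and registers it with a deadline 2.5 seconds ahead; the main loop then calls tick once per second, each tick resumes the generator if its deadline has passed, and the generator is removed from the manager when it finishes.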
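The flow can be traced deterministically by replacing the wall clock with a simulated one. The sketch below is written for Python 3 and is a simplified stand-in for the YieldManager above, not the same code: SimpleYieldManager, start, and log are hypothetical names, the current time is passed in explicitly instead of read from time.time(), and the generator appends to a list instead of printing.

```python
class SimpleYieldManager(object):
    """Resumes generators once the delay they yielded has elapsed."""

    def __init__(self):
        self.deadlines = {}  # generator -> simulated time at which to resume

    def start(self, gen, now):
        self._resume(gen, now)

    def _resume(self, gen, now):
        try:
            delay = next(gen)                # run until the next yield
        except StopIteration:
            self.deadlines.pop(gen, None)    # generator finished, forget it
        else:
            self.deadlines[gen] = now + delay

    def tick(self, now):
        # Iterate over a copy because _resume modifies the dictionary
        for gen, deadline in list(self.deadlines.items()):
            if now >= deadline:
                self._resume(gen, now)

log = []

def do(a):
    log.append('do')
    yield 2.5
    log.append('post_do')
    yield 3
    log.append('post_do again')

mgr = SimpleYieldManager()
mgr.start(do(1), now=0)
for now in range(1, 10):      # one simulated second per iteration
    log.append('t=%d' % now)
    mgr.tick(now)
print(log)
```

With these delays, the first stage resumes at the tick for second 3 (the first tick past 2.5) and the second stage at second 6 (3 seconds after the resume at 3). With a real clock the exact tick can shift slightly because of timer drift.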

Points to note:

A nested generator call does not yield its values automatically!

def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            visit(elem)  # the value returned here is a generator
        else:
            yield elem

if __name__ == '__main__':
    for e in visit([1, 2, (3, 4), 5]):
        print e

The code above is meant to visit every element of a nested sequence, so the expected output is 1, 2, 3, 4, 5, but the actual output is 1, 2, 5. Why? As the comment shows, visit is a generator function, so line 4 only returns a generator object, and the code never iterates over it. The fix is to iterate over that temporary generator:

def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            for e in visit(elem):
                yield e
        else:
            yield elem

Alternatively, in Python 3.3 and later you can use yield from, which was added by PEP 380:

def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            yield from visit(elem)
        else:
            yield elem
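To compare the broken and the fixed behaviour side by side, the sketch below runs both on the same input. It requires Python 3.3 or later because of yield from; visit_buggy is a hypothetical name for the first, incorrect version.

```python
def visit_buggy(data):
    for elem in data:
        if isinstance(elem, (tuple, list)):
            visit_buggy(elem)  # creates a generator and throws it away
        else:
            yield elem

def visit(data):
    for elem in data:
        if isinstance(elem, (tuple, list)):
            yield from visit(elem)  # delegate to the nested generator
        else:
            yield elem

nested = [1, 2, (3, 4), 5]
buggy_result = list(visit_buggy(nested))
fixed_result = list(visit(nested))
print(buggy_result)  # [1, 2, 5]
print(fixed_result)  # [1, 2, 3, 4, 5]
```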

Using return in a generator function

The Python documentation states explicitly that return can be used in a generator function; when the generator reaches it, StopIteration is raised.

def gen_with_return(range_num):
    if range_num < 0:
        return
    else:
        for i in xrange(range_num):
            yield i

if __name__ == '__main__':
    print list(gen_with_return(-1)) # []
    print list(gen_with_return(1))  # [0]

However, in Python 2, a return in a generator function cannot have a return value:

def gen_with_return(range_num):
    if range_num < 0:
        return 0
    else:
        for i in xrange(range_num):
            yield i

SyntaxError: 'return' with argument inside generator
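This restriction is specific to Python 2. Since Python 3.3 (PEP 380), a generator function may return a value, which is attached to the StopIteration exception as its value attribute. The sketch below is for Python 3 only; the returned string and the variable names are illustrative.

```python
def gen_with_return(range_num):
    if range_num < 0:
        return 'negative input'  # allowed in Python 3.3+, a SyntaxError in Python 2
    for i in range(range_num):
        yield i

gen = gen_with_return(-1)
try:
    next(gen)
except StopIteration as exc:
    returned = exc.value         # the value given to return

normal_values = list(gen_with_return(2))
print(returned, normal_values)   # negative input [0, 1]
```

A for loop or list() discards the returned value, so it is only visible when StopIteration is caught by hand or when the generator is driven through yield from.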