
Fluent Python is a book about advanced Python usage. The basics are usually enough for a novice Python learner, but stopping there often keeps people from going deeper and becoming truly proficient. The goal here is a fuller understanding of the language’s capabilities. Some of the advanced features may not be useful right away, so this book suits readers with spare time after work; these notes collect the useful and sophisticated advanced content.

There are 21 chapters in this book, and the arrangement is based on these chapters.

Chapter 1: The Python Data Model

This chapter focuses on Python’s special (“magic”) methods, which are named with leading and trailing double underscores (such as __init__, __lt__, __len__). These special methods are meant to be called by the Python interpreter; they are looked up on the object’s type, which gives CPython a shortcut, so invoking them is faster than an ordinary method call. In your own classes, don’t add special methods you don’t understand; implement them only when you need the behavior they provide.

There are two string representations of an object: __str__ and __repr__. Python’s built-in repr() function uses __repr__ to get the string representation of an object; this is what you see in interactive mode, and if __repr__ is not implemented, the console prints something like <An object at 0x000...>. __str__ is used by the str() constructor and is called when print() displays an object; it is meant to be end-user friendly.

There is another difference between the two: in string formatting, “%s” corresponds to __str__ and “%r” corresponds to __repr__. Both are recommended in practice; the former is for end users, while the latter is more convenient for debugging and logging.
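As an illustration, here is a minimal sketch with a hypothetical Point class showing how repr() and “%r” use __repr__ while print() and “%s” use __str__:

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f'Point({self.x!r}, {self.y!r})'   # unambiguous, for developers

    def __str__(self):
        return f'({self.x}, {self.y})'            # friendly, for end users

p = Point(1, 2)
print(repr(p))               # Point(1, 2)
print(p)                     # (1, 2) -- print uses __str__
print('%r vs %s' % (p, p))   # Point(1, 2) vs (1, 2)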

More special methods: docs.python.org/3/reference…

Chapter 2: Arrays of sequences

This section is an introduction to sequences, focusing on some advanced uses of arrays and tuples.

Sequences can be classified according to the type of data they contain:

  • Container sequences: list, tuple, and collections.deque can hold items of different types
  • Flat sequences: str, bytes, bytearray, memoryview, and array.array hold items of one type only

According to whether it can be modified, it can be divided into:

  • Mutable sequences: list, bytearray, array.array, collections.deque, and memoryview
  • Immutable sequences: tuple, str, and bytes

List comprehensions

A list comprehension is a more readable and often more efficient shortcut for building a list.

For example, to turn a string into a list of Unicode code points, the usual way is:

symbols = '$¢£¥€¤'
codes = []
for symbol in symbols:
    codes.append(ord(symbol))

Using a list comprehension:

symbols = '$¢£¥€¤'
codes = [ord(symbol) for symbol in symbols]

Use list comprehensions to create lists; use them where they make the code clearer, and keep them short.

Cartesian products and generator expressions

Generator expressions can output elements one by one, saving memory. Such as:

>>> colors = ['black', 'white']
>>> sizes = ['S', 'M', 'L']
>>> for tshirt in ('%s %s' % (c, s) for c in colors for s in sizes):
...     print(tshirt)

The lists in this example are small. If you had two lists with 1,000 elements each, their Cartesian product would be a list of 1,000,000 elements and take up a lot of memory; a generator expression feeds the for loop one item at a time and avoids building that list.

Tuples are not just immutable lists

Tuples are often used as immutable lists, but they also work as records. Elements are usually fetched by numeric index, but with collections.namedtuple they can also be accessed by name:

>>> from collections import namedtuple
>>> City = namedtuple('City', 'name country population coordinates')
>>> tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
>>> tokyo
City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))
>>> tokyo.population
36.933
>>> tokyo.coordinates
(35.689722, 139.691667)
>>> tokyo[1]
'JP'

slice

In a list, 0 is the index of the first element. Slicing extracts a fragment of a sequence by index.

The form s[a:b:c] takes items from a up to (but not including) b, in steps of c. c can also be negative, which means stepping backwards.

>>> s = 'bicycle'
>>> s[::3]
'bye'
>>> s[::-1]
'elcycib'
>>> s[::-2]
'eccb'

Chapter 3: Dictionaries and sets

dict is not only widely used in programs; it is also a cornerstone of the Python language. Because of its importance, Python’s dict implementation is highly optimized, and the key to that is the hash table. Sets, like dicts, are also implemented with hash tables.

For a dict, keys must be of a hashable type. What is a hashable type? The official explanation is:

If an object is hashable, its hash value never changes during its lifetime

Such an object needs to implement the __hash__() method. Hashable objects also need an __eq__() method so they can be compared with other keys. If two hashable objects are equal, their hash values must be the same.

str, bytes, frozenset, and the numeric types are all hashable.
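A quick illustration: a tuple is hashable only if everything it contains is hashable.

tt = (1, 2, (30, 40))
print(hash(tt))    # some integer: a tuple of hashable elements is itself hashable

tl = (1, 2, [30, 40])
print(hash(tl))    # TypeError: unhashable type: 'list'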

Dict comprehensions

DIAL_CODE = [
    (86, 'China'),
    (91, 'India'),
    (7, 'Russia'),
    (81, 'Japan'),
]
country_code = {country: code for code, country in DIAL_CODE}
print(country_code)
# OUT: {'China': 86, 'India': 91, 'Russia': 7, 'Japan': 81}

defaultdict: one way to handle missing keys

Sometimes we want a default value when a key is not in the mapping. This is what defaultdict is for: it is a dict subclass that implements the __missing__ method to supply defaults from a factory.

import collections

nums = range(10)  # example data, assumed for illustration
index = collections.defaultdict(list)
for item in nums:
    key = item % 2
    index[key].append(item)  # a missing key gets a fresh empty list automatically

Dictionary variants

In the collections module of the standard library, there are other mapping types besides defaultdict:

  • OrderedDict: preserves the order in which keys are added, so the keys are always iterated in the same order
  • ChainMap: holds several mappings; during a key lookup they are searched as a whole, one after another, until the key is found, e.g. pylookup = ChainMap(locals(), globals())
  • Counter: keeps an integer counter for each key and increments it every time a key is added, so it can be used to count hashable objects or as a multiset.
import collections
ct = collections.Counter('abracadabra')
print(ct)   # Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
ct.update('aaaaazzz')
print(ct)   # Counter({'a': 10, 'z': 3, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
print(ct.most_common(2)) # [('a', 10), ('z', 3)]
  • UserDict: This class is simply standard dict rewritten in pure Python
import collections
class StrKeyDict(collections.UserDict):
    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]
        
    def __contains__(self, key):
        return str(key) in self.data
        
    def __setitem__(self, key, item):
        self.data[str(key)] = item

Immutable mapping type

The types module’s MappingProxyType can be used to build a read-only view of a mapping, so the mapping cannot be modified through it:

from types import MappingProxyType
d = {1:'A'}
d_proxy = MappingProxyType(d)
d_proxy[1]='B' # TypeError: 'mappingproxy' object does not support item assignment

d[2] = 'B'
print(d_proxy) # mappingproxy({1: 'A', 2: 'B'})

d_proxy is dynamic, meaning that any changes made to d are reflected in it.

Set theory

The essence of a set is a collection of unique objects, so sets can be used for deduplication. The elements of a set must be hashable; the set type itself is not hashable, but frozenset is.

Set elements are unique, and sets also implement many of the basic infix operators. Given two sets a and b, a | b returns their union, a & b is their intersection, and a - b is their difference.
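A quick interactive illustration of those operators:

>>> a = {1, 2, 3}
>>> b = {2, 3, 4}
>>> a | b          # union
{1, 2, 3, 4}
>>> a & b          # intersection
{2, 3}
>>> a - b          # difference
{1}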

Proper use of these features can not only reduce the amount of code, but also increase operational efficiency.

s = {chr(i) for i in range(23, 45)}  # set comprehension

Chapter 4: Text and byte sequences

This chapter discusses text strings and byte sequences, as well as some encoding conversions. The str discussed here is Python 3’s str.

Character problem

A string is a relatively simple concept: a string is a sequence of characters. However, “character” has many definitions, and the best one today is a Unicode character. Thus the elements you get from a Python 3 str object are Unicode characters.

The process of converting a code point into a sequence of bytes is called encoding:

>>> s = 'café'
>>> len(s)
4
>>> b = s.encode('utf8')
>>> b
b'caf\xc3\xa9'
>>> len(b)
5
>>> b.decode('utf8')
'café'

Code points can be thought of as human-readable text, while byte sequences are the machine-friendly form. That also makes .decode() and .encode() easy to tell apart: going from a byte sequence to human-readable text is decoding, and going from text to a byte sequence is encoding.

Byte profile

Python 3 has two byte sequence types: the immutable bytes and the mutable bytearray. Each element of a byte sequence is an integer between 0 and 255.

Dealing with coding issues

Python ships with more than 100 codecs. Each codec has a name and often several aliases, such as utf_8, utf-8, and U8.

If a byte sequence does not match the expected encoding, decoding or encoding will raise a Unicode*Error (UnicodeDecodeError or UnicodeEncodeError). This happens when a character is not defined in the target encoding. Some ways to deal with it:

  • Use Python 3; it avoids 95% of character problems.
  • Try the mainstream encodings: latin1, cp1252, cp437, gb2312, utf-8, utf-16le
  • Watch out for the BOM header b'\xff\xfe'; UTF-16 encoded sequences start with these extra bytes.
  • To find the encoding of a byte sequence, the codecs module is recommended

Normalizing Unicode strings

s1 = 'café'
s2 = 'caf\u00e9'

These two lines are exactly equivalent. Something to watch out for: the Unicode standard calls sequences such as 'é' and 'e\u0301' “canonical equivalents”, even though Python sees them as different strings. NFC normalization composes an equivalent string using the fewest code points:

>>> s1 = 'café'
>>> s2 = 'cafe\u0301'
>>> s1, s2
('café', 'café')
>>> len(s1), len(s2)
(4, 5)
>>> s1 == s2
False

After normalizing:

>>> from unicodedata import normalize
>>> s1 = 'café'          # composed "e" with acute accent
>>> s2 = 'cafe\u0301'    # decomposed into "e" and combining accent
>>> len(s1), len(s2)
(4, 5)
>>> len(normalize('NFC', s1)), len(normalize('NFC', s2))
(4, 4)
>>> len(normalize('NFD', s1)), len(normalize('NFD', s2))
(5, 5)
>>> normalize('NFC', s1) == normalize('NFC', s2)
True
>>> normalize('NFD', s1) == normalize('NFD', s2)
True

Unicode text sort

Strings are compared by code point, so when non-ASCII characters are involved the sort order can be disappointing.
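For illustration (the Portuguese fruit names are just sample data), code-point comparison puts accented words after their unaccented neighbours:

>>> fruits = ['caju', 'atemoia', 'cajá', 'açaí', 'acerola']
>>> sorted(fruits)
['acerola', 'atemoia', 'açaí', 'caju', 'cajá']
# expected Portuguese order: ['açaí', 'acerola', 'atemoia', 'cajá', 'caju']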

Chapter 5: first-class functions

In Python, functions are first-class objects. Programming languages define “first-class objects” as satisfying the following conditions:

  • Created at run time
  • Can assign values to variables or elements in data structures
  • Can be passed to functions as arguments
  • Can be returned as a result of a function

In Python, integers, strings, lists, and dictionaries are first-class objects.

Think of functions as objects

Python can be programmed both functionally and object-oriented. Here we create a function, then read its __doc__ attribute and determine that the function object is actually an instance of the function class:

def factorial(n):
    '''
    return n
    '''
    return 1 if n < 2 else n * factorial(n-1)

print(factorial.__doc__)
print(type(factorial))
print(factorial(3))

'''
OUT

    return n

<class 'function'>
6
'''

Higher-order functions

A higher-order function is one that takes a function as an argument or returns a function as its result, for example map, filter, and reduce.

For example, when calling sorted, len is passed as an argument:

fruits = ['strawberry', 'fig', 'apple', 'cherry', 'raspberry', 'banana']
sorted(fruits, key=len)
# ['fig', 'apple', 'cherry', 'banana', 'raspberry', 'strawberry']

Anonymous functions

The lambda keyword is used to create anonymous functions. Anonymous functions have some limitations. The body of an anonymous function can only use pure expressions. In other words, there are no assignments inside lambda functions, and statements like while and try cannot be used.

fruits = ['strawberry', 'fig', 'apple', 'cherry', 'raspberry', 'banana']
sorted(fruits, key=lambda word: word[::-1])
# ['banana', 'apple', 'fig', 'raspberry', 'strawberry', 'cherry']

Callable object

In addition to user-defined functions, the call operator () can be applied to other objects. To find out whether an object is callable, use the built-in callable() function. The Python data model lists seven callable types:

  • User-defined functions: created with def statements or lambda expressions
  • Built-in functions: such as len
  • Built-in methods: such as dict.get
  • Methods: functions defined in the body of a class
  • Classes: calling a class creates an instance
  • Class instances: if the class defines __call__, its instances can be called like functions (see the sketch after this list)
  • Generator functions: functions or methods that use the yield keyword
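A minimal sketch of a callable instance, using a hypothetical Adder class:

class Adder:
    def __init__(self, n):
        self.n = n

    def __call__(self, x):      # makes instances callable like functions
        return self.n + x

add5 = Adder(5)
print(callable(add5))   # True
print(add5(10))         # 15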

From positional parameters to keyword-only parameters

Variable positional and keyword arguments look like this:

def fun(name, age, *args, **kwargs):
    pass

*args and **kwargs collect any extra arguments passed to the call: inside the function, args is a tuple of the extra positional arguments and kwargs is a dictionary of the extra keyword arguments.
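A quick usage sketch of how the extra arguments are collected:

def fun(name, age, *args, **kwargs):
    print(name, age, args, kwargs)

fun('Ann', 30, 'extra', 42, lang='python')
# Ann 30 ('extra', 42) {'lang': 'python'}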

Chapter 6: Implementing design patterns using first-class functions

While design patterns are language-independent, that does not mean every pattern applies to every language. Of the 23 patterns in Design Patterns: Elements of Reusable Object-Oriented Software by Gamma et al., 16 are “missing or simplified” in dynamic languages.

There are no examples of design patterns here, because the patterns in the book are not commonly used.

Chapter 7: Function decorators and closures

Function decorators are used to “mark” functions in source code and enhance their behavior in some way. This is a powerful feature, but to master it you must understand closures.

Decorators and closures are often discussed together, because decorators rely on closures to work. Closures are also the foundation of callback-style asynchronous programming and the functional programming style.

Decorator basics

A decorator is a callable object whose argument is another function (the function being decorated). A decorator might process the decorated function and then return it, or replace it with another function or callable.

@decorate
def target():
    print('running target()')

This is exactly the same as writing:

def target():
    print('running target()')
target = decorate(target)

Decorators are syntactic sugar, which actually treat functions as arguments for other functions to process. Decorators have two main characteristics:

  • Replace the decorated function with another function
  • The decorator executes immediately when the module is loaded

To understand “executes immediately”, look at the equivalent code above: target = decorate(target) calls the decorator as soon as the module is loaded. Normally the decorator returns a function to take the place of the decorated one.

Variable scope rules

To understand the scope of variables in decorators, you need to understand closures, and I thought it would be better to switch the order of closures and scopes in the book. In Python, the search order for a variable is LEGB (L: Local Local environment, E: Enclosing closure, G: Global, B: built-in).

base = 20
def get_compare():
    base = 10
    def real_compare(value):
        return value > base
    return real_compare
    
compare_10 = get_compare()
print(compare_10(5))  # False

In the closure real_compare, the variable base used is actually base = 10. Because the base variable can be hit in the closure, it doesn’t need to be retrieved in global.

closure

Closures themselves are fairly straightforward; it is when anonymous functions enter the picture that this part becomes harder to master. A short and simple explanation of closures is:

The result of a namespace bundled with a function is called a closure.

This namespace is the E in LEGB. So a closure does more than return a function as a value: it bundles the enclosing namespace together with the function that is returned. Many people forget the bundling part and lose track of where the variable lives.

Decorators in the standard library

Python has three built-in decorators for methods: property, classmethod, and staticmethod. They are used to enrich class definitions.

class A(object):
    @property
    def age(self):
        return 12

Chapter 8: Object references, mutability, and garbage collection

Variables are not boxes

A lot of people think of variables as boxes, and you just throw whatever data you want into the box.

a = b = [1, 2, 3]
a.append(4)
print(b)  # [1, 2, 3, 4]

Variables a and b refer to the same list, not copies of it. An assignment statement should therefore be read as binding a name to an object, i.e. creating a reference.

Identity, equality, and aliases

To determine whether a and b are references to the same value, use is:

>>> a = b = (4, 6)
>>> c = (4, 6)
>>> a is b
True
>>> a is c
False

If two variables refer to the same object, we usually say that the variable is an alias of another variable.

Choosing between == and is: the == operator compares the values of two objects for equality, while is checks whether two variables refer to the very same object (i.e. whether one is an alias of the other) and ignores the values. In practice == is used far more often, while is executes faster.

Copies are shallow by default

>>> l1 = [3, [55, 44], (7, 8, 9)]
>>> l2 = list(l1)  # build a copy
>>> l2 == l1
True
>>> l2 is l1
False

Although l2 is a copy of l1, the copy is shallow: the outermost container is duplicated, but the items inside the copy are references to the items held by the source container. Therefore, changing l2[1] also changes l1[1]. If every element in the list is immutable this causes no problems and saves memory; with mutable elements, however, it can lead to surprises.
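A short demonstration of the shared inner list:

l1 = [3, [55, 44], (7, 8, 9)]
l2 = list(l1)        # shallow copy: the inner list is shared
l2[1].append(33)     # mutate the shared inner list through l2
print(l1[1])         # [55, 44, 33] -- l1 sees the change too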

The Python standard library provides two tools, copy.copy and copy.deepcopy, for shallow and deep copies:

import copy
l1 = [3, [55, 44], (7, 8, 9)]

l2 = copy.copy(l1)      # shallow copy
l3 = copy.deepcopy(l1)  # deep copy

Function parameters as references

Python passes arguments by sharing (“call by sharing”): each formal parameter of the function gets a copy of each reference in the arguments. In other words, the parameters inside the function become aliases of the actual arguments.

The consequence is that when the argument is a mutable object, modifying the parameter inside the function modifies the external object as well. Rebinding the parameter to a new object, however, has no effect outside, because rebinding only makes the local name refer to something else; the old object is untouched. In other words, a function cannot replace one object with another through its parameters.
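A small sketch (the function name is just for illustration):

def append_and_rebind(lst):
    lst.append(4)   # mutates the object the caller passed in
    lst = [0]       # rebinding is local; the caller's variable is unaffected

data = [1, 2, 3]
append_and_rebind(data)
print(data)   # [1, 2, 3, 4]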

Do not use mutable types as default values for arguments

Parameter defaults are a great feature, but avoid using mutable objects as default values. If a default is a mutable object and the function changes its contents, every later call of the function is affected, because the default is created only once.
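The classic illustration (append_to is a hypothetical name):

def append_to(item, bucket=[]):   # the default list is created once, at definition time
    bucket.append(item)
    return bucket

print(append_to(1))   # [1]
print(append_to(2))   # [1, 2] -- the same default list was reused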

Del and garbage collection

In Python, when an object loses its last reference, it becomes garbage and is reclaimed. Python provides the del statement to delete names, but del only removes one reference between a name and an object; the object is not necessarily reclaimed, because other references to it may still exist.

In CPython, garbage collection mainly uses a reference-counting algorithm. Each object counts how many references refer to it. When the reference count goes to zero, meaning the object is not in use, the object is destroyed immediately.

Chapter 9: Objects that conform to the Python style

Thanks to the Python data model, custom types can behave just as naturally as built-in types. This natural behavior comes not from inheritance but from duck typing: we simply implement the methods the object needs in order to behave as expected.

Object representation

Every object-oriented language has at least one standard way of getting a string representation of an object. Python provides two ways to do this.

  • repr(): returns a string representation of an object in a way that is easy for developers to understand.
  • str(): returns a string representation of an object in a way that the user can understand.

Classmethod and staticmethod

Both of these decorators are built into Python but are rarely covered in introductory tutorials. Both are used inside class definitions. Normally, a function defined in a class body is bound to instances of that class when called; these two decorators change that calling behavior.

Let’s start with classmethod. It defines a method that operates on the class rather than on instances: the class itself becomes the method’s first argument. The staticmethod decorator also changes how a method is called, but it simply turns the method into a plain function that receives no special first argument. The difference between them, then, is that classmethod passes the class in as the first argument while staticmethod does not; everything else is the same.

Take a look at some examples:

>>> class Demo:
... @classmethod
... def klassmeth(*args):
...     return args
... @staticmethod
... def statmeth(*args):
...     return args
...
>>> Demo.klassmeth()
(<class '__main__.Demo'>,)
>>> Demo.klassmeth('spam')
(<class '__main__.Demo'>, 'spam')
>>> Demo.statmeth()
()
>>> Demo.statmeth('spam')
('spam',)

Formatted display

The built-in format() function and the str.format() method delegate the formatting of each type to its .__format__(format_spec) method. format_spec is the format specifier, which is (a sketch follows the list below):

  • the second argument of format(my_obj, format_spec)
  • the part after the colon inside a {} replacement field in the format string used with str.format()
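A minimal sketch, using a hypothetical Money class, of how both call paths end up in __format__(format_spec):

class Money:
    def __init__(self, amount):
        self.amount = amount

    def __format__(self, format_spec):
        if format_spec == '':
            format_spec = '.2f'          # default presentation
        return format(self.amount, format_spec)

m = Money(3.5)
print(format(m))             # 3.50
print('{:.3f}'.format(m))    # 3.500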

Python’s private and “protected” properties

Python has no access modifier like private to create truly private instance attributes. What it has instead is a simple mechanism for handling “private” attributes.

class A:
    def __init__(self):
        self.__x = 1

a = A()
print(a.__x)        # AttributeError: 'A' object has no attribute '__x'

print(a.__dict__)   # {'_A__x': 1}

If an instance attribute name starts with two underscores and ends with at most one underscore (like __name), Python rewrites the name by prefixing an underscore and the class name before storing it in __dict__: __name becomes _A__name.

Name rewriting is a kind of security measure, but it’s not foolproof. It prevents accidental access, but it doesn’t stop intentional wrongdoing.

Anyone who knows how this works can still read and overwrite a “private” attribute directly. That is why many Python programmers follow a strict convention: use a single leading underscore to mark an object’s private attributes. The Python interpreter treats single-underscore names no differently; it is left to programmers not to access those attributes from outside the class. This convention is the recommended one; the double-underscore style is rarely needed. To quote one well-known Python developer:

Never use two leading underscores; that is annoyingly private. If name clashes are a concern, use explicit name mangling instead (e.g. _MyThing_blahblah). This is essentially the same as the double underscore, but the rule is easier to understand.

Attributes marked with a single underscore prefix are conventionally called “protected”.

Using __slots__ class attributes to save space

By default, Python uses a __dict__ dictionary to store instance attributes in individual instances. Therefore, the attributes of an instance change dynamically and can be added at any time during run time. Dictionaries are memory-consuming structures. Therefore, using __slots__ saves memory when an object’s attribute name is specified.

class Vector2d:
    __slots__ = ('__x', '__y')
    typecode = 'd'

The purpose of defining the __slots__ attribute in a class is to tell the interpreter: “All instance attributes in this class are here!” In this way, Python uses tuple-like structures to store instance variables in individual instances, avoiding the memory-draining __dict__ attribute. This can save a lot of memory if you have millions of instances active at the same time.

Chapter 10: Sequence modification, hashing, and slicing

Protocols and duck typing

In Python, sequence types do not need inheritance, just methods that conform to the sequence protocol. The protocol here is to implement __len__ and __getitem__ methods. Any class that implements these two methods satisfies sequence operations because it behaves like a sequence.

Protocols are informal and not enforced, so if you know exactly how a class will be used, it is often enough to implement only part of a protocol. For example, to support iteration only __getitem__ needs to be implemented, not __len__. This is the spirit of the duck type:

When you see a bird that walks like a duck, swims like a duck, and quacks like a duck, then that bird can be called a duck.

Sliceable sequence

A slice extracts a range of elements from a sequence. Slicing is also handled by __getitem__:

import numbers

class Vector:
    # many lines omitted
    # ...
    def __len__(self):
        return len(self._components)

    def __getitem__(self, index):
        cls = type(self)
        if isinstance(index, slice):                # if index is a slice object...
            return cls(self._components[index])     # ...build a new Vector from the slice
        elif isinstance(index, numbers.Integral):   # if index is an integer...
            return self._components[index]          # ...return that single element
        else:
            msg = '{cls.__name__} indices must be integers'
            raise TypeError(msg.format(cls=cls))

Dynamic access attribute

Access components by single-letter names via __getattr__:

# inside the Vector class
shortcut_names = 'xyzt'

def __getattr__(self, name):
    cls = type(self)
    if len(name) == 1:                              # single-character attribute name
        pos = cls.shortcut_names.find(name)
        if 0 <= pos < len(self._components):
            return self._components[pos]
    msg = '{} object has no attribute {}'
    raise AttributeError(msg.format(cls, name))

test = Vector([3, 4, 5])
print(test.x)
print(test.y)
print(test.z)
print(test.c)   # raises AttributeError

Hashing and fast equivalence testing

Implementing the __hash__ method, together with the existing __eq__ method, turns instances into hashable objects.
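A sketch of such a __hash__, xor-ing the hashes of the components (the _components attribute is assumed, as in the Vector examples above):

import functools
import operator

class Vector:
    def __init__(self, components):
        self._components = list(components)

    def __iter__(self):
        return iter(self._components)

    def __eq__(self, other):
        return tuple(self) == tuple(other)

    def __hash__(self):
        hashes = (hash(x) for x in self._components)
        return functools.reduce(operator.xor, hashes, 0)   # xor of the component hashes

v = Vector([3.0, 4.0])
print(hash(v) == hash(Vector([3.0, 4.0])))   # True: equal vectors hash the same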

When the sequence is multidimensional, we have a more efficient method:

def __eq__(self, other):
    if len(self) != len(other):       # first compare the lengths
        return False
    for a, b in zip(self, other):     # then compare element by element
        if a != b:
            return False
    return True

# a more concise implementation
def __eq__(self, other):
    return (len(self) == len(other)) and all(a == b for a, b in zip(self, other))

Chapter 11: Interfaces: From protocols to Abstract base classes

These protocols are defined as informal interfaces, and they are one way in which programming languages achieve polymorphism. Python has no interface keyword; apart from abstract base classes, every class defines an interface simply through the methods it implements, such as __getitem__ or __add__.

There are rules for writing, such as naming protected attributes with a single leading underscore, and some coding conventions, which are slowly developed by programmers.

The protocol is an interface, but it is not formal, these provisions are not mandatory, and a class may only implement part of the interface, which is allowed.

If there are informal agreements, are there formal agreements? Yes, an abstract base class is a mandatory protocol.

An abstract base class requires that its subclasses implement a defined interface, and that the abstract base class cannot be instantiated.

Interfaces and protocols in the Python culture

Python was very successful before the introduction of abstract base classes, and even now they are rarely used. With duck types and protocols, we define protocols as informal interfaces, which are a way to make Python polymorphic.

On the other hand, don’t feel bad about putting public data attributes into an object’s interface: if needed, a data attribute can always be turned into a property that implements getter and setter logic. The methods an object exposes let it play a specific role in the system; an interface is thus the collection of methods that fulfil a particular role.

The sequence protocol is one of Python’s most basic protocols, and the interpreter handles it responsibly even if an object implements only the most basic part of that protocol.

Waterfowl and abstract base class

Duck typing is useful in many situations, but as designs evolve there is often a better way.

In modern times, genera and species have been classified according to phenotypic systematics. Anatidae belongs to waterfowl, and waterfowl also includes goose, swan goose, etc. Waterfowl is a consistent category of performance, and they have some uniform “description” sections.

Therefore, following this classification, there needs to be a waterfowl type. As long as cls is an abstract base class, that is, the metaclass of cls is abc.ABCMeta, membership can be checked with isinstance(obj, cls).

Abstract base classes offer many theoretical advantages: registered classes must satisfy the abstract base class’s requirements on methods and signatures, and more importantly, the underlying semantic contract.

Abstract base classes in the standard library

Most of the standard library’s abstract base classes are defined in the collections.abc module; a few others live in the numbers and io packages. The standard library actually has two abc modules; only collections.abc is discussed here.

There are 16 abstract base classes defined in this module.

Iterable, Container, and Sized: every collection should either inherit from these three abstract base classes or at least implement compatible protocols. Iterable supports iteration through __iter__, Container supports the in operator through __contains__, and Sized supports the len() function through __len__.

Sequence, Mapping, and Set are the main immutable collection types, and each has a mutable subclass.

In Python3, the objects returned by the mapping methods.items(),.keys(), and.values() are instances of ItemsView, KeysView, and ValuesView, respectively. The first two classes also inherit rich interfaces from the Set class.

The two abstract base classes Callable and Hashable don’t have much to do with collections; they ended up in collections.abc only because it was the first standard library module to define abstract base classes and these two were too important to leave out. I’ve never seen subclasses of Callable or Hashable. Their main purpose is to let the built-in isinstance function check, in a safe way, whether an object is callable or hashable.

Note that Iterator is a subclass of Iterable.

Chapter 12: Advantages and disadvantages of inheritance

Many people feel that multiple inheritance is not worth the cost, and programming languages that don’t support multiple inheritance don’t seem to lose either.

Subclassing built-in types is cumbersome

Before Python 2.2, built-in types (such as list or dict) could not be subclassed at all. They can be now, but there is an important caveat: built-in types are implemented in C and normally do not call methods overridden by user-defined subclasses.

Officially, CPython makes no promise about whether methods overridden in subclasses of built-in types are called implicitly. In practice, the methods of built-in types mostly do not call methods overridden by subclasses. For example, if a dict subclass overrides __getitem__, the override will not be called by the built-in get() method.
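A sketch of this pitfall, along the lines of the book’s DoppelDict example:

class DoppelDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, [value] * 2)   # store the value duplicated

dd = DoppelDict(one=1)   # __init__ ignores our overridden __setitem__
print(dd)                # {'one': 1}
dd['two'] = 2            # the [] operator does call our __setitem__
print(dd)                # {'one': 1, 'two': [2, 2]}
dd.update(three=3)       # update() also bypasses the override
print(dd)                # {'one': 1, 'two': [2, 2], 'three': 3}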

Multiple inheritance and method resolution order

Any language that implements multiple inheritance has to deal with potential naming conflicts when unrelated ancestor classes implement methods of the same name. This conflict is known as the “diamond problem”.

Python traverses the inheritance graph in a specific order, called the Method Resolution Order (MRO). Each class has an attribute named __mro__, a tuple that lists its superclasses in method resolution order, from the class itself up to object.
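A small illustration of the diamond shape and its __mro__:

class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

print(D.__mro__)
# (<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>,
#  <class '__main__.A'>, <class 'object'>)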

Chapter 13: Correctly overloading Operators

In Python, most operators can be overloaded, such as == for __eq__ and + for __add__.

Some operators cannot be overloaded, such as is, and, or, and not.
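A brief sketch of overloading + and == on a small value class (the class here is illustrative):

class Vector2d:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):         # used by the + operator
        return Vector2d(self.x + other.x, self.y + other.y)

    def __eq__(self, other):          # used by the == operator
        return self.x == other.x and self.y == other.y

    def __repr__(self):
        return f'Vector2d({self.x}, {self.y})'

print(Vector2d(1, 2) + Vector2d(3, 4))    # Vector2d(4, 6)
print(Vector2d(1, 2) == Vector2d(1, 2))   # True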

Chapter 14: Iterable objects, iterators, and Generators

Iteration is the cornerstone of data processing. When scanning a data set that does not fit in memory, we need a way to fetch items lazily, one at a time and on demand. That is the iterator pattern.

In Python, the yield keyword is used to build generators, which can be used wherever iterators are needed.

All generators are iterators because generators fully implement the iterator interface.

The most accurate way to check whether an object x is iterable is to call iter(x) and handle the TypeError raised when it is not. This is more accurate than isinstance(x, abc.Iterable), because it also takes the legacy __getitem__ protocol into account.

An iterable object versus an iterator

We need to define iterable objects as follows:

An object is iterable if it implements an __iter__ method that returns an iterator; the iterator can then be obtained with the built-in iter function. Sequences are always iterable, and so are objects that implement a __getitem__ method accepting zero-based indexes.

We want to clarify the relationship between iterables and iterators: get iterators from iterables.

The standard iterator interface has two methods:

  • __next__: Returns the next available element, or if there are no more, throwsStopIterationThe exception.
  • __iter__Returns theselfTo use iterators where iterables should be used.

A typical iterator

To clarify the important difference between an iterable and an iterator, we separate the two and write them as two classes:

import re
import reprlib

RE_WORD = re.compile('\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        return SentenceIterator(self.words)   # build and return an iterator


class SentenceIterator:
    def __init__(self, words):
        self.words = words    # the iterator holds a reference to the word list
        self.index = 0        # position of the next element to fetch

    def __next__(self):
        try:
            word = self.words[self.index]     # fetch the current element
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

    def __iter__(self):
        return self           # an iterator returns itself

The main purpose of this example is to distinguish between iterables and iterators, which is usually a lot of work and programmers don’t want to write it this way.

Building iterables and iterators often goes wrong because you confuse the two. Remember that iterable objects have an __iter__ method that instantiates a new iterator each time; Iterators implement the __next__ method, which returns a single element, and provide an __iter__ method that returns the iterator itself.

An iterable must not be an iterator of itself. That is, iterables must implement __iter__ methods, but not __next__ methods.

In summary, iterators are themselves iterable, but iterables are not iterators.

Generator function

The idiomatic Python way to implement the same functionality is to replace SentenceIterator with a generator function. Rewriting the previous example with a generator:

import re
import reprlib

RE_WORD = re.compile('\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        """Generator version"""
        for word in self.words:   # iterate over the list of words
            yield word            # yield the current word
        return

In this case, the iterator is actually a generator object that is automatically created each time __iter__ is called, because the __iter__ method here is a generator function.

How generator functions work

Any Python function whose body contains the yield keyword is a generator function. Calling a generator function returns a generator object; in other words, generator functions are generator factories.

The only difference between normal and generator functions is that generator functions have the yield keyword in them.

Calling the generator function creates a generator object that wraps the function body. Passing the generator to next(...) advances it to the next yield statement in the body, returns the yielded value, and pauses at that point in the function definition.
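A small console sketch of this start/pause behaviour (gen_123 is an illustrative name):

>>> def gen_123():
...     yield 1
...     yield 2
...     yield 3
...
>>> g = gen_123()   # calling the generator function builds a generator object
>>> next(g)         # run to the first yield and pause there
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)         # the body has finished, so StopIteration is raised
Traceback (most recent call last):
  ...
StopIteration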

Lazy implementation

The versions so far are not lazy: __init__ eagerly builds the whole list of words. A lazy implementation produces values only as next(...) asks for them, one element at a time, using re.finditer instead of re.findall:

import re
import reprlib

RE_WORD = re.compile('\w+')

class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for match in RE_WORD.finditer(self.text):
            yield match.group()

Generator expression

A generator expression can be thought of as the lazy version of a list comprehension: instead of eagerly building a list, it returns a generator that produces elements on demand. If a list comprehension is a factory of lists, a generator expression is a factory of generators.

def gen_AB():
    print('start')
    yield 'A'
    print('continue')
    yield 'B'
    print('end.')

res1 = [x*3 for x in gen_AB()]   # the list comprehension consumes gen_AB() eagerly
for i in res1:
    print('-->', i)
# start
# continue
# end.
# --> AAA
# --> BBB

As you can see, generator expressions produce generators, so you can use generator expressions to reduce code:

import re
import reprlib

RE_WORD = re.compile('\w+')

class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        return (match.group() for match in RE_WORD.finditer(self.text))

Here __iter__ is not a generator function; it uses a generator expression to build a generator. The end result is the same: calling __iter__ still returns a generator object.

Generator expressions are syntactic sugar: in simple cases they can replace generator functions, though generator functions remain more flexible.

Generator functions in the standard library

The standard library provides many generators, from objects that iterate over plain text files line by line to the excellent os.walk function, which yields file names while traversing a directory tree, so a recursive file system search becomes as simple as a for loop.

Most of the general-purpose generator functions live in the itertools and functools modules; the tables below list only some of them.

A generator function for filtering

  • itertools.compress(it, selector_it): consumes two iterables in parallel; yields items from it whenever the corresponding item in selector_it is truthy
  • itertools.dropwhile(predicate, it): skips items while predicate evaluates truthy, then yields every remaining item (no further checks are made)
  • filter(predicate, it) (built-in): applies predicate to each item of it and yields the item when predicate(item) is truthy; if predicate is None, only truthy items are yielded

Generator functions for mapping

  • itertools.accumulate(it, [func]): yields accumulated sums; if func is provided, the first pair of items is passed to it, then the result and the next item, and so on
  • enumerate(iterable, start=0) (built-in): yields two-element tuples of the form (index, item), where index counts from start and item is taken from iterable
  • map(func, it1, [it2, ..., itN]) (built-in): applies func to each item of it and yields the result; if N iterables are given, func must take N arguments and the iterables are consumed in parallel

A generator function that merges multiple iterables

  • itertools.chain(it1, ..., itN): yields all items from it1, then all items from it2, and so on, seamlessly
  • itertools.chain.from_iterable(it): yields all items from each iterable produced by it, one after the other, seamlessly; it should yield iterable items, for example a list of iterables
  • zip(it1, ..., itN) (built-in): takes items from each input iterable in parallel and yields N-element tuples, stopping silently as soon as the first iterable is exhausted

New syntax: yield from

If a generator function needs to produce a value generated by another generator, the traditional way is a nested for loop. For example, we would implement the chain generator ourselves:

>>> def chain(*iterables):
...     for it in iterables:
...         for i in it:
...             yield i
...
>>> s = 'ABC'
>>> t = tuple(range(3))
>>> list(chain(s, t))
['A', 'B', 'C', 0, 1, 2]

The chain generator delegates to each received iterable in turn. The yield from statement simplifies this:

>>> def chain(*iterables):
...     for i in iterables:
...         yield from i
...
>>> list(chain(s, t))
['A', 'B', 'C', 0, 1, 2]

As you can see, yield from i replaces the inner for loop and makes the code read more smoothly.

Iterable reducing functions

Reducing functions take an iterable and return a single result.

  • sum(it, start=0) (built-in): the sum of all items in it, plus the optional start value (for floating-point additions, math.fsum improves precision)
  • all(it) (built-in): returns True if every item in it is truthy, otherwise False; all([]) returns True
  • any(it) (built-in): returns True if any item in it is truthy, otherwise False; any([]) returns False
  • max(it, [key=,] [default=]) (built-in): returns the largest item in it; key is a sort key function, as in sorted; default is returned if the iterable is empty
  • functools.reduce(func, it, [initial]): passes the first two items to func, then the result and the third item, and so on, returning the final result; if initial is given it is passed in as the first item

Chapter 15: Context Managers and ELSE Blocks

This chapter discusses flow control features that are uncommon in other languages and, as such, are often overlooked or underused by newcomers to Python. The following features are discussed:

  • The with statement and context manager
  • The else clause of for, while, and try statements

The with statement sets up a temporary context, which the context manager object controls and cleans up the context. This avoids errors and reduces the amount of code, so the API is safer and easier to use. Besides automatically closing files, the with block has many other uses.

Think of else here as meaning “then” rather than “or else”: first do this, and then do that.

Else block outside the if statement

The else here is not used in the if statement, but in the for while try statement.

lst = [1, 5, 7]   # example data, assumed for illustration
for i in lst:
    if i > 10:
        break
else:
    print("no num bigger than 10")

The else clause behaves as follows:

  • for: the else block runs only if the for loop runs to completion (i.e. it was not aborted by a break statement).
  • while: the else block runs only if the while loop exits because the condition became falsy (i.e. it was not aborted by a break statement).
  • try: the else block runs only if no exception is raised in the try block.

In all cases, the else clause is skipped if an exception or a return, break, or continue statement causes control to jump outside the block of the compound statement.

In these cases, using the else clause usually makes your code easier to read and saves you the hassle of setting variables that control flag effects and additional if judgments.

Context manager and with block

The purpose of the context manager object is to manage the with statement, which is to simplify the try/finally pattern. This mode is used to ensure that an operation is performed after a piece of code has finished running, even if the operation is terminated by an exception, return, or sys.exit() call. Code in the finally clause is often used to release important resources or restore a temporary change in state.

The context manager protocol contains two methods, __enter__ and __exit__. When the with statement is started, the __enter__ method is called on the context manager, and when the with statement is finished, the __exit__ method is called. This plays the role of the finally clause.

The most common example of with is ensuring that file objects are closed.

The interpreter calls __enter__ with no arguments; __exit__ receives three arguments (a class-based sketch follows the list):

  • exc_type: the exception class (such as ZeroDivisionError)
  • exc_value: the exception instance; arguments passed to the exception constructor, such as the error message, are available via exc_value.args
  • traceback: a traceback object
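A minimal class-based sketch of the protocol (ManagedResource is a made-up name):

class ManagedResource:
    def __enter__(self):
        print('acquiring resource')
        return self                      # bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        print('releasing resource')
        return False                     # False: do not suppress exceptions

with ManagedResource() as res:
    print('working with', res)
# acquiring resource
# working with <__main__.ManagedResource object at 0x...>
# releasing resource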

Utilities in the Contextlib module

Python’s standard library also provides classes and functions in the contextlib module that have wider applications.

  • closing: if an object provides a close() method but does not implement the __enter__/__exit__ protocol, this function builds a context manager around it.
  • suppress: builds a context manager that temporarily ignores the specified exceptions.
  • @contextmanager: this decorator turns a simple generator function into a context manager, avoiding the need to write a class implementing the protocol.
  • ContextDecorator: a base class for defining class-based context managers that can also be used as decorators, running the whole decorated function inside a managed context.
  • ExitStack: a context manager that can enter a variable number of other context managers. When the with block ends, ExitStack calls the __exit__ methods of the stacked context managers in last-in, first-out order. Use this class when you don’t know in advance how many context managers the with block needs to enter, for example to open all files in an arbitrary file list at once (see the sketch after this list).
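A minimal ExitStack sketch (the file names are hypothetical):

from contextlib import ExitStack

filenames = ['a.txt', 'b.txt']
with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    # all the files are open here; ExitStack closes every one of them
    # when the with block ends, even if an exception is raised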

Of these utilities, the most widely used is the @contextmanager decorator, so it deserves attention. It is also somewhat confusing, because it uses a yield statement in a way that has nothing to do with iteration.

Using @contextmanager

The @contextmanager decorator reduces the boilerplate needed to create a context manager: instead of defining __enter__ and __exit__ methods, you implement a generator with a single yield statement.

import sys
import contextlib
@contextlib.contextmanager
def looking_glass():

    original_write = sys.stdout.write
    def reverse_write(text):
        original_write(text[::-1])
    sys.stdout.write = reverse_write
    yield 'JABBERWOCKY'
    sys.stdout.write = original_write

with looking_glass() as f:
    print(f)        # YKCOWREBBAJ
    print("ABCD")   # DCBA

The yield statement acts as a split, with all the code before the yield statement executing at the beginning of the with block (when the interpreter calls the __enter__ method) and the code after the yield statement executing at the end of the with block (when the __exit__ method is called).

Chapter 16: Coroutines

To understand coroutines, start from yield. A yield item produces a value that is handed to the caller of next(...); it also yields control, suspending the generator so the caller can keep running until it needs another value and calls next(...) again, at which point execution resumes where it left off.

Syntactically, a coroutine looks like a generator: a function with the yield keyword in its body. In a coroutine, however, yield usually appears on the right-hand side of an expression (datum = yield), and it may or may not produce a value (if there is no expression after yield, None is produced). The coroutine may receive data from the caller, which the caller supplies with .send(datum) rather than next(...); usually, the caller pushes values into the coroutine.

While generator callers are always asking for data, coroutines allow callers to pass data to them, and coroutines don’t necessarily produce data.

Regardless of how data flows, yield is a flow-control tool for writing cooperative multitasking: each coroutine yields control to a central scheduler so that other coroutines can be activated.

How do generators evolve into coroutines

After the underlying coroutine support landed in Python, the generator API gained the .send(value) method. A generator’s caller can use .send(...) to push data that becomes the value of the yield expression inside the generator function, which is what allows generators to be used as coroutines. Besides .send(...), the API also gained .throw(...) and .close(), which let the caller raise an exception inside the generator or terminate it.

The basic behavior of generators used as coroutines

>>> def simple_coroutine():
...     print('-> coroutine started')
...     x = yield
...     print('-> coroutine received:', x)
...
>>> my_coro = simple_coroutine()
>>> my_coro
<generator object simple_coroutine at 0x100c2be10>
>>> next(my_coro)
-> coroutine started
>>> my_coro.send(42)
-> coroutine received: 42
Traceback (most recent call last):
...
StopIteration

In the yield expression, if the coroutine only receives data from the caller, the yielded value is None. As with any generator, you call the function to get a generator object. A coroutine must first be activated with next(...): data cannot be sent right away because the generator has not started and is not yet paused at a yield. If control flows off the end of the coroutine body, StopIteration is raised, just as with an iterator.

The advantage of using coroutines is that no locks are needed, because all coroutines run in a single thread and they are non-preemptive. A coroutine is always in one of four states, which can be checked with inspect.getgeneratorstate(...):

  • 'GEN_CREATED'Wait for execution to start.
  • 'GEN_RUNNING'The interpreter is executing.
  • 'GEN_SUSPENDED'Pause at the yield expression.
  • 'GEN_CLOSED'No further action is required.

The 'GEN_RUNNING' state is normally only observable in a multithreaded application (or if the generator object called getgeneratorstate on itself, which is not useful).

To better understand coroutine behaviour, consider a coroutine that yields twice:

>>> from inspect import getgeneratorstate
>>> def simple_coro2(a):
...     print('-> Started: a =', a)
...     b = yield a
...     print('-> Received: b =', b)
...     c = yield a + b
...     print('-> Received: c =', c)
...
>>> my_coro2 = simple_coro2(14)
>>> getgeneratorstate(my_coro2)   # the coroutine has not started yet
'GEN_CREATED'
>>> next(my_coro2)                # advance to the first yield: print a, yield its value, pause waiting for b
-> Started: a = 14
14
>>> getgeneratorstate(my_coro2)   # the coroutine is paused at a yield
'GEN_SUSPENDED'
>>> my_coro2.send(28)             # 28 becomes the value of the yield expression, bound to b; run to the next yield, producing a + b
-> Received: b = 28
42
>>> my_coro2.send(99)             # likewise c gets 99, but the body ends, so StopIteration is raised
-> Received: c = 99
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> getgeneratorstate(my_coro2)   # the coroutine has terminated
'GEN_CLOSED'

The key point is that the coroutine pauses exactly at the yield keyword. In the line b = yield a, the value of b is only set when client code later reactivates the coroutine. It takes a little getting used to, but once this clicks, the use of yield in asynchronous programming also makes sense. The execution of simple_coro2 in the example can be divided into three phases.

Example: Use coroutines to compute moving averages

def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield average
        total += term
        count += 1
        average = total / count

This is a coroutine code that dynamically computes the average, and this infinite loop shows that it keeps receiving values and producing results. The coroutine terminates only if the caller calls the.close() method on the coroutine, or if there is no reference to it.

The advantage of coroutines is that context can be maintained between calls without the need for instance properties or closures.

A decorator for priming coroutines

A coroutine is useless until next(...) has been called on it. To simplify coroutine use, a priming decorator is sometimes used.

from functools import wraps

def coroutine(func):
    """Decorator: advance `func` to the first `yield`, priming it"""
    @wraps(func)
    def primer(*args, **kwargs):
        gen = func(*args, **kwargs)   # calling primer returns the primed generator
        next(gen)                     # prime the generator
        return gen
    return primer

Terminate coroutines and exception handling

Unhandled exceptions in coroutines bubble up to the caller of next() or send(). If this exception is not handled, it causes the coroutine to terminate.

>>> coro_avg.send(40)
40.0
>>> coro_avg.send(50)
45.0
>>> coro_avg.send('spam')   # sending a non-numeric value raises an exception inside the coroutine
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for +=: 'float' and 'str'
>>> coro_avg.send(60)       # the coroutine has terminated; sending again raises StopIteration
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Such exceptions have to be handled inside the coroutine. Alternatively, client code can explicitly throw exceptions into the coroutine with throw() and terminate it with close():

coro_avg.throw(ZeroDivisionError)

Failure to handle this exception internally causes the coroutine to terminate.

close() causes a GeneratorExit exception to be raised at the paused yield expression. The coroutine is allowed to handle this exception, but it must not yield a value while doing so, or the interpreter raises RuntimeError.

Let the coroutine return a value

def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield
        if term is None:
            break
        total += term
        count += 1
        average = total / count
    return (count, average)

coro_avg = averager()
next(coro_avg)
coro_avg.send(10)
coro_avg.send(30)
try:
    coro_avg.send(None)          # sending None terminates the coroutine
except StopIteration as exc:
    result = exc.value           # the return value travels in the StopIteration exception

To return a value, the coroutine must terminate normally, and a normally terminated coroutine raises a StopIteration exception that the caller needs to handle.

Using yield from

Yield from is a completely new grammatical construct. It does a lot more than yield.

>>> def gen():
...     for c in 'AB':
...         yield c
...     for i in range(1, 3):
...         yield i
...
>>> list(gen())
['A', 'B', 1, 2]

Can be rewritten as:

>>> def gen():
...     yield from 'AB'
...     yield from range(1, 3)
...
>>> list(gen())
['A', 'B', 1, 2]

When yield from subgen() is used inside the generator gen, the subgenerator subgen takes over and yields values directly to the caller of gen; in effect the caller drives subgen directly. Meanwhile gen blocks, waiting for subgen to terminate.

The first thing the yield from x expression does on an x object is call iter(x) to get the iterator. Therefore, an X object can be any iterable.

The semantics are too complicated. Here’s Greg Ewing’s explanation:

“When the iterable is a generator, the effect is the same as splicing the body of the subgenerator in at the point of the yield from expression. Moreover, the subgenerator can execute a return statement with a value, and that value becomes the value of the yield from expression.”

The subgenerator is the generator obtained from the iterable that follows yield from. If the caller uses the send() method, the value is passed straight through to the subgenerator: when the value is None the subgenerator’s __next__() is called, otherwise its send() is called. When the subgenerator raises StopIteration, the delegating generator resumes running; any other exception bubbles up to the delegating generator.

When a generator executes return expr, it raises StopIteration with expr attached as the exception value.
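A minimal sketch of this delegation chain, based on the book's averager example: grouper is the delegating generator, averager is the subgenerator, and the client code below is the caller.

def averager():  # subgenerator
    total, count, average = 0.0, 0, None
    while True:
        term = yield
        if term is None:
            break
        total += term
        count += 1
        average = total / count
    return (count, average)

def grouper(results, key):  # delegating generator
    while True:
        # every value sent to grouper goes straight to averager; when averager
        # returns, its value becomes the value of the yield from expression
        results[key] = yield from averager()

results = {}
group = grouper(results, 'girls;kg')  # the client code is the caller
next(group)                           # prime the delegating generator
for value in [40.9, 38.5, 44.3]:
    group.send(value)                 # passed straight through to averager
group.send(None)                      # terminates averager; grouper stores the result and loops
print(results)                        # {'girls;kg': (3, 41.233333333333334)}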

Chapter 17: Concurrency with futures

What is a future? An object that represents an operation that will be performed asynchronously. This concept is the foundation of both the concurrent.futures module and the asyncio package.

Example: Three styles of web download

To handle network I/O efficiently, you need concurrency: networks have high latency, so the program should do something else instead of wasting CPU cycles while waiting.

For a program that downloads 20 images from the network, the sequential version takes 7.18 s, the multithreaded version 1.40 s, and the asyncio version 1.35 s. The two concurrent scripts differ little from each other, but both are far faster than the sequential one.

Blocking I/O and the GIL

The CPython interpreter is not internally thread-safe, so it has a Global Interpreter Lock (GIL) that allows only one thread at a time to execute Python bytecode; this is why a single Python process usually cannot use multiple CPU cores at the same time.

Python programmers have no control over the GIL from within Python code. However, every standard-library function that performs blocking I/O releases the GIL while waiting for the operating system to return a result. This means that I/O-bound Python programs can benefit from threads in spite of the GIL.

Launching processes with the concurrent.futures module

A Python process has only one GIL, so running multiple Python processes bypasses it; this approach can use all available CPU cores. The concurrent.futures module makes truly parallel computation straightforward because it can delegate work to multiple Python processes through ProcessPoolExecutor.

Both the ProcessPoolExecutor and ThreadPoolExecutor classes implement a common Executor interface, making it easy to switch from a thread-based solution to a process-based solution using concurrent.futures.

def download_many(cc_list):
    workers = min(MAX_WORKERS, len(cc_list))
    with futures.ThreadPoolExecutor(workers) as executor:
        res = executor.map(download_one, sorted(cc_list))

To:

def download_many(cc_list):
    with futures.ProcessPoolExecutor() as executor:
        res = executor.map(download_one, sorted(cc_list))

The ThreadPoolExecutor.__init__ method requires the max_workers argument, which specifies the number of threads in the pool; in the ProcessPoolExecutor class this argument is optional and defaults to the number of CPU cores.

Chapter 18: Handling concurrency using the Asyncio package

Concurrency means processing more than one thing at a time.

Parallelism means doing more than one thing at a time. They are different, but related: one is about structure, the other about execution. Concurrency provides a way to structure a solution to a problem that may (but need not) be parallelizable. (Rob Pike, co-creator of the Go language)

Parallelism means two or more events happening at the same instant, while concurrency means two or more events happening within the same interval of time. True parallelism requires multiple cores; a laptop typically has four CPU cores but commonly runs more than 100 processes at any moment. So in practice most processes are handled concurrently, not in parallel: the machine keeps those 100-plus processes going, giving each a chance to make progress, but the CPU itself never does more than four things at once.

This chapter introduces the asyncio package, which implements concurrency with coroutines driven by an event loop. asyncio makes heavy use of yield from expressions and is therefore incompatible with Python versions below 3.3.

Threads versus coroutines

For comparison, the same spinner animation is implemented twice: once with threads via the threading module, and once with coroutines via the asyncio package.

import threading
import itertools
import time


def spin(msg, done):  # this function runs in a separate thread
    for char in itertools.cycle('|/-\\'):  # infinite loop over the spinner characters
        status = char + ' ' + msg
        print(status)
        if done.wait(.1):  # exit the loop as soon as the Event is set
            break


def slow_function():
    # pretend to wait a long time for I/O; sleeping blocks the main thread,
    # but the GIL is released, so the spinner thread keeps running
    time.sleep(3)
    return 42


def supervisor():
    # set up the secondary thread, display the thread object,
    # run the slow computation and finally kill the thread
    done = threading.Event()
    spinner = threading.Thread(target=spin, args=('thinking!', done))
    print('spinner object:', spinner)  # <Thread(Thread-1, initial)>
    spinner.start()           # start the secondary thread
    result = slow_function()  # blocks the main thread; meanwhile the spinner animates
    done.set()                # change the signal state to tell the spinner to exit
    spinner.join()            # wait for the spinner thread to finish
    return result


if __name__ == '__main__':
    result = supervisor()
    print('Answer:', result)

This example uses threading to animate a spinner while a 3-second "computation" runs. In Python there is no API for terminating a thread from the outside; to shut a thread down you must send it a message (here, setting the Event object).

Now let's implement the same behavior with coroutines, using the @asyncio.coroutine decorator:

import asyncio
import itertools


@asyncio.coroutine
def spin(msg):  # no need for the Event argument used in the threaded version
    for char in itertools.cycle('|/-\\'):
        status = char + ' ' + msg
        print(status)
        try:
            # use yield from asyncio.sleep(.1) instead of time.sleep(.1);
            # this sleep does not block the event loop
            yield from asyncio.sleep(.1)
        except asyncio.CancelledError:
            # if CancelledError is raised when spin wakes up,
            # cancellation was requested, so exit the loop
            break


@asyncio.coroutine
def slow_function():
    # slow_function is a coroutine; while it pretends to do I/O by sleeping,
    # it hands control to the event loop with yield from
    yield from asyncio.sleep(3)  # cedes control to the main loop; the coroutine
                                 # resumes when the sleep ends
    return 42


@asyncio.coroutine
def supervisor():
    # asyncio.async(...) schedules the spin coroutine to run,
    # wrapping it in a Task object, which is returned immediately
    spinner = asyncio.async(spin('thinking!'))
    print('spinner object:', spinner)
    # drive slow_function(); when it finishes, get its return value.
    # meanwhile the event loop keeps running, because slow_function
    # ultimately cedes control back to it with yield from asyncio.sleep(3)
    result = yield from slow_function()
    # a Task object can be cancelled; cancelling raises asyncio.CancelledError
    # at the yield where the coroutine is currently paused. The coroutine may
    # catch the exception, delay cancellation, or even refuse to be cancelled
    spinner.cancel()
    return result


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # drive the supervisor coroutine to completion;
    # its return value is the return value of this call
    result = loop.run_until_complete(supervisor())
    loop.close()
    print('Answer:', result)

The asyncio package uses coroutines in a stricter sense: a coroutine suitable for the asyncio API must use yield from in its body, not yield. Also, an asyncio coroutine is driven by a caller, for example through asyncio.async(...). Finally, coroutines should be decorated with @asyncio.coroutine.

The main differences between these two Supervisor implementations are summarized below:

  • An asyncio.Task object is roughly equivalent to a threading.Thread object. "A Task is like a green thread in libraries that implement cooperative multitasking, such as gevent."
  • A Task object drives a coroutine; a Thread object invokes a callable.
  • Task objects are not instantiated by hand: they are obtained by passing a coroutine to asyncio.async(...) or to the loop.create_task(...) method.
  • A Task object obtained this way is already scheduled to run (for example, by the asyncio.async function); a Thread instance must be told to run explicitly by calling its start method.
  • In the threaded version of supervisor, slow_function is a plain function invoked directly by the thread. In the asyncio version, slow_function is a coroutine driven by yield from.
  • There is no API to terminate a thread from the outside, because a thread could be interrupted at any point, leaving the system in an invalid state. To terminate a task, use the Task.cancel() instance method, which raises CancelledError inside the coroutine. The coroutine can catch the exception at the yield where it is paused and handle the termination request.
  • The supervisor coroutine must be executed in the main function by the loop.run_until_complete method.

Multithreaded programming is hard because the scheduler can interrupt a thread at any time; you must remember to hold locks to protect the critical sections of your program, so that a multi-step operation is not interrupted partway through and left in an inconsistent state.

Coroutines, by default, are protected against interruption: we must explicitly yield (with yield or yield from) to let the rest of the program run. With coroutines there is no need to hold locks to synchronize operations the way multiple threads require, because the coroutines are "synchronized" by definition: only one of them is running at any time.

Yielding from futures, tasks, and coroutines

In the asyncio package, futures and coroutines are closely related, because yield from can be used to produce a result from an asyncio.Future object. That is, if foo is a coroutine function, or a plain function that returns a Future or Task instance, then we can write res = yield from foo().

To perform this operation, the coroutine must be scheduled to run and then wrapped in an asyncio.Task object. There are two main ways to obtain a Task from a coroutine (a small sketch follows the list):

  • asyncio.async(coro_or_future, *, loop=None): this function unifies coroutines and futures; the first argument can be either. If it is a Future or Task, it is returned unchanged. If it is a coroutine, async calls loop.create_task(...) to create a Task. The loop keyword argument is optional and takes an event loop; if it is not passed, async gets the loop object by calling asyncio.get_event_loop().
  • BaseEventLoop.create_task(coro): this method schedules the coroutine for execution and returns an asyncio.Task object. On a custom BaseEventLoop subclass (for example in an external library such as Tornado), the returned object may be an instance of some Task-compatible class from that library.
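A small sketch of obtaining and driving a Task, assuming the pre-3.5 asyncio API used throughout this chapter (in later versions asyncio.async was renamed asyncio.ensure_future):

import asyncio

@asyncio.coroutine
def delayed_answer():
    yield from asyncio.sleep(0.1)
    return 42

loop = asyncio.get_event_loop()

# schedule the coroutine and wrap it in a Task; under the old API the call
# asyncio.async(delayed_answer()) has the same effect
task = loop.create_task(delayed_answer())
print(task)                             # <Task pending ...>

result = loop.run_until_complete(task)  # drive the task to completion
print(result)                           # 42
loop.close()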

Several functions in the asyncio package automatically wrap the coroutines given as arguments in asyncio.Task objects, using the asyncio.async function.

Downloading with the asyncio and aiohttp packages

The asyncio package directly supports only TCP and UDP. For HTTP or any other protocol you need a third-party package; aiohttp is the usual choice. Take downloading images as an example:

import asyncio

import aiohttp

from flags import BASE_URL, save_flag, show, main


@asyncio.coroutine  # coroutines should be decorated with @asyncio.coroutine
def get_flag(cc):
    url = '{}/{cc}/{cc}.gif'.format(BASE_URL, cc=cc.lower())
    # blocking operations are implemented as coroutines; delegating to them
    # with yield from lets them run without blocking the event loop
    resp = yield from aiohttp.request('GET', url)
    image = yield from resp.read()  # reading the response body is also asynchronous
    return image


@asyncio.coroutine
def download_one(cc):  # download_one must also be a coroutine, because it uses yield from
    image = yield from get_flag(cc)
    show(cc)
    save_flag(image, cc.lower() + '.gif')
    return cc


def download_many(cc_list):
    loop = asyncio.get_event_loop()  # get a reference to the underlying event loop
    to_do = [download_one(cc) for cc in sorted(cc_list)]  # build a list of generator objects
    # despite its name, wait is not a blocking function; it is a coroutine
    # that finishes when all the coroutines passed to it are done
    wait_coro = asyncio.wait(to_do)
    res, _ = loop.run_until_complete(wait_coro)  # run the event loop until wait_coro completes
    loop.close()  # shut down the event loop
    return len(res)


if __name__ == '__main__':
    main(download_many)

The asyncio.wait(…) coroutine accepts an iterable of futures or coroutines; wait wraps each coroutine in a Task. The net result is that every object handled by wait becomes, one way or another, an instance of Future. wait is a coroutine function, so calling it returns a coroutine/generator object. To drive it, we pass that object to the loop.run_until_complete(…) method.

The loop.run_until_complete method accepts a future or a coroutine. If it gets a coroutine, run_until_complete wraps it in a Task, just as wait does. Coroutines, futures and tasks can all be driven by yield from, and that is what run_until_complete does with the wait_coro object returned by the wait call. When it completes, it returns a tuple of two sets: the first holds the futures that finished, the second those that did not.

Avoid blocking calls

There are two ways to prevent blocking calls from halting the progress of the entire application:

  • Run each blocking operation in a separate thread
  • Convert each blocking operation to a non-blocking asynchronous call

Multithreading works, but each thread consumes a significant amount of memory. To reduce memory consumption, asynchronous calls are often implemented with callbacks. This is a low-level concept, similar to the oldest and most primitive concurrency mechanism of all: hardware interrupts. With callbacks, instead of waiting for a response we register a function to be called when something happens; that way, every call we make is non-blocking.

The event loop underlying an asynchronous application can rely on interrupts, threads, polling, and background processes to ensure that multiple concurrent requests make progress and eventually complete, so that callbacks can be used. When the event loop gets a response, it calls back the function we specified. If done right, the single thread shared by the event loop and the application code never blocks.

Using generators as coroutines is another way of doing asynchronous programming. Invoking a callback from the event loop has the same effect as calling .send() on a suspended coroutine.

Using an Executor object to prevent blocking the event loop

Accessing the local filesystem blocks; under the hood, CPython releases the GIL during blocking I/O calls, so another thread can proceed.

Because the asyncio event loop is not backed by multiple threads, the save_flag function that saves the image blocks the single thread it shares with the asyncio event loop, so the whole application freezes while the file is being saved. The solution is the run_in_executor method of the event loop object.

The asyncio event loop maintains a ThreadPoolExecutor object. We can call the run_in_executor method and send it a callable object to execute:

@asyncio.coroutine
def download_one(cc, base_url, semaphore, verbose):
    try:
        with (yield from semaphore):
            image = yield from get_flag(base_url, cc)
    except web.HTTPNotFound:
        status = HTTPStatus.not_found
        msg = 'not found'
    except Exception as exc:
        raise FetchError(cc) from exc
    else:
        loop = asyncio.get_event_loop()  # get a reference to the event loop object
        # the first argument of run_in_executor is an Executor instance; passing
        # None selects the event loop's default ThreadPoolExecutor instance
        loop.run_in_executor(None,
                             # the remaining arguments are the callable to run
                             # and its positional arguments
                             save_flag, image, cc.lower() + '.gif')
        status = HTTPStatus.ok
        msg = 'ok'
    if verbose and msg:
        print(cc, msg)
    return Result(status, cc)

Chapter 19: Dynamic attributes and properties

In Python, data attributes and the methods that operate on them are collectively called attributes. Beyond that, Python provides a rich API for controlling attribute access and for implementing dynamic attributes, such as computing the attribute accessed as obj.attr on the fly in __getattr__.

Dynamically creating attributes is a form of metaprogramming.

Transforming data with dynamic attributes

Normally, parsed JSON data has to be accessed as feed['Schedule']['events'][40]['name']. With dynamic attributes we can instead write feed.Schedule.events[40].name to get the same value.

from collections import abc


class FrozenJSON:
    """A read-only facade for navigating a JSON-like object using attribute notation"""

    def __init__(self, mapping):
        self.__data = dict(mapping)

    def __getattr__(self, name):
        if hasattr(self.__data, name):
            # if name matches an attribute of the underlying dict (e.g. keys), delegate to it
            return getattr(self.__data, name)
        else:
            # otherwise fetch the item with the key name and build a FrozenJSON from it
            return FrozenJSON.build(self.__data[name])

    @classmethod
    def build(cls, obj):
        if isinstance(obj, abc.Mapping):  # if obj is a mapping, build a FrozenJSON from it
            return cls(obj)
        elif isinstance(obj, abc.MutableSequence):  # if it is a list, build each item recursively
            return [cls.build(item) for item in obj]
        else:  # if it is neither a dict nor a list, return it as is
            return obj

Using __new__ to create objects in a flexible manner

We usually call __init__ the constructor, a term borrowed from other languages. In fact, the special method that actually constructs an instance is __new__: it is a class method that must return an instance, and that instance is then passed as self to __init__.
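Roughly speaking, construction works like the following pseudocode sketch (not runnable as-is; Foo stands for any class):

# pseudocode for object construction
def object_maker(the_class, some_arg):
    new_object = the_class.__new__(the_class, some_arg)
    if isinstance(new_object, the_class):
        # __init__ is only called if __new__ returned an instance of the_class
        the_class.__init__(new_object, some_arg)
    return new_object

# the next two statements are roughly equivalent
x = Foo('bar')
x = object_maker(Foo, 'bar')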

Chapter 20: Attribute descriptors

Descriptors are classes that implement the descriptor protocol, which consists of the __get__, __set__, and __delete__ methods. In practice, most descriptors implement only part of the protocol.

Overriding versus non-overriding descriptors

Attribute access in Python is asymmetric. When reading an attribute through an instance, an attribute defined in the instance is normally returned; if the instance has no such attribute, a class attribute is retrieved instead. When assigning to an attribute through an instance, the attribute is normally created in the instance, without affecting the class at all.
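A quick interactive sketch of this asymmetry (the class name is made up for illustration):

>>> class Spam:
...     kind = 'canned'          # class attribute
...
>>> s = Spam()
>>> s.kind                       # no instance attribute, so the class attribute is found
'canned'
>>> s.kind = 'fresh'             # assignment creates an instance attribute...
>>> s.kind
'fresh'
>>> Spam.kind                    # ...and the class attribute is untouched
'canned'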

This asymmetric treatment also affects descriptors. Depending on whether the __set__ method is defined, descriptors fall into two broad categories: overriding descriptors and non-overriding descriptors.

A descriptor that implements __set__ is an overriding descriptor: although the descriptor is a class attribute, its __set__ method overrides assignments to instance attributes of the same name, which is why __set__ receives the instance as an argument. Here is an example:

def print_args(*args):  # auxiliary function that prints the arguments of each call
    print(args)


class Overriding:  # overriding descriptor: defines both __set__ and __get__
    def __get__(self, instance, owner):
        print_args('get', self, instance, owner)

    def __set__(self, instance, value):
        print_args('set', self, instance, value)


class OverridingNoGet:  # overriding descriptor without a __get__ method
    def __set__(self, instance, value):
        print_args('set', self, instance, value)


class NonOverriding:  # non-overriding descriptor: no __set__ method
    def __get__(self, instance, owner):
        print_args('get', self, instance, owner)


class Managed:  # the managed class, with one instance of each descriptor class
    over = Overriding()
    over_no_get = OverridingNoGet()
    non_over = NonOverriding()

    def spam(self):  # an ordinary method, for comparison
        print('-> Managed.spam({})'.format(repr(self)))

Overriding descriptor

obj = Managed()
obj.over        # ('get', <__main__.Overriding object>, <__main__.Managed object>, <class '__main__.Managed'>)
obj.over = 7    # ('set', <__main__.Overriding object>, <__main__.Managed object>, 7)
obj.over        # ('get', <__main__.Overriding object>, <__main__.Managed object>, <class '__main__.Managed'>)

The class attribute named over is an overriding descriptor, so it takes over both reading from and assigning to obj.over.
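A sketch of that point, reusing the Managed class defined above: even planting a value directly in the instance __dict__ does not hide an overriding descriptor from reads.

obj = Managed()
obj.__dict__['over'] = 8  # bypass the descriptor and store an instance attribute directly
vars(obj)                 # {'over': 8}
obj.over                  # ('get', <__main__.Overriding object>, <__main__.Managed object>, <class '__main__.Managed'>)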

Overriding descriptor without __get__

obj = Managed()
obj.over_no_get
obj.over_no_get = 7  # ('set', <__main__.OverridingNoGet object>, <__main__.Managed object>, 7)
obj.over_no_get

The overriding behavior appears only when an assignment is performed; because there is no __get__, reading obj.over_no_get returns the descriptor object itself, or the instance attribute if one has been stored in the instance __dict__.

Methods are descriptors

Functions defined in a class body become bound methods when accessed through an instance, because user-defined functions have a __get__ method: attached to a class, a function therefore acts as a descriptor.

obj.spam and Managed.spam retrieve different objects: the former is a bound method object, <bound method Managed.spam of <__main__.Managed object at 0x...>>, and the latter is the plain function, <function Managed.spam at 0x...>.

Functions are non-overriding descriptors. Calling a function's __get__ with an instance as the argument returns a method bound to that instance; calling __get__ with None as the instance returns the function itself. This is how the self parameter ends up implicitly bound.
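A short interactive sketch of this binding mechanism, again using the Managed class (the addresses shown are illustrative):

>>> obj = Managed()
>>> Managed.spam.__get__(obj, Managed)   # passing an instance yields a bound method
<bound method Managed.spam of <__main__.Managed object at 0x...>>
>>> Managed.spam.__get__(None, Managed)  # passing None as the instance returns the function itself
<function Managed.spam at 0x...>
>>> obj.spam.__func__ is Managed.spam    # the bound method keeps a reference to the original function
True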

Descriptor usage recommendations

Use properties to keep it simple: the built-in property class creates overriding descriptors that implement both __set__ and __get__.

Read-only descriptors require __set__: to implement a read-only attribute, you must define both __get__ and __set__ (with __set__ simply raising AttributeError), otherwise an instance attribute of the same name would shadow the descriptor.

Validation descriptors can work with __set__ only. For example, an age attribute that only accepts numbers can be implemented by a descriptor that defines just __set__, validating the value and storing it under the same name in the instance __dict__. There is no need to define __get__, because reads then fetch the attribute directly from the instance __dict__ without triggering the descriptor.
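A minimal sketch of such a validation-only descriptor; the Age and Person names are made up for illustration:

class Age:
    """Validation descriptor: defines __set__ but no __get__."""

    def __set__(self, instance, value):
        if not isinstance(value, (int, float)) or value < 0:
            raise ValueError('age must be a non-negative number')
        # store the validated value under the same name in the instance __dict__;
        # later reads find it there directly, so __get__ is not needed
        instance.__dict__['age'] = value


class Person:
    age = Age()


p = Person()
p.age = 30
print(p.age)   # 30, fetched straight from p.__dict__
p.age = -5     # raises ValueError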

Chapter 21: Class metaprogramming

Class metaprogramming is the art of creating or customizing classes at runtime. In Python, classes are first-class objects, so a class can be created by a function at any time, without using the class keyword. Class decorators are functions, too, and they can inspect, modify, and even replace the decorated class with another class.

Metaclasses are the most advanced tool for class metaprogramming. What is a metaclass? str is the class that makes strings and int is the class that makes integers; a metaclass, then, is a class that makes classes. Every class is made by a metaclass; ordinary classes are simply instances of a metaclass.
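A quick interactive check of that idea:

>>> 'spam'.__class__       # str makes string instances
<class 'str'>
>>> str.__class__          # and type makes the str class itself
<class 'type'>
>>> class Dog: pass
...
>>> Dog.__class__          # ordinary classes are instances of type
<class 'type'>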

This chapter discusses how to create classes at run time.

Class factory function

One example in the standard library is the class factory collections.namedtuple, which builds named tuples. We pass it a class name and several attribute names, and it creates a tuple subclass whose elements can be retrieved by name.
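For example, with collections.namedtuple:

>>> from collections import namedtuple
>>> City = namedtuple('City', 'name country population')
>>> tokyo = City('Tokyo', 'JP', 36.933)
>>> tokyo.population       # elements can be retrieved by name...
36.933
>>> tokyo[1]               # ...or by position, since City is a tuple subclass
'JP'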

Suppose we create a record_factory that has similar functionality to a named tuple:

>>> Dog = record_factory('Dog', 'name weight owner')
>>> rex = Dog('Rex', 30, 'Bob')
>>> rex
Dog(name='Rex', weight=30, owner='Bob')
>>> rex.weight = 32
>>> Dog.__mro__
(<class 'factories.Dog'>, <class 'object'>)

Here is a class factory function that creates a class like that at runtime:

def record_factory(cls_name, field_names):
    try:
        field_names = field_names.replace(',', ' ').split()  # split the attribute names
    except AttributeError:  # no .replace or .split
        pass  # assume it's already a sequence of identifiers
    field_names = tuple(field_names)  # build a tuple of attribute names; it becomes __slots__

    def __init__(self, *args, **kwargs):  # this function becomes the __init__ of the new class
        attrs = dict(zip(self.__slots__, args))
        attrs.update(kwargs)
        for name, value in attrs.items():
            setattr(self, name, value)

    def __iter__(self):  # yield the field values in the order given by __slots__
        for name in self.__slots__:
            yield getattr(self, name)

    def __repr__(self):  # build a nice repr by iterating over __slots__ and self
        values = ', '.join('{}={!r}'.format(*i)
                           for i in zip(self.__slots__, self))
        return '{}({})'.format(self.__class__.__name__, values)

    cls_attrs = dict(__slots__=field_names,  # assemble the dictionary of class attributes
                     __init__=__init__,
                     __iter__=__iter__,
                     __repr__=__repr__)

    return type(cls_name, (object,), cls_attrs)  # call the type constructor to build and return the new class

type is the metaclass here: the last line calls it to construct a class named cls_name whose only direct base class is object.

When metaprogramming in Python, it is best not to use exec and eval functions. These two functions pose serious security risks.

Metaclass basics

Metaclasses are class factories, but unlike a factory function they are themselves classes, not functions: a metaclass is a class used to build classes.

To avoid infinite regress, type is an instance of itself. object and type have a unique mutual relationship: object is an instance of type, and type is a subclass of object.
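This relationship can be checked interactively:

>>> isinstance(object, type)   # object is an instance of type
True
>>> isinstance(type, object)   # type, like everything else, is an instance of object
True
>>> issubclass(type, object)   # and type is also a subclass of object
True
>>> type(type) is type         # type is an instance of itself
True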

The special metaclass method __prepare__

Both the type constructor and the __new__ and __init__ methods of a metaclass receive the evaluated class body as a mapping of names to attributes. By default, that mapping is a dict, so the order in which attributes appear in the class body is lost. The solution is the special __prepare__ method introduced in Python 3, which is meaningful only in a metaclass and must be declared as a class method (that is, defined with the @classmethod decorator). The interpreter calls __prepare__ before calling the metaclass's __new__ method, to create the mapping that will hold the attributes from the class body.

__prepare__ receives the metaclass as its first argument, followed by the name of the class to be built and a tuple of its base classes; it must return a mapping.

import collections


class EntityMeta(type):
    """Metaclass for business entities with validated fields"""

    @classmethod
    def __prepare__(cls, name, bases):
        # return an empty OrderedDict instance; the class attributes will be stored in it
        return collections.OrderedDict()

    def __init__(cls, name, bases, attr_dict):
        super().__init__(name, bases, attr_dict)
        cls._field_names = []  # create a _field_names attribute in the class under construction
        for key, attr in attr_dict.items():  # iterate in the order the attributes were added
            if isinstance(attr, Validated):
                type_name = type(attr).__name__
                attr.storage_name = '_{}#{}'.format(type_name, key)
                cls._field_names.append(key)


class Entity(metaclass=EntityMeta):
    """Business entity with validated fields"""

    @classmethod
    def field_names(cls):
        # field_names simply yields the field names in the order they were added
        for name in cls._field_names:
            yield name

Conclusion

Python is both an easy-to-learn and a powerful language.