Follow the “Water Drops and Silver Bullets” public account to get high-quality technical articles as soon as they are published. Written by a senior back-end developer with 7 years of experience who explains technology in simple terms.

It takes about 15 minutes to read this article.

In Python development, you have probably heard of “descriptors”, but since we rarely use them directly, most developers do not understand how they work.

If you are already comfortable with Python and want to advance, it is worth learning how descriptors work, because they give you a deeper insight into Python’s design.

Although we seldom write descriptors ourselves, Python uses them under the hood all the time, for example in:

  • function, bound method, unbound method
  • the decorators property, staticmethod, classmethod

Do these look familiar?

All of them are closely tied to descriptors, and in this article we’ll look at how descriptors work.

What is a descriptor?

Before explaining what a “descriptor” is, let’s look at a simple example.

class A:
    x = 10
    
print(A.x) # 10

This example is very simple. We define a class attribute x in class A and print its value.

Instead of defining a class attribute directly, we can also define a class attribute as follows:

class Ten:
    def __get__(self, obj, objtype=None):
        return 10

class A:
    x = Ten()   # the attribute is now an instance of another class

print(A.x) # 10

Look closely: this time the class attribute x is no longer a concrete value but an instance of the class Ten. Ten defines a __get__ method that returns the concrete value.

In Python, you can hand a class attribute over to another class in this way, and such an attribute is a “descriptor.”

In other words, a “descriptor” is an attribute with “binding behavior.”

How should we understand this sentence?

Think about what we usually call “behavior” when we write code. Yes, “behavior” generally refers to a method.

So we can also think of a “descriptor” like this: the attribute of an object is no longer a concrete value, but is defined by a method.

Think about it. If an attribute is defined by a method, what is the advantage?

With a method, we can implement our own logic. To take a simple case, we can return different values for the attribute depending on different conditions, like this:

class Age:
    def __get__(self, obj, objtype=None):
        if obj.name == 'zhangsan':
            return 20
        elif obj.name == 'lisi':
            return 25
        else:
            # Raise the error instead of returning the exception object
            raise ValueError("unknown")

class Person:

    age = Age()

    def __init__(self, name):
        self.name = name

p1 = Person('zhangsan')
print(p1.age)   # 20

p2 = Person('lisi')
print(p2.age)   # 25

p3 = Person('wangwu')
print(p3.age)   # raises ValueError: unknown

In this example, the class attribute age is hosted by another class, whose __get__ decides what age is based on the name attribute of the Person instance.

This is only a very simple example, but it shows how descriptors let us change the way a class attribute is defined.

Descriptor protocol

Now that you know what a descriptor is, let’s focus on the class that hosts the attribute.

For a class to host an attribute this way, its methods cannot be defined arbitrarily. The class must follow the descriptor protocol, which consists of the following methods:

  • __get__(self, obj, type=None) -> value
  • __set__(self, obj, value) -> None
  • __delete__(self, obj) -> None

As long as one of the above methods is implemented, the class attribute can be called a descriptor.

In addition, descriptors can be divided into “data descriptors” and “non-data descriptors”:

  • A descriptor that defines only __get__ is called a non-data descriptor
  • A descriptor that, in addition to __get__, defines __set__ or __delete__ is called a data descriptor (strictly speaking, it is the presence of __set__ or __delete__ that makes it one)
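
To make the two categories concrete, here is a minimal sketch. The classes OnlyGet and GetAndSet and the helper classify are hypothetical names made up for this illustration; inspect.isdatadescriptor is the standard-library check.

```python
import inspect

class OnlyGet:
    """Defines __get__ only: a non-data descriptor."""
    def __get__(self, obj, objtype=None):
        return 'from OnlyGet'

class GetAndSet:
    """Defines __get__ and __set__: a data descriptor."""
    def __get__(self, obj, objtype=None):
        return 'from GetAndSet'

    def __set__(self, obj, value):
        raise AttributeError('read-only')

def classify(attr):
    """Hypothetical helper: name the kind of descriptor attr is, if any."""
    cls = type(attr)
    if hasattr(cls, '__set__') or hasattr(cls, '__delete__'):
        return 'data descriptor'
    if hasattr(cls, '__get__'):
        return 'non-data descriptor'
    return 'not a descriptor'

print(classify(OnlyGet()))    # non-data descriptor
print(classify(GetAndSet()))  # data descriptor
print(classify(10))           # not a descriptor

# The standard library agrees on the data-descriptor check.
print(inspect.isdatadescriptor(GetAndSet()))  # True
print(inspect.isdatadescriptor(OnlyGet()))    # False
```

Note that the check looks at the type of the attribute, not at the attribute object itself, which mirrors how Python looks up the protocol methods.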

I will elaborate on the differences between the two below.

Now let’s look at an example descriptor containing __get__ and __set__ methods:

# coding: utf8

class Age:

    def __init__(self, value=20):
        self.value = value

    def __get__(self, obj, type=None):
        print('call __get__: obj: %s type: %s' % (obj, type))
        return self.value

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError("age must be greater than 0")
        print('call __set__: obj: %s value: %s' % (obj, value))
        self.value = value

class Person:

    age = Age()

    def __init__(self, name):
        self.name = name

p1 = Person('zhangsan')
print(p1.age)
# call __get__: obj: <__main__.Person object at 0x1055509e8> type: <class '__main__.Person'>
# 20

print(Person.age)
# call __get__: obj: None type: <class '__main__.Person'>
# 20

p1.age = 25
# call __set__: obj: <__main__.Person object at 0x1055509e8> value: 25

print(p1.age)
# call __get__: obj: <__main__.Person object at 0x1055509e8> type: <class '__main__.Person'>
# 25

p1.age = -1
# ValueError: age must be greater than 0

In this case, the class attribute age is a descriptor whose value is controlled by the Age class.

The output shows that Age’s __get__ and __set__ methods are called whenever we read or modify the age attribute:

  • When p1.age is read, __get__ is called with obj set to the Person instance and type set to the Person class
  • When Person.age is read, __get__ is called with obj set to None and type set to the Person class
  • When p1.age = 25 is executed, __set__ is called with obj set to the Person instance and value set to 25
  • When p1.age = -1 is executed, the check in __set__ fails and ValueError is raised

The arguments passed to __set__ are easy to understand, but the arguments passed to __get__ differ depending on whether the attribute is accessed through the class or through an instance. Why?

This requires us to understand how descriptors work.
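
One side note on the example above before moving on: Age keeps the value on the descriptor object itself, and the descriptor is a class attribute, so every Person instance shares that one value. The sketch below contrasts this with a variant that stores the value on each instance. SharedAge, PerInstanceAge and the '_age' key are hypothetical names chosen for this illustration, not part of the original example.

```python
class SharedAge:
    """Same storage scheme as Age above: the value lives on the descriptor."""
    def __init__(self, value=20):
        self.value = value

    def __get__(self, obj, objtype=None):
        return self.value

    def __set__(self, obj, value):
        self.value = value

class PerInstanceAge:
    """Hypothetical variant: the value lives in each instance's __dict__."""
    def __init__(self, default=20):
        self.default = default

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self          # accessed on the class
        return obj.__dict__.get('_age', self.default)

    def __set__(self, obj, value):
        obj.__dict__['_age'] = value

class Person:
    shared = SharedAge()
    own = PerInstanceAge()

p1, p2 = Person(), Person()
p1.shared = 25
p1.own = 25
print(p2.shared)  # 25: p2 sees the value that was set through p1
print(p2.own)     # 20: p2 still has the default
```

Which scheme is appropriate depends on the use case; for per-object data such as an age, per-instance storage is usually what is wanted.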

How the descriptor works

To explain how descriptors work, we need to start with access to attributes.

During development, have you ever wondered what happens behind the scenes when we write code like a.b?

Here a and b can be any of the following:

  1. a may be a class or an instance; we will simply call it an object
  2. b may be an attribute or a method, but a method can also be viewed as an attribute of a class

In every case, Python follows the same lookup logic:

  1. First call __getattribute__ to try to get a result
  2. If that raises AttributeError, call __getattr__ (if it is defined)

In code, it looks like this:

def getattr_hook(obj, name):
    try:
        return obj.__getattribute__(name)
    except AttributeError:
        # Re-raise if the class does not define a __getattr__ fallback
        if not hasattr(type(obj), '__getattr__'):
            raise
    return type(obj).__getattr__(obj, name)

We need to focus on __getattribute__ here, because it is the entry point for all attribute lookups. Internally, it searches in the following order:

  1. Check whether the attribute being looked up is a descriptor defined on the class
  2. If it is a descriptor, check whether it is a data descriptor
  3. If it is a data descriptor, call its __get__ and return the result
  4. If it is not a data descriptor, look in the instance’s __dict__
  5. If the name is not found in __dict__, check whether the class attribute is a non-data descriptor
  6. If it is a non-data descriptor, call its __get__
  7. If it is not a descriptor at all, return the plain class attribute
  8. If the class does not have this attribute either, raise AttributeError

The code looks like this:

# Get an attribute of an object
def __getattribute__(obj, name):
    # Sentinel meaning "not found"
    null = object()
    # The type of the object is the class of the instance
    objtype = type(obj)
    # Get the specified attribute from this class
    cls_var = getattr(objtype, name, null)
    # Check whether the class attribute implements the descriptor protocol
    descr_get = getattr(type(cls_var), '__get__', null)
    if descr_get is not null:
        if (hasattr(type(cls_var), '__set__')
                or hasattr(type(cls_var), '__delete__')):
            # Data descriptors take priority
            return descr_get(cls_var, obj, objtype)
    # Get the attribute from the instance
    if hasattr(obj, '__dict__') and name in vars(obj):
        return vars(obj)[name]
    # Get the attribute from a non-data descriptor
    if descr_get is not null:
        return descr_get(cls_var, obj, objtype)
    # Get the attribute from the class
    if cls_var is not null:
        return cls_var
    # Raising AttributeError triggers a call to __getattr__
    raise AttributeError(name)

If this is hard to follow, the best approach is to write a small program and observe the order in which attributes are found in each situation.
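
One possible test program is sketched below. DataDesc, NonDataDesc and Demo are hypothetical names; the instance __dict__ is filled through vars() so that the data descriptor's __set__ is bypassed and all three names exist in both places.

```python
class DataDesc:
    def __get__(self, obj, objtype=None):
        return 'data descriptor'

    def __set__(self, obj, value):
        # Defining __set__ is what makes this a data descriptor.
        raise AttributeError('read-only')

class NonDataDesc:
    def __get__(self, obj, objtype=None):
        return 'non-data descriptor'

class Demo:
    d = DataDesc()
    n = NonDataDesc()
    c = 'class attribute'

obj = Demo()
# Write straight into the instance __dict__, bypassing __set__.
vars(obj).update(d='instance d', n='instance n', c='instance c')

print(obj.d)  # data descriptor (beats the instance __dict__)
print(obj.n)  # instance n (the instance __dict__ beats a non-data descriptor)
print(obj.c)  # instance c (the instance __dict__ beats a plain class attribute)

fresh = Demo()
print(fresh.n)  # non-data descriptor
print(fresh.c)  # class attribute
```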

Here we can see that looking up an attribute on an object always starts with __getattribute__.

Inside __getattribute__, Python checks whether the class attribute is a descriptor, and if so, calls its __get__ method. The exact call and the arguments passed depend on what a is:

  • If a is an instance, the call is:
type(a).__dict__['b'].__get__(a, type(a))
  • If a is a class, the call is:
a.__dict__['b'].__get__(None, a)

This explains the output of the example above: obj is the instance when the attribute is read through an instance, and None when it is read through the class.
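
The two call forms can be checked by hand. In this sketch the descriptor simply returns the arguments it receives, so ordinary attribute access can be compared with the explicit __get__ call; the class names are reused from the first example purely for illustration.

```python
class Ten:
    def __get__(self, obj, objtype=None):
        # Return the arguments so the two call forms can be compared.
        return (obj, objtype)

class A:
    b = Ten()

a = A()
# Instance access: obj is the instance, objtype is its class.
print(a.b == type(a).__dict__['b'].__get__(a, type(a)))   # True
print(a.b == (a, A))                                      # True
# Class access: obj is None, objtype is the class itself.
print(A.b == A.__dict__['b'].__get__(None, A))            # True
print(A.b == (None, A))                                   # True
```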

Data and non-data descriptors

Now that you know how descriptors work, let’s move on to the differences between data descriptors and non-data descriptors.

By definition, the difference is:

  • A descriptor that defines only __get__ is called a non-data descriptor
  • A descriptor that, in addition to __get__, defines __set__ or __delete__ is called a data descriptor

Furthermore, as the lookup order above shows, data descriptors take precedence over non-data descriptors when an attribute is looked up on an object.

In the previous example, we defined both __get__ and __set__, so that class attribute is a data descriptor.

Let’s look at another example of a non-data descriptor:

class A:

    def __init__(self):
        self.foo = 'abc'

    def foo(self):
        return 'xyz'

print(A().foo)  # What is printed?

In this code, we define an instance attribute and a method that are both named foo. If we now execute A().foo, what do you think the output will be?

The answer is abc.

Why is the value of the instance attribute foo printed instead of the method foo?

This has to do with non-data descriptors.

Let’s run dir(A.foo) and look at the result:

print(dir(A.foo))
# [... '__get__', '__getattribute__', ...]

See? The foo method of A implements __get__. As we saw above, an object that defines only __get__ is a non-data descriptor. In other words, a method defined in a class is itself a non-data descriptor.

So when an instance attribute and a method share the same name, the lookup order in __getattribute__ applies: the instance’s __dict__ is checked first, and the non-data descriptor is used only if the name is not found there. That is why the value of the instance attribute foo is returned.

Here we can summarize what we have learned about descriptors:

  • A descriptor must be a class attribute
  • __getattribute__ is the entry point for looking up an attribute (or method)
  • __getattribute__ defines the lookup order for an attribute (or method): data descriptor, instance attribute, non-data descriptor, class attribute
  • If we override __getattribute__, we can prevent descriptors from being called
  • Every method is actually a non-data descriptor, because functions define __get__
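
The point about overriding __getattribute__ can be sketched as follows. Normal and Bypass are hypothetical classes, and the override is deliberately crude: it returns the raw class attribute and skips everything else, so it is an illustration rather than something to use in real code.

```python
class Ten:
    def __get__(self, obj, objtype=None):
        return 10

class Normal:
    x = Ten()

class Bypass:
    x = Ten()

    def __getattribute__(self, name):
        # Hypothetical override: return the raw class attribute and
        # never call __get__ on it.
        return type(self).__dict__[name]

print(Normal().x)                   # 10
print(isinstance(Bypass().x, Ten))  # True: we get the descriptor object itself
print(Bypass.x)                     # 10
```

Access through the class, Bypass.x, still returns 10, because lookups on the class itself go through the metaclass's __getattribute__, which the override does not touch.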

Usage scenarios for descriptors

Now that you know how descriptors work, what business scenarios are descriptors commonly used in?

Here I implement a property validator using descriptors, which you can use in similar scenarios by referring to this example.

First, we define a Validator base class. In the __set__ method, we call the validate method to verify that the attributes meet the requirements, and then assign values to the attributes.

class Validator:

    def __init__(self):
        self.data = {}

    def __get__(self, obj, objtype=None):
        return self.data[obj]

    def __set__(self, obj, value):
        # Validate first, then assign
        self.validate(value)
        self.data[obj] = value

    def validate(self, value):
        pass

Next, we define two validation classes that inherit from Validator and implement their own validation logic.

class Number(Validator):

    def __init__(self, minvalue=None, maxvalue=None):
        super(Number, self).__init__()
        self.minvalue = minvalue
        self.maxvalue = maxvalue

    def validate(self, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f'Expected {value!r} to be an int or float')
        if self.minvalue is not None and value < self.minvalue:
            raise ValueError(
                f'Expected {value!r} to be at least {self.minvalue!r}'
            )
        if self.maxvalue is not None and value > self.maxvalue:
            raise ValueError(
                f'Expected {value!r} to be no more than {self.maxvalue!r}'
            )

class String(Validator):

    def __init__(self, minsize=None, maxsize=None):
        super(String, self).__init__()
        self.minsize = minsize
        self.maxsize = maxsize

    def validate(self, value):
        if not isinstance(value, str):
            raise TypeError(f'Expected {value!r} to be an str')
        if self.minsize is not None and len(value) < self.minsize:
            raise ValueError(
                f'Expected {value!r} to be no smaller than {self.minsize!r}'
            )
        if self.maxsize is not None and len(value) > self.maxsize:
            raise ValueError(
                f'Expected {value!r} to be no bigger than {self.maxsize!r}'
            )

Finally, we use this validation class:

class Person:

    # The validation rules for the attributes are implemented with descriptors
    name = String(minsize=3, maxsize=10)
    age = Number(minvalue=1, maxvalue=120)

    def __init__(self, name, age):
        self.name = name
        self.age = age

# The attributes conform to the rules
p1 = Person('zhangsan', 20)
print(p1.name, p1.age)

# The attributes do not conform to the rules
p2 = Person('a', 20)
# ValueError: Expected 'a' to be no smaller than 3
p3 = Person('zhangsan', -1)
# ValueError: Expected -1 to be at least 1

Now, when we initialize the Person instance, we can verify that these properties conform to the predefined rules.
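
One caveat about the Validator above: it keeps values in a dict keyed by the instance, so instances must be hashable and the dict keeps every instance alive for as long as the descriptor exists. The sketch below is an alternative, not the design used in this article: it relies on __set_name__ (available since Python 3.6) to store each value on the instance under a private name. The '_' prefix and the trimmed-down Person class are assumptions made for this illustration.

```python
class Validator:
    def __set_name__(self, owner, name):
        # Called once when the owning class is created (Python 3.6+).
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        # Validate first, then store the value on the instance.
        self.validate(value)
        setattr(obj, self.private_name, value)

    def validate(self, value):
        pass

class Number(Validator):
    def __init__(self, minvalue=None, maxvalue=None):
        self.minvalue = minvalue
        self.maxvalue = maxvalue

    def validate(self, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f'Expected {value!r} to be an int or float')
        if self.minvalue is not None and value < self.minvalue:
            raise ValueError(f'Expected {value!r} to be at least {self.minvalue!r}')
        if self.maxvalue is not None and value > self.maxvalue:
            raise ValueError(f'Expected {value!r} to be no more than {self.maxvalue!r}')

class Person:
    age = Number(minvalue=1, maxvalue=120)

    def __init__(self, age):
        self.age = age

p = Person(20)
print(p.age)    # 20
print(vars(p))  # {'_age': 20}

try:
    Person(-1)
except ValueError as exc:
    print(exc)  # Expected -1 to be at least 1
```

Catching the exception also lets the script keep running after a failed validation, so several invalid inputs can be tried in one run.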

Functions and methods

What’s the difference between function, unbound method, and bound method?

Take a look at this code:

class A:

    def foo(self):
        return 'xyz'

# The output below is from Python 2
print(A.__dict__['foo']) # <function foo at 0x10a790d70>
print(A.foo)     # <unbound method A.foo>
print(A().foo)   # <bound method A.foo of <__main__.A object at 0x10a793050>>
# Python 3 has no unbound methods: there A.foo prints <function A.foo at 0x...>

We can see the difference in the results:

  • A function is just a function, and it implements the __get__ method, so every function is a non-data descriptor. Inside a class, the function is stored in the class’s __dict__
  • When a function is accessed through an instance, it is a bound method
  • When a function is accessed through a class, it is an unbound method (Python 2 only; in Python 3, access through the class returns the plain function)

We have already covered the fact that a function is a non-data descriptor.

The difference between a bound method and an unbound method lies in the type of the caller: if it is an instance, the function becomes a bound method; otherwise it is an unbound method (or, in Python 3, simply the function itself).
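
Under Python 3 the same relationships can be verified directly. This sketch uses only the standard types module and checks identities instead of relying on printed memory addresses.

```python
import types

class A:
    def foo(self):
        return 'xyz'

a = A()
raw = A.__dict__['foo']

print(type(raw) is types.FunctionType)  # True
print(A.foo is raw)                     # True in Python 3: no unbound method wrapper
print(type(a.foo) is types.MethodType)  # True
# A bound method remembers the instance and the underlying function.
print(a.foo.__self__ is a)              # True
print(a.foo.__func__ is raw)            # True
# Calling the function's __get__ by hand builds the same kind of bound method.
print(raw.__get__(a, A)())              # xyz
```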

property/staticmethod/classmethod

Let’s look at property, staticmethod, and classmethod.

In CPython, these decorators are implemented in C.

However, we can also implement them in pure Python by taking advantage of descriptors.

A pure-Python version of property:

class property:

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: return the property object itself
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        return self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        return self.fdel(obj)

    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)
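
To see that the built-in property really behaves as a data descriptor, the following sketch inspects it on a hypothetical Circle class. Only the built-in property and the standard inspect module are used.

```python
import inspect

class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius

c = Circle(2)
prop = vars(Circle)['radius']

print(isinstance(prop, property))      # True
print(inspect.isdatadescriptor(prop))  # True
print(prop.__get__(c, Circle))         # 2, same as c.radius
print(Circle.radius is prop)           # True: class access returns the property itself

# A data descriptor wins over the instance __dict__.
vars(c)['radius'] = 'shadow'
print(c.radius)                        # 2

try:
    c.radius = 5                       # no setter was defined
except AttributeError:
    print('cannot assign')
```

Even without a setter, property defines __set__, which is why the entry placed in the instance __dict__ cannot shadow it.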

A pure-Python version of staticmethod:

class staticmethod:

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # Return the plain function, with nothing bound to it
        return self.func

A pure-Python version of classmethod:

class classmethod:

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        # Bind the class as the first argument
        def newfunc(*args, **kwargs):
            return self.func(klass, *args, **kwargs)
        return newfunc
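
The built-in versions can be inspected the same way. This sketch pulls the raw staticmethod and classmethod objects out of the class __dict__ of a hypothetical class A and calls their __get__ by hand.

```python
class A:
    @staticmethod
    def sm(x):
        return x * 2

    @classmethod
    def cm(cls, x):
        return (cls.__name__, x)

raw_sm = vars(A)['sm']
raw_cm = vars(A)['cm']
print(type(raw_sm).__name__, type(raw_cm).__name__)  # staticmethod classmethod

# staticmethod.__get__ hands back the plain function, ignoring obj and type.
print(raw_sm.__get__(None, A) is raw_sm.__func__)    # True
print(A.sm(3), A().sm(3))                            # 6 6

# classmethod.__get__ binds the function to the class.
bound = raw_cm.__get__(None, A)
print(bound.__self__ is A)                           # True
print(A.cm(1), A().cm(1))                            # ('A', 1) ('A', 1)
```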

In addition, you can implement other powerful decorators of your own in the same way.

In short, descriptors give us powerful and flexible attribute management. For scenarios that require complex control over attributes, descriptors are a good choice.
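
As one example of such a custom decorator, here is a minimal sketch of a lazily computed attribute. The names lazy and Report are hypothetical. The trick relies on the lookup order described earlier: because lazy is a non-data descriptor, a value cached in the instance __dict__ under the same name is found first on later accesses.

```python
class lazy:
    """Hypothetical decorator: compute once, then cache on the instance."""
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # Store under the same name in the instance __dict__. Because this
        # class has no __set__, later lookups find the cached value first.
        obj.__dict__[self.name] = value
        return value

class Report:
    def __init__(self):
        self.computed = 0

    @lazy
    def total(self):
        self.computed += 1
        return sum(range(1000))

r = Report()
print(r.total)     # 499500
print(r.total)     # 499500 (served from the instance __dict__)
print(r.computed)  # 1
```

The standard library offers functools.cached_property (Python 3.8+) for the same purpose, so this sketch is mainly useful for seeing the mechanism.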

Conclusion

This article focused on how Python descriptors work.

First, we learned from a simple example that a class attribute can be hosted by another class. If that class implements the methods of the descriptor protocol, the class attribute is a descriptor. In addition, descriptors can be divided into data descriptors and non-data descriptors.

Then we looked at the process of getting an attribute. Every lookup starts in __getattribute__, which defines the order in which attributes are found: data descriptors take precedence over instance attributes, and instance attributes take precedence over non-data descriptors.

We also learned that a method is a non-data descriptor, so if a class defines an instance attribute and a method with the same name, the lookup order in __getattribute__ means the instance attribute is found first.

Finally, we looked at the differences between functions and methods, and used Python descriptors to implement the property, staticmethod, and classmethod decorators.

Python descriptors provide powerful attribute access control that can be used in scenarios that require complex control over attributes.

My advanced Python series:

  • Python Advanced – How to implement a decorator?
  • Python Advanced – How to use magic methods correctly? (Part 1)
  • Python Advanced – How to use magic methods correctly? (Part 2)
  • Python Advanced – What is a metaclass?
  • Python Advanced – What is a context manager?
  • Python Advanced – What is an iterator?
  • Python Advanced – How to use yield correctly?
  • Python Advanced – What is a descriptor?
  • Python Advanced – Why does the GIL make multithreading so useless?

Crawler series:

  • How to build a crawler proxy service?
  • How to build a universal vertical crawler platform?
  • Scrapy source code analysis (1): architecture overview
  • Scrapy source code analysis (2): how does Scrapy run?
  • Scrapy source code analysis (3): what are the core components of Scrapy?
  • Scrapy source code analysis (4): how is a scraping task completed?
