Follow the "Water Drops and Silver Bullets" public account to get high-quality technical articles as soon as they are published. Written by a senior back-end developer with 7 years of experience, explaining technology clearly in simple terms.
It takes about 15 minutes to read this article.
In Python development, you’ve probably heard of the concept of a “descriptor”, but since we rarely use it directly, most developers don’t understand how it works.
However, for those of you who are familiar with Python and want to advance, it is recommended that you understand the principles of descriptors, which will help you to understand Python design ideas in a deeper way.
In fact, although we rarely use descriptors directly during development, they are used all the time under the hood, for example in:
- function, bound method, unbound method
- the decorators property, staticmethod, classmethod
Do these all look familiar?
All of this has a lot to do with descriptors, and in this article we’ll look at how they work.
What is a descriptor?
Before explaining what a “descriptor” is, let’s look at a simple example.
class A:
    x = 10

print(A.x)  # 10

This example is very simple. We define a class attribute x in class A and print its value.
Besides assigning a value directly, we can also define a class attribute as follows:
class Ten:
    def __get__(self, obj, objtype=None):
        return 10

class A:
    x = Ten()  # the attribute is now an instance of another class

print(A.x)  # 10
Look closely: this time the class attribute x is no longer a concrete value but an instance of the class Ten. Ten defines a __get__ method that returns the concrete value.
In Python, a class attribute can be delegated to an instance of another class in this way, and such an attribute is a "descriptor".
In other words, a "descriptor" is an attribute with "binding behavior".
How should we understand this sentence?
Recall what we usually mean by "behavior" during development: "behavior" generally refers to a method.
So we can also think of a "descriptor" this way: the attribute of an object is no longer a concrete value, but is defined by a method.
Think about it: what is the advantage of defining an attribute through a method?
With a method, we can implement our own logic. In the simplest case, we can return different values for the attribute depending on different conditions, like this:
class Age:
    def __get__(self, obj, objtype=None):
        if obj.name == 'zhangsan':
            return 20
        elif obj.name == 'lisi':
            return 25
        else:
            # Raise (rather than return) the error for unknown names
            raise ValueError("unknown")

class Person:
    age = Age()

    def __init__(self, name):
        self.name = name

p1 = Person('zhangsan')
print(p1.age)  # 20
p2 = Person('lisi')
print(p2.age)  # 25
p3 = Person('wangwu')
print(p3.age)  # ValueError: unknown
In this example, the class attribute age is delegated to another class, whose __get__ decides the value of age based on the name attribute of the Person instance.
This is a very simple example, but it shows how descriptors let us change the way a class attribute is defined.
Descriptor protocol
Now that you know the definition of descriptors, let's focus on the classes that attributes are delegated to.
For a class attribute to be delegated to a class, the methods implemented in that class cannot be arbitrary. The class must comply with the descriptor protocol, which means implementing the following methods:
__get__(self, obj, type=None) -> value
__set__(self, obj, value) -> None
__delete__(self, obj) -> None
As long as at least one of the above methods is implemented, the class attribute can be called a descriptor.
In addition, descriptors can be divided into "data descriptors" and "non-data descriptors":
- A descriptor that defines only __get__ is called a non-data descriptor
- A descriptor that, in addition to __get__, also defines __set__ or __delete__ is called a data descriptor
I will elaborate on the differences between the two below.
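The classification above can be checked mechanically. The following is a minimal sketch, not part of the original example: classify, OnlyGet, and GetAndSet are hypothetical names introduced here, and the check follows the same rule Python itself applies, namely that anything defining __set__ or __delete__ counts as a data descriptor.

```python
def classify(attr):
    """Return which kind of descriptor (if any) `attr` is."""
    cls = type(attr)  # protocol methods are looked up on the type, not the instance
    if hasattr(cls, '__set__') or hasattr(cls, '__delete__'):
        return 'data descriptor'
    if hasattr(cls, '__get__'):
        return 'non-data descriptor'
    return 'not a descriptor'


class OnlyGet:  # hypothetical: defines __get__ only
    def __get__(self, obj, objtype=None):
        return 10


class GetAndSet:  # hypothetical: defines __get__ and __set__
    def __get__(self, obj, objtype=None):
        return 10

    def __set__(self, obj, value):
        pass


print(classify(OnlyGet()))    # non-data descriptor
print(classify(GetAndSet()))  # data descriptor
print(classify(10))           # not a descriptor
```

Note that the check inspects type(attr) rather than attr itself, because Python looks up the protocol methods on the type.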
Now let’s look at an example descriptor containing __get__ and __set__ methods:
# coding: utf8
class Age:
    def __init__(self, value=20):
        self.value = value

    def __get__(self, obj, type=None):
        print('call __get__: obj: %s type: %s' % (obj, type))
        return self.value

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError("age must be greater than 0")
        print('call __set__: obj: %s value: %s' % (obj, value))
        self.value = value

class Person:
    age = Age()

    def __init__(self, name):
        self.name = name

p1 = Person('zhangsan')
print(p1.age)
# call __get__: obj: <__main__.Person object at 0x1055509e8> type: <class '__main__.Person'>
# 20
print(Person.age)
# call __get__: obj: None type: <class '__main__.Person'>
# 20
p1.age = 25
# call __set__: obj: <__main__.Person object at 0x1055509e8> value: 25
print(p1.age)
# call __get__: obj: <__main__.Person object at 0x1055509e8> type: <class '__main__.Person'>
# 25
p1.age = -1
# ValueError: age must be greater than 0
In this example, the class attribute age is a descriptor whose value is determined by the Age class.
From the output we can see that Age's __get__ and __set__ methods are called whenever we read or modify the age attribute:
- When p1.age is evaluated, __get__ is called; the argument obj is the Person instance and type is type(p1), that is, Person
- When Person.age is evaluated, __get__ is called; the argument obj is None and type is Person
- When p1.age = 25 is executed, __set__ is called; the argument obj is the Person instance and value is 25
- When p1.age = -1 is executed, __set__ fails the check and raises ValueError
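One detail of the example is worth noting: the value is stored on the descriptor object itself, and a descriptor is a class attribute, so every Person instance shares the same stored value. The sketch below illustrates this and contrasts it with a variant that stores the value in each instance's __dict__. The names SharedAge, PerInstanceAge, PersonA, and PersonB are hypothetical and exist only for this illustration.

```python
class SharedAge:
    """Same storage strategy as the example: the value lives on the descriptor."""
    def __init__(self, value=20):
        self.value = value

    def __get__(self, obj, objtype=None):
        return self.value

    def __set__(self, obj, value):
        self.value = value


class PerInstanceAge:
    """Hypothetical variant: the value lives in each instance's __dict__."""
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get('age', 20)

    def __set__(self, obj, value):
        obj.__dict__['age'] = value


class PersonA:
    age = SharedAge()


class PersonB:
    age = PerInstanceAge()


a1, a2 = PersonA(), PersonA()
a1.age = 25
print(a2.age)  # 25: both instances see the same value

b1, b2 = PersonB(), PersonB()
b1.age = 25
print(b2.age)  # 20: b2 keeps its own default
```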
The arguments passed to __set__ are easy to understand, but the arguments passed to __get__ differ depending on whether the attribute is accessed through the class or through an instance. Why?
To answer this, we need to understand how descriptors work.
How the descriptor works
To explain how descriptors work, we need to start with attribute access.
During development, have you ever wondered what happens behind the scenes when we write code like a.b?
Here a and b may be any of the following:
- a may be a class or an instance; here we simply call it an object
- b may be an attribute or a method; a method can also be viewed as an attribute of a class
In either case, Python follows the same lookup logic:
- First call __getattribute__ to try to get the result
- If that raises AttributeError, call __getattr__ (if it is defined)
In code, it looks like this:
def getattr_hook(obj, name):
    try:
        return obj.__getattribute__(name)
    except AttributeError:
        if not hasattr(type(obj), '__getattr__'):
            raise
        return type(obj).__getattr__(obj, name)
We need to focus on __getattribute__ here, because it is the entry point of every attribute lookup. Internally, it looks up an attribute in the following order:
- Check whether the attribute being looked up is a descriptor in the class
- If it is a descriptor, check whether it is a data descriptor
- If it is a data descriptor, call the data descriptor's __get__
- If it is not a data descriptor, look in the instance's __dict__
- If it is not found in the instance's __dict__ and the class attribute is a non-data descriptor, call the non-data descriptor's __get__
- If it is not a non-data descriptor either, return the plain class attribute
- If the class does not have this attribute either, raise AttributeError
The code looks like this:
# Get an attribute of an object
def __getattribute__(obj, name):
    null = object()
    # The type of the object is the class of the instance
    objtype = type(obj)
    # Get the specified attribute from this class
    cls_var = getattr(objtype, name, null)
    # Check whether this class attribute implements the descriptor protocol
    descr_get = getattr(type(cls_var), '__get__', null)
    if descr_get is not null:
        if (hasattr(type(cls_var), '__set__')
                or hasattr(type(cls_var), '__delete__')):
            # Get the attribute from a data descriptor first
            return descr_get(cls_var, obj, objtype)
    # Get the attribute from the instance
    if hasattr(obj, '__dict__') and name in vars(obj):
        return vars(obj)[name]
    # Get the attribute from a non-data descriptor
    if descr_get is not null:
        return descr_get(cls_var, obj, objtype)
    # Get the attribute from the class
    if cls_var is not null:
        return cls_var
    # Raising AttributeError triggers a call to __getattr__
    raise AttributeError(name)
If this is hard to follow, the best approach is to write a small program and observe the order in which attributes are found in different situations.
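One possible test program is sketched below. The classes DataDesc, NonDataDesc, and Demo are hypothetical names for this illustration. Each attribute exercises one step of the lookup order: data descriptor, instance attribute, non-data descriptor, and plain class attribute.

```python
class DataDesc:
    def __get__(self, obj, objtype=None):
        return 'data descriptor'

    def __set__(self, obj, value):
        raise AttributeError('read-only')


class NonDataDesc:
    def __get__(self, obj, objtype=None):
        return 'non-data descriptor'


class Demo:
    a = DataDesc()
    b = NonDataDesc()
    c = NonDataDesc()
    d = 'class attribute'


demo = Demo()
# Write directly into the instance __dict__ to create same-named instance attributes
demo.__dict__['a'] = 'instance attribute'
demo.__dict__['b'] = 'instance attribute'

print(demo.a)  # data descriptor (beats the instance attribute)
print(demo.b)  # instance attribute (beats the non-data descriptor)
print(demo.c)  # non-data descriptor (no instance attribute named c)
print(demo.d)  # class attribute
```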
Here we can see that looking up an attribute on an object always starts with __getattribute__.
Inside __getattribute__, Python checks whether the class attribute is a descriptor and, if so, calls its __get__ method. The exact call and the arguments passed depend on what a is:
- If a is an instance, the call is:
type(a).__dict__['b'].__get__(a, type(a))
- If a is a class, the call is:
a.__dict__['b'].__get__(None, a)
This explains the output of the example above.
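The two call forms above can be verified by hand. In this sketch, a stripped-down Age descriptor (an illustrative variant, not the one from the earlier example) simply echoes the arguments it receives, so ordinary attribute access can be compared with an explicit __get__ call.

```python
class Age:
    def __get__(self, obj, objtype=None):
        return (obj, objtype)  # echo the arguments so they can be inspected


class Person:
    age = Age()


p = Person()
descriptor = Person.__dict__['age']  # the raw descriptor object; __get__ is not triggered

print(p.age == descriptor.__get__(p, Person))          # True
print(Person.age == descriptor.__get__(None, Person))  # True
print(p.age[0] is p, Person.age[0] is None)            # True True
```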
Data and non-data descriptors
Now that you know how descriptors work, let's move on to the differences between data descriptors and non-data descriptors.
By definition, the difference is:
- A descriptor that defines only __get__ is called a non-data descriptor
- A descriptor that, in addition to __get__, also defines __set__ or __delete__ is called a data descriptor
Furthermore, as we can see from the order in which descriptors are called above, data descriptors take precedence over non-data descriptors when looking up properties in an object.
In the previous example, we defined __get__ and __set__, so those class attributes are data descriptors.
Let’s look at another example of a non-data descriptor:
class A:
    def __init__(self):
        self.foo = 'abc'

    def foo(self):
        return 'xyz'

print(A().foo)  # What is the output?
In this code, we define an instance attribute and a method with the same name, foo. If we now execute A().foo, what do you think the output will be?
The answer is abc.
Why is the value of the instance attribute foo printed rather than the method foo?
This has to do with non-data descriptors.
Let's run dir(A.foo) and look at the result:
print(dir(A.foo))
# [... '__get__', '__getattribute__', ...]
See? The foo method of A implements __get__. As we saw above, an object that defines only __get__ is a non-data descriptor. In other words, a method defined in a class is itself a non-data descriptor.
So when an instance attribute and a method share the same name, the lookup order in __getattribute__ applies: the attribute is first looked up in the instance, and the non-data descriptor is used only if it is not found there. That is why the value of the instance attribute foo is returned.
Here we can summarize what we have learned about descriptors:
- A descriptor must be a class attribute
- __getattribute__ is the entry point for looking up an attribute (or method)
- __getattribute__ defines the lookup order for an attribute (or method): data descriptor, instance attribute, non-data descriptor, class attribute
- If we override __getattribute__, we can prevent descriptors from being called
- Every method is actually a non-data descriptor, because functions define __get__
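The point about overriding __getattribute__ can be illustrated with a deliberately simplistic sketch. The Bypass class below is hypothetical: its override returns the raw class attribute without calling __get__, so the descriptor is never invoked. It is meant only to show the effect, not as a pattern to use in real code.

```python
class Ten:
    def __get__(self, obj, objtype=None):
        return 10


class Normal:
    x = Ten()


class Bypass:
    x = Ten()

    def __getattribute__(self, name):
        # Hypothetical override: hand back the raw class attribute
        # without ever calling its __get__
        return type(self).__dict__[name]


print(Normal().x)                 # 10
print(type(Bypass().x).__name__)  # Ten
```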
Usage scenarios for descriptors
Now that you know how descriptors work, in which business scenarios are they commonly used?
Here I implement an attribute validator using descriptors; you can refer to this example in similar scenarios.
First, we define a Validator base class. In its __set__ method, we call validate to check that the value meets the requirements, and only then assign it to the attribute.
class Validator:
    def __init__(self):
        self.data = {}

    def __get__(self, obj, objtype=None):
        return self.data[obj]

    def __set__(self, obj, value):
        # Validate first, then assign
        self.validate(value)
        self.data[obj] = value

    def validate(self, value):
        pass
Next, we define two validation classes that inherit from Validator and implement their own validation logic.
class Number(Validator):
    def __init__(self, minvalue=None, maxvalue=None):
        super(Number, self).__init__()
        self.minvalue = minvalue
        self.maxvalue = maxvalue

    def validate(self, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f'Expected {value!r} to be an int or float')
        if self.minvalue is not None and value < self.minvalue:
            raise ValueError(
                f'Expected {value!r} to be at least {self.minvalue!r}'
            )
        if self.maxvalue is not None and value > self.maxvalue:
            raise ValueError(
                f'Expected {value!r} to be no more than {self.maxvalue!r}'
            )

class String(Validator):
    def __init__(self, minsize=None, maxsize=None):
        super(String, self).__init__()
        self.minsize = minsize
        self.maxsize = maxsize

    def validate(self, value):
        if not isinstance(value, str):
            raise TypeError(f'Expected {value!r} to be an str')
        if self.minsize is not None and len(value) < self.minsize:
            raise ValueError(
                f'Expected {value!r} to be no smaller than {self.minsize!r}'
            )
        if self.maxsize is not None and len(value) > self.maxsize:
            raise ValueError(
                f'Expected {value!r} to be no bigger than {self.maxsize!r}'
            )
Finally, we use this validation class:
class Person:
    # The validation rules for the attributes are implemented with descriptors
    name = String(minsize=3, maxsize=10)
    age = Number(minvalue=1, maxvalue=120)

    def __init__(self, name, age):
        self.name = name
        self.age = age

# The attributes conform to the rules
p1 = Person('zhangsan', 20)
print(p1.name, p1.age)
# The attributes do not conform to the rules
p2 = Person('a', 20)
# ValueError: Expected 'a' to be no smaller than 3
p3 = Person('zhangsan', -1)
# ValueError: Expected -1 to be at least 1
Now, when we initialize a Person instance, its attributes are checked against the predefined rules.
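The Validator above keeps its values in a dictionary on the descriptor, keyed by instance. An alternative storage strategy, sketched below, keeps each value on the instance under a private name. This is only a sketch under the assumption of Python 3.6 or later, where the __set_name__ hook is available. PositiveInt and the private_name attribute are hypothetical names for this illustration.

```python
class Validator:
    def __set_name__(self, owner, name):
        # Called once when the owning class is created (Python 3.6+)
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        # Validate first, then store the value on the instance itself
        self.validate(value)
        setattr(obj, self.private_name, value)

    def validate(self, value):
        pass


class PositiveInt(Validator):  # hypothetical validator for this sketch
    def validate(self, value):
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f'Expected {value!r} to be a positive int')


class Person:
    age = PositiveInt()

    def __init__(self, age):
        self.age = age


p = Person(20)
print(p.age)    # 20
print(vars(p))  # {'_age': 20}
try:
    Person(-1)
except ValueError as exc:
    print(exc)  # Expected -1 to be a positive int
```

With this design the descriptor holds no reference to the instances, and instances do not need to be hashable.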
Functions and methods
What’s the difference between function, unbound method, and bound method?
Take a look at this code:
class A:
    def foo(self):
        return 'xyz'

# The output below is from Python 2
print(A.__dict__['foo'])  # <function foo at 0x10a790d70>
print(A.foo)              # <unbound method A.foo>
print(A().foo)            # <bound method A.foo of <__main__.A object at 0x10a793050>>
We can see the differences in the results:
- A function is exactly that, a function. It implements the __get__ method, so every function is a non-data descriptor, and the class stores the function in its __dict__
- When a function is accessed through an instance, it is a bound method
- When a function is accessed through a class, it is an unbound method
That a function is a non-data descriptor is something we've already covered.
The difference between a bound method and an unbound method lies in the type of the caller: if it is an instance, the function becomes a bound method; otherwise it is an unbound method. Note that unbound methods exist only in Python 2; in Python 3, accessing a function through its class returns the plain function.
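The binding step can be reproduced by calling the function's __get__ by hand. The sketch below assumes Python 3; the last line in particular relies on Python 3 behavior, where access through the class returns the function itself.

```python
class A:
    def foo(self):
        return 'xyz'


a = A()
func = A.__dict__['foo']    # the plain function stored in the class

bound = func.__get__(a, A)  # what a.foo does behind the scenes
print(bound.__self__ is a)          # True
print(bound.__func__ is func)       # True
print(bound() == a.foo() == 'xyz')  # True

# Python 3 only: access through the class returns the function itself
print(A.foo is func)                # True
```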
property/staticmethod/classmethod
Let's look at property, staticmethod, and classmethod.
These decorators are implemented in C by default.
However, we can also implement them in pure Python by taking advantage of descriptors.
A Python version of property:
class property:
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed through the class: return the property object itself,
            # as the built-in property does
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        return self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        return self.fdel(obj)

    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)
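Because property defines __set__ and __delete__ as well as __get__, it is a data descriptor and therefore takes precedence over the instance's __dict__. The sketch below uses the built-in property to show this. The Circle class is a hypothetical example.

```python
class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius


c = Circle(2)
c.__dict__['radius'] = 99  # plant a same-named entry in the instance __dict__
print(c.radius)            # 2: the property still wins
try:
    c.radius = 5           # no setter was defined
except AttributeError:
    print('cannot set')    # cannot set
print(type(Circle.radius).__name__)  # property
```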
A Python version of staticmethod:
class staticmethod:
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        return self.func
A Python version of classmethod:
class classmethod:
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)

        def newfunc(*args):
            return self.func(klass, *args)
        return newfunc
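The same mechanism can be observed with the built-in staticmethod and classmethod. In this sketch (class A and its methods are illustrative), the raw objects are fetched from the class __dict__ and their __get__ is called by hand, mirroring what attribute access does.

```python
class A:
    @staticmethod
    def sm(x):
        return x

    @classmethod
    def cm(cls, x):
        return cls.__name__, x


raw_sm = A.__dict__['sm']  # raw descriptor objects, no __get__ triggered
raw_cm = A.__dict__['cm']
print(type(raw_sm).__name__, type(raw_cm).__name__)  # staticmethod classmethod

# Calling __get__ by hand mirrors what A.sm and A.cm do
print(raw_sm.__get__(None, A)(1))  # 1
print(raw_cm.__get__(None, A)(1))  # ('A', 1)
print(A().cm(1))                   # ('A', 1)
```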
In addition, you can implement other powerful decorators.
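As one illustration of such a decorator, here is a minimal sketch of a lazily computed attribute. The names lazy and Report are hypothetical. The technique relies on the lookup order described earlier: because the descriptor defines only __get__, an instance attribute with the same name takes precedence on later lookups. The standard library offers a comparable tool, functools.cached_property, in Python 3.8 and later.

```python
class lazy:
    """Hypothetical decorator: compute once, then cache on the instance."""
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # Only __get__ is defined (non-data descriptor), so this instance
        # attribute shadows the descriptor on every later lookup
        obj.__dict__[self.name] = value
        return value


class Report:
    def __init__(self):
        self.computed = 0  # counts how often total is actually computed

    @lazy
    def total(self):
        self.computed += 1
        return sum(range(1000))


r = Report()
print(r.total, r.total)  # 499500 499500
print(r.computed)        # 1
```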
Thus, descriptors give us powerful and flexible attribute management. For scenarios that require complex control over attributes, descriptors are a good choice.
Conclusion
This article focuses on how Python descriptors work.
First, we learned from a simple example that a class attribute can be delegated to another class; if that class implements the methods of the descriptor protocol, the class attribute is a descriptor. In addition, descriptors can be divided into data descriptors and non-data descriptors.
Then we looked at the process of getting an attribute. Every lookup enters through __getattribute__, which defines the lookup order: data descriptors take precedence over instance attributes, and instance attributes take precedence over non-data descriptors.
We also learned that a method is a non-data descriptor, so if a class defines an instance attribute and a method with the same name, the lookup order in __getattribute__ means the instance attribute is returned first.
Finally, we looked at the differences between functions and methods, and used Python descriptors to implement the property, staticmethod, and classmethod decorators.
Python descriptors provide powerful attribute access control that can be used in scenarios requiring complex control over attributes.