The author of this article, a member of the Twisted development team, starts with an example of how cumbersome it is to define classes in Python, and then offers his preferred solution: the attrs library. Judging from his introduction, it really is convenient.

This translation was produced by the PythonTG Translation Group. It was first published on the Programming School public account and website.

Do you write Python programs? Then you should use attrs.

Why, you ask? All I can say is, don’t ask, just use it.

All right, let me explain.

I love Python and it has been my workhorse programming language for over a decade. While there have been some interesting languages along the way (Haskell and Rust in particular), I have no plans to switch to another language just yet.

This is not to say that Python doesn’t have problems of its own. In some cases, Python makes you more prone to error. In particular, some libraries make heavy use of class inheritance and the God-object antipattern.

One reason for this may be that Python is such an accessible language that less experienced programmers make mistakes they then have to live with.

But I think perhaps the more important reason is that sometimes Python punishes you for trying to do the right thing.

In the context of object design, the “right thing” is to design small, independent classes that do one thing and do it well. For example, if your object starts accumulating a lot of private methods, perhaps you should make them public methods of a private attribute. But if doing that is tedious, you probably won’t bother.

Another place you should be defining objects is when you have a bag of related data whose relationships and behavior need to be explained. Python makes it very convenient to just define a tuple or a list. The first few times you write host, port = … instead of address = … it doesn’t seem like a big deal, but soon you are writing [(family, socktype, proto, canonname, sockaddr)] = … everywhere and your life is filled with regret. And that is if you’re lucky. If you’re unlucky, you end up maintaining code like values[0][7][4][HOSTNAME]["canonical"], and you’ll be miserable, not just sorry.

This raises the question: is using classes in Python really such a hassle? Let’s look at a simple data structure: a three-dimensional Cartesian coordinate. Start with the simplest possible thing:

class Point3D(object):

So far so good. We already have a three-dimensional point. What’s next?

class Point3D(object):
    def __init__(self, x, y, z):

Actually, it’s kind of a shame. I just wanted a holder for a bit of data, and I’ve already had to override a special method from the Python runtime with its own internal naming convention. Not too bad, I suppose; after all, all programming is weird symbols of one form or another.

At least you can see the attribute names, and they make sense.

class Point3D(object):
    def __init__(self, x, y, z):
        self.x

I have already said that I want an x, but now I have to assign it as an attribute…

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x

Bind to x? Well, apparently…

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

You have to do this once for every attribute, so it scales pretty badly, right? I have to type each attribute name three times?!?

All right. At least I’ve defined it.

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):

What, is it not over yet?

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))

Come on. Now I have to type each attribute name five times if I want to see what the attributes are when I’m debugging. You don’t have to do any of this if you’re defining a tuple, right?!?!?

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)

Seven times?!?!?!?

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)
    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) < (other.x, other.y, other.z)

Nine times?!?!?!?!?

from functools import total_ordering
@total_ordering
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)
    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) < (other.x, other.y, other.z)

Okay, whew. Two more lines of code isn’t great, but at least now we don’t have to define all the other comparison methods. Everything’s settled now, right?

from unittest import TestCase
class Point3DTests(TestCase):

You know what? I’ve had enough. Twenty lines of code so far and the class doesn’t do anything yet. The hard part was supposed to be solving quaternion equations, not defining “a data structure that can be printed and compared.” So piles of undocumented junk tuples, lists, and dictionaries it is; defining proper data structures in Python is just too cumbersome.
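For reference, here is the hand-written class from the steps above exercised in a short script. It is only a sketch of what those twenty lines buy: a readable repr, value equality, and the extra comparison operators that functools.total_ordering derives from __eq__ and __lt__.

```python
from functools import total_ordering


@total_ordering
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def __repr__(self):
        return (self.__class__.__name__ +
                "(x={}, y={}, z={})".format(self.x, self.y, self.z))

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)

    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) < (other.x, other.y, other.z)


a = Point3D(1, 2, 3)
b = Point3D(3, 2, 3)

print(repr(a))                 # Point3D(x=1, y=2, z=3)
print(a == Point3D(1, 2, 3))   # True
# Only __eq__ and __lt__ were written by hand; total_ordering supplies the rest.
print(b > a, a <= b, a >= b)   # True True False
```

Every attribute name still appears nine times in the class body, which is the complaint the rest of the article addresses.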

namedtuple

The standard library’s answer to this conundrum is namedtuple. Unfortunately, although it is a valiant first draft (in many ways similar to my own somewhat awkward and outdated attempt at the same thing), namedtuple can’t save the day. It exposes a lot of unwanted public functionality that is a nightmare for compatibility maintenance, and it doesn’t solve half the problems you run into. Its drawbacks are too many to list in full, so here are just a few highlights:

  • Whether you want them to be or not, its fields are accessible through numeric indexes. This means you can’t have private attributes, because all attributes are exposed through the public __getitem__ interface.
  • It compares equal to a raw tuple with the same values, so it is easy to end up with type confusion, especially if you are trying to move away from tuples and lists.
  • It is a tuple, so it is always immutable.

For the last point, you can use it like this:

Point3D = namedtuple('Point3D', ['x', 'y', 'z'])

Used this way, it doesn’t look like a class in your code; simple syntax-analysis tools won’t recognize it as one without special-casing. You also can’t give it any other behavior, because there’s no place to put a method. Not to mention that you have to type the name of the class twice.
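With the functional form above, the three drawbacks in the list are easy to observe. The following is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from collections import namedtuple

Point3D = namedtuple('Point3D', ['x', 'y', 'z'])
p = Point3D(1, 2, 3)

# 1. Fields are always reachable by numeric index as well as by name.
print(p[0], p.x)             # 1 1

# 2. It compares equal to a plain tuple holding the same values.
print(p == (1, 2, 3))        # True

# 3. It is immutable: assigning to a field raises AttributeError.
try:
    p.x = 10
    immutable = False
except AttributeError:
    immutable = True
print(immutable)             # True
```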

Or you can use inheritance:

class Point3D(namedtuple('_Point3DBase', 'x y z'.split())):
    pass

Although this gives you a place to put methods and a docstring, and generally makes it look like a class, in return you get a weird internal name (which, at least on older Pythons such as 2.7, is what shows up in the repr, rather than the class’s actual name). You have also silently made attributes not listed here mutable, a strange side effect of adding the class declaration; that is, unless you add __slots__ = 'x y z'.split() to the class body, and then we’re back to typing every attribute name twice.
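The side effect on unlisted attributes can be seen in a few lines. This is a sketch, not part of the original article: StrictPoint3D and can_set are hypothetical names, and the sketch uses the empty __slots__ = () form that the standard library documentation shows for namedtuple subclasses.

```python
from collections import namedtuple


class Point3D(namedtuple('_Point3DBase', 'x y z'.split())):
    pass


class StrictPoint3D(namedtuple('_StrictPoint3DBase', 'x y z'.split())):
    __slots__ = ()   # no per-instance __dict__, so no stray attributes


def can_set(obj, name, value):
    """Return True if obj.<name> = value succeeds, False on AttributeError."""
    try:
        setattr(obj, name, value)
        return True
    except AttributeError:
        return False


loose = Point3D(1, 2, 3)
strict = StrictPoint3D(1, 2, 3)

print(can_set(loose, 'x', 10))    # False: declared fields stay read-only
print(can_set(loose, 'w', 4))     # True: an unlisted attribute is silently accepted
print(can_set(strict, 'w', 4))    # False: rejected once __slots__ is declared
```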

And we haven’t even mentioned that science has proven that inheritance shouldn’t be used.

So, if namedtuple is all you have, use it: it is an improvement, though only a partial one.

Using attrs

This is where my favorite Python library comes in.

pip install attrs

Let’s revisit the problem above. How do I write Point3D using the attrs library?

Since it is not built into Python, we have to start with two lines of boilerplate: importing the package and using a class decorator.

import attr
@attr.s
class Point3D(object):

You see, no inheritance! By using the class decorator, Point3D is still a normal Python class (although we’ll see some double-underscore methods later).

import attr
@attr.s
class Point3D(object):
    x = attr.ib()

It has an attribute called x.

import attr
@attr.s
class Point3D(object):
    x = attr.ib()
    y = attr.ib()
    z = attr.ib()

And one called y, and one called z. And we’re done.

Are we done? Wait. What about a nice string representation?

>>> Point3D(1, 2, 3)
Point3D(x=1, y=2, z=3)

What about comparison?

>>> Point3D(1, 2, 3) == Point3D(1, 2, 3)
True
>>> Point3D(3, 2, 1) == Point3D(1, 2, 3)
False
>>> Point3D(3, 2, 3) > Point3D(1, 2, 3)
True

Good. But what if I want to extract the data in these well-defined attributes into a format suitable for JSON serialization?

>>> attr.asdict(Point3D(1, 2, 3))
{'y': 2, 'x': 1, 'z': 3}

Maybe that last one was a little on the nose. Even so, it is one of many things made easier by attrs, which lets you declare the fields on your class along with associated metadata about them.

>>> import pprint
>>> pprint.pprint(attr.fields(Point3D))
(Attribute(name='x', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None),
 Attribute(name='y', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None),
 Attribute(name='z', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None))
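Returning to the JSON case for a moment, the sketch below shows how attr.asdict and attr.fields might be combined with the standard json module. The round trip through keyword arguments is an illustration, not something the attrs documentation prescribes.

```python
import json

import attr


@attr.s
class Point3D(object):
    x = attr.ib()
    y = attr.ib()
    z = attr.ib()


point = Point3D(1, 2, 3)

# attr.asdict() returns a plain dict, which json.dumps() accepts directly.
encoded = json.dumps(attr.asdict(point), sort_keys=True)
print(encoded)                       # {"x": 1, "y": 2, "z": 3}

# Rebuilding the object is just keyword-argument unpacking.
decoded = Point3D(**json.loads(encoded))
print(decoded == point)              # True

# The declared fields can be listed without inspecting any instance.
names = [a.name for a in attr.fields(Point3D)]
print(names)                         # ['x', 'y', 'z']
```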

I won’t go into every interesting feature of attrs here; you can read its documentation. Also, the project is constantly updated, with new things coming out every once in a while, so I might miss some important features. But even with this much, you can see that attrs does something Python previously lacked:

  1. It lets you define types succinctly, rather than manually typing out def __init__(...).
  2. It lets you say what you mean directly with a declaration, rather than expressing it in a roundabout way. Instead of saying “I have a type called MyType, it has a constructor, and in the constructor the parameter 'a' is assigned to the attribute 'a'”, you say “I have a type called MyType, and it has an attribute called a”. Behavior is derived from that fact, rather than the fact having to be reverse engineered from behavior (e.g., by running dir on an instance, or looking at self.__class__.__dict__).
  3. It provides useful default behavior, unlike Python’s own defaults, which sometimes work and most of the time don’t.
  4. It starts simple, but leaves room to add a more rigorous implementation later.
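Point 3 in the list above can be made concrete by comparing a plain class with an attrs class. This is a minimal sketch; PlainPoint and AttrsPoint are hypothetical names used only for the comparison.

```python
import attr


class PlainPoint(object):          # hypothetical: no special methods defined
    def __init__(self, x):
        self.x = x


@attr.s
class AttrsPoint(object):          # hypothetical: same data, declared via attrs
    x = attr.ib()


# Default Python behavior: equality is identity, and the repr is an
# opaque "<... object at 0x...>" string.
print(PlainPoint(1) == PlainPoint(1))    # False
print(repr(PlainPoint(1)))

# attrs-generated behavior: value-based equality and a readable repr.
print(AttrsPoint(1) == AttrsPoint(1))    # True
print(repr(AttrsPoint(1)))               # AttrsPoint(x=1)
```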

Let’s elaborate on that fourth point.

Gradually improve

While I’m not going to cover every feature, I’d be remiss if I didn’t mention a few. You can see some interesting ones in the extremely long repr() of the Attribute objects above.

For example, you can validate attributes when they are passed into a class decorated with @attr.s. The Point3D class, for instance, should contain numbers. For simplicity, we can say that these numbers must be instances of float, like this:

import attr
from attr.validators import instance_of
@attr.s
class Point3D(object):
    x = attr.ib(validator=instance_of(float))
    y = attr.ib(validator=instance_of(float))
    z = attr.ib(validator=instance_of(float))
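To see the validator at work, the class can be instantiated with valid and invalid arguments. A minimal sketch, relying on instance_of raising TypeError when the check fails:

```python
import attr
from attr.validators import instance_of


@attr.s
class Point3D(object):
    x = attr.ib(validator=instance_of(float))
    y = attr.ib(validator=instance_of(float))
    z = attr.ib(validator=instance_of(float))


print(Point3D(1.0, 2.0, 3.0))        # Point3D(x=1.0, y=2.0, z=3.0)

try:
    Point3D(1, 2, 3)                 # ints are not instances of float
    rejected = False
except TypeError:
    rejected = True
print(rejected)                      # True
```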

Because we used attrs, we have a place to put this extra validation: we can add type information to each attribute as we need it. Some of these facilities also help us avoid other common mistakes. For example, this is a very common “spot the bug” interview question:

class Bag:
    def __init__(self, contents=[]):
        self._contents = contents
    def add(self, something):
        self._contents.append(something)
    def get(self):
        return self._contents[:]
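Running the class above makes the problem visible. The sketch below repeats the class unchanged and creates two bags without passing a list:

```python
class Bag:
    def __init__(self, contents=[]):     # the default list is created only once
        self._contents = contents

    def add(self, something):
        self._contents.append(something)

    def get(self):
        return self._contents[:]


first = Bag()
second = Bag()
first.add("apple")

# The second bag was never touched, yet it sees the apple, because both
# instances hold a reference to the same default list.
print(second.get())                      # ['apple']
```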

To fix it, the correct code should look like this:

class Bag:
    def __init__(self, contents=None):
        if contents is None:
            contents = []
        self._contents = contents

That adds two extra lines of code.

In the buggy version, contents inadvertently becomes a global variable, so every Bag object that isn’t given its own list shares the same one. With attrs, the class looks like this instead:

import attr
@attr.s
class Bag:
    # attr.Factory(list) builds a fresh list for every new Bag
    _contents = attr.ib(default=attr.Factory(list))
    def add(self, something):
        self._contents.append(something)
    def get(self):
        return self._contents[:]
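A quick check that the attrs version no longer shares state between instances. This sketch repeats the class so that it runs on its own; note that attrs strips the leading underscore from _contents when it generates the __init__ argument name.

```python
import attr


@attr.s
class Bag:
    _contents = attr.ib(default=attr.Factory(list))

    def add(self, something):
        self._contents.append(something)

    def get(self):
        return self._contents[:]


first = Bag()
second = Bag()
first.add("apple")

print(first.get())                   # ['apple']
print(second.get())                  # []

# The private attribute is passed to __init__ as "contents".
third = Bag(contents=["pear"])
print(third.get())                   # ['pear']
```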

attrs also provides several other features that make building classes easier and more correct. Another good example? If you want to be strict about extraneous attributes on your objects (or more memory-efficient on CPython), you can pass slots=True at the class level, e.g. @attr.s(slots=True), to automatically turn your existing attrs declarations into a matching __slots__ attribute. All of these features let you make better and more powerful use of your attr.ib() declarations.
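The effect of slots=True can be sketched as follows: declared attributes remain assignable, while an undeclared one is rejected because instances no longer carry a per-instance __dict__.

```python
import attr


@attr.s(slots=True)
class Point3D(object):
    x = attr.ib()
    y = attr.ib()
    z = attr.ib()


p = Point3D(1, 2, 3)
p.x = 10                             # declared attributes can still be assigned

try:
    p.w = 4                          # "w" was never declared with attr.ib()
    extra_allowed = True
except AttributeError:
    extra_allowed = False

print(p)                             # Point3D(x=10, y=2, z=3)
print(extra_allowed)                 # False
print(hasattr(p, "__dict__"))        # False
```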

The future of Python

Some people are excited about eventually being able to program in Python 3 everywhere. What I’m looking forward to is being able to program in Python with attrs everywhere. It has had a subtle but positive design influence on every code base I have seen it used in.

Try it: you might be surprised to find that where you used to have a sparsely documented tuple, list, or dictionary, you now use a clearly explained class. Since writing cleanly structured types is now so easy and convenient, you may find yourself using a lot more of them. Your code will be better for it; I know mine has been.