Notes on the Python collections module

namedtuple

collections.namedtuple is a factory function that builds a tuple subclass with named fields and a class name; the class name is a great help when debugging.

We can create a User class like this:

 User = collections.namedtuple('User', ['name', 'age', 'height'])

How to record information about a city with a named tuple

In [1]: from collections import namedtuple

In [2]: City = namedtuple('City', 'name country population coordinates')

In [3]: tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))

In [4]: tokyo
Out[4]: City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))

In [5]: tokyo.population
Out[5]: 36.933

In [6]: tokyo.coordinates
Out[6]: (35.689722, 139.691667)

In [7]: tokyo[1]
Out[7]: 'JP'

Creating a named tuple takes two arguments: the name of the class and the names of its fields. The latter can be an iterable of strings, or a single string of field names separated by spaces.
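The two ways of spelling the field names should be interchangeable. A minimal sketch, using a hypothetical Point class that is not part of the notes above:

```python
from collections import namedtuple

# Field names given as one space-separated string ...
PointA = namedtuple('Point', 'x y')
# ... or as an iterable of strings; both classes end up with the same fields.
PointB = namedtuple('Point', ['x', 'y'])

p = PointA(1, 2)
q = PointB(x=1, y=2)

print(PointA._fields == PointB._fields)  # True
print(p == q)                            # True: named tuples compare like plain tuples
print(p.x, p[1])                         # 1 2
```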

In addition to the attributes inherited from regular tuples, named tuples have a few attributes and methods of their own.

In [8]: City._fields
Out[8]: ('name', 'country', 'population', 'coordinates')

In [9]: LatLong = namedtuple('LatLong', 'lat long')

In [10]: delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))

In [11]: delhi = City._make(delhi_data)

In [12]: delhi._asdict()
Out[12]: 
OrderedDict([('name', 'Delhi NCR'),
             ('country', 'IN'),
             ('population', 21.935),
             ('coordinates', LatLong(lat=28.613889, long=77.208889))])

In [13]: for key, value in delhi._asdict().items():
    ...:     print(key + ':', value)
    ...:
name: Delhi NCR
country: IN
population: 21.935
coordinates: LatLong(lat=28.613889, long=77.208889)

The _fields attribute is a tuple containing the names of all the fields of the class.

_make() generates an instance of this class by taking an iterable, which does the same thing as City(*delhi_data).

_asdict() returns the contents of the named tuple as a collections.OrderedDict (a regular dict from Python 3.8 onward), which we can use to present the information in the tuple in a readable way.
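As a small illustration of how the three attributes fit together, here is a sketch that loads rows into a hypothetical Employee class; the class and the data are made up for this example:

```python
from collections import namedtuple

Employee = namedtuple('Employee', 'name title salary')  # hypothetical record type

rows = [('Ann', 'Engineer', 100), ('Bob', 'Designer', 90)]  # made-up data

# _make() builds an instance from any iterable, like Employee(*row).
staff = [Employee._make(row) for row in rows]

# _fields gives the field names; _asdict() gives a field -> value mapping.
header = ', '.join(Employee._fields)
first = staff[0]._asdict()

print(header)          # name, title, salary
print(first['title'])  # Engineer
```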

defaultdict

Let’s start with an example.

Using a plain dict to count how many times each string occurs in a list:

In [1]: langs = ['java', 'php', 'python', 'C#', 'kotlin', 'swift', 'python']

In [2]: res_dict = {}

In [3]: for lang in langs:
   ...:     if lang in res_dict:
   ...:         res_dict[lang] += 1
   ...:     else:
   ...:         res_dict[lang] = 1
   ...:

In [4]: res_dict
Out[4]: {'C#': 1, 'java': 1, 'kotlin': 1, 'php': 1, 'python': 2, 'swift': 1}

The membership check runs on every iteration of the loop; it can be eliminated by calling setdefault.

In [1]: langs = ['java', 'php', 'python', 'C#', 'kotlin', 'swift', 'python']

In [2]: res_dict = {}

In [3]: for lang in langs:
   ...:     res_dict.setdefault(lang, 0)
   ...:     res_dict[lang] += 1
   ...:

In [4]: res_dict
Out[4]: {'C#': 1, 'java': 1, 'kotlin': 1, 'php': 1, 'python': 2, 'swift': 1}
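setdefault is also handy when the default is a mutable container, because it returns the stored value and the update can be chained onto the same call. A minimal sketch with made-up words, grouping them by first letter:

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']  # made-up data

index = {}
for word in words:
    # setdefault returns the existing list, or inserts and returns a new one,
    # so the lookup and the insertion happen in a single call.
    index.setdefault(word[0], []).append(word)

print(index)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```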

If we look up a key that does not exist, a KeyError is raised:

In [5]: res_dict['c++']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-5-269671e9ed5a> in <module>()
----> 1 res_dict['c++']

KeyError: 'c++'


Sometimes, for convenience, we want reading a key to return a default value even when the key is not in the mapping. There are two ways to do this: use defaultdict instead of a regular dict, or write your own dict subclass and implement the __missing__ method in it.

Using defaultdict

In [7]: from collections import defaultdict

In [8]: res_dict = defaultdict(int)

In [9]: for lang in langs:
   ...:     res_dict[lang] += 1
   ...:

In [10]: res_dict
Out[10]: 
defaultdict(int,
            {'C#': 1, 'java': 1, 'kotlin': 1, 'php': 1, 'python': 2, 'swift': 1})

In [11]: res_dict['c++']
Out[11]: 0

The defaultdict constructor takes a callable that accepts no arguments. Whenever __getitem__ cannot find a key, the callable is called, and its return value is stored under that key and returned.

So we can return more complex defaults:

In [25]: def gen_dict():
    ...:     return {'name': 'None', 'age': 0}
    ...: 

In [26]: res_dict = defaultdict(gen_dict)

In [27]: res_dict['zhangsan']
Out[27]: {'age': 0, 'name': 'None'}
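Note that only subscript access goes through __getitem__. The following sketch (the key 'go' is just an illustrative value) shows that get() and the in operator leave the dictionary untouched, while a subscript lookup creates and stores the default:

```python
from collections import defaultdict

counts = defaultdict(int)

# .get() and "in" do not go through __getitem__, so no default is created.
print(counts.get('go'))   # None
print('go' in counts)     # False

# Subscript access calls the factory and stores the result under the key.
print(counts['go'])       # 0
print('go' in counts)     # True
print(dict(counts))       # {'go': 0}
```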

The `__missing__` method

In [28]: class CustomDict(dict):
    ...:
    ...:     def __missing__(self, key):
    ...:         return {'name': 'None', 'age': 18}
    ...: 

In [29]: res_dict = CustomDict()

In [30]: res_dict['lisi']
Out[30]: {'age': 18, 'name': 'None'}
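One difference from defaultdict is worth keeping in mind: a __missing__ method that merely returns a value does not store it. A minimal sketch, using a hypothetical DefaultAge subclass:

```python
class DefaultAge(dict):
    """Hypothetical dict subclass that returns 18 for unknown keys."""

    def __missing__(self, key):
        # The return value is handed back to the caller but NOT stored.
        return 18


ages = DefaultAge(zhangsan=30)

print(ages['zhangsan'])  # 30
print(ages['lisi'])      # 18
print('lisi' in ages)    # False: __missing__ did not insert the key
print(ages.get('lisi'))  # None: get() does not consult __missing__
```

If the default should be remembered, __missing__ can assign it to self[key] before returning it.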

deque

The collections.deque class (a double-ended queue) is a thread-safe data type that can quickly add or remove elements at both ends. It is also a good choice when you need to keep only the most recently used elements: when creating a deque you can specify its maximum size, and once it is full, adding a new element at one end discards the oldest element from the opposite end.

In [1]: from collections import deque

In [2]: dq = deque(range(10), maxlen=10)

In [3]: dq
Out[3]: deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [4]: dq.rotate(3)

In [5]: dq
Out[5]: deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6])

In [6]: dq.rotate(-4)

In [7]: dq
Out[7]: deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

In [8]: dq.appendleft(-1)

In [9]: dq
Out[9]: deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [10]: dq.extend([11, 22, 33])

In [11]: dq
Out[11]: deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33])

In [12]: dq.extendleft([10, 20, 30, 40])

In [13]: dq
Out[13]: deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8])

maxlen is an optional parameter that sets the maximum number of elements the deque can hold; once set, this attribute cannot be modified.

The rotate method takes a parameter n. When n > 0, the rightmost n elements are moved to the left end. When n < 0, the leftmost |n| elements are moved to the right end.

When an element is appended to the right end of a deque that is full (len(d) == d.maxlen), the element at the left end is removed.

The extendleft(iter) method adds the elements of the iterable one by one to the left end of the deque, so they appear in reverse order in the deque.
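These rules can be seen together in a small sketch of a bounded history; the page names and the limit of three are made up for illustration:

```python
from collections import deque

history = deque(maxlen=3)  # keep only the three most recent items

for page in ['home', 'search', 'cart', 'checkout']:  # made-up page names
    history.append(page)   # once full, the oldest item falls off the left

print(list(history))       # ['search', 'cart', 'checkout']

history.rotate(1)          # the rightmost item moves to the left end
print(list(history))       # ['checkout', 'search', 'cart']

history.extendleft(['a', 'b'])  # added one by one, so they end up reversed
print(list(history))       # ['b', 'a', 'checkout']
```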

Counter

This mapping type keeps an integer count for each key, and updating an existing key adds to its count. It can therefore be used to count hashable objects, or as a multiset, that is, a set whose elements can appear more than once. Counter implements the + and - operators to combine tallies, as well as useful methods such as most_common([n]), which returns the n most common keys and their counts, ordered from most to least common.

In [1]: from collections import Counter

In [2]: langs = ['java', 'php', 'python', 'C#', 'kotlin', 'swift', 'python']

In [3]: ct = Counter(langs)

In [4]: ct
Out[4]: Counter({'C#': 1, 'java': 1, 'kotlin': 1, 'php': 1, 'python': 2, 'swift': 1})

In [5]: ct.update(['java', 'c'])

In [6]: ct
Out[6]: 
Counter({'C#': 1, 'c': 1, 'java': 2, 'kotlin': 1, 'php': 1, 'python': 2, 'swift': 1})

In [7]: ct.most_common(2)
Out[7]: [('java', 2), ('python', 2)]

A Counter can also count the characters of a string directly:

In [9]: ct = Counter('abracadabra')

In [10]: ct
Out[10]: Counter({'a': 5, 'b': 2, 'c': 1, 'd': 1, 'r': 2})

In [11]: ct.update('aaaaazzz')

In [12]: ct
Out[12]: Counter({'a': 10, 'b': 2, 'c': 1, 'd': 1, 'r': 2, 'z': 3})

In [13]: ct.most_common(2)
Out[13]: [('a', 10), ('z', 3)]
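The + and - operators mentioned earlier are not shown in the sessions above. A minimal sketch with made-up inventory counts:

```python
from collections import Counter

stock = Counter(apples=3, pears=1)     # made-up inventory
delivery = Counter(apples=2, plums=4)

combined = stock + delivery            # counts are added key by key
remaining = stock - delivery           # results with a count <= 0 are dropped

print(combined == Counter(apples=5, pears=1, plums=4))  # True
print(remaining == Counter(apples=1, pears=1))          # True
print(combined.most_common(1))                          # [('apples', 5)]
```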

OrderedDict

This type preserves the order in which keys are added, so the keys are always iterated in the same order. OrderedDict's popitem method removes and returns the last element in the dictionary by default, but if called as my_odict.popitem(last=False), it removes and returns the first element added.

move_to_end(key, last=True) moves an existing key to either end of the ordered dictionary. If last=True (the default), the item moves to the right end; if last=False, it moves to the start. If key does not exist, KeyError is raised:

In [1]: from collections import OrderedDict

In [2]: d = OrderedDict.fromkeys('abcde')

In [3]: d.move_to_end('b')

In [4]: ''.join(d.keys())
Out[4]: 'acdeb'

In [5]: d.move_to_end('b', last=False)

In [6]: ''.join(d.keys())
Out[6]: 'bacde'
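Together, move_to_end and popitem(last=False) are enough to sketch a tiny least-recently-used store. The remember helper and the capacity of two are assumptions made for this illustration, not something OrderedDict provides:

```python
from collections import OrderedDict

cache = OrderedDict()   # hypothetical cache, least recently used entry first
CAPACITY = 2            # assumed limit for this illustration


def remember(key, value):
    """Store a value, evicting the least recently used entry when full."""
    cache[key] = value
    cache.move_to_end(key)         # mark as most recently used
    if len(cache) > CAPACITY:
        cache.popitem(last=False)  # drop the entry at the start


remember('a', 1)
remember('b', 2)
remember('a', 10)   # 'a' becomes the most recent entry
remember('c', 3)    # evicts 'b', the least recently used

print(list(cache.items()))  # [('a', 10), ('c', 3)]
```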

Since OrderedDict remembers its insertion order, it can be used in conjunction with sorted to create a sorted dictionary:

In [11]: d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}

# Sort by key
In [12]: OrderedDict(sorted(d.items(), key=lambda t: t[0]))
Out[12]: OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])

# Sort by value
In [13]: OrderedDict(sorted(d.items(), key=lambda t: t[1]))
Out[13]: OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])

# Sort by key length
In [14]: OrderedDict(sorted(d.items(), key=lambda t: len(t[0])))
Out[14]: OrderedDict([('pear', 1), ('apple', 4), ('banana', 3), ('orange', 2)])

When an entry is deleted, the newly sorted dictionary retains the sorted order. However, when a new key is added, the key is appended to the end and does not remain sorted.
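A short sketch of that behaviour, reusing the fruit data from the session above; the 'cherry' key is added only for illustration:

```python
from collections import OrderedDict

d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
by_key = OrderedDict(sorted(d.items()))

del by_key['banana']     # deleting keeps the remaining keys in sorted order
print(list(by_key))      # ['apple', 'orange', 'pear']

by_key['cherry'] = 5     # a new key goes to the end, not to its sorted slot
print(list(by_key))      # ['apple', 'orange', 'pear', 'cherry']
```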

ChainMap

The ChainMap class provides a way to quickly link multiple dicts so that they can be treated as a single unit. It is often much faster than creating a new dict and running multiple update() calls.

In [1]: from collections import ChainMap

In [2]: d1 = {'java': 3, 'python': 4}

In [3]: d2 = {'c++': 1, 'java': 2}

In [4]: for key, val in ChainMap(d1, d2).items():
   ...:     print(key, val)
   ...:
c++ 1
java 3
python 4

When a key appears in more than one dict, the value from the earliest dict in the chain is used and later duplicates are ignored.

ChainMap stores the linked dicts in a list. The list is public and can be accessed or updated through the maps attribute.

In [10]: c1 = ChainMap(d1, d2)

In [11]: c1.maps[0]
Out[11]: {'java': 3, 'python': 4}

In [12]: c1.maps[0]['python'] = 2

In [13]: c1.items()
Out[13]: ItemsView(ChainMap({'java': 3, 'python': 2}, {'c++': 1, 'java': 2}))

In [14]: dict(c1)
Out[14]: {'c++': 1, 'java': 3, 'python': 2}
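A common use of this lookup order is layering overrides on top of defaults. A minimal sketch with made-up settings, which also shows that writes through the ChainMap land in the first dict only:

```python
from collections import ChainMap

defaults = {'color': 'blue', 'size': 'M'}   # made-up settings
overrides = {'color': 'red'}

settings = ChainMap(overrides, defaults)

print(settings['color'])   # red: found in the first mapping
print(settings['size'])    # M: falls through to the second mapping

# Writes go to the first mapping only; the others are left untouched.
settings['size'] = 'L'
print(overrides)           # {'color': 'red', 'size': 'L'}
print(defaults)            # {'color': 'blue', 'size': 'M'}

# The chain holds references, so later changes to the dicts show through.
defaults['weight'] = 'light'
print(settings['weight'])  # light
```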

