Column address: a Python module per week
Meanwhile, welcome to follow my wechat official account AlwaysBeta, more exciting content waiting for you.
Heapq implements a minimal heap sort algorithm for Python lists.
A heap is a tree-like data structure in which the children are in a sort relationship with their parents. The binary heap can be represented using lists or arrays such that the children of element N are at positions 2 * N + 1 and 2 * N + 2 (for zero-based indexes). This layout allows the heap to be rearranged in the right place, so there is no need to reallocate memory when data is added or removed.
Max-heap ensures that the parent level is greater than or equal to its children. Min-heap requires that the parent item be less than or equal to its children. Python’s HEAPq module implements a min-heap.
The sample data
The examples in this section use the data heapq_heapdata.py.
# heapq_heapdata.py
# This data was generated with the random module.
data = [19.9.4.10.11]
Copy the code
The heap output uses the print heapq_showtree.py.
# heapq_showtree.py
import math
from io import StringIO
def show_tree(tree, total_width=36, fill=' '):
"""Pretty-print a tree."""
output = StringIO()
last_row = - 1
for i, n in enumerate(tree):
if i:
row = int(math.floor(math.log(i + 1.2)))
else:
row = 0
ifrow ! = last_row: output.write('\n')
columns = 2 ** row
col_width = int(math.floor(total_width / columns))
output.write(str(n).center(col_width, fill))
last_row = row
print(output.getvalue())
print(The '-' * total_width)
print()
Copy the code
Create a heap
There are two basic ways to create a heap: heappush() and heapify().
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
heap = []
print('random :', data)
print()
for n in data:
print('add {:>3}:'.format(n))
heapq.heappush(heap, n)
show_tree(heap)
# output
# random : [19, 9, 4, 10, 11]
#
# add 19:
#
# 19
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
#
# add 9:
#
# 9
# 19
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
#
# add 4:
#
# 4
9 # 19
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
#
# add 10:
#
# 4
# 10 9
# 19
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
#
# add 11:
#
# 4
# 10 9
11 # 19
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Copy the code
When heappush() is used, the heap order is preserved when new elements are added.
If the data is already in memory, use Heapify () to rearrange the elements in the list more efficiently.
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
print('random :', data)
heapq.heapify(data)
print('heapified :')
show_tree(data)
# output
# random : [19, 9, 4, 10, 11]
# heapified :
#
# 4
# 9 19
# 10, 11
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Copy the code
Access the contents of the heap
Once the heap is created correctly, use heappop() to remove the element with the minimum value.
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
print('random :', data)
heapq.heapify(data)
print('heapified :')
show_tree(data)
print()
for i in range(2):
smallest = heapq.heappop(data)
print('pop {:>3}:'.format(smallest))
show_tree(data)
# output
# random : [19, 9, 4, 10, 11]
# heapified :
#
# 4
# 9 19
# 10, 11
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
#
#
# pop 4:
#
# 9
10 # 19
# 11
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
#
# pop 9:
#
# 10
19 # 11
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Copy the code
In this example, heapify() and heappop() are used for sorting.
To remove existing elements and replace them with new values in a single operation, use heapreplace().
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
heapq.heapify(data)
print('start:')
show_tree(data)
for n in [0.13]:
smallest = heapq.heapreplace(data, n)
print('replace {:>2} with {:>2}:'.format(smallest, n))
show_tree(data)
# output
# start:
#
# 4
# 9 19
# 10, 11
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
#
# replace 4 with 0:
#
# 0
# 9 19
# 10, 11
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
#
# replace 0 with 13:
#
# 9
10 # 19
11 # 13
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Copy the code
Replacement elements can maintain a fixed-size heap, such as a jobs queue sorted by priority.
The data extreme value of the heap
Heapq also includes two functions to examine iterable and find the range of maximum or minimum values it contains.
import heapq
from heapq_heapdata import data
print('all :', data)
print('3 largest :', heapq.nlargest(3, data))
print('from sort :', list(reversed(sorted(data)[- 3:])))
print('3 smallest:', heapq.nsmallest(3, data))
print('from sort :', sorted(data)[:3])
# output
# all : [19, 9, 4, 10, 11]
# 3 largest : [19, 11, 10]
# from sort : [19, 11, 10]
# 3 smallest: [4, 9, 10]
# from sort : [4, 9, 10]
Copy the code
Using nlargest() and nsmallest() only works for relatively small values of n > 1, but can still be useful in a few cases.
Effectively merge sort sequences
Combining several sorted sequences into a new sequence is easy for small data sets.
list(sorted(itertools.chain(*data)))
Copy the code
For large data sets, it takes up a lot of memory. Instead of sorting the entire combined sequence, merge() is used to generate a new sequence one at a time.
import heapq
import random
random.seed(2016)
data = []
for i in range(4):
new_data = list(random.sample(range(1.101), 5))
new_data.sort()
data.append(new_data)
for i, d in enumerate(data):
print('{}: {}'.format(i, d))
print('\nMerged:')
for i in heapq.merge(*data):
print(i, end=' ')
print()
# output
# 0: [33, 58, 71, 88, 95]
# 1: [10, 11, 17, 38, 91]
# 2: [13, 18, 39, 61, 63]
# 3: [20, 27, 31, 42, 45]
#
# Merged:
# 10 11 13 17 18 20 27 31 33 38 39 42 45 58 61 63 71 88 91 95
Copy the code
Because merge() uses a heap implementation, it consumes memory based on the number of sequence elements being merged, not the number of elements in all sequences. Related documents:
Pymotw.com/3/heapq/ind…