Heap this kind of data structure application scenario is many, the most classic is heap sort. Heapsort is an O(nlogn) in place sorting algorithm. Today we will analyze the data structure of the heap.
What is a heap
A heap is a special kind of tree. It is called a heap if two things are true.
- The heap is a complete binary tree.
- Each node of the heap must have a value greater than or equal to the value of each node in its subtree.
A heap where the value of each node is greater than or equal to the value of each node in its subtree is called a “large top heap”. A heap where the value of each node is less than or equal to the value of each node in its subtree is called a “small top heap”.
How to implement a heap
First we need to know what operations the heap supports and how to store a heap.
For a full binary tree, using arrays is very space-efficient. Since we don’t need to store Pointers to left and right child nodes, we can simply find the left and right child nodes and parent nodes of a node by using the array index. See the figure below.
As shown in the figure, a node with index I in the array has a left child node with index 2i, a right child node with index 2i+1, and a parent node with index I /2. Now let’s look at some of the common operations on the heap. Let’s take a look at insert and delete operations on the large top heap as an example.
1. Insert elements into the heap
After inserting a new element into the heap, we need to continue to satisfy two features of the heap.
If we put the newly inserted element at the end of the heap, it is not conforming to the characteristics of the heap, so we need to adjust it to ensure that it satisfies the characteristics of the heap. Adjustment is divided into upward adjustment and downward adjustment. Let’s start with the upward adjustment. See the figure below.
The process is simple. We compare the size of the newly inserted node to its parent node. If the child node is less than or equal to the parent node, we swap the two nodes. Keep repeating the process until the requirements are met.
2. Delete the top element of the heap
Next, let’s look at the delete operation. We first swap the last element of the heap with the top element, and then delete the last element. The remaining heap elements do not meet the heap requirements, we need to start at the top of the heap and work down until the parent node meets the size relationship. See the figure below.
We know that for a complete binary tree with n nodes, the height of the tree will not exceed log2n. The process of heap adjustment is to compare and switch the paths of the nodes, so the time complexity is proportional to the height of the heap, which is O(logN). The main logic for inserting data and removing data from the top of the heap is the adjustment of the heap, so the time complexity of inserting an element and removing an element from the top of the heap is O(logN).
Three, heap sort
We talked about a lot of sorting methods last time, and you can click sorting algorithm to see that. Today we are going to continue with a new sort algorithm, heapsort, which takes order nlogN time, and is an in-place sort algorithm. We can roughly divide the heap sorting process into two major steps, which are heap building and sorting.
1. Build the heap
We’ll first build a heap of the array where it is. “In place” means to operate on an array without resorting to another array. The idea is to process the data backwards, with each piece of data adjusted from the top down.
Let’s take a look at the following step diagram for building a heap. Since the leaves can only be adjusted downward by themselves, we will start with the last non-leaf and adjust downward.
As shown in the figure, we adjust the data from n/2 to 1 downward. Nodes with subscripts n/2+1 to N are leaf nodes, so we do not need to adjust. \
2. The sorting
After the heap is built, the data in the array has been organized according to the characteristics of the large top heap. The first element in the array is the top of the heap, the largest element. We swap it with the last element, and the largest element goes to the n.
This process is similar to removing the top of the heap. When the top element is removed, we put the element with subindex n on the top of the heap, and then adjust it down to rebuild the heap with the remaining n-1 elements. Once that’s done, we take the top element of the heap, put it at n-1, and we keep doing that until we have only one element in the heap with index 1 left, and we’re done.
Now let’s examine the time, space, and stability of heapsort. The whole process of heap sort, only need individual temporary storage space, so heap sort is in place algorithm. Heapsort consists of two operations: heapsort and heapsort. Heapsort takes O(n) time and sorting takes O(nlog n) time. So, the total time complexity of heapsort is order nlogN. In the process of sorting, there is the operation of swapping the last node of the heap with the top of the heap, so it is possible to change the original relative order of the same value of data, so the heap sort is not a stable sorting algorithm.
Four, the application of heap
Let’s look at a few very important applications of the heap.
1. Priority queue
A priority queue, as the name suggests, is first and foremost a queue. The biggest feature of queues is first-in, first-out. However, in the priority queue, the order of queue exit is not first-in, first-out, but in order of priority.
How do you implement a priority queue? There are many ways to do this, but using the heap is the most straightforward and efficient way to do it. Because the heap and the priority queue are very similar. A heap can be thought of as a priority queue. Inserting an element into the priority queue is equivalent to inserting an element into the heap; Fetching the highest priority element from the priority queue is equivalent to fetching the top element of the heap. Consider the following application scenario.
Suppose we have 100 small files, each of which is 100MB in size. Each file stores an ordered string. We want to merge these small files into one large, ordered file. This is where the priority queue comes in. We’re going to take one string from each of the 100 small files, and then we’re going to create the little top heap, and the element at the top of the heap, which is the first element of the priority queue, is the smallest string. We put the string in a large file and remove it from the heap. Then it takes the next string from the small file and puts it into the heap. By repeating this process, you can place the data of 100 small files into the large file in turn.
2. Use heap to find topK
We can abstract the problem of finding topk into two classes. One is for static data sets, which means the data set is predetermined and never changes. The other is for dynamic data sets, that is, the data set is uncertain in advance, and data is added to the set dynamically. For a static data set, how do I find the first K pieces of data in an array of n pieces? We can maintain a small top heap of size K, traverse the array sequentially, fetching data from the array and comparing it to the top elements of the heap. If it is larger than the top element, we remove the top element and insert it into the heap; If it’s smaller than the top of the heap, we don’t do anything, we just keep going through the array. So once we’ve traversed all the data in the array, the data in the heap is going to be the top K. TopK is obtained based on dynamic data, that is, real-time topK. How do you understand that? Let me give you an example. There are two operations in a data set, one is to add data, and the other is to ask the current first K big data. If we recalculate based on the current data every time we ask for the previous k big data, then the time complexity is O(nlogN), where n represents the size of the current data. We can actually maintain a small top heap of size K all the time, and when we have data to add to the collection, we compare it to the top element of the heap. If it is larger than the top element, we remove the top element and insert it into the heap; If it’s smaller than the top of the heap, we don’t do it. This way, whenever we need to query the top K big data, we can immediately return it to it.
More hardcore knowledge, please pay attention to the public number “programmer senior”.