This is the 25th day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021


Algorithm series, arch a pawn. For more highlights, check out my algorithm column (●’◡’●)

This article addresses the problem of “getting the median of a data stream” using the size heap.

Topic:

The median is the number in the middle of an ordered list. If the list length is even, the median is the average of the middle two numbers. For example, the median of [2,3,4] is 3 and the median of [2,3] is (2 + 3) / 2 = 2.5 Design a data structure that supports two operations: void addNum(int num) - Add an integer from the data stream to the data structure. Double findMedian() - Returns the current median for all elements.Copy the code
Advanced: How would you optimize your algorithm if all integers in the data stream were in the range 0 to 100? How would you optimize your algorithm if 99% of the integers in your data stream were in the range 0 to 100?Copy the code

Answer:

In a data stream, data is constantly pouring into the structure, so you face the need to dynamically adjust multiple times to get the median. Therefore, the implemented data structure needs to be both quick to find the median and quick to adjust.

The first thing we can think of is a binary search tree. In the equilibrium state, the top of the tree must be the middle number, and then we decide whether to take two numbers according to the parity of the length.

This method is efficient, but manual writing is time-consuming and laborious.

Based on the idea that you only need to get the middle number, you can divide the data into the left and right sides, with one side as the maximum heap, which allows you to quickly get the maximum number on the left, and the other side as the minimum heap. One thing to note is that the length difference between the left and right data cannot exceed 1. The efficiency of this approach is similar to that of AVL balanced binary search tree, but it is faster to write.

  • AVL balanced binary search tree

Balanced binary search tree: short for balanced binary tree. The highly balanced binary tree, also known as AVL tree according to the Scientist’s English name, was proposed by the former Soviet Union mathematicians AD Else-Velskil and Landis in 1962. It has the following properties:

  1. It could be an empty tree.
  2. If the tree is not empty, the left and right subtrees of any node are balanced binary trees, and the absolute value of the difference in height is not more than 1.

The average and worst case time complexity of search, insert, and delete is O(log n);

Illustration :(illustration source -Maple)

Dynamically maintain a maximum heap and a minimum heap, the maximum heap stores half of the data, the minimum heap stores half of the data, and maintain that the maximum heap top is smaller than the minimum heap top, and the size difference between the two heaps is at most 1.

When a new element is inserted, the details are as follows:

JS:

Const MedianFinder = function () {// Default Max const defaultCmp = (x, y) => x > y; / / exchange elements const swap = (arr, I, j) = > ([arr [I], arr [j]] = [arr [j], arr [I]]). Class Heap {constructor(CMP = defaultCmp) {this.container = []; this.cmp = cmp; } // Insert (data) {const {container, CMP} = this; container.push(data); let index = this.size() - 1; while (index) { let parent = (index - 1) >> 1; if (! cmp(container[index], container[parent])) { return; } swap(container, index, parent); index = parent; } // Pop () {const {container, CMP} = this; if (! this.size()) { return null; } swap(container, 0, this.size() - 1); const res = container.pop(); const length = this.size(); let index = 0, exchange = index * 2 + 1; While (exchange < length) {// // if there is a right node and the value of the right node is greater than that of the left node let right = index * 2 + 2; if (right < length && cmp(container[right], container[exchange])) { exchange = right; } if (! cmp(container[exchange], container[index])) { break; } swap(container, exchange, index); index = exchange; exchange = index * 2 + 1; } return res; } // Get heap size size() {return this.container.length; } // Get the heap top peek() {if (this.size()) return this.container[0]; return null; }} // Maximum Heap this.a = new Heap(); // this.b = new Heap((x, y) => x < y); }; MedianFinder.prototype.addNum = function (num) { if (this.A.size() ! == this.b.size () {this.a.size (num); this.a.size (num); this.B.insert(this.A.pop()); } else {// add an element to A when N is even // insert num into B, insert A this.b.insert (num); this.A.insert(this.B.pop()); }}; MedianFinder. Prototype. FindMedian = function () {/ / if the sum is even, return two / / if the sum of the average on the top of the heap for odd number, Return this.a.container. length === this.b.container. length? (this.A.peek() + this.B.peek()) / 2 : this.A.peek(); };Copy the code

Looking at the code implementation based on the diagram, it is too clear


OK, the above is the share ~ writing is not easy, like encourage 👍👍👍

I am Anthony Nuggets, the public account of the same name, every day a pawn, dig a gold, goodbye ~