Common data structures? Array: find element O(1), insert and delete slowly. Linked list: Insert delete element O(1), find element slow stack: Fifin fifout queue: Fifin fifout hop table: RE hash table Tree heap: heap is a completely binary tree graph

Sort: bubble: O(n) -O (n^2), stable insert: O(n) -O (n^2), stable select: O(n^2), unstable merge: O(nlogn), stable fast row: O(nlogn) -O (n^2), unstable heap row: Order n plus order n logn, unstable

Collections.sort() / Arrays.sort()

If the number of elements is less than 47, binary insertion sort is performed.

If 47 < number of elements <286, biaxial sort is used.

If the number of elements > 286, use merge sort timsor.sort ()

From the beginning, each successive ascending or descending sequence is partitioned, and the descending sequence is also adjusted to ascending and pushed onto the stack.

If a lifting partition is less than the minimum value, binary insertion sort is used first.

Each time a partition is pushed onto the stack, it is checked to see if existing partitions on the stack can be merged, and if so, it is merged.

Merge means to get the position of the left tail in the right partition and the right tail in the left partition, then the elements outside the two positions are orderly, so it is ok to merge the region between these two elements.

Full binary tree: The leaf nodes are at the bottom level, and each node except the leaf node has left and right child nodes.

Complete binary tree: The leaves are in the bottom two layers. The lowest leaf nodes are arranged to the left. Each node has two left and right nodes except for the penultimate layer. A heap is a complete binary tree.

Binary search tree (binary search/sort tree) : Any node in the tree whose left subtree value is less than the current node and whose right subtree value is greater than the current node. It is designed to support faster find, insert, and delete operations. The middle order traverses the binary search tree, and the result is an increasing data sequence. Time complexity: O(n) -O (logn), that is, linked list – complete binary tree, depending on the height of the tree.

Balanced binary search tree: The height difference between the left and right subtrees of any node in a binary tree cannot be greater than 1, and each node must be smaller on the left and larger on the right. But in fact, many balanced binary search trees do not strictly meet the height difference of less than 1, its core idea is to solve the problem of time complexity degradation of common binary search trees in frequent insertion and deletion. So balancing the binary search tree means that the tree looks symmetrical, that the left subtree is relatively equal to the right subtree. This will keep the height of the whole tree relatively low, resulting in higher operation efficiency. Time complexity: O(logn)

Red and black tree

Red-black tree idea

Red-black tree is a kind of balanced binary search tree, whose time complexity of insert, delete and search operation is O(logn). Each node of a red-black tree, and all paths from that node to its reachable leaf node, contains the same number of black nodes. So the black tree with the red nodes is lower than logn of a complete binary tree, so add the red nodes back, and it’s not contiguous, so the height is approximately 2 logn. So a red-black tree is a balanced binary search tree that is approximately balanced. In addition, the insertion and deletion operations of red-black trees have fewer nodes than AVL trees, resulting in lower maintenance costs.

Left-handedness and right-handedness of red-black trees

Left-handedness: the right child of the concerned node is rotated to the relative root node, then the concerned node becomes the left child of the relative root node, and the original left child of the relative root node moves to the right child of the concerned node.

Dextral: the left child node of the concerned node is selected as the relative root node, then the concerned node becomes the right child node of the relative root node, and the original right child node of the relative root node moves to the left child node of the concerned node.

Red-black tree insertion

First, the requirements of red-black trees should be clarified:

① The root node is black

(2) Each leaf node is black empty node (to facilitate balance adjustment)

③ Any adjacent nodes cannot be red

④ Each node, and all paths from that node to its reachable leaf node, contains the same number of black nodes. So the black tree with the red nodes is lower than logn of a complete binary tree, so add the red nodes back, and it’s not contiguous, so the height is approximately 2 logn. So a red-black tree is a balanced binary search tree that is approximately balanced. In addition, the insertion and deletion operations of red-black trees have fewer nodes than AVL trees, resulting in lower maintenance costs.

⑤ The newly inserted nodes are red nodes

Balance adjustment for insertion operation:

(1) If the parent node of the newly inserted node is black, no balance adjustment is required; If the root node is inserted, it becomes black.

② If the parent node of the new node is red:

If the uncle node is red, set both the parent node and the uncle node to black, set the grandfather node to red, and change the attention node to the grandfather node.

If the uncle node is black and it is the right child of the parent node, the attention node becomes the parent node and then spins left around the attention node

If the uncle node is black and it is the left child node of the parent node, it is rotated right around the grandfather node of the focus node, and then the parent node of the focus node is switched with its brother node.

Recurse the above process for each concerned node.

B tree it is a multi-path balanced search tree, each node can have multiple key values, the corresponding values of these keys in ascending order in the node. Each node should also meet the balance characteristics of small left and large right. N-order B tree, each node has at most N-1 keys, and at least N /2 keys time complexity:

Non-leaf nodes of a B + tree store only keys, but not values, and the keys in a node are sorted in ascending order, satisfying the balance characteristic of small on the left and large on the right. The leaf nodes themselves are sorted in ascending order by key value and are connected by linked list. The keys in the leaf node are sorted in ascending order, and the corresponding values are stored in the leaf node. Time complexity: O(log/n m), where N is the number of crosses and m is the number of layers

Top K

How to find the duplicate top100 url among 100 million urls

A hundred million urls cannot all be loaded into memory for processing. So we can use a divide-and-conquer strategy, where we put urls with the same characteristics into one small file after another (we can use the hash algorithm, hash(URL) % 1000, to store the result in one small file), that is, the same URL must be in a small file, and then pull the small file into memory, Use HashMap to record the number of repetitions. Then, using the small top heap, we get the top100 repeats in each HashMap. Then aggregate the top100 of each small file to get the final top100. There is a serious problem, however, because our hash() can’t split small files evenly, and there may be hash collisions, causing data errors!

First of all, a hundred million URLS can be divided into several small files, and then use memory to sort the URLS in these several small files. Finally, through memory, the ordered string in a number of small files is aggregated into a large file ordered string.

Massive video, the number of playback before K

To find the TopK of the largest solution, use the small top heap: when adding an element, if the value of the element is smaller than the top of the small top heap, do not add.

To find the TopK of the smallest solution, use the big top heap: when adding an element, if the value of the element is larger than the top of the big top heap, do not add.

Take the median of 10 billion numbers

You get the maximum and minimum of the 10 billion numbers, and then you get mid. Divide the 10 billion numbers into two files according to mid, so the median is in the file with the largest number. Repeat the process.

Or: according to the highest bit of each number is 0/1 minute file, the first time to separate the positive and negative, take a large number of files. Then we divide the size, go to the amount until we get.

Or: fast row, random number is divided into small left and large right, then the median must be in the more half. Recurse the process

Given a million numbers, each between [-100,100], print them in order of size.

A million numbers can be pulled into memory, so you can sort them using hashMap buckets; [-100,100] [-100,100] [-100,100] [-100,100] [-100,100]

Each number is between [-100,100]. Given a million numbers, print them in descending order

You can sort by hashMap buckets, where key is -100 to 100 and value is quantity.

How are large files evenly divided among each bucket

Multithreading?