Algorithms
- Brute force
- Two pointers (fast/slow pointers, sliding window)
- Recursion
- Hash tables: generally used to quickly determine whether an element is present in a set
- Sorting
- Grouping
Many algorithms can be implemented both recursively and with loops. Recursive implementations typically have cleaner code, but not as good performance as loop-based implementations. During the interview, we can choose the appropriate programming method according to the characteristics of the problem, and even discuss it with the interviewer.
Sorting- and search-based algorithms are often the focus of interviews. When preparing for an interview, we should focus on binary search, merge sort, and quicksort, so that we can write their code correctly and completely at any time.
If an interview question asks you to search for a path through a two-dimensional array, which might be a maze or a chessboard, try backtracking. Backtracking is usually easiest to implement with recursive code; only if the interviewer rules out recursion should we consider using a stack to simulate it.
If the interview question asks for the optimal solution to a problem, and the problem can be divided into multiple sub-problems, then we can try dynamic programming. When analyzing a dynamic programming problem with top-down recursive thinking, we will find that the sub-problems contain overlapping smaller sub-problems. To avoid unnecessary repeated computation, we implement the solution with bottom-up loop code: first compute the optimal solutions of the sub-problems and save them in an array (usually one- or two-dimensional), then compute the solution of the big problem from the solutions of the sub-problems.
Characteristics of dynamic programming
- The problem asks for an optimal solution: this is the first feature of a problem that dynamic programming can solve.
- The optimal solution of the overall problem depends on the optimal solutions of its sub-problems: this is the second feature.
- The big problem can be broken down into a number of smaller problems, and these smaller sub-problems overlap: this is the third feature.
- The problem is analyzed top-down but solved bottom-up: this is the fourth feature.
If, after we explain the dynamic programming idea, the interviewer hints at whether there is a particular choice when solving the sub-problems that, if always taken, guarantees the optimal solution, this hint usually means the interview problem may be suited to a greedy algorithm. Of course, the interviewer will also ask the candidate to prove that the greedy choice really does lead to an optimal solution.
Recursion needs to satisfy three conditions
1. The solution of a problem can be decomposed into several subproblems
What are the sub-problems? A sub-problem is the same problem with a smaller data size. For example, in the movie theater example, the problem "what row am I in?" can be decomposed into the sub-problem "what row are the people in front of me in?"
2. This problem is exactly the same as the decomposed sub-problem except for the different data scale
For example, in the movie theater, the way you solve “what row are you in” is exactly the same way that the people in front of you solve “what row are you in?”
3. Recursive termination conditions exist
To decompose the problem into sub-problems, sub-problems into sub-problems, layer by layer, there can be no infinite cycle, which requires termination conditions.
Again, in the movie theater, the people in the first row know what row they are in without asking anyone else, which is f(1)=1, and that’s the termination condition for recursion.
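As a quick illustration of these three conditions, here is a minimal Java sketch of the movie-theater recursion f(n) = f(n-1) + 1 with f(1) = 1 (the method name rowNumber is just for illustration):

```java
// Which row am I in? Ask the person in front of me.
public static int rowNumber(int n) {
    if (n == 1) return 1;           // termination condition: the first row knows its own position
    return rowNumber(n - 1) + 1;    // my row = the row of the person in front of me, plus 1
}
```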
Bitwise operations can be thought of as a special class of algorithms: they operate on the zeros and ones of numbers represented in binary. Because the objects of bit operations are binary numbers, they are not very intuitive, but they are not hard to master, because there are only five bit operations: AND, OR, XOR, left shift, and right shift.
Linear and nonlinear tables
Linear tables
Linear table: as the name implies, a linear table arranges data like a line; each element has at most two directions, forward and backward. Besides arrays, linked lists, queues, and stacks are also linear table structures.
Linear table storage structure can be subdivided into sequential storage structure and chain storage structure.
- Sequential storage structure: Data is sequentially stored in a contiguous block of physical space. This storage structure is called sequential storage structure
- Chained storage structure: data is scattered across physical space, and the logical relationships between elements are maintained by pointers (references). This storage structure is called a chained storage structure
Nonlinear table
Nonlinear tables: binary trees, heaps, graphs, and so on. They are called nonlinear because the data elements do not have a simple before-and-after (sequential) relationship.
Arrays and linked lists
- When an array is defined, its length is fixed. If you want to change the length of the array, you need to define a new array.
- A linked list's length does not need to be fixed, and elements can be added and deleted dynamically, which suits scenarios where the amount of data is not fixed, insertions and deletions are frequent, and queries are few.
Structure | Insert/delete (time complexity) | Query (time complexity) | Suitable scenario |
---|---|---|---|
Array | O(n) | O(1) | Fixed amount of data, frequent queries, few insertions and deletions |
Linked list | O(1) | O(n) | Data size not fixed, frequent insertions and deletions, few queries |
Sorting
Sorting algorithm | Average complexity | Best case | Worst case | Space complexity | Stability |
---|---|---|---|---|---|
Bubble sort | O(n²) | O(n) | O(n²) | O(1) | Stable |
Insertion sort | O(n²) | O(n) | O(n²) | O(1) | Stable |
Selection sort | O(n²) | O(n²) | O(n²) | O(1) | Unstable |
Merge sort | O(nlogn) | O(nlogn) | O(nlogn) | O(n) | Stable |
Quick sort | O(nlogn) | O(nlogn) | O(n²) | O(logn) | Unstable |
Bucket sort | O(n) | O(n) | O(n) | O(n + m) | Stable |
Counting sort | O(n) | O(n) | O(n) | O(n + m) | Stable |
Radix sort | O(n) | O(n) | O(n) | O(n + m) | Stable |
Heap sort | O(nlogn) | O(nlogn) | O(nlogn) | O(1) | Unstable |
Bubble sort - O(n²) [11 | Sorting (Part 1): Why is insertion sort more popular than bubble sort?]
Bubble sort only operates on adjacent pairs of data. Each bubbling pass compares adjacent elements to check whether they satisfy the required size relationship; if they don't, the two are swapped. One bubbling pass moves at least one element to its final position; repeating this n times sorts the n numbers.
The principle of bubble sort is fairly easy to understand. The concrete code is posted below; you can read it together with the explanation above.
```java
public void bubbleSort(int[] a, int n) {
    if (n <= 1) return;
    for (int i = 0; i < n; ++i) {
        boolean flag = false;               // flag for early exit
        for (int j = 0; j < n - i - 1; ++j) {
            if (a[j] > a[j + 1]) {          // swap adjacent elements that are out of order
                int tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
                flag = true;                // data was exchanged in this pass
            }
        }
        if (!flag) break;                   // no data exchange, exit early
    }
}
```
Is bubble sort an in-place sorting algorithm?
The bubbling process only involves the exchange operation of adjacent data and only requires constant level temporary space, so its spatial complexity is O(1) and it is an in-place sorting algorithm.
Is bubble sort a stable sorting algorithm?
In bubble sort, only a swap can change the relative order of two elements. To keep bubble sort stable, when two adjacent elements are equal we simply do not swap them; elements of equal value keep their original relative order, so bubble sort is a stable sorting algorithm.
What is the time complexity of bubble sort?
Degree of order
In the best case, the data to be sorted is already ordered and only one bubbling pass is needed, so the best-case time complexity is O(n). In the worst case, the data to be sorted is exactly in reverse order, and we need n bubbling passes, so the worst-case time complexity is O(n²).
For an array in reverse order, such as 6, 5, 4, 3, 2, 1, the degree of order is 0; for a completely ordered array, such as 1, 2, 3, 4, 5, 6, the degree of order is n*(n-1)/2, which is 15 for n = 6. We call the degree of order of a completely ordered array the full degree of order.
- In the worst case, the initial degree of order is 0, so n*(n-1)/2 swaps are needed.
- In the best case, the initial degree of order is already n*(n-1)/2, so no swaps are needed.
- We can take the midpoint n*(n-1)/4 to represent the average case, where the initial array is neither highly ordered nor highly disordered. In other words, on average there are n*(n-1)/4 swaps; the number of comparisons is certainly no smaller than the number of swaps, and the upper bound on complexity is O(n²), so the average time complexity is O(n²).
Insertion sort - O(n²) [11 | Sorting (Part 1): Why is insertion sort more popular than bubble sort?]
So how exactly is insertion sort implemented?
First, we divide the data in the array into two intervals, sorted and unsorted. The initial sorted interval has only one element, the first element of the array. The core idea of insertion algorithm is to take the elements in the unsorted interval, find the appropriate insertion position in the sorted interval, and ensure that the sorted interval data is always in order. This process is repeated until the unsorted interval is empty and the algorithm ends.
Insertion sort is pretty simple, right? I’ve also posted the code implementation here, so you can look at it in conjunction with the code.
```java
public void insertionSort(int[] a, int n) {
    if (n <= 1) return;
    for (int i = 1; i < n; ++i) {
        int value = a[i];
        int j = i - 1;
        // look for the insertion position in the sorted interval, shifting larger elements back
        for (; j >= 0; --j) {
            if (a[j] > value) {
                a[j + 1] = a[j];
            } else {
                break;
            }
        }
        a[j + 1] = value;   // insert the data
    }
}
```
Is insertion sort an in-place sorting algorithm?
It is obvious from the implementation process that the insertion sort algorithm does not require additional storage space to run, so the space complexity is O(1), that is, this is an in-place sort algorithm.
Is insertion sort a stable sorting algorithm?
In insert sort, for elements with the same value, we can choose to insert the elements that appear later after the elements that appear before, so that the original order can remain unchanged, so insert sort is a stable sorting algorithm.
What is the time complexity of insertion sort?
If the data to be sorted is already ordered, we do not need to move any data. If we search for the insertion position from the tail of the sorted interval towards the head, we only need one comparison per insertion to determine the position, so in this case the best-case time complexity is O(n). Note that the ordered data is traversed from tail to head.
If the array is in reverse order, each insert is equivalent to inserting new data into the first position of the array, so a lot of data needs to be moved, so the worst-case time is O(n²).
Remember the average time complexity of inserting one element into an array? That's right, O(n). For insertion sort, each step is equivalent to inserting one element into an ordered array, and we repeat the insertion n times, so the average time complexity is O(n²).
Selection sort - O(n²) [11 | Sorting (Part 1): Why is insertion sort more popular than bubble sort?]
The implementation of selection sort algorithm is similar to insertion sort, which is divided into sorted and unsorted intervals. But select sort will find the smallest element in the unsorted range each time and place it at the end of the sorted range.
First, selection sort's space complexity is O(1); it is an in-place sorting algorithm. Its best-case, worst-case, and average time complexities are all O(n²).
Is selection sort a stable sorting algorithm?
Let me focus on this. The answer is no: selection sort is an unstable sorting algorithm. Selection sort finds the minimum of the remaining unsorted elements and swaps it with the element at the front of the unsorted interval, and that long-distance swap is what destroys stability.
For example, for the data 5, 8, 5, 2, 9, the first pass of selection sort finds the smallest element 2 and swaps it with the first 5, so the relative order of the first 5 and the second 5 changes; hence the algorithm is unstable. Because of this, selection sort is somewhat inferior to bubble sort and insertion sort.
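Since only the description of selection sort is given above, here is a short sketch in the same style as the bubble and insertion sort code (not from the original column); the long-distance swap in the last step is what breaks stability:

```java
public void selectionSort(int[] a, int n) {
    if (n <= 1) return;
    for (int i = 0; i < n - 1; ++i) {
        int min = i;
        // find the smallest element in the unsorted interval [i, n)
        for (int j = i + 1; j < n; ++j) {
            if (a[j] < a[min]) min = j;
        }
        // place it at the end of the sorted interval (this swap can jump over equal elements)
        int tmp = a[i]; a[i] = a[min]; a[min] = tmp;
    }
}
```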
Merge sort - best, average, and worst case all O(nlogn); space complexity O(n) [12 | Sorting (Part 2): How to find the Kth largest element in O(n) with the quicksort idea?]
- Merge sort (bottom up): solve the subproblems first, then merge the partitions (a sketch follows this list).
- Space complexity: only one merge function is executing on the CPU at any moment, so only one temporary array is in use at a time, and it never needs to hold more than n elements; the space complexity is therefore O(n).
- Stability: during the merge we can, as in the pseudocode, copy the elements of A[p…q] into the tmp array first when values are equal. This guarantees that elements with equal values keep their relative order before and after merging, so merge sort is a stable sorting algorithm.
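A standard recursive sketch of merge sort matching the description above (divide, then merge through a temporary array; taking from the left half on equal values keeps it stable):

```java
public void mergeSort(int[] a, int p, int r) {
    if (p >= r) return;
    int q = p + (r - p) / 2;
    mergeSort(a, p, q);        // sort the left half
    mergeSort(a, q + 1, r);    // sort the right half
    merge(a, p, q, r);         // merge the two sorted halves
}

private void merge(int[] a, int p, int q, int r) {
    int[] tmp = new int[r - p + 1];          // O(n) temporary space
    int i = p, j = q + 1, k = 0;
    while (i <= q && j <= r) {
        // taking from the left half on ties keeps equal elements in order -> stable
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    }
    while (i <= q) tmp[k++] = a[i++];
    while (j <= r) tmp[k++] = a[j++];
    System.arraycopy(tmp, 0, a, p, tmp.length);
}
```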
Quick sort - average O(nlogn), worst O(n²); space complexity O(1) [12 | Sorting (Part 2): How to find the Kth largest element in O(n) with the quicksort idea?]
- Quicksort (top down): first find a pivot and partition around it, then solve the two subproblems (see the sketch after this list).
- Space complexity: the partition is done with swaps, so no additional array needs to be allocated.
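A sketch of the corresponding quicksort, using an in-place, swap-based partition so no extra array is needed:

```java
public void quickSort(int[] a, int p, int r) {
    if (p >= r) return;
    int q = partition(a, p, r);   // find the partition point first (top down)
    quickSort(a, p, q - 1);       // then solve the two subproblems
    quickSort(a, q + 1, r);
}

// in-place partition around the last element as the pivot
private int partition(int[] a, int p, int r) {
    int pivot = a[r];
    int i = p;
    for (int j = p; j < r; ++j) {
        if (a[j] < pivot) {
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            ++i;
        }
    }
    int tmp = a[i]; a[i] = a[r]; a[r] = tmp;
    return i;
}
```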
Bucket sort - O(n) [13 | Linear sorting: How to sort 1 million users by age?]
Counting sort - O(n) [13 | Linear sorting: How to sort 1 million users by age?]
The maximum exam score is 900 and the minimum is 0. This range is very small, so we can create 901 buckets corresponding to scores 0 through 900. We place the 500,000 examinees into these 901 buckets according to their scores. The data in each bucket all have the same score, so no further sorting is needed within a bucket. We just scan the buckets in order and output the examinees in each bucket into an array in turn, and the 500,000 examinees are sorted. Since only scanning and traversal are involved, the time complexity is O(n).
Counting sort can only be used when the data range is small. If the data range k is much larger than the number of elements to be sorted n, counting sort is not suitable. Also, counting sort can only sort non-negative integers; data of other types must first be converted to non-negative integers without changing the relative ordering.
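A simplified sketch of the exam-score example above (scores in 0..900 only, and it outputs only the scores themselves; the general stable counting sort uses a prefix-sum count array, which is omitted here):

```java
public int[] countingSort(int[] scores) {
    int max = 900;
    int[] count = new int[max + 1];
    for (int s : scores) count[s]++;              // how many examinees fall into each score bucket
    int[] sorted = new int[scores.length];
    int k = 0;
    for (int score = 0; score <= max; ++score) {  // scan the buckets in order
        for (int c = 0; c < count[score]; ++c) sorted[k++] = score;
    }
    return sorted;
}
```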
Radix sort - O(n) [13 | Linear sorting: How to sort 1 million users by age?]
Compare the size of mobile phone numbers
Radix sort requires that the data to be sorted can be split into independent "digits" that are compared one by one, with a hierarchy between the digits: if the higher-order digit of data A is already larger than that of data B, the remaining lower-order digits do not need to be compared. In addition, the range of each digit must not be too large, so that a linear sorting algorithm can be used for each digit; otherwise, the time complexity of radix sort cannot reach O(n).
Sorting algorithm applications [14 | Sorting optimization: How to implement a generic, high-performance sort function?]
Analysis of qsort()'s source code: it uses merge sort, quicksort, and insertion sort, sometimes trading space for time.
Searching
Binary search [15 | Binary search (Part 1): How to implement fast search in the most memory-efficient way?]
You can see that this is a geometric sequence: after k shrinking operations the interval size is n/2^k, and the search stops when n/2^k = 1, so k is the total number of shrinking operations. Each shrinking step only compares two values, so after k shrinking operations the time complexity is O(k). Solving n/2^k = 1 gives k = log₂n, so the time complexity is O(logn).
O(logn) is logarithmic time complexity. It is extremely efficient, sometimes even more efficient than algorithms with constant O(1) time complexity. Why?
Because logn grows astonishingly slowly: even when n is very, very large, logn stays small. For example, take n = 2^32, a large number of about 4.2 billion. If we use binary search to find one value among 4.2 billion, we need at most 32 comparisons.
```java
public int bsearch(int[] a, int n, int val) {
    return bsearchInternally(a, 0, n - 1, val);
}

private int bsearchInternally(int[] a, int low, int high, int value) {
    if (low > high) return -1;
    int mid = low + ((high - low) >> 1);   // avoids the overflow risk of (low + high) / 2
    if (a[mid] == value) {
        return mid;
    } else if (a[mid] < value) {
        return bsearchInternally(a, mid + 1, high, value);
    } else {
        return bsearchInternally(a, low, mid - 1, value);
    }
}
```
Variant 1: Find the first element whose value equals the given value (a sketch follows below) [16 | Binary search (Part 2): How to quickly locate the province corresponding to an IP address?]
Variant 2: Find the last element whose value equals the given value [16 | Binary search (Part 2): How to quickly locate the province corresponding to an IP address?]
Variant 3: Find the first element greater than or equal to the given value [16 | Binary search (Part 2): How to quickly locate the province corresponding to an IP address?]
Variant 4: Find the last element less than or equal to the given value [16 | Binary search (Part 2): How to quickly locate the province corresponding to an IP address?]
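As an example of these variants, here is a sketch of variant 1, finding the first element equal to the given value in an ascending array (the other variants only change the boundary checks):

```java
public int bsearchFirst(int[] a, int n, int value) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + ((high - low) >> 1);
        if (a[mid] > value) {
            high = mid - 1;
        } else if (a[mid] < value) {
            low = mid + 1;
        } else {
            // equal: it is the first match only if it is at index 0 or the previous element differs
            if (mid == 0 || a[mid - 1] != value) return mid;
            high = mid - 1;
        }
    }
    return -1;
}
```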
Skip list [17 | Skip lists: Why does Redis have to use skip lists to implement sorted sets?]
In the course's example, before the index was built, finding 62 required traversing 62 nodes; with the index, only 11 nodes need to be traversed. So when the length n of the linked list is large, say 1000 or 10000, building indexes improves the search efficiency significantly.
This structure, a linked list plus multiple levels of indexes, is called a skip list.
Time complexity of skip list queries
The number of nodes at the k-th index level is 1/2 of the number at the (k-1)-th level, so the k-th level has n/2^k nodes.
Suppose the index has h levels and the top level has 2 nodes. From the formula above, n/2^h = 2, so h = log₂n - 1. Including the original linked list level, the total height of the skip list is log₂n. If we traverse m nodes at each level, the time complexity of querying one value in a skip list is O(m*logn).
So, since m is a small constant here, the time complexity of querying any value in a skip list is O(logn), the same as binary search. In other words, we have effectively implemented binary search on top of a singly linked list.
Space complexity of a skip list
The total number of index nodes is n/2 + n/4 + n/8 + … + 8 + 4 + 2 = n - 2, so the space complexity of a skip list is O(n).
Hash table [18 | Hash tables (Part 1): How is the spell-check feature in Word documents implemented?]
Application: misspelled-word hints in Word. Hash conflict resolution: 1. Open addressing (on a conflict, probe for another free slot) 2. Chaining (on a conflict, the element stays in the slot's linked list)
The contestant's number is called a key or keyword; we use it to identify a contestant. The mapping that converts the contestant's number into an array index is called the hash function, and the value computed by the hash function is called the hash value.
How do you construct a hash function?
Three basic requirements for designing a hash function:
- The hash function evaluates to a non-negative integer;
- If key1 = key2, hash(key1) == hash(key2);
- If key1 ≠ key2, hash(key1) ≠ hash(key2).
The third point may be hard to understand, so let me emphasize it. The requirement looks reasonable, but in reality it is almost impossible to find a hash function for which different keys always produce different hash values. Even well-known hash algorithms such as MD5, SHA, and CRC cannot completely avoid such hash collisions. Moreover, since the array's storage space is limited, the probability of hash collisions increases further.
Therefore, it is almost impossible for us to find a perfect conflict-free hash function, and even if we can find it, the cost of time and calculation is very high. Therefore, we need to solve the problem of hash conflict through other ways.
Hash conflict
No good hash function is immune to hash conflicts. So how do you resolve hash conflicts? We commonly use two types of hash conflict resolution methods, open addressing and chaining.
1. Open addressing
The core idea of open addressing is that if a hash conflict occurs, we re-probe a free location and insert it.
When we look up an element, once linear probing reaches a free slot we can conclude that the data is not in the hash table. However, if an element along the probe path is later deleted and its slot simply emptied, the original search algorithm breaks: data that actually exists would be reported as missing. How can this be solved? We can specially mark deleted elements as "deleted". When linear probing encounters a slot marked deleted, it does not stop but keeps probing.
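A minimal sketch of lookup under linear probing with "deleted" markers, as just described (keys only, fixed-size table; all names are illustrative):

```java
public class LinearProbingTable {
    private static final Object DELETED = new Object();   // special marker for deleted slots
    private final Object[] slots = new Object[16];

    // returns the slot index of key, or -1 if key is not in the table
    public int findSlot(Object key) {
        int i = (key.hashCode() & 0x7fffffff) % slots.length;
        for (int probes = 0; probes < slots.length; ++probes) {
            if (slots[i] == null) return -1;              // a truly free slot: the key is not present
            if (slots[i] != DELETED && slots[i].equals(key)) return i;
            i = (i + 1) % slots.length;                   // DELETED or a different key: keep probing
        }
        return -1;
    }
}
```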
Whatever probing method is used, when few free positions remain in the hash table the probability of hash collisions rises sharply. To keep hash table operations efficient, we generally try to ensure that a certain proportion of slots stay free. We use the load factor to describe how full the table is. The formula for the load factor is:
Load factor of the hash table = number of elements filled in/length of the hash table
The larger the load factor, the fewer free positions, the more conflicts, and the worse hash table performance.
2. Chaining (the linked list method)
Chaining is a more common method of resolving hash conflicts, and it is much simpler than open addressing. In the hash table, every "bucket" or "slot" has a linked list, and all elements with the same hash value are placed in the linked list of the corresponding slot.
Hash conflict resolution [19 | Hash tables (Part 2): How to build an industrial-grade hash table?]
Hash conflict resolution methods: open addressing and chaining (hash tables with a low load factor are better suited to open addressing). In most cases chaining is more general; the slot's linked list can also be replaced with a data structure that supports faster lookups, such as a red-black tree, to keep the hash table's lookup time from degrading to O(n) under heavy collisions.
Hash table + linked list [20 | Hash tables (Part 3): Why are hash tables and linked lists often used together?]
- Arrays have the advantage of random access, but have the disadvantage of requiring continuous memory.
- Linked lists have the advantage of non-contiguous storage, but access lookups are linear.
- Hash tables are combined with linked lists (or skip lists) to get the advantages of both arrays and linked lists while avoiding their disadvantages.
When deleting an element, the target node can be found in O(1) through the hash table, but deleting it from a singly linked list requires a pointer to its predecessor, and finding that predecessor by traversal is O(n). A doubly linked list is therefore more appropriate. For iOS developers, YYMemoryCache is an example that combines a hash table with a doubly linked list.
Hash algorithm [21 | Hash algorithms (Part 1): How to protect user information in the database from being exposed if the database is leaked?]
Application 1: Secure encryption
When it comes to the application of hash algorithm, the first thing that comes to mind should be secure encryption. The Hash algorithms most commonly used for encryption are MD5 (MD5 message-Digest Algorithm) and SHA (Secure Hash Algorithm).
In addition to these two, there are of course many other Encryption algorithms, such as Data Encryption Standard (DES) and Advanced Encryption Standard (AES).
Application 2: Unique identification
We can give each image a unique identifier, or summary of information. For example, we can take 100 bytes from the beginning of the binary string of an image, 100 bytes from the middle, and another 100 bytes from the end, and then put the 300 bytes together and use a hash algorithm (such as MD5) to get a hash string that can be used as the unique identifier of the image. Using this unique identifier to determine whether an image is in the gallery can save a lot of work.
Application 3: Data verification (each blockchain block stores the hash of its block body and of the previous block header; change one block, and the hash values stored in all subsequent blocks no longer match)
One of the characteristics of hashing algorithms is that they are sensitive to data. If the contents of the file block change even a little bit, the resulting hash value will be completely different. So, when the file blocks are downloaded, we can use the same hash algorithm to compute the hash values of the downloaded file blocks one by one, and then compare them with the hash values saved in the seed file. If not, the file block is incomplete or tampered with, and you need to download the file block from another host machine.
Application 4: Hash function (Uniform distribution)
The hashing algorithm used in the hash function is more concerned with whether the hashed values are evenly distributed, that is, whether a set of data is evenly hashed across slots. In addition, the speed of the hash function execution also affects the performance of the hash table. Therefore, the hash algorithm used by the hash function is generally relatively simple and pursues efficiency.
Application 5: Load balancing [22 | Hash algorithms (Part 2): What are the applications of hash algorithms in distributed systems?]
As we know, there are many load balancing algorithms, such as round-robin, random, weighted round-robin, and so on. How, then, do we implement a sticky-session load balancing algorithm, that is, one where all requests from the same client during one session are routed to the same server?
All of these problems can be solved perfectly by hashing. We can use the hash algorithm to calculate the hash value of the client IP address or session ID, modulo the obtained hash value with the size of the server list, and finally obtain the server number that should be routed to. This way, we can route all requests from the same IP address to the same back-end server.
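A sketch of the routing step described above (hash of the client IP or session ID, modulo the size of the server list; the method and parameter names are illustrative):

```java
// Route a request so that the same clientKey (IP or session ID) always lands on the same server.
public int route(String clientKey, int serverCount) {
    int hash = clientKey.hashCode() & 0x7fffffff;  // keep the hash non-negative
    return hash % serverCount;                     // server number in [0, serverCount)
}
```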
Application 6: Data sharding [22 | Hash algorithms (Part 2): What are the applications of hash algorithms in distributed systems?]
1. How to count the frequency of “search keywords”?
There are two difficulties with this problem,
- The first is that the search log is too large to fit into a machine’s memory.
- The second difficulty is that it takes a long time to process such a large amount of data using only one machine.
Given these two difficulties, we can first shard the data and then process it on multiple machines to improve the processing speed. The idea is as follows: to speed things up, we use n machines to process the data in parallel. Each search keyword is read in turn from the log file, its hash value is computed with a hash function, and that value modulo n gives the number of the machine it should be assigned to.
In this way, keywords with the same hash value are assigned to the same machine; that is, the same search keyword always goes to the same machine. Each machine counts the occurrences of its keywords separately, and combining the results gives the final answer.
2. How to quickly judge whether the picture is in the gallery?
Given that we now have 100 million images in our gallery, it is clear that building a hash table on a single machine is not going to work. Since memory on a single machine is limited, building a hash table with 100 million images is obviously way over the memory limit on a single machine.
We can also fragment the data and use multi-machine processing. We have n machines, each of which maintains a hash table corresponding to a certain portion of the image. Each time we read a picture from the gallery, calculate the unique identifier, and then mod with the number of machines N, the value obtained corresponds to the number of machines to be allocated, and then send the unique identifier and image path of the picture to the corresponding machine to build the hash table.
When we want to determine whether a picture is in the gallery, we use the same hash algorithm to calculate the unique identifier of the picture, and then take modulo with the number of machines N. Assuming the value is k, look it up in the hash table constructed by the machine numbered k.
Application 7: Distributed storage
In order to improve the ability of reading and writing data, we generally use a distributed way to store data, such as distributed cache. We have tons of data to cache, so a caching machine is definitely not enough. Therefore, we need to distribute the data across multiple machines.
How do you decide which data to put on which machine? We can borrow the previous idea of data sharding, that is, we hash the data and modulo the number of machines, and the final value is the number of cached machines that should be stored.
Trees
Binary tree [23 | Binary tree basics (Part 1): What kind of binary tree is suitable for storage in an array?]
In the binary tree numbered 2 (in the course's figure), all leaf nodes are on the bottom level, and every node other than the leaves has both a left and a right child; this kind of binary tree is called a full binary tree. In the binary tree numbered 3, all leaf nodes are on the bottom two levels, the leaf nodes of the last level are packed to the left, and every level except the last has the maximum possible number of nodes; this kind of binary tree is called a complete binary tree.
Traversal of binary trees
In fact, pre-order, in-order, and post-order traversal of a binary tree are all recursive processes. For example, pre-order traversal is essentially: print the root node, then recursively print the left subtree, then recursively print the right subtree.
Pre-order traversal
Pre-order traversal means that, for any node in the tree, the node itself is printed first, then its left subtree, and finally its right subtree.
In-order traversal
In-order traversal means that, for any node in the tree, the left subtree is printed first, then the node itself, and finally the right subtree.
Post-order traversal
Post-order traversal means that, for any node in the tree, the left subtree is printed first, then the right subtree, and finally the node itself.
```
// pseudocode
void preOrder(Node* root) {
    if (root == null) return;
    print root;                 // visit the node first
    preOrder(root->left);
    preOrder(root->right);
}

void inOrder(Node* root) {
    if (root == null) return;
    inOrder(root->left);
    print root;                 // visit the node in between
    inOrder(root->right);
}

void postOrder(Node* root) {
    if (root == null) return;
    postOrder(root->left);
    postOrder(root->right);
    print root;                 // visit the node last
}
```
Binary tree vs. hash table [24 | Binary tree basics (Part 2): With such an efficient hash table, why do we still need binary trees?]
Binary search tree
A binary search tree requires that at any node in the tree, the value of each node in the left subtree is less than the value of this node, and the value of each node in the right subtree is greater than the value of this node.
1. Search operation of binary search tree
We start from the root node: if it equals the data we are looking for, we return it; if the data we are looking for is smaller than the root node's value, we search recursively in the left subtree; if it is larger, we search recursively in the right subtree.
The lookup code is posted below; reading it alongside the explanation makes it easier to understand.
```java
public class BinarySearchTree {
    private Node tree;

    public Node find(int data) {
        Node p = tree;
        while (p != null) {
            if (data < p.data) p = p.left;
            else if (data > p.data) p = p.right;
            else return p;
        }
        return null;
    }

    public static class Node {
        private int data;
        private Node left;
        private Node right;

        public Node(int data) {
            this.data = data;
        }
    }
}
```
2. Insertion into a binary search tree
If the data to be inserted is larger than that of the node and the right subtree of the node is empty, the new data is directly inserted into the position of the right child node. If it is not empty, the right subtree is recursively traversed to find the insertion location. Similarly, if the data to be inserted is smaller than the value of the node and the left subtree of the node is empty, the new data is inserted at the position of the left child node. If it is not empty, the left subtree is recursively traversed to find the insertion location.
Similarly, I have implemented the code for the insertion, which is posted below for you to see.
```java
public void insert(int data) {
    if (tree == null) {
        tree = new Node(data);
        return;
    }
    Node p = tree;
    while (p != null) {
        if (data > p.data) {
            if (p.right == null) {
                p.right = new Node(data);
                return;
            }
            p = p.right;
        } else { // data < p.data
            if (p.left == null) {
                p.left = new Node(data);
                return;
            }
            p = p.left;
        }
    }
}
```
3. Deletion from a binary search tree
According to the different number of child nodes of nodes to be deleted, we need to deal with three cases:
In the first case, if the node to be deleted has no children, we simply set the pointer to the node to be deleted to NULL in the parent node. For example, delete node 55 in the figure.
In the second case, if the node to be deleted has only one child (either the left child or the right child), we simply update the pointer in the parent node to point to the child of the node to be deleted. For example, delete node 13 in the figure.
The third case is more complicated if the node to be deleted has two children. We need to find the smallest node in the right subtree of this node and replace it with the node we want to delete. Then delete the smallest node, because the smallest node definitely has no left children (if it has left children, it is not the smallest node), so we can apply the above two rules to delete the smallest node. For example, delete node 18 in the figure.
As usual, I'll post the deletion code here.
```java
public void delete(int data) {
    Node p = tree;    // p points to the node to be deleted, initialized to the root node
    Node pp = null;   // pp records the parent node of p
    while (p != null && p.data != data) {
        pp = p;
        if (data > p.data) p = p.right;
        else p = p.left;
    }
    if (p == null) return;  // no such node found

    // The node to be deleted has two children
    if (p.left != null && p.right != null) {
        // find the smallest node minP in p's right subtree
        Node minP = p.right;
        Node minPP = p;      // minPP is the parent node of minP
        while (minP.left != null) {
            minPP = minP;
            minP = minP.left;
        }
        p.data = minP.data;  // copy minP's data into p
        p = minP;            // then delete minP instead
        pp = minPP;
    }

    // The node to be deleted is a leaf node or has only one child
    Node child;  // the child node of p
    if (p.left != null) child = p.left;
    else if (p.right != null) child = p.right;
    else child = null;

    if (pp == null) tree = child;           // deleting the root node
    else if (pp.left == p) pp.left = child;
    else pp.right = child;
}
```
Binary tree vs. hash table
As we discussed in the hash table section, the time complexity of the insert, delete, and find operations can be constant O(1), which is very efficient. Binary search tree in a balanced case, insert, delete, search operation time complexity is O(logn), relative to hash table, seems to have no advantage, so why we still use binary search tree?
- First, the data in a hash table is stored unordered, so to output ordered data you have to sort it first. For a binary search tree, an in-order traversal outputs an ordered data sequence within O(n) time.
- Second, resizing a hash table takes a lot of time, and its performance is unstable when many hash collisions occur. Although an ordinary binary search tree's performance is also unstable, in engineering practice the commonly used balanced binary search trees have very stable performance, with time complexity steady at O(logn).
- Third, although the time complexity of hash table lookups and similar operations is nominally constant, because of hash collisions that constant is not necessarily smaller than logn, so the actual lookup may not be faster than O(logn). Adding the cost of computing the hash function, a hash table is not necessarily more efficient than a balanced binary search tree.
- Fourth, the construction of hash tables is more complex than binary search trees, and there are many things to consider. Such as the design of hash functions, conflict resolution, capacity expansion, capacity reduction, etc. Balanced binary search tree only needs to consider the problem of balance, and the solution of this problem is relatively mature and fixed.
Finally, to avoid too many hash collisions, the load factor of a hash table cannot be too large, especially for hash tables based on open addressing, which means some storage space is inevitably wasted.
Red-black tree [25 | Red-black trees (Part 1): Why do engineering projects all use this kind of binary tree?]
A red-black tree is a self-balancing binary search tree. Besides the binary search tree properties, it also has the following characteristics:
- Every node is either red or black
- Red nodes cannot be adjacent to each other
- The root node is black
- The two children of every red node are black, and all leaf nodes are black (the leaves here are the empty nodes with out-degree 0). Any binary search tree satisfying these properties is approximately balanced; the two colors do not literally have to be red and black, they are just labels.
Recursion tree [27 | Recursion trees: How to use a tree to analyze the time complexity of a recursive algorithm?]
Quicksort terminates when the interval to be sorted shrinks to size 1, that is, when the data size in a leaf node is 1. Going from the root (size n) down to a leaf (size 1) in the recursion tree, the shortest path multiplies the size by 1/10 at every step and the longest path multiplies it by 9/10 at every step, so the shortest root-to-leaf path has length log10n (log base 10) and the longest has length log10/9n (log base 10/9).
So the total amount of data traversed is between n*log10n and n*log10/9n. In big O notation the base of the logarithm does not matter and we simply write logn, so even when the partition ratio is 1:9 the time complexity of quicksort is still O(nlogn).
Heap [28 | Heaps and heap sort: Why is heap sort not as fast as quicksort?]
The definition of the heap
- The heap must be a complete binary tree
- The value of each node in the heap must be greater than or equal to (or less than or equal to) the values of all nodes in its subtree. (A heap in which every node is greater than or equal to every node in its subtree is called a "big top heap"; a heap in which every node is less than or equal to every node in its subtree is called a "small top heap".)
Inserting an element into the heap
After we insert an element into the heap, the result must still satisfy the two characteristics of a heap.
If we simply place the newly inserted element at the end of the heap, it may violate the heap's characteristics, so we need to adjust it until it satisfies them again; this process is called heapification (heapify).
Heapifying from the bottom up works as follows: compare the newly inserted node with its parent; if the required child-parent size relationship does not hold (for a big top heap, if the child is larger than the parent), swap the two nodes. Repeat this process until the parent-child size relationship is satisfied.
I’ve translated what I said above about inserting data into the heap into code, so you can look at it together.
```java
public class Heap {
    private int[] a;     // array, data stored starting from index 1
    private int n;       // maximum number of elements the heap can hold
    private int count;   // number of elements already stored in the heap

    public Heap(int capacity) {
        a = new int[capacity + 1];
        n = capacity;
        count = 0;
    }

    public void insert(int data) {
        if (count >= n) return;     // the heap is full
        ++count;
        a[count] = data;            // put the new element at the end of the heap
        int i = count;
        while (i / 2 > 0 && a[i] > a[i / 2]) {  // heapify from bottom to top
            swap(a, i, i / 2);      // swap the elements at indexes i and i/2
            i = i / 2;
        }
    }

    private void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```
Deleting the heap-top element
From the second point of the heap definition, that any node is greater than or equal to (or less than or equal to) the values of the nodes in its subtree, we can see that the heap-top element stores the maximum or minimum value in the heap.
Suppose we have a big top heap, so the heap-top element is the largest. After removing the heap-top element, we need to put the second largest element at the top, and that element must be one of the top's left or right children. We can then iteratively delete that node in the same way, and so on, until a leaf node is deleted.
We know that for a complete binary tree with n nodes, the height of the tree is no more than log₂n. The heapification process swaps nodes along one path of the tree, so the time complexity of heapification is proportional to the height of the tree, that is, O(logn). The main logic of both inserting data and removing the heap-top element is heapification, so the time complexity of inserting an element into the heap and of removing the heap-top element is O(logn).
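A sketch of removing the heap-top element from the big top heap above, reusing the same fields a, n, and count and the swap() helper: move the last element to the top and heapify from top to bottom.

```java
public void removeMax() {
    if (count == 0) return;   // nothing in the heap
    a[1] = a[count];          // move the last element to the top
    --count;
    int i = 1;
    while (true) {            // heapify from top to bottom
        int maxPos = i;
        if (i * 2 <= count && a[i] < a[i * 2]) maxPos = i * 2;
        if (i * 2 + 1 <= count && a[maxPos] < a[i * 2 + 1]) maxPos = i * 2 + 1;
        if (maxPos == i) break;
        swap(a, i, maxPos);   // swap() as defined in the Heap class above
        i = maxPos;
    }
}
```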
Heap sort
1. Build the heap
Building the heap takes O(n) time.
2. Sorting
The sorting process takes O(nlogn) time.
The whole process of heap sort requires very little temporary storage, so heap sort is an in-place sort algorithm. Heap sort consists of two operations: heap build and sort. The time complexity of heap build is O(n), and the time complexity of sort is O(nlogn). Therefore, the overall time complexity of heap sort is O(nlogn).
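A sketch of heap sort on a 1-indexed array a[1..n] (the same layout as the Heap class above): build the heap in O(n), then repeatedly swap the maximum to the end.

```java
public void heapSort(int[] a, int n) {
    // 1. Build a big top heap: heapify every non-leaf node, from the last one up to the root. O(n)
    for (int i = n / 2; i >= 1; --i) {
        heapify(a, n, i);
    }
    // 2. Sort: swap the maximum (a[1]) to the end, shrink the heap, re-heapify. O(nlogn)
    int k = n;
    while (k > 1) {
        int tmp = a[1]; a[1] = a[k]; a[k] = tmp;
        --k;
        heapify(a, k, 1);
    }
}

private void heapify(int[] a, int n, int i) {  // sift a[i] down within a[1..n]
    while (true) {
        int maxPos = i;
        if (i * 2 <= n && a[i] < a[i * 2]) maxPos = i * 2;
        if (i * 2 + 1 <= n && a[maxPos] < a[i * 2 + 1]) maxPos = i * 2 + 1;
        if (maxPos == i) break;
        int tmp = a[i]; a[i] = a[maxPos]; a[maxPos] = tmp;
        i = maxPos;
    }
}
```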
Heap applications [29 | Applications of the heap: How to quickly get the Top 10 most popular search keywords?]
The heap data structure has several important applications: priority queues, finding the Top K, and finding the median.
Application 1 of the heap: priority queue
Merge ordered small files
Suppose we have 100 small files, each 100MB in size, and each file stores strings in sorted order. We want to merge these 100 small files into one sorted large file. This is where a priority queue, i.e. a heap, comes in.
We take the first string from each small file and put it into a small top heap; the element at the top of the heap (the head of the priority queue) is the smallest string. We write that string into the large file and remove it from the heap, then take the next string from the small file it came from and put it into the heap. Repeating this process writes the data of all 100 small files, in order, into the large file.
Removing the heap top and inserting data into the heap both take O(logn) time, where n is the number of elements in the heap, here 100. This is much more efficient than the earlier array-based approach.
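A sketch of the same idea using Java's PriorityQueue as the small top heap, merging k sorted lists that stand in for the 100 sorted small files (class and field names are illustrative):

```java
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    static class Entry implements Comparable<Entry> {
        String value; int listIndex; int nextPos;
        Entry(String value, int listIndex, int nextPos) {
            this.value = value; this.listIndex = listIndex; this.nextPos = nextPos;
        }
        public int compareTo(Entry o) { return value.compareTo(o.value); }
    }

    public static void merge(List<List<String>> sortedLists, List<String> out) {
        PriorityQueue<Entry> heap = new PriorityQueue<>();
        for (int i = 0; i < sortedLists.size(); ++i) {      // seed the heap with each list's first string
            if (!sortedLists.get(i).isEmpty()) {
                heap.add(new Entry(sortedLists.get(i).get(0), i, 1));
            }
        }
        while (!heap.isEmpty()) {
            Entry min = heap.poll();                        // smallest string across all lists
            out.add(min.value);                             // write it to the "large file"
            List<String> src = sortedLists.get(min.listIndex);
            if (min.nextPos < src.size()) {                 // take the next string from the same list
                heap.add(new Entry(src.get(min.nextPos), min.listIndex, min.nextPos + 1));
            }
        }
    }
}
```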
Heap application two: use heap to find Top K
Traversing the array takes O(n) time, and one heapification takes O(logK) time, so in the worst case, where all n elements pass through the heap, the total time is O(nlogK).
If we recomputed the Top K from scratch every time it was requested, the time complexity would be O(nlogK), where n is the current data size. Instead, we can always maintain a small top heap of size K: whenever a new element is added to the collection, we compare it with the heap-top element. If it is larger than the heap top, we remove the heap top and insert the new element; if it is smaller, we ignore it. This way, whenever the current Top K is requested, we can return it immediately.
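A sketch of maintaining the Top K largest values of a dynamic collection with a small top heap of size K, using java.util.PriorityQueue (a min-heap by default):

```java
import java.util.PriorityQueue;

public class TopK {
    private final int k;
    private final PriorityQueue<Integer> heap = new PriorityQueue<>(); // small top heap

    public TopK(int k) { this.k = k; }

    public void add(int value) {
        if (heap.size() < k) {
            heap.add(value);
        } else if (value > heap.peek()) {  // larger than the smallest of the current Top K
            heap.poll();                   // remove the heap top
            heap.add(value);               // insert the new value: O(logK)
        }                                  // otherwise ignore it
    }

    public PriorityQueue<Integer> current() { return heap; } // the Top K at any moment
}
```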
Heap application three: use the heap to find the median
The heap is a data structure that allows us to do medians very efficiently without sorting. Let’s see, how does it do that?
We need to maintain two heaps, one big top heap and one small top heap. The big top heap stores the first half of the data, and the small top heap stores the second half of the data, and the data in the small top heap is larger than the data in the big top heap.
Thus, we can use two heaps, a big top heap and a small top heap, to find the median of a dynamic data set. Inserting data takes O(logn) time because it involves heapification, but finding the median only requires returning the top element of the big top heap, which is O(1).
In fact, using two heaps can quickly compute not only the median, but also other percentiles, and the principle is similar.
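A sketch of the two-heap median: a big top heap for the smaller half of the data, a small top heap for the larger half (the class name is illustrative, and it assumes at least one element has been inserted before median() is called):

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class MedianFinder {
    private final PriorityQueue<Integer> lower = new PriorityQueue<>(Collections.reverseOrder()); // big top heap
    private final PriorityQueue<Integer> upper = new PriorityQueue<>();                            // small top heap

    public void insert(int value) {      // O(logn): at most two heap operations
        if (lower.isEmpty() || value <= lower.peek()) lower.add(value);
        else upper.add(value);
        // rebalance so that lower has either the same size as upper or one more element
        if (lower.size() > upper.size() + 1) upper.add(lower.poll());
        else if (upper.size() > lower.size()) lower.add(upper.poll());
    }

    public double median() {             // O(1): just look at the heap tops
        if (lower.size() == upper.size()) return (lower.peek() + upper.peek()) / 2.0;
        return lower.peek();
    }
}
```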
Graph [30 | Graph representation: How to store friend relationships in social networks such as Weibo and WeChat?]
The elements in a tree are called nodes, while the elements in a graph are called vertices. A vertex in a graph can be connected to any other vertex; we call such a connection an edge.
Degree is the number of edges that are connected to a vertex. We just talked about the concept of degrees in undirected graphs, which is how many edges a vertex has. In directed graphs, we divide degrees into in-degree and out-degree.
In the weighted graph, each edge has a weight, which can be used to indicate the closeness between QQ friends.
Adjacency matrix storage method
Adjacency list storage method
Remember the trade-off between time and space complexity we discussed earlier? The adjacency matrix wastes space when storing a graph but saves time when using it; the adjacency list, on the other hand, saves storage space but costs more time to use.
The adjacency matrix storage method has the disadvantage of wasting space, but its advantages are high query efficiency and convenient matrix operations. In the adjacency list storage method, each vertex has a linked list that stores the other vertices connected to it. Although an adjacency list saves storage space, linked lists are not easy to search, so query efficiency is lower than with an adjacency matrix. To address this, the adjacency list can be improved by replacing the linked list with a more efficient dynamic data structure, such as a balanced binary search tree, a skip list, or a hash table.
The basic data structures are arrays and linked lists; more complex structures such as trees, queues, and graphs can themselves be stored using arrays and linked lists. Trees, queues, graphs, and the like exist to reduce the high time complexity of certain problems; in that sense, data structures are born to serve algorithms.
Breadth-first search (BFS) [31 | Depth-first and breadth-first search: How to find third-degree friend relationships in a social network?]
Breadth-First Search, or BFS for short. Intuitively, it is a layer-by-layer, carpet-style search strategy: first visit the vertices closest to the starting vertex, then the next closest, and so on outward. (Search layer by layer.)
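A sketch of BFS on an adjacency-list graph (the representation described in the graph section above); vertices are visited layer by layer using a queue:

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

public class Bfs {
    // adj[v] lists the vertices connected to v
    public static void bfs(List<Integer>[] adj, int start) {
        boolean[] visited = new boolean[adj.length];
        Queue<Integer> queue = new LinkedList<>();
        visited[start] = true;
        queue.add(start);
        while (!queue.isEmpty()) {
            int v = queue.poll();
            System.out.println(v);           // vertices closest to start are printed first
            for (int next : adj[v]) {
                if (!visited[next]) {
                    visited[next] = true;    // mark when enqueued so nothing is enqueued twice
                    queue.add(next);
                }
            }
        }
    }
}
```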
Depth-first search (DFS) [31 | Depth-first and breadth-first search: How to find third-degree friend relationships in a social network?]
Depth-First Search, or DFS for short. The most intuitive example is walking a maze.
Suppose you are standing at a fork in a maze, trying to find the exit. You pick one fork at random; when you find it leads nowhere, you go back to the previous fork, choose another path, and keep walking until you finally find the exit. This is the depth-first search strategy. (Search one path to the end; if it fails, go back and search the next one.)
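A matching sketch of DFS on the same adjacency-list graph, written recursively: explore one fork to the end before backtracking and trying the next.

```java
import java.util.List;

public class Dfs {
    public static void dfs(List<Integer>[] adj, int v, boolean[] visited) {
        visited[v] = true;
        System.out.println(v);            // visit the vertex on the way down
        for (int next : adj[v]) {
            if (!visited[next]) {
                dfs(adj, next, visited);  // go as deep as possible along this fork first
            }
        }
    }
}
```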
String matching
- String matching: BF algorithm (brute force), time complexity O(n*m); RK algorithm (compares hash values of substrings), time complexity O(n)
- String matching: BM algorithm (bad character and good suffix rules), time complexity O(n)
- String matching: KMP algorithm, time complexity O(n+m); it only needs a next array, O(n) extra space
- Trie tree: finds strings matching a prefix -> auto-completion features, such as input method auto-completion, code auto-completion in IDE editors, and URL auto-completion in browsers
- AC automaton algorithm (36)
Algorithms
Recursion [10 | Recursion: How to find the "final referrer" with three lines of code?]
What is recursion?
- Recursion is a widely used, efficient, and concise coding technique. Many algorithms, such as DFS (depth-first search) and pre-/in-/post-order binary tree traversal, are implemented with recursion.
- A method or function calling itself is called a recursive call; the call going down is the "recursing" part, and the coming back up is the "returning" part.
- Basically, all recursive problems can be expressed by recursive formulas, such as
- f(n) = f(n-1) + 1;
- f(n) = f(n-1) + f(n-2);
- f(n)=n*f(n-1);
Why use recursion? Pros and cons of recursion?
- Advantages: strong expression of code, simple to write.
- Disadvantages: high space complexity, risk of stack overflow, repeated computation, and the time overhead of too many function calls, among other problems.
What kind of problems can be solved recursively?
A problem can be solved recursively as long as the following three conditions are met:
- The solution of a problem can be decomposed into several subproblems. What are the subproblems? It’s a smaller data scale problem.
- Problems and sub-problems, except the data scale is different, the solution idea is exactly the same
- There are recursive termination conditions
How to implement recursion?
1. Recursive code writing
The key to writing recursive code is to figure out how to break a big problem into smaller problems, write a recursive formula based on that, then refine the termination conditions, and finally translate the recursive formula and termination conditions into code.
2. Understand recursive code
When reading recursive code, trying to trace the entire recursive process in your head is a mistake.
So how do you understand recursive code? If A problem A can be decomposed into subproblems B, C, and D, you can assume that subproblems B, C, and D have been solved. Moreover, you only need to think about the relationship between the two layers of problem A and subproblem B, C and D, instead of thinking about the relationship between subproblems and subproblems and subproblems. It’s a lot easier to understand by masking the recursive details.
So, to understand recursive code, abstract it as a recursive formula, don’t think about layers of calls, don’t try to break down each step of the recursion with your brain.
Common problems and solutions of recursion
- Beware of stack overflows: You can declare a global variable to control the depth of recursion, thus avoiding stack overflows.
- Be wary of double counting: Avoid double counting by storing the evaluated values in some data structure.
How to rewrite recursion into non-recursive code?
In general, all recursive code can be rewritten as a non-recursive iterative loop. How? Extract the recurrence formula, the initial values, and the boundary conditions, and then implement them with an iterative loop.
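A sketch of such a rewrite, using the movie-theater recurrence f(n) = f(n-1) + 1 with f(1) = 1 from earlier:

```java
// Iterative version: start from the initial value and apply the recurrence up to n.
public static int rowNumberIterative(int n) {
    int result = 1;                  // f(1) = 1
    for (int i = 2; i <= n; ++i) {
        result = result + 1;         // f(i) = f(i-1) + 1
    }
    return result;
}
```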
Greedy algorithm [37 | Greedy algorithms: How to implement Huffman coding with a greedy algorithm?]
Divide-and-conquer algorithm [38 | Divide and conquer: The divide-and-conquer idea behind the large-scale computing framework MapReduce]
Divide-and-conquer algorithm: split the original problem into n smaller sub-problems that have a structure similar to the original, solve these sub-problems recursively, and then combine their results.
Backtracking algorithm [39 | Backtracking: Learning the core idea of backtracking from the movie "The Butterfly Effect"]
Backtracking algorithm: backtracking is essentially enumeration. Its advantage is that, like the strategy of crossing the river by feeling for the stones, it can prune branches to cut out unnecessary steps. It suits search scenarios without obvious regularity, or where we do not yet know the regularity.
Dynamic programming [40 | Dynamic programming: How to cleverly solve the order-combining problem when shopping on "Double Eleven"?]
Knapsack problem
- Greedy: go down one road to the end, with only one chance; at each step you can only take whichever direction looks best at the moment.
- Backtracking: go down one road to the end, with countless chances to start over; with that, there is no fear of failing to find a way out (snapshot style).
- Dynamic programming: a God's-eye view, keeping an archive of the histories of countless parallel universes while developing countless futures at the same time.
Algorithm problems that involve a recurrence relation can be solved with dynamic programming thinking, and they can also be solved with recursion. The key is to watch the performance: store intermediate results in an array or matrix to avoid unnecessary repeated computation. In short, recursion with the repeated computation eliminated is dynamic programming.