Chapter 1 Data Structures

  • 1. Linked lists can be extended in several easy ways. (1) Add a pointer at the end of the list and make it point to the data at the head, turning the list into a ring. This is a "circular linked list" (also called a ring list). A circular list has no concept of head or tail; such lists are often used when you want to keep a fixed amount of the most recent data. (2) Give each node two pointers, pointing to the previous and the next data respectively: this is a "doubly linked list". With this list you can easily traverse the data not only from front to back but also from back to front. Doubly linked lists have two disadvantages, however: the extra pointers increase the storage required, and more pointers must be updated when adding or deleting data.
  • 2. In a hash table, a hash function lets us quickly locate the target data in an array. If a hash collision occurs, a linked list is used for storage, so the table can flexibly handle whatever amount of data it holds. If the array is too small, collisions happen often and linear searches along the lists are needed more frequently; if the array is too large, many slots sit empty and memory is wasted. Setting an appropriate array size is therefore important.
  • 3. If a collision occurs while storing data, you can resolve it by inserting the new data into the linked list after the existing data. This method is called "chaining" (the chain address method); a minimal sketch follows this list. Besides chaining, there are several other ways to resolve collisions. Among them, "open addressing" is widely used: when a collision occurs, an alternative address (another location in the array) is immediately computed and the data is stored there. If that address also collides, the next alternative address is computed, and so on until a free address is found. Alternative addresses can be computed with methods such as multiple hash functions or "linear probing".
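
Below is a minimal Python sketch of a hash table that resolves collisions with chaining, as described in points 2 and 3. The class and method names (ChainedHashTable, put, get) are illustrative choices, not from the book.

```python
# A hash table resolving collisions by chaining: each array slot holds a
# linked list of entries whose keys hash to that slot.

class Node:
    def __init__(self, key, value, nxt=None):
        self.key, self.value, self.next = key, value, nxt

class ChainedHashTable:
    def __init__(self, size=8):
        self.buckets = [None] * size  # a small fixed array of chains

    def _index(self, key):
        # hash() maps the key to an integer; modulo picks an array slot
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        while node:                      # linear search along the chain
            if node.key == key:
                node.value = value       # overwrite an existing key
                return
            node = node.next
        # collision (or empty slot): add a new node to the chain
        self.buckets[i] = Node(key, value, self.buckets[i])

    def get(self, key):
        node = self.buckets[self._index(key)]
        while node:
            if node.key == key:
                return node.value
            node = node.next
        raise KeyError(key)

table = ChainedHashTable()
table.put("apple", 3)
table.put("melon", 5)
print(table.get("apple"))  # -> 3
```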

Descriptions of hash functions often mention the condition that the original value cannot be inferred from the hash value. However, this is only a requirement when hashing is used for security purposes such as storing passwords; it is not a mandatory rule for ordinary hash table use.

  • 4. An example of a heap. The numbers in the nodes are the stored data. Each node in a heap has at most two children, and the shape of the tree depends only on the number of data items. Nodes are filled in from top to bottom and, within each row, from left to right.
  • 5. When storing data in a heap, the rule is that children must be larger than their parents, so the minimum value is always stored at the root at the top. To keep this rule when adding data, the new data is placed in the leftmost free position of the bottom row; when the bottom row has no more space, a new row is started and the data is placed at its left end.
  • 6. The data at the top of the heap is always the smallest, so no matter how much data there is, the time complexity of extracting the minimum value is O(1).

In addition, after the minimum is extracted, the last data item is moved to the top and then moved down while being compared with its children, so the running time of extraction is proportional to the height of the tree. If the amount of data is n, the height of the tree is log₂ n by the shape of the heap, so rebuilding the tree has time complexity O(log n). The same goes for adding data: the new data is placed at the end of the heap and moved up while being compared with its parent until the heap condition is satisfied, so adding data also takes time proportional to the height of the tree, O(log n).
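
Here is a minimal Python sketch of these two heap operations: adding appends at the end and sifts the value up, and extracting the minimum moves the last value to the top and sifts it down. Both take O(log n). The function names are illustrative.

```python
# A min-heap stored in a plain Python list, matching the text above.

def heap_push(heap, value):
    heap.append(value)                 # add at the end of the bottom row
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:    # heap condition satisfied
            break
        heap[parent], heap[i] = heap[i], heap[parent]  # swap upward
        i = parent

def heap_pop(heap):
    top = heap[0]                      # the minimum is always at the root
    last = heap.pop()
    if heap:
        heap[0] = last                 # move the last element to the top
        i = 0
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            smallest = i
            if left < len(heap) and heap[left] < heap[smallest]:
                smallest = left
            if right < len(heap) and heap[right] < heap[smallest]:
                smallest = right
            if smallest == i:
                break
            heap[i], heap[smallest] = heap[smallest], heap[i]  # sift down
            i = smallest
    return top

h = []
for x in [6, 3, 8, 1]:
    heap_push(h, x)
print(heap_pop(h), heap_pop(h))  # -> 1 3
```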

  • 7. A heap is handy when you need to frequently extract the minimum value from the data you manage. Dijkstra's algorithm, for example, selects the candidate vertex closest to the starting point at each step, and a heap can be used for that selection.
  • 8. Binary search trees have two properties. First, each node's value is greater than the value of any node in its left subtree. Second, each node's value is less than the value of any node in its right subtree.

From these two properties the following conclusions can be drawn. First, the smallest node in a binary search tree is reached by starting at the root and following left children all the way down. Conversely, the largest node is reached by starting at the root and following right children all the way down.

  • 9. A binary search tree can be thought of as the tree-structured counterpart of the binary search algorithm. Because of the two properties above, when searching for data, or for the right place to insert new data, you simply compare the target with the data at each node, and the result of the comparison tells you which way to move.

The number of comparisons depends on the height of the tree. With n nodes and a balanced tree, comparing and moving happens at most log₂ n times, so the time complexity is O(log n). However, if the tree extends in only one direction, it becomes very tall and the time complexity degrades to O(n).
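
A minimal Python sketch of a binary search tree follows; the node and function names are illustrative. Each comparison decides whether to move left or right, so the cost tracks the height of the tree.

```python
# Insert and search in a binary search tree.

class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = self.right = None

def insert(root, value):
    if root is None:
        return BSTNode(value)
    if value < root.value:
        root.left = insert(root.left, value)    # smaller values go left
    else:
        root.right = insert(root.right, value)  # larger values go right
    return root

def search(root, value):
    while root is not None:
        if value == root.value:
            return True
        root = root.left if value < root.value else root.right
    return False

root = None
for v in [15, 9, 23, 3, 12]:
    root = insert(root, v)
print(search(root, 12), search(root, 7))  # -> True False
```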

  • 10. Many data structures extend the binary search tree, such as "self-balancing binary search trees". These correct an unbalanced tree and keep its shape balanced to maintain search efficiency.

In addition, although each node in the binary search tree introduced here has at most two children, the number of children can be extended to m (a preset constant). A tree whose number of children can be set freely and whose shape stays balanced is a B-tree.

Chapter 2 Sorting

  • 1. The time complexity of bubble sort is O(n²), selection sort O(n²), insertion sort O(n²), heap sort O(n log n), and merge sort O(n log n).
  • 2. Quicksort is a divide-and-conquer algorithm with an average running time of O(n log n); a sketch follows.
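
Below is a minimal Python sketch of quicksort with the simplest pivot choice (the first element); a production implementation would usually partition in place rather than build new lists.

```python
# Quicksort: pick a pivot, partition the rest into smaller and larger
# halves, and recurse on each half (divide and conquer).

def quicksort(data):
    if len(data) <= 1:
        return data
    pivot = data[0]
    smaller = [x for x in data[1:] if x < pivot]
    larger = [x for x in data[1:] if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([5, 2, 8, 1, 9, 3]))  # -> [1, 2, 3, 5, 8, 9]
```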

Chapter 4 Graph Search

  • 1. Depth-first search keeps going deeper down a single path. Although breadth-first search and depth-first search differ greatly in search order, their procedures differ in only one point: which candidate vertex is chosen as the next base of the search.

Breadth-first search selects the vertex that became a candidate earliest. Because vertices closer to the starting point become candidates earlier, the search proceeds outward in order from the area near the start. Depth-first search selects the vertex that became a candidate most recently, so it keeps following the newly discovered path downward.
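
A minimal Python sketch makes that single difference explicit: the same traversal becomes BFS or DFS depending only on whether the oldest or the newest candidate is taken next. The example graph is made up for illustration.

```python
from collections import deque

graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": [],
    "E": [],
}

def traverse(start, breadth_first):
    candidates = deque([start])
    visited = []
    while candidates:
        # the ONLY difference between the two searches is this choice:
        vertex = candidates.popleft() if breadth_first else candidates.pop()
        if vertex in visited:
            continue
        visited.append(vertex)
        candidates.extend(graph[vertex])
    return visited

print(traverse("A", breadth_first=True))   # -> ['A', 'B', 'C', 'D', 'E']
print(traverse("A", breadth_first=False))  # -> ['A', 'C', 'E', 'B', 'D']
```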

  • 2. If the number of vertices in the graph is n and the number of edges is m, what is the time complexity of the Bellman-Ford algorithm? The algorithm stops after n rounds of updates, and each round checks every edge once. One round of updates therefore takes O(m) time, and the overall time complexity is O(nm). (A sketch follows point 4 below.)
  • 3. If the sum of the edge weights in a cycle is negative, the total weight of a path keeps decreasing as long as the cycle is traversed again and again, which means no shortest path exists at all. So if vertices can still be updated in the n-th round, the algorithm can directly conclude that "there is no shortest path".
  • 4. Compared with the Bellman-Ford algorithm, which repeatedly recomputes and updates weights for all edges, Dijkstra's algorithm adds a step that selects a vertex, which makes it more efficient at finding shortest paths.

If the number of vertices in the graph is n and the number of edges is m, the time complexity of the algorithm is O(n²) without any special preparation. However, with an optimized data structure the time complexity becomes O(m + n log n).
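
Here is a minimal Python sketch of the Bellman-Ford algorithm from points 2 and 3, including the check that concludes "there is no shortest path" when updates still occur in the n-th round. The vertex numbering, function name, and example edge list are illustrative.

```python
INF = float("inf")

def bellman_ford(n, edges, start):
    """n vertices numbered 0..n-1; edges are (u, v, weight) tuples."""
    dist = [INF] * n
    dist[start] = 0
    for _ in range(n):                 # at most n rounds of updates
        updated = False
        for u, v, w in edges:          # each round checks every edge once
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:                # converged early: done
            return dist
    # still updating after n rounds: a negative cycle is reachable
    raise ValueError("no shortest path (negative cycle)")

edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
print(bellman_ford(4, edges, 0))  # -> [0, 3, 1, 4]
```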

  • 5. In general, when there are no negative weights, the highly efficient Dijkstra's algorithm is the better choice; when there are negative weights, the Bellman-Ford algorithm should be used even though it is slower, because it always produces the correct answer.
  • 6. The A* (A-star) algorithm is another algorithm for solving the shortest-path problem in a graph, developed from Dijkstra's algorithm.

Dijkstra's algorithm works out shortest paths vertex by vertex, starting from the vertices near the starting point. In other words, it also computes shortest paths for vertices that lie away from the destination, and that part of the work is wasted. A*, on the other hand, pre-estimates a value for each vertex and uses it to skip some of this useless computation.
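
For comparison, here is a minimal Python sketch of Dijkstra's algorithm using a heap to select the closest candidate vertex, the data-structure optimization mentioned above. The example graph is illustrative.

```python
import heapq

graph = {  # vertex -> list of (neighbor, non-negative weight)
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}

def dijkstra(start):
    dist = {start: 0}
    heap = [(0, start)]                 # candidates keyed by distance
    while heap:
        d, u = heapq.heappop(heap)      # closest candidate, O(log n)
        if d > dist.get(u, float("inf")):
            continue                    # stale entry, already improved
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

print(dijkstra("A"))  # -> {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```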

  • 7. If we can obtain some heuristic information about the approximate distance from each vertex to the destination (this distance need not be exact), we can use the A* algorithm. Of course, sometimes such information is impossible to estimate, and then A* cannot be used.

The closer the estimated distance is to the actual distance from the current vertex to the destination, the more efficient the A* search is. Conversely, if the estimate differs greatly from the actual value, the algorithm may be even less efficient than Dijkstra's, and if the gap is very large it may not even produce the correct answer. However, as long as the estimated distance is never larger than the actual distance, the correct answer is always obtained (although efficiency suffers if the estimate is set poorly).
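
Below is a minimal Python sketch of A* on a small grid, assuming Manhattan distance as the heuristic; the grid, function names, and coordinates are illustrative. Since Manhattan distance never exceeds the true remaining path length on such a grid, the estimate is never larger than the actual distance and the answer stays correct, as the text notes.

```python
import heapq

def a_star(grid, start, goal):
    """grid[r][c] == 1 means a wall; moves are up/down/left/right."""
    def h(pos):  # heuristic: estimated distance to the goal
        return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

    dist = {start: 0}
    heap = [(h(start), start)]          # candidates keyed by dist + estimate
    while heap:
        _, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return dist[(r, c)]
        for nr, nc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0
                    and dist[(r, c)] + 1 < dist.get((nr, nc), float("inf"))):
                dist[(nr, nc)] = dist[(r, c)] + 1
                heapq.heappush(heap, (dist[(nr, nc)] + h((nr, nc)), (nr, nc)))
    return None  # unreachable

grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(a_star(grid, (0, 0), (2, 0)))  # -> 6 (around the wall)
```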

Chapter 5 Security Algorithms

  • 1. Hash functions have six features (a short demonstration follows this list). (1) The length of the output hash value is fixed. (2) If the input data is the same, the output hash is the same. (3) Even if two inputs are similar, differing by only one bit, their output hashes can be completely different; similar inputs do not produce similar outputs. (4) Even two completely different inputs may produce the same hash value, although this is unlikely; this situation is called a "hash collision". (5) It is impossible to work backwards from the hash value to deduce the original data; this irreversibility is a major difference from encryption. (6) The hash is relatively easy to compute.
  • 2. Representative hash algorithms include MD5, SHA-1, and SHA-2. SHA-2 is widely used, but MD5 and SHA-1 are no longer recommended because of known security weaknesses. Different algorithms also compute hashes differently; SHA-1, for example, requires hundreds of addition and shift operations to generate a hash.
  • 3. Shared-key encryption is an encryption scheme in which encryption and decryption use the same key. Because the same key is used, it is also called "symmetric encryption".
  • 4. Shared-key encryption algorithms include the Caesar cipher, AES, DES, and one-time passwords, among which AES is the most widely used.
  • 5. Public-key encryption is an encryption scheme that uses different keys for encryption and decryption. Because the keys differ, it is also called "asymmetric encryption". The encryption key is called the public key, and the decryption key is called the private key.
  • 6. Public-key encryption algorithms include the RSA algorithm and elliptic-curve cryptography, among which RSA is the most widely used.
  • 7. With shared-key encryption, the number of keys required grows rapidly as the number of users increases. Because every pair of users needs its own key, n users need n(n−1)/2 keys: 2 people need only 1 key, but 5 people need 10, and 100 people need 100 × 99 / 2 = 4950.
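
A quick demonstration of features (1), (2), (3), and (6) from point 1, using SHA-256 from Python's standard hashlib (SHA-256 belongs to the SHA-2 family mentioned in point 2):

```python
import hashlib

a = hashlib.sha256(b"hello").hexdigest()
b = hashlib.sha256(b"hello").hexdigest()
c = hashlib.sha256(b"hellp").hexdigest()  # one character changed

print(len(a))   # 64 hex characters: the output length is always the same
print(a == b)   # True: same input, same hash
print(a == c)   # False: similar inputs, completely different hashes
print(a)
print(c)
```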

Chapter 6 Clustering

  • 1. In the k-means algorithm, as the operations are repeated, the positions of the center points are guaranteed to converge somewhere; this has been proved mathematically.
  • 2. Even if the number of clusters is the same, different randomly chosen initial positions for the center points can lead to different clustering results.

Therefore, we can run the k-means algorithm repeatedly with different randomly chosen initial center positions and then select the most appropriate clustering result; a sketch follows.
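
Here is a minimal Python sketch of k-means on 2-D points; the data, value of k, and function names are illustrative. Each run starts from different random centers, as discussed above.

```python
import random

def dist2(p, q):  # squared Euclidean distance
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, rounds=100):
    centers = random.sample(points, k)          # random initial centers
    for _ in range(rounds):
        # assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # update step: move each center to the mean of its cluster
        new_centers = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centers.append((sum(p[0] for p in cluster) / len(cluster),
                                    sum(p[1] for p in cluster) / len(cluster)))
            else:
                new_centers.append(centers[i])  # empty cluster: keep center
        if new_centers == centers:              # positions converged
            break
        centers = new_centers
    return centers, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, clusters = kmeans(points, k=2)
print(sorted(centers))  # two centers, one near each group of points
```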

  • 3. Besides the k-means algorithm there are many other clustering algorithms, among which "hierarchical clustering" is well known. Unlike k-means, hierarchical clustering does not require the number of clusters to be set in advance.

In hierarchical clustering, every data item initially forms its own cluster, so n data items form n clusters. The operation "merge the two nearest clusters into one" is then repeated n − 1 times; each execution reduces the number of clusters by 1, and after n − 1 executions all data is grouped into a single cluster. Since the number of clusters differs at each stage, the corresponding clustering results differ too; simply pick the one that makes the most sense.

  • 4. When merging clusters, the distance between clusters must be defined so that the two nearest clusters can be found. Depending on the definition, there are the "shortest distance" (single-linkage) method, the "longest distance" (complete-linkage) method, the "median distance" method, and other algorithms.
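
Below is a minimal Python sketch of hierarchical clustering using the shortest-distance (single-linkage) definition, merging until a requested number of clusters remains; the data and names are illustrative.

```python
def dist2(p, q):  # squared Euclidean distance
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def cluster_dist(a, b):
    # single linkage: distance between the closest pair of members
    return min(dist2(p, q) for p in a for q in b)

def hierarchical(points, target):
    clusters = [[p] for p in points]     # n points -> n clusters
    while len(clusters) > target:        # each merge removes one cluster
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)   # merge the two nearest clusters
    return clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
print(hierarchical(points, target=2))    # -> the two natural groups
```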

Excerpted from My First Algorithm Book by Moriteru Ishida and Shuichi Miyazaki (Japan), translated by Zhang Bei (Posts and Telecommunications Press).

Purchase links:

  • Jingdong – item.jd.com/12789121.ht…
  • Dangdang – product.dangdang.com/29190985.ht…