preface

Program = data structure + algorithm, good algorithm can make the program run more efficiently; In today’s era of data and information, data analysis and data processing are inevitable, and algorithms have become the threshold requirements of many companies, especially large factories;

Hurry up, maybe it is only one step away from the big factory (algorithm)~~~

Introduction of algorithm

An algorithm is a set of instructions for completing a task, and any snippet of code can be considered an algorithm. As follows:

1. Five features of the algorithm

  • Finitude: An algorithm must end after a finite number of steps, each of which can be completed in a finite amount of time. Popular understanding is that there can be no such thing as an infinite loop, resulting in the algorithm can not end.
  • Deterministic: Each instruction in the algorithm must have an exact meaning. For the same input, only the same output can be produced.
  • Feasibility: Each step described in the algorithm can be implemented a limited number of times by performing the basic operation already implemented.
  • Input: An algorithm has zero or more inputs. Just like a method, you can either pass no arguments, or you can pass arguments. Zero input means that the algorithm itself has initial conditions.
  • Outputs: An algorithm has one or more outputs that correspond to the inputs. An algorithm without output is meaningless.

A good algorithm should also have the following characteristics:

  • Correctness: Can solve the problem correctly, the result is correct;
  • Readability: algorithm implementation steps are easy to understand;
  • Robustness: The algorithm can handle exceptions; For example, when the input is illegal, the algorithm can give corresponding processing;
  • High efficiency, low storage: low time complexity, low space complexity; That is, it runs fast and takes up less memory.

2. Criteria for measuring the quality of an algorithm

To measure the quality of an algorithm, it can be judged from two dimensions:

  • Time complexity: estimate the time cost of the algorithm in advance; The amount of data, hardware configuration, program language and other factors will directly affect the execution time of an algorithm. For example, an algorithm with a small amount of data will be faster than an algorithm with a large amount of hardware configuration. Therefore, the specific time after the algorithm is executed cannot be used to measure the performance of an algorithm. An algorithm whose level of time overhead can be estimated (independent of other external conditions) is usually represented by the big O notation. Here is an example: The above method, for ease of understanding, assumes that each step takes 1ms. When n=1000 is passed in, each step takes 1ms. ② Because each cycle needs to be judged, 1001 times are needed; Use 1001 ms; ③ and ④ need to be executed 1000 times respectively in the circulation body, consuming 2000ms in total; So the total time is: T(1000)=1+1001+2*1000; The specific time is related to the incoming N, so the total time is: T(n)=1+(n+1)+2n; Here T stands for time, and usually we don’t talk about time complexity in terms of units. In order to be more concise and intuitive, the big O notation will be used, removing the constant part and the coefficient part, as follows: T(n)=1+(n+1)+2n=O(n); Because when n is large enough, the coefficient and constant have little effect on the measure of the algorithm; I won’t go into details here;
  • Space complexity: the order of magnitude of memory overhead after the algorithm is executed is estimated in advance. Space complexity and time complexity is similar, also can use the big O, said just this represents the memory consumed by a algorithm, such as int four bytes, pictured above, used in the intermediate variable nResult, without considering other cases, the capacity of 4 bytes consumed, with big O notation, remained and constant coefficient, O(1) for constants;

For time complexity and space complexity, the smaller the corresponding quantity level is, the more efficient the algorithm is. The order from good to bad is often encountered as follows:

O(1)<O(log2 n)<O(n)<O(nlog2 n)<O(n2)<O(n3)<O(2n)<O(n!) <O(nn)

3. Stability of the algorithm

If there are two equal elements A and B in the data to be sorted, A is ahead of B before sorting, and if A is still ahead of B after A certain sorting algorithm is used, the sorting algorithm is said to be stable, otherwise it is unstable. But stability can not be used to measure the quality of an algorithm, only a property of the algorithm, for some scenarios, do not care about the order of two equal elements.

Start with sort

Sort in the actual development of the use of more, let’s start with this; Sort is divided into internal sort and external sort:

  • Internal sort: during sort all elements are stored in memory for sorting; Common insert sort, swap sort, and selection sort are all internal sorts.
  • External sort: Elements cannot all be stored in memory during sort and must be constantly moved between internal and external memory during sort.

1. Let’s talk about direct insert sort

1.1 Algorithm idea

Insert sort is to insert the data to be sorted into a previously sorted subsequence, initially consider the first element is the sorted sequence, compare, and then insert to the appropriate position, until the sorting is complete.

The key to insert sort is as follows:

  • The data to be sorted is divided into three parts: the sorted data, the data to be inserted next, and the data to be sorted.
  • Each time, a data to be inserted from the data to be sorted is taken out and placed in the sentry position;
  • Compare the sentry position data (in fact, the data to be inserted) with the sorted data. If the conditions are met, the data will be inserted to the corresponding position, and other data will be uniformly shifted backward.

1.2 Algorithm implementation and analysis

The algorithm code is as follows (in ascending order) :

The result is as follows:

Parse the sort step process as follows:

Step description:

The green box represents the sorted list, the arrow is the next element to be inserted, and the yellow box represents the remaining unordered elements. The yellow squares represent the data moved each time, and the green squares represent the space vacated by the last ordered list.

  • Copy the original data array to arrayB in the new array. The main purpose of this step is not to declare additional temporary variables later, but also to achieve simple logic for the subsequent core code and reduce excessive judgment.
  • Step 1 takes the first element as an ordered list (the first element is 2), the next element to be inserted is 5, and places 5 in the sentinel position, where the index is 0; It then iterates through the elements in the ordered list, comparing them to the sentinel’s value 5. There are only 2 and 5 comparisons, 2 is less than 5, so there is no need to change the position;
  • In step 2, there are 2 and 5 elements in the ordered list. The next element to be inserted is 6. Place 6 in the sentry position, where the index is 0. The elements in the ordered list (2 and 5) are all less than 6 compared to the sentry value 6, so there is no need to change position;
  • In step 3, the elements in the ordered list are 2, 5, and 6. The next element to be inserted is 1. Place 1 in the sentry position, that is, the position with index 0. The elements in the ordered list (2, 5, 6) are then iterated, compared to the sentry value 1; In step 3-1, since it is traversal in reverse order, 6 in the ordered list is compared with 1 first. 6 is greater than 1, so 6 moves back one bit. In steps 3 to 2, we continue iterating, comparing 5 to 1 in the ordered list, 5 is greater than 1, so 5 moves back one bit; At steps 3-3, we continue iterating, comparing 2 to 1 in the ordered list, 2 is greater than 1, so 2 moves back one bit; In steps 3-4, after traversing the elements in the ordered list, the elements to be inserted are equal to those in the sentry position, and the traversal is terminated. Then assign the element of the sentinel position (currently sentinel position 1) to the free space (index bit 1);
  • In step 4, the elements in the ordered list are 1, 2, 5, and 6. The next element to be inserted is 9. Put 9 in the sentry position, that is, the position with index 0. Then the elements in the ordered list (1, 2, 5, 6) are all smaller than the value of 9 of the sentinel position, so there is no need to insert, the position remains the same;
  • In step 5, the elements in the ordered list are 1, 2, 5, 6, and 9. The next element to be inserted is 3. Place 3 in the sentry position, that is, the position with index 0. The elements in the ordered list (1, 2, 5, 6, 9) are then iterated, compared to the sentry value 3; In step 5-1, since it is traversal in reverse order, 9 in the ordered list is first compared with 3. 9 is greater than 3, so 9 moves back one bit. In steps 5-2, continue iterating, comparing 6 to 3 in the ordered list, 6 is greater than 3, so 6 moves back one bit; At steps 5-3, continue iterating, comparing 5 to 3 in the ordered list. 5 is greater than 3, so 5 moves back one bit; Step 5-4, continue traversal, compare 2 and 3 in the ordered list, 2 is less than 3, terminate traversal; Then assign the element of the sentinel position (currently sentinel position 3) to the free space (index position 3);

After the completion of step 5, the sorting of the unordered elements in the yellow box has been completed, and the sorting is complete; The final result is one, two, three, five, six, nine.

This comparison with the diagram to see the details, is not a lot easier to understand.

If you do not understand the above code, you can use the method of defining temporary variables as sentinels. The steps are basically the same as above, but the sentinels are different, as follows:

1.3 Algorithm Analysis

Mainly from the time complexity, space complexity, whether the stability of the analysis:

Time complexity

Time complexity is analyzed from the best, average and worst cases;

Best time complexity: the incoming data is ordered (consistent with the final result), so every time traversal, can find the location, so the best time complexity of insertion sort is O(n), related to the number of elements passed;

Worst time complexity: The incoming data is completely opposite to the desired result, so two cycles are needed each time to find the right place to insert, so the worst time complexity is O(n2);

The average time complexity is O(n2);

Spatial complexity

In the core part of the algorithm, only a few fixed intermediate variables (I,j, arrayB [0]) are used, so the memory consumed in the algorithm is a constant, and the space complexity is O(1).

The stability of

In the algorithm process, the less than symbol is used for comparison, and the judgment is terminated when the data is equal, so the original data order will not be affected, so the direct insertion sort is stable.

To sum up, insertion sort is a stable algorithm whose time complexity is O(n2) and space complexity is O(1).

conclusion

The first part reviewed the knowledge about algorithms, and then ended with a simple direct insertion sort. After that, other algorithms will be summarized in turn, or the way of illustration and explanation, to make each algorithm easier to learn.

Thanks for your likes, favorites and comments. Continue next time ~~~