Definitions
In a broad sense
A data structure is a storage structure for a group of data.
An algorithm is a set of methods for manipulating data.
In a narrow sense
They refer to certain well-known data structures and algorithms, such as queues, stacks, heaps, binary search, dynamic programming, etc.
The relationship between data structures and algorithms
Data structures serve algorithms, and algorithms operate on specific data structures.
For example, the common binary search algorithm requires an array to store the data, because of the array's random-access property. If we chose a data structure like a linked list instead, binary search would not work, because linked lists do not support random access.
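To make this concrete, here is a minimal binary search sketch in Java-style code (an illustrative example, assuming an int array sorted in ascending order; the method name binarySearch is ours, not from the text above). Note how every step jumps directly to a[mid]: an O(1) random access on an array, but a full traversal on a linked list.

int binarySearch(int[] a, int x) {
  int low = 0;
  int high = a.length - 1;
  while (low <= high) {
    int mid = low + (high - low) / 2; // written this way to avoid integer overflow
    if (a[mid] == x) {
      return mid; // found x at index mid
    } else if (a[mid] < x) {
      low = mid + 1; // x can only be in the right half
    } else {
      high = mid - 1; // x can only be in the left half
    }
  }
  return -1; // x is not in the array
}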
Complexity analysis
Limitations of the post-hoc measurement approach (running the code and timing it)
- Test results are highly dependent on the test environment
Hardware differences in the test environment can have a significant impact on test results
- Test results are greatly influenced by the size of the data
For small data sorts, insertion sort may actually be faster than quicksort
Big O complexity notation
int cal(int n) {
  int sum = 0;
  int i = 1;
  int j = 1;
  for (; i <= n; ++i) {
    j = 1;
    for (; j <= n; ++j) {
      sum = sum + i * j;
    }
  }
  return sum;
}
Lines 2, 3, and 4 each require a unit_time execution time,
Lines 5 and 6 loop n times, requiring 2n * unit_time,
Lines 7 and 8 loop n² times, so they require 2n² * unit_time. So, the total execution time of the whole code is T(n) = (2n² + 2n + 3) * unit_time.
Formula: T(n) = O(f(n)), where f(n) denotes the total number of statement executions as a function of the data size n. The example above can thus be expressed as T(n) = O(2n² + 2n + 3).
Big O time complexity does not represent the actual execution time of the code; rather, it represents the trend of execution time as the data size grows. For this reason, it is also called asymptotic time complexity (time complexity for short).
When n is large, think 10,000 or 100,000. The low-order terms, constants, and coefficients in the formula do not affect the growth trend, so they can be ignored; we only need to record the highest order of magnitude. Using big O notation, the time complexities of the two pieces of code discussed above can be written as T(n) = O(n) and T(n) = O(n²).
Constant time
Even if this code loops 10,000 or 100,000 times, as long as the loop count is a fixed, known number that does not depend on n, the execution time is still constant.
Time complexity analysis
1. Focus only on the piece of code that loops the most
Ignore the constants, low-order terms, and coefficients in the formula, and record only the magnitude of the largest order.
When we analyze the time complexity of an algorithm or a piece of code, we only need to focus on the code that is executed the most times. The order (in n) of that core code's execution count is the time complexity of the whole piece of code being analyzed.
int cal(int n) {
  int sum = 0;
  int i = 1;
  for (; i <= n; ++i) {
    sum = sum + i;
  }
  return sum;
}
- Lines 2 and 3 have constant-level execution times, independent of the size of n, so they have no effect on the complexity.
- Lines 4 and 5 are the ones that loop the most, so focus on them. These two lines are executed n times, so the total time complexity is O(n).
2. Addition rule: The total complexity is equal to the complexity of the code with the highest order of magnitude
int cal(int n) {
  int sum_1 = 0;
  int p = 1;
  for (; p < 100; ++p) {
    sum_1 = sum_1 + p;
  }

  int sum_2 = 0;
  int q = 1;
  for (; q < n; ++q) {
    sum_2 = sum_2 + q;
  }

  int sum_3 = 0;
  int i = 1;
  int j = 1;
  for (; i <= n; ++i) {
    j = 1;
    for (; j <= n; ++j) {
      sum_3 = sum_3 + i * j;
    }
  }

  return sum_1 + sum_2 + sum_3;
}
The code is divided into three parts, which compute sum_1, sum_2, and sum_3 respectively.
We can analyze the time complexity of each part separately, then take the highest order of magnitude among them as the complexity of the whole code.
What is the time complexity of the first part? It loops 100 times, which is a constant number of executions independent of the size of n.
The time complexities of the second and third parts are O(n) and O(n²), respectively.
Of these three complexities, we take the highest order of magnitude. So, the time complexity of the whole code is O(n²).
3. Multiplication rule: The complexity of nested code equals the product of the complexities of the code inside and outside the nest
In code, we can think of the multiplication rule as nested loops.
int cal(int n) {
  int ret = 0;
  int i = 1;
  for (; i < n; ++i) {
    ret = ret + f(i);
  }
  return ret;
}

int f(int n) {
  int sum = 0;
  int i = 1;
  for (; i < n; ++i) {
    sum = sum + i;
  }
  return sum;
}
Assuming f() were just an ordinary constant-time operation, the time complexity of lines 4 to 6 would be T1(n) = O(n).
But f() itself is not a constant-time operation; its own time complexity is T2(n) = O(n).
So the time complexity of the whole cal() function is T(n) = T1(n) * T2(n) = O(n*n) = O(n²).
Analysis of several common time complexity examples
Time complexities can be roughly divided into two categories: polynomial and non-polynomial.
There are only two non-polynomial orders of magnitude: exponential O(2ⁿ) and factorial O(n!).
Algorithm problems whose time complexity is of non-polynomial order are (loosely) referred to as NP (Non-Deterministic Polynomial) problems.
When the data size n becomes larger and larger, the execution time of the non-polynomial algorithm increases sharply
Non-polynomial time complexity algorithms are actually very inefficient algorithms
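For intuition, a classic illustration (a standard example, not taken from the text above) is computing Fibonacci numbers by naive recursion; each call spawns two more calls, so the call tree, and with it the running time, grows on the order of O(2ⁿ):

int fib(int n) {
  if (n <= 1) return n; // base cases: fib(0) = 0, fib(1) = 1
  // each call recurses twice, so the number of calls roughly doubles as n grows
  return fib(n - 1) + fib(n - 2);
}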
Let’s look at some common polynomial time complexities.
1. O(1) Constant time complexity
As long as the execution time of a piece of code does not grow as n grows, we call its time complexity O(1).
As long as there are no loop statements or recursive statements in the algorithm, its time complexity is O(1), even if it has thousands of lines of code.
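For example, a fragment like the following (a minimal illustration) always executes the same three statements no matter how large n is, so its time complexity is O(1):

int i = 8;
int j = 6;
int sum = i + j;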
2. O(logn) and O(nlogn): logarithmic-order time complexity
int i = 1;
while (i <= n) {
  i = i * 2;
}
The value of the variable i starts at 1 and is multiplied by 2 on each pass through the loop; the loop stops once i becomes greater than n.
The values i takes form a geometric sequence: 1, 2, 4, 8, ..., 2ˣ.
Solving 2ˣ = n gives x = log₂n, so the time complexity of this code is O(log₂n).
Whether the base is 2, 3, or 10, we can write all logarithmic-order time complexities as O(logn). The reason is that logarithms of different bases differ only by a constant factor: for example, log₃n = log₃2 × log₂n, and big O notation ignores the constant coefficient log₃2, so O(log₃n) = O(log₂n) = O(logn).
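O(nlogn) then follows directly from the multiplication rule: if a piece of O(logn) code is executed n times, the whole thing is O(nlogn). A minimal sketch (the function name and the count variable are ours, for illustration only):

int cal(int n) {
  int count = 0;
  for (int i = 1; i <= n; ++i) { // outer loop runs n times
    int j = 1;
    while (j <= n) { // inner loop runs about log2(n) times
      j = j * 2;
      ++count;
    }
  }
  return count; // total work is on the order of n * log2(n)
}

Common sorting algorithms such as merge sort have O(nlogn) time complexity.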
3. O(m+n) and O(m*n): complexity determined by the sizes of two pieces of data
int cal(int m, int n) {
  int sum_1 = 0;
  int i = 1;
  for (; i < m; ++i) {
    sum_1 = sum_1 + i;
  }

  int sum_2 = 0;
  int j = 1;
  for (; j < n; ++j) {
    sum_2 = sum_2 + j;
  }

  return sum_1 + sum_2;
}
The time complexity of the code above is O(m+n): we cannot tell in advance whether m or n is larger, so neither term can be dropped by the addition rule.
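For O(m*n), the two loops are nested rather than sequential, so the multiplication rule applies instead of the addition rule. A minimal sketch (a hypothetical variant of the function above):

int cal2(int m, int n) {
  int sum = 0;
  for (int i = 1; i < m; ++i) { // outer loop runs about m times
    for (int j = 1; j < n; ++j) { // inner loop runs about n times per outer pass
      sum = sum + i * j;
    }
  }
  return sum; // about m * n additions in total, so O(m*n)
}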
Space complexity analysis (only the memory space related to n is counted)
The full name of space complexity is asymptotic space complexity; it represents the growth relationship between an algorithm's storage space and the data size.
Common space complexities are O(1), O(n), and O(n²); logarithmic complexities such as O(logn) and O(nlogn) are not usually encountered.
void print(int n) {
  int i = 0;
  int[] a = new int[n];
  for (; i < n; ++i) {
    a[i] = i * i;
  }
  for (i = n - 1; i >= 0; --i) {
    System.out.println(a[i]);
  }
}
In line 2, we allocate space to store the variable i, but it is of constant order and unrelated to the data size n, so it can be ignored.
Line 3 allocates an int array of size n; apart from that, the rest of the code does not take up any more space, so the space complexity of the whole code is O(n).
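By the same reasoning, if the code allocated a two-dimensional n-by-n array, the space complexity would be O(n²). A hypothetical sketch (printSquare is our name, for illustration only):

void printSquare(int n) {
  int[][] matrix = new int[n][n]; // n * n ints, so the space complexity is O(n^2)
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) {
      matrix[i][j] = i * j;
    }
  }
}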
Conclusion
Complexity, also called asymptotic complexity, includes time complexity and space complexity. It is used to analyze the growth relationship between an algorithm's execution efficiency and the data size, and can be roughly stated as: the higher the order of an algorithm's complexity, the lower its execution efficiency.
Analysis of best, worst, average and amortized time complexity
Best-case and worst-case time complexity
// n indicates the length of the array
int find(int[] array, int n, int x) {
  int i = 0;
  int pos = -1;
  for (; i < n; ++i) {
    if (array[i] == x) pos = i;
  }
  return pos;
}
The time complexity of the code above is always O(n), where n is the length of the array, because the loop scans the entire array even after x has been found.
// n indicates the length of the array
int find(int[] array, int n, int x) {
  int i = 0;
  int pos = -1;
  for (; i < n; ++i) {
    if (array[i] == x) {
      pos = i;
      break; // the newly added break
    }
  }
  return pos;
}
If the first element of the array happens to be the x we are looking for, we do not need to traverse the remaining n-1 elements, and the time complexity is O(1).
But if the array does not contain x at all, we need to traverse the entire array, and the time complexity is O(n). So the time complexity of this code differs from case to case.
The best-case time complexity is, in the best case, the time complexity of executing this code
The worst-case time complexity is, in the worst case, the time complexity of executing this code
Average case time complexity
// n indicates the length of the array
int find(int[] array, int n, int x) {
  int i = 0;
  int pos = -1;
  for (; i < n; ++i) {
    if (array[i] == x) {
      pos = i;
      break; // the newly added break
    }
  }
  return pos;
}
Let's say the probability that x is in the array and the probability that it is not are each 1/2.
In addition, if x is in the array, the probability that it appears at each of the n positions from 0 to n-1 is the same, 1/n.
So, by the multiplication rule for probabilities, the probability that x is at any particular position from 0 to n-1 is 1/(2n).
Weighting the number of elements examined in each case by its probability, the expected number of examined elements is:
1 × 1/(2n) + 2 × 1/(2n) + 3 × 1/(2n) + ... + n × 1/(2n) + n × 1/2 = (3n+1)/4
This is a weighted average in the sense of probability theory, also known as the expected value, so the full name for average time complexity should be weighted average time complexity or expected time complexity.
In big O notation, after dropping the coefficient and constant, the weighted average time complexity of this code is still O(n).
In most cases, we don't need to distinguish between best-case, worst-case, and average-case time complexity.
We use these three notions only when the time complexity of the same piece of code differs by orders of magnitude in different cases.
Amortized time complexity
The corresponding analysis method is called amortized analysis, and the time complexity obtained from it is called amortized time complexity.
// array represents an array of length n
// array.length is equal to n
int[] array = new int[n];
int count = 0;

void insert(int val) {
  if (count == array.length) {
    int sum = 0;
    for (int i = 0; i < array.length; ++i) {
      sum = sum + array[i];
    }
    array[0] = sum;
    count = 1;
  }
  array[count] = val;
  ++count;
}
This code implements a function to insert data into an array.
When the array is full (count == array.length), we first sum the elements with the for loop, clear the array by placing the sum at its first position, and only then insert the new data. If the array has free space to begin with, the data is inserted directly.
In the best case, the array has free space, and we simply insert the data at index count, so the best-case time complexity is O(1).
In the worst case, the array has no free space, so we need to sum and clear the array first and then insert the data, so the worst-case time complexity is O(n).
The average time complexity is O(1): assuming the length of the array is n, there are n cases depending on where the data is inserted, and the time complexity of each is O(1). In addition, there is the "extra" case of inserting when the array has no free space, which takes O(n) time. All n+1 cases occur with the same probability, 1/(n+1). So, by the weighted-average method, the average time complexity is:
1 × 1/(n+1) + 1 × 1/(n+1) + ... + 1 × 1/(n+1) + n × 1/(n+1) = 2n/(n+1), which is O(1).
Comparing the two kinds of insertion: the O(n) insertion is the worst case (sum, clear, then insert), and the O(1) insertion is the best case (direct insert).
Each O(n) insertion is followed by n-1 O(1) insertions. So, if we amortize the cost of the time-consuming operation over the following n-1 cheap operations, the amortized time complexity of this group of consecutive operations is O(1). This is the general idea of amortized analysis.
Wherever amortized analysis can be applied, the amortized time complexity generally equals the best-case time complexity.
// Global variables: an array of size 10, its length len, and the next free subscript i.
int array[] = new int[10];
int len = 10;
int i = 0;

// Add an element to the array
void add(int element) {
  if (i >= len) { // the array space is used up
    // request a new array with twice the space
    int new_array[] = new int[len * 2];
    // copy the data from the original array into new_array
    for (int j = 0; j < len; ++j) {
      new_array[j] = array[j];
    }
    // point array at new_array; the array is now twice as large, so update len
    array = new_array;
    len = 2 * len;
  }
  // place element at subscript i, then advance i by one
  array[i] = element;
  ++i;
}
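Applying the amortized analysis above to add(): each O(n) expansion (copying len elements) makes room for the next len O(1) insertions, so spreading the copying cost over those insertions gives add() an amortized time complexity of O(1).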
PS: These notes are based on the course "The Beauty of Data Structures and Algorithms" by teacher Wang Zheng.