What exactly is the Big O
- n denotes the data size.
- O(f(n)) means the number of instructions the algorithm executes is proportional to f(n), where f is a function of n.
- Constant coefficients (a, b, c, d, ...) don't matter; what matters is the order of growth.
Algorithm A: O(n), instructions required: 10000n. Algorithm B: O(n^2), instructions required: 10n^2.
- As n grows, the instruction counts of algorithms A and B diverge.
- Once n exceeds a critical point, B's count exceeds A's and keeps pulling away — a difference in order of magnitude.
- A higher-complexity algorithm can still have the advantage when the data size is small.
- This is why advanced sorting algorithms switch to insertion sort once the subproblem is small enough — a detail optimization worth roughly 10%–15%.
Different time complexities grow at very different rates as the data size increases.
convention
- In academia, strictly speaking, O(f(n)) denotes an upper bound on the algorithm's running time.
- By that definition, merge sort is O(nlogn), and it is also O(n^2), since
- c*nlogn < a*n^2 for large enough n.
- In industry, we use O to mean the tightest (lowest) upper bound on the running time.
- So we don't usually say merge sort is O(n^2).
example
- The dominant term wins:
O( nlogn + n ) = O( nlogn )
O( nlogn + n^2 ) = O( n^2 )
- These simplifications require both terms to depend on the same n. With two independent sizes, as in O(AlogA + B) or O(AlogA + B^2),
- the lower-order term cannot be dropped.
- Traversing a graph implemented with adjacency lists:
- Time complexity: O(V + E), where V is the number of vertices and E the number of edges.
- For a dense graph — in the extreme, a complete graph — E approaches V^2, so neither term can be dropped.
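A minimal sketch of why the traversal is O(V + E), assuming the graph is stored as a vector of adjacency lists (the representation and function name here are illustrative, not from the original):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
using namespace std;

// Traversing every adjacency list touches each vertex once (the outer
// loop, O(V)) and each edge entry once (all inner iterations combined,
// O(E)), hence O(V + E) in total.
size_t countEdgeEntries(const vector<vector<int>>& adj){
    size_t entries = 0;
    for(size_t v = 0; v < adj.size(); v++)        // one pass over V vertices
        for(size_t k = 0; k < adj[v].size(); k++) // each edge entry visited once
            entries++;
    return entries;
}
```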
A time complexity problem
You have an array of strings. First sort the characters of each string alphabetically; then sort the whole array lexicographically. What is the time complexity of the entire operation?
- Common wrong answer: sorting each string costs n*nlogn; sorting the whole array costs nlogn.
- The error: the string length is confused with the array length.
Let s be the length of the longest string and n the number of strings in the array. Sorting one string: O(slogs). Sorting every string in the array alphabetically: O(n*slogs).
Sorting the entire string array lexicographically: O(s*nlogn).
- Explanation — what the nlogn of a sorting algorithm really counts:
- nlogn is the number of comparisons. Sorting an integer array takes nlogn comparisons, and comparing two integers is O(1).
- Comparing two strings, however, is O(s).
O(n*slogs) + O(s*nlogn) = O( nslogs + snlogn ) = O( ns(logs + logn) )
- Lexicographic sorting of the array makes nlogn comparisons, and each comparison takes O(s) time.
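The two phases can be sketched with the standard library sort (a sketch, not the original's code; `sortStringArray` is a hypothetical helper name):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>
using namespace std;

// Phase 1: sort the characters of each string, O(s log s) per string,
//          O(n*s*log s) for all n strings.
// Phase 2: sort the array lexicographically: n log n comparisons at
//          O(s) each, i.e. O(s*n*log n).
// Total: O(n*s*(log s + log n)).
void sortStringArray(vector<string>& arr){
    for(string& str : arr)
        sort(str.begin(), str.end());
    sort(arr.begin(), arr.end());
}
```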
Algorithm complexity is, in some cases, input-dependent
Insertion sort: O(n^2)
- Worst case: O(n^2)
- Best case: O(n), on nearly sorted input
- Average case: O(n^2)
Quicksort: O(nlogn)
- Worst case: O(n^2), with a non-randomized pivot on already sorted input
- Best case: O(nlogn), with a randomized pivot
- Average case: O(nlogn)
A rigorous analysis distinguishes best, worst, and average cases. In practice we usually focus on the common case, but you should know the extremes.
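A minimal insertion-sort sketch illustrating why the best case is O(n): on nearly sorted input the inner loop exits almost immediately (a hypothetical implementation, not code from the original):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
using namespace std;

// Insertion sort: O(n^2) worst/average case. When the input is nearly
// sorted, the inner while loop barely runs, giving the O(n) best case.
void insertionSort(vector<int>& arr){
    for(size_t i = 1; i < arr.size(); i++){
        int e = arr[i];
        size_t j = i;
        while(j > 0 && arr[j-1] > e){  // shift larger elements right
            arr[j] = arr[j-1];
            j--;
        }
        arr[j] = e;
    }
}
```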
The concept of data size
Run a selection sort on 10^5 elements — why does the computer seem to freeze?
#include <iostream>
#include <cmath>
#include <ctime>
using namespace std;
int main() {
// The data size is increased 10 times each time to test
// If you are interested, you can also try to increase the data size by 2 times each time
for( int x = 1 ; x <= 9 ; x ++ ){
int n = pow(10, x);
clock_t startTime = clock();
long long sum = 0;
for( int i = 0 ; i < n ; i ++ )
sum += i;
clock_t endTime = clock();
cout << "sum = " << sum << endl;
cout << "10 ^" << x << ":"
<< double(endTime - startTime)/CLOCKS_PER_SEC
<< " s" << endl << endl;
}
return 0;
}
The results
sum = 45
10^1 : 2e-06 s

sum = 4950
10^2 : 1e-06 s

sum = 499500
10^3 : 4e-06 s

sum = 49995000
10^4 : 2.9e-05 s

sum = 4999950000
10^5 : 0.000305 s

sum = 499999500000
10^6 : 0.003049 s

sum = 49999995000000
10^7 : 0.029234 s

sum = 4999999950000000
10^8 : 0.308056 s

sum = 499999999500000000
10^9 : 2.98528 s
If you want to solve the problem in 1s
- The O(n^2) algorithm can handle about 10^4 levels of data;
- The O(n) algorithm can handle data of about 10^8 levels;
- O(nlogn) algorithms can handle data at about 10^7 levels
Because the work here was trivial (a single addition per iteration), these estimates are optimistic; in practice, be conservative and divide them by about 10.
Spatial complexity
- Allocating an auxiliary array: O(n)
- Allocating an auxiliary two-dimensional array: O(n^2)
- Only constant extra space: O(1), e.g. sorting an array in place
- Recursive calls have a space cost:
- each call's stack frame is pushed onto the system stack, so the extra space is proportional to the recursion depth.
Common complexity analysis
O(1): the work does not depend on the data size
// O(1)
void swapTwoInts( int &a , int &b ){
int temp = a;
a = b;
b = temp;
return;
}
O(n): the loop runs c*n times, where c is a constant (not necessarily greater than 1)
// O(n) Time Complexity
int sum( int n ){
int ret = 0;
for( int i = 0 ; i <= n ; i ++ )
ret += i;
return ret;
}
O(n), even though the loop runs only n/2 times
String reversal ("abc" -> "cba"): swap the first and last characters, the second and second-to-last, and so on. We scan half the string and do n/2 swaps: O(n).
void reverse( string &s ){
int n = s.size();
for( int i = 0 ; i < n/2 ; i ++ )
swap( s[i] , s[n-1-i] );
return;
}
O(n^2): selection sort. A double loop — the outer index i runs up to n, the inner index j runs up to n, both incrementing by 1 — so the number of instructions executed is proportional to n^2.
When i = 0, the inner loop runs n-1 times; summing the arithmetic series:
(n-1) + (n-2) + (n-3) + ... + 0
= (0 + n-1) * n / 2
= (1/2)n(n-1)
= (1/2)n^2 - (1/2)n
= O(n^2)
// O(n^2) Time Complexity
void selectionSort(int arr[], int n){
for(int i = 0 ; i < n ; i ++){
int minIndex = i;
for( int j = i + 1 ; j < n ; j ++ )
if( arr[j] < arr[minIndex] )
minIndex = j;
swap( arr[i] , arr[minIndex] );
}
}
30n basic operations: O(n), because the inner loop is a fixed 30 iterations, independent of n.
// O(n) Time Complexity
void printInformation(int n){
for( int i = 1 ; i <= n ; i ++ )
for( int j = 1 ; j <= 30 ; j ++ )
cout<<"Class "<<i<<" - "<<"No. "<<j<<endl;
return;
}
O(logn): look at the middle element of a sorted array and compare it with the target; if it doesn't match, discard half of the array and repeat.
Binary search method
// O(logn) Time Complexity
int binarySearch(int arr[], int n, int target){
int l = 0, r = n-1;
while( l <= r ){
int mid = l + (r-l)/2;
if( arr[mid] == target ) return mid;
if( arr[mid] > target ) r = mid - 1;
else l = mid + 1;
}
return -1;
}
O(logn)
How many times can you divide n by 10 before it reaches 0? log10(n) times, and log10(n) = O(logn), since we keep dividing by 10 until we hit 0. The reverse(s) call costs len/2 swaps, where len is the number of digits in the string — consistent with the O(logn) bound.
string intToString( int num ){
string s = "";
string sign = "+";
if( num < 0 ){
num = -num;
sign = "-";
}
while( num ){
s += '0' + num%10;
num /= 10;
}
if( s == "" )
s = "0";
reverse(s);
if( sign == "-" )
return sign + s;
else
return s;
}
O(nlogn)
The inner loop runs n times; the outer loop doubles sz each iteration (sz += sz), so it runs log2(n) times.
// O(nlogn)
void hello(int n){
    for( int sz = 1 ; sz < n ; sz += sz )
        for( int i = 1 ; i < n ; i ++ )
            cout<<"Hello, Algorithm!"<<endl;
}
O(sqrt(n)): x runs from 2 up to sqrt(n).
// O(sqrt(n)) Time Complexity
bool isPrime( int num ){
for( int x = 2 ; x*x <= num ; x ++ )
if( num%x == 0 )
return false;
return true;
}
bool isPrime2( int num ){
if( num <= 1 ) return false;
if( num == 2 ) return true;
if( num%2 == 0 ) return false;
for( int x = 3 ; x*x <= num ; x += 2 )
if( num%x == 0 )
return false;
return true;
}
Complexity experiments
We thought we wrote an O(nlogn) algorithm — what if it is actually O(n^2)? Two ways to check:
1. Estimate from the data size. If you want to solve the problem within 1s:
- The O(n^2) algorithm can handle data of about 10^4 levels;
- The O(n) algorithm can handle data of about 10^8 levels;
- The O(nlogn) algorithm can handle data of about 10^7 levels.
The constant factors in front can differ a lot, so treat these as rough guides.
2. Experiment and observe the trend:
double the data size each time and watch how the running time changes.
Below are four algorithms of different complexity:
namespace MyAlgorithmTester{

    // O(logN)
    int binarySearch(int arr[], int n, int target){
        int l = 0, r = n-1;
        while( l <= r ){
            int mid = l + (r-l)/2;
            if( arr[mid] == target ) return mid;
            if( arr[mid] > target ) r = mid - 1;
            else l = mid + 1;
        }
        return -1;
    }

    // O(N)
    int findMax( int arr[], int n ){
        assert( n > 0 );
        int res = arr[0];
        for( int i = 1 ; i < n ; i ++ )
            if( arr[i] > res ) res = arr[i];
        return res;
    }

    // O(NlogN), bottom-up merge sort
    void __merge(int arr[], int l, int mid, int r, int aux[]){
        for( int i = l ; i <= r ; i ++ )
            aux[i] = arr[i];
        int i = l, j = mid+1;
        for( int k = l ; k <= r ; k ++ ){
            if( i > mid ) { arr[k] = aux[j]; j ++; }
            else if( j > r ){ arr[k] = aux[i]; i ++; }
            else if( aux[i] < aux[j] ){ arr[k] = aux[i]; i ++; }
            else { arr[k] = aux[j]; j ++; }
        }
    }

    void mergeSort( int arr[], int n ){
        int *aux = new int[n];
        for( int i = 0 ; i < n ; i ++ )
            aux[i] = arr[i];
        for( int sz = 1 ; sz < n ; sz += sz )
            for( int i = 0 ; i < n ; i += sz+sz )
                __merge(arr, i, i+sz-1, min(i+sz+sz-1, n-1), aux);
        delete[] aux;
        return;
    }

    // O(N^2) selection sort
    void selectionSort(int arr[], int n){
        for( int i = 0 ; i < n ; i ++ ){
            int minIndex = i;
            for( int j = i + 1 ; j < n ; j ++ )
                if( arr[j] < arr[minIndex] )
                    minIndex = j;
            swap( arr[i] , arr[minIndex] );
        }
        return;
    }
}
Code to generate the test case:
namespace MyUtil {

    int *generateRandomArray(int n, int rangeL, int rangeR) {
        assert( n > 0 && rangeL <= rangeR );
        int *arr = new int[n];
        srand(time(NULL));
        for (int i = 0; i < n; i++)
            arr[i] = rand() % (rangeR - rangeL + 1) + rangeL;
        return arr;
    }

    int *generateOrderedArray(int n) {
        assert( n > 0 );
        int *arr = new int[n];
        for (int i = 0; i < n; i++)
            arr[i] = i;
        return arr;
    }
}
Testing the O(N) algorithm:
int main() {
    // Test findMax, O(n)
    cout<<"Test for findMax:"<<endl;
    for( int i = 10 ; i <= 26 ; i ++ ){
        int n = pow(2,i);
        int *arr = MyUtil::generateRandomArray(n, 0, 100000000);
        clock_t startTime = clock();
        MyAlgorithmTester::findMax(arr, n);
        clock_t endTime = clock();
        cout<<"data size 2^"<<i<<" = "<<n<<"\t";
        cout<<"Time cost: "<<double(endTime - startTime)/CLOCKS_PER_SEC<<endl;
        delete[] arr;
    }
    return 0;
}
Running results:
Test for findMax:
data size 2^10 = 1024        Time cost: 5e-06 s
data size 2^11 = 2048        Time cost: 7e-06 s
data size 2^12 = 4096        Time cost: 1.2e-05 s
data size 2^13 = 8192        Time cost: 2.5e-05 s
data size 2^14 = 16384       Time cost: 2.7e-05 s
data size 2^15 = 32768       Time cost: 9.2e-05 s
data size 2^16 = 65536       Time cost: 0.000169 s
data size 2^17 = 131072      Time cost: 0.000431 s
data size 2^18 = 262144      Time cost: 0.003037 s
data size 2^19 = 524288      Time cost: 0.001325 s
data size 2^20 = 1048576     Time cost: 0.002489 s
data size 2^21 = 2097152     Time cost: 0.005739 s
data size 2^22 = 4194304     Time cost: 0.011373 s
data size 2^23 = 8388608     Time cost: 0.019566 s
data size 2^24 = 16777216    Time cost: 0.040289 s
data size 2^25 = 33554432    Time cost: 0.095169 s
data size 2^26 = 67108864    Time cost: 0.201682 s
data size 2^27 = 134217728   Time cost: 0.330673 s
data size 2^28 = 268435456   Time cost: 0.750136 s
Each time n doubles, the time roughly doubles as well, so the algorithm is O(n).
Testing O(n^2):
int main() {
    // Test selectionSort, O(n^2)
    cout<<"Test for selectionSort:"<<endl;
    for( int i = 10 ; i <= 15 ; i ++ ){
        int n = pow(2,i);
        int *arr = MyUtil::generateRandomArray(n, 0, 100000000);
        clock_t startTime = clock();
        MyAlgorithmTester::selectionSort(arr,n);
        clock_t endTime = clock();
        cout<<"data size 2^"<<i<<" = "<<n<<"\t";
        cout<<"Time cost: "<<double(endTime - startTime)/CLOCKS_PER_SEC<<endl;
        delete[] arr;
    }
    return 0;
}
Run result: the time grows roughly 4x at each step.
Test for Selection Sort:
data size 2^10 = 1024     Time cost: 0.001581 s
data size 2^11 = 2048     Time cost: 0.006221 s
data size 2^12 = 4096     Time cost: 0.02139 s
data size 2^13 = 8192     Time cost: 0.081103 s
data size 2^14 = 16384    Time cost: 0.323263 s
data size 2^15 = 32768    Time cost: 1.32474 s
data size 2^16 = 65536    Time cost: 5.19642 s
When the data size n doubles, the time roughly quadruples — consistent with O(n^2).
Testing O(logN):
int main() {
    // Test binarySearch, O(logn)
    cout<<"Test for binarySearch:"<<endl;
    for( int i = 10 ; i <= 28 ; i ++ ){
        int n = pow(2,i);
        int *arr = MyUtil::generateOrderedArray(n);
        clock_t startTime = clock();
        MyAlgorithmTester::binarySearch(arr,n,0);
        clock_t endTime = clock();
        cout<<"data size 2^"<<i<<" = "<<n<<"\t";
        cout<<"Time cost: "<<double(endTime - startTime)/CLOCKS_PER_SEC<<endl;
        delete[] arr;
    }
    return 0;
}
Complexity check:
log(2N) / logN
= (log2 + logN) / logN
= 1 + log2/logN
When the data size doubles, the running time grows only by the vanishing additive term log2/logN — barely measurable.
Running results:
Test for Binary Search:
data size 2^10 = 1024 Time cost: 1e-06 s
data size 2^11 = 2048 Time cost: 0 s
data size 2^12 = 4096 Time cost: 0 s
data size 2^13 = 8192 Time cost: 2e-06 s
data size 2^14 = 16384 Time cost: 1e-06 s
data size 2^15 = 32768 Time cost: 1e-06 s
data size 2^16 = 65536 Time cost: 1e-06 s
data size 2^17 = 131072 Time cost: 2e-06 s
data size 2^18 = 262144 Time cost: 3e-06 s
data size 2^19 = 524288 Time cost: 1e-06 s
data size 2^20 = 1048576 Time cost: 4e-06 s
data size 2^21 = 2097152 Time cost: 3e-06 s
data size 2^22 = 4194304 Time cost: 3e-06 s
data size 2^23 = 8388608 Time cost: 4e-06 s
data size 2^24 = 16777216 Time cost: 4e-06 s
data size 2^25 = 33554432 Time cost: 1.2e-05 s
data size 2^26 = 67108864 Time cost: 9e-06 s
data size 2^27 = 134217728 Time cost: 1.1e-05 s
data size 2^28 = 268435456 Time cost: 2.4e-05 s
The results barely change as n grows.
Converting a sequential search into a binary search improves efficiency enormously.
Testing O(NlogN):
NlogN grows only slightly faster than O(N).
int main() {
    // Test mergeSort, O(nlogn)
    cout<<"Test for mergeSort:"<<endl;
    for( int i = 10 ; i <= 24 ; i ++ ){
        int n = pow(2,i);
        int *arr = MyUtil::generateRandomArray(n, 0, 1<<30);
        clock_t startTime = clock();
        MyAlgorithmTester::mergeSort(arr,n);
        clock_t endTime = clock();
        cout<<"data size 2^"<<i<<" = "<<n<<"\t";
        cout<<"Time cost: "<<double(endTime - startTime)/CLOCKS_PER_SEC<<endl;
        delete[] arr;
    }
    return 0;
}
Running results:
Test for Merge Sort:
data size 2^10 = 1024     Time cost: 0.000143 s
data size 2^11 = 2048     Time cost: 0.000325 s
data size 2^12 = 4096     Time cost: 0.000977 s
data size 2^13 = 8192     Time cost: 0.001918 s
data size 2^14 = 16384    Time cost: 0.003678 s
data size 2^15 = 32768    Time cost: 0.007635 s
data size 2^17 = 131072   Time cost: 0.034462 s
data size 2^18 = 262144   Time cost: 0.069586 s
data size 2^19 = 524288   Time cost: 0.136214 s
data size 2^20 = 1048576  Time cost: 0.294626 s
data size 2^21 = 2097152  Time cost: 0.619943 s
data size 2^22 = 4194304  Time cost: 1.37317 s
data size 2^23 = 8388608  Time cost: 2.73054 s
data size 2^24 = 16777216 Time cost: 5.60827 s
Each doubling of n slightly more than doubles the time — consistent with O(nlogn).
Complexity analysis of recursive algorithms
- A function that recurses is not necessarily O(nlogn); it depends on the structure of the calls.
Recursive implementation of binary search:
Each call recurses into either the left half or the right half — only one recursive call either way — and the range halves each time. The recursion depth is logn and the work per call is O(1), so the total is O(logn).
// binarySearch
int binarySearch(int arr[], int l, int r, int target){
if( l > r )
return -1;
int mid = l + (r-l)/2;
if( arr[mid] == target )
return mid;
else if( arr[mid] > target )
return binarySearch(arr, l, mid-1, target);
else
return binarySearch(arr, mid+1, r, target);
}
If a recursive function makes only one recursive call, with recursion depth depth and per-call time complexity T, then the total time complexity is O(T * depth).
Summation recursive implementation
Recursion depth: n Time complexity: O(n)
// sum
int sum( int n ){
assert( n >= 0 );
if( n == 0 )
return 0;
return n + sum(n-1);
}
Compute x to the nth power
// pow2
double pow( double x, int n ){
    assert( n >= 0 );
    if( n == 0 )
        return 1.0;
    double t = pow(x, n/2);
    if( n%2 )
        return x*t*t;
    return t*t;
}
Recursion depth: logn Time complexity: O(logn)
Recursion with multiple recursive calls
The recursion tree has depth n; level i has 2^i nodes, so the total node count is
2^0 + 2^1 + 2^2 + ... + 2^n = 2^(n+1) - 1 = O(2^n)
Exponential algorithms are very slow: n around 20 to 30 is already the practical limit unless pruning is applied. Typical contexts: dynamic programming, AI search trees.
// f
int f(int n){
assert( n >= 0 );
if( n == 0 )
return 1;
return f(n-1) + f(n-1);
}
Merge sort, n = 8
The recursion tree has depth logn (for n = 8, 3 levels of splitting). Each level processes smaller and smaller pieces, but the total data across any one level is still n — so each of the logn levels does O(n) work, giving O(nlogn) overall.
// mergeSort
void mergeSort(int arr[], int l, int r){
if( l >= r )
return;
int mid = (l+r)/2;
mergeSort(arr, l, mid);
mergeSort(arr, mid+1, r);
merge(arr, l, mid, r);
}
Amortized complexity analysis (Amortized Time Analysis)
One operation is relatively expensive, but it makes many subsequent operations cheap; spreading (amortizing) its cost over the whole sequence evens everything out.
Dynamic array (Vector)
template <typename T>
class MyVector{
private:
    T* data;
    int size;       // number of elements stored in the array
    int capacity;   // maximum number of elements the array can hold

    // O(n): a single loop copying all elements
    void resize(int newCapacity){
        assert( newCapacity >= size );
        T *newData = new T[newCapacity];
        for( int i = 0 ; i < size ; i ++ )
            newData[i] = data[i];
        delete[] data;
        data = newData;
        capacity = newCapacity;
    }

public:
    MyVector(){
        data = new T[100];
        size = 0;
        capacity = 100;
    }

    ~MyVector(){
        delete[] data;
    }

    // Amortized: O(1)
    void push_back(T e){
        if( size == capacity )
            resize( 2 * capacity );
        data[size++] = e;
    }

    // O(1)
    T pop_back(){
        assert( size > 0 );
        T ret = data[size-1];
        size --;
        return ret;
    }
};
Amortization
The (n+1)-th push triggers an O(n) resize, but that cost is distributed over the previous n O(1) pushes, so each push is amortized O(2) = O(1). resize is conditional and is not called on every push.
int main() {
for( int i = 10 ; i <= 26 ; i ++ ){
int n = pow(2,i);
clock_t startTime = clock();
MyVector<int> vec;
for( int i = 0 ; i < n ; i ++ ){
vec.push_back(i);
}
clock_t endTime = clock();
cout<<n<<" operations: \t";
cout<<double(endTime - startTime)/CLOCKS_PER_SEC<<" s"<<endl;
}
return 0;
}
1024 operations:      2.5e-05 s
2048 operations:      2.9e-05 s
4096 operations:      7.4e-05 s
8192 operations:      0.000154 s
16384 operations:     0.000265 s
32768 operations:     0.000391 s
65536 operations:     0.001008 s
131072 operations:    0.002006 s
262144 operations:    0.003863 s
524288 operations:    0.005842 s
1048576 operations:   0.014672 s
2097152 operations:   0.029367 s
4194304 operations:   0.06675 s
8388608 operations:   0.124446 s
16777216 operations:  0.240025 s
33554432 operations:  0.486061 s
67108864 operations:  0.960224 s
The time roughly doubles as the operation count doubles — amortized O(1) per push.
Deleting elements should release space as well.
An ordinary deletion is O(1).
But beware complexity oscillation: suppose the array has just doubled to capacity 2n with n elements. If pop_back shrinks the array the moment size drops to capacity/2, then alternating push_back/pop_back right at that boundary triggers an O(n) resize on every single operation.
Repeating that process, the cost cannot be amortized — each operation becomes O(n).
The fix is lazy shrinking: only shrink (to capacity/2) when the element count falls to 1/4 of the capacity.
template <typename T>
class MyVector{
private:
    T* data;
    int size;       // number of elements stored in the array
    int capacity;   // maximum number of elements the array can hold

    // O(n)
    void resize(int newCapacity){
        assert( newCapacity >= size );
        T *newData = new T[newCapacity];
        for( int i = 0 ; i < size ; i ++ )
            newData[i] = data[i];
        delete[] data;
        data = newData;
        capacity = newCapacity;
    }

public:
    MyVector(){
        data = new T[100];
        size = 0;
        capacity = 100;
    }

    ~MyVector(){
        delete[] data;
    }

    // Amortized: O(1)
    void push_back(T e){
        if( size == capacity )
            resize( 2 * capacity );
        data[size++] = e;
    }

    // Amortized: O(1), with lazy shrinking
    T pop_back(){
        assert( size > 0 );
        T ret = data[size-1];
        size --;
        if( size == capacity/4 )
            resize( capacity/2 );
        return ret;
    }
};
The results
2048 operations:      4.3e-05 s
4096 operations:      6.3e-05 s
8192 operations:      0.000107 s
16384 operations:     0.000316 s
32768 operations:     0.000573 s
65536 operations:     0.001344 s
131072 operations:    0.001995 s
262144 operations:    0.004102 s
524288 operations:    0.008599 s
1048576 operations:   0.0014714 s
2097152 operations:   0.027181 s
4194304 operations:   0.063136 s
8388608 operations:   0.126046 s
16777216 operations:  0.242574 s
33554432 operations:  0.456381 s
67108864 operations:  0.96618 s
134217728 operations: 1.76422 s
Amortized complexity
- The dynamic array
- Dynamic stack
- Dynamic queue
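The same amortized-O(1) idea carries over directly to a dynamic stack; a minimal sketch backed by std::vector, which already implements the doubling strategy (the class name is illustrative, not from the original):

```cpp
#include <cassert>
#include <vector>
using namespace std;

// A dynamic stack: push and pop are amortized O(1) because the
// underlying array doubles when full and shrinks lazily.
class DynamicStack {
    vector<int> data;   // std::vector handles the resizing strategy
public:
    void push(int e){ data.push_back(e); }   // amortized O(1)
    int pop(){
        assert(!data.empty());
        int ret = data.back();
        data.pop_back();                     // amortized O(1)
        return ret;
    }
    bool empty() const { return data.empty(); }
};
```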