This series is my summary of data structure and algorithm when I was looking for a job many years ago. There are basic parts and classic interview questions of various companies. It was first published on CSDN. Is sorted out for a series of friends in need of reference, if there is a mistake, welcome to correct. The full code address for this series is here.

0 overview

As the basic content of data structure, string is also one of the basic skills often examined in interviews, such as the implementation of strcpy, STRCMP and other basic functions, palindrome string, string search, regular expression and so on. See the code for this article here.

1 Basic Operations

Let’s start with the implementation of some basic string functions. The following code is taken from the MIT6.828 course.

// String length int strlen(const char *s) {int n;for(n = 0; *s ! ='\ 0'; s++)
		n++;
	returnn; } char *strcpy(char * DST, const char * SRC) {char *ret; ret = dst;while((*dst++ = *src++) ! ='\ 0') / *do nothing */;
	returnret; } char *strcat(char * DST, const char * SRC) {int len = strlen(DST); strcpy(dst + len, src);returndst; } // string comparison int STRCMP (const char *p, const char *q) {while (*p && *p == *q)
		p++, q++;
	return(int) ((unsigned char) *p - (unsigned char) *q); Char * STRCHR (const char *s, char c) {for (; *s; s++)
		if (*s == c)
			return (char *) s;
	return0; Void *memset(void *v, int c, size_t n) {char *p; int m; p = v; m = n;while (--m >= 0)
		*p++ = c;

	returnv; Void *memmove(void * DST, const void * SRC, size_t n) {const char *s; char *d; s = src; d = dst;if (s < d && s + n > d) {
		s += n;
		d += n;
		while (n-- > 0)
			*--d= * --s;
	} else
		while (n-- > 0)
			*d++ = *s++;

	return dst;
}
Copy the code

2. String related interview questions

2.1 Longest callback substring

Given a string, find the longest substring of the string. A palindrome string is a string that looks the same from left to right, aba, CDDC are palindrome strings. The string abbacdc has a substring of abba and CDC, so its longest substring is abba.

An easy mistake to make

At first glance, you might think of something like this: invert the string S to get the new string S’, and then find the longest common substring of S and S’.

  • Such asS = caba.S' = abac, then the longest common substring of S and S’ isabaThis is correct.
  • But if theS = abacdfgdcaba.S’ = abacdgfdcaba, then the longest common substring of S and S’ isabacdObviously this is not a palindrome string. So this approach is wrong.

Determines whether a string is a palindrome string

To find the longest substring, you first solve the problem of determining whether a string is a palindrome string. The most obvious way to do this is to set two variables I and j, pointing to the beginning and the end of the string, to see if they are equal, and then I ++, j–, until I >= j. The following code checks whether the string STR [I, j] is a palindrome string, that is, whether the substring STR from I to j is a palindrome string, which will be used later.

*/ int isPalindrome(string s, int start, int end) {for (; start < end; ++start,--end) {
        if(s[start] ! = s[end])return 0;
    }
    return 1;
}
Copy the code

Solution 1: Brute force method to find the largest string

The brute force method evaluates all substrings of a string and, if it is a palindrome string, updates the length of the longest palindrome. Since there may be (1+N)*N/2 substrings of length N, it takes O(N) time to determine each substring, so it takes O(N^3) time to find the longest substring.

O(N^3) */ string longestPalindrome(string s) {int len = s.length(), maxLen = 1; int start=0, i, j; /* Traverses all substrings of the string, updating the length of the longest palindrome if the substring is a palindrome string */for (i = 0; i < len - 1; i++) {
        for (j = i + 1; j < len; j++) {
            ifInt pLen = j - I + 1; int pLen = j - I + 1; int pLen = j - I + 1;if(pLen > maxLen) { start = i; // Update the longest palindrome starting position maxLen = pLen; // Update the length of the longest palindrome}}}}return s.substr(start, maxLen);
}
Copy the code

Solution 2: Dynamic programming method

Because the brute force method needs a lot of repeated calculations to determine palindromes, it can be improved by dynamic programming method. Assuming we know that “bab” is a palindrome, “ababa” must also be a palindrome.

Let's define P[I, j] =trueIf the substring P[I, j] is a palindrome string. Then P[I, j] <- (P[I +1, j-1] &&s [I] = s[j]). Base Case: P[i, i ] =true
P[i, i+1 ] = true <- s[i] = s[i+1]
Copy the code

Accordingly, the implementation code is as follows:

/** * longest substring - dynamic programming method, the time complexity of this method is O(N^2), space complexity is O(N^2). */ /** * the longest loop substring - dynamic programming method, the time complexity of this method is O(N^2), space complexity is O(N^2). * * idea: define P[I, j] = 1 if the substring P[I, j] is a palindrome string. * then P[I, j] <- (P[I +1, j-1] &&s [I] == s[j]). * * Base Case: * P[ i, i ] <- 1 * P[ i, i+1 ] <- s[i] == s[i+1] */ string longestPalindromeDP(string s) { int n = s.length(); int longestBegin = 0, maxLen = 1; int **P; int i; P*/ P = (int **)calloc(n, sizeof(int *));for (i = 0; i < n; i++) {
        P[i] = (int *)calloc(n, sizeof(int));
    }


    for (i = 0; i < n; i++) {
        P[i][i] = 1;
    }

    for (int i=0; i<n-1; i++) {
        if (s[i] == s[i+1]) {
            P[i][i+1] = 1;
            longestBegin = i;
            maxLen = 2;
        }
    }

    /*依次求P[i][i+2]...P[i][i+n-1]等*/
    int len = 3;
    for (; len <= n; ++len) {
        for (i = 0; i < n-len+1; ++i) {
            int j = i + len - 1;
            if(s[i] == s[j] && P[i+1][j-1]) { P[i][j] = 1; longestBegin = i; maxLen = len; }}} /* Free memory */for (i = 0; i< n; i++)
        free(P[i]);
    free(P);

    return s.substr(longestBegin, maxLen);
}
Copy the code

Solution 3: Center method

There is an even easier way to find the longest loop substring using O(N^2) time without extra space. We know that palindrome strings are symmetrical in the center of the string, such as ABBA and ABA. A better approach is to start in the middle, because palindrome strings are symmetrical in the center of the string. A string of length N can have 2n-1 centers of symmetry, and the reason why it’s 2n-1 instead of N is because the point of symmetry can be between two characters, such as abba, where the point of symmetry is between the first letter B and the second letter B. The implementation code is as follows:

Void expandAroundCenter(string s, int l, int r, int *longestBegin, int *longestBegin, int *longestBegin, int *longestBegin, int *longestBegin, int *longestBegin, int *longestBegin, int *longestBegin, int *longestBegin int *longestLen) { int n = s.length();while(l>=0 && r<=n-1 && s[l]==s[r]) { l--, r++; } *longestBegin = l + 1; *longestLen = r - l - 1; } /** * longest substring - center method, time O(N^2). */ string longestPalindromeCenter(string s) { int n = s.length();if (n == 0) 
        return s;

    char longestBegin = 0;
    int longestLen = 1;

    for(int i = 0; i < n; i++) { int iLongestBegin, iLongestLen; expandAroundCenter(s, i, i, &iLongestBegin, &iLongestLen); // The longest palindrome string centered around position Iif(iLongestLen > longestLen) { longestLen = iLongestLen; longestBegin = iLongestBegin; } expandAroundCenter(s, i, i+1, &iLongestBegin, &iLongestLen); // The longest palindrome string centered around the position between I and I +1if(iLongestLen > longestLen) { longestLen = iLongestLen; longestBegin = iLongestBegin; }}return s.substr(longestBegin, longestLen);
}
Copy the code

2.2 Switching Sort

Given a character array containing R, G, and B characters, it is required to sort all characters in RGB order. For example, given an array char s[] = “RGBBRGGBGB”, then RRGGGGBBBB should be sorted.

Solution 1: This problem is somewhat similar to the method used in quicksort to partition the array, but there are three characters, so you need to call the partition method twice, the first partition with B, the second partition with G, so that the original character array can be divided into RGB order. This method is natural and easy to think of. The code is as follows. The disadvantage of this method is that you need to traverse the array twice.

void swapChar(char *s, int i, int j) { char temp = s[i]; s[i] = s[j]; s[j] = temp; Void partition(char *s, int lo, int hi, char t) {int m = lo-1, I;for (i = lo; i <= hi; i++) {
        if(s[i] ! = t) { swapChar(s, ++m ,i); }}} /** * rgbSortTwice(char *s) {int len = strlen(s); partition(s, 0, len-1,'G'); RBBRBBGGGG partition(s, 0, len-1,'B'); RRGGGGBBBB} RRGGGGBBBB}Copy the code

Solution 2: There is a method that only traverses the array once, but the number of swaps is not reduced. It mainly sets two variables R and G to indicate the position of the current r and G characters respectively, traversing the number group.

  • 1) If the ith position is character R, it should be exchanged with the last character of the preceding indicator variable R, namely, the character at ++ R, and ++ G. At this time, it should also be determined whether the character stored in the swapped I is G. If it is G, it should be exchanged with the character at G;

  • 2) If the ith position is G, swap it with the character at ++ G. Plus plus g always points to the next place where we should swap g, plus plus r points to the next place where we should swap R.

  • 3) If the i-th position is character B, do nothing and continue traversing.

Void rgbSortOnce(char *s) {int len = strlen(s); int lo = 0, hi = len - 1; int r, g, i; // r = lo - 1; // r = lo - 1;for (i = lo; i <= hi; i++) {
        if (s[i] == 'R'SwapChar (s, ++ R, I); ++g;if (s[i] == 'G'SwapChar (s, G, I) swapChar(s, G, I); }else if (s[i] == 'G'SwapChar (s, ++ G, I); }else{// do nothing with B}}}Copy the code

Solution 3: If swapping is not considered, you can directly count the number of characters in RGB, and then reassign the array to RGB from scratch. That’s so much easier, haha. But if you have a different problem, and you want to arrange positive numbers, negative numbers, and 0 in a certain order, then you have to swap.

2.3 Maximum sliding window

Given an array A, there is A sliding window of size W that slides from the left to the end. You can only see w numbers in this window, and you can only move one position at a time. Our goal is to find the maximum number of w digits per window and store these maximum values in array B.

For example, array A = [1 3-1-3 5 3 6 7], window size w = 3. Then the window sliding process is as follows:

Window position Max --------------- ----- [1 3 -1] -3 5 3 6 7 3 1 [3 -1 -3] 5 3 6 7 3 1 3 [-1 -3 5] 3 6 7 5 1 3 -1 [-3 5 3] 6 7 5 1 3-1 1-3 [5 3 6] 7 6 1 3-1 1-3 5 [3 6 7] 7 Input: array A and w size output: array B, where B[I] stores the maximum number of w digits from A[I] to A[I + W-1].Copy the code

Solution 1: Simple implementation

One of the simplest ideas is to Max out w digits for each move and save it. It takes O(w) time to Max out W digits for each move, whereas the sliding process takes N-W +1 swipes, where n is the array size, so the total time is O(NW).

Int maxInArray(int A[], int n) {int Max = A[0], I;for (i = 1; i < n; i++) {
        if(A[i] > max) { max = A[i]; }}returnmax; } /* * maxSlidingWindowSimple(int A[], int n, int w, int B[]) {int I;for (i = 0; i <= n-w; i++) 
        B[i] = maxInArray(A + i, w);
}
Copy the code

Solution 2: maximum heap solution

The first method is simple in thought, but time complex, so it needs to be improved. You can use a maximum heap to hold W numbers, which takes O(LGW) time each time a number is inserted, and O(1) time to get the maximum from the heap (the average size of the heap is about W). As the window slides from left to right, some numbers in the heap are invalidated (because they are no longer contained in the window). If the array itself is ordered, the heap size increases to N. Since the heap size does not remain constant at w, the time complexity of this algorithm is O(NLGN).

/ / void maxSlidingWindowPQ(int A[], int n, int w, int B[]) {typedef pair<int, int> pair; priority_queue<Pair> Q; // The priority queue holds the values in the windowfor(int i = 0; i < w; i++) Q.push(Pair(A[i], i)); // Build a maximum heap of w elementsfor (int i = w; i < n; i++) {
        Pair p = Q.top();
        B[i-w] = p.first;
        while (p.second <= i-w) {
           Q.pop();
           p = Q.top();
        }
        Q.push(Pair(A[i], i));
    }
    B[n-w] = Q.top().first;
}
Copy the code

Solution 3: bidirectional queue solution

The maximum heap solution preserves redundant elements in the heap. For example, if the original element in the heap is [10 5 3] and the new element is 11, then there will be [11 5 3] in the heap. In fact, we can clear the entire queue and then add 11 to the queue, that is, only hold in the queue [11]. A bidirectional queue can be used, where the maximum sliding window is always stored at the head of the queue and the data in the queue is always sorted from largest to smallest. When a value greater than the current sliding window maximum is encountered, the queue is emptied and the new maximum is inserted into the queue. If a value less than the current maximum is encountered, it is inserted directly to the end of the queue. Check whether the current maximum value is in the valid range. If it is not, delete it from the queue. As each element enters and leaves the queue at most once, the time complexity of the algorithm is O(N).

Void maxSlidingWindowDQ(int A[], int n, int w, int B[]) {deque<int> Q; void maxSlidingWindowDQ(int A[], int n, int w, int B[]);for (int i = 0; i < w; i++) {
        while(! Q.empty() && A[i] >= A[Q.back()]) Q.pop_back(); Q.push_back(i); }for (int i = w; i < n; i++) {
        B[i-w] = A[Q.front()];
        while(! Q.empty() && A[i] >= A[Q.back()]) Q.pop_back();while(! Q.empty() && Q.front() <= i-w) Q.pop_front(); Q.push_back(i); } B[n-w] = A[Q.front()]; }Copy the code

2.4 Longest common subsequence

Given two sequences X = < x1, x2… , xm > and Y = < y1, y2… Ym >, hoping to find the common subsequence (LCS) of the maximum length of X and Y.

Analysis: The simplest way to solve LCS is to use brute force method, enumerating all subsequences of X, and then checking whether they are subsequences of Y one by one, and recording the largest sequence found, and finally taking the largest subsequence. But all of the subsequences of X have 2 to the m, and this method takes exponential time, which is not very practical, but the LCS problem actually has optimal substructure properties.

Optimal substructure of LCS:

If X =

, Y =

, then the longest common subsequence of X and Y is

or

. That is, there can be multiple LCS.
,>
,>
,>
,>

Let X = < x1, x2… , xm > and Y = < y1, y2… , yn > are two sequences, and let Z = < z1, z2… , zk > is any LCS of X and Y.

    1. If xm = yn, zk = xm = yn, and Zk-1Is Xm-1And Yn-1One LCS of theta.
    1. If xm! = yn, zk! = xmAnd Z is Xm-1One LCS of Y.
    1. If xm! = yn, zk! = ynAnd Z is X and Yn-1One LCS of theta.

Therefore, we can define c[I, j] as the length of an LCS of sequence Xi and Yj, then we can get the following recursion:

C/I, j = 0 / / I = 0 or j = 0 c/I, j = c [] I - 1, j - 1 + 1 / / I, j > 0, And Xi = Yj c/I, j = Max (c/I - 1, j, c [I]] [j - 1) / / I, j > 0, and Xi! = YjCopy the code

From this we can write the following code to find the length of the LCS and the LCS, using an auxiliary array B to store the LCS path. Here is a recursive algorithm for LCS length, the use of dynamic programming algorithm code see this source code.

/** * LCS- recursive algorithm */#define UP 1
#define LEFT 2
#define UPLEFT 3

int lcsLengthRecur(char *X, int m, char *Y, int n, int **b)
{
    if (m == 0 || n == 0)
        return 0;

    if (X[m-1] == Y[n-1]) {
        b[m][n] = UPLEFT;
        return lcsLengthRecur(X, m-1, Y, n-1, b) + 1;
    } 

    int len1 = lcsLengthRecur(X, m-1, Y, n, b);
    int len2 = lcsLengthRecur(X, m, Y, n-1, b);

    int maxLen;
    if (len1 >= len2) {
        maxLen = len1;
        b[m][n] = UP;
    } else {
        maxLen = len2;
        b[m][n] = LEFT;
    }
    returnmaxLen; } /** * print the LCS using the auxiliary array b */ voidprintLCS(int **b, char *X, int i, int j)
{
    if (i == 0 || j == 0)
        return;

    if (b[i][j] == UPLEFT) {
        printLCS(b, X, i-1, j-1);
        printf("%c ", X[i-1]);
    } else if (b[i][j] == UP) {
        printLCS(b, X, i-1, j);
    } else {
        printLCS(b, X, i, j-1); }}Copy the code

The process of printing LCS is shown below (figure taken from Introduction to Algorithms) :

2.5 All Strings are arranged

Given an array of characters char arr[] = “ABC”, print the full array of characters in the array.

Solution: Output full permutation using recursion. The perm(arr, k, len) function outputs all permutations of the character array arr starting at position k. The array length is len. The base condition is k == len-1, the last element has been reached, the permutation has been completed, and the output is direct. Otherwise, each element starting at position K swaps with the value of position K (including itself), and then goes through the next permutation, remembering to restore the original sequence when it’s done.

If len=3, then perm(arr, 0, 3) is called: swap 0, 0, and execute perm(arr, 1, 3). Swap 0, 0 again, and restore the array to its original value. Perm (arr, 1, 3) is executed. After that, the array is swapped again, and the array is restored to its original value. Swap 2,0 for the third time and execute perm(arr, 1, 3). Swap 2,0 for the third time and restore the array to its original value.

The program output is ABC, ACB, BAC, BCA, CBA, CAB. That is, the first permutation of a is printed, followed by the first permutation of B and C.

Void perm(char *arr, int k, int len) {//k is the starting position and len is the size of the arrayif (k == len-1) { 
        printf("%s\n", arr);
	return;
    }

    for(int i = k; i < len; i++) { swapChar(arr, i, k); // Switch perm(arr, k+1, len); SwapChar (arr, I, k); // Restore the original sequence}}Copy the code

2.6 Regular Expressions

Implementation of a simple version of the regular expression, support ^, $,. Features.

Regular expression basics: A regular expression is itself a sequence of characters that defines a collection of strings that can be matched. In common Unix/Linux regular expressions, the character ^ indicates the beginning of a string and $indicates the end of a string. Thus, ^x matches only x at the beginning of a string, x$matches only x at the end, ^x$matches only x in a single string, and ^$matches only empty strings. Character. Can match any character. So pattern x.y matches xay, x2y, and so on, but it does not match xy or xaby. Obviously ^.$can match any single character string. A set of characters written in square brackets [] can match any of these characters. For example, [0123456789] can match any number. This pattern can also be abbreviated to [0-9].

The following is the main function match for regular expression matching, and the parameters are regexp and text. If the regular expression begins with ^, the body must match the rest of the expression from the beginning. Otherwise, we go down the string and use matchhere() to see if the text matches somewhere. Once a match is found, the job is done. Note the use of do-while. Some expressions can match an empty string (for example, $can match an empty string at the end of the string, and * can match any number of characters, including zero). So, even if we encounter an empty string, we still need to call Matchhere ().

int match(const char *regexp, const char *text)
{
    if (regexp[0] == A '^')
        return matchhere(regexp+1, text);
    do {
        if (matchhere(regexp, text))
            return 1;
    } while(*text++ ! ='\ 0');
    return 0;
}
Copy the code

The recursive function matchhere() does most of the matching:

  • ifregexp[0]=='\0'If the match is successful, 1 is returned.
  • If the end of the expression is zero$, the successful matching condition is that the text also reaches the end, that is, the judgment*text=='\0'. If the text is at the end, the match succeeds; otherwise, the match fails.
  • If the text is not at the end, andregexp[0] == *textorregexp=='.'(.Represents matching any character), a recursive call to Matchhere continues the next match.
  • ifregexp[1]=='*', the process is slightly complicated, for examplex*. At this point we callmatchstarThe first argument is the asterisk argument (x*In thex), followed by the schema after the asterisk and the corresponding body string.
int matchhere(const char *regexp, const char *text)
{
    if (regexp[0] == '\ 0')
        return 1;

    if (regexp[0]=='$' && regexp[1]=='\ 0')
        return *text == '\ 0';

    if (regexp[1] == The '*')
        return matchstar(regexp[0], regexp+2, text);

    if(*text ! ='\ 0' && (regexp[0] == '. ' || regexp[0] == *text))
        return matchhere(regexp+1, text+1);

    return 0;
}

int matchstar(int c, const char *regexp, const char *text)
{
    do {
        if (matchhere(regexp, text))
            return 1;
    } while(*text ! ='\ 0' && (*text++ == c || c == '. '));
    return 0;
}
Copy the code

Example:

  • char *regexp="abc", text="dagabcdefg"The match is successful.
  • char *regexp="^abc", *text="abcdefg"The match is successful.
  • char *regexp="^abc", *text="bcdefgabc"Failed to match.
  • char *regexp="abc$", *text="defghabc"The match is successful.

2.7 KMP algorithm and BM algorithm

String matching famous KMP algorithm and BM algorithm, online information is more, you can see grep string search algorithm Boyer-Moore from shallow to deep (faster than KMP 3-5 times) and string matching KMP algorithm.

The resources

  • Articles.leetcode.com/sliding-win…
  • Leetcode.com/problems/lo…
  • The programming of mission
  • Introduction to algorithms
  • MIT6.828 code