Offer to come, dig friends take it! I am participating in the 2022 Spring Recruit Punch card activity. Click here for details.
1. Title Description
Given a string s, find the length of the smallest string that does not contain repeating characters. Check the title source
Second, train of thought analysis
My thought process
First, a substring is a contiguous fragment of a string. So the logic of solving this problem is pretty simple
First solve the problem: exhaustive traversal, traversal from the first character, cut out the substring without repeated characters, and then save the length of the substring; The second character is traversed, truncated, compared to the previously saved substring length, and replaced if the new substring is longer. The general idea is a little bit like bubble sort.
The idea is this, but the implementation of code ah requires more detailed steps to easily type the code :(because the initial type does not type)
First, we need an array called charTemp, which is used to store each substring and determine if there are duplicate characters. The maximum length of the array is 26+26+10+1+32+1 = 96 (letters + numbers + Spaces + symbols +\0, symbols I only counted the symbols that the keyboard can type). You also need a variable head to specify the starting position. You also need a variable result, which holds the currently obtained oldest string and is also the final return result. You also need a temporary variable, length, to calculate the substring length, which needs to be set to zero each time you change the starting position.
Begin from the first character, then add the character charTemp, judge whether a repeat, do not repeat the length++ then took out a second character (the second character in the full position in the string can use the result and length calculation, so I don’t need a new variable).
If repeated, compare length to maximum length result to see if result is overridden, then start the next round (head++), repeat until head reaches the end (here I think we don’t need to wait until head reaches the end, We only need the distance between head and end to be less than the current maximum result.
Understand the thinking behind good code
By debugging good code step by step, I think the idea should go like this:
What we need to find is the longest non-repeating substring length, and the key to improve efficiency is how to reduce the number of substrings traversed, and my previous method is to traverse all cases, which is very inefficient. To improve this, imagine having two Pointers: start and end. Start must not come after end, and we want to make sure that the characters between start and end are not duplicated! That is, the substring of [start-end] is always a non-repeating substring. So how do we do that?
To understand this, assume that the string is abcdec
First, both Pointers start at the starting point, and then check whether the character A of end exists. I don’t repeat the substring [0-0] and then end++, I repeat that.
If the value of “end” is 5 and the value of “C” is “C”, it can be judged that “C” already exists. In this case, the substring [0-5] does not meet the condition of no duplicate characters. Therefore, you need to change the value of “start” to 3 to ensure that the substring [3-5] is a non-duplicate substring. We also need end++
Repeat until end reaches the end. In this process, we can ensure that all non-repeating substrings are traversed, so we just need to save the longest of them and solve the problem.
This is my idea of good code, but if I were to prove that the procedure above traverses the longest non-repeating substring, I wouldn’t be able to say, but it feels right: End tries its best to stay away from start, and start tries its best to “stay still”, but when end encounters a repeating character, start must move because they want to ensure that [start-end] is preceded by a non-repeating substring.
According to the above ideas, there are several points to consider when implementing:
- How to store characters?
- How do I determine if the character already exists?
- How does start locate repeated characters?
-
The rough and easy way, just save it
The end result is that temporary storage is as large as the original string. In this case, the presence of the character is checked in the [start-end] range. Start ++ until start is the same value as end
-
Treat char as int
Because the ASCII code itself is not that big, we can treat char itself as an int so that when we store it, char itself is the subscript, and we don’t have to go through it to see if it’s there, we can just use char as the subscript.
In this case, we need to be careful to immediately clean up the characters that are no longer in the non-repeating substring, i.e., when positioning start, we need to pay attention to the following line of code
while(*(s+start) ! = *(s+end)) { store[ *(s+start) ] =0; / / clean up start++; } Copy the code
Problems encountered
- ==42==ERROR: AddressSanitizer: heap-buffer-overflow on address
This problem is the old friend ಥ _ ಥ, address overflow error, cause I don’t have to be added into charTemp value judgment
int add (char *charTemp, char val) {...if (*charTemp == val) // Error code, no judgment on val
if (*charTemp == val || val == 0) // Correct code. }Copy the code
It is also important to note how much memory is assigned to charTemp, and to include a position for the closing identifier \0
- Submitted after using optimized code, cannot pass
""
A single space string and""
The empty string case
Simple and crude solution:
if (*s == 0)
return 0;
if (*(s+1) = =0)
return 1;
Copy the code
- store to address 0x62c001ecf730 with insufficient space for an object of type ‘char’
Translation: Lack of space. The reason for this is that I have to create as much storage space as the original character. The crude solution is to copy the unpassable string, look at its length, and then change the value in malloc
.char* store = malloc(sizeof(char) *31656); // The longest string in the test case is 31655.Copy the code
AC code
Feels inefficient because less than 10% of users (time and memory) are exceeded
int add (char *charTemp, char val) {
while(*charTemp ! =0) {
if (*charTemp == val || val == 0)
return 0;
charTemp++;
}
*charTemp = val;
*(charTemp+1) = 0;
return 1;
}
int lengthOfLongestSubstring(char * s){
char* charTemp;
int head = 0;
int result = 0;
int length;
while(*(s+head) ! =0) {
length = 0;
charTemp = malloc(sizeof(char) *96);
*charTemp = 0;
while (add(charTemp, *(s+head+length))) {
length++;
}
if (length > result)
result = length;
head++;
}
return result;
}
Copy the code
After referring to the excellent code, the algorithm is improved (the storage method is rough, so the memory effect is not good, but the time consumption can achieve 0ms)
int isThere (char *head, char val) {
while(*head ! =0) {
if (*head == val || val == 0)
return 1;
head++;
}
return 0;
}
int lengthOfLongestSubstring(char * s){
if (*s == 0)
return 0;
if (*(s+1) = =0)
return 1;
char* store = malloc(sizeof(char) *31656);
int start = 0;
int end = 0;
int maxLength = 0;
while(*(s+end) ! =0) {
*(store+end) = *(s+end); / / save
end++;
*(store+end) = 0;
if (isThere(store+start, *(s+end))) { / / determine
while(*(s+start) ! = *(s+end))/ / position
start++;
start++;
}
if (maxLength < end-start+1)
maxLength = end-start+1;
}
return maxLength;
}
Copy the code
Optimize memory
int lengthOfLongestSubstring(char * s){
char store[256] = {0};
int start = 0;
int end = 0;
int maxLength = 0;
while(*(s+end) ! =0) {
if (store[ *(s+end) ] == 0) { / / determine
store[ *(s+end) ] = 1; / / save
} else {
while(*(s+start) ! = *(s+end)) {/ / position
store[ *(s+start) ] = 0; / / clean up
start++;
}
start++;
}
end++;
if (maxLength < end-start)
maxLength = end-start;
}
return maxLength;
}
Copy the code
Four,
On the 1st version of the code, at a special time at free(Chatte), more ram will be occupied. As far as I can tell, the space malloc() created loses its pointer (there is no pointer to it), and it is automatically destroyed, so if I go to free it would take some resources to execute the free function.
After looking at the good code, and trying to understand the idea and their own implementation, I found myself really good.