preface

In the last article, I explained the KMP algorithm in detail using animation (if you haven’t seen it, go back and look at it).

This time I still use animation to introduce you to another string matching algorithm that you will love once you use it: Sunday algorithm, hope you can harvest your likes!

KMP algorithm is a milestone like algorithm, whose appearance announced that human has found a string matching algorithm with linear time complexity. After this, there are many string matching algorithms, such as BM algorithm and Sunday algorithm.

These algorithms have reached linear time in time complexity. However, the time spent in the actual application is still different.

The efficiency of BM algorithm in practical application has reached four or five times that of KMP algorithm.

The Sunday algorithm is even more efficient than BM algorithm.

And if you know both algorithms, you’ll see:

The Sunday algorithm is really much easier to understand than the BM algorithm.

The body of the

Okay, so that’s it for the Sunday algorithm. Let’s get started!

PS: The following strings with matches are called text strings, and the strings used to match are called pattern strings.

Why is the Sunday algorithm easy to understand?

Because it’s only one more step than the brute force matching algorithm!

Without further ado, go straight to my elaborate GIF:

As you can see, it only took us three moves to find the final result.

Sunday algorithm is a front-to-back matching algorithm (BM algorithm is back-to-front). When the matching fails, the focus is on the next character of the last character in the text string to be matched.

  • If the character does not appear in the pattern string, it is skipped directly, that is, the number of moves = the pattern string length + 1.

  • Otherwise, the number of moves = pattern string length – the right-most position of the character (starting with 0) = the distance from the right-most position of the character in the pattern string to the tail + 1.

The most clever part of the Sunday algorithm is that it can directly examine the next character of the last character in the text string to be matched after it finds that the match fails.

In Python code, we use dictionaries to store the index of the last occurrence of each character in the pattern string, so that it takes O(M) in the initial preparation time, which is the length of the pattern string, and O(1) in the query time.

Also to prevent transgressions, I have manually added a ‘\0’ character to the end of the string in the Python code posted below.

code

class Sunday(object): def __init__(self, pattern:str): Self. pattern, self.length = pattern, len(pattern) self.shift_dict = {} value in enumerate(pattern): self.shift_dict[value] = self.length - index def match(self, text:str): i = 0 text_length = len(text) text += '\0' while i <= text_length - self.length: j = 0 while self.pattern[j] == text[i + j]: j += 1 if j >= self.length: return i offset = self.shift_dict[text[i+self.length]] if text[i+self.length] in self.shift_dict else self.length + 1 i += offset return -1 s = Sunday('nihao') print(s.match('dasoijfoasjdoifjasdoifjoinihao'))Copy the code

The code is very simple, and I built a class to simplify the code by reusing its dictionary of locations within the same pattern string.

Sunday algorithm and KMP algorithm compete

After writing the code, I conducted a rough test on the matching time of KMP algorithm and Sunday algorithm, and the test results are as follows:

amazing!
The average matching speed of Sunday algorithm is about seven times that of KMP algorithm!

We construct an object for KMP and Sunday, and then generate a random 100,000-character string each time to start matching them.

Generate –> match the process of a hundred times, the final calculation of the average time. If there are big guys feel uneasy, I release the detection code below, we can modify the test, take it to use!

The detection code is as follows

class KMP(): def __init__(self, ss: str) -> list: self.length = len(ss) self.next_lst = [0 for _ in range(self.length)] self.next_lst[0] = -1 i = 0 j = -1 while i < self.length - 1: if j == -1 or ss[i] == ss[j]: i += 1 j += 1 if ss[i] == ss[j]: self.next_lst[i] = self.next_lst[j] else: self.next_lst[i] = j else: j = self.next_lst[j] self.pattern = ss def match(self, ss:str): ans_lst = [] j = 0 for i in range(len(ss)): if ss[i] ! = self.pattern[j]: j = self.next_lst[j] if self.next_lst[j] ! = -1 else 0 if ss[i] == self.pattern[j]: j += 1 if j == self.length: return i + 1 - self.length return -1 class Sunday(object): def __init__(self, pattern:str): Self. pattern, self.length = pattern, len(pattern) self.shift_dict = {} value in enumerate(pattern): self.shift_dict[value] = self.length - index def match(self, text:str): i = 0 text_length = len(text) text += '\0' while i <= text_length - self.length: j = 0 while self.pattern[j] == text[i + j]: j += 1 if j >= self.length: return i offset = self.shift_dict[text[i+self.length]] if text[i+self.length] in self.shift_dict else self.length + 1 i += offset return -1 import random import time sunday = Sunday('helloworld') kmp = KMP('helloworld') kmp_average_time = 0  sunday_average_time = 0 for i in range(100): ss = ''.join([chr(random.randint(97, 122)) for _ in range(100000)]) st = time.process_time() sunday.match(ss) ed = time.process_time() sunday_average_time += Ed-st st = time.process_time() kmp.match(ss) Ed = time.process_time() kmp_average_time += ed-st print(' KMP average time: Format (kmp_average_time / 100) print(' Sunday average_time: {}'. Format (kmp_average_time / 100)) print(' Sunday average_time: {}'.Copy the code

The last

Finally, if you think this article is helpful to you, give me a point of attention, favorites!

My personal public account is [program member], welcome your attention, your recognition is my biggest power!

I will keep updating articles that are helpful to you!

I am Luo Yang, thank you for visiting!