First, let me briefly explain why we need to learn regular expressions and why a web crawler can't do without them. Regular expressions play a very important role in string processing and are very common in web crawlers. You can get by with learning only the basics, but you do have to learn them.
Although Python gives us rich parsing tools such as CSS selectors, BS4, and lxml, which let us match strings through selectors, the data in HTML usually sits inside tags. A selector can match the contents of a tag, but much of that content is often redundant and we only need part of it (a number, a time, and so on), as shown in the figure below. The selector will usually give us the string “782 funny”, but if we only need the number “782”, a regular expression comes in handy.
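To make the “782 funny” case concrete, here is a minimal sketch. The pattern `\d+` (one or more digits) is my own choice for illustration and is not one of the special characters covered in this article; the variable names are also assumptions.

```python
import re

# "782 funny" is the string from the text above, as a selector might return it.
text = "782 funny"

# \d+ matches one or more digits (an assumed pattern, chosen for illustration).
match = re.search(r"\d+", text)

# group() returns the matched substring; guard against the no-match case.
number = match.group() if match else None
print(number)
```

The selector still does the coarse work of finding the tag; the regular expression only trims the string down to the part we want.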
Regular expressions help us in two ways. First, they can tell us whether a string conforms to a certain pattern. Second, they can extract the important part of a string, that is, a substring. Today we will look at just a few regular expression special characters, “^”, “.”, and “*”, and demonstrate them with examples so that we get a preliminary understanding of regular expressions.
1. Open PyCharm.
Python has a dedicated library for regular expressions, the re module, so we start by importing it. Then we define a string, str, and a regular expression matching rule, regex.
2. “^” anchors the pattern to the start of the string, so “^d” matches any string that begins with the character d. Whatever follows the d does not matter; as long as the string begins with d, it is valid.
3. “.” is used even more often. It stands for any single character, and its range is very wide: Chinese or English characters, or special characters such as the underscore (by default, anything except a newline). For example, the regular expression “^d.” matches a string that starts with d and is followed by any one character.
4. “*” is also very common. It means that the preceding character can be repeated any number of times: 0 times, 1 time, 2 times, and so on.
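Before moving on, here is a small sketch of the three special characters one at a time. All of the sample strings are hypothetical, and I use `re.search` and `re.fullmatch` here only to make the effect of each character easy to see.

```python
import re

# All sample strings below are hypothetical.

# "^d": the string must begin with d.
caret_hit = re.search("^d", "data") is not None    # d is at the start
caret_miss = re.search("^d", "add") is not None    # d exists, but not at the start

# "^d.": d followed by one more character of any kind.
dot_hit = re.search("^d.", "d_") is not None       # underscore counts as a character
dot_miss = re.search("^d.", "d") is not None       # nothing follows the d

# "ab*": a followed by zero or more b's (fullmatch checks the whole string).
star_results = [re.fullmatch("ab*", s) is not None
                for s in ["a", "ab", "abbb", "abc"]]

print(caret_hit, caret_miss)   # True False
print(dot_hit, dot_miss)       # True False
print(star_results)            # [True, True, True, False]
```

Note that “*” applies only to the character right before it, which is why “abc” fails here: the trailing c is not covered by the pattern.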
5. Now that we understand these special characters, let's get a feel for them in code. As shown in the figure below, if the match succeeds, the program prints yes; if there is no match, nothing is printed.
As you can see, after the program runs, the result is yes, indicating a successful match. The regular expression “^d.*” represents a string that starts with d, followed by any character repeated any number of times. Naturally, the text matched by the regular expression is the same as the original string. The if statement treats the returned match object as true, so yes is printed.
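A minimal sketch of a program like the one described, assuming a sample string of my own (“dobby123” is hypothetical; any string starting with d behaves the same way). I name the variable `text` rather than `str` so that Python's built-in `str` is not shadowed.

```python
import re

text = "dobby123"   # hypothetical sample string that starts with d
regex = "^d.*"      # starts with d, then any character, any number of times

# re.match returns a match object on success and None on failure.
result = re.match(regex, text)
if result:
    print("yes")

# The matched text is the whole original string.
matched = result.group() if result else None
```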
6. To further verify that this pattern is correct, we change the d in the pattern to a, so that the pattern now asks whether the string starts with a. Then we run the program again, as shown below.
You can see that no output is displayed, indicating that the special character “^” works.
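The same check can be sketched as follows, again with my hypothetical sample string “dobby123”. Only the first character of the pattern changes.

```python
import re

text = "dobby123"   # hypothetical sample string that starts with d
regex = "^a.*"      # now the string would have to start with a

# The string starts with d, so re.match returns None and nothing is printed.
result = re.match(regex, text)
if result:
    print("yes")
```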
Open Python and see how regular expressions work