Regular expressions for programming ideas
What is a regular expression?
A regular expression is a formula that uses a pattern to match strings. Suppose you want to find, in an article, every three-character name whose first character is “Luo” and whose last character is “Hao”, that is, “Luo * Hao”. Then “Luo * Hao” is the formula, also called the pattern, and the article is the string (or text) to be matched. Likewise, if you want to check whether an input string is in the 126 mailbox format, you need a rule to check it against. That rule is a regular expression.
Start at the beginning
Let’s start with the example mentioned above: checking if a string matches the 126 mailbox format.
The NetEase mailbox registration page shows that a 126 mailbox user name must be 6 to 18 characters long, may contain only letters, digits, and underscores (_), and must start with a letter. We can define a pattern: ^[a-zA-Z]\w{5,17}@126\.com
The pattern can be understood as follows:
[a-zA-Z]: any letter from a to z or A to Z
^: the start of the string, so ^[a-zA-Z] means the string must start with a letter
\w: a word character [a-zA-Z_0-9], that is, any of a-z, A-Z, 0-9, or _
{5,17}: 5 to 17 occurrences (at least 5, but not more than 17), so \w{5,17} means 5 to 17 word characters
^[a-zA-Z]\w{5,17}: 6 to 18 letters, digits, and underscores (_), starting with a letter
@126\.com: the user name that meets the preceding rules is followed by the literal string @126.com to form an email address. The "." is escaped as "\." because an unescaped "." matches any character.
“^[a-zA-Z]\w{5,17}@126\.com” is what we call a regular expression. A simple implementation in Java looks like this:
// Define the matching rule (the regular expression).
// Note: the "." in 126.com must be escaped, written as \\. in a Java string.
String regex = "^[a-zA-Z]\\w{5,17}@126\\.com";
// The text to be matched
String text = "ZhanSan@126fcom";
// Match the text against the regular expression
boolean isMatched = text.matches(regex);
System.out.println(isMatched);
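To see which part of the rule rejects which input, the same check can be wrapped in a small class and run against a few inputs. This is only a sketch: the class name `Mail126Check`, the method name `is126Address`, and the sample strings are made up for illustration, and a trailing `$` is added to make the end anchor explicit.

```java
import java.util.regex.Pattern;

class Mail126Check {
    // A letter, then 5 to 17 word characters, then the literal "@126.com".
    private static final Pattern MAIL_126 =
            Pattern.compile("^[a-zA-Z]\\w{5,17}@126\\.com$");

    static boolean is126Address(String text) {
        return MAIL_126.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Hypothetical inputs, chosen only to exercise each part of the rule.
        String[] samples = {
            "ZhanSan@126.com",   // satisfies every part of the rule
            "ZhanSan@126fcom",   // "f" instead of ".": rejected because the dot is escaped
            "1ZhanSan@126.com",  // starts with a digit
            "Zhan@126.com"       // user name shorter than 6 characters
        };
        for (String s : samples) {
            System.out.println(s + " -> " + is126Address(s));
        }
    }
}
```

Only the first sample is accepted. If the dot were left unescaped, the second sample would be accepted as well.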
Common symbols of regular expressions
The “^”, “\w”, and “{5,17}” used in the previous example are all common symbols in regular expressions. These symbols have special meanings in regular expressions. The following table shows the meaning of the common symbols for regular expressions in Java (only the parts that are commonly used are illustrated, and these parts can solve most of the problems with regular expressions).
Pattern | Matching content (meaning)

Character classes
[abc] | a, b, or c (simple class)
[^abc] | Any character other than a, b, or c (negation)
[a-zA-Z] | a to z or A to Z, both ends included (range)
[a-d[m-p]] | a to d or m to p: [a-dm-p] (union)
[a-z&&[def]] | d, e, or f (intersection)
[a-z&&[^bc]] | a to z, except b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] | a to z, but not m to p: [a-lq-z] (subtraction)

Predefined character classes
. | Any character (may or may not match line terminators)
\d | A digit: [0-9]
\D | A non-digit: [^0-9]
\s | A whitespace character: [ \t\n\x0B\f\r]
\S | A non-whitespace character: [^\s]
\w | A word character: [a-zA-Z_0-9]
\W | A non-word character: [^\w]

Boundary matchers
^ | The beginning of a line
$ | The end of a line
\b | A word boundary
\B | A non-word boundary
\A | The beginning of the input
\G | The end of the previous match
\Z | The end of the input, except for the final line terminator (if any)
\z | The end of the input

Quantifiers (greedy)
X? | X, once or not at all
X* | X, zero or more times
X+ | X, one or more times
X{n} | X, exactly n times
X{n,} | X, at least n times
X{n,m} | X, at least n times, but no more than m times

Logical operators
XY | X followed by Y
X|Y | X or Y
(X) | X, as a capturing group
Reference document: Class Pattern
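A few rows of the table can be tried directly in Java. The sketch below is illustrative only: the class `PatternTableDemo`, its helper methods, and the sample strings (including the example.com address) are assumptions, not part of the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternTableDemo {
    // True if the whole input matches the regular expression.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Returns capturing group 1 of the first match, or null if there is none.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(full("[a-d[m-p]]", "n"));    // union: true
        System.out.println(full("[a-z&&[def]]", "e"));  // intersection: true
        System.out.println(full("[a-z&&[^bc]]", "b"));  // subtraction: false
        System.out.println(full("\\d{2,4}", "12345"));  // X{n,m}: false, five digits is too many
        System.out.println(full("colou?r", "color"));   // X?: true, the "u" is optional
        // \b anchors at a word boundary and (\w+) captures the word before "@".
        System.out.println(firstGroup("\\b(\\w+)@", "mail luo@example.com")); // luo
    }
}
```

Note that the nested forms such as [a-z&&[def]] are Java syntax; other languages may not accept them.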
These commonly used symbols have largely the same meaning in the regular expressions of most programming languages (because the underlying idea is the same), so the table can be used as a general reference. However, there are subtle differences between languages. For accurate and authoritative descriptions, please refer to each language's official documentation:
C++(VS2013 compiler) : msdn.microsoft.com/zh-cn/libra…
Java: docs.oracle.com/javase/7/do…
JavaScript: www.w3school.com.cn/jsref/jsref…
Use of regular expressions
The following sections describe the use of regular expressions in C++, Java, and JavaScript, starting from common requirements.
Regular expressions in C++
There are three main regular expression implementations available to C++: C regex, the C++ standard library (C++ regex), and Boost regex. C regex is procedural and not very convenient to use. C++ regex can be used directly because it is part of the standard library (although it seems that some Linux toolchains do not support it), but it is hard to use: the syntax is strict, and many of the default options are not what you would normally expect. Boost regex is an open source third-party library that is widely used in C++ projects; it is flexible and easy to use, which makes it the preferred approach in C++ development.
The use of Boost regex will be covered in a future article. Here is an example that uses C++ regex.
1. Verify the IP address
#include <regex>
#include <iostream>
#include <string>

bool IsIpV4Address(const std::string& strIp)
{
    // Define the regular expression. The "\\" in "\\." is the escape character,
    // so "\\." stands for a literal "." rather than "any character".
    const std::regex pattern("(\\d{1,3}){1}\\.(\\d{1,3}){1}\\.(\\d{1,3}){1}\\.(\\d{1,3}){1}");
    // The whole string must match the pattern.
    return std::regex_match(strIp, pattern);
}

int main()
{
    std::string strIp1 = "134.34.34.4"; // 192.168.1.1
    std::string strIp2 = "192.168.255";
    std::cout << strIp1 << " : " << (IsIpV4Address(strIp1) ? "valid" : "invalid") << std::endl;
    std::cout << strIp2 << " : " << (IsIpV4Address(strIp2) ? "valid" : "invalid") << std::endl;
    return 0;
}
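The pattern above only checks the shape "one to three digits, four times", so a string such as 999.999.999.999 also passes. A stricter check has to limit each part to 0-255. The Java sketch below does this with the per-octet alternation that appears in the summary of common expressions later in this article; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class Ipv4Check {
    // One octet: 250-255, 200-249, or 0-199 (leading zeros are tolerated).
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|[01]?\\d?\\d)";

    // Three "octet." groups followed by a final octet.
    private static final Pattern IPV4 =
            Pattern.compile("(?:" + OCTET + "\\.){3}" + OCTET);

    static boolean isIpV4Address(String text) {
        return IPV4.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"134.34.34.4", "192.168.255", "256.1.1.1", "999.999.999.999"};
        for (String s : samples) {
            System.out.println(s + " : " + (isIpV4Address(s) ? "valid" : "invalid"));
        }
    }
}
```

Only the first sample is reported as valid. Whether leading zeros such as 01.2.3.4 should be accepted depends on the application; this sketch accepts them.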
Regular expressions in Java
1. Verify that a string is a URL
public static boolean isUrl(String text) {
    // "\\." matches a literal dot between the labels of the host name.
    String regex = "^http://([\\w-]+\\.)+[\\w-]+(/[\\w-./?%&=#]*)?$";
    return text.matches(regex);
}
2. Find the URLs in a text and add a hyperlink to each of them.
For example, given the following text:
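A quick way to get a feel for what this rule accepts is to run it on a few strings. The sketch below repeats the method in a runnable class; the class name `UrlCheck` and the example.com URLs are made up for illustration, and the "-" inside the character class has been moved to the end, which is a small change of mine that avoids any ambiguity about ranges.

```java
class UrlCheck {
    static boolean isUrl(String text) {
        // http://, one or more "label." groups, a final label, then an optional path.
        String regex = "^http://([\\w-]+\\.)+[\\w-]+(/[\\w./?%&=#-]*)?$";
        return text.matches(regex);
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs.
        String[] samples = {
            "http://www.example.com/index.html?id=1", // accepted
            "www.example.com",                        // rejected: no http:// prefix
            "https://www.example.com",                // rejected: only http:// is allowed
            "http://localhost"                        // rejected: at least one dot is required
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isUrl(s));
        }
    }
}
```

The last two samples show the limits of the rule: https addresses and dot-free host names are rejected, so the pattern would need to be extended before being used for general URL validation.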
C++(VS2013 compiler) : msdn.microsoft.com/zh-cn/libra…
Java: docs.oracle.com/javase/7/do…
JavaScript: www.w3school.com.cn/jsref/jsref…
After adding the links, it becomes:
C++(VS2013 compiler) : <a href="msdn.microsoft.com/zh-cn/libra…">msdn.microsoft.com/zh-cn/libra…</a>
Java: <a href="docs.oracle.com/javase/7/do…">docs.oracle.com/javase/7/do…</a>
JavaScript: <a href="www.w3school.com.cn/jsref/jsref…">www.w3school.com.cn/jsref/jsref…</a>
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Wrap a string in a hyperlink.
 * @param text the string to be linked
 * @param url the URL to link to
 */
public static String AddHref(String text, String url) {
    return "<a href=\"" + url + "\">" + text + "</a>";
}

/**
 * Find the URL strings in the text and add a link to each of them.
 */
public static String AddLinkToText(String text) {
    Pattern pattern = Pattern.compile("http://([\\w-]+\\.)+[\\w-]+(/[\\w-./?%&=#]*)?");
    Matcher matcher = pattern.matcher(text);
    // Define a character buffer to hold the new text.
    StringBuffer sb = new StringBuffer();
    while (matcher.find()) {
        // Extract the matched substring.
        String matchedSubStr = matcher.group();
        // Append the text before the match, plus the link, to the buffer.
        matcher.appendReplacement(sb, AddHref(matchedSubStr, matchedSubStr));
    }
    // Append the remaining text after the last match.
    matcher.appendTail(sb);
    return sb.toString();
}
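The requirement also asks how many URLs the text contains, which the code above does not report. A minimal sketch of both steps is shown below. The names `UrlLinker`, `countUrls`, and `addLinks` and the sample text are assumptions for illustration. The sketch also wraps the replacement in `Matcher.quoteReplacement`, so that a "$" or "\" in the replacement string would not be interpreted as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlLinker {
    private static final Pattern URL =
            Pattern.compile("http://([\\w-]+\\.)+[\\w-]+(/[\\w./?%&=#-]*)?");

    // Count the matches by calling find() until it fails.
    static int countUrls(String text) {
        Matcher m = URL.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Replace every URL with an <a> element that points at it.
    static String addLinks(String text) {
        Matcher m = URL.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            String link = "<a href=\"" + url + "\">" + url + "</a>";
            // quoteReplacement keeps "$" and "\" in the link literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(link));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample text.
        String text = "See http://a.example.com/x and http://b.example.org for details";
        System.out.println(countUrls(text));
        System.out.println(addLinks(text));
    }
}
```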
Regular expressions in JavaScript
Regular expressions in JavaScript are implemented through RegExp objects. RegExp objects can be created in one of three ways:
Literal syntax:
/pattern/attributes
With new:
new RegExp(pattern, attributes);
As a function call:
RegExp(pattern, attributes);
The pattern argument can be either a pattern string or a RegExp object. If pattern is itself a RegExp object, the attributes argument should be omitted (the newly created object then uses the same pattern and flags as the original); in ES5 and earlier, supplying attributes in that case throws a TypeError.
The attributes argument can contain the flags "g", "i", and "m", which specify global matching, case-insensitive matching, and multi-line matching, respectively.
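For readers coming from the Java sections, the three flags have rough counterparts there. The sketch below is a comparison only, with a made-up class name and sample text: `Pattern.CASE_INSENSITIVE` and `Pattern.MULTILINE` correspond to "i" and "m", while Java has no "g" flag and instead calls `find()` repeatedly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    // Calling find() in a loop plays the role of the "g" (global) flag.
    static int countMatches(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "Luo\nluo\nLUO"; // three lines, differing only in case

        // No flags: only the exact lowercase spelling matches.
        System.out.println(countMatches(Pattern.compile("luo"), text));
        // "i": case-insensitive matching finds all three.
        System.out.println(countMatches(Pattern.compile("luo", Pattern.CASE_INSENSITIVE), text));
        // Without "m", ^ and $ refer to the whole input, so nothing matches.
        System.out.println(countMatches(Pattern.compile("^luo$", Pattern.CASE_INSENSITIVE), text));
        // "m": ^ and $ match at the start and end of every line.
        System.out.println(countMatches(
                Pattern.compile("^luo$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE), text));
    }
}
```

The four lines printed are 1, 3, 0, and 3.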
RegExp has three main methods:
compile | Compiles a regular expression; can be used to change and recompile an existing one.
exec | Searches the string for a match. Returns the matched value and records its position.
test | Tests whether the string contains a match. Returns true or false.
1. Verify that the string is a number
<script type="text/javascript">
function isNumber(text) {
var pattern = new RegExp("^\\d*$");
return pattern.test(text);
}
var value1 = "1234";
document.write(value1 + " is Number:" + isNumber(value1));
</script>
2. Verify the email format
<script type="text/javascript">
function isEmail(text) {
    // One or more word characters or hyphens, "@", a domain label,
    // then one or two ".xx" or ".xxx" suffixes.
    var reg = /^([\w-])+@([\w-])+((\.[\w-]{2,3}){1,2})$/;
    return reg.test(text);
}
var value2 = "[email protected]";
document.write(value2 + " is Email:" + isEmail(value2));
</script>
3. Print all email addresses and locations in a text to the page
<script type="text/javascript">
function PrintEmail(text) {
    // Create the RegExp object; the "g" flag enables global matching,
    // so each call to exec() continues after the previous match.
    var reg = new RegExp("([\\w-])+@([\\w-])+((\\.[\\w-]{2,3}){1,2})", "g");
    var result;
    while ((result = reg.exec(text)) != null) {
        // result[0] is the matched text, result.index is its position.
        document.write(result[0] + "<br/>" + result.index);
        document.write("<br/><br/>");
    }
}
var text = "[email protected]; Li si [email protected]; Fifty [email protected] ";
PrintEmail(text);
</script>
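The same "find every match and its position" loop can be written in Java, where `Matcher.find()` advances through the text and `Matcher.start()` gives the index of the current match. The sketch below is a rough counterpart rather than a translation of the script above; the class name and the example.com addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailFinder {
    private static final Pattern EMAIL =
            Pattern.compile("[\\w-]+@[\\w-]+(\\.[\\w-]{2,3}){1,2}");

    // Returns one "address at index" entry per match, in order of appearance.
    static List<String> findEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group() + " at " + m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample text.
        String text = "Zhang San zhangsan@example.com; Li Si lisi@example.org";
        for (String entry : findEmails(text)) {
            System.out.println(entry);
        }
    }
}
```

Because the suffix is limited to two or three characters, longer top-level domains would only be matched partially; the pattern is a simplification, as in the original example.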
Application scenarios
Data verification:
For example, you can check the input string to see if it is in the format of a phone number or an email address. This is often used in form entry on web pages.
Find substring:
You can find substrings within a document (or within a string) that match a specified pattern.
Replace text:
You can use regular expressions to identify specific content in a document, remove that content entirely, or replace it with another string.
Tools:
Text editors such as Word, Notepad++, and EditPlus support regular expressions, which allow more varied searches. The find-and-replace features of development tools and IDEs such as VS, Code::Blocks, Eclipse, and IntelliJ IDEA also support regular expressions, which can help you rename variables, reformat code, count lines of code, and so on.
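The first three scenarios map onto three common operations: match the whole string, find substrings, and replace substrings. A minimal Java sketch follows, assuming a made-up class `ScenarioDemo` and sample inputs; the phone pattern is the domestic telephone number expression from the summary later in this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScenarioDemo {
    // Data verification: the whole input must have the form 021-87888822 or 0511-4405222.
    static boolean isPhone(String s) {
        return s.matches("\\d{3}-\\d{8}|\\d{4}-\\d{7}");
    }

    // Find substrings: collect every run of digits in the text.
    static List<String> findNumbers(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Replace text: collapse every run of whitespace into a single space.
    static String collapseSpaces(String s) {
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(isPhone("021-87888822"));       // true
        System.out.println(findNumbers("a1b22c333"));      // [1, 22, 333]
        System.out.println(collapseSpaces("a   b\t\tc"));  // a b c
    }
}
```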
Summary of common regular expressions
Common regular expressions have been summarized by many people, and there are so many available online that I don’t need to write any more. Here is a summary that I think is good.
This part is reproduced from: www.cnblogs.com/zxin/archiv…
I. Expressions for validating numbers
1 Digits: ^[0-9]*$
2 An n-digit number: ^\d{n}$
3 A number with at least n digits: ^\d{n,}$
4 A number with m to n digits: ^\d{m,n}$
5 Zero, or a number that does not start with zero: ^(0|[1-9][0-9]*)$
6 A number that does not start with zero, with at most two decimal places: ^([1-9][0-9]*)+(\.[0-9]{1,2})?$
7 A positive or negative number with 1-2 decimal places: ^(\-)?\d+(\.\d{1,2})?$
8 Positive numbers, negative numbers, and decimals: ^(\-|\+)?\d+(\.\d+)?$
9 A positive real number with two decimal places: ^[0-9]+(\.[0-9]{2})?$
10 A positive real number with 1 to 3 decimal places: ^[0-9]+(\.[0-9]{1,3})?$
11 Non-zero positive integers: ^[1-9]\d*$ or ^([1-9][0-9]*){1,3}$ or ^\+?[1-9][0-9]*$
12 Non-zero negative integers: ^\-[1-9][0-9]*$ or ^-[1-9]\d*$
13 Non-negative integers: ^\d+$ or ^([1-9]\d*|0)$
14 Non-positive integers: ^(-[1-9]\d*|0)$ or ^((-\d+)|(0+))$
15 Non-negative floating-point numbers: ^\d+(\.\d+)?$ or ^([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
16 Non-positive floating-point numbers: ^((-\d+(\.\d+)?)|(0+(\.0+)?))$ or ^((-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0)$
17 Positive floating-point numbers: ^([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ or ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$
18 Negative floating-point numbers: ^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ or ^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$
19 Floating-point numbers: ^(-?\d+)(\.\d+)?$ or ^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
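Before relying on expressions like these, it helps to run them against inputs that should and should not pass. The Java sketch below tries three of the numeric expressions (non-zero positive integers, numbers with one or two decimal places, and floating-point numbers); the class name, constant names, and sample values are assumptions for illustration.

```java
import java.util.regex.Pattern;

class NumberPatterns {
    // Non-zero positive integer: first digit 1-9, then any digits.
    static final Pattern NON_ZERO_POSITIVE_INT = Pattern.compile("^[1-9]\\d*$");
    // Optional minus sign, digits, then an optional ".d" or ".dd" part.
    static final Pattern UP_TO_TWO_DECIMALS = Pattern.compile("^-?\\d+(\\.\\d{1,2})?$");
    // Optional minus sign, digits, then an optional fractional part of any length.
    static final Pattern FLOAT = Pattern.compile("^(-?\\d+)(\\.\\d+)?$");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(NON_ZERO_POSITIVE_INT, "42"));   // true
        System.out.println(matches(NON_ZERO_POSITIVE_INT, "042"));  // false: leading zero
        System.out.println(matches(UP_TO_TWO_DECIMALS, "-3.14"));   // true
        System.out.println(matches(UP_TO_TWO_DECIMALS, "3.141"));   // false: three decimals
        System.out.println(matches(FLOAT, "-0.5"));                 // true
        System.out.println(matches(FLOAT, "1e5"));                  // false: no exponent form
    }
}
```

Remember that every backslash in these expressions has to be doubled inside a Java string literal.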
II. Expressions for validating characters
1 Chinese characters: ^[\u4e00-\u9fa5]{0,}$
2 English letters and digits: ^[A-Za-z0-9]+$ or ^[A-Za-z0-9]{4,40}$
3 Any characters, with a length of 3-20: ^.{3,20}$
4 A string made up of the 26 English letters: ^[A-Za-z]+$
5 A string made up of the 26 uppercase English letters: ^[A-Z]+$
6 A string made up of the 26 lowercase English letters: ^[a-z]+$
7 A string made up of digits and the 26 English letters: ^[A-Za-z0-9]+$
8 A string made up of digits, the 26 English letters, or underscores: ^\w+$ or ^\w{3,20}$
9 Chinese characters, English letters, and digits, including underscores: ^[\u4E00-\u9FA5A-Za-z0-9_]+$
10 Chinese characters, English letters, and digits, but no underscores or other symbols: ^[\u4E00-\u9FA5A-Za-z0-9]+$ or ^[\u4E00-\u9FA5A-Za-z0-9]{2,20}$
11 Input that must not contain characters such as % & ' , ; = ? $ ": [^%&',;=?$\x22]+
12 Input that must not contain ~ (or "): [^~\x22]+
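Java's regular expressions accept the \uXXXX notation used above, so these character checks carry over directly. The sketch below tries three of them; the class and constant names are made up, and the Chinese sample is written with Unicode escapes so that the source file does not depend on its encoding.

```java
import java.util.regex.Pattern;

class CharPatterns {
    // Chinese characters in the range U+4E00 to U+9FA5; {0,} also accepts the empty string.
    static final Pattern CHINESE = Pattern.compile("^[\\u4e00-\\u9fa5]{0,}$");
    // English letters and digits only.
    static final Pattern LETTERS_DIGITS = Pattern.compile("^[A-Za-z0-9]+$");
    // 3 to 20 word characters (letters, digits, underscores).
    static final Pattern WORD_3_20 = Pattern.compile("^\\w{3,20}$");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        System.out.println(matches(CHINESE, chinese));            // true
        System.out.println(matches(CHINESE, "abc"));              // false
        System.out.println(matches(LETTERS_DIGITS, "abc_123"));   // false: underscore not allowed
        System.out.println(matches(WORD_3_20, "ab_1"));           // true
    }
}
```

If an empty string should be rejected, replacing {0,} with + in the first expression is one option.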
III. Expressions for special requirements
1 Email address: ^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$
2 Domain name: [a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?
3 Internet URL: [a-zA-Z]+://[^\s]* or ^http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?$
4 Mobile phone number: ^(13[0-9]|14[5|7]|15[0|1|2|3|5|6|7|8|9]|18[0|1|2|3|5|6|7|8|9])\d{8}$
5 Phone number ("XXX-XXXXXXX", "XXXX-XXXXXXXX", "XXX-XXXXXXX", "XXX-XXXXXXXX", "XXXXXXX", and "XXXXXXXX"): ^(\(\d{3,4}\)|\d{3,4}-)?\d{7,8}$
6 Domestic telephone number (0511-4405222, 021-87888822): \d{3}-\d{8}|\d{4}-\d{7}
7 ID number (15 or 18 digits): ^(\d{15}|\d{18})$
8 Short ID number (digits, possibly ending with the letter x): ^([0-9]){7,18}(x|X)?$ or ^\d{8,18}|[0-9x]{8,18}|[0-9X]{8,18}?$
9 Whether an account name is valid (starts with a letter, 5-16 characters, letters, digits, and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
10 Password (starts with a letter, 6 to 18 characters long, containing only letters, digits, and underscores): ^[a-zA-Z]\w{5,17}$
11 Strong password (must contain uppercase letters, lowercase letters, and digits, no special characters, 8 to 10 characters long): ^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,10}$
12 Date format: ^\d{4}-\d{1,2}-\d{1,2}
13 The 12 months of the year (01-09 and 1-12): ^(0?[1-9]|1[0-2])$
14 The 31 days of a month (01-09 and 1-31): ^((0?[1-9])|((1|2)[0-9])|30|31)$
15 Money input format:
16 1. There are four representations of money we can accept: "10000.00" and "10,000.00", and, without cents, "10000" and "10,000": ^[1-9][0-9]*$
17 2. This matches any number that does not start with 0, but it also means the single character "0" is rejected, so we use the following form: ^(0|[1-9][0-9]*)$
18 3. A 0, or a number that does not start with 0. We can also allow a leading minus sign: ^(0|-?[1-9][0-9]*)$
19 4. This matches a 0 or a possibly negative number that does not start with 0. Let the user start with 0 after all, and drop the minus sign too, because money cannot be negative: ^[0-9]+(\.[0-9]+)?$
20 5. Note that at least one digit must follow the decimal point, so "10." is rejected, while "10" and "10.2" are accepted: ^[0-9]+(\.[0-9]{2})?$
21 6. This requires exactly two digits after the decimal point. If you think that is too strict, you can use: ^[0-9]+(\.[0-9]{1,2})?$
22 7. This lets the user write just one decimal digit. Next, consider commas in the number: ^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$
23 8. One to three digits, followed by any number of "comma plus three digits" groups. To make the commas optional rather than required: ^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$
24 Note: this is the final result. Remember that "+" can be replaced with "*" if you think an empty string is acceptable (strange, why?). Finally, take care with the backslashes when you pass the pattern to a function, because escaping rules differ between languages; this is where errors usually occur.
25 XML file: ^([a-zA-Z]+-?)+[a-zA-Z0-9]+\.[xX][mM][lL]$
26 Chinese characters: [\u4e00-\u9fa5]
27 Double-byte characters: [^\x00-\xff] (includes Chinese characters; can be used to calculate the length of a string, counting a double-byte character as 2 and an ASCII character as 1)
28 Blank lines: \n\s*\r (can be used to delete blank lines)
29 HTML tags: <(\S*?)[^>]*>.*?</\1>|<.*? />
30 Leading and trailing whitespace: ^\s*|\s*$ or (^\s*)|(\s*$) (can be used to delete whitespace, including spaces, tabs, and form feeds, at the start and end of a line; a very useful expression)
31 Tencent QQ number: [1-9][0-9]{4,} (Tencent QQ numbers start from 10000)
32 China postal code: [1-9]\d{5}(?!\d) (China postal codes have 6 digits)
33 IP address: \d+\.\d+\.\d+\.\d+ (useful when extracting IP addresses)
34 IP address: ((?:(?:25[0-5]|2[0-4]\\d|[01]?\\d?\\d)\\.){3}(?:25[0-5]|2[0-4]\\d|[01]?\\d?\\d)) (backslashes doubled as in a string literal; provided by @Feilongsanshao, thanks for sharing)
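Two of the expressions above are worth running, because their behavior is easy to misjudge: the strong password rule, which relies on lookaheads, and the final money format. The Java sketch below is illustrative only; the class name, constant names, and sample values are assumptions.

```java
import java.util.regex.Pattern;

class SpecialPatterns {
    // Each (?=...) lookahead checks one requirement without consuming input:
    // at least one digit, one lowercase letter, and one uppercase letter.
    static final Pattern STRONG_PASSWORD =
            Pattern.compile("^(?=.*\\d)(?=.*[a-z])(?=.*[A-Z]).{8,10}$");

    // Plain digits or comma-grouped digits, then an optional one or two decimal places.
    static final Pattern MONEY =
            Pattern.compile("^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\\.[0-9]{1,2})?$");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(STRONG_PASSWORD, "Passw0rd"));   // true
        System.out.println(matches(STRONG_PASSWORD, "password1"));  // false: no uppercase letter
        System.out.println(matches(MONEY, "10,000.00"));            // true
        System.out.println(matches(MONEY, "10000"));                // true
        System.out.println(matches(MONEY, "10,00"));                // false: a group needs three digits
        System.out.println(matches(MONEY, "10."));                  // false: digit required after the point
    }
}
```

One caveat: the final .{8,10} in the password expression accepts any character, so the "no special characters" requirement in the description is not actually enforced. Replacing it with [a-zA-Z0-9]{8,10} would be one way to enforce it.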
Closing remarks
Regular expressions are a very powerful and very common programming technique, and this article is only a primer on the most commonly used parts. The subject is so large that the regular expressions of each programming language could fill a book of their own.
If you have any questions or ideas, please leave feedback in the comments section; your feedback is the best review. My skills and ability are limited, so if there are mistakes or shortcomings in this post, please bear with me and share your suggestions.