Follow the official account "IT elder brother": read one practical technical article every day, and a year from now you will find a different self.

= = =

Case introduction

= = = = = = =

Before we get to regular expressions, let’s start with a scenario.

You have probably experienced this: you go to a website to sign up for an account, and when you set your password, the site tells you the allowed length range and the rules the password must follow (see picture below).

According to the figure above, we can describe the password setting rule as two conditions:

(1) The length is 6-16 characters;

(2) The password must contain digits, uppercase letters, lowercase letters, and special characters.

Now, assuming we don’t know regular expressions, how would you, as a programmer, implement such a password validation?

Here is a check method (sample) I wrote:

/**
 * Verify whether the user password meets the setting rules.
 *
 * @param password Password entered by the user
 * @return true - meets the rules; false - does not meet the rules
 */
public static boolean checkPassword(String password) {
    // Password cannot be empty
    if (password == null || password.isEmpty()) {
        return false;
    }
    // Check password length (6-16 characters)
    int len = password.length();
    if (len < 6 || len > 16) {
        return false;
    }
    // Define the four combination conditions
    boolean hasNumber = false;
    boolean hasSmallLetter = false;
    boolean hasBigLetter = false;
    boolean hasSpecialChar = false;
    char[] chars = password.toCharArray();
    for (char c : chars) {
        // Digit
        if (c >= '0' && c <= '9') {
            hasNumber = true;
            continue;
        }
        // Lowercase letter
        if (c >= 'a' && c <= 'z') {
            hasSmallLetter = true;
            continue;
        }
        // Uppercase letter
        if (c >= 'A' && c <= 'Z') {
            hasBigLetter = true;
            continue;
        }
        // Special character (indexOf returns 0 for the first one, so compare with >= 0)
        if ("~@#$%*_-+=:.?".indexOf(c) >= 0) {
            hasSpecialChar = true;
            continue;
        }
        // A character that is in none of the four cases breaks the rule
        return false;
    }
    return hasNumber && hasSmallLetter && hasBigLetter && hasSpecialChar;
}

Is this method right? Let’s use several passwords to verify:

As you can see, all 8 passwords we listed produce the expected results, which indicates that our method works.

But for a single password rule we wrote nearly 30 lines of code. Doesn't that feel a bit cumbersome? Is there a way to simplify the code for such a simple rule? Of course! And that's where regular expressions, the star of the day, come in.

The following is a verification method based on regular expression with the same verification function:

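// Requires: import java.util.regex.Pattern;

/**
 * Verify whether the user password meets the setting rules using a regular expression.
 *
 * @param password Password entered by the user
 * @return true - meets the rules; false - does not meet the rules
 */
public static boolean checkPasswordByRegex(String password) {
    // Four lookaheads (digit, lowercase, uppercase, special character),
    // then 6-16 characters taken only from the allowed set
    return password != null && Pattern.matches(
            "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~@#$%*_\\-+=:.?])[A-Za-z0-9~@#$%*_\\-+=:.?]{6,16}$",
            password);
}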

So does it do the job? Let's call this method with the same sample data as above:

As the results show, it also meets our expectations. Without regular expressions the check took nearly 30 lines of code; with a regular expression the core of the check is condensed into a single line. In other words, regular expressions can simplify our code.
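Since the screenshots with the sample passwords are not reproduced here, a small harness along the following lines can be used to confirm that the two approaches agree. This is only a sketch: the class name PasswordCheckDemo, the compact rewrites of both checks, and the sample passwords are hypothetical, and the special-character set is assumed to be ~@#$%*_-+=:.?.

```java
import java.util.regex.Pattern;

class PasswordCheckDemo {
    // Assumed set of allowed special characters
    private static final String SPECIALS = "~@#$%*_-+=:.?";

    // Hand-written check: length 6-16 and at least one character of each kind
    static boolean checkPassword(String password) {
        if (password == null || password.length() < 6 || password.length() > 16) {
            return false;
        }
        boolean digit = false, lower = false, upper = false, special = false;
        for (char c : password.toCharArray()) {
            if (c >= '0' && c <= '9') digit = true;
            else if (c >= 'a' && c <= 'z') lower = true;
            else if (c >= 'A' && c <= 'Z') upper = true;
            else if (SPECIALS.indexOf(c) >= 0) special = true;
            else return false; // character outside the four allowed kinds
        }
        return digit && lower && upper && special;
    }

    // Regex check: one lookahead per kind, then 6-16 allowed characters
    static boolean checkPasswordByRegex(String password) {
        return password != null && Pattern.matches(
                "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~@#$%*_\\-+=:.?])"
                        + "[A-Za-z0-9~@#$%*_\\-+=:.?]{6,16}$",
                password);
    }

    public static void main(String[] args) {
        // Hypothetical sample passwords
        String[] samples = {"Abc123@", "abc123", "ABC123@x", "Ab1@", "Abc 123@", "Abcdefgh12345678@"};
        for (String s : samples) {
            System.out.println(s + " -> " + checkPassword(s) + " / " + checkPasswordByRegex(s));
        }
    }
}
```

Each line prints the result of the hand-written check followed by the result of the regex check, so any disagreement between the two is easy to spot.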

If you don't know regular expressions, you may not be able to read them, and when something goes wrong you won't be able to fix them.

So it's worth learning regular expressions, so that when a colleague writes one you are not left thinking, "What the hell is this? Why don't I get it?"

Regular expression

What is a regular expression? You probably got some idea from the example above. Yes, it is a single-line string that describes a set of rules (marked by the arrow in the red box below).

Naming conventions

The English name is Regular Expression, usually abbreviated as regex or regexp (singular) or regexps (plural); the abbreviation joins the first few letters of the two words.

For example, the Java String class has several replacement methods that support regular expressions, and their parameter is named regex.
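As a small illustration of those String methods, the sketch below calls replaceAll and replaceFirst, whose first parameter is the regex. The class and method names are hypothetical, and the regex used is just the ordinary character o.

```java
class StringRegexMethodsDemo {
    // replaceAll(String regex, String replacement): replaces every match
    static String replaceEvery(String input) {
        return input.replaceAll("o", "0");
    }

    // replaceFirst(String regex, String replacement): replaces only the first match
    static String replaceFirstOnly(String input) {
        return input.replaceFirst("o", "0");
    }

    public static void main(String[] args) {
        System.out.println(replaceEvery("foo boo"));     // f00 b00
        System.out.println(replaceFirstOnly("foo boo")); // f0o boo
    }
}
```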

Structural components

Regular expressions are usually composed of ordinary characters and metacharacters.

Ordinary characters: characters that stand only for themselves and carry no other meaning, such as uppercase and lowercase letters and digits.

Metacharacters: characters that express a meaning beyond the literal character itself.

In fact, learning regular expressions is mostly a matter of learning the metacharacters.

Usage scenarios

Now that we know what regular expressions are, where can we use them?

(1) Validating strings against a rule (for example, in the case introduction above, we used a regular expression to verify whether a password conforms to the rules).

(2) Replacing parts of a string (for example, removing all uppercase and lowercase letters from a string, or replacing them with a specified symbol).

(3) Extracting the characters we need from a string (for example, extracting all the digits in a string to form a new string).
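The three scenarios above might look roughly like this in Java. This is a sketch under assumptions: the class name, method names, and sample rules (such as "exactly six digits") are made up for illustration, and extraction is done here with Matcher.find, which is one of several possible approaches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexUseCasesDemo {
    // (1) Validation: is the whole input exactly six digits?
    static boolean isSixDigits(String input) {
        return Pattern.matches("[0-9]{6}", input);
    }

    // (2) Replacement: remove every uppercase and lowercase letter
    static String removeLetters(String input) {
        return input.replaceAll("[A-Za-z]", "");
    }

    // (3) Extraction: collect all digits into a new string
    static String extractDigits(String input) {
        StringBuilder digits = new StringBuilder();
        Matcher matcher = Pattern.compile("[0-9]").matcher(input);
        while (matcher.find()) {
            digits.append(matcher.group());
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        System.out.println(isSixDigits("123456"));             // true
        System.out.println(removeLetters("a1B2c3"));           // 123
        System.out.println(extractDigits("order 66, room 7")); // 667
    }
}
```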

Regular verification in Java

Regular expressions are mainly used to verify strings. In Java, you only need to use the following method to verify strings.

boolean result = Pattern.matches(regex, input);

Among them:

regex is the regular expression (the verification rule) we need to write;

input is the string we are checking;

result is the outcome of the verification: true means the string passes, false means it fails.
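One detail worth seeing in code is that Pattern.matches tests the whole input against the regex, not just a part of it. A minimal sketch, with a hypothetical wrapper named verify:

```java
import java.util.regex.Pattern;

class PatternMatchesDemo {
    // Thin wrapper around Pattern.matches, which checks the entire input
    static boolean verify(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(verify("hello", "hello"));       // true
        System.out.println(verify("hello", "hello world")); // false: extra characters
        System.out.println(verify("hello", "Hello"));       // false: case differs
    }
}
```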

Regular metacharacters

Regular: ordinary characters

If the regular expression consists only of ordinary characters (no metacharacters), the string being checked passes only if it is identical to the regular expression.

The specific effects are as follows:

Note: to save space and avoid clutter, the examples below no longer paste the code, only the check results.

Regular: \d

\d represents a digit.

Such as:

aaa\d: the string must start with aaa and end with one digit.

aaa\dbbb: there is one digit between aaa and bbb.

aaa\d\d: aaa is followed by two digits.

Note: in Java, \ is also the escape character inside string literals, so whenever a metacharacter containing \ is written in a Java string, the backslash must be doubled, that is, \\ (for example, \d is written as "\\d"). For other languages, refer to the relevant documentation.
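The \d examples and the doubled backslash can be tried with a sketch like the following (the class name and the matches helper are hypothetical):

```java
import java.util.regex.Pattern;

class DigitDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The Java literal "aaa\\d" is the regex aaa\d
        System.out.println(matches("aaa\\d", "aaa1"));       // true
        System.out.println(matches("aaa\\d", "aaab"));       // false
        System.out.println(matches("aaa\\dbbb", "aaa5bbb")); // true
        System.out.println(matches("aaa\\d\\d", "aaa12"));   // true
        System.out.println(matches("aaa\\d\\d", "aaa1"));    // false
    }
}
```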

Regular: \D

\D means a non-digit, which is the exact opposite of \d above.

Such as:

\D\D\D: a string of length 3 that contains no digits.

111\D222: there must be exactly one non-digit between 111 and 222.
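A quick sketch of the \D examples, using a hypothetical matches helper:

```java
import java.util.regex.Pattern;

class NonDigitDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\D\\D\\D", "abc"));   // true
        System.out.println(matches("\\D\\D\\D", "ab1"));   // false: contains a digit
        System.out.println(matches("111\\D222", "111a222")); // true
        System.out.println(matches("111\\D222", "1115222")); // false
    }
}
```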

Regular: \w

\w represents a letter (uppercase or lowercase), a digit, or an underscore.

Such as:

12\w45: there must be exactly one letter, digit, or underscore between 12 and 45.

Regular: \W

\W is the opposite of \w: the character at that position is not a letter, digit, or underscore.

That is, a special character (other than the underscore), a space, and so on.

Such as:

12\W45: there is exactly one character between 12 and 45 that is not a letter, digit, or underscore.
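The contrast between \w and \W can be seen side by side in a sketch like this one (helper and class names are hypothetical):

```java
import java.util.regex.Pattern;

class WordCharDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("12\\w45", "12a45")); // true: letter
        System.out.println(matches("12\\w45", "12_45")); // true: underscore
        System.out.println(matches("12\\w45", "12#45")); // false
        System.out.println(matches("12\\W45", "12#45")); // true: special character
        System.out.println(matches("12\\W45", "12 45")); // true: space
        System.out.println(matches("12\\W45", "12_45")); // false
    }
}
```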

Regular: \s

\s matches one invisible (whitespace) character, such as a space or a tab (the Tab key); line breaks also count.

Such as:

88\s99: 88 and 99 must be separated by exactly one space or tab.

(I won't list tab cases here because my editor replaces each tab with four spaces.)

Regular: \S

\S, as opposed to \s, represents one visible (non-whitespace) character.

Such as:

88\S99: there must be exactly one visible character between 88 and 99.
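Because Java string literals can spell a tab as \t, the tab case is easy to show in code. A minimal sketch of \s and \S with a hypothetical matches helper:

```java
import java.util.regex.Pattern;

class WhitespaceDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("88\\s99", "88 99"));  // true: space
        System.out.println(matches("88\\s99", "88\t99")); // true: tab
        System.out.println(matches("88\\s99", "8899"));   // false: nothing in between
        System.out.println(matches("88\\S99", "88a99"));  // true: visible character
        System.out.println(matches("88\\S99", "88 99"));  // false: space is not visible
    }
}
```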

Regular: .

The . (dot) represents any single character other than "\n" and "\r".

Such as:

....: any four characters.
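A short sketch of the dot, including the line-break exception (class and helper names are hypothetical):

```java
import java.util.regex.Pattern;

class DotDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("....", "abcd"));  // true
        System.out.println(matches("....", "a1#_"));  // true
        System.out.println(matches("....", "abc"));   // false: only three characters
        System.out.println(matches("....", "abc\n")); // false: . does not match \n by default
    }
}
```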

Regular: |

| (vertical bar) expresses an "or" relationship: the string being checked must match one of the alternatives to pass.

Such as:

aa|bb|cc: the input string must be one of aa, bb, or cc.

Note that if the "or" alternatives have other characters before or after them, the alternatives must be wrapped in ().

Such as:

xx(aa|bb|cc)yy: the input string starts with xx, ends with yy, and has one of aa, bb, or cc in the middle.
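The sketch below shows the alternation examples, plus what happens when the parentheses are left out: xxaa|bb|ccyy is read as three alternatives xxaa, bb, and ccyy. The helper and class names are hypothetical.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("aa|bb|cc", "bb"));           // true
        System.out.println(matches("aa|bb|cc", "ab"));           // false
        System.out.println(matches("xx(aa|bb|cc)yy", "xxbbyy")); // true
        // Without parentheses the alternatives are xxaa, bb and ccyy
        System.out.println(matches("xxaa|bb|ccyy", "xxbbyy"));   // false
        System.out.println(matches("xxaa|bb|ccyy", "bb"));       // true
    }
}
```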

Regular: [abc]

[] matches any one of the characters inside the brackets.

Such as:

a[bcd]e: the character between a and e must be b, c, or d.

Note: each alternative separated by | can be a single character or a whole string. Brackets, in contrast, always stand for exactly one character.

Regular: [^abc]

[^] matches any one character that is not inside the brackets.

Such as:

a[^bcd]e: the character between a and e can be anything except b, c, and d.
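A minimal sketch of [bcd] and [^bcd], using a hypothetical matches helper. Note that both forms still require exactly one character at that position.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("a[bcd]e", "ace"));  // true
        System.out.println(matches("a[bcd]e", "aee"));  // false: e is not in the set
        System.out.println(matches("a[bcd]e", "abce")); // false: brackets stand for one character
        System.out.println(matches("a[^bcd]e", "axe")); // true
        System.out.println(matches("a[^bcd]e", "abe")); // false
        System.out.println(matches("a[^bcd]e", "ae"));  // false: a character is still required
    }
}
```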

Regular: [a-z]

[value 1-value 2] matches any one character between value 1 and value 2 (both included). This form is often used to express ranges of uppercase letters, lowercase letters, and digits.

Such as:

a[b-d]e: the same as a[bcd]e, because b-d is actually b, c, d.

a[0-9]e: there is one digit between a and e, which is equivalent to a\de (as noted earlier, \d means a digit).

Regular: [^a-z]

[^value 1-value 2] matches any one character except those between value 1 and value 2 (both included).

Such as:

a[^1-3]e: the character between a and e can be anything except 1, 2, and 3.
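The range forms can be checked with a sketch like this (class and helper names are hypothetical):

```java
import java.util.regex.Pattern;

class RangeDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("a[b-d]e", "ace"));  // true
        System.out.println(matches("a[b-d]e", "afe"));  // false: f is outside b-d
        System.out.println(matches("a[0-9]e", "a7e"));  // true, same as a\de
        System.out.println(matches("a[^1-3]e", "a4e")); // true
        System.out.println(matches("a[^1-3]e", "a2e")); // false
    }
}
```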

Regular: \num

A \ followed by a number matches the same text that was matched by the pair of parentheses with that number.

For example, take the string abcd. If we wrap c in parentheses and write \1 at the end, that is, ab(c)d\1, then \1 refers to c, because \1 represents the result of the first pair of parentheses.

ab(c)d\1: equivalent to abcdc.

If we also wrap the d in ab(c)d\1 in parentheses and write \2 after \1, that is, ab(c)(d)\1\2, then \2 represents the character d, since the second pair of parentheses matched d. The whole expression is therefore equivalent to abcdcd.

ab(c)(d)\1\2: equivalent to abcdcd, also equivalent to ab(cd)\1.
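The following sketch tries the backreference examples. The last two lines add one case that is not in the text above: (\d)\1 shows that \1 repeats the text the group actually matched, not the group's pattern. Class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class BackreferenceDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("ab(c)d\\1", "abcdc"));        // true
        System.out.println(matches("ab(c)d\\1", "abcd"));         // false
        System.out.println(matches("ab(c)(d)\\1\\2", "abcdcd"));  // true
        System.out.println(matches("ab(cd)\\1", "abcdcd"));       // true
        // \1 must repeat the same digit that group 1 matched
        System.out.println(matches("(\\d)\\1", "11"));            // true
        System.out.println(matches("(\\d)\\1", "12"));            // false
    }
}
```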

Regular: ?

? matches the preceding subexpression zero times or once.

Such as:

abc?de: the strings that can be matched are abde (c zero times) or abcde (c once).

Regular: +

+ matches the preceding subexpression one or more times (times >= 1, that is, at least once).

Such as:

abc+de: there is at least one c between ab and de.
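A quick sketch of ? and + side by side, with a hypothetical matches helper:

```java
import java.util.regex.Pattern;

class OptionalAndPlusDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("abc?de", "abde"));    // true: c zero times
        System.out.println(matches("abc?de", "abcde"));   // true: c once
        System.out.println(matches("abc?de", "abccde"));  // false: c twice
        System.out.println(matches("abc+de", "abcde"));   // true
        System.out.println(matches("abc+de", "abcccde")); // true
        System.out.println(matches("abc+de", "abde"));    // false: at least one c needed
    }
}
```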

Regular: {n}

n here is a non-negative integer. Matches the preceding subexpression exactly n times.

Such as:

abc{3}de: there are exactly three c's between ab and de.

ab(xx|yy){3}de: between ab and de there are xx or yy groups, three in total.

Regular: {n,m}

Both m and n are non-negative integers, where n<=m. The preceding subexpression is matched at least n times and at most m times.

Such as:

abc{2,3}de: there are two or three c's between ab and de.
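The counted forms {n} and {n,m} can be sketched as follows (class and helper names are hypothetical). In Java the quantifier is written without a space after the comma.

```java
import java.util.regex.Pattern;

class CountedRepeatDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("abc{3}de", "abcccde"));          // true
        System.out.println(matches("abc{3}de", "abccde"));           // false: only two c's
        System.out.println(matches("ab(xx|yy){3}de", "abxxyyxxde")); // true: three groups
        System.out.println(matches("ab(xx|yy){3}de", "abxxyyde"));   // false: two groups
        System.out.println(matches("abc{2,3}de", "abccde"));         // true
        System.out.println(matches("abc{2,3}de", "abccccde"));       // false: four c's
    }
}
```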

Regular: *

* matches the preceding subexpression any number of times (including zero).

Such as:

abc*de: there can be any number of c's (including 0) between ab and de.
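A last small sketch for *, again with a hypothetical matches helper:

```java
import java.util.regex.Pattern;

class StarDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("abc*de", "abde"));      // true: zero c's
        System.out.println(matches("abc*de", "abcde"));     // true: one c
        System.out.println(matches("abc*de", "abcccccde")); // true: many c's
        System.out.println(matches("abc*de", "abxde"));     // false: x is not c
    }
}
```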

If this article helped you, please give the author a like. Thank you.
