Regular expression

A regular expression is a matching pattern: it describes, in a formal notation, the structure of the text we want to match.

Preface

Regular expressions are a powerful tool, especially for cleaning up text. I wrote this article after reading a lot of material, as my own introduction to the subject. It draws mainly on Lao Yao's regular expression mini book, and I am very grateful to the author for sharing it so generously. I recommend reading the original book. This article only summarizes it and adds some of my own understanding, and it is aimed at beginners.

Regular expression syntax

Regular expressions match either characters or positions. The following sections cover each in turn.

Character matching

Horizontal fuzzy matching

A quantifier specifies how many times a fragment may occur. Because the number of occurrences changes the length of the matched string, this is called horizontal fuzzy matching.

Example

var regex = /ab{2,5}c/g; // one a, then 2 to 5 b's, then one c

var string = "abc abbc abbbc abbbbc abbbbbc abbbbbbc";
console.log( string.match(regex) );

// Only strings with 2 to 5 b's in between are matched
// => ["abbc", "abbbc", "abbbbc", "abbbbbc"]

Quantifiers

A quantifier specifies how many times the preceding item must occur. Some common counts have equivalent shorthand symbols: ? is the same as {0,1}, + is the same as {1,}, and * is the same as {0,}.
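As a quick check of the shorthand symbols, the sketch below runs each one next to its brace form. The sample string is made up for illustration.

```javascript
// Made-up sample text
const sample = "ac abc abbc abbbc";

// ? is shorthand for {0,1}
console.log(sample.match(/ab?c/g));     // => ["ac", "abc"]
console.log(sample.match(/ab{0,1}c/g)); // => ["ac", "abc"]

// + is shorthand for {1,}
console.log(sample.match(/ab+c/g));     // => ["abc", "abbc", "abbbc"]
console.log(sample.match(/ab{1,}c/g));  // => ["abc", "abbc", "abbbc"]

// * is shorthand for {0,}
console.log(sample.match(/ab*c/g));     // => ["ac", "abc", "abbc", "abbbc"]
console.log(sample.match(/ab{0,}c/g));  // => ["ac", "abc", "abbc", "abbbc"]
```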

Vertical fuzzy matching

At a given position, the pattern can match any one of several possible characters.

Example

var regex = /a[123]b/g;  // a, then any one of 1, 2, or 3, then b

var string = "a0b a1b a2b a3b a4b";
console.log( string.match(regex) );

// Match result
// => ["a1b", "a2b", "a3b"]

Character groups

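Commonly used character groups have shorthand aliases: \d is [0-9], \w is [0-9a-zA-Z_], and \s is a whitespace character. \D, \W, and \S are their negations, and . matches any character except a line terminator.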
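To see a few character groups and their aliases in action, here is a small sketch. The sample string is made up for illustration.

```javascript
// Made-up sample text
const text = "a1b a_b a b a-b";

// A range inside a character group: any digit
console.log(text.match(/a[0-9]b/g)); // => ["a1b"]

// \d is shorthand for [0-9]
console.log(text.match(/a\db/g));    // => ["a1b"]

// \w is shorthand for [0-9a-zA-Z_]
console.log(text.match(/a\wb/g));    // => ["a1b", "a_b"]

// \s matches a whitespace character
console.log(text.match(/a\sb/g));    // => ["a b"]

// A leading ^ negates the group: anything except a word character
console.log(text.match(/a[^\w]b/g)); // => ["a b", "a-b"]
```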

Character matching case studies

1. Matching a date

Match a date such as 2017-06-10.

Analysis

Year: four digits, [0-9]{4}

Month: starts with 0 or 1, 0[1-9]|1[0-2]

Day: starts with 0, 1, 2, or 3, 0[1-9]|[12][0-9]|3[01]

Regex

var regex = /^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$/;
console.log( regex.test("2017-06-10") ); // => true

2. Matching an id attribute

Analysis

id=".*" is the obvious first attempt, but because * is greedy, it matches up to the last double quote.

id=".*?" uses lazy matching to solve this problem, but it is less efficient because it relies on backtracking.

id="[^"]*" is the best option.

Regex

var regex = /id="[^"]*"/;
var string = '<div id="container" class="main"></div>';
console.log(string.match(regex)[0]);
// => id="container"
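To make the difference between the three candidates concrete, the sketch below runs all of them against the same string from the example.

```javascript
const html = '<div id="container" class="main"></div>';

// Greedy: .* runs to the end and backtracks only as far as the LAST quote
console.log(html.match(/id=".*"/)[0]);
// => id="container" class="main"

// Lazy: .*? grows one character at a time until a quote follows
console.log(html.match(/id=".*?"/)[0]);
// => id="container"

// Negated character group: [^"]* can never run past a quote
console.log(html.match(/id="[^"]*"/)[0]);
// => id="container"
```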

Position matching

A position is the spot between two adjacent characters. Regular expressions provide the following anchors for matching positions.

^

Matches the beginning of the string; in multi-line mode, it matches the beginning of each line.

$

Matches the end of the string; in multi-line mode, it matches the end of each line.

For example, we can replace the beginning and end of a string with '#'.

Single line

var result = "hello".replace(/^|$/g, '#');
console.log(result);

// => "#hello#"

Multiple lines

var result = "I\nlove\njavascript".replace(/^|$/gm, '#');
console.log(result);

/*
#I#
#love#
#javascript#
*/

\b Word boundary

\b is the position between \w and \W, including the position between \w and ^ and between \w and $.

var result = "[JS] Lesson_01.mp4".replace(/\b/g, '#');
console.log(result);
// => "[#JS#] #Lesson_01#.#mp4#"

\B Non-word boundary

\B is the opposite of \b: every position that is not a word boundary.

var result = "[JS] Lesson_01.mp4".replace(/\B/g, '#');
console.log(result);
// => "#[J#S]# L#e#s#s#o#n#_#0#1.m#p#4"

(?=p) Positive lookahead

For example, (?=l) matches the position in front of an "l" character. Here p is a subpattern, and the characters it matches are not part of the match.

var result = "hello".replace(/(?=l)/g, '#');
console.log(result);
// => "he#l#lo"

(?!p) Negative lookahead

The opposite of (?=p): every position that is not in front of a match for p.

var result = "hello".replace(/(?!l)/g, '#');
console.log(result);
// => "#h#ell#o#"

(?<=p) Positive lookbehind

Matches a position that is immediately preceded by a match for p; the characters p matches are not part of the match.

var result = "hello".replace(/(?<=l)/g, '#');
console.log(result);
// => "hel#l#o"

(?<!p) Negative lookbehind

The opposite of (?<=p): every position that is not immediately preceded by a match for p.

var result = "hello".replace(/(?<!l)/g, '#');
console.log(result);
// => "#h#e#llo#"

Position matching case studies

1. Thousands separators

For example, change "12345678" to "12,345,678".

Analysis

A comma belongs at every position that is followed by one or more complete groups of three digits running to the end of the string. A lookahead expresses this: (?=(\d{3})+$)

Regex

var result = "12345678".replace(/(?=(\d{3})+$)/g, ',');
console.log(result);
// => "12,345,678"

// If the number of digits is a multiple of 3, the result starts with a comma
var result = "112345678".replace(/(?=(\d{3})+$)/g, ',');
console.log(result);
// => ",112,345,678"

// Require that the position is not the beginning of the string
var regex = /(?!^)(?=(\d{3})+$)/g;
result = "123456789".replace(regex, ',');
console.log(result);
// => "123,456,789"
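The same idea can be stretched to a string that holds several numbers. The sketch below is one possible variation, assuming the numbers are plain integers separated by spaces: ^ and $ are swapped for word boundaries so each number is handled on its own. The input string is made up for illustration.

```javascript
// Made-up input containing two integers separated by a space
const amounts = "12345678 123456789";

// \B rules out the start of each number; \b marks the end of each number
const formatted = amounts.replace(/\B(?=(\d{3})+\b)/g, ",");

console.log(formatted);
// => "12,345,678 123,456,789"
```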

2. Implementing the string trim method

Analysis

trim removes the whitespace at the beginning and end of a string.

Regex

function trim(str) {
  return str.replace(/^\s+|\s+$/g, '');
}
console.log(trim(' foobar '));
// => "foobar"

This implements trim, which removes whitespace from both ends. How would you implement the newer trimStart and trimEnd? Give it some thought.
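One possible answer, as a minimal sketch: each method keeps only one branch of the trim regex. The function names trimStartSketch and trimEndSketch are hypothetical, chosen so they do not clash with the built-in methods.

```javascript
// Hypothetical helper names, chosen to avoid clashing with the built-ins
function trimStartSketch(str) {
  // Only the leading-whitespace branch of the trim regex
  return str.replace(/^\s+/, '');
}

function trimEndSketch(str) {
  // Only the trailing-whitespace branch of the trim regex
  return str.replace(/\s+$/, '');
}

// JSON.stringify makes the remaining spaces visible in the output
console.log(JSON.stringify(trimStartSketch('  foobar  '))); // => "foobar  "
console.log(JSON.stringify(trimEndSketch('  foobar  ')));   // => "  foobar"
```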

Escaping metacharacters

Some symbols have a special meaning in a regex. For example, ^ matches the beginning of the string. To match a literal ^ character, the symbol has to be escaped with a backslash.

Regular expression metacharacters

^, $, ., *, +, ?, |, \, /, (, ), [, ], {, }, =, !, :, -
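A short sketch of escaping in practice. The helper escapeForRegex is hypothetical (it is not a built-in); it simply puts a backslash in front of every metacharacter in the list above, and it assumes the resulting pattern is used without the u modifier.

```javascript
// A backslash turns a metacharacter into a literal character
console.log(/\^/.test("2^10"));   // => true
console.log("a.b.c".split(/\./)); // => ["a", "b", "c"]

// Inside a string passed to the RegExp constructor the backslash itself
// must be doubled, because the string literal consumes one level
console.log(new RegExp("\\^").test("2^10")); // => true

// Hypothetical helper: escape every metacharacter in arbitrary text
function escapeForRegex(text) {
  return text.replace(/[\^$.*+?|\\\/()[\]{}=!:-]/g, "\\$&");
}

console.log(escapeForRegex("1+1=2")); // => 1\+1\=2
console.log(new RegExp(escapeForRegex("(a|b)")).test("x(a|b)y")); // => true
```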

Matching behavior: greedy matching and lazy matching

Matching behavior is determined by the state machine that the regex engine builds from the pattern. The two common kinds are DFA and NFA; NFA is the more widespread, and JavaScript's regex engine is an NFA implementation. An NFA engine tries the possible ways to match one by one, and when the next step fails it returns to an earlier state and tries another path, much like walking through a labyrinth while unwinding a ball of thread. This is called backtracking, and intuitively it hurts efficiency. In JavaScript regexes, backtracking commonly arises from greedy quantifiers, lazy quantifiers, and alternation, which are introduced in turn below.

Greedy matching

Matches as much as possible

var regex = /\d{2,5}/g;
var string = "123, 1234, 12345, 123456";
console.log( string.match(regex) );
// => ["123", "1234", "12345", "12345"]

In the example, each run of digits is matched as far as possible, up to the limit of 5 digits.

Lazy matching

Matches as little as possible

var regex = /\d{2,5}?/g;
var string = "123, 1234, 12345, 123456";
console.log( string.match(regex) );
// => ["12", "12", "34", "12", "34", "12", "34", "56"]

In the example, adding ? after the quantifier makes it lazy, so each match stops as soon as it has 2 digits.

Alternation

Alternation is also lazy: the branches are tried from left to right, and once one branch matches, the remaining branches are not tried. The form is (p1|p2|p3), where p1, p2, and p3 are subpatterns separated by | (pipe), and any one of them may match.

var regex = /good|nice/g;
var string = "good idea, nice try.";
console.log( string.match(regex) );
// => ["good", "nice"]
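The laziness of alternation shows up when one branch is a prefix of another. A small sketch:

```javascript
// The first branch that succeeds wins, even though a later branch is longer
console.log("goodbye".match(/good|goodbye/g)); // => ["good"]

// Reordering the branches changes the result
console.log("goodbye".match(/goodbye|good/g)); // => ["goodbye"]
```

So when branches overlap, putting the longer one first is usually what you want.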

Parentheses

In many programming languages, parentheses are most commonly used to express precedence. What special uses do they have in regular expressions?

Grouping

/ab+/ matches a followed by at least one b, whereas /(ab)+/ treats ab as a unit and matches ab repeated at least once.

Branching structure

Parentheses delimit the scope of an alternation: (p1|p2) matches either the subpattern p1 or p2.

var regex = /^I love (JavaScript|Regular Expression)$/;
console.log( regex.test("I love JavaScript") );
// => true
console.log( regex.test("I love Regular Expression") );
// => true

Group references

Parentheses capture the text they match so that it can be referenced later.

1. Extracting data

In certain scenarios, the text matched inside the parentheses can be referenced directly.

Extract the year, month, and day

var regex = /(\d{4})-(\d{2})-(\d{2})/g;
var string = "2017-06-12";
console.log( string.match(regex) );
console.log( RegExp.$1 ); // 2017
console.log( RegExp.$2 ); // 06
console.log( RegExp.$3 ); // 12
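RegExp.$1 and friends are legacy static properties, so here is a sketch of two alternatives that read the groups from the match result itself. The variable names are my own; named groups need an ES2018 engine.

```javascript
const dateRegex = /(\d{4})-(\d{2})-(\d{2})/;
const matched = "2017-06-12".match(dateRegex);

// Without the g modifier, the groups follow the full match in the result array
console.log(matched[1]); // => 2017
console.log(matched[2]); // => 06
console.log(matched[3]); // => 12

// Named groups (ES2018) give each capture a label
const named = "2017-06-12".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(named.groups.year, named.groups.month, named.groups.day);
// => 2017 06 12
```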

2. Replacement

Reformatting a date

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
var result = string.replace(regex, "$2/$3/$1");
console.log(result);
// => "06/12/2017"

Backreferences

A backreference refers, inside the regex itself, to what an earlier group matched.

This is useful when a later part of the pattern must repeat an earlier match. The following example requires the date separators to be consistent.

// \1 stands for whatever the first group matched
var regex = /\d{4}(-|\/|\.)\d{2}\1\d{2}/;
var string1 = "2017-06-12";
var string2 = "2017/06/12";
var string3 = "2017.06.12";
var string4 = "2016-06/12";
console.log( regex.test(string1) ); // true
console.log( regex.test(string2) ); // true
console.log( regex.test(string3) ); // true
console.log( regex.test(string4) ); // false

Non-capturing parentheses

(?:p) groups a subpattern without capturing what it matches.

var regex = /(?:ab)+/g;
var string = "ababa abbb ababab";
console.log( string.match(regex) );
// => ["abab", "ab", "ababab"]
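The difference is easiest to see in the match result of a non-global regex. A small sketch, with variable names of my own:

```javascript
// Capturing group: the last repetition is stored at index 1
const captured = "abab".match(/(ab)+/);
console.log(captured.length); // => 2
console.log(captured[1]);     // => ab

// Non-capturing group: only the full match is returned
const uncaptured = "abab".match(/(?:ab)+/);
console.log(uncaptured.length); // => 1
console.log(uncaptured[0]);     // => abab
```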

Parentheses examples

// Camel case
function camelize (str) {
  return str.replace(/[-_\s]+(.)?/g, function (match, c) {
    return c ? c.toUpperCase() : '';
  });
}
console.log( camelize('-moz-transform') );
// => "MozTransform"

// Dash case
function dasherize (str) {
  return str.replace(/([A-Z])/g, '-$1').replace(/[-_\s]+/g, '-').toLowerCase();
}
console.log( dasherize('MozTransform') );
// => "-moz-transform"

Operator precedence

Regular expression visualization

Although regexes have precedence rules, a complex pattern can still be hard to read. Visualization tools help here: a regular expression visualization website can diagram a regex you do not understand and make it easier to read.

Regular expression modifiers

Modifier  Description
g         Global matching: find all matches
y         Sticky matching: like global matching, but each match must begin exactly where the previous one ended
i         Ignore letter case
m         Multi-line matching: ^ and $ match at the start and end of each line

g

By default, matching stops at the first match. Adding the g modifier finds every match in the string, as the following example shows.

var regex = /\d/;
var string = "123";
console.log( string.match(regex) );  // [ '1', index: 0, input: '123' ]

var regex = /\d/g;
var string = "123";
console.log( string.match(regex) );  // ['1', '2', '3']

y

Like g, the y modifier finds successive matches, but with one extra rule: each match must start exactly where the previous match ended. Take a look at the following example.

var s = 'aaa_aa_a';
var r1 = /a+/g;
var r2 = /a+/y;

r1.exec(s) // ["aaa"]
r2.exec(s) // ["aaa"]

r1.exec(s) // ["aa"]
r2.exec(s) // null
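Why the second sticky exec returns null becomes clearer when lastIndex is printed. The sketch below reuses the same string; the variable names are my own.

```javascript
const s = 'aaa_aa_a';
const sticky = /a+/y;

sticky.exec(s);
// The first match ended at index 3, where the next character is "_"
console.log(sticky.lastIndex); // => 3
// y requires the match to begin exactly at lastIndex, so this fails
console.log(sticky.exec(s));   // => null

// Including the separator in the pattern lets consecutive matches line up
const stickyWithSep = /a+_/y;
console.log(stickyWithSep.exec(s)[0]); // => aaa_
console.log(stickyWithSep.exec(s)[0]); // => aa_
```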

i

The i modifier is simple: letter case is ignored when matching.

var regex = /A/i;
var string = "a";
console.log( string.match(regex) ); // [ 'a', index: 0, input: 'a' ]

var regex = /A/i;
var string = "A";
console.log( string.match(regex) ); // [ 'A', index: 0, input: 'A' ]

m

The m modifier makes ^ and $ match at the beginning and end of each line, not only at the beginning and end of the whole string.

var regex = /^A$/g;
var string = "A\nA";
console.log( string.match(regex) ); // null

var regex = /^A$/mg;
var string = "A\nA";
console.log( string.match(regex) ); // [ 'A', 'A' ]

Regular expression programming

JavaScript provides several methods for working with regular expressions, which are described below.

The basic API: exec

exec is the most basic API in regular expression programming. It can both match and iterate over a string, and the later APIs can be understood as wrappers around exec for specific scenarios. By default, it returns the first match.

/**
 * Executes the regex against a string and returns the match
 * @param string the string to search
 * @return {RegExpExecArray | null} the match array, or null if nothing matches
 */
exec(string: string): RegExpExecArray | null;

let reg = /\d/g;
let s = "123456";
console.log(reg.exec(s)); // [ '1', index: 0, input: '123456' ]
console.log(reg.exec(s)); // [ '2', index: 1, input: '123456' ]
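The iterating ability of exec comes from lastIndex: with the g modifier, each call resumes where the previous match ended. A minimal sketch that collects every match this way (the input string and variable names are made up):

```javascript
const digits = /\d/g;
const input = "a1b22c";
const found = [];

let m;
// With the g modifier, exec resumes from lastIndex and returns null at the end
while ((m = digits.exec(input)) !== null) {
  found.push({ text: m[0], index: m.index });
}

console.log(found.map(function (item) { return item.text + "@" + item.index; }));
// => ["1@1", "2@3", "2@4"]

// lastIndex is reset to 0 once exec has returned null
console.log(digits.lastIndex); // => 0
```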

Validation

Validation checks whether the target string contains a substring that matches. The most common method is test, which returns a Boolean indicating the result.

/**
 * Returns a Boolean indicating whether the given string contains a match for the regex
 * @param string the string being tested
 * @return {boolean} whether there is a matching substring
 */
test(string: string): boolean;

Conceptually, it behaves like the following code

// exec returns a match array when there is at least one match, otherwise null
RegExp.prototype.test = function (str) {
  return !!this.exec(str)
}

let reg = /\d/;
let s = "123456"
console.log(reg.test(s)); // true

Example

var regex = /\d/;
var string = "abc123";
console.log( regex.test(string) ); // true

Splitting

Split the string at each position where the separator matches

/**
 * Splits the string by the given separator and returns the array of substrings
 * @param separator the separator, either a string or a regular expression
 * @param limit the maximum length of the returned array
 * @return {string[]} the array of split strings
 */
split(separator: string | RegExp, limit?: number): string[];

Conceptually, for a regex separator without capture groups, it behaves like the following simplified code

String.prototype.split = function (reg, limit) {
  // this is a String object and needs to be unwrapped
  let str = this.valueOf()
  // exec only advances lastIndex when the g modifier is present
  let globalReg = new RegExp(reg.source, reg.flags.includes('g') ? reg.flags : reg.flags + 'g')
  let splitArr = []
  // End position of the previous separator
  let lastEnd = 0
  let result
  while ((result = globalReg.exec(str)) !== null) {
    // An empty match would never advance, so step over it
    if (result[0] === '') { globalReg.lastIndex++; continue }
    // The characters between the previous separator and this one
    splitArr.push(str.substring(lastEnd, result.index))
    lastEnd = result.index + result[0].length
  }
  // The remainder after the last separator
  splitArr.push(str.substring(lastEnd))
  return limit === undefined ? splitArr : splitArr.slice(0, limit)
}

Example

var regex = /,/;
var string = "html,css,javascript";
console.log( string.split(regex) );
// => ["html", "css", "javascript"]

var regex = /,/;
var string = "html,css,javascript";
console.log( string.split(regex, 1) );
// => ["html"]

Points to note

1. When the regex used for splitting contains capturing parentheses, the text they capture is included in the result

var string = "html,css,javascript";
console.log( string.split(/(,)/) );
// => ["html", ",", "css", ",", "javascript"]

Extraction

Extraction pulls data out of the string after matching. The most commonly used method is match.

/**
 * Matches the string against the given regex and returns an array of matches, or null if there is none
 * @param regexp a string or a regex object
 * @return {RegExpMatchArray | null} the match result array, or null
 */
match(regexp: string | RegExp): RegExpMatchArray | null;

Conceptually, it behaves like the following code

var string = "2017.06.27";
var regex1 = /\b(\d+)\b/;
var regex2 = /\b(\d+)\b/g;

String.prototype.match = function (reg) {
  let str = this.valueOf()
  // Whether the g modifier is present
  let isGlobal = reg.global
  let result = []
  let curMatch = null
  if (isGlobal) {
    // Start from the beginning and collect every full match
    reg.lastIndex = 0
    while ((curMatch = reg.exec(str)) !== null) {
      result.push(curMatch[0])
      // An empty match would never advance, so step over it
      if (curMatch[0] === '') reg.lastIndex++
    }
    // No match at all yields null, as with the built-in method
    if (result.length === 0) result = null
  } else {
    // Return the exec result for the first match
    result = reg.exec(str)
  }
  return result
}
console.log(string.match(regex1)); // ['2017', '2017', index: 0, input: '2017.06.27']
console.log(string.match(regex2)); // ['2017', '06', '27']

Example

var regex = /^(\d{4})\D(\d{2})\D(\d{2})$/;
var string = "2017-06-26";
console.log( string.match(regex) );
// => ["2017-06-26", "2017", "06", "26", index: 0, input: "2017-06-26"]

Points to note

1. match converts a string argument into a regex

var string = "2017.06.27";
console.log( string.match(".") );
// => ["2", index: 0, input: "2017.06.27"]
// Use one of the following forms instead
console.log( string.match("\\.") );
console.log( string.match(/\./) );
// => [".", index: 4, input: "2017.06.27"]
// => [".", index: 4, input: "2017.06.27"]

2. The return value of match

var string = "2017.06.27";
var regex1 = /\b(\d+)\b/;
var regex2 = /\b(\d+)\b/g;
console.log( string.match(regex1) );
// => ["2017", "2017", index: 0, input: "2017.06.27"]
// Without g: the first full match, the contents of its capture groups,
// the index of the match, and the input string

console.log( string.match(regex2) );
// => ["2017", "06", "27"]
// With g: all full matches

Replacement

Replace the matched text with something else, using replace

/**
 * Replaces the matched text with the replacement string
 * @param searchValue the string or regex to match
 * @param replaceValue the replacement string
 * @return {string} the string after replacement
 */
replace(searchValue: string | RegExp, replaceValue: string): string;

/**
 * Replaces the matched text with the return value of the replacer function
 * @param searchValue the string or regex to match
 * @param replacer the function that produces the replacement
 * @return {string} the string after replacement
 */
replace(searchValue: string | RegExp, replacer: (substring: string, ...args: any[]) => string): string;

replace is likewise an API for a specific scenario. For a global regex without capture groups and a plain replacement string, it can be understood as the following simplified code

var string = "2017.06.27";
var regex2 = /\d+/g;

String.prototype.replace = function (reg, replaceValue) {
  // this is a String object; valueOf retrieves the primitive string
  let str = this.valueOf()
  // First split the string with the regex
  let strArr = str.split(reg)
  // Then join the pieces with replaceValue
  str = strArr.join(replaceValue)
  return str
}
console.log(string.replace(regex2, "1")); // => "1.1.1"

Example

var string = "2017-06-26";
var today = new Date( string.replace(/-/g, "/") );
console.log( today );
// => Mon Jun 26 2017 00:00:00 GMT+0800

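Points to note

1. When the second argument is a string, the following sequences in it have a special meaning: $1 through $99 insert the text of the corresponding capture group, $& inserts the matched substring, $` inserts the text before the match, $' inserts the text after the match, and $$ inserts a literal dollar sign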
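A short sketch of the special sequences that a replacement string understands: $1 to $99 for capture groups, $& for the match, $` and $' for the text before and after the match, and $$ for a literal dollar sign. The sample strings are made up for illustration.

```javascript
const date = "2017-06-12";
const re = /(\d{4})-(\d{2})-(\d{2})/;

// $1, $2, $3: the text of the capture groups
console.log(date.replace(re, "$2/$3/$1")); // => 06/12/2017

// $&: the matched substring
console.log(date.replace(re, "[$&]"));     // => [2017-06-12]

// $`: the text before the match
console.log("a-b".replace(/-/, "$`"));     // => aab

// $': the text after the match
console.log("a-b".replace(/-/, "$'"));     // => abb

// $$: a literal dollar sign
console.log("price: 5".replace(/\d/, "$$$&")); // => price: $5
```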

2. When the second argument is a function, it is called with the following arguments

// From left to right: the matched substring, the capture groups, the start index of the match, and the input string
[match, $1, $2, index, input]

let a = "1234 2345 3456".replace(/(\d)\d{2}(\d)/g, function (match, $1, $2, index, input) {
  console.log([match, $1, $2, index, input]);
});
// => ["1234", "1", "4", 0, "1234 2345 3456"]
// => ["2345", "2", "5", 5, "1234 2345 3456"]
// => ["3456", "3", "6", 10, "1234 2345 3456"]

3. replace calls can be nested

Some requirements call for several passes of regex processing, for example finding a whole and then replacing only part of it, to limit the scope of the change. We can first match a substring with replace and then narrow the scope again inside the replacer.

let domStr = `
      
`
// Match style="..."
let styleRegex = /style="[^"]*"/g
let result = domStr.replace(styleRegex, function (style) {
  console.log(style);
  let isoffFontFamily = /font\-family\:([^;])*(Times New Roman)+([^;"])*/;
  return style.replace(isoffFontFamily, "");
})
console.log(result);
// <div style="" class="333">
//
//
// </div>
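Here is a self-contained sketch of the nested approach. The markup and both regexes are made up for illustration and are not the ones from the example above: the outer replace isolates each style attribute, and the inner replace removes a font-family declaration only inside it.

```javascript
// Hypothetical markup, made up for this sketch
const html = '<p style="color: red; font-family: Times New Roman;" title="font-family: x">text</p>';

// Outer pass: pick out each style="..." attribute
const styleAttr = /style="[^"]*"/g;
// Inner pass: a font-family declaration, up to and including its semicolon
const fontFamily = /\s*font-family\s*:[^;"]*;?/g;

const cleaned = html.replace(styleAttr, function (style) {
  // Only the matched attribute text is rewritten here
  return style.replace(fontFamily, '');
});

console.log(cleaned);
// => <p style="color: red;" title="font-family: x">text</p>
```

The title attribute also contains the text font-family, but it is left alone because the inner replace only ever sees the style attribute.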

Constructors and instances

1. Avoid the constructor for creating regular expressions. Prefer literals, unless the regex needs to be generated dynamically

2. Each modifier has a corresponding Boolean instance property that indicates whether it is enabled

Modifier  Instance property
g         global
y         sticky
i         ignoreCase
m         multiline

3. Use the source instance property to inspect a dynamically built regex

var className = "high";
var regex = new RegExp("(^|\\s)" + className + "(\\s|$)");
console.log( regex.source )
// => (^|\s)high(\s|$), built from the string "(^|\\s)high(\\s|$)"
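The dynamically built pattern can be wrapped into a small helper, and the modifier properties can be read straight off an instance. The function hasClass is a hypothetical name, and the sketch assumes className contains no regex metacharacters.

```javascript
// Hypothetical helper built on the dynamically generated pattern
// Assumes className contains no regex metacharacters
function hasClass(classAttr, className) {
  const regex = new RegExp("(^|\\s)" + className + "(\\s|$)");
  return regex.test(classAttr);
}

console.log(hasClass("btn high active", "high")); // => true
console.log(hasClass("btn higher", "high"));      // => false

// The second constructor argument sets the modifiers,
// and each modifier is reflected in a Boolean property
const r = new RegExp("a", "gi");
console.log(r.global, r.ignoreCase, r.multiline, r.sticky);
// => true true false false
```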

Building regular expressions

How to build a regular expression for a given problem

Guidelines

1. Check whether a regex is necessary

After learning something new, it is easy to fall into the trap of using it for everything, and regexes are no exception. In many cases the string API solves the problem without a regex. Here is an example.

var string = "2017-07-01";
var regex = /^(\d{4})-(\d{2})-(\d{2})/;
console.log( string.match(regex) );
// => ["2017-07-01", "2017", "07", "01", index: 0, input: "2017-07-01"]

var result = string.split("-");
console.log( result );
// => ["2017", "07", "01"]

In this year, month, and day example, using a regex to extract the parts adds complexity compared with simply splitting on the separator.

2. Whether an exact match is necessary

Because regexes get complicated quickly, a fully strict match is often hard to write. Consider the actual scenario and stop at good enough, or preprocess the string first to make the matching easier.

3. Efficiency

1. Use specific character groups instead of wildcards to reduce backtracking

/.*/ can match any characters, but because it is greedy it easily leads to backtracking. Take matching "abc" in the string below with /".*"/

123"abc"456

During this match the engine backtracks 4 times. Backtracking needs extra space to save earlier states, and it intuitively hurts efficiency. Using /"[^"]*"/ avoids it.

2. Pin down characters that are certain to appear, to speed up the match decision

/a+/
// If the character a is certain to appear, the regex can be rewritten as follows to speed up the match decision
/aa*/

3. Extract the common part of alternation branches to reduce repeated work during matching

/^abc|^def/
// can be changed to
/^(?:abc|def)/
// Alternation is lazy: when one branch fails, the next is tried, and a shared prefix would otherwise be matched again for each branch

4. Use non-capturing parentheses

When the content in parentheses does not need to be captured, use non-capturing parentheses to save the memory that would otherwise hold the captured content

Practice

1. Quickly find all the regexes in the Vue source code

Attempt 1

/\/.*\//

// This also matches comments, paths, and domain names

// /**/

Result

Attempt 2

/\/(\S)+\//

// Requires at least one non-whitespace character between the slashes

// /**/

Result

That is far fewer hits. It still matches things that are not regexes, such as /GGG/, which is probably just part of a path, but the number left to check by hand is acceptable.

Conclusion

Regular expressions can get very complicated. This article only collects what I learned from reading about them; to really use them well, you still need plenty of practice, and they become much handier once you process text with them regularly. I wrote it because I ran into a business requirement that involved complex text processing and had to learn regexes for it. I also hope that, having seen the general shape of regular expressions, you will no longer be afraid of them and will be able to read them and write some simple ones. Finally, I am no expert, just someone who sometimes gets curious and wants to learn more, so if you really ask me how to write a particular regex, I can only slip away (crossed out).