Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems. (Jamie Zawinski)

For me, learning regular expressions has been a cycle of learning and forgetting. I can tell when a problem calls for a regular expression, but each time I have to look up the syntax and work it out all over again. This article reviews regular expressions to deepen my understanding and help them stick.

What is a regular expression

A regular expression is a tool for matching strings. Whenever we need to find text that follows a particular pattern, a regular expression is a candidate. I still remember searching for files on Windows as a child: I would type *.exe to find games on the computer, unknowingly using * to match any sequence of characters. (Strictly speaking, that * is a shell wildcard rather than a regular expression, but the idea of describing text with a pattern is the same.)

Imagine we need to extract the groups of digits from the following string, reproducing what String.prototype.split('-') would give us

0101-1232-2123

We need three steps

  1. First match the pattern: three groups of four consecutive digits, separated by -.
/\d{4}\-\d{4}\-\d{4}/.test('0101-1232-2123')
  2. Extract the three groups of digits
/(\d{4})\-(\d{4})\-(\d{4})/.exec('0101-1232-2123')
  3. Convert the result to an array
const result = Array.from(/(\d{4})\-(\d{4})\-(\d{4})/.exec('0101-1232-2123')).slice(1);
// ["0101", "1232", "2123"]
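As a quick sanity check, the three steps can be wrapped in a small helper and compared with split. This is only a sketch: the name extractGroups is hypothetical, and the ^ and $ anchors are an added assumption so that input with the wrong shape is rejected instead of partially matched.

```javascript
// Hypothetical helper: returns the three 4-digit groups, or null when the
// input does not look like dddd-dddd-dddd.
function extractGroups(input) {
  const match = /^(\d{4})-(\d{4})-(\d{4})$/.exec(input);
  // exec returns null when nothing matches, so guard before slicing.
  return match ? match.slice(1) : null;
}

console.log(extractGroups('0101-1232-2123')); // [ '0101', '1232', '2123' ]
console.log('0101-1232-2123'.split('-'));     // [ '0101', '1232', '2123' ]
console.log(extractGroups('0101-12-2123'));   // null
```

Unlike split, the regular expression also validates the shape of the input, which is why the last call returns null.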

With that in mind, let’s extract a time, such as 8:17:23

An hour such as 6 o'clock can be written as 06 or 6. The hour may also start with 1, such as 10 o'clock, or with 2, such as 21 o'clock. We use the alternation operator | to list each of these forms, which gives the following pattern for the hour (the 8 in our example)

/0[0-9]|1[0-9]|2[0-3]|[0-9]/

Once the pattern matches, we use () to capture the matched characters, and ^ and $ to anchor the pattern to the start and end of the string

const reg = /^(0[0-9]|1[0-9]|2[0-3]|[0-9])\:(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9])\:(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9])$/;

Finally, we convert it to an array

Array.from(reg.exec('8:17:23')).slice(1)

// ["8", "17", "23"]
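The long alternation can be shortened with character classes and the optional quantifier ?. The sketch below is one possible compact form that should accept the same strings; the names timeReg and parseTime are hypothetical, and converting the captures to numbers is an added convenience rather than part of the original steps.

```javascript
// [01]?\d covers 0-9, 00-09 and 10-19; 2[0-3] covers 20-23.
// [0-5]?\d covers 0-9 and 00-59 for minutes and seconds.
const timeReg = /^([01]?\d|2[0-3]):([0-5]?\d):([0-5]?\d)$/;

// Hypothetical helper: returns [hours, minutes, seconds] as numbers,
// or null when the text is not a valid time.
function parseTime(text) {
  const match = timeReg.exec(text);
  return match ? match.slice(1).map(Number) : null;
}

console.log(parseTime('8:17:23'));  // [ 8, 17, 23 ]
console.log(parseTime('23:59:59')); // [ 23, 59, 59 ]
console.log(parseTime('24:00:00')); // null
```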

Positional results like this are not very convenient to use. ES2018 added named capture groups, so we can read the seconds directly as result.groups.s to obtain 23. To name a group, prefix its contents with ?<name>

const reg = /^(?<h>0[0-9]|1[0-9]|2[0-3]|[0-9])\:(?<m>0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9])\:(?<s>0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9])$/;

const result = reg.exec('8:17:23')

result.groups // {h: "8", m: "17", s: "23"}

The named captures are collected into an object, so each one can be accessed directly, for example result.groups.h.
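Named groups also combine well with destructuring and with the $<name> placeholder in replacement strings. The following is a small sketch on a date string; the pattern dateReg and the sample date are illustrative assumptions, not part of the time example.

```javascript
// Illustrative pattern with three named groups.
const dateReg = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// Destructure the groups object instead of indexing into the match array.
const { year, month, day } = dateReg.exec('2018-06-25').groups;
console.log(year, month, day); // 2018 06 25

// In a replacement string, $<name> refers to a named group.
const reordered = '2018-06-25'.replace(dateReg, '$<day>/$<month>/$<year>');
console.log(reordered); // 25/06/2018
```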

Besides exec, we can also use the match method to extract text. For example, to extract the {{var}} placeholders from the following string

var str = `name: {{name}}, age: {{age}}, sex: {{sex}}`

str.match(/\{\{\w+\}\}/g) // output: ["{{name}}", "{{age}}", "{{sex}}"]
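Note that match with the g flag returns only the full matches and drops the captured groups. If only the names inside the braces are wanted, one option is matchAll (available since ES2020), sketched below. The variable names template and names are hypothetical.

```javascript
const template = 'name: {{name}}, age: {{age}}, sex: {{sex}}';

// matchAll yields one match object per occurrence, each with its groups,
// so m[1] is the text captured by (\w+).
const names = Array.from(template.matchAll(/\{\{(\w+)\}\}/g), m => m[1]);

console.log(names); // [ 'name', 'age', 'sex' ]
```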

A template engine

Template engines play an important role in web frameworks, and new ones keep appearing: Pug, Handlebars, EJS, and so on. A template engine lets a string carry logic. Take the template I am {name}, and {age} years old. How do we replace {name} and {age} with real values?

function tpl2string(tpl, data){
  let reg = /(\{\w+\})/g
  // The callback receives the whole match first, then each captured group
  return tpl.replace(reg, function(){
    // arguments[1] is the captured "{key}"; slice(1, -1) strips the braces
    return data[arguments[1].slice(1, -1)]
  })
}

tpl2string('I am {name} and {age} years old', {
  name: 'max', age: '15'
})

// output: "I am max and 15 years old"
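One thing the function above does not handle is a placeholder with no matching key in data, which would be rendered as the text "undefined". A possible variation, sketched here under the assumption that unknown placeholders should simply be left untouched, captures only the key and checks for it first. The name render is hypothetical.

```javascript
// Hypothetical variant: capture the key without the braces and keep the
// original placeholder when the key is missing from data.
function render(tpl, data) {
  return tpl.replace(/\{(\w+)\}/g, (placeholder, key) =>
    Object.prototype.hasOwnProperty.call(data, key)
      ? String(data[key])
      : placeholder
  );
}

console.log(render('I am {name} and {age} years old', { name: 'max', age: 15 }));
// I am max and 15 years old
console.log(render('I am {name} and {age} years old', { name: 'max' }));
// I am max and {age} years old
```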

Matching behavior

A regular expression defaults to greedy matching, which means matching as many characters as possible. Here’s an example:

The sentence your time is limited contains 3 spaces. Suppose we want to match the text before a space

/.+\s/.exec('your time is limited')[0]

// "your time is "

By default it matches your time is, everything up to the last space. If we do not want the match to be greedy and only want your, the text before the first space, we add ? after the quantifier to make it non-greedy (lazy)

/.+?\s/.exec('your time is limited')[0]

// "your "
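The difference matters most when a pattern has a clear closing delimiter. The sketch below contrasts the two modes on a made-up HTML fragment; the variable names are illustrative.

```javascript
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ runs to the end of the string, then backtracks to the LAST >.
const greedy = html.match(/<.+>/)[0];

// Lazy: .+? stops at the FIRST > that lets the pattern succeed.
const lazy = html.match(/<.+?>/)[0];

console.log(greedy); // <b>bold</b> and <i>italic</i>
console.log(lazy);   // <b>
```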

Next, let's restrict what a match may be followed by. Suppose we want to extract netlify, the domain name right before .com, from the blog URL https://evle.netlify.com/netlify-usage.

const s = `https://evle.netlify.com/netlify-usage`
/(\w+)\.com/.exec(s)[1]

// output: netlify

We can be more precise with a positive lookahead: the matched string must be followed by .com, but .com itself is not part of the match. The syntax is x(?=y).

/\w+(?=\.com)/.exec(s)[0] // output: netlify

We can also match the digits before and after the decimal point in 3.1415926. The part after the point uses a negative lookahead, written x(?!y)

/\d+(?=\.)/.exec("3.1415926")[0]  // before the point: "3"
/\d+(?!\.)/.exec("3.1415926")[0]  // after the point: "1415926", using the syntax x(?!y)
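Because lookaheads match a position without consuming characters, they are handy for inserting text at specific places. A well-known use is adding thousands separators, sketched below. The helper name formatThousands is hypothetical, and the sketch assumes the input is a string of digits with no decimal part.

```javascript
// Hypothetical helper for strings of digits only.
function formatThousands(digits) {
  // \B            : a position that is not at a word boundary (not the start)
  // (?=(\d{3})+   : followed by one or more complete groups of three digits
  // (?!\d))       : and those groups run exactly to the end of the number
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

console.log(formatThousands('1234567')); // 1,234,567
console.log(formatThousands('1000'));    // 1,000
console.log(formatThousands('999'));     // 999
```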

Replace

Replacing a regular expression match with a specified character is a common scenario, such as

'papa'.replace(/p/g, 'm'); // output: mama

For example, to swap the positions of two words, we can refer to the captured groups as $1 and $2 in the replacement string

console.log(
  "Liskov, Barbara\nMcCarthy, John\nWadler, Philip"
    .replace(/(\w+), (\w+)/g, "$2 $1"));

//output:
// Barbara Liskov
// John McCarthy
// Philip Wadler

We can also use functions to add some logic to the substitution, as we did in the template engine earlier

let stock = "1 lemon, 2 cabbages, and 101 eggs";
function minusOne(match, amount, unit) {
  amount = Number(amount) - 1;
  if (amount == 1) { // only one left, remove the 's'
    unit = unit.slice(0, unit.length - 1);
  } else if (amount == 0) {
    amount = "no";
  }
  return amount + " " + unit;
}
console.log(stock.replace(/(\d+) (\w+)/g, minusOne));
// → no lemon, 1 cabbage, and 100 eggs

Regular expressions can also be created dynamically. For example:

let name = "harry";
let text = "Harry is a suspicious character.";
let regexp = new RegExp("\\b(" + name + ")\\b", "gi");
console.log(text.replace(regexp, "_$1_"));
// output: _Harry_ is a suspicious character.
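One caveat with dynamically built patterns: if the text comes from user input, characters such as . or + keep their special meaning inside the regular expression. A common precaution is to escape them first. The sketch below uses a hypothetical helper named escapeRegExp and an illustrative input string.

```javascript
// Hypothetical helper: put a backslash in front of every character that
// has a special meaning in a regular expression. $& is the matched text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'a.b';
const unsafe = new RegExp(userInput);             // . matches any character
const safe = new RegExp(escapeRegExp(userInput)); // . matches only a dot

console.log(unsafe.test('axb')); // true
console.log(safe.test('axb'));   // false
console.log(safe.test('a.b'));   // true
```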

The lastIndex

The lastIndex property of a regular expression is often confusing. Its purpose is to let us choose where the next match starts: by changing lastIndex we move the starting point of the search

let reg = /\d+/g
reg.lastIndex = 3;

Note that lastIndex only takes effect when the regular expression has the g or y flag. In the code above, the search starts at index 3. It also has a side effect

var str="JavaScript";
var reg=/JavaScript/g;

console.log(reg.test(str));  // true
console.log(reg.test(str));  // false

The reason is that after the first successful match reg.lastIndex is already 10. The second call starts searching at index 10 and finds nothing, so to get a match again we have to reset lastIndex to 0 manually

reg.test(str)
reg.lastIndex = 0
reg.test(str)
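The same statefulness is what makes it possible to walk through every match with exec. The sketch below records where each search would resume and then sets lastIndex by hand; the input string and variable names are illustrative.

```javascript
const digits = /\d+/g;
const input = 'a1 b22 c333';
const found = [];

// Each successful exec moves lastIndex to just past the match, so the
// next call resumes from there. A failed exec returns null.
let match;
while ((match = digits.exec(input)) !== null) {
  found.push({ value: match[0], nextStart: digits.lastIndex });
}
console.log(found);
// [ { value: '1', nextStart: 2 }, { value: '22', nextStart: 6 }, { value: '333', nextStart: 11 } ]

// The failed final exec reset lastIndex for us.
console.log(digits.lastIndex); // 0

// Setting lastIndex by hand skips everything before that index.
digits.lastIndex = 3;
const fromIndex3 = digits.exec(input)[0];
console.log(fromIndex3); // 22
digits.lastIndex = 0; // reset before reusing the expression
```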

Having to resort to this kind of workaround is a sign that a regular expression is the wrong tool here. For a plain substring check there is a more elegant way

var str="JavaScript";
var find = "JavaScript"

str.includes(find)

Parse the file

INI is a traditional configuration file format on Windows. Let's write an INI parser with regular expressions to extract the data we need

function parseINI(string) {
  let result = {};
  let section = result;
  // Parse line by line
  string.split(/\r?\n/).forEach(line => {
    let match;
    // Match key=value
    if (match = line.match(/^(\w+)=(.*)$/)) {
      section[match[1]] = match[2];
    // Match a [section] header such as [address]
    } else if (match = line.match(/^\[(.*)\]$/)) {
      section = result[match[1]] = {};
    // Anything else must be blank or a ; comment
    } else if (!/^\s*(;.*)?$/.test(line)) {
      throw new Error("Line '" + line + "' is not valid.");
    }
  });
  return result;
}

// Test the file
const config = `
name=Vasilis
[address]
city=Tessaloniki`

parseINI(config)

// output
// {name: "Vasilis", address: {city: "Tessaloniki"}}

Final thoughts

Regular expressions matter in front-end development as well as in operations work. Parsing plain text with regular expressions can have performance problems, but for the right use cases they are undoubtedly a powerful tool. Regular expressions are at the core of grep, awk, sed, and other powerful text-processing tools.

Now that you’ve seen it, give it a thumbs up 💗