Chapter 5: Basic Reference Types
This is the 25th day of my participation in the August Challenge
5.3.3 String
2. The normalize() method
Some Unicode characters can be encoded in more than one way. A character may be represented either by a single BMP character or by a pair of code units, such as a base letter followed by a combining mark. For example:
// U+00C5: Latin capital letter A with ring above
console.log(String.fromCharCode(0x00C5)); // Å
// U+212B: Angstrom sign (unit of length)
console.log(String.fromCharCode(0x212B)); // Å
// U+0041: Latin capital letter A
// U+030A: combining ring above
console.log(String.fromCharCode(0x0041, 0x030A)); // Å
The comparison operators do not care how characters look, so the three strings are not equal even though they render identically:
let a1 = String.fromCharCode(0x00C5),
a2 = String.fromCharCode(0x212B),
a3 = String.fromCharCode(0x0041, 0x030A);
console.log(a1, a2, a3); // Å, Å, Å
console.log(a1 === a2); // false
console.log(a1 === a3); // false
console.log(a2 === a3); // false
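The three strings look the same, but === compares their UTF-16 code units, and those differ. The sketch below lists the code units of each string to make this visible; `codeUnits` is a hypothetical helper written for this illustration, not a built-in method.

```javascript
// Hypothetical helper: returns the UTF-16 code units of a string as 4-digit hex strings.
function codeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    // charCodeAt() returns the code unit at index i as a number
    units.push(str.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"));
  }
  return units;
}

let a1 = String.fromCharCode(0x00C5),
    a2 = String.fromCharCode(0x212B),
    a3 = String.fromCharCode(0x0041, 0x030A);

console.log(codeUnits(a1).join(" ")); // 00C5
console.log(codeUnits(a2).join(" ")); // 212B
console.log(codeUnits(a3).join(" ")); // 0041 030A
console.log(a1.length, a2.length, a3.length); // 1 1 2
```

Because a3 even has a different length from the other two, no code-unit comparison could treat them as equal without first converting them to a common form.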
To solve this problem, Unicode provides four normalization forms that convert characters like those above into a consistent representation, regardless of the code units that make them up. The four forms are Normalization Form D (NFD), Normalization Form C (NFC), Normalization Form KD (NFKD), and Normalization Form KC (NFKC). You can apply any of them to a string with the normalize() method, passing a string that names the form: “NFD”, “NFC”, “NFKD”, or “NFKC”.
Note that the details of the four normalization forms are beyond the scope of this book. Interested readers can refer to section 1.2, “Normalization Forms,” of UAX #15: Unicode Normalization Forms.
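To give a rough feel for the difference between the canonical forms (NFC/NFD) and the compatibility forms (NFKC/NFKD), the sketch below uses the ligature U+FB01 (ﬁ), an example chosen here rather than taken from the text. The compatibility forms replace it with the two letters “fi”, while the canonical forms leave it alone. The sketch also shows that normalize() defaults to “NFC” when called without an argument and throws a RangeError for any other form name.

```javascript
// U+FB01: Latin small ligature fi (a compatibility character)
let lig = "\uFB01";

console.log(lig.normalize("NFC") === lig); // true: canonical forms keep the ligature
console.log(lig.normalize("NFD") === lig); // true
console.log(lig.normalize("NFKC"));        // "fi": compatibility forms split it
console.log(lig.normalize("NFKD"));        // "fi"

// With no argument, normalize() uses "NFC"
let angstrom = String.fromCharCode(0x212B);
console.log(angstrom.normalize() === angstrom.normalize("NFC")); // true

// Any other form name throws a RangeError
try {
  "abc".normalize("NFX");
} catch (e) {
  console.log(e instanceof RangeError); // true
}
```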
You can tell whether a string is already in a given normalization form by comparing it with the return value of normalize():
let a1 = String.fromCharCode(0x00C5),
a2 = String.fromCharCode(0x212B),
a3 = String.fromCharCode(0x0041, 0x030A);
// U+00C5 is the result of normalizing U+212B with NFC/NFKC
console.log(a1 === a1.normalize("NFD")); // false
console.log(a1 === a1.normalize("NFC")); // true
console.log(a1 === a1.normalize("NFKD")); // false
console.log(a1 === a1.normalize("NFKC")); // true
// U+212B is not normalized in any form
console.log(a2 === a2.normalize("NFD")); // false
console.log(a2 === a2.normalize("NFC")); // false
console.log(a2 === a2.normalize("NFKD")); // false
console.log(a2 === a2.normalize("NFKC")); // false
// U+0041/U+030A is the result of normalizing U+212B with NFD/NFKD
console.log(a3 === a3.normalize("NFD")); // true
console.log(a3 === a3.normalize("NFC")); // false
console.log(a3 === a3.normalize("NFKD")); // true
console.log(a3 === a3.normalize("NFKC")); // false
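The comparison pattern above can be wrapped in a small helper. `isNormalized` is a hypothetical name used only for this sketch; it simply compares a string with its own normalized form and, as an assumption of this sketch, defaults to “NFC” to match normalize().

```javascript
// Hypothetical helper: true if str is already in the given normalization form.
function isNormalized(str, form = "NFC") {
  return str === str.normalize(form);
}

let a1 = String.fromCharCode(0x00C5),
    a2 = String.fromCharCode(0x212B),
    a3 = String.fromCharCode(0x0041, 0x030A);

// Report which of the four forms each string already satisfies
for (const [name, str] of [["a1", a1], ["a2", a2], ["a3", a3]]) {
  const forms = ["NFD", "NFC", "NFKD", "NFKC"].filter((f) => isNormalized(str, f));
  console.log(name, forms.join(", ") || "(none)");
}
// a1 NFC, NFKC
// a2 (none)
// a3 NFD, NFKD
```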
Normalizing all of the strings to the same form lets the comparison operator return the expected result:
let a1 = String.fromCharCode(0x00C5),
a2 = String.fromCharCode(0x212B),
a3 = String.fromCharCode(0x0041, 0x030A);
console.log(a1.normalize("NFD") === a2.normalize("NFD")); // true
console.log(a2.normalize("NFKC") === a3.normalize("NFKC")); // true
console.log(a1.normalize("NFC") === a3.normalize("NFC")); // true
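In practice you might normalize both sides of every comparison through one helper, or normalize strings once before using them as keys. The sketch below assumes NFC as the shared form, which is only one reasonable choice; `unicodeEquals` is a hypothetical helper, not a built-in.

```javascript
// Hypothetical helper: compares two strings after normalizing both to the same form.
function unicodeEquals(x, y, form = "NFC") {
  return x.normalize(form) === y.normalize(form);
}

let a1 = String.fromCharCode(0x00C5),
    a2 = String.fromCharCode(0x212B),
    a3 = String.fromCharCode(0x0041, 0x030A);

console.log(a1 === a3);             // false
console.log(unicodeEquals(a1, a3)); // true
console.log(unicodeEquals(a2, a3)); // true

// Normalizing before inserting into a Set collapses the three spellings into one entry
let unique = new Set([a1, a2, a3].map((s) => s.normalize("NFC")));
console.log(unique.size); // 1
```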
3. String manipulation methods
This section describes several methods for manipulating string values. First is concat(), which is used to concatenate one or more strings into a new string. Consider the following example:
let stringValue = "hello ";
let result = stringValue.concat("world");
console.log(result); // "hello world"
console.log(stringValue); // "hello "
In this example, calling concat() on stringValue returns “hello world”, while stringValue itself remains unchanged. The concat() method accepts any number of arguments, so it can concatenate several strings at once, as follows:
let stringValue = "hello ";
let result = stringValue.concat("world", "!");
console.log(result); // "hello world!"
console.log(stringValue); // "hello "
This modified example appends the strings “world” and “!” to “hello ”. Although concat() can concatenate strings, the plus operator (+) is used far more often, and in most cases it is the more convenient way to join several strings.
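The sketch below puts the two approaches side by side. Both produce a new string and leave stringValue unchanged. As an extra detail not covered above, concat() converts non-string arguments to strings, much as + does when one operand is a string.

```javascript
let stringValue = "hello ";

// concat() with several arguments
let viaConcat = stringValue.concat("world", "!");
// The same result with the plus operator
let viaPlus = stringValue + "world" + "!";

console.log(viaConcat);             // "hello world!"
console.log(viaConcat === viaPlus); // true
console.log(stringValue);           // "hello " (unchanged)

// Non-string arguments are converted to strings
console.log("value: ".concat(42, true, null)); // "value: 42truenull"
```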
ECMAScript provides three methods for extracting substrings from a string: slice(), substr(), and substring(). Each returns a substring of the string it is called on, and each takes one or two arguments. The first argument specifies where the substring starts, and the second determines where it ends. For slice() and substring(), the second argument is the position at which extraction stops (that is, the characters before that position are extracted). For substr(), the second argument is the number of characters to return. In every case, omitting the second argument means extracting to the end of the string. Like concat(), slice(), substr(), and substring() do not modify the string they are called on; they return the extracted characters as a new string value. Consider the following example:
let stringValue = "hello world";
console.log(stringValue.slice(3)); // "lo world"
console.log(stringValue.substring(3)); // "lo world"
console.log(stringValue.substr(3)); // "lo world"
console.log(stringValue.slice(3, 7)); // "lo w"
console.log(stringValue.substring(3, 7)); // "lo w"
console.log(stringValue.substr(3, 7)); // "lo worl"
In this example, slice(), substr(), and substring() are called in the same way and mostly return the same value. When only the argument 3 is passed, all three methods return “lo world”, because the second “l” in “hello” is at position 3. When the arguments 3 and 7 are passed, slice() and substring() return “lo w” (the “o” in “world” is at position 7 and is not included), while substr() returns “lo worl”, because its second argument is the number of characters to return.
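For non-negative arguments the three methods are related in a simple way: substr(start, n) extracts the same characters as slice(start, start + n), and slice() and substring() agree whenever start is not greater than end. The sketch below checks this for the “hello world” example; it illustrates the rules and is not a description of how engines implement these methods. It is also worth knowing that substr() is specified only in Annex B of the ECMAScript specification (legacy features for web browsers), which is why slice() or substring() is usually preferred in new code.

```javascript
let stringValue = "hello world";

// For non-negative arguments, substr(start, n) matches slice(start, start + n)
console.log(stringValue.substr(3, 7));    // "lo worl"
console.log(stringValue.slice(3, 3 + 7)); // "lo worl"

// For non-negative positions with start <= end, slice() and substring() agree
let mismatches = 0;
for (let start = 0; start <= stringValue.length; start++) {
  for (let end = start; end <= stringValue.length; end++) {
    if (stringValue.slice(start, end) !== stringValue.substring(start, end)) {
      mismatches++;
    }
  }
}
console.log(mismatches); // 0

// When start > end they differ: substring() swaps its arguments, slice() returns ""
console.log(stringValue.substring(7, 3)); // "lo w"
console.log(stringValue.slice(7, 3));     // ""
```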
When an argument is negative, the three methods behave differently. The slice() method treats every negative argument as the string length plus that argument.
The substr() method, on the other hand, treats a negative first argument as the string length plus that argument and converts a negative second argument to 0. The substring() method converts all negative arguments to 0. Look at the following example:
let stringValue = "hello world";
console.log(stringValue.slice(-3)); // "rld"
console.log(stringValue.substring(-3)); // "hello world"
console.log(stringValue.substr(-3)); // "rld"
console.log(stringValue.slice(3, -4)); // "lo w"
console.log(stringValue.substring(3, -4)); // "hel"
console.log(stringValue.substr(3, -4)); // "" (empty string)
This example clearly shows the differences between the three methods. When passed a single negative argument, slice() and substr() return the same result, because -3 is converted to 8 (the length, 11, plus -3), so the calls are effectively slice(8) and substr(8). The substring() method returns the entire string, because -3 is converted to 0.
The three methods also differ when the second argument is negative. The slice() method converts the second argument to 7, so the call is equivalent to slice(3, 7) and returns “lo w”. The substring() method converts the second argument to 0, making the call substring(3, 0), which is equivalent to substring(0, 3) because substring() always uses the smaller argument as the start and the larger one as the end. For substr(), the second argument is converted to 0, meaning the returned string contains zero characters, so it returns an empty string.
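The argument-conversion rules described above can be sketched as small functions that turn the arguments into non-negative start and end positions, which are then passed to slice(). The helper names `sliceRange`, `substringRange`, and `substrRange` are hypothetical, and this simplified version assumes integer arguments and covers only the two-argument call; the specification additionally handles NaN, fractions, and undefined.

```javascript
// Hypothetical helpers that convert integer arguments into non-negative [from, to)
// positions following the rules described above (simplified: integers only).
function sliceRange(len, start, end) {
  // Negative values count back from the end; results are clamped to [0, len]
  const fix = (n) => (n < 0 ? Math.max(len + n, 0) : Math.min(n, len));
  const from = fix(start);
  const to = fix(end);
  return [from, Math.max(to, from)]; // an end before the start gives an empty range
}

function substringRange(len, start, end) {
  // Negative values become 0; the smaller value is always used as the start
  const clamp = (n) => Math.min(Math.max(n, 0), len);
  const a = clamp(start);
  const b = clamp(end);
  return [Math.min(a, b), Math.max(a, b)];
}

function substrRange(len, start, length) {
  // A negative start counts back from the end; a negative length becomes 0
  const from = start < 0 ? Math.max(len + start, 0) : Math.min(start, len);
  const count = Math.max(length, 0);
  return [from, Math.min(from + count, len)];
}

let stringValue = "hello world";
let len = stringValue.length; // 11

// Extract a computed range using only non-negative arguments
const extract = ([from, to]) => stringValue.slice(from, to);

// sliceRange(11, 3, -4) gives [3, 7], substringRange(11, 3, -4) gives [0, 3],
// and substrRange(11, 3, -4) gives [3, 3]
console.log(extract(sliceRange(len, 3, -4)) === stringValue.slice(3, -4));         // true
console.log(extract(substringRange(len, 3, -4)) === stringValue.substring(3, -4)); // true
console.log(extract(substrRange(len, 3, -4)) === stringValue.substr(3, -4));       // true
```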