This is the 15th day of my participation in the August Text Challenge.
I hadn't given this much thought before, but while reading the "Red Book" I came across a lot of material on Unicode. This is a topic I'm still fairly unfamiliar with, so I put together these notes on JavaScript characters.
JavaScript character
1. length
This is probably the most commonly used property. JavaScript strings consist of 16-bit code units, and for most characters each code unit corresponds to one character. In other words, the length property of a string indicates how many 16-bit code units the string contains.
let nickname = "mannqo";
console.log(nickname.length); // 6
2. charAt()
The charAt() method returns the character at the given index, which is specified by the integer argument passed to the method. Indexes start at 0.
let nickname = "mannqo";
console.log(nickname.charAt(2)); // "n"
3. charCodeAt()
JavaScript strings use a mix of two Unicode encodings: UCS-2 and UTF-16. For characters that fit in 16 bits (U+0000 to U+FFFF), the two encodings are effectively the same, and each letter or character has its own Unicode code. The charCodeAt() method returns the code of the code unit at the specified index.
let message = "abcde";
// the code for "c" is U+0063
console.log(message.charCodeAt(2)); // 99 (decimal 99 equals hexadecimal 63)
4. fromCharCode() and codePointAt()
The String.fromCharCode() method creates a string from the given UTF-16 code units. It accepts any number of numeric values and returns a string that concatenates the characters corresponding to those values.
console.log(String.fromCharCode(0x61, 0x62, 0x63, 0x64)); // "abcd"
For characters in the range U+0000 to U+FFFF, the methods above all return the results you would expect. This is because in this range each character is represented by a single 16-bit code unit, and these methods all operate on 16-bit code units. As long as the size of a character's encoding matches the size of a code unit, everything works normally. However, 16 bits can uniquely identify only 65,536 characters.
To represent more characters, Unicode uses a strategy in which each character gets an additional 16 bits that select a supplementary plane. This strategy of using two 16-bit code units per character is called a surrogate pair. For example, with an emoji:
let message = "ab😊de";
console.log(message.length); // 6
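To make the idea of a surrogate pair more concrete, here is a minimal sketch of how a supplementary code point could be split into two 16-bit code units under UTF-16. The helper name toSurrogatePair is hypothetical (it is not a built-in); the arithmetic follows the standard UTF-16 rules.

```javascript
// Hypothetical helper (not a built-in): split a supplementary code point
// (U+10000 to U+10FFFF) into its high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a supplementary code point");
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const pair = toSurrogatePair(0x1F60A); // U+1F60A is the smiley 😊
console.log(pair); // [55357, 56842]
console.log(String.fromCharCode(...pair) === "\u{1F60A}"); // true
```

The two values are the same code units that charCodeAt() reports for the smiley, which is why the string's length counts it as 2.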
But these methods still treat each 16-bit code unit as a single character:
console.log(message.charAt(2)); // unpaired high surrogate (not displayable on its own)
console.log(message.charAt(3)); // unpaired low surrogate (not displayable on its own)
console.log(message.charCodeAt(2)); // 55357
console.log(message.charCodeAt(3)); // 56842
In fact, the code units at indexes 2 and 3 should be treated as one surrogate pair that corresponds to a single character. The fromCharCode() method still returns the correct result because it simply composes a string from the binary representation it is given. Browsers can correctly parse the two code units of a surrogate pair and recognize them as a single Unicode smiley character.
To correctly parse strings that contain both single code units and surrogate pairs, you can use codePointAt() in place of charCodeAt(). It identifies the full code point that starts at the specified code unit index.
console.log(message.codePointAt(2)); // 128522
console.log(message.codePointAt(3)); // 56842
Note: If the index passed in is not at the start of a surrogate pair, the wrong code point is returned. For example, index 3 is not the start of the smiley's surrogate pair, so codePointAt(3) returns only the value of the low surrogate (56842) rather than the character's code point.
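One way to avoid landing in the middle of a surrogate pair is to let the string iterator do the walking, since it steps through code points rather than code units. The following is only a small sketch of that idea, using the same string as above (written with an escape so the code point is unambiguous):

```javascript
const message = "ab\u{1F60A}de"; // same as "ab😊de"

// The string iterator walks code points, so a surrogate pair stays together.
const chars = [...message];
console.log(chars.length);   // 5
console.log(message.length); // 6

// Collect the code point of each character without landing mid-pair.
const codePoints = [];
for (const ch of message) {
  codePoints.push(ch.codePointAt(0));
}
console.log(codePoints); // [97, 98, 128522, 100, 101]
```

Because each ch is a whole character, calling codePointAt(0) on it always starts at the beginning of the pair.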
Just as charCodeAt() has a counterpart in codePointAt(), fromCharCode() has a counterpart in fromCodePoint(). This method accepts any number of code points and returns a string made up of the corresponding characters.
console.log(String.fromCharCode(97, 98, 55357, 56842, 100, 101)); // ab😊de
console.log(String.fromCodePoint(97, 98, 128522, 100, 101)); // ab😊de
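The difference between the two methods shows up when a single value above 0xFFFF is passed in. As a rough illustration: fromCharCode() keeps only the low 16 bits of each argument, while fromCodePoint() produces the full surrogate pair.

```javascript
// fromCharCode() keeps only the low 16 bits of each argument,
// so a supplementary code point is silently truncated.
const truncated = String.fromCharCode(128522);
console.log(truncated.length);        // 1
console.log(truncated.charCodeAt(0)); // 62986 (0xF60A, not the smiley)

// fromCodePoint() emits the surrogate pair for the same value.
const smiley = String.fromCodePoint(128522);
console.log(smiley.length);         // 2
console.log(smiley.codePointAt(0)); // 128522
```

So when the input is a list of code points rather than code units, fromCodePoint() is the safer choice.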