A few days ago, our PM asked us to limit the length of the rich text that users can enter, for example to a maximum of 400 characters. The limit applies to Chinese and English text alike, that is, 400 characters as the user sees them. The rich text editor used in this project is TinyMCE. Other editors may store content differently, so I will only describe my own approach here as one possible idea.
Ignoring tags
First, tags should not be counted, and neither should their styles or attributes. Since tags are delimited by < and >, the obvious check is to stop counting when a < is encountered and resume after the matching >, which gives the following code:
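const getPlainTextLen = richText => {
  let count = 0
  for (let i = 0; i < richText.length; i++) {
    if (richText[i] === '<') {
      // Skip to the closing '>'; the bounds check stops an unclosed '<' from looping forever
      while (i < richText.length && richText[i] !== '>') {
        i++
      }
    } else {
      count++
    }
  }
  return count
}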
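A quick check of the loop-based approach, as a minimal sketch. The sample strings are made up for illustration, and the second one shows why the inner loop needs a bounds check when a < is never closed.

```javascript
// Loop-based counter: everything between '<' and '>' is skipped.
const getPlainTextLen = richText => {
  let count = 0
  for (let i = 0; i < richText.length; i++) {
    if (richText[i] === '<') {
      // Without the bounds check, an unclosed '<' would never terminate.
      while (i < richText.length && richText[i] !== '>') {
        i++
      }
    } else {
      count++
    }
  }
  return count
}

console.log(getPlainTextLen('<p style="color: red">Hello</p>')) // 5
console.log(getPlainTextLen('<b>unclosed <'))                   // 9
```

Note that spaces are still counted at this stage: the second sample counts the eight letters of "unclosed" plus the space after it.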
The code above manages to ignore tags, but the implementation is rather ugly. We can use a regular expression instead: replace every <…> with an empty string and take the length of the result. JavaScript strings are immutable and replace returns a new string, so the original string is not affected and the function stays pure.
const getPlainTextLen = richText => richText.replace(/<[^>]+>/g, '').length
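As a small illustration (the sample markup is an assumption, not taken from the project), the regex version strips nested tags and leaves the stored string untouched:

```javascript
// Strip every <...> sequence, then measure what is left.
const getPlainTextLen = richText => richText.replace(/<[^>]+>/g, '').length

const stored = '<p><strong>Hello</strong> world</p>'

console.log(getPlainTextLen(stored)) // 11 ("Hello world", the space is still counted)
console.log(stored)                  // the original string is unchanged
```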
Handling whitespace and newlines
Whitespace should not be counted either. In HTML, whitespace can be stored as a literal space or as an entity such as &nbsp; or &ensp;. \n and \r\n, which are newlines in JavaScript, are also rendered as spaces. replace returns a string, so the calls can be chained to extend the function above:
const getPlainTextLen = richText => richText
  .replace(/<[^>]+>/g, '')
  .replace(/ |\n|\r\n|&nbsp;|&ensp;/g, '')
  .length
Some editors store whitespace only as plain spaces and nothing else. In that case, for performance reasons (fewer alternatives to test), you don't have to copy the second replace exactly. Conversely, other entities also represent whitespace, and the code above lists only two of them, so it is best to write the pattern according to what your editor actually stores.
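If your editor emits more kinds of whitespace, the pattern can be widened. The following is only a sketch under that assumption: the names WHITESPACE and stripWhitespace are hypothetical, and the entity list (&emsp; and &thinsp; in addition to the two above) is an example, not a complete set.

```javascript
// \s covers spaces, tabs, \r and \n; the alternation covers a few whitespace entities.
const WHITESPACE = /\s|&(nbsp|ensp|emsp|thinsp);/g

const stripWhitespace = text => text.replace(WHITESPACE, '')

console.log(stripWhitespace('a &nbsp;b\r\n\tc&emsp;d')) // 'abcd'
```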
Handling HTML entities (HTML character entities)
If you type 1 + 1 < 3, how does the editor store it? And if I type 1234 wrapped in a literal tag, why does it show up exactly as typed?
1 + 1 < 3 is stored as 1 + 1 &lt; 3. The < is stored as an HTML entity, presumably so that text typed by the user cannot be mistaken for HTML markup. For a list of HTML entities, see W3Schools, HTML Entities (w3schools.com); if the page does not open for you, you may need a proxy. &nbsp; is an entity too, but because it is rendered as whitespace it has to be handled beforehand. HTML entities also cover emojis such as 😀, which many people love. As the example shows, 1 + 1 < 3 has 5 visible characters (ignoring spaces) but becomes 8 characters once stored, and 😀 is stored as &#128512;. Visually, each entity should count as a single character, so entities need to be processed.
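The length difference can be checked directly. This is just an arithmetic illustration; the variable names are made up. It also shows that a literal emoji has a length of 2 in JavaScript, because length counts UTF-16 code units rather than visible characters.

```javascript
// Visible text versus stored text, both with spaces removed.
const typed = '1 + 1 < 3'.replace(/ /g, '')     // '1+1<3'
const stored = '1 + 1 &lt; 3'.replace(/ /g, '') // '1+1&lt;3'

console.log(typed.length)  // 5
console.log(stored.length) // 8

// The emoji stored as a numeric entity is 9 characters long.
console.log('&#128512;'.length) // 9

// 128512 is the decimal code point of 😀 (U+1F600).
console.log(String.fromCodePoint(128512) === '😀') // true
console.log('😀'.length)      // 2 (a surrogate pair)
console.log([...'😀'].length) // 1 (one code point)
```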
HTML entities have two representations: &entity_name; or &#entity_number;. The regular expression is written to match both.
{2,5} and {1,6} bound the length of the entity name and of the entity number, respectively.
const getPlainTextLen = richText => richText
  .replace(/<[^>]+>/g, '')
  .replace(/ |\n|\r\n|&nbsp;|&ensp;/g, '')
  // Each remaining entity is replaced by one placeholder character, so it counts as 1
  .replace(/&([a-z]{2,5}|#[0-9]{1,6});/g, ' ')
  .length
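To see what the length bounds do in practice, here is a small sketch (ENTITY and countEntities are hypothetical names used only for this illustration). Entity names longer than five letters, such as &hellip;, fall outside {2,5} and are not matched, so the bounds may need widening for your content.

```javascript
// Same pattern as the entity step: a name of 2-5 lowercase letters, or # plus 1-6 digits.
const ENTITY = /&([a-z]{2,5}|#[0-9]{1,6});/g

// Count how many entities the pattern recognises in a string.
const countEntities = text => (text.match(ENTITY) || []).length

console.log(countEntities('&lt;&gt;&amp;&#8364;&#128512;')) // 5
console.log(countEntities('&hellip;'))                      // 0, the name has 6 letters
```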
Finally, there are special glyphs: the pinyin tone marks we learned in primary school, such as a -> à, which is stored as a&#768; (the letter followed by a combining mark). Let's write the last regular expression. It has to run before the entity replacement in the code above; otherwise a&#768; becomes the letter a plus a placeholder and is counted as two characters.
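const getPlainTextLen = richText => richText
  .replace(/<[^>]+>/g, '')
  .replace(/ |\n|\r\n|&nbsp;|&ensp;/g, '')
  // A letter plus a combining diacritical mark (&#768; to &#879;) counts as one character
  .replace(/[a-zA-Z]&#(76[89]|7[7-9][0-9]|8[0-7][0-9]);/g, ' ')
  // Each remaining entity counts as one character
  .replace(/&([a-z]{2,5}|#[0-9]{1,6});/g, ' ')
  .length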
Handling glyphs is optional, since not every editor displays them correctly; TinyMCE, which I use, is one that does not. Again, it is up to you to decide whether you need it.
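The ordering requirement for the glyph step can be checked with a short sketch. GLYPH and ENTITY are hypothetical names, and the numeric range in GLYPH is an assumption: it covers the Unicode Combining Diacritical Marks block (decimal 768 to 879), which may be wider or narrower than what your editor produces.

```javascript
// A letter followed by a combining mark stored as a numeric entity (768-879).
const GLYPH = /[a-zA-Z]&#(76[89]|7[7-9][0-9]|8[0-7][0-9]);/g
// Any other short named or numeric entity.
const ENTITY = /&([a-z]{2,5}|#[0-9]{1,6});/g

const stored = 'a&#768;' // rendered as à

// Glyph step first: the letter and its mark collapse into one placeholder.
const glyphFirst = stored.replace(GLYPH, ' ').replace(ENTITY, ' ').length
// Entity step first: only the mark is replaced, the letter is left over.
const entityFirst = stored.replace(ENTITY, ' ').replace(GLYPH, ' ').length

console.log(glyphFirst)  // 1
console.log(entityFirst) // 2
```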
Validation
Finally, let's verify that this function is correct.
The expected count is 5 + 8 + 4 + 10 + 11 + 5 + 3 + 4 + 1 = 51, one term per line of visible text, with whitespace ignored.
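// Sample markup as an editor might store it
const html =
`<h1>Title</h1>
<h2>subtitle</h2>
<p>H&#769; e&#768; H&#769; e&#768;</p>
<ul>
<li>First block</li>
<li>Second block</li>
<li>1 + 1 &lt; 3</li>
<li>5 &gt; 3</li>
<li>1 00 &#8364;</li>
<li>&#128512;</li>
</ul>`
// The visual length is also 51, so the result is correct
console.log(getPlainTextLen(html)) // 51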
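To tie this back to the original requirement, the counter can be wrapped in a limit check. This is a minimal sketch: MAX_LEN and isWithinLimit are hypothetical names, the function repeats the entity step without the optional glyph handling, and in TinyMCE the markup would typically come from editor.getContent().

```javascript
// Visible length: strip tags, strip whitespace, count each entity as one character.
const getPlainTextLen = richText => richText
  .replace(/<[^>]+>/g, '')
  .replace(/ |\n|\r\n|&nbsp;|&ensp;/g, '')
  .replace(/&([a-z]{2,5}|#[0-9]{1,6});/g, ' ')
  .length

// The limit from the requirement: 400 visible characters.
const MAX_LEN = 400

const isWithinLimit = (html, max = MAX_LEN) => getPlainTextLen(html) <= max

console.log(isWithinLimit('<p>' + 'a'.repeat(400) + '</p>')) // true
console.log(isWithinLimit('<p>' + 'a'.repeat(401) + '</p>')) // false
```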