There is a feature that highlights keywords in a web page.

What we expected to be a simple innerHTML replace ran into a lot of problems. This article records those problems and the final solution, in the hope of helping anyone who runs into the same thing. If you are only interested in the result, skip the process and go straight to the final solution.

Common practice: regex replacement

To highlight a keyword inside an element, find the keyword, wrap it in a tag, and style that tag. Use innerHTML or outerHTML rather than innerText or outerText, because the replacement has to insert markup.

const regex = new RegExp(keyword, "g")
element.innerHTML = element.innerHTML.replace(regex, '<b class="a">' + keyword + '</b>')
element.classList.add("highlight")

The pitfalls of this approach are as follows:

  • If the keyword contains regex metacharacters such as ( or ), building the RegExp object fails. (This can be solved by escaping.)
  • If the keyword is the name of an HTML tag, such as div, the tags in innerHTML are replaced incorrectly.
  • If the keyword matches the name or value of a DOM attribute, the attribute is replaced as well. For example, with the keyword test, the class name below is replaced incorrectly:
  <div id="parent">
    <div class="test">test</div>
  </div>
  • The element, which is the keyword's parent node, gets its background colored through an added class. This pollutes the original DOM to some extent and may affect the element's positioning. (As a plug-in, you want to change the original DOM as little as possible.)
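
The first and third pitfalls can be reproduced on plain strings, without a browser. The following is a minimal sketch, not the original plug-in code: escapeRegExp and naiveHighlight are hypothetical helper names, and the HTML is passed in as a string instead of being read from element.innerHTML.

```javascript
// Hypothetical helper (not in the original post): escape regex metacharacters.
function escapeRegExp(keyword) {
  return keyword.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// The naive string replacement from the snippet above, with escaping added.
function naiveHighlight(html, keyword) {
  const regex = new RegExp(escapeRegExp(keyword), "g");
  return html.replace(regex, '<b class="a">' + keyword + "</b>");
}

// Pitfall 1: an unescaped "(" is not a valid pattern.
let buildFailed = false;
try {
  new RegExp("(");
} catch (e) {
  buildFailed = e instanceof SyntaxError;
}
console.log(buildFailed); // true

// Escaping fixes pitfall 1 ...
console.log(naiveHighlight("f(x)", "("));
// f<b class="a">(</b>x)

// ... but not pitfall 3: the class attribute value is rewritten too.
console.log(naiveHighlight('<div class="test">test</div>', "test"));
// <div class="<b class="a">test</b>"><b class="a">test</b></div>
```

Escaping only makes the pattern valid. The replacement still cannot tell markup from text, which is what the following attempts try to address.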

Regex optimization 1: only process the text inside tags

var formatKeyword = text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') // Escape special characters contained in the keyword, such as /.
var finder = new RegExp(">.*?" + formatKeyword + ".*?<") // Extract the text inside tags, to avoid touching the element's class, id, etc.
element.innerHTML = element.innerHTML.replace(finder, function (matched) {
        return matched.replace(text, "<b>" + text + "</b>")
}) // Wrap the keyword found in the extracted text

This solves most of the problems, but one remains: as soon as a tag attribute contains a symbol like < or >, the matching rule breaks and the regex extracts the wrong content. An HTML5 dataset can hold arbitrary content, so these special characters are unavoidable.

  <div dataset="p>d">Replacement</div>
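
To see the failure concretely, the finder regex can be run against plain HTML strings. This is a sketch under assumptions: highlightInsideTags is a hypothetical wrapper around the snippet above, and it takes the HTML as a string argument rather than reading element.innerHTML.

```javascript
// Hypothetical helper: escape regex metacharacters in the keyword.
function escapeRegExp(keyword) {
  return keyword.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Hypothetical wrapper around the "text inside tags" regex from above.
function highlightInsideTags(html, text) {
  const finder = new RegExp(">.*?" + escapeRegExp(text) + ".*?<");
  return html.replace(finder, function (matched) {
    return matched.replace(text, "<b>" + text + "</b>");
  });
}

// Attributes without angle brackets are left alone.
console.log(highlightInsideTags('<div class="test">test</div>', "test"));
// <div class="test"><b>test</b></div>

// A ">" inside an attribute value is mistaken for the end of the tag,
// so the keyword "d" is wrapped inside the attribute instead of the text.
console.log(highlightInsideTags('<div dataset="p>d">Replacement</div>', "d"));
// <div dataset="p><b>d</b>">Replacement</div>
```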

Regex optimization 2: temporarily remove the tags that may be affected

  <div id="keyword">keyword</div>
  => replace the tags with temporary variables: [replaced1]keyword[replaced2]   // the tag with id="keyword" will not be processed
  => highlight the keyword: [replaced1]<b>keyword</b>[replaced2]
  => replace the temporary variables with the original tags: <div id="keyword"><b>keyword</b></div>

This idea and its source code come from here, but the approach has problems:

  • If a temporary variable such as [replaced1] contains the keyword, the replacement goes wrong
  • Most importantly, this method cannot extract tags correctly when an attribute value contains the < or > symbol
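
The placeholder idea and its first weakness can be sketched on strings as follows. This is not the referenced source code: highlightWithPlaceholders, the zero-based placeholder numbering, and the tag pattern /<[^>]*>/g are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the placeholder approach, working on an HTML string.
function highlightWithPlaceholders(html, keyword) {
  const tags = [];
  // Step 1: swap every tag for a temporary variable such as [replaced0].
  const stripped = html.replace(/<[^>]*>/g, function (tag) {
    tags.push(tag);
    return "[replaced" + (tags.length - 1) + "]";
  });
  // Step 2: wrap every occurrence of the keyword in what is left.
  const highlighted = stripped.split(keyword).join("<b>" + keyword + "</b>");
  // Step 3: put the original tags back.
  return highlighted.replace(/\[replaced(\d+)\]/g, function (_, index) {
    return tags[Number(index)];
  });
}

// Works when the keyword only collides with an attribute value.
console.log(highlightWithPlaceholders('<div id="keyword">keyword</div>', "keyword"));
// <div id="keyword"><b>keyword</b></div>

// Fails when the keyword occurs inside the placeholder itself:
// step 2 rewrites the placeholders, so step 3 can no longer restore the tags.
console.log(highlightWithPlaceholders("<p>replaced</p>", "replaced"));
// [<b>replaced</b>0]<b>replaced</b>[<b>replaced</b>1]
```

The tag pattern used in step 1 also stops at the first > it sees, so an attribute value containing > is cut in the wrong place, which is the second problem listed above.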

All in all, after many attempts, regexes could not deal reliably with all of these cases. So I changed the approach: work on nodes instead of strings. Walking element.childNodes is the most effective way to keep the interfering tags out of the way.

The perfect solution: work on DOM nodes

 <div id="parent">
    keyword 1
  <span id="child">
    keyword 2
  </span>
 </div>

Get all child nodes through parent.childNodes. When a child element such as child has no element children of its own, its content can be replaced using innerText.replace(keyword, result) to get the desired highlighting, as for keyword 2.

However, keyword 1 is a text node. Only its text content can be modified: HTML cannot be added to it, let alone styled. A text node also cannot be converted into an ordinary element node, which is the most annoying part.

This brings us to the focus of this article, since this feature was my first serious exposure to text nodes. The keyword is found and highlighted by splitting text nodes and replacing the piece that contains it.

Source code for highlighting, and source code for restoring the highlight

const reg = new RegExp(keyword.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'))
const highlight = function (node, reg) {
    if (node.nodeType == 3) { // Text node
        const match = node.data.match(reg);
        if (match) {
            const highlightEl = document.createElement("b");
            highlightEl.dataset.highlight = "y";
            const wordNode = node.splitText(match.index); // Split off the text before the keyword
            wordNode.splitText(match[0].length); // Split off the text after the keyword
            const wordNew = document.createTextNode(wordNode.data);
            highlightEl.appendChild(wordNew); // The highlight node is built
            wordNode.parentNode.replaceChild(highlightEl, wordNode); // Replace the text node
        }
    } else if (node.nodeType == 1 && node.dataset.highlight != "y"
    ) {
        for (var i = 0; i < node.childNodes.length; i++) {
            highlight(node.childNodes[i], reg);
            i++
        }
    }  
}
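
The two splitText calls above cut one text node into three pieces: the text before the keyword, the keyword itself, and the text after it. The same index arithmetic can be tried on a plain string. This is only an illustration: sliceAroundMatch is a hypothetical helper, and it works on strings rather than on real Text nodes.

```javascript
// Hypothetical helper: mirrors node.splitText(match.index) followed by
// wordNode.splitText(match[0].length), but on a plain string.
function sliceAroundMatch(data, reg) {
  const match = data.match(reg);
  if (!match) return null; // no keyword in this text
  const before = data.slice(0, match.index);      // stays in the original node
  const rest = data.slice(match.index);           // what the first split returns
  const word = rest.slice(0, match[0].length);    // the piece that gets wrapped in <b>
  const after = rest.slice(match[0].length);      // what the second split returns
  return { before: before, word: word, after: after };
}

console.log(sliceAroundMatch("see keyword 1 here", /keyword/));
// { before: 'see ', word: 'keyword', after: ' 1 here' }
```

Only the middle piece is replaced by the b element, so the surrounding text stays in ordinary text nodes.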

Finally, an easter egg: the method above still has a small bug. Interested readers can try to find it.