This article is based on my personal understanding of the design ideas behind ProseMirror. There is currently relatively little ProseMirror documentation available in Chinese, so I hope it gives you some inspiration. Here is the official ProseMirror documentation:

prosemirror.net/docs/

Background

There are three main approaches to building rich text editors on the market:

  • Textarea

  • contentEditable

  • Google Docs

Textarea is typically used for simple rich text features (@ mentions and # tags, essentially what a comment box needs) and, when used as an authoring tool, is combined with other form controls. Examples include Instagram and Shopee’s KOL authoring tool.

contentEditable is by far the most popular basis for a rich text editor, because the browser implements some of the basic functionality for you. Countless young front-end developers happily adopted it, only to find it was a bottomless pit. Popular editors currently on the market include Quill, wangEditor, UEditor, Slate and Draft.js.

Google Docs changed its rich text implementation in 2010 (for the reasons, see drive.googleblog.com/2010/05/wha…): it moved from contentEditable to listening for user interactions itself and drawing the document on the DOM with tags such as div.

In terms of implementation difficulty, the Textarea approach is relatively simple; in fact, it is hardly a rich text editor at all. The new Google Docs approach, on the other hand, is more than the average team can implement: the entire UI, even the blinking cursor, is redrawn with div tags. User interaction is extremely complex and edge cases are extremely numerous, so it takes a great deal of effort. Our focus will therefore be on adapting the browser’s native implementation, contentEditable, to make it more usable.

The trouble

First, let’s think about what an editor consists of. The basics: an editable DOM, and an API for modifying that DOM from outside.

Luckily, the browser provides both: contentEditable and document.execCommand, respectively.

MDN (developer.mozilla.org/en-US/docs/…): In HTML, any element can be editable. By using some JavaScript event handlers, you can transform your web page into a full and fast rich text editor. This article provides some information about this functionality.

contentEditable is in effect a rich text editor implementation provided by browser vendors: any DOM element can be made editable by setting its contenteditable attribute to true. And what is document.execCommand?

MDN (developer.mozilla.org/en-US/docs/…): When an HTML document has been switched to designMode, its document object exposes an execCommand method to run commands that manipulate the current editable region, such as form inputs or contentEditable elements. Most commands affect the document’s selection (bold, italics, etc.), while others insert new elements (adding a link), or affect an entire line (indenting). When using contentEditable, execCommand() affects the currently active editable element.

In other words, once a region is editable (via designMode or contentEditable; form inputs count too), document.execCommand commands act on the current selection (bold, italic, etc.), may insert new elements, or may affect the whole current line, depending on the command. With these two pieces, a contentEditable DOM plus document.execCommand, we can change the DOM’s tags, adjust background colors, and more. Here is a demo:

<div id="contentEditable" contenteditable style="height: 1000px;"></div>
<script>
  // Apply a block format (h1..h5, p) to the current selection in the editor.
  function handleClickTool (tool) {
    const $editor = document.getElementById('contentEditable')
    const {name, command = 'formatblock'} = tool
    $editor.focus()
    document.execCommand(command, false, name)
  }

  window.onload = function () {
    const $editor = document.getElementById('contentEditable')
    const $toolbar = document.createElement('div')
    const tools = [
      {name: 'h1', text: 'h1'},
      {name: 'h2', text: 'h2'},
      {name: 'h3', text: 'h3'},
      {name: 'h4', text: 'h4'},
      {name: 'h5', text: 'h5'},
      {name: 'p', text: 'p'},
    ]
    // Build one toolbar button per block type.
    tools.forEach(tool => {
      const $btn = document.createElement('div')
      $btn.classList.add('toolItem')
      $btn.addEventListener('click', () => handleClickTool(tool))
      $btn.innerText = tool.text
      $toolbar.appendChild($btn)
    })
    $toolbar.classList.add('toolbar')
    document.body.insertBefore($toolbar, $editor)
  }
</script>

Implementing a rich text editor just by calling browser APIs doesn’t seem too hard (I can do it too!). It sounds wonderful, but reality is often cruel. If this were all it took to make a good rich text editor, contentEditable would not deserve its reputation as one of the great pits of front-end development.

See how others complain about contentEditable: www.oschina.net/translate/w… I find that article uses so much academic vocabulary that it is hard to follow, but some of its ideas are very practical.

Into the pit

Vendor implementation differences

Browser vendors implement the same standard differently (and contentEditable is a rather imperfect standard to begin with). A simple example: what do we expect to happen when we press Enter in an empty contentEditable DOM? A new line. So which tag carries that new line?

  • Chrome/Safari: a div tag

  • Firefox before version 60: a <br> added inside the current block-level tag

  • Firefox from version 60 on: a div tag, like Chrome/Safari

  • IE/Opera: a p tag

Want to express the document structure semantically? Want to unify styles through tag selectors?

Of course, this particular problem can be solved by placing a default block-level tag (for example, an empty paragraph) inside the empty contentEditable DOM. When the user presses Enter inside a block-level tag, the next line’s tag is created based on the current block-level tag. In fact, most editors solve the new-line tag problem this way.

Unpredictable behavior

<div contenteditable>
  test rich text editor
</div>

What do you think happens when you type a few carriage returns in the middle of this text?

<div contenteditable>
  test
  <div>
    <br/>
  </div>
  <div>
    <br/>
  </div>
  <div>
     rich text editor
  </div>
</div>

OK, fine, although the first piece of text has no tag of its own. Now try deleting those line breaks:

<div contenteditable>
  test
  <span style={xxxxx}>
    rich text editor
  </span>
</div>

Surprise! A span with inline styles appears out of nowhere.

Inline tag nesting is another well-known problem: the following fragments all render identically.

<strong><em>aaaa</em></strong>
<em><strong>aaaa</strong></em>
<b><i>aaaa</i></b>
<i><b>aaaa</b></i>
<strong><em>aa</em><em>aa</em></strong>
...

This raises a question: as the user continues typing, should the new text be em, strong, or em + strong?

All of this is just the tip of the iceberg of contentEditable pits: users can produce structures in the contentEditable DOM that follow no clear rules. That leaves us with two problems:

  • Visually equivalent, but not structurally equivalent in the DOM.

  • DOM generated by contentEditable is not always what we expect.

As for document.execCommand, MDN explicitly marks it as an obsolete feature that browser vendors may drop at any time (and its current support is already poor).

Breaking out

Problems have arisen, so let’s try to work out how to avoid these pits. Those of you who have worked with a modern front-end framework will know the following formula:

f(state) = View

Manipulating a plain JS object is always easier than manipulating the DOM; it shields us from browser differences and from the DOM’s complexity.

State

First, we introduce a state between the View and the commands. At the data storage level, we no longer need to maintain a complex DOM structure; a plain JS object is enough to describe the current document:

const state = [{
  type: 'p',
  style: '',
  children: []
}]

OK, you can see that this is a tree structure, so for the following structure

<p>
  text <span>span text</span>
</p>

How do we represent that in our editor state? We need to change the state as follows:

const state = [{
  type: 'p',
  style: '',
  children: [
    { type: 'textNode', style: '', content: 'text '},
    { type: 'span', style: '', children: [
      {type: 'textNode', style: '', content: 'span text'}
    ]}
  ]
}]

Now we can take this state and map it to a full DOM structure. At this point you might think there’s no point in adding another layer, so let’s move on.
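To make the mapping from state to DOM concrete, here is a minimal sketch that serializes the tree-shaped state above to an HTML string. The helper names (escapeHtml, renderNode, renderState) are made up for this illustration and are not part of any library; a real editor would create DOM nodes rather than strings.

```javascript
// Hypothetical helpers: serialize the tree-shaped state to an HTML string.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function renderNode(node) {
  // Text nodes are leaves: they only contribute their (escaped) content.
  if (node.type === 'textNode') {
    return escapeHtml(node.content);
  }
  // Every other node becomes a tag that wraps its rendered children.
  const styleAttr = node.style ? ` style="${node.style}"` : '';
  const inner = (node.children || []).map(renderNode).join('');
  return `<${node.type}${styleAttr}>${inner}</${node.type}>`;
}

function renderState(state) {
  return state.map(renderNode).join('');
}

const treeState = [{
  type: 'p',
  style: '',
  children: [
    { type: 'textNode', style: '', content: 'text ' },
    { type: 'span', style: '', children: [
      { type: 'textNode', style: '', content: 'span text' }
    ]}
  ]
}];

const html = renderState(treeState);
console.log(html); // <p>text <span>span text</span></p>
```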

Let’s make things a little more complicated by talking about inline tag nesting

<p>
  text <strong>strong<em>italic text</em></strong>
</p>

Convert to state

const state = [{
  type: 'p',
  style: '',
  children: [
    { type: 'textNode', style: '', content: 'text '},
    { type: 'strong', style: '', children: [
      {type: 'textNode', style: '', content: 'strong'},
      {type: 'em', style: '', children: [
        {type: 'textNode', style: '', content: 'italic text'}
      ]}
    ]}
  ]
}]

This is essentially the same idea as React’s virtual DOM. But what if we change the DOM structure a little?

<p>
  text <strong>strong</strong><strong><em>italic text</em></strong>
</p>

Or, alternatively:

<p>
  text <strong>strong</strong><em><strong>italic text</strong></em>
</p>

Convert these to state and you’ll see that, although the UI is identical, the structure we use to describe the document keeps changing. What’s the problem with that?

The path to italic text keeps changing, from state[0].children[1].children[1].children[0] to state[0].children[2].children[0].children[0]. This tree structure makes cross-level operations very inconvenient: updating the state is harder, and so is determining boundaries.

So let’s think about how else this text could be interpreted. On reflection, inline tags don’t really affect how we read the overall DOM structure, so we can treat them as styles. For example, strong is equivalent to font-weight: bold, and em is equivalent to font-style: italic.

Of course, this is just a simple example, and we still want to keep the document semantic (Chrome sometimes implements bold with span + style, which loses the semantics). So let’s add a property called marks that records the inline tags used for decoration.
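The shifting path can be checked with a small sketch. findTextPath, text and el are hypothetical helpers written for this illustration; they build two of the nested variants above and search each for the italic text node.

```javascript
// Hypothetical helper: depth-first search for the text node with the given
// content; returns the list of indexes leading to it, or null if not found.
function findTextPath(nodes, content, prefix = []) {
  for (let i = 0; i < nodes.length; i++) {
    const node = nodes[i];
    const path = [...prefix, i];
    if (node.type === 'textNode' && node.content === content) return path;
    if (node.children) {
      const found = findTextPath(node.children, content, path);
      if (found) return found;
    }
  }
  return null;
}

// Small constructors to keep the example readable.
const text = (content) => ({ type: 'textNode', content });
const el = (type, ...children) => ({ type, children });

// <p>text <strong>strong<em>italic text</em></strong></p>
const nestedA = [
  el('p', text('text '), el('strong', text('strong'), el('em', text('italic text'))))
];

// <p>text <strong>strong</strong><strong><em>italic text</em></strong></p>
const nestedB = [
  el('p', text('text '), el('strong', text('strong')), el('strong', el('em', text('italic text'))))
];

const pathA = findTextPath(nestedA, 'italic text');
const pathB = findTextPath(nestedB, 'italic text');
console.log(pathA); // [ 0, 1, 1, 0 ]
console.log(pathB); // [ 0, 2, 0, 0 ]
```

The first index is the position in the top-level state array and the remaining ones are children indexes, so the same visible text sits at two different addresses.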

const state = [{
  type: 'p',
  style: '',
  marks: [],
  children: [
    { type: 'textNode', style: '', content: 'text '},
    { type: 'textNode', style: '', marks: ['strong'], content: 'strong'},
    { type: 'textNode', style: '', marks: ['strong','em'], content: 'italic text'}
  ]
}]

This is actually closer to how we humans perceive the document, and however the DOM is nested, we interpret it as the same state.

At the same time, the path to italic text becomes state[0].children[2], or even state[0] + offset. If state[0] is still an eyesore, adding a parent attribute to each Node would make things simpler still.
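As a rough sketch of that idea, the following code flattens nested inline tags into text nodes with marks and shows that the three nestings from earlier normalize to the same state. The names (MARK_ORDER, collectText, mergeAdjacent, normalizeParagraph) and the fixed mark order are assumptions made for this example only.

```javascript
// Assumed canonical order for marks, so that nesting order does not matter.
const MARK_ORDER = ['strong', 'em'];

// Walk the inline tree; mark tags only add to the set of active marks.
function collectText(children, active, out) {
  for (const child of children) {
    if (child.type === 'textNode') {
      const marks = MARK_ORDER.filter((m) => active.has(m));
      out.push({ type: 'textNode', marks, content: child.content });
    } else {
      const next = MARK_ORDER.includes(child.type)
        ? new Set([...active, child.type])
        : active;
      collectText(child.children || [], next, out);
    }
  }
  return out;
}

// Join neighbouring text nodes that carry exactly the same marks.
function mergeAdjacent(nodes) {
  const merged = [];
  for (const node of nodes) {
    const last = merged[merged.length - 1];
    if (last && last.marks.join(',') === node.marks.join(',')) {
      last.content += node.content;
    } else {
      merged.push({ ...node });
    }
  }
  return merged;
}

function normalizeParagraph(p) {
  return { type: 'p', children: mergeAdjacent(collectText(p.children, new Set(), [])) };
}

const txt = (content) => ({ type: 'textNode', content });
const tag = (type, ...children) => ({ type, children });

const variants = [
  tag('p', txt('text '), tag('strong', txt('strong'), tag('em', txt('italic text')))),
  tag('p', txt('text '), tag('strong', txt('strong')), tag('strong', tag('em', txt('italic text')))),
  tag('p', txt('text '), tag('strong', txt('strong')), tag('em', tag('strong', txt('italic text'))))
];

const normalized = variants.map((v) => JSON.stringify(normalizeParagraph(v)));
console.log(normalized[0] === normalized[1] && normalized[1] === normalized[2]); // true
```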


Tidying up

Let’s tidy the structure up a little and replace style with a more general attrs object:

const state = [{
  type: 'p',
  attrs: {},
  children: [
    { type: 'textNode', attrs: {}, content: 'text '},
    { type: 'textNode', attrs: {}, marks: ['strong'], content: 'strong'},
    { type: 'textNode', attrs: {}, marks: ['strong','em'], content: 'italic text'}
  ]
}]

The strong and em tags can also carry a class. What if the product needs text that is both bold and red? Let’s generate nodes and marks with a helper so that each can carry its own attrs:

const NodeAndMarkGen = (nodeType, attrs) => {
  return {
    type: nodeType,
    attrs: attrs
  }
}
const paragraph = NodeAndMarkGen('p', {})
const textNode = NodeAndMarkGen('textNode', {})
const strong = NodeAndMarkGen('strong', {})
const em = NodeAndMarkGen('em', {})

const state = [{
  ...paragraph,
  children: [
    {...textNode, content: 'text '},
    {...textNode, marks: [strong], content: 'strong'},
    {...textNode, marks: [strong, em], content: 'italic text'},
  ]
}]

Above, we described a relatively simple DOM structure, with State represented as a JS object. In real documents there are many more kinds of DOM elements, including ul, ol, li, img, blockquote, hr and so on, so the DOM cannot be made completely flat. We can therefore define the following rules:

  • The state of an editor is still a tree, but it simplifies the DOM tree by changing some of the modifier nodes (strong, em, a) to node attributes.

  • Most leaf nodes of this tree are text nodes (textNode above), but there are also leaf nodes such as image, video and hr.

  • Text nodes are not allowed to contain child nodes.

  • ...

These are the rules set for this article; in actual development you can set your own. The goal is a State that is clear, that fully captures the current state of the document, and whose structure can be modified in relatively simple ways. That is what makes a State design reasonable.

At this point, we’ve solved the problem of how to store editor state, and the structure looks reasonably clear. However, State is a simple representation of DOM tree data structures, and it hasn’t solved any real problems, such as carriage returns and messy tag nesting.

Schema

First, the problem of chaotic tag nesting. User behavior is unpredictable, and any structure is possible inside an editable DOM. That’s obviously not what we want: orderly, regular, parseable structures are what we like to develop against.

Since we can’t predict user behavior, we can set rules that constrain user input by specifying which DOM tags may appear under which DOM tags, and which DOM tags may carry which marks. We call these rules the schema. If the user’s input produces a tag structure that doesn’t fit our schema, we ignore it (or convert it to a tag we do accept).

To continue the example: so far type is just a simple string, which doesn’t carry much information, so let’s expand it. (A mark is only decoration; it carries no content and allows no children, so we treat it separately here.)

interface NodeType {
  // Note that textNode is also a Node
  tag: string
  content: string
  marks: string
  inline: boolean
}

interface MarkType {
  tag: string
}

Note that what we declare here is the NodeType (node.type), not the node itself, so marks and content mean something different than they do on a node: they describe what is allowed, not what is present.

Compared with NodeAndMarkGen, NodeType has an extra content field and an extra marks field. In content we can declare, using some notation (such as a regular-expression-like syntax), which nodes may appear under this node; in marks we declare which marks may be used.

const paragraph = {
  tag: 'p',
  content: 'header1|textNode',
  marks: "em|strong"
}
const header1 = {
  tag: 'h1',
  content: 'textNode',
  marks: "em"
}
const textNode = {
  inline: true,
  marks: '-'
}

const em = {
  tag: 'em'
}
const strong = {
  tag: 'strong'
}

const schema = new Schema({
  nodes: {
    paragraph, header1, textNode
  },
  marks: {
    em, strong
  }
})

The code above declares the rules of this editor (ProseMirror ships a ready-made Schema implementation):

  • There is a Node type called textNode, which is inline

  • There is a header1 type that allows textNode inside (children) and allows em decoration

  • There is a Paragraph type, which allows either header1 or textNode to exist inside, and allows em and strong decorations.

By making this declaration and applying it in some form when the EditorState is generated, removing non-compliant tags, our editor should only ever contain content that conforms to the rules we just defined in the Schema.
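One possible way to apply such a declaration is sketched below: a hypothetical sanitizeNode function drops children and marks that the schema does not allow. The schemaSpec object is a simplified copy of the rules above in plain-object form; it is not ProseMirror’s Schema class, and real editors may convert disallowed content instead of dropping it.

```javascript
// Simplified, plain-object version of the rules declared above (assumption:
// node.type in the state uses the same keys as the schema's nodes map).
const schemaSpec = {
  nodes: {
    paragraph: { tag: 'p', content: 'header1|textNode', marks: 'em|strong' },
    header1: { tag: 'h1', content: 'textNode', marks: 'em' },
    textNode: { inline: true }
  },
  marks: {
    em: { tag: 'em' },
    strong: { tag: 'strong' }
  }
};

// 'a|b' -> ['a', 'b']; a missing declaration allows nothing.
const allowedList = (declaration) => (declaration ? declaration.split('|') : []);

function sanitizeNode(node, spec, allowedMarks = []) {
  if (node.type === 'textNode') {
    // Keep only marks that exist in the schema and are allowed by the parent.
    const marks = (node.marks || []).filter(
      (m) => allowedMarks.includes(m) && Boolean(spec.marks[m])
    );
    return { ...node, marks };
  }
  const nodeSpec = spec.nodes[node.type];
  if (!nodeSpec) throw new Error(`Node type not in schema: ${node.type}`);
  const allowedChildren = allowedList(nodeSpec.content);
  const childMarks = allowedList(nodeSpec.marks);
  const children = (node.children || [])
    .filter((child) => allowedChildren.includes(child.type)) // drop disallowed nodes
    .map((child) => sanitizeNode(child, spec, childMarks));
  return { ...node, children };
}

const dirtyHeader = {
  type: 'header1',
  children: [
    { type: 'textNode', marks: ['strong', 'em'], content: 'title' },
    { type: 'image', src: 'x.png' } // not allowed inside header1
  ]
};

const cleanHeader = sanitizeNode(dirtyHeader, schemaSpec);
console.log(JSON.stringify(cleanHeader.children));
// [{"type":"textNode","marks":["em"],"content":"title"}]
```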

At the same time, we should also be able to use this schema to resolve the corresponding state from the existing DOM structure.

View

OK, now the structure is constrained by a Schema and the document is represented by an EditorState, so

f(state) = View

The f in this equation is a simple mapping, and from that we get View.
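For the marks-based state, f could look roughly like the sketch below, which renders each text node by wrapping it in its marks. The function names (escapeText, renderTextNode, view) are invented for this example, and it assumes every block’s children are text nodes and that each mark name doubles as its tag name.

```javascript
function escapeText(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Wrap from the last mark inward so that the first mark ends up outermost.
function renderTextNode(node) {
  return (node.marks || []).reduceRight(
    (inner, mark) => `<${mark}>${inner}</${mark}>`,
    escapeText(node.content)
  );
}

// f(state) = View, with the View expressed as an HTML string for simplicity.
function view(state) {
  return state
    .map((block) => {
      const inner = block.children.map(renderTextNode).join('');
      return `<${block.type}>${inner}</${block.type}>`;
    })
    .join('');
}

const flatState = [{
  type: 'p',
  children: [
    { type: 'textNode', content: 'text ' },
    { type: 'textNode', marks: ['strong'], content: 'strong' },
    { type: 'textNode', marks: ['strong', 'em'], content: 'italic text' }
  ]
}];

const rendered = view(flatState);
console.log(rendered);
// <p>text <strong>strong</strong><strong><em>italic text</em></strong></p>
```

Because the output is derived only from the state, the same state always produces the same nesting, whatever markup the user originally produced.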

Let’s see what we have now. We have the EditorState (constrained by the Schema), which represents the document at different moments during editing, and we have a method that maps from State to View. What is still missing in this chain is how to get from StateA to StateB.

Transform

We all know that immutable objects are good for traceability, for state management, for building a history stack, and so on. So we can stipulate that EditorState is immutable. How, then, do we represent updates?

Here we introduce a concept called Transform (tr), which describes a change, either generated in code or generated automatically when the user interacts with contentEditable.

When we apply a TR to a state, it should generate a new state, which generates a new View. So what information should this TR contain?

  1. Current selection information

  2. The current document object

  3. The steps that describe a sequence of actions

  4. The marks currently in use

The first two are easy to understand

  • Selection information is something that browsers actually do very well, so let’s use the browser selection and range. But mapping it to represent the location information in our editorState is a bit more complicated, so I won’t expand it out for now.

  • The current document object is essentially the EditorState, which is used as a reference.

Start with the steps that describe a sequence of actions

This is a bit like batching: each change is not applied to the UI directly, but only after a complete event has finished. We call each such change a Step.

A Step is essentially the EditorState counterpart of the document.execCommand we mentioned earlier; the steps are the interface of our editor, and there is generally a sizeable set of them to implement, such as:

  • Replace the tag of the current block-level element

  • Replace the selected elements

  • Remove elements

  • Insert a new element

  • Add a mark to the current selection

  • Remove a mark from the current selection

  • and so on

This part is really all about EditorState, and it is what developers need to focus on. Bold, block-level tag replacement, keyboard shortcuts and so on are all composed from these basic steps. We won’t go into the specific implementation here, just the idea.
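To illustrate the idea only, here is a minimal sketch of steps applied to an immutable state. The step shapes (setBlockType, addMark, removeMark, addressed by block and child index) and the functions applyStep and applyTransform are assumptions made for this example; they are not ProseMirror’s actual Step classes.

```javascript
// Apply one step and return a NEW doc; the old doc is never mutated.
function applyStep(doc, step) {
  return doc.map((block, blockIndex) => {
    if (blockIndex !== step.block) return block;
    switch (step.type) {
      case 'setBlockType':
        return { ...block, type: step.nodeType };
      case 'addMark':
      case 'removeMark':
        return {
          ...block,
          children: block.children.map((child, childIndex) => {
            if (childIndex !== step.child) return child;
            const without = (child.marks || []).filter((m) => m !== step.mark);
            const marks = step.type === 'addMark' ? [...without, step.mark] : without;
            return { ...child, marks };
          })
        };
      default:
        throw new Error(`Unknown step type: ${step.type}`);
    }
  });
}

// A transform is a batch of steps; applying it yields a new state.
function applyTransform(state, tr) {
  return { ...state, doc: tr.steps.reduce(applyStep, state.doc) };
}

const stateBefore = {
  doc: [{ type: 'p', children: [{ type: 'textNode', marks: [], content: 'hello' }] }]
};

const boldHeading = {
  steps: [
    { type: 'addMark', block: 0, child: 0, mark: 'strong' },
    { type: 'setBlockType', block: 0, nodeType: 'h1' }
  ]
};

const stateAfter = applyTransform(stateBefore, boldHeading);
console.log(stateAfter.doc[0].type, stateAfter.doc[0].children[0].marks);   // h1 [ 'strong' ]
console.log(stateBefore.doc[0].type, stateBefore.doc[0].children[0].marks); // p []
```

Because stateBefore is left untouched, keeping a list of previous states (or of applied transforms) is enough to build undo history.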

What are the marks currently in use?

Consider two scenarios:

  1. When we type right after text that is wrapped in a tag, what do we expect the new text to look like?

  2. When we click the bold button in the toolbar and then type, what do we expect the new text to look like?

In the first case, we can read the marks from the Node at the cursor position.

In the second case, however, there is no existing Node to hold this information. storedMarks provides a place to store it, and addStoredMarks and removeStoredMarks should also be included in the set of Step functions described above.
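The two scenarios can be sketched as follows. Everything here is a simplified assumption for illustration: the state shape (doc, cursor, storedMarks), a cursor that always sits right after a text node, and the function bodies, which only borrow the addStoredMarks and removeStoredMarks names from the text.

```javascript
// Marks for newly typed text: explicit storedMarks win (scenario 2),
// otherwise inherit from the text node just before the cursor (scenario 1).
function marksForInput(state) {
  if (state.storedMarks) return state.storedMarks;
  const { block, child } = state.cursor;
  const node = state.doc[block].children[child];
  return node ? node.marks || [] : [];
}

function addStoredMarks(state, mark) {
  const current = marksForInput(state);
  return { ...state, storedMarks: current.includes(mark) ? current : [...current, mark] };
}

function removeStoredMarks(state, mark) {
  return { ...state, storedMarks: marksForInput(state).filter((m) => m !== mark) };
}

// Insert a new text node after the cursor node; storedMarks are consumed.
function insertText(state, content) {
  const { block, child } = state.cursor;
  const newNode = { type: 'textNode', marks: marksForInput(state), content };
  const doc = state.doc.map((b, i) =>
    i !== block
      ? b
      : { ...b, children: [...b.children.slice(0, child + 1), newNode, ...b.children.slice(child + 1)] }
  );
  return { doc, cursor: { block, child: child + 1 }, storedMarks: null };
}

const startState = {
  doc: [{ type: 'p', children: [{ type: 'textNode', marks: ['em'], content: 'abc' }] }],
  cursor: { block: 0, child: 0 },
  storedMarks: null
};

const inherited = insertText(startState, 'def');
const withBold = insertText(addStoredMarks(startState, 'strong'), 'def');
console.log(inherited.doc[0].children[1].marks); // [ 'em' ]
console.log(withBold.doc[0].children[1].marks);  // [ 'em', 'strong' ]
```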

Closing thoughts

This article describes only the simplest editor implementation ideas. Once we have implemented a sufficiently complete feature set, we should be able to assemble a complex editor from these basic pieces, like stacking building blocks.

For example, how should a complex, indivisible DOM structure such as a card be represented in State? What if the user’s selection spans several Nodes when formatting, or even copying and pasting? Supporting these extensions is considerably more complex and requires further extensions to the structure above.

In addition, the complexity of an editor also lies in unpredictable user behavior and the sheer number of edge cases. It is hard to devise logic that covers every possibility up front, so we have to improve our editor step by step.