This is the 15th day of my participation in the August More Text Challenge

Basic document structure

The ProseMirror Document is a tree structure.

A ProseMirror document is a node. It contains a Fragment object, which in turn contains zero or more child nodes, and so on, layer by layer.

      

Like the DOM, a ProseMirror document is a recursive tree structure. However, ProseMirror stores inline content (text, strong, em, and so on) differently from the DOM.

In HTML, a paragraph and the markup inside it form a tree, as in the following structure:

<p>This is <strong>strong text with <em>emphasis</em></strong></p>

In ProseMirror, however, inline content is modeled as a flat sequence, and the markup is attached to the nodes as metadata.

For example, the text and its strong and em markup are all stored directly under the nearest block-level parent, the p element. The result is a flat sequence of text nodes, each carrying its own set of marks.
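Here is a minimal sketch of the flat representation in plain JavaScript. It uses a hypothetical toy model (plain objects and a made-up flattenInline helper), not the prosemirror-model API. Nested inline elements become a list of text nodes, and each text node carries the marks of the elements that wrapped it.

```javascript
// Hypothetical toy model, not the prosemirror-model API.
// Walks a nested inline tree (strings and { tag, children } elements)
// and emits a flat list of text nodes, each tagged with its marks.
function flattenInline(children, activeMarks = [], out = []) {
  for (const child of children) {
    if (typeof child === "string") {
      // Plain text: record it with the marks of all enclosing elements.
      out.push({ text: child, marks: [...activeMarks] });
    } else {
      // An element such as <strong> or <em>: its tag becomes a mark.
      flattenInline(child.children, [...activeMarks, child.tag], out);
    }
  }
  return out;
}

// <p>This is <strong>strong text with <em>emphasis</em></strong></p>
const paragraphChildren = [
  "This is ",
  {
    tag: "strong",
    children: ["strong text with ", { tag: "em", children: ["emphasis"] }],
  },
];

const flat = flattenInline(paragraphChildren);
```

The result holds three text nodes:
- "This is " with no marks
- "strong text with " with strong
- "emphasis" with both strong and em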

 

This means we can refer to a position in a paragraph by its character offset rather than by a path into a tree. That makes operations such as splitting content or changing its style easy.
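The following sketch shows why character offsets are convenient: it splits a flat run of text nodes at an offset. It assumes the same hypothetical toy representation (plain { text, marks } objects and a made-up splitAt helper). In real ProseMirror you would normally make such a change through a transform rather than by hand.

```javascript
// Hypothetical toy model: inline content as a flat array of
// { text, marks } objects, addressed by character offset.
function splitAt(nodes, offset) {
  const before = [];
  const after = [];
  let pos = 0; // character offset where the current node starts
  for (const node of nodes) {
    const end = pos + node.text.length;
    if (end <= offset) {
      before.push(node); // entirely before the split point
    } else if (pos >= offset) {
      after.push(node); // entirely after the split point
    } else {
      // The split point falls inside this node: cut its text in two,
      // keeping the same marks on both halves.
      before.push({ text: node.text.slice(0, offset - pos), marks: node.marks });
      after.push({ text: node.text.slice(offset - pos), marks: node.marks });
    }
    pos = end;
  }
  return [before, after];
}

const inline = [
  { text: "This is ", marks: [] },
  { text: "strong text", marks: ["strong"] },
];
const [left, right] = splitAt(inline, 10);
```

No tree path is needed. A single integer locates the cut, and the marks simply travel with each half.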

It also means that each document has exactly one valid representation. Adjacent text nodes with the same set of marks are always merged, empty text nodes are not allowed, and the order in which marks appear is specified by the schema.
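These normalization rules can be sketched the same way. The normalize helper below is hypothetical, and MARK_ORDER is an assumed stand-in for the mark order a schema would define. The helper does three things:
- drops empty text nodes
- sorts each node's marks into that order
- merges neighbors whose mark sets end up identical

```javascript
// Hypothetical stand-in for the mark order a schema defines (an assumption).
const MARK_ORDER = ["link", "em", "strong"];

function sameMarks(a, b) {
  return a.length === b.length && a.every((m, i) => m === b[i]);
}

// Hypothetical toy normalizer over plain { text, marks } objects.
function normalize(nodes) {
  const result = [];
  for (const node of nodes) {
    if (node.text === "") continue; // empty text nodes are not allowed
    // Put marks in schema order so equal mark sets compare equal.
    const marks = [...node.marks].sort(
      (x, y) => MARK_ORDER.indexOf(x) - MARK_ORDER.indexOf(y)
    );
    const last = result[result.length - 1];
    if (last && sameMarks(last.marks, marks)) {
      // Adjacent text with identical marks is merged into one node
      // (last is a fresh object created below, so the input is untouched).
      last.text += node.text;
    } else {
      result.push({ text: node.text, marks });
    }
  }
  return result;
}

const normalized = normalize([
  { text: "Hello ", marks: ["strong", "em"] },
  { text: "", marks: [] },
  { text: "world", marks: ["em", "strong"] },
  { text: "!", marks: [] },
]);
```

The four input nodes collapse into two: "Hello world" with em and strong, and "!" with no marks.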

Because inline content is stored flat inside block-level nodes, a ProseMirror document is a tree of block nodes. Most of its leaves are textblocks, which are block nodes that contain text. A block can also be a simple leaf node with no content, such as a horizontal rule (hr) or a video element.

       

Document feature – split

I'm not sure what the best name for this feature is; it is somewhat like React state.

Another difference between the DOM tree and a ProseMirror document is how they represent node objects. In the DOM, nodes are mutable objects with an identity. This means a node can appear under only one parent: it cannot be in two places at once, because it is a single unique object. When a node is updated, it is mutated in place, so before and after the change it is the same object.

In ProseMirror, nodes are simply values. You should think of a node the way you think of the value of the character "x". It can appear in several data structures at the same time, and it has no link to the structure it currently sits in. If you add "y" to it, you get a new value, "xy", without changing the original "x" at all.

This is how a ProseMirror document works. Its value does not change, and it can be used as the starting value from which a new document is computed. Document nodes do not know which data structure they are part of, because they can belong to several structures and can even appear multiple times within one. They are values, not stateful objects.

This means that every time you update a document, you get a new document value. The new document shares with the old one every child node that did not change in the update, which makes creating it cheap.
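Here is a minimal sketch of this structural sharing. It assumes a toy representation of nodes as plain frozen objects, and the node and replaceAt helpers are hypothetical. Updating one text node copies only the nodes on the path from the root down to it; every other subtree is reused as-is. The toy freezes its objects to catch accidental mutation, whereas ProseMirror itself avoids freezing for performance reasons.

```javascript
// Hypothetical toy constructor: nodes are frozen plain objects.
function node(type, content = [], text) {
  return Object.freeze({ type, content: Object.freeze(content), text });
}

// Replace the node reached by following `path` (a list of child indices)
// with `replacement`. Only nodes along the path are copied; every other
// subtree is shared between the old and new trees.
function replaceAt(root, path, replacement) {
  if (path.length === 0) return replacement;
  const [index, ...rest] = path;
  const newContent = root.content.slice(); // shallow copy of the children
  newContent[index] = replaceAt(root.content[index], rest, replacement);
  return node(root.type, newContent, root.text);
}

const oldDoc = node("doc", [
  node("paragraph", [node("text", [], "One.")]),
  node("paragraph", [node("text", [], "Two!")]),
]);
const newDoc = replaceAt(oldDoc, [1, 0], node("text", [], "Three?"));
```

After the update, oldDoc still reads "Two!", newDoc reads "Three?", and both documents point to the very same first paragraph object.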

This approach has several advantages. First, the editor is never left in an invalid, half-updated state during an update. The new state, with its new document, only appears once it is complete, and swapping it in is instantaneous. Second, it makes it easier to reason about documents in a somewhat mathematical way, which is very hard when values keep changing underneath you. Finally, it helps make collaborative editing possible. It also lets ProseMirror update the DOM efficiently by comparing the document it last drew on screen with the current one.
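The DOM-update advantage can be illustrated with another hypothetical toy. When an old node and a new node are the very same object, nothing inside them can have changed, so a comparison can skip the whole subtree. This is only a sketch of the idea, not ProseMirror's actual view-update algorithm.

```javascript
// Hypothetical toy comparison over plain { type, content } objects.
// Returns the types of nodes that would need redrawing.
function changedNodes(oldNode, newNode, out = []) {
  if (oldNode === newNode) return out; // shared subtree: skip it entirely
  out.push(newNode.type);
  const oldKids = oldNode.content || [];
  const newKids = newNode.content || [];
  for (let i = 0; i < newKids.length; i++) {
    if (i < oldKids.length) changedNodes(oldKids[i], newKids[i], out);
    else out.push(newKids[i].type); // a child that did not exist before
  }
  return out;
}

const unchanged = { type: "paragraph", content: [{ type: "text", text: "One." }] };
const before = {
  type: "doc",
  content: [unchanged, { type: "paragraph", content: [{ type: "text", text: "Two!" }] }],
};
// The update shares the first paragraph and replaces only the second one.
const after = {
  type: "doc",
  content: [unchanged, { type: "paragraph", content: [{ type: "text", text: "Two?" }] }],
};

const changed = changedNodes(before, after);
```

The shared first paragraph is never looked inside. Only the doc node and the path to the edited text are reported.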

Since nodes are represented as ordinary JavaScript objects, and explicitly freezing their properties would hurt performance, it is actually possible to change them. ProseMirror does not support this, however. Mutating these data structures will break things, because they are almost always shared between several structures, so changing one affects places you don't know about. So be careful! Note that this also holds for the arrays and plain objects that are part of node objects, such as the object storing a node's attributes or the array of child nodes in a fragment.
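Here is a small sketch of the hazard, using hypothetical plain objects rather than real ProseMirror nodes. The same image node appears in two documents, and twice in one of them. Mutating its attrs object in place affects all of those places at once, while deriving a new value affects none of them.

```javascript
// Hypothetical toy sketch: one node value shared by two documents
// (and repeated twice inside the second one).
const image = { type: "image", attrs: { src: "a.png", alt: "A" } };
const docA = { type: "doc", content: [image] };
const docB = { type: "doc", content: [image, image] };

// Safe: derive a new node value with its own attrs object.
const relabeled = { ...image, attrs: { ...image.attrs, alt: "B" } };
const docC = { type: "doc", content: [relabeled] };

// Unsafe: mutating the shared attrs object changes every place that uses it.
image.attrs.alt = "oops";
```

After the last line, docA and both entries of docB read "oops". The derived node in docC still reads "B".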

Because nodes and fragments are persistent data structures, you should never mutate them. If you hold a reference to a document (or a node, or a fragment), that object will always stay the same.

Most of the time, you will update documents with transforms rather than touching nodes directly. Transforms also leave a record of the changes, which is necessary when the document is part of an editor state.

If you do want to derive an updated document manually, ProseMirror provides helper methods on the Node and Fragment types. To create an updated version of a whole document, you will usually use Node.replace, which replaces a given range of the document with a slice of new content. To update a node shallowly, use its copy method, which creates a similar node with new content. Fragments also have various updating methods, such as replaceChild and append.
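To show the shape of these shallow updates, here is a sketch with hypothetical ToyNode and ToyFragment classes. They borrow the method names mentioned above (copy, replaceChild, append) but are simplified stand-ins, not prosemirror-model's implementation. Real nodes also carry marks, and the real methods handle details this toy ignores.

```javascript
// Hypothetical toy classes; NOT prosemirror-model's code.
class ToyFragment {
  constructor(nodes) {
    this.nodes = nodes; // treated as read-only once constructed
  }
  // New fragment with the child at `index` replaced by `node`.
  replaceChild(index, node) {
    const nodes = this.nodes.slice();
    nodes[index] = node;
    return new ToyFragment(nodes);
  }
  // New fragment with this fragment's children followed by `other`'s.
  append(other) {
    return new ToyFragment(this.nodes.concat(other.nodes));
  }
}

class ToyNode {
  constructor(type, attrs, content, text) {
    this.type = type;
    this.attrs = attrs;
    this.content = content;
    this.text = text;
  }
  // Same type and attrs (the "markup") as this node, with new or empty content.
  copy(content = new ToyFragment([])) {
    return new ToyNode(this.type, this.attrs, content, this.text);
  }
}

const textNode = (t) => new ToyNode("text", {}, new ToyFragment([]), t);
// The align attribute is made up for illustration.
const para = new ToyNode("paragraph", { align: "left" }, new ToyFragment([textNode("One.")]));

// Shallow updates: only the paragraph and its fragment are new objects.
const edited = para.copy(para.content.replaceChild(0, textNode("Uno.")));
const extended = para.copy(para.content.append(new ToyFragment([textNode(" Two!")])));
```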

       

The Node type

This class represents the nodes that make up a ProseMirror document tree. A document is therefore an instance of Node, and so are its children.

Nodes are persistent data structures. Instead of changing them, you create new ones with the content you want, and old ones keep pointing at the old document shape. Sharing as much structure as possible between the old and new data keeps this cheap, and a tree shape like this one (without back pointers) makes such sharing easy.

The whole document is a node, and its content consists of the children of that top-level node. Typically those children are a sequence of block nodes, some of which may be textblocks containing inline content. However, the top-level node can itself be a textblock, in which case the whole document contains only inline content.

Which nodes are allowed where is determined by the document's schema. To create nodes programmatically (rather than by typing into the editor), you must go through the schema, for example with its node and text methods, as below.

import {schema} from "prosemirror-schema-basic"

// The null arguments are where attributes would go, if needed
let doc = schema.node("doc", null, [
  schema.node("paragraph", null, [schema.text("One.")]),
  schema.node("horizontal_rule"),
  schema.node("paragraph", null, [schema.text("Two!")])
])

Properties

  • type: NodeType. The node's type tells you its name, which attributes it accepts, and so on. Node types (and mark types) are created only once per schema, and they know which schema they belong to.

  • attrs: Object

    An object mapping attribute names to values; which attributes are allowed and required is determined by the node type. For example, an image node might use attrs to store its alt text and URL.

  • content: Fragment. A container holding the node's children. The content is stored in a field that points to a Fragment instance, which wraps an array of nodes. This is true even for nodes that have no content or are not allowed to have any; those simply share the empty fragment.

    Like nodes, fragments are persistent data structures: you should never mutate them or their contents. Instead, create new instances as needed.

  • marks: [Mark]

    The marks applied to this node, such as emphasis or link.

  • text: ?string. For text nodes, this holds the node's text content.

  • nodeSize: number. The size of this node. For text nodes, this is the number of characters; for other leaf nodes, it is 1; for non-leaf nodes, it is the size of the content plus 2 (for the start and end tokens).

  • inlineContent: boolean

    True when this node allows inline content, which tells you whether inline nodes can be placed inside it.

  • isTextblock: boolean

    True when this is a block (non-inline) node with inline content.

So a typical "paragraph" node is a textblock, whereas a blockquote is a block element whose content consists of other blocks. Text nodes, hard breaks, and inline images are inline leaf nodes, while a horizontal rule (hr) node is a typical block leaf node. A leaf node cannot contain any child nodes; as these examples show, leaves can be inline or block.
  • isLeaf: boolean. True when this node is not allowed to have any content.
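The nodeSize rules above can be written as a short recursive function. This sketch assumes a hypothetical plain-object representation: text nodes have a text field, leaf nodes have no content array, and other nodes have one. The nodeSize and contentSize helpers are illustrative, not the library's code. The example document is <p>One</p><blockquote><p>Two<img></p></blockquote>.

```javascript
// Hypothetical toy sizing over plain objects.
function nodeSize(node) {
  if (node.text !== undefined) return node.text.length; // one per character
  if (!node.content) return 1; // leaf node (image, horizontal rule, ...)
  return contentSize(node) + 2; // content plus start and end tokens
}

function contentSize(node) {
  return node.content.reduce((sum, child) => sum + nodeSize(child), 0);
}

// <p>One</p><blockquote><p>Two<img></p></blockquote>
const doc = {
  type: "doc",
  content: [
    { type: "paragraph", content: [{ type: "text", text: "One" }] },
    {
      type: "blockquote",
      content: [
        { type: "paragraph", content: [{ type: "text", text: "Two" }, { type: "image" }] },
      ],
    },
  ],
};
```

Applied to this document, the function gives:
- the first paragraph: 5 (3 characters plus 2)
- the blockquote: 8
- the document's content: 13
- the document node as a whole: 15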

Methods

  • child(index: number) → Node

    Gets the child node at the given index. For example, view.state.doc.child(0) returns the first child of the document.

  • descendants(f: fn(node: Node, pos: number, parent: Node) → ?bool)

    Calls f for every descendant node. When the callback returns false for a node, that node's own descendants are skipped.

  • copy(content: ?Fragment = null) → Node. Creates a new node with the same markup (type, attributes, and marks) as this node, containing the given content, or empty if no content is given.

  • slice(from: number, to: ?number = this.content.size) → Slice. Cuts out the part of the document between the given positions and returns it as a Slice object.

  • replace(from: number, to: number, slice: Slice) → Node. Replaces the part of the document between the given positions (from, to) with the given slice. The slice must "fit": its open sides must be able to connect to the surrounding content, and its content nodes must be valid children for the nodes they are placed in. If either condition is violated, an error of type ReplaceError is thrown.

  • resolve(pos: number) → ResolvedPos

    Resolves the given position in the document and returns an object describing its context.
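A hypothetical toy version of a descendants-style walk, again over plain objects, shows how the pos argument and the "return false to skip" rule fit together. It is a sketch of the documented behavior, not the library's code. Here pos is where each node starts, counted from the start of the root's content.

```javascript
// Hypothetical toy sizing helper (same rules as nodeSize).
function nodeSize(node) {
  if (node.text !== undefined) return node.text.length;
  if (!node.content) return 1;
  return node.content.reduce((sum, child) => sum + nodeSize(child), 0) + 2;
}

// Hypothetical toy walk: calls f(node, pos, parent) for every descendant.
function descendants(root, f, start = 0) {
  let pos = start;
  for (const child of root.content || []) {
    // Returning false from the callback skips this child's own descendants.
    if (f(child, pos, root) !== false && child.content) {
      descendants(child, f, pos + 1); // +1 steps over the child's start token
    }
    pos += nodeSize(child);
  }
}

// <p>One</p><blockquote><p>Two<img></p></blockquote>
const doc = {
  type: "doc",
  content: [
    { type: "paragraph", content: [{ type: "text", text: "One" }] },
    {
      type: "blockquote",
      content: [
        { type: "paragraph", content: [{ type: "text", text: "Two" }, { type: "image" }] },
      ],
    },
  ],
};

const visited = [];
descendants(doc, (node, pos) => {
  visited.push(`${node.type}@${pos}`);
  return node.type !== "blockquote"; // don't look inside blockquotes
});
```

Because the callback returns false for the blockquote, its paragraph, text, and image are never visited. The result is paragraph@0, text@1, blockquote@5.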

     

Indexing

ProseMirror nodes support two kinds of indexing. They can be treated as trees, using offsets into individual nodes, or as a flat sequence of tokens (a token can be thought of as a unit of counting).

  1. The first kind lets you interact with individual nodes much as you would with the DOM. You access child nodes directly with the child method and childCount, and write recursive functions that scan through a document (if you just want to visit all nodes, use descendants or nodesBetween).

  2. The second kind is more useful when addressing a specific position in the document. It lets any document position be represented as an integer: its index in the token sequence. These tokens don't actually exist as objects in memory; they are just a counting convention. Still, the document's tree shape, together with the fact that each node knows its own size, makes position-based access cheap.

    The start of the document, right before all content, is position 0.

Entering or leaving a node that is not a leaf node (that is, a node that can have content) counts as one token. So if the document starts with a paragraph (p), the start of that paragraph's content is position 1. Each character in a text node counts as one token. If the paragraph at the start of the document contains the word "hi", position 2 is after the "h", position 3 is after the "i", and position 4 is after the whole paragraph. Leaf nodes that do not allow content (such as images) count as a single token. So if you have a document that, represented as HTML, looks like this:

<p>One</p>
<blockquote><p>Two<img src="..."></p></blockquote>

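then its tokens, each with the position just before it, are:
- 0 <p>
- 1 "O", 2 "n", 3 "e"
- 4 </p>
- 5 <blockquote>
- 6 <p>
- 7 "T", 8 "w", 9 "o"
- 10 <img>
- 11 </p>
- 12 </blockquote>

Position 13 is the end of the document.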
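This token order can be reproduced with a small hypothetical helper. It walks a plain-object version of the document and emits one entry per token, so the position just before token n is n. It is a counting aid only and does not reflect how prosemirror-model stores anything.

```javascript
// Hypothetical toy: list the flat token sequence of a document.
function tokens(node, out = []) {
  for (const child of node.content || []) {
    if (child.text !== undefined) {
      // One token per character (UTF-16 code unit, like string length).
      for (let i = 0; i < child.text.length; i++) out.push(child.text[i]);
    } else if (!child.content) {
      out.push(`<${child.type}>`); // a leaf node is a single token
    } else {
      out.push(`<${child.type}>`); // entering the node
      tokens(child, out);
      out.push(`</${child.type}>`); // leaving the node
    }
  }
  return out;
}

// <p>One</p><blockquote><p>Two<img></p></blockquote>
const doc = {
  type: "doc",
  content: [
    { type: "p", content: [{ type: "text", text: "One" }] },
    {
      type: "blockquote",
      content: [{ type: "p", content: [{ type: "text", text: "Two" }, { type: "img" }] }],
    },
  ],
};

const seq = tokens(doc);
seq.forEach((tok, pos) => console.log(pos, tok));
```

The helper produces 13 tokens, so the last position in this document is 13, which equals the size of its content.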

Each node has a nodeSize property giving the size of the entire node, and you can read .content.size to get the size of its content. For the outer document node (which corresponds to the root contenteditable element in the DOM), the opening and closing tokens are not considered part of the document, because you can't put the cursor outside the document. So the size of a document is doc.content.size, not doc.nodeSize. The latter still counts those two tokens and is therefore always 2 larger.

Interpreting such positions manually involves quite a lot of counting. Instead, you can call Node.resolve to get a more descriptive data structure for a position. It tells you the position's parent node, its offset into that parent, the parent's ancestors, and a few other things.
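A much simplified, hypothetical resolve over a plain-object document illustrates what this involves. It descends into whichever node strictly contains the position and reports the depth, the parent node, and the offset into that parent. The real Node.resolve returns a ResolvedPos object with far more information; this sketch only mirrors the basic idea.

```javascript
// Hypothetical toy sizing helper.
function nodeSize(node) {
  if (node.text !== undefined) return node.text.length;
  if (!node.content) return 1;
  return node.content.reduce((sum, child) => sum + nodeSize(child), 0) + 2;
}

// Hypothetical toy resolve: finds the innermost node whose content contains pos.
function resolve(doc, pos) {
  const path = [doc]; // ancestors, from the root down to the parent
  let parent = doc;
  let start = 0; // document position where parent's content begins
  let entered = true;
  while (entered) {
    entered = false;
    let offset = start;
    for (const child of parent.content) {
      const end = offset + nodeSize(child);
      // Descend into a node with content when pos lies strictly inside it.
      if (child.content && pos > offset && pos < end) {
        parent = child;
        start = offset + 1; // skip the child's opening token
        path.push(child);
        entered = true;
        break;
      }
      offset = end;
    }
  }
  return { depth: path.length - 1, parent, parentOffset: pos - start, path };
}

// <p>One</p><blockquote><p>Two<img></p></blockquote>
const doc = {
  type: "doc",
  content: [
    { type: "p", content: [{ type: "text", text: "One" }] },
    {
      type: "blockquote",
      content: [{ type: "p", content: [{ type: "text", text: "Two" }, { type: "img" }] }],
    },
  ],
};

const $pos = resolve(doc, 8);
```

Position 8 falls between "T" and "w" in the inner paragraph, so it resolves to depth 2 with parentOffset 1. Position 5 sits directly in the document, at depth 0.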

Take care to distinguish between child indices (as used with childCount), document-wide positions, and node-local offsets. Node-local offsets are sometimes used in recursive functions to represent a position inside the node currently being processed.

     

Copy, paste, and drag: slices

To handle things like copy-paste and drag-and-drop, ProseMirror uses the concept of a document slice: the content between two positions. Unlike a complete node or fragment, a slice may be "open" at its start or end, meaning it can contain nodes whose tags are cut off. For example, from <p>123</p><p>456</p>, one possible slice is 23</p><p>45.

For example, suppose you drag the mouse from the middle of one paragraph to the middle of the next. The selected slice then contains two paragraphs, the first open at the start and the second open at the end. If you instead node-select a paragraph (select it as a whole node rather than by dragging over its text), you have selected a closed node. The content of such open nodes may violate the schema's constraints if treated as a node's complete content. This is because parts of some nodes fell outside the slice: in the example above, the opening <p> of the first paragraph and the closing </p> of the second.

The Slice data structure is used to represent such content. It stores a fragment along with an open depth on both sides. You can use the slice method on nodes to cut a slice out of a document.

import {schema} from "prosemirror-schema-basic"

// A document holding two paragraphs, containing "a" and "b":
// <p>a</p><p>b</p>
let doc = schema.node("doc", null, [
  schema.node("paragraph", null, [schema.text("a")]),
  schema.node("paragraph", null, [schema.text("b")])
])

let slice1 = doc.slice(0, 3) // The first paragraph
console.log(slice1.openStart, slice1.openEnd) // → 0 0
let slice2 = doc.slice(1, 5) // From start of first paragraph
                             // to end of second
console.log(slice2.openStart, slice2.openEnd) // → 1 1
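Where do these open depths come from? Here is a hypothetical toy sketch of the relationship:
1. Find the chain of ancestors around each end position.
2. Take the deepest ancestor the two chains share.
3. Count how many levels each end sits below it.

The ancestors and openDepths helpers are made up for illustration and operate on plain objects. This is a simplified model that reproduces the numbers above, not prosemirror-model's source.

```javascript
// Hypothetical toy sizing helper.
function nodeSize(node) {
  if (node.text !== undefined) return node.text.length;
  if (!node.content) return 1;
  return node.content.reduce((sum, child) => sum + nodeSize(child), 0) + 2;
}

// Ancestors (root first) of the innermost node whose content contains pos.
function ancestors(doc, pos) {
  const path = [doc];
  let parent = doc;
  let start = 0; // position where parent's content begins
  let entered = true;
  while (entered) {
    entered = false;
    let offset = start;
    for (const child of parent.content) {
      const end = offset + nodeSize(child);
      if (child.content && pos > offset && pos < end) {
        path.push(child);
        parent = child;
        start = offset + 1;
        entered = true;
        break;
      }
      offset = end;
    }
  }
  return path;
}

// Open depth on each side = how far that end sits below the deepest shared ancestor.
function openDepths(doc, from, to) {
  const a = ancestors(doc, from);
  const b = ancestors(doc, to);
  let shared = 0;
  while (shared + 1 < a.length && shared + 1 < b.length && a[shared + 1] === b[shared + 1]) {
    shared++;
  }
  return { openStart: a.length - 1 - shared, openEnd: b.length - 1 - shared };
}

// <p>a</p><p>b</p>
const doc = {
  type: "doc",
  content: [
    { type: "p", content: [{ type: "text", text: "a" }] },
    { type: "p", content: [{ type: "text", text: "b" }] },
  ],
};

const s1 = openDepths(doc, 0, 3); // the whole first paragraph
const s2 = openDepths(doc, 1, 5); // inside the first paragraph to inside the second
```

Both ends of s1 sit directly in the document, so nothing is open. The ends of s2 are each one level inside a different paragraph, so each side is open by 1.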