If this is too long to read, you can skip straight to the repo: github.com/huozhi/html…

Inspiration

While we were busy juggling several tasks, the design team came to us hoping for a complete FAQ sub-site that gives users detailed help information.

Design: First there will be a search 🔍 to retrieve any page. Each page is rich-text help content.

FE: Well, that looks fine, it should be OK.

Design: The rich text needs to support video, GIF, inline images, block-level images, blah blah blah… We want it to be consistent with all the interactions and styles on our main site.

FE: But this is a new project, and none of those things are fully componentized yet…

FE: Where does the text come from?

Design: We would like an editor that supports these uploads as well…

FE: Is it urgent?

Design: Yeah, as soon as possible!

WHAT THE HELLLLL…

It sounds like an endless amount of work. The front-end team does have video and GIF components that support complex interactions, and an editor, but they depend on too many things, each operation may carry its own data, and the people who built them did not modularize them to the point of "drop-in, barrier-free portability". Displaying videos, GIFs, and so on in rich text requires the old RichText component, which is far too heavy.

We just want a static page

A page with these interactions is fine, and the components can be pulled in individually, but we do not need the heavier, editor-driven rich text…

Editor & rich text

When you type some text into Zhihu's answer box, bold it, italicize it, and insert a picture, you have done a rich text edit. Because the result is no longer just plain text, it needs more sophisticated HTML and CSS to work, perhaps inline styles or JS interactions.

Editors also come in two types:

  1. Stateful editor: the editor keeps its own state model. Draft.js and Slate, discussed below, are stateful
  2. Non-stateful editor: no state is needed. It simply relies on contenteditable with a layer of encapsulation on top, such as media.js

There are two general ideas for saving edits:

  1. With a stateful editor, save the editor state and restore it when displaying the content
  2. With any editor, save the HTML while editing and render it directly when displaying the content

Saving state can become a problem when you migrate from one editor to another, for example from Google Closure Editor to Draft.js: the former has no state, while Draft.js does. And what if Draft.js itself is replaced someday? That is a bit of a risk.

What if I save HTML and need interactions when I present it? Draft.js and Slate are stateful, and both use a serializer + deserializer to convert between HTML and state. While Draft.js may require a library like draft-convert, Slate has a built-in HTML serializer + deserializer, which is very convenient.

What does all this have to do with our scenario?

The first time I tried the Slate editor, I was very pleased with its HTML conversion:

import React from 'react'
import { Html } from 'slate'

const rules = [
  {
    // Turn a <p> element into a Slate paragraph block.
    deserialize(el, next) {
      if (el.tagName.toLowerCase() == 'p') {
        return {
          kind: 'block',
          type: 'paragraph',
          nodes: next(el.childNodes)
        }
      }
    },
    // Add a serializing function property to our rule...
    serialize(object, children) {
      if (object.kind == 'block' && object.type == 'paragraph') {
        return <p>{children}</p>
      }
    }
  }
]

// Create a new serializer instance with our `rules` from above.
const html = new Html({ rules })

// `htmlString` is a placeholder for the saved HTML string.
const state = html.deserialize(htmlString)
// Convert the state back into an HTML string.
const string = html.serialize(state)

Isn’t that interesting? We define a serialization/deserialization rule, and then we’re free to convert between state and HTML. COOL!!!!!

See that? This is what we really want: something that strips out the editor and only serializes and deserializes HTML, so that we can present saved content in more complex ways.

LET'S DO IT

Remember how compilers work? To convert a source string into machine code, a compiler goes through these stages:

  • Tokenize: splits the source into tokens
  • Parse: converts the tokens into an AST
  • Transform: converts the AST into the destination code

Our HTML parsing and intermediate state conversion follow the same process. The destination code is simply the final form we want. It can be a component, another HTML string, or JSON, whatever you want.

So we are going to do three things:

  1. Tokenize the HTML string into HTML tags
  2. Convert the tags into a tree in which each node is an HTML tag that carries its own information
  3. Walk through the tree and replace each node with whatever you want
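
The three steps can be sketched in a few dozen lines. This is only a minimal illustration, not html2any's actual source: the names tokenize, parse, transform, and toJson and the node shape are assumptions made for this example, and the regex-based tokenizer ignores comments, unquoted attributes, and a ">" inside attribute values.

```javascript
// Hypothetical helper names; this is not html2any's actual source.
const VOID_TAGS = new Set(['br', 'hr', 'img', 'input'])

// Step 1: split the HTML string into tag tokens and text tokens.
function tokenize(html) {
  return html.match(/<[^>]+>|[^<]+/g) || []
}

// Read `key="value"` pairs out of the inside of an opening tag.
function parseAttributes(source) {
  const attributes = {}
  const pattern = /([\w-]+)="([^"]*)"/g
  let match
  while ((match = pattern.exec(source)) !== null) {
    attributes[match[1]] = match[2]
  }
  return attributes
}

// Step 2: build a tree; a stack tracks the currently open elements.
function parse(tokens) {
  const root = { type: 'root', children: [] }
  const stack = [root]
  for (const token of tokens) {
    const parent = stack[stack.length - 1]
    if (token.startsWith('</')) {
      // Closing tag: go back up to the enclosing element.
      if (stack.length > 1) stack.pop()
    } else if (token.startsWith('<')) {
      const inner = token.slice(1, -1).replace(/\/$/, '').trim()
      const name = inner.split(/\s+/)[0].toLowerCase()
      const node = { type: 'tag', name, attributes: parseAttributes(inner), children: [] }
      parent.children.push(node)
      // Void and self-closing tags never receive children.
      if (!VOID_TAGS.has(name) && !token.endsWith('/>')) stack.push(node)
    } else {
      parent.children.push({ type: 'text', text: token })
    }
  }
  return root.children
}

// Step 3: walk the tree bottom-up and let `rule` decide each node's output.
function transform(nodes, rule) {
  return nodes.map(node =>
    rule(node, node.type === 'tag' ? transform(node.children, rule) : [])
  )
}

// Example rule: describe every node as plain JSON-friendly data.
const toJson = (node, children) =>
  node.type === 'text'
    ? node.text
    : { tag: node.name, attributes: node.attributes, children }

const tree = parse(tokenize('<p>Hello <b>world</b><img src="a.gif"></p>'))
console.log(JSON.stringify(transform(tree, toJson)))
```

Because the rule is just a function of a node and its already-converted children, swapping toJson for a rule that returns components or strings changes the output target without touching the tokenizer or the parser.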

Introducing html2any

Take a look at my final implementation: github.com/huozhi/html…

Run on React Native

Here is how it looks in React Native:

We converted a piece of HTML containing bold text and images into native components. This is a screenshot on iOS.

Of course, React Native components have many limitations when nested. For example, the size of a View inside a Text needs to be specified, and style inheritance is much more limited than in CSS.

Run on Web with React

Click here to see it.

We wrote a few simple substitution rules:

  1. br is replaced with an hr tag
  2. A GIF image is replaced with a GIF player that loads it
  3. A native video is replaced with a React video player

If you want more substitutions, you can write more complex rules (rule functions) and leave the rest to html2any.
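
A rule function for these three substitutions might look like the sketch below. It is an illustration under assumptions, not the html2any API: the node shape, the transform helper, and the gif-player and video-player element names are all hypothetical, and the rule returns HTML strings instead of React components so that the example runs on its own.

```javascript
// Hand-written tree standing in for parsed HTML (hypothetical node shape).
const tree = [
  { type: 'text', text: 'Intro' },
  { type: 'tag', name: 'br', attributes: {}, children: [] },
  { type: 'tag', name: 'img', attributes: { src: 'cat.gif' }, children: [] },
  { type: 'tag', name: 'video', attributes: { src: 'demo.mp4' }, children: [] }
]

// Walk the tree bottom-up, handing each node and its converted children to `rule`.
function transform(nodes, rule) {
  return nodes.map(node => rule(node, transform(node.children || [], rule)))
}

// One rule function covering the three substitutions.
function rule(node, children) {
  if (node.type === 'text') return node.text
  const { name, attributes } = node
  if (name === 'br') return '<hr>'
  if (name === 'img' && /\.gif$/i.test(attributes.src || '')) {
    // Hypothetical element standing in for a GIF player component.
    return `<gif-player src="${attributes.src}"></gif-player>`
  }
  if (name === 'video') {
    // Hypothetical element standing in for a React video player.
    return `<video-player src="${attributes.src}"></video-player>`
  }
  // Any other tag is re-emitted without attributes to keep the sketch short.
  return `<${name}>${children.join('')}</${name}>`
}

const output = transform(tree, rule).join('')
console.log(output)
```

In a React setup, the same rule would return elements such as a video player component in place of the strings, and the walker would stay unchanged.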

References and comparison

There are already many HTML parsers out there. The most familiar ones are parse5 and htmlparser2, and even cheerio uses htmlparser2. So why reinvent the wheel and write another one?

The reasons are as follows:

  1. html2any is really small and handy if you are dealing with HTML generated by a stateful editor. If you use Slate or Draft.js and only need to present content, give it a try.
  2. Many parsers are SAX-style: they parse top-down and expose a lot of APIs for intermediate steps and phases that we do not really need, and they handle many cases that we probably do not need either.
  3. The most important reason: a lot of parsers are built specifically for the Web and ultimately produce HTML or DOM trees, which is not what we want. See the examples above? We are doing universal HTML! Render everywhere! Ha ha

Finally, here are my slides.