In recent years, tech blogs and tech communities have multiplied, and many developers now run their own blogs. We can sync our posts to different tech platforms, but as the number of platforms grows, syncing takes more and more time. Is there a tool that can publish quickly to different platforms? Or a tool that converts HTML directly into a “language” that these platforms understand?

As we all know, the most popular blogging “language” among programmers is Markdown, and most tech communities now support Markdown syntax, so with Markdown we can quickly sync posts to different platforms.

Some might ask: why not just write the blog in Markdown in the first place? That works, but the downside is that we have to keep a local Markdown file, and if the post contains images we also need to maintain an image directory, which becomes a hassle every time we post to another community. So we developed a tool that automatically crawls HTML content and converts it to Markdown with one click, so that we can blog without restraint.

What you will learn

  • How to use turndown
  • The vue + nuxt project development workflow
  • Crawler-related applications of Node.js

The GitHub address is attached at the end of the article; anyone interested is welcome to build, learn, and explore together.

Demo

The client

Approach

The flow is:

  • Enter a link address
  • Get the html returned by the server
  • Convert the html string into md
  • Sync the result to the editor for preview
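The client steps above can be sketched as two small pure functions. This is only an illustration under assumptions: the endpoint path `/api/html` and the names `buildRequestUrl` and `toEditorState` are hypothetical, and the converter is injected so the sketch does not depend on any library. The `url` query parameter and the `title`/`html` payload fields follow the server code shown later in the article.

```javascript
// Hypothetical endpoint path; the real project may expose a different route.
const API_PATH = '/api/html'

// Steps 1-2: build the GET request that hands the link to the server.
function buildRequestUrl(apiOrigin, link) {
  const url = new URL(API_PATH, apiOrigin)
  // searchParams percent-encodes the link, so its own "?" and "&" survive.
  url.searchParams.set('url', link)
  return url.href
}

// Steps 3-4: turn the server payload into what the editor previews.
// `toMarkdown` is injected; with turndown it could be
// (html) => turndownService.turndown(html).
function toEditorState(payload, toMarkdown) {
  return { title: payload.title, markdown: toMarkdown(payload.html) }
}
```

In the browser, the result of `buildRequestUrl` would be passed to whatever HTTP client the project uses, and the parsed JSON response handed to `toEditorState`.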

Why choose turndown

The most important step on the client is converting HTML to MD, for which we use turndown. The reasons are as follows:

  • Talk is cheap, show me the code. One of the key features of a technical article is the code block; an article without code has no soul. I compared several html2md plugins, and turndown had the best code block rendering and compatibility.
  • turndown also supports custom rules, so you can flexibly define your own tags and matching rules.
  • turndown also supports third-party plugins. turndown-plugin-gfm adds GFM (GitHub Flavored Markdown, a superset of MD) syntax such as tables and strikethrough.

Implementation

  import TurndownService from 'turndown'
  // Import the third-party plugins
  import { gfm, tables, strikethrough } from 'turndown-plugin-gfm'

  const turndownService = new TurndownService({ codeBlockStyle: 'fenced' })
  // Use the whole gfm plugin
  turndownService.use(gfm)

  // Or use only the table and strikethrough plugins
  turndownService.use([tables, strikethrough])

  /**
   * Custom rule (rule names must be unique).
   * Here we treat the 'pre' tag as a code block and add a newline before
   * and after the code to prevent rendering problems.
   */
  turndownService.addRule('pre2Code', {
    filter: ['pre'],
    replacement (content) {
      return '```\n' + content + '\n```'
    }
  })
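As a variation on the pre2Code rule, the sketch below keeps the language hint that many platforms store in a class such as `language-js`. It is an illustration only: `detectLanguage` and `fencedPreRule` are hypothetical names, and the class naming convention is an assumption that varies between sites. The object has the `{ filter, replacement }` shape that `addRule` accepts.

```javascript
// Three backticks, built at runtime so this example can itself sit in a fence.
const FENCE = '`'.repeat(3)

// Hypothetical helper: extract "js" from "language-js" or "lang-js".
function detectLanguage(className) {
  const match = /(?:^|\s)(?:language|lang)-([\w+#-]+)/.exec(className || '')
  return match ? match[1] : ''
}

// Rule object in the { filter, replacement } shape used by addRule.
const fencedPreRule = {
  filter: ['pre'],
  replacement(content, node) {
    // The language class usually sits on the inner <code>, sometimes on <pre>.
    const code = node.querySelector('code')
    const lang = detectLanguage((code || node).getAttribute('class'))
    // textContent is the raw code; `content` may already be Markdown-escaped.
    const text = (node.textContent || content).replace(/\n+$/, '')
    return '\n\n' + FENCE + lang + '\n' + text + '\n' + FENCE + '\n\n'
  }
}
```

It could be registered with `turndownService.addRule('fencedPre', fencedPreRule)`, using a rule name that differs from the existing ones.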

Additional functionality

The tool also fetches the title of the linked article automatically, so there is no need to copy it by hand.

The server

The server is written in Node.js. Writing the server side with the front-end framework is a great experience.

Approach

The flow is:

  • Get the link address passed by the front end
  • Fetch the html with request
  • Select the content dom according to the platform's domain name
  • Convert the relative paths of images and links to absolute paths
  • Append the reprint source statement to the bottom of the html
  • Get the article title
  • Return the title and html to the front end

Implementation

  1. Get the link address passed by the front end

    The front end passes the link as a GET parameter, so we can read it directly from req.query

    const qUrl = req.query.url
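Because the query value comes straight from the user, it may be worth validating it before requesting anything. The following is a minimal sketch, not part of the original project: `parseTargetUrl` is a hypothetical helper that accepts only http and https links and returns null for everything else.

```javascript
// Hypothetical helper: returns a URL object for http(s) links, otherwise null.
function parseTargetUrl(raw) {
  if (typeof raw !== 'string' || raw.trim() === '') return null
  try {
    const url = new URL(raw.trim())
    // Reject schemes such as file:, ftp: or javascript:
    return url.protocol === 'http:' || url.protocol === 'https:' ? url : null
  } catch (err) {
    // new URL throws on strings that are not absolute URLs
    return null
  }
}
```

A handler could then answer with an error response when `parseTargetUrl(req.query.url)` returns null, instead of passing the raw value on to the crawler.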
  2. Fetch the html with request

    Here we use the request library

     const request = require('request')

     request({
       url: qUrl
     }, (error, response, body) => {
       if (error) {
         res.status(404).send('Url Error')
         return
       }
       // The body here is the HTML of the article
       console.log(body)
     })
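The request package has since been deprecated by its maintainers, so the same step can also be sketched with the fetch API that is built into recent Node.js versions. This is an alternative under assumptions, not the article's code: `fetchHtml` is a hypothetical name, and the fetch implementation is injectable so the function can be exercised without a network.

```javascript
// Hypothetical alternative to request(): resolves with the page HTML.
// fetchImpl defaults to the global fetch (available in Node.js 18+).
async function fetchHtml(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url, { redirect: 'follow' })
  if (!response.ok) {
    // Unlike request's error argument, HTTP error statuses are surfaced here too.
    throw new Error(`Request failed with status ${response.status}`)
  }
  return response.text()
}
```

A caller would await `fetchHtml(qUrl)` inside a try/catch and send the 'Url Error' response from the catch branch.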
  3. Select the content dom according to the platform's domain name

    There are many tech platforms, and each uses different content tags, class names, or ids, so each needs its own handling.

    First, we use jsdom to simulate DOM manipulation and wrap it in a helper

     const { JSDOM } = require('jsdom')

     /**
      * Get the exact content of the article
      * @param {string} html HTML string
      * @param {string} selector CSS selector
      * @return {Element|null} htmlContent
      */
     const getDom = (html, selector) => {
       const dom = new JSDOM(html)
       const htmlContent = dom.window.document.querySelector(selector)
       return htmlContent
     }

    Handle different platforms with different CSS selectors

     // getBySelector is not defined in this excerpt; presumably it wraps getDom for the current page HTML

     // On Juejin the content block has the class .markdown-body. It also contains a style tag and extra "copy code" text, which we remove with native DOM operations
     if (qUrl.includes('juejin.cn')) {
       const htmlContent = getBySelector('.markdown-body')
       const extraDom = htmlContent.querySelector('style')
       const extraDomArr = htmlContent.querySelectorAll('.copy-code-btn')
       extraDom && extraDom.remove()
       extraDomArr.length > 0 && extraDomArr.forEach((v) => { v.remove() })
       return htmlContent
     }

     // On OSChina the content block has the class .article-detail, and it contains extra .ad-wrap content
     if (qUrl.includes('oschina.net')) {
       const htmlContent = getBySelector('.article-detail')
       const extraDom = htmlContent.querySelector('.ad-wrap')
       extraDom && extraDom.remove()
       return htmlContent
     }

     // Finally, fall back to generic tags: prefer the article tag, then the body tag
     const htmlArticle = getBySelector('article')
     if (htmlArticle) { return htmlArticle }

     const htmlBody = getBySelector('body')
     if (htmlBody) { return htmlBody }
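As the number of supported platforms grows, the chain of if statements could be expressed as data instead. The sketch below is one possible arrangement, not the project's implementation: `PLATFORM_RULES` and `selectorsFor` are hypothetical names, and it matches on the parsed hostname rather than on a substring of the whole link. The selectors themselves are the ones used above.

```javascript
// Per-platform content selector plus the selectors of nodes to strip.
const PLATFORM_RULES = [
  { host: 'juejin.cn', content: '.markdown-body', remove: ['style', '.copy-code-btn'] },
  { host: 'oschina.net', content: '.article-detail', remove: ['.ad-wrap'] }
]

// Generic fallback, tried in order: article first, then body.
const FALLBACK_SELECTORS = ['article', 'body']

// Hypothetical helper: pick the selectors for the page's hostname.
function selectorsFor(pageUrl) {
  const { hostname } = new URL(pageUrl)
  const rule = PLATFORM_RULES.find(
    (r) => hostname === r.host || hostname.endsWith('.' + r.host)
  )
  return rule
    ? { content: [rule.content], remove: rule.remove }
    : { content: FALLBACK_SELECTORS, remove: [] }
}
```

Supporting a new platform then means adding one entry to the table; the DOM code only needs to try each content selector in order and remove every match of the remove selectors.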
  4. Convert the relative paths of images and links to absolute paths, so the sources can still be found later

     // Use the native URL API to get the origin of the link
     const qOrigin = new URL(qUrl).origin || ''

     // Get the absolute path of an image or link. URL resolves 'path + origin' into an absolute path; see the URL documentation if you are not familiar with it
     const getAbsoluteUrl = p => new URL(p, qOrigin).href

     // Convert the relative paths of images and links. Platforms use different lazy-load attribute names, which need special handling
     const changeRelativeUrl = (dom) => {
       if (!dom) { return 'content error ~' }
       const copyDom = dom
       // Get all images
       const imgs = copyDom.querySelectorAll('img')
       // Get all links
       const links = copyDom.querySelectorAll('a')
       // Replace all paths and return the updated DOM
       imgs.length > 0 && imgs.forEach((v) => {
         /**
          * Handle lazy-load paths
          * Jianshu: data-original-src
          * Juejin: data-src
          * segmentfault: data-src
          */
         const src = v.src || v.getAttribute('data-src') || v.getAttribute('data-original-src') || ''
         v.src = getAbsoluteUrl(src)
       })
       links.length > 0 && links.forEach((v) => {
         const href = v.href || qUrl
         v.href = getAbsoluteUrl(href)
       })
       return copyDom
     }

     // Apply changeRelativeUrl in getBody, which gets the article content for each platform
     const getBody = (content) => {
       ...
       return changeRelativeUrl(htmlContent)
     }
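For readers unfamiliar with the two-argument URL constructor, here is a short worked example of how it resolves different kinds of paths. The page address is made up for illustration. Note that a document-relative path such as `img/a.png` resolves differently against the origin than against the full page URL, which may matter for sites that use such paths.

```javascript
// Made-up page address, for illustration only.
const pageUrl = 'https://example.com/blog/post/index.html'
const pageOrigin = new URL(pageUrl).origin // 'https://example.com'

// Resolve a possibly relative path against a base URL.
const toAbsolute = (p, base) => new URL(p, base).href

// Root-relative path: the base's path is ignored.
toAbsolute('/img/a.png', pageOrigin) // 'https://example.com/img/a.png'

// Document-relative path: the result depends on the base.
toAbsolute('img/a.png', pageOrigin) // 'https://example.com/img/a.png'
toAbsolute('img/a.png', pageUrl) // 'https://example.com/blog/post/img/a.png'

// Protocol-relative path: only the scheme comes from the base.
toAbsolute('//cdn.example.com/a.png', pageOrigin) // 'https://cdn.example.com/a.png'

// Already absolute: returned unchanged.
toAbsolute('https://other.com/x', pageOrigin) // 'https://other.com/x'
```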
  5. Append the reprint source statement to the bottom to prevent infringement

    This one is simple and needs little explanation.

     // Append the source statement to the bottom
     const addOriginText = (dom) => {
       const html = dom.innerHTML
       const resHtml = html + `<br/><div>Reprinted from <a href="${qUrl}" target="_blank">${qUrl}</a>, if there is infringement, please contact us to delete it.</div>`
       return resHtml
     }

     // Apply addOriginText in getBody, which gets the article content for each platform
     const getBody = (content) => {
       ...
       return addOriginText(changeRelativeUrl(htmlContent))
     }
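Since the link is user input that ends up inside HTML, escaping it before interpolation is a cautious extra step. The sketch below is an assumption about how that might look, not the project's code: `escapeHtml` and `buildSourceNote` are hypothetical names, and the wording of the note is shortened for the example.

```javascript
// Escape the characters that are significant in HTML text and attributes.
const escapeHtml = (text) =>
  String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')

// Hypothetical helper: build the source note with the link escaped.
const buildSourceNote = (url) => {
  const safe = escapeHtml(url)
  return `<br/><div>Source: <a href="${safe}" target="_blank">${safe}</a></div>`
}
```

With this in place, a link containing quotes or angle brackets cannot close the href attribute or inject extra tags into the returned html.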
  6. Get the article title

     // Get the title of the article
     const getTitle = (content) => {
       const title = getDom(content, 'title')
       if (title) { return title.textContent }
       return 'Failed to get title ~'
     }
  7. Return the title and html to the front end

     request({
       url: qUrl,
       headers: {}
     }, (error, response, body) => {
       if (error) {
         res.status(404).send('Url Error')
         return
       }
       // Set the JSON response type
       res.type('text/json')
       const json = {
         code: 1,
         title: getTitle(body),
         html: getBody(body)
       }
       res.status(200).send(json)
     })

Practical applications

This open source tool is useful in many scenarios. We can convert almost any web link to MD content and sync it to our blog or content management platform, but we must respect copyright and be law-abiding “netizens”.

Supported environments

Modern browsers and IE11.

  • IE / Edge: IE11, Edge
  • Firefox: last 2 versions
  • Chrome: last 2 versions
  • Safari: last 2 versions
  • Opera: last 2 versions

Contributing

We welcome your contributions. You can help us build it together in the following ways 😃

  • Report bugs through Issues.
  • Submit a Pull Request to improve it together.
  • GitHub address: portal