Source: taskhub.work/article/730… Reprinted with authorisation.

Front-end development technologies are changing rapidly. Driven by modern build tooling and user-experience requirements, the Angular/Vue/React frameworks have become the standard for development. Most applications are now SPAs, which has also introduced new problems:

  • Poor SEO
  • Slow first-screen rendering

There are many solutions in the open source community to address these issues, and this article focuses on comparing them.

I. Client-side rendering (CSR) scheme

An SPA developed with React is a CSR application. As shown in the figure, the HTML page contains no content before it reaches the browser; the interface is displayed only after the browser executes the corresponding asynchronous requests and fills in the data.

Advantages

  • Advantages of SPA (good user experience)

Disadvantages

  • Poor SEO (if the crawler cannot execute JS, as with Baidu's, it gets an empty page, which hurts site promotion)
  • Slow first-screen loading (data is loaded only after the page reaches the browser, increasing the user's waiting time)
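The CSR flow described above can be sketched in a few lines (the endpoint and markup are hypothetical): the server ships an empty shell, and content exists only after this code runs in the browser.

```javascript
// Fill the empty shell with the fetched article.
function renderArticle(container, article) {
  container.innerHTML = `<h1>${article.title}</h1>`;
}

// On load, the browser fetches data and fills the empty #root div.
// A crawler that cannot execute JS only ever sees the empty shell.
async function mount() {
  const res = await fetch('/api/article/1'); // async request after load
  const article = await res.json();
  renderArticle(document.querySelector('#root'), article);
}
```

Everything SEO-relevant is produced by `mount()`, which is exactly the step a non-JS crawler never performs.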

II. Server-side rendering (SSR) scheme

Basic principle: before the page is returned to the browser, the server executes the relevant asynchronous JS requests and fills the resulting data into the HTML in advance. The page a crawler fetches therefore already contains data, which is good for SEO.

Problems to be solved:

  1. Most applications use a state-management solution (Vuex, Redux). An SPA's store is empty before the page reaches the browser, so with SSR the data must be filled into the store on the server in advance
  2. The corresponding hooks (created for Vue, componentDidMount for React) must be intercepted, waiting for their asynchronous data requests to finish before rendering is considered complete
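The principle the two points above describe can be sketched in plain Node (the data source, markup and state key are all hypothetical): the server resolves the asynchronous data first, then returns HTML that already contains it, plus the serialized store so the client can hydrate.

```javascript
// Stand-in for a real API call that would fill the Vuex/Redux store.
async function fetchArticle(id) {
  return { id, title: 'Hello SSR' };
}

// The server awaits the data, then renders HTML that already contains
// it; the state is serialized so the client-side store can hydrate.
async function renderPage(id) {
  const article = await fetchArticle(id);
  const state = JSON.stringify({ article });
  return `<div id="app"><h1>${article.title}</h1></div>` +
         `<script>window.__INITIAL_STATE__=${state}</script>`;
}
```

Nuxt.js and Next.js automate exactly this: running the data hooks on the server and embedding the store state in the response.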

The community already has frameworks that address these issues:

Framework   Solution   GitHub stars
Vue         Nuxt.js    28.4k
React       Next.js    50.8k
Angular

With these frameworks, SSR can be implemented for both React and Vue.

Advantages

  • SEO friendly
  • Fast first-screen rendering (pages can be cached on the server, so an incoming request receives the HTML directly)

Disadvantages

  • Large code changes: the application must be adapted to the specific SSR framework (in our experience, the original SPA code changes substantially)
  • Part of the SPA experience is lost
  • Node can easily become a performance bottleneck

III. Build-time pre-rendering scheme

Solution               GitHub stars
prerender-spa-plugin   6k
puppeteer              63.2k
phantomjs              1.4k

Basic principle: since an SPA build produces only a single index.html entry file, the pre-rendering middleware above is used during the front-end build to fetch page data in advance and generate multiple static pages, such as about, help and contact. This optimizes first-screen rendering and the SEO of those pages.
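As an illustration, a prerender-spa-plugin (v3-style) webpack configuration might look like the sketch below; the paths and routes are placeholders, so check the plugin's README for your version.

```javascript
const path = require('path');
const PrerenderSPAPlugin = require('prerender-spa-plugin');

module.exports = {
  // ...the rest of the webpack config
  plugins: [
    new PrerenderSPAPlugin({
      // the build output directory that contains index.html
      staticDir: path.join(__dirname, 'dist'),
      // the routes to pre-render into static HTML at build time
      routes: ['/', '/about', '/help', '/contact'],
    }),
  ],
};
```

Each listed route becomes its own HTML file in the build output, which is exactly why the scheme breaks down for large numbers of dynamic paths.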

advantages

  • Minimal intrusion into the code

disadvantages

  • Unusable for scenarios with many dynamic-path pages (e.g. /article/123, an article page: the generated HTML becomes large, and the page data keeps changing)
  • The front end must build and release a new version whenever the backend data changes

IV. Server-side dynamic rendering (using User-Agent)

Back to the original requirements: SPA technology improves the user experience, while SSR, pre-rendering and similar techniques serve SEO. Each scheme has gaps, and no single one offers every advantage. An SPA serves ordinary browser users; SSR serves web crawlers such as Googlebot and Baiduspider. So why not give different “users” different pages? Server-side dynamic rendering is exactly such a scheme.

Basic principle: the server inspects the request's User-Agent. Browsers receive the SPA page directly; crawlers receive the dynamically rendered HTML page.
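The User-Agent branch can be sketched like this (the bot list is abbreviated from the appendix table, and the two handlers are hypothetical; in production this check often lives in Nginx instead):

```javascript
// Crawler User-Agents that should receive rendered HTML
// (abbreviated; see the appendix for a fuller list).
const BOT_UA = /googlebot|bingbot|baiduspider|360spider|sogou spider|yahoo! slurp|twitterbot|facebookexternalhit/i;

function isCrawler(userAgent) {
  return BOT_UA.test(userAgent || '');
}

// In the Express layer: crawlers get dynamic rendering, browsers the SPA.
// app.get('*', (req, res) => isCrawler(req.headers['user-agent'])
//   ? serveRendered(req, res)   // hypothetical: Puppeteer-rendered HTML
//   : serveSpaShell(req, res)); // hypothetical: the plain index.html
```

Keeping the check to a single case-insensitive regex makes it cheap to run on every request.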

PS: You may ask, isn't serving a different page to a crawler considered web cheating (cloaking)?

Google replied:

Dynamic rendering is not cloaking

Googlebot generally doesn’t consider dynamic rendering as cloaking. As long as your dynamic rendering produces similar content, Googlebot won’t view dynamic rendering as cloaking.

When you’re setting up dynamic rendering, your site may produce error pages. Googlebot doesn’t consider these error pages as cloaking and treats the error as any other error page.

Using dynamic rendering to serve completely different content to users and crawlers can be considered cloaking. For example, a website that serves a page about cats to users and a page about dogs to crawlers can be considered cloaking.

In other words, as long as we are not deliberately cheating but are using dynamic rendering to solve SEO problems, a crawler that compares the site content and finds no significant difference will not treat it as cloaking.

Advantages

  • Solves the SEO problem while retaining the advantages of an SPA

Disadvantages

  • A server-side application is required (but dynamic rendering serves crawlers only, so it is not a performance bottleneck)

Conclusion: after weighing the practice, advantages and disadvantages of the other schemes, we finally chose the fourth scheme, dynamic rendering, as the SEO solution for our SPA.

Implementation details

The final implementation is shown above. (Possible optimizations: integrating a CDN on the right, and having Node take over some of Nginx's functions to simplify the architecture.)

Community solutions:

Solution       GitHub stars   Description
puppeteer      63.2k          Usable for dynamic rendering, front-end testing and user-action simulation; rich API
rendertron     4.9k           Dynamic rendering
prerender.io   5.6k           Dynamic rendering

We chose Puppeteer as the dynamic rendering solution.

Dependencies:

{
  "dependencies": {
    "bluebird": "^3.7.2",
    "express": "^4.17.1",
    "puppeteer": "^5.2.0",
    "redis": "^3.0.2",
    "request": "^2.88.2"
  }
}

The code is adapted from Google's official demo; the basic version follows:

server.js

import express from 'express';
import request from 'request';
import ssr from './ssr.js';

const app = express();

const host = 'https://www.abc.com';

app.get('*', async (req, res) => {
    const {html, ttRenderMs} = await ssr(`${host}${req.originalUrl}`);
    res.set('Server-Timing', `Prerender; dur=${ttRenderMs}; desc="Headless render time (ms)"`);
    return res.status(200).send(html); // Serve prerendered page as response.
});

app.listen(8080, () => console.log('Server started. Press Ctrl + C to quit'));

ssr.js

import puppeteer from 'puppeteer';

// In-memory cache of rendered pages.
const RENDER_CACHE = new Map();

async function ssr(url) {
    if (RENDER_CACHE.has(url)) {
        return {html: RENDER_CACHE.get(url), ttRenderMs: 0};
    }
    const start = Date.now();

    const browser = await puppeteer.launch({
        args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
    const page = await browser.newPage();
    try {
        // networkidle0 waits for the network to be idle (no requests for 500ms).
        await page.goto(url, {waitUntil: 'networkidle0'});
        await page.waitForSelector('#root'); // ensure #root exists in the DOM.
    } catch (err) {
        console.error(err);
        throw new Error('page.goto/waitForSelector timed out.');
    }

    const html = await page.content(); // serialized HTML of page DOM.
    await browser.close();

    const ttRenderMs = Date.now() - start;
    console.info(`Puppeteer rendered page: ${url} in: ${ttRenderMs}ms`);

    RENDER_CACHE.set(url, html); // cache rendered page.

    return {html, ttRenderMs};
}

export {ssr as default};

The Demo code has the following problems:

  • After the rendered page is returned to the browser, the asynchronous data requests are sometimes made again (repeated requests)
  • A Map is used as the page cache; it is lost when the Node service crashes, has no expiry, and its memory consumption grows over time (caching mechanism)
  • Requests for React/Vue static files hit the catch-all route, so the ssr function renders them as pages (erroneous rendering)

Let's tackle these problems one by one.

Repeated requests:

The root cause is that the React/Vue lifecycle functions execute twice. The asynchronous data requests placed in the created/componentDidMount hooks run once during dynamic rendering; then, when the HTML reaches the browser and the DOM mounts, they run again. This problem is also mentioned in Google Support. The fix is to tweak the front-end code so it checks whether the page has already been dynamically rendered before performing the asynchronous requests. For reference:

componentDidMount() {
    const PRE_RENDERED = document.querySelector('#posts');
    if (!PRE_RENDERED) {
        // Perform the asynchronous request,
        // then insert a DOM element with the #posts id
    }
}
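The same check works for Vue's created/mounted hooks, and can be factored into a small helper; this is a sketch, and the #posts marker id is simply the one from Google's demo, so pick whatever marker your app inserts.

```javascript
// Returns true when the page was already filled in by dynamic
// rendering, i.e. the marker element is present in the served HTML.
function isPreRendered(doc, markerId = 'posts') {
  return Boolean(doc.querySelector('#' + markerId));
}

// e.g. in a Vue component (hypothetical fetchPosts method):
// mounted() {
//   if (!isPreRendered(document)) this.fetchPosts();
// }
```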

Caching mechanism

To fix the Map cache, we replace it with Redis and add an expiry, which also avoids the cache being wiped out when the Node process crashes.

redis/index.js

import redis from 'redis';
import bluebird from 'bluebird';

bluebird.promisifyAll(redis);

const host = 'www.abc.com';
const port = 6379;
const password = '123456';

const client = redis.createClient({
    host,
    port,
    password,
    retry_strategy: function(options) {
        if (options.error && options.error.code === "ECONNREFUSED") {
            return new Error("The server refused the connection");
        }
        if (options.total_retry_time > 1000 * 60 * 60) {
            return new Error("Retry time exhausted");
        }
        if (options.attempt > 10) {
            return undefined;
        }
        return Math.min(options.attempt * 100, 3000);
    }
});

client.on("error", function(e) {
    console.error('dynamic-render redis error: ', e);
});

export default client;

ssr.js

import puppeteer from 'puppeteer';
import redisClient from './redis/index.js';

async function ssr(url) {
    const REDIS_KEY = `ssr:${url}`;
    const CACHE_TIME = 600; // 10 minutes cache
    const CACHE_HTML = await redisClient.getAsync(REDIS_KEY);

    if (CACHE_HTML) {
        return { html: CACHE_HTML, ttRenderMs: 0 };
    }
    const start = Date.now();

    const browser = await puppeteer.launch({
        args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
    const page = await browser.newPage();
    try {
        // networkidle0 waits for the network to be idle (no requests for 500ms).
        await page.goto(url, {waitUntil: 'networkidle0'});
        await page.waitForSelector('#root'); // ensure #root exists in the DOM.
    } catch (err) {
        console.error(err);
        throw new Error('page.goto/waitForSelector timed out.');
    }

    const html = await page.content(); // serialized HTML of page DOM.
    await browser.close();

    const ttRenderMs = Date.now() - start;
    console.info(`Puppeteer rendered page: ${url} in: ${ttRenderMs}ms`);

    redisClient.set(REDIS_KEY, html, 'EX', CACHE_TIME); // cache rendered page.
    return {html, ttRenderMs};
}

export {ssr as default};

Erroneous rendering

React/Vue builds reference hashed static resources such as /static/1231234sdf.css. With the catch-all route, these paths are rendered as page paths instead of being served as static resources, causing rendering errors. Solution: add path-matching interception and proxy resource files directly to the original domain.

import express from 'express';
import request from 'request';
import ssr from './ssr.js';

const app = express();

const host = 'https://www.abc.com';

app.get('/static/*', async (req, res) => {
    request(`${host}${req.url}`).pipe(res);
});

app.get('/manifest.json', async (req, res) => {
    request(`${host}${req.url}`).pipe(res);
});

app.get('/favicon.ico', async (req, res) => {
    request(`${host}${req.url}`).pipe(res);
});

app.get('/logo*', async (req, res) => {
    request(`${host}${req.url}`).pipe(res);
});

app.get('*', async (req, res) => {
    const {html, ttRenderMs} = await ssr(`${host}${req.originalUrl}`);
    res.set('Server-Timing', `Prerender; dur=${ttRenderMs}; desc="Headless render time (ms)"`);
    return res.status(200).send(html); // Serve prerendered page as response.
});

app.listen(8080, () => console.log('Server started. Press Ctrl + C to quit'));

Dynamic rendering has several clear advantages over SSR:

  • The same SEO effect as SSR; with Puppeteer the SEO solution can be customized further

  • The Node application is under little load, since it only handles crawler requests; it is equivalent to doing SSR only when a crawler visits the page

  • In the overall architecture it acts like a plugin that can be attached or removed at any time, with no side effects

  • The SPA code barely needs to change (just a flag to detect repeated requests, and even that can be ignored)

(Repeated requests occur only when the crawler can execute JS, and requesting the data again is generally acceptable.)

Appendix

Common crawler User-Agents

Owner       User-Agent                            Use
Google      googlebot                             Search engine
Google      google-structured-data-testing-tool   Testing tool
Google      Mediapartners-Google                  AdSense crawler; visits when an AdSense ad page is viewed
Microsoft   bingbot                               Search engine
LinkedIn    linkedinbot                           In-app search
Baidu       baiduspider                           Search engine
Qihoo 360   360Spider                             Search engine
Sogou       Sogou Spider                          Search engine
Yahoo       Yahoo! Slurp China                    Search engine
Yahoo       Yahoo! Slurp                          Search engine
Twitter     twitterbot                            In-app search
Facebook    facebookexternalhit                   In-app search
            rogerbot
            embedly
Quora       quora link preview
            showyoubot
            outbrain
            pinterest
            slackbot
            vkShare
            W3C_Validator

Simulated crawler test

# Without a User-Agent header, the SPA page is returned; the HTML contains no data
curl <the full path to your website>

# Simulating a crawler, the returned page should contain the title, body and other data, convenient for SEO
curl -H 'User-Agent: Googlebot' <the full path to your website>

References

[1] Build-time pre-rendering: optimizing first-screen rendering of web pages

[2] Implement dynamic rendering

[3] Overview of Google crawlers (user agents)