Dynamic website SEO solution summary
First of all, a few concepts:
- SPA: Single-page application. Many projects built on the Vue framework are single-page applications.
- SSR: Server-side rendering.
- SEO: Search engine optimization, the process of improving a site's keyword ranking and visibility through on-site fixes and optimization as well as off-site optimization.
- Prerender.io: a Node.js-based service that makes a JavaScript website indexable by search engines and readable by social media, compatible with all JavaScript frameworks and libraries. It uses PhantomJS to execute JavaScript pages and render them to static HTML. In addition, a prerender service layer can cache visited pages, which greatly improves performance. (The low-effort option.)
- Nuxt: a general-purpose application framework built on Vue.js that presets the configuration needed for Vue.js development with server-side rendering. It can also generate a static site from a Vue.js application.
- Next: the equivalent general-purpose framework for React, presetting the configuration needed to develop server-rendered React applications.
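To make the SPA/SSR distinction concrete, here is a minimal illustrative sketch (the markup and function names are invented for illustration, not taken from any framework): a crawler that does not execute JavaScript only ever sees the SPA's empty mount point, while SSR returns the populated HTML directly.

```javascript
// Illustrative only: why crawlers struggle with SPAs.
// A SPA ships an empty mount point and fills it in with client-side JS,
// while SSR renders the view to a full HTML string before responding.

function spaResponse() {
  // What a non-JS-executing crawler sees for a typical Vue SPA
  return '<div id="app"></div>';
}

function ssrResponse(data) {
  // The server interpolates the data into HTML before responding
  return '<div id="app"><h1>' + data.title + '</h1></div>';
}
```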
Technology selection
- Evaluate against the existing project's framework choice, time cost, and learning cost.
- If too much would have to be handled on the server side, consider handling prerendering at the operations (Nginx) layer instead.
- For complex business lines, deploying Prerender.io and using your own server to cache crawler pages is recommended.
Comparing the pros and cons of the three options
- Next => React: documentation is mostly in English; configuration items are simple to use and easy to deploy. Large official-site projects with complex user interaction are a good fit for Next.
- Nuxt => Vue: essentially a copy of Next, and the syntax is Next-like. The big pitfall is that most stable projects sit on 1.4.2, while the 2.x line is almost completely incompatible with the older versions.
- Rendering efficiency is low and compilation becomes very slow once the business logic is complex.
- The version gap makes it a poor fit where compatibility matters.
- PhantomJS: Nginx forwards search engine crawler requests to a Node service, which uses PhantomJS to render the full HTML.
- Can serve as one universal service; existing SPA pages basically need no rework.
- The drawback is that it is sensitive to network fluctuation.
- Suitable when complex projects need to be indexed in a short time.
- Network-layer permissions need to be coordinated with operations (O&M).
Overall, choose based on the scenario's current requirements and your own constraints, so that the requirement is met efficiently in a short time.
Related articles:
Use PhantomJS to do SEO optimization for AJAX sites

#### A PhantomJS task script

First, we need a file called spider.js, which PhantomJS uses to parse the site.
"use strict";
// Grace period after a resource finishes loading, in case more resources follow
var resourceWait = 500;
var resourceWaitTimer;
// Maximum waiting time
var maxWait = 5000;
var maxWaitTimer;
// Resource count
var resourceCount = 0;
// PhantomJS WebPage module
var page = require('webpage').create();
// PhantomJS system module
var system = require('system');
// Get the second CLI argument as the destination URL
var url = system.args[1];
// Set the PhantomJS window size
page.viewportSize = {
    width: 1280,
    height: 1014
};
// Capture the rendered page
var capture = function (errCode) {
    // Expose the page content externally via stdout
    console.log(page.content);
    // Clear the timer
    clearTimeout(maxWaitTimer);
    // Exit normally
    phantom.exit(errCode);
};
// Count resource requests
page.onResourceRequested = function (req) {
    resourceCount++;
    clearTimeout(resourceWaitTimer);
};
// A resource finished loading
page.onResourceReceived = function (res) {
    // Chunked HTTP responses trigger resourceReceived several times,
    // so check whether the resource has reached its 'end' stage
    if (res.stage !== 'end') {
        return;
    }
    resourceCount--;
    // When all resources on the page have loaded, capture the rendered HTML
    if (resourceCount === 0) {
        // onResourceReceived fires as soon as the resource arrives, so give
        // the page's JS some time to run; 500 milliseconds are reserved by default
        resourceWaitTimer = setTimeout(capture, resourceWait);
    }
};
// Resource loading timed out
page.onResourceTimeout = function (req) {
    resourceCount--;
};
// Resource loading failed
page.onResourceError = function (err) {
    resourceCount--;
};
// Open the page
page.open(url, function (status) {
    if (status !== 'success') {
        phantom.exit(1);
    } else {
        // Once the page's initial HTML returns successfully, start the timer;
        // when the maximum time (5 seconds by default) is reached, capture the
        // HTML rendered at that point
        maxWaitTimer = setTimeout(function () {
            capture(2);
        }, maxWait);
    }
});
To test: phantomjs spider.js 'https://www.baidu.com/'
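The double-timer logic in spider.js (a grace period after the last resource, plus a hard cap) can be sketched in plain Node, independent of PhantomJS. `createSettleWatcher` and its callbacks are illustrative names invented here, not PhantomJS APIs; the exit codes (0 for a clean settle, 2 for the hard cap) mirror the script above.

```javascript
// Sketch of spider.js's "wait for resources to settle" pattern in plain Node.
// requested()/received() stand in for page.onResourceRequested/onResourceReceived.
function createSettleWatcher(onCapture, resourceWait, maxWait) {
  var resourceCount = 0;
  var resourceWaitTimer = null;
  var done = false;

  function capture(code) {
    if (done) return; // fire onCapture at most once
    done = true;
    clearTimeout(resourceWaitTimer);
    clearTimeout(maxWaitTimer);
    onCapture(code);
  }

  // Hard cap: capture whatever has rendered after maxWait ms
  var maxWaitTimer = setTimeout(function () { capture(2); }, maxWait);

  return {
    // A resource request started: bump the count, cancel any pending capture
    requested: function () {
      resourceCount++;
      clearTimeout(resourceWaitTimer);
    },
    // A resource finished: when the count drains to zero, wait resourceWait ms
    // (in case more requests follow), then capture
    received: function () {
      resourceCount--;
      if (resourceCount === 0) {
        resourceWaitTimer = setTimeout(function () { capture(0); }, resourceWait);
      }
    }
  };
}
```

If a new request arrives during the grace period, `requested()` cancels the pending capture, which is exactly why spider.js clears `resourceWaitTimer` in `onResourceRequested`.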
Command servitization
To respond to search engine crawler requests, we need to turn this command into a service, creating a simple Web service with Node:
var express = require('express');
var app = express();
// Import the NodeJS child process module
var child_process = require('child_process');

app.get('/', function (req, res) {
    // Rebuild the complete URL
    var url = req.protocol + '://' + req.hostname + req.originalUrl;
    console.log(req.hostname);
    // Container for the prerendered page string
    var content = '';
    // Spawn a phantomjs child process
    var phantom = child_process.spawn('phantomjs', ['spider.js', url]);
    // Set the stdout character encoding
    phantom.stdout.setEncoding('utf8');
    // Listen to phantomjs' stdout and concatenate it
    phantom.stdout.on('data', function (data) {
        content += data.toString();
    });
    // Listen for the child process exit event
    phantom.on('exit', function (code) {
        switch (code) {
            case 1:
                console.log('Load failed');
                res.send('Load failed');
                break;
            case 2:
                console.log('Load timeout: ' + url);
                res.send(content);
                break;
            default:
                res.send(content);
                break;
        }
    });
});

app.listen(3002);
Now that we have a prerendering Web service (started with node server.js), all that remains is to forward search engine crawler requests to this service and return the rendered result to the crawler. To prevent the Node process from hanging up with the terminal, you can start it with nohup: nohup node server.js &. With an Nginx configuration, the forwarding is easy to solve.
upstream spider_server {
    server localhost:3000;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_set_header Host            $host:$proxy_port;
        proxy_set_header X-Real-IP       $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # When the UA contains Baiduspider (other crawler UAs can be added here),
        # forward the traffic to spider_server as a reverse proxy
        if ($http_user_agent ~* "Baiduspider") {
            proxy_pass http://spider_server;
        }
    }
}
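If you prefer to perform the same User-Agent test inside the Node service instead of (or in addition to) Nginx, the check is a one-line regex. The crawler list below is illustrative, not exhaustive; extend it to match the crawlers you care about.

```javascript
// Mirrors the Nginx `if ($http_user_agent ~* "Baiduspider")` test in Node.
// The list of crawler UA fragments here is an illustrative assumption.
var SPIDER_UA = /baiduspider|googlebot|bingbot|360spider|sogou/i;

function isSpider(userAgent) {
  // Tolerate a missing User-Agent header
  return SPIDER_UA.test(userAgent || '');
}
```

An Express middleware could then call isSpider(req.headers['user-agent']) and route only crawler traffic to the prerender service, serving the normal SPA to everyone else.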
Reference links:
- www.mxgw.info/t/phantomjs…
- imweb.io/topic/560b4…
- icewing.cc/linux-insta…
- www.jianshu.com/p/2bbbc2fcd…