Search engine - Web crawler principle and SEO optimization

Recently, my company decided to rebuild its official website, mainly to solve its SEO problem (improving the site's ranking so that users searching for relevant keywords are led to it). I had already learned about Vue SSR (the SEO solution for Vue SPAs), but explaining the underlying principles clearly was still a little difficult for me. So I reviewed many related articles and summarized them into a framework that convinces me. If any material here infringes on someone's rights, please let me know.

Search Engine

To understand SEO solutions and principles, you first have to understand search engines. Readers who are not very familiar with this area may say: who doesn't know search engines? Baidu, Sogou, Google… Exactly. The goal for our website is that when users search for keywords in a search engine, they are led precisely to our site, and the site ranks high.

The technical architecture of a search engine is extremely complex. I cannot explain in a few words an architecture that companies like Baidu have spent many years building. Here I only introduce the general architecture of a search engine, as shown in the figure:

A glance at the architecture diagram reveals that the whole thing can be roughly divided into three parts:

Crawler system, facing the Internet (left)
Storage system, facing the data
Search system, facing the user (right)

Please refer to the figure for details. The general structure can be summarized in one sentence:

The search engine captures website information from the Internet through the crawler system and stores snapshots of the captured pages in its database. When users search, their keywords are matched directly against the data in that database.

It should also be noted that the search system and the crawler system do not simply run in parallel or in sequence; how they interact depends on the actual situation. In principle, the crawler system runs all the time, continually capturing content related to the keywords users are likely to search for, so that it is ready when they search. (Whether a search for a keyword that is not in the database triggers a parallel crawl task is something I do not know, and it is outside the scope of this discussion. I raise it only as food for thought.)

Note: Since our topic is SEO, and SEO mainly targets web crawlers, we only cover the left side (the crawler system) here.

Web Crawler

Simply put, **a web crawler is the content collection tool that a search engine uses to visit your website and collect its content.** So what exactly is a crawler?

The web crawler is the most basic part of a search engine. The following is the basic architecture of an ordinary web crawler:

Let’s analyze this architecture diagram, which is mainly divided into the following parts:

Web page => (download) => web page library (database)
Web page => (read internal links) => extract URLs => (analysis: already fetched) => fetched URL queue
Web page => (read internal links) => extract URLs => (analysis: not yet fetched) => to-be-fetched URL queue => read URL => repeat the loop

Seed URLs deserve a special mention: a seed URL is a URL that is set manually for the crawler to fetch. It can be understood as an entry point for the crawl, which then spreads outward through the page's internal links. (For example, when investigating an illegal website, the same approach could be used to find and block the other illicit sites linked around it. This is my personal understanding.)

Now that we know the principles of search engines and crawlers, let's walk through the process in detail:
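The flow above, starting from seed URLs, can be sketched roughly as follows. This is only a minimal illustration in Python, not how any real search engine is implemented: the `FAKE_WEB` dictionary stands in for the Internet, and the names `crawl`, `to_fetch`, `fetched`, and `page_library` are invented for this example.

```python
from collections import deque

# Hypothetical stand-in for the Internet: URL -> (page content, internal links).
FAKE_WEB = {
    "https://a.example/": ("home page", ["https://a.example/news", "https://a.example/about"]),
    "https://a.example/news": ("news page", ["https://a.example/", "https://a.example/about"]),
    "https://a.example/about": ("about page", []),
}


def crawl(seed_urls):
    """Breadth-first crawl that starts from the seed URLs."""
    to_fetch = deque(seed_urls)  # URL queue to be fetched
    fetched = set()              # URLs that have already been fetched
    page_library = {}            # downloaded pages (the "web page library")

    while to_fetch:
        url = to_fetch.popleft()
        if url in fetched or url not in FAKE_WEB:
            continue
        content, links = FAKE_WEB[url]  # "download" the page
        page_library[url] = content
        fetched.add(url)
        for link in links:              # extract the internal links
            if link not in fetched:     # only queue URLs not fetched yet
                to_fetch.append(link)
    return page_library


pages = crawl(["https://a.example/"])
print(sorted(pages))
```

A single seed URL is enough here to reach every page, because each page is discovered through the links of a page fetched before it.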

Behind a search engine there is a huge index library that stores a large number of keywords, each of which corresponds to many URLs. These URLs are collected bit by bit from the vast Internet by programs called "search engine spiders" or "web crawlers". As new websites keep appearing, these hardworking spiders crawl the Internet every day, following one link after another, downloading the content, analyzing it, and extracting keywords. If a spider decides that the keywords are not yet in the database and are useful to users, it stores them in the backend database. If instead it judges the content to be junk or duplicate information, it discards it and keeps crawling, looking for fresh, useful information to save for users to search.

When a user searches, the URLs related to the keyword are retrieved from the index library and shown to the visitor. One keyword corresponds to many sites, so there is a ranking problem: the sites that best match the keyword are placed first.

While the spider crawls page content and extracts keywords, one question matters: can the spider understand the content? If the content is delivered through Flash, JS, and the like, the spider cannot understand it, however well chosen the keywords are. Conversely, if the site's content can be recognized by the search engine, the engine will raise the site's weight and treat it as more friendly, which improves its ranking.
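The idea of an index library that maps each keyword to many URLs, with a ranking among them, can be sketched as below. This is a toy illustration under stated assumptions: the `PAGES` data is made up, the names `build_index` and `search` are invented for this example, and ranking by how often the keyword appears is only a stand-in for the far more sophisticated ranking real search engines use.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text.
PAGES = {
    "https://a.example/": "vue ssr guide for seo",
    "https://b.example/": "seo basics and seo tips",
    "https://c.example/": "cooking recipes",
}


def build_index(pages):
    """Map each keyword to the URLs that contain it, with an occurrence count."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, keyword):
    """Return the matching URLs, those with the most occurrences first."""
    hits = index.get(keyword.lower(), {})
    return sorted(hits, key=lambda url: (-hits[url], url))


index = build_index(PAGES)
print(search(index, "seo"))
```

In this sketch a keyword the crawler never extracted simply returns no results, which mirrors the point above: content the spider cannot read never reaches the index.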

SEO (Search Engine Optimization)

SEO stands for search engine optimization. It is the practice of understanding how search engines crawl pages, how they index them, and how they decide the ranking of results for a particular keyword, and then optimizing web pages accordingly to improve their search ranking. This increases traffic and ultimately improves the website's ability to sell or promote. In short: increase the site's exposure and raise the weight of the whole site so that users can find it more easily, which in turn brings considerable traffic.

The advantages of attracting traffic through this strategy are:

Low cost
Long-lasting effect
No risk of "invalid clicks": the first two need no explanation, so here is a word about invalid clicks. Taking Baidu as an example, a site's ranking for a keyword is not purely the result of crawling; part of it is set manually, as with Baidu advertising, which is billed by user clicks. If too many of those clicks are accidental or invalid, the economic loss can be large.

1. Optimization direction

1) Website design optimization

Keyword optimization of the site's main title. Good keywords must be chosen; the title is generally composed of one core keyword plus three to five long-tail keywords.
Site layout optimization. Generally speaking, enterprise product websites mainly use an F-shaped layout, and the site's content is organized in a flat structure.
Code optimization. For the codes (identifiers) of sections and columns, it is best to use the corresponding abbreviated or full pinyin spelling of the name.

2) Website content optimization

Analyze each column's keywords, dig out the long-tail keywords, and organize them into a table. Then analyze the long-tail keywords one by one to derive second-level long-tail keywords.
Based on the long-tail keywords you have mined, analyze user needs, find related content, and turn it into articles published on the website, making sure the articles are of high quality.

2. SPA (Single Page Application)

SPA stands for single page application. As we all know, the SPA is currently a very popular front-end architecture, with advantages such as partial refresh, separation of front end and back end, better performance, and lower cost. Of course, no approach is perfect, and the SPA's biggest weakness is SEO.

Why SPAs are bad for SEO:

1. The crawled page information is incomplete: the view is data-driven and built by a large amount of JS code, so the content and the TDK (title, description, keywords) cannot be crawled.

2. The number of pages that can be crawled is limited: a single page application has only one index.html file, so the sub-pages handled by routing cannot be crawled.

In the final analysis, it is the incomplete information a crawler gets from an SPA page that leads to a series of problems such as low ranking and low traffic.

SPA SEO solutions (Vue): pre-rendering and server-side rendering

Rendering: the data is processed to compute the DOM tree, the CSSOM tree, and the render tree.

Types of rendering: client-side rendering and server-side rendering

Client-side rendering:

With the spread of Ajax and the rise of front-end libraries and frameworks (jQuery, Angular, React, Vue), developers began to move toward front-end rendering, using JS to render most of the page content and achieve partial refresh.

Process: The HTML is only a static file. When the client requests it, the server does no processing and returns the original file as is. The JavaScript referenced by the HTML then runs on the client, generates the DOM, and inserts the content into the page.

Server-side rendering:

Process: Before returning the HTML, the server fills the placeholder fields and symbols in it with data and then sends it to the client, which is only responsible for parsing the HTML.
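The difference between the two processes, as seen by a crawler that does not execute JavaScript, can be sketched like this. It is a simplified illustration rather than a model of any real crawler: the two HTML strings, the `TextExtractor` class, and the `visible_text` helper are all invented for this example, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects page text while skipping <script> bodies, like a crawler that runs no JS."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Client rendering: the server returns an almost empty shell plus a script.
CLIENT_RENDERED = '<html><body><div id="app"></div><script src="/app.js"></script></body></html>'

# Server rendering: the server fills in the data before returning the HTML.
TEMPLATE = '<html><body><div id="app"><h1>{title}</h1><p>{body}</p></div></body></html>'
SERVER_RENDERED = TEMPLATE.format(title="Product A", body="Details about product A")

print(repr(visible_text(CLIENT_RENDERED)))
print(repr(visible_text(SERVER_RENDERED)))
```

The client-rendered shell yields no text at all, while the server-rendered page exposes its title and body directly in the HTML.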

Solution 1: pre-rendering

Pre-rendering is based on prerender-spa-plugin. At project build time, it uses a headless browser to simulate browser requests and inserts the resulting data into the template, generating HTML that already contains the complete static content, so that web crawlers can grab more information about the site.

In practice, the prerender-spa-plugin module is used together with webpack to generate a static page for each configured route.

Pre-rendering flow: visit the homepage of the website => simulate requests for the configured pages => pre-load => generate multiple complete pages => crawlers crawl them
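The build-time idea behind pre-rendering can be imitated in a few lines. Note that this is not the prerender-spa-plugin API and involves no headless browser: it is a Python sketch of the concept only, in which `ROUTE_DATA`, `render_route`, and `prerender` are hypothetical names, and `render_route` merely stands in for the step where a headless browser loads a route and captures the resulting HTML.

```python
# Hypothetical data that each configured route needs.
ROUTE_DATA = {
    "/": {"title": "Home", "body": "Welcome to our site"},
    "/about": {"title": "About", "body": "Who we are"},
}

TEMPLATE = '<html><head><title>{title}</title></head><body><div id="app">{body}</div></body></html>'


def render_route(route):
    """Stand-in for a headless browser loading one route and capturing its HTML."""
    return TEMPLATE.format(**ROUTE_DATA[route])


def prerender(routes):
    """Build step: produce one complete static HTML file per configured route."""
    output = {}
    for route in routes:
        filename = "index.html" if route == "/" else route.strip("/") + "/index.html"
        output[filename] = render_route(route)
    return output


static_files = prerender(["/", "/about"])
print(sorted(static_files))
```

Because the files are produced once at build time, only the routes listed in the configuration get a static page, and their content stays fixed until the next build.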

Solution 2: Vue SSR (server-side rendering)

With server-side rendering, the back-end server requests the data first and then generates the complete first-screen HTML, which it returns to the browser. What reaches the client is the final HTML, produced after the asynchronous data has been fetched and the JavaScript executed, so a web crawler can crawl the information of the entire page. Another important benefit of SSR is faster first-screen rendering: the server-rendered markup is displayed without waiting for all the JavaScript to download and execute, so the user sees the fully rendered page sooner.

Server-side rendering flow: visit the homepage of the website => the server loads all the data => the server generates the complete first screen => the server returns it to the client => the crawler crawls the complete page

This article stays at a fairly high level rather than going deep into code, but I feel that jumping straight into code without a big-picture understanding of a technology is bad for personal growth.
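The request-time nature of this flow can be sketched as follows. This is not Vue SSR itself, only a Python imitation of the idea under stated assumptions: `ARTICLES` is a made-up data source, and `handle_request` is a hypothetical handler that loads the data for each request and returns complete HTML.

```python
import html

# Hypothetical data source whose content can change between requests.
ARTICLES = {"/articles/1": {"title": "Vue SSR & SEO", "body": "First draft"}}

PAGE = (
    "<html><head><title>{title}</title></head>"
    '<body><div id="app"><h1>{title}</h1><p>{body}</p></div></body></html>'
)


def handle_request(path):
    """Load the data for this request, then return (status, complete first-screen HTML)."""
    article = ARTICLES.get(path)
    if article is None:
        return 404, "<html><body><h1>Not found</h1></body></html>"
    # Escape the data before inserting it into the markup.
    return 200, PAGE.format(
        title=html.escape(article["title"]),
        body=html.escape(article["body"]),
    )


status, first = handle_request("/articles/1")
ARTICLES["/articles/1"]["body"] = "Updated text"  # the data changes after the first request
_, second = handle_request("/articles/1")
print(status, "First draft" in first, "Updated text" in second)
```

Unlike the build-time approach, each response here reflects the data at the moment of the request, at the cost of doing the rendering work on the server every time.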
