The mobile Internet era has slid from the crest of the wave back toward the mundane, and whether 5G will bring change or even disruption is anyone's guess; meanwhile WeChat, Toutiao, Ali and the rest keep building ever richer closed ecosystems that eat into the search market to some degree. Even so, there are still plenty of consumer-facing (2C) websites of every kind that have to face SEO, and in particular domestic SEO. Why stress "domestic"? Because we have the world's largest "Chinese search engine" (insert laugh here). It makes SEO feel like a dark art: you have to put up with its blow-hot, blow-cold attitude, yet you can't quite wave goodbye and stride off without looking back.
SEO (Search Engine Optimization): website maintainers follow the rules of search engines and optimize their own sites so that their content occupies the highest possible ranking in users' search results. To this day it remains one of the most cost-effective ways to put a product in front of users and bring in traffic.
Here's a statement of the obvious: good content is always the best SEO. But if you're the tech person, the content usually isn't up to you. So what else can a tech person do?
Writing this suddenly reminds me of an interview two years ago. Q: What are the advantages of HTML5 semantic tags? A: Good for SEO, blah blah…
HTML tags
The new semantic tags in the latest HTML standard really do make business code easier to read, for humans and machines alike, with tags like <header>, <nav>, <main>, <article>, <section>, <aside> and <footer>.
Imagine how clean and comfortable the HTML of the page would look if these semantic tags were all used properly. And imagine if search engine spiders understood exactly what those tags meant, the way I do.
The first point is certain: if you can use the various tags accurately, you absolutely should. The second point is less certain, since even modern browsers don't fully support HTML5, let alone search engines supporting the so-called semantics to any known degree. Compatibility has always shadowed web developers.
First, the heading tags <h1> to <h6> carry decreasing weight. A page should hold only one <h1>, representing the main title of the page; the other heading tags should follow the actual content hierarchy.
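For illustration, a minimal sketch of such a hierarchy (the page and section names are made up):

```html
<h1>Tropical Fruits</h1>           <!-- the single main title of the page -->
<h2>Mangoes</h2>                   <!-- a section one level down -->
<h3>How to pick a ripe mango</h3>  <!-- a subsection under Mangoes -->
<h2>Pineapples</h2>                <!-- another section at the same level as Mangoes -->
```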
Then there is the <a> tag. Search engine crawlers dig into every corner of a site by following the paths in its href attributes. Together the <a> tags weave a wide web, and you want the spider to have fun roaming it; but there are inevitably places you don't want it to explore, such as links to personal-center pages or outbound links. For those, instead of an <a> tag, consider binding a click event to another element to perform the actual navigation, or add a special rel="nofollow" attribute to the <a> tag to tell the spider "private road, no visitors".
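A hedged sketch of both options (the URLs are hypothetical):

```html
<!-- outbound link we don't want the spider to follow -->
<a href="https://partner.example.com" rel="nofollow">partner site</a>

<!-- personal-center entry: navigate via a click handler instead of an <a> tag -->
<span class="user-center" onclick="location.href='/user/center'">My account</span>
```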
Don't forget the <img> tag either. Give it an alt attribute that describes the meaning of the image in the context of the page. On the one hand it gives the user a hint when the image fails to load; on the other it tells the search engine what the image is about, which helps indexing.
Finally, wrap your keywords in a <strong> or <em> tag to signal emphasis to search engines.
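A hedged example combining the two (image path, alt text and copy are all made up):

```html
<img src="/img/mango.jpg" alt="A box of fresh tropical mangoes">
<p>Our <strong>tropical fruit</strong> is shipped within 24 hours of picking.</p>
```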
In short, let search engines “see at a glance” where the keywords are on the page.
The tags above are semantic tags that appear inside <body>, the visible part of the page, and can be seen by humans as well as machines. Inside <head> there are also many tags meant only for machines, including what SEO people know best: the "three musketeers" of SEO, TDK.
The SEO three musketeers: TDK
What is TDK? The <title> tag, the <meta name="description"> tag, and the <meta name="keywords"> tag. As the names imply, they carry the title, summary, and keywords of the current page, and of the three, title matters most for SEO.
First, the <title> tag. From the user's point of view, its value is the title shown in search engine results and in the browser tab.
A title usually consists of the current page's title plus a few keywords, kept short and to the point. In short, tell the visitor what you are about in as few words as possible, ideally within about 40 characters.
A good title not only lets users know what the page is about, it lets them judge in advance whether the page has what they need, and the same goes for search engines. So when setting a title, beyond the points above, the most important rule is: do not repeat it across pages! Don't repeat yourself! Don't repeat yourself!
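A hedged sketch of what a reasonable title might look like (the site and page names are made up):

```html
<title>Tropical Fruits - Buy Fresh Mangoes Online | XXX Fresh</title>
```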
Next, the description.
It is not usually used by search engines for indexing or ranking, but it is one of the candidate sources for the summary shown on the results page, alongside other content such as the beginning of the page body. Take the page from the earlier title example; its description looks something like this: <meta name="description" content="Web front-end development engineers are mainly responsible for building the client side of products with (X)HTML/CSS/JavaScript/Flash and other web technologies: developing the browser-side program, writing JavaScript and Flash modules, and working with back-end technologies to deliver the overall effect, enriching web development on the Internet and improving user experience through technology.">. As you can see, this is exactly what the search result summary shows.
So the description should state the content of the page as clearly as possible, so users can better judge whether the page they are about to visit is worth it. Keep it to roughly 80-100 characters, and do not repeat it across pages! Don't repeat yourself! Don't repeat yourself!
Finally, the keywords. This tag provides the search engine with the current page's keywords, separated by ASCII commas. Three to five well-chosen words are usually enough to express the key information of the page. Avoid stuffing it with keywords: these days a search engine may quietly rewrite a title it considers off-topic, never mind what it does when it finds you piling weight onto irrelevant keywords (there may be no causal link there, dog-head emoji for self-protection ~).
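A hedged example (the keywords are made up):

```html
<meta name="keywords" content="fresh fruit,tropical fruit,mango,online grocery">
```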
Those are the three most important tags for SEO, and all of them live inside <head>. Now let's look at other SEO-related tags that also live in <head>.
Meta information tags and other tags
All three of the TDK tags are meta-information tags: tags that describe the HTML document of the current page. Unlike semantic tags, they usually do not appear in the page the user sees; they carry information purely for machines such as browsers and search engines (and, of course, for us developers ~).
1. The meta robots tag
Aside from TDK, there is another meta tag closely tied to SEO: <meta name="robots">. By default its attributes amount to this: <meta name="robots" content="index,follow,archive">.
It works somewhat like the rel attribute on the <a> tag mentioned above.
With these defaults, if the page were a girl and the search engine came over to chat her up, she would say: you may keep my contact details (index this page), I'll introduce you to my relatives, friends, uncles, aunts, brothers and sisters (keep crawling other pages on the site through my links), and you may even take my photo (generate a snapshot of the current page).
Content value | Meaning
---|---
INDEX | Allow the current page to be indexed
NOINDEX | Do not index the current page
FOLLOW | Allow the crawler to follow links on the current page
NOFOLLOW | Do not follow links on the current page
ARCHIVE | Allow a snapshot of the page to be cached
NOARCHIVE | Do not cache a snapshot of the page
By combining these three pairs of values you can tell the search engine quite a lot. For example, a blog's article-list pages are pointless as search results in their own right, yet the crawler has to go through them to reach the individual article pages; such a list page could try: <meta name="robots" content="noindex,follow,noarchive">.
2. Canonical and alternate tags
<link rel="canoncial" href="https://www.xxx.com" />
<link rel="alternate" href="https://m.xxx.com" />
Let's start with the canonical tag. When several pages on the site have identical or very similar content, you can use this tag to point at one of them as the canonical page. Keep in mind that not just different paths, but even small differences in protocol (HTTP vs HTTPS) or in the query string are treated by search engines as entirely different pages/links (think of the browser's same-origin policy).
If there are many such similar pages, the page weight gets mercilessly diluted: think of the many pages of an article list, or the same product page reached through links carrying different business parameters. For the latter case, assume the following links:
- www.shop.com/goods/xxxx
- www.shop.com/goods/xxxx?…
- www.shop.com/goods/xxxx?…
For the latter two we can add a link tag to the <head>, as sketched below. If the search engine honours the convention, this largely prevents the page weight from being scattered and keeps indexing and ranking unaffected. The meaning is close to an HTTP 301 permanent redirect, except that a user visiting a page carrying the canonical tag is not actually redirected anywhere.
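A reconstruction of that tag for this example, reusing the placeholder product link above (the https protocol is assumed):

```html
<link rel="canonical" href="https://www.shop.com/goods/xxxx" />
```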
The canonical tag was originally proposed and put into practice by Google and other companies abroad; Baidu officially announced support for it in 2013. Details are in the Baidu Search Resources Platform article "Baidu has supported the Canonical tag".
Now for the alternate tag. As mentioned at the start, the mobile Internet may be past the crest of the wave, but it is still sitting on top of the mountain. So if you run separate sites for mobile and PC devices, this tag can come in handy. Look again at the example at the beginning of this section, with its two links:
- www.xxx.com
- m.xxx.com
They are the PC and mobile home pages of the same site, so you can put the following tags in their respective <head>s to declare the relationship:
<link rel="canoncial" href="https://www.xxx.com" />
<link rel="alternate" href="https://m.xxx.com" media="only screen and (max-width: 750px)"/>
The former goes on the mobile page and says "my PC brother takes the lead"; the latter goes on the corresponding PC page and says "when the screen is 750px or narrower, it's my mobile sibling's turn to serve".
About routing
If you are responsible for designing the routing of the whole site, a heavy responsibility falls on your shoulders: once the site is live and growing, the routing structure becomes hard to change. It deserves proper thought and planning from the very beginning.
Say we are building a fruit-and-vegetable site, with every sub-page reachable from the home page. The home page has simple top-level categories: fruits and vegetables. Fruits then splits into finer categories: tropical fruits, seasonal fruits, out-of-season fruits and so on. Other sections of the home page may carry more personalised groupings such as hot products or "guess what you like". Each category leads to a list of items belonging to it, and from the list you reach the detail page of a specific fruit or vegetable.
|-- home
|   |-- classify | list
|   |   |-- list | detail
|   |   |   |-- detail
As shown above, at most four levels of pages take you from the home page to a product detail page. Routing should let users reach their final target page through as few levels as possible; don't let it sprawl so deep that users get lost in a sea of in-site links. Likewise, a search engine only allocates limited resources to any one site, so a shallow structure also lets it crawl the useful pages efficiently.
Also take care not to produce isolated pages (this one is not strictly about routing design).
Say we add a new section selling fruit-and-vegetable accessories such as pots, pans and detergents, but neither the home page nor any other page links to it. That section becomes an island in the middle of the ocean: nobody knows the "course" that leads there, and neither do search engines (unless there happen to be high-quality external links ~).
The fix could be a new category entry on the home page, or entries for related accessories on the existing detail pages, say a peeler on the apple page, or any other sensible arrangement…
In a word, all this rambling is meant to say: organise routes with as few levels as possible and with clear meaning; build sensible categories so that items with the same attributes (goods, blog posts, whatever) sit under the same routing level; and make sure every page you want search engines to find has a "route" leading to it, so no page becomes an island.
With the internal links properly arranged and the site more or less built, it's time to prepare for launch. Two new friends appear: robots.txt and sitemap.
robots.txt
robots.txt implements the Robots Exclusion Protocol (REP); it and the meta robots tag from the previous section can be regarded as close kin. The meta tag lives in a single page and applies only to that page; the robots file lives in the site root (e.g. www.xxx.com/robots.txt) and applies to the whole site.
The protocol is not a formal specification; it grew into a convention over the Internet's long history. Most search engines follow it, and countless websites use and rely on it. Even so, "worms" that ignore this gentlemen's agreement are hard to guard against, even if your robots.txt reads as follows:
User-agent: *
Disallow: /
A crawler that flouts the rules will still sneak into your site and run amok, like the infamous YisouSpider of years past (and probably present). So the robots protocol is only a code of conduct to guide well-behaved spiders around the site, much like a "keep off the lawn" sign; don't expect it to protect anything private. If you want to fend off rogue crawlers, you still need real technical measures on the server side.
The good news is that in July 2019 Google announced it would push to make the robots protocol a formal Internet standard. It reminds me of HTTP, though: the protocol itself is fine, but plenty of developers don't follow it to the letter, and business development carries on regardless. After all, crawlers that ignore the protocol still get the data they want.
Let’s look at the robots protocol specification.
First, the file must be plain text encoded in UTF-8. Each line takes the form:

<field>:<value><#optional-comment>

The whitespace and the comment are optional and exist only for readability. The field is generally one of User-agent, Disallow or Allow; there is also an extra Sitemap field that points to the site's sitemap, and whether it is honoured depends on the specific search engine. Rules are written as groups built from the first three fields, and there can be more than one group. A group may begin with more than one User-agent line, followed by at least one Allow or Disallow line specifying the concrete rules. Groups are separated by blank lines, for example:
# first group
User-agent: Baiduspider
User-agent: Googlebot
Disallow: /article/
# second group
User-agent: *
Disallow: /
Sitemap: https://www.xxx.com/sitemap.xml
Above:
- Baidu's and Google's spiders may access every file/page on the site except those under the /article/ directory (e.g. /article.html is fine, /article/index.html is not);
- No other search engine may access any part of the site;
- Specifies where the site map is located.
If the whole site may be crawled, you can simply leave the robots.txt file out of the root directory. More detailed usage rules are in Google's robots specification and the Baidu resources platform article "What is a robots file".
sitemap
A sitemap file is another tool (protocol) that helps search engines reach your site. Having one does not guarantee that pages will be indexed, but it does let search engines visit your site faster and with more purpose.
A sitemap lists the links of every page on the site that you want search engines to visit. Each entry can include the URL (loc), the last modification date (lastmod), the update frequency (changefreq) and the page's weight (priority); the last three are optional. The file is usually XML, though TXT and HTML formats also exist. XML is used as the example here.
Like robots.txt, a sitemap file must be UTF-8 encoded, and all data in it must be entity-escaped. Take a look:
<?xml version="1.0" encoding="UTF-8"? >
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.xxx.com/</loc>
<lastmod>2019-12-17</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://www.xxx.com/detail/xxx</loc>
<lastmod>2019-12-17</lastmod>
</url>
</urlset>
- The document starts by declaring the XML version and the character encoding
- <urlset> is the top-level tag and specifies the XML namespace
- Each page is wrapped in its own <url> parent tag
- Each <url> contains a required <loc> child tag wrapping the page link
- The remaining child tags <lastmod>, <changefreq> and <priority> are optional
Next, URLs must be "escaped": if a URL contains any of the characters in the table below, they must be replaced with the corresponding character entities.
Character | Escape code
---|---
Ampersand (&) | &amp;
Single quote (') | &apos;
Double quote (") | &quot;
Greater than (>) | &gt;
Less than (<) | &lt;
For more information about Sitemap, see the Sitemap protocol.
Based on all that, I combined Node.js, axios, cheerio and JavaScript template strings into a little toy that generates sitemap files and filters out links to pages that shouldn't be included. For now it only produces sitemap files containing <loc> tags. If it happens to catch your eye, feel free to play with it.
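A minimal sketch of that idea (not the toy itself): it assumes Node.js with axios and cheerio installed, and the start URL and exclusion rules below are hypothetical placeholders.

```javascript
// Sketch: crawl same-site links with axios + cheerio and emit a <loc>-only sitemap.
const axios = require('axios');
const cheerio = require('cheerio');

const START_URL = 'https://www.xxx.com/';     // hypothetical entry page
const EXCLUDE = [/\/user\//, /\/login/];      // pages we don't want in the sitemap

async function crawl(url, visited = new Set()) {
  if (visited.has(url)) return visited;
  visited.add(url);

  let html;
  try {
    ({ data: html } = await axios.get(url));
  } catch (e) {
    return visited; // skip pages that fail to load
  }

  const $ = cheerio.load(html);
  const next = [];
  $('a[href]').each((_, el) => {
    const href = new URL($(el).attr('href'), url).href.split('#')[0];
    // stay on the same site and respect the exclusion list
    if (href.startsWith(START_URL) && !EXCLUDE.some((re) => re.test(href))) {
      next.push(href);
    }
  });

  for (const link of next) {
    await crawl(link, visited); // simple depth-first crawl, one request at a time
  }
  return visited;
}

// Real URLs would also need the entity escaping shown in the table above.
function toSitemap(urls) {
  const items = [...urls]
    .map((u) => `  <url>\n    <loc>${u}</loc>\n  </url>`)
    .join('\n');
  return `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${items}\n</urlset>`;
}

crawl(START_URL).then((urls) => console.log(toSitemap(urls)));
```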
There is also a free sitemap generation tool widely available on the Internet; searching for "tiger Sitemap" should turn up the resources (it is an old free release of Lingli Tiger). If you need sitemap files to be generated and updated automatically under certain conditions (for example whenever an article is published), you will have to build a generator tailored to your own project.
Once the file is ready, you can drop it into the site root next to robots.txt and wait for search engines to come calling. You can also be proactive: the major search engines all provide channels for submitting sitemap files, which helps them reach the site faster and with more purpose.
Besides active submission there are also passive approaches. Many sites aimed at the domestic market depend on Baidu, which provides an automatic push snippet.
Baidu automatic push code
Simply insert the following JavaScript snippet into a page, and whenever a user visits, the page's link is pushed to the search engine.
(function() {
    let bp = document.createElement('script');
    const curProtocol = window.location.protocol.split(':')[0];
    if (curProtocol === 'https') {
        bp.src = `https://zz.bdstatic.com/linksubmit/push.js`;
    } else {
        bp.src = `http://push.zhanzhang.baidu.com/push.js`;
    }
    let s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(bp, s);
})();
To save trouble, this snippet is usually dropped into a globally loaded common code block, which introduces a couple of problems:
- Some pages shouldn't be pushed, but are pushed anyway because the snippet is loaded globally
- Some pages may already be indexed, yet every visit still pushes the same link over and over again
Sadly, in all these years I haven't seen Baidu update or clarify the snippet (am I just not looking in the right place? :)). I genuinely don't know how to avoid the second problem, but the first can be avoided, so I wrote a method that decides whether to load the automatic push script. It looks roughly like this (in a Vue project):
function canSubmit(toObj) {
    const toPath = toObj.path;
    let canSubmit = false;
    let isInScope = false;
    if (
        toPath === '/' // home page
        || (/\/search/u.test(toPath) && Number(toObj.query.page) === 1) // first page of search results
        || /\/detail\/(article|news)/u.test(toPath) // detail pages
        // ...
    ) {
        isInScope = true;
    }
    if (isInScope && process.env.TEST_ENV === 'prod') {
        canSubmit = true;
    }
    return canSubmit;
}
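A hedged sketch of how it might be wired up, assuming vue-router and a hypothetical loadBaiduPush() helper that injects the push snippet shown earlier:

```javascript
// Hypothetical wiring: only inject the Baidu push script for in-scope pages.
router.afterEach((to) => {
  if (canSubmit(to)) {
    loadBaiduPush(); // hypothetical helper wrapping the push.js snippet above
  }
});
```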
A few years ago Baidu also announced, in the Baidu Statistics forum, that the Baidu Statistics (analytics) code performs automatic push as well.
Which is a bit tragic: if that's true, then once you use Baidu Statistics, any workaround for the two problems above is pointless…
Finally,
About redirection
The canonical tag mentioned earlier behaves somewhat like a 301 redirect, except that the user can still visit the page. In practice there are plenty of scenarios that genuinely require redirection; here are two:
- The site has been restructured with new links, and the original pages no longer exist (404)
- The site is reachable over both HTTP and HTTPS, and HTTP needs to be redirected to HTTPS
These cases need to be handled in the server configuration, usually with a 301 permanent redirect. I've heard that search engines also recognise 302, 307 and the like, but for stability's sake… :(
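For illustration only, here is roughly what the HTTP-to-HTTPS case might look like if the server happened to be Node/Express rather than a plain config file; the equivalent nginx/Apache configuration does the same job:

```javascript
// Illustrative sketch: 301-redirect HTTP requests to their HTTPS counterparts.
app.use((req, res, next) => {
  if (req.protocol === 'http') {
    return res.redirect(301, `https://${req.headers.host}${req.originalUrl}`);
  }
  next();
});
```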
That's all. It's just the part I happen to know; there are surely plenty of details I don't know waiting to be explored.
Hands up: likes, follows, discussion and criticism are all welcome… ~