Introduction

SEO, robots.txt, search engine optimization

In the vast world of the Internet:

  • The Internet is the universe
  • Websites are like galaxies
  • Web pages are planets
  • Page content is like everything that exists on those planets

The search engine crawlers (spiders) roaming the Internet are like space rovers, which is a pretty romantic thought. Each galaxy has its own rules, and a rover that ignores them should beware of being destroyed by the galaxy's automatic defenses.

I once imagined that the world was made of code, which was quite fun: many supernatural events could be explained as bugs. The idea came up during an evening chat with some imaginative classmates, and when I get the chance I would like to find time to build out a full code-based worldview.

Rover rules

At the entrance to each galaxy, that is, in the root directory of the website, there is a robots.txt file, also known as the rover rules, which records the rules that rovers should obey. The rover rules are more of a convention than an enforced standard: nothing guarantees that every crawler will follow them.

When they have no content of their own to publish, many companies and individuals use crawlers to scrape data from other people's sites. Crawlers that obey the rules can also be called rovers, while those that crawl unscrupulously where they are not allowed are called pirate ships. Sites being crawled try to identify these pirate ships, or apply rate limits, to protect themselves.

List of rules

In robots.txt, User-agent specifies which rovers a group of rules applies to. The asterisk * means that all rovers must obey the rules, as in User-agent: *. Restrictions can also target a specific rover, such as Baidu's, with User-agent: Baiduspider. Below the User-agent line come the corresponding allow and disallow rules:

  • Allow rule: Allow: followed by a path rule tells bots which links may be crawled.
  • Disallow rule: Disallow: followed by a path rule tells bots which links should not be crawled.
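
To see how User-agent, Allow, and Disallow combine, here is a minimal sketch using Python's standard urllib.robotparser module. The rules below and the rover name MyRover are made up for illustration and are not taken from any real site. This parser applies the first matching rule in file order and does plain prefix matching, so the Allow line is placed before the broader Disallow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, written only for this illustration.
EXAMPLE_RULES = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Allow: /posts/public
Disallow: /posts
Disallow: /users
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_parser(EXAMPLE_RULES)

# "MyRover" is a made-up rover name; it falls under the "*" group.
print(rules.can_fetch("MyRover", "https://pushme.top/posts/public/1"))
print(rules.can_fetch("MyRover", "https://pushme.top/posts/1"))
print(rules.can_fetch("MyRover", "https://pushme.top/users"))

# Baiduspider has its own group, which rejects every path.
print(rules.can_fetch("Baiduspider", "https://pushme.top/about"))
```

In this sketch only the first call is allowed; the other three are rejected.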

Path rules

A path rule is built from a URL's pathname and query string, combined with the * and $ symbols: * matches any sequence of characters, and $ marks the end of the URL. Here are a few examples:

  • The user list https://pushme.top/users is expressed by the path /users
  • The comments on an article https://pushme.top/posts/1/comments are expressed by the path /posts/*/comments
  • The style file https://pushme.top/assets/styles/main.css is expressed by the path /assets/styles/*.css$
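
The three examples above can be checked with a small matcher. This is only a sketch of the common wildcard semantics (* matches any run of characters, a trailing $ anchors the end, and a rule without $ matches as a prefix); the helper names rule_to_regex and rule_matches are hypothetical. Python's built-in urllib.robotparser does not expand these wildcards, which is why a hand-written helper is used here.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt path rule into a compiled regular expression."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and join them with ".*" where "*" appeared.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    # A trailing "$" in the rule means the path must end here.
    return re.compile(pattern + (r"\Z" if anchored else ""))


def rule_matches(rule: str, path: str) -> bool:
    """Return True when the path falls under the rule (match starts at the path's first character)."""
    return rule_to_regex(rule).match(path) is not None


print(rule_matches("/users", "/users"))
print(rule_matches("/posts/*/comments", "/posts/1/comments"))
print(rule_matches("/assets/styles/*.css$", "/assets/styles/main.css"))
print(rule_matches("/assets/styles/*.css$", "/assets/styles/main.css.map"))
```

The first three calls match; the last one does not, because the $ requires the path to end in .css.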

For more details on URLs, see the article URL Explosion.

Galaxy recommendation

A sitemap tells rovers which sites and pages are worth visiting. It is declared with the Sitemap: directive, for example Sitemap: https://pushme.top/sitemap.xml.
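
A rover can read this recommendation programmatically. The sketch below assumes Python 3.8 or later, where urllib.robotparser exposes site_maps(); the Disallow line is a hypothetical filler so the file looks realistic.

```python
from urllib.robotparser import RobotFileParser

sitemap_parser = RobotFileParser()
sitemap_parser.parse([
    "User-agent: *",
    "Disallow: /drafts",  # hypothetical rule, only here as filler
    "Sitemap: https://pushme.top/sitemap.xml",
])

# site_maps() returns None when the file declares no sitemap at all.
sitemaps = sitemap_parser.site_maps() or []
print(sitemaps)
```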

Odd-even rule

Websites, like roads in real life, have their own odd-even traffic restrictions. Both rovers and pirate-ship crawlers consume server resources, and if they take up too much, normal users will not be able to reach the site. The odd-even rule therefore limits how often rovers may visit:

  • Crawl-delay: n means waiting n seconds between fetches.
  • Request-rate: x/n means crawling at most x pages every n seconds.
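
As a rough illustration, both limits can be read with urllib.robotparser (crawl_delay() and request_rate() are available from Python 3.6). The numbers below are assumptions chosen for the example, and min_interval is a hypothetical helper that picks the stricter of the two limits; how a real rover should reconcile them is a design choice, not something the rules define.

```python
from urllib.robotparser import RobotFileParser

rate_parser = RobotFileParser()
rate_parser.parse([
    "User-agent: *",
    "Crawl-delay: 2",       # assumed value: wait 2 seconds between fetches
    "Request-rate: 3/15",   # assumed value: at most 3 pages every 15 seconds
])


def min_interval(parser: RobotFileParser, agent: str) -> float:
    """Seconds to wait between requests so that both limits are respected."""
    delay = parser.crawl_delay(agent) or 0
    rate = parser.request_rate(agent)
    # Spread the allowed requests evenly over the window.
    per_request = rate.seconds / rate.requests if rate and rate.requests else 0
    return max(delay, per_request)


print(rate_parser.crawl_delay("MyRover"))
print(rate_parser.request_rate("MyRover"))
print(min_interval(rate_parser, "MyRover"))
```

Here 3 pages per 15 seconds works out to one page every 5 seconds, which is stricter than the 2-second delay, so the helper returns 5.0.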

Juejin's rover rules

Having covered the overall structure of the rover rules, let's read Juejin's rover rules together. Visit https://juejin.im/robots.txt and you will see the following content:

User-agent: *
Request-rate: 1/1
Crawl-delay: 5
Disallow: /timeline
Disallow: /submit-entry
Disallow: /new-entry
Disallow: /edit-entry
Disallow: /notification
Disallow: /subscribe/subscribed
Disallow: /user/settings
Disallow: /reset-password
Disallow: /drafts
Disallow: /editor
Disallow: /user/invitation
Disallow: /user/wallet
Disallow: /entry/*/view$
Disallow: /auth
Disallow: /oauth
Disallow: /zhuanlan/*?sort=newest
Disallow: /zhuanlan/*?sort=comment
Disallow: /search
Disallow: /equation
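
A rover that honors a file like this could be sketched as follows. Only a few prefix rules from the file above are copied in, because urllib.robotparser treats * and $ in paths literally. The function crawl_politely, the rover name MyRover, and the /post/1 and /post/2 URLs are all hypothetical; fetching and sleeping are passed in as parameters so the sketch runs without touching the network.

```python
import time
from typing import Callable, Iterable, List
from urllib.robotparser import RobotFileParser

# A subset of the rules shown above (wildcard rules omitted on purpose).
JUEJIN_RULES = """\
User-agent: *
Request-rate: 1/1
Crawl-delay: 5
Disallow: /timeline
Disallow: /search
"""


def crawl_politely(
    parser: RobotFileParser,
    agent: str,
    urls: Iterable[str],
    fetch: Callable[[str], object],
    sleep: Callable[[float], object] = time.sleep,
) -> List[str]:
    """Fetch only the allowed URLs, pausing Crawl-delay seconds between fetches."""
    delay = parser.crawl_delay(agent) or 0
    visited: List[str] = []
    for url in urls:
        if not parser.can_fetch(agent, url):
            continue  # disallowed for this rover: skip it
        if visited:
            sleep(delay)  # no pause is needed before the very first fetch
        fetch(url)
        visited.append(url)
    return visited


juejin = RobotFileParser()
juejin.parse(JUEJIN_RULES.splitlines())

pauses: List[float] = []  # records the requested pauses instead of really sleeping
visited = crawl_politely(
    juejin,
    "MyRover",
    [
        "https://juejin.im/timeline",
        "https://juejin.im/post/1",
        "https://juejin.im/search?query=seo",
        "https://juejin.im/post/2",
    ],
    fetch=lambda url: None,  # stand-in for a real HTTP request
    sleep=pauses.append,
)
print(visited)
print(pauses)
```

In this run the /timeline and /search URLs are skipped, the two /post URLs are visited, and a single 5-second pause is requested between them.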

As you can see, Juejin's rules are relatively loose: they limit the request rate and list the pages that should not be visited. There are no special restrictions on specific rovers such as Baidu's or Google's, so readers can also write their own rover to crawl part of Juejin's content. You can see it in today's Boiling Point.

SEO-related content

  • Little secret of H1 Mason
  • At the beginning of SEO experience
  • Img large
  • A thousand miles of marriage
  • Throw herself
  • Rover rule

Other

Robots file generator: simple and easy to use.

Xiao Er (the author) has only covered the parts of SEO that are easy to put into practice, and the SEO discussion ends here. Semantic tags also help SEO, but they are hard to apply in practice; if Xiao Er finds a simple, easy-to-understand approach, this article will be updated.

Grow up together

In this bewildering city, it always helps to have a partner to grow up with.

  • If you want more people to see this article, give it a like.
  • If you want to encourage Xiao Er, give a little star on GitHub.
  • If you want to talk more with Xiao Er, add WeChat m353839115.

This article is an original contribution by PushMeTop.