The robots.txt file tells search engines how to crawl your site, which makes it an important part of search engine optimization. In this article, I will show you how to create an SEO-friendly robots.txt file.

What is a robots.txt file?

Robots.txt is a text file that website owners can create to tell search engine bots how to crawl and index pages on their site. It is usually stored in the root directory (the main folder) of the website. The basic format of the robots.txt file is as follows:

User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
 
User-agent: [user-agent name]
Allow: [URL string to be crawled]
  
Sitemap: [URL of your XML Sitemap]

You can have multiple lines of instructions to allow or disallow specific URLs, and you can add multiple sitemaps. If you don’t disallow a URL, search engine bots assume they are allowed to crawl it. Here’s what a sample robots.txt file might look like:

User-Agent: *
Allow: /wp-content/uploads/
Disallow: /wp-content/plugins/
Disallow: /wp-admin/
 
Sitemap: https://example.com/sitemap_index.xml

In the robots.txt example above, we allow search engines to crawl and index files in the WordPress uploads folder. We then disallow them from crawling the plugins folder and the WordPress admin folder. Finally, we provide the URL of the XML sitemap.
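
If you want to sanity-check how rules like these are interpreted before uploading the file, Python’s standard-library urllib.robotparser can evaluate them for you. Here is a minimal sketch, assuming the example rules above and a hypothetical example.com URL structure:

import urllib.robotparser

# The rules from the robots.txt sample above
rules = """
User-Agent: *
Allow: /wp-content/uploads/
Disallow: /wp-content/plugins/
Disallow: /wp-admin/
""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# "*" asks on behalf of any crawler that follows the generic rules
print(parser.can_fetch("*", "https://example.com/wp-content/uploads/photo.jpg"))  # True: explicitly allowed
print(parser.can_fetch("*", "https://example.com/wp-admin/options.php"))          # False: disallowed
print(parser.can_fetch("*", "https://example.com/a-sample-post/"))                # True: not disallowed, so crawlable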

Why do I need to provide the robots.txt file?

If you don’t have a robots.txt file, search engines will still crawl and index your site. However, you won’t be able to tell them which pages or folders they shouldn’t crawl. This doesn’t matter much when you’re starting out, because a new site doesn’t have much content.

However, as sites grow and content increases, we need to have better control over how sites are crawled and indexed. Here’s why:

Search bots have a crawl budget for each site, which means they only crawl a certain number of pages in a crawl session. If they don’t get through all the pages on your site, they’ll come back in the next session and continue crawling. This can slow down the rate at which your site is indexed.

We can address this by preventing search engines from crawling unnecessary pages, such as the WordPress admin pages (wp-admin) and the plugins and themes folders. Disallowing unnecessary pages saves crawl budget, which helps search engines crawl more of your site’s pages and index them as quickly as possible.

Another reason to use a robots.txt file is when you want to keep search engines from indexing a particular post or page on your site. While this isn’t the safest way to hide content from the public, it helps keep that content from appearing in search results.

What would an ideal robots.txt file look like?

Many large websites use a very simple robots.txt file. Its contents vary depending on the needs of the particular site. For example:

User-agent: *
Disallow:
  
Sitemap: http://www.example.com/post-sitemap.xml
Sitemap: http://www.example.com/page-sitemap.xml

This robots.txt file allows all search engines to index all content and gives them links to the site’s XML sitemaps. For WordPress sites, I recommend the following rules in the robots.txt file:

User-Agent: *
Allow: /wp-content/uploads/
Disallow: /wp-content/plugins/
Disallow: /wp-admin/
Disallow: /readme.html
Disallow: /refer/
 
Sitemap: http://www.example.com/post-sitemap.xml
Sitemap: http://www.example.com/page-sitemap.xml

This tells search engines to crawl all WordPress images and uploaded files, but not the WordPress plugin files, the WordPress admin area, the WordPress readme file, or affiliate links. Adding the sitemaps to your robots.txt file makes it easy for search engines to find all the pages on your site.

Now that we know what an ideal robots.txt file looks like, let’s take a look at how to create one in WordPress.

How to create a robots.txt file in WordPress?

Many WordPress SEO plugins, such as All in One SEO and Yoast SEO, can generate the robots.txt file dynamically. Alternatively, we can create the robots.txt file by hand and upload it to the site’s root directory over FTP.
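
If you go the manual route, you can also script the FTP upload with Python’s built-in ftplib instead of using a desktop FTP client. The sketch below is only an illustration; the host name, credentials, and remote directory are placeholders to replace with your own hosting details:

from ftplib import FTP

# Placeholder connection details -- replace with your own hosting credentials
ftp = FTP("ftp.example.com")
ftp.login("your-username", "your-password")

# The web root is often named public_html, www, or htdocs depending on the host
ftp.cwd("/public_html")

# Upload the robots.txt file prepared locally into the site's root directory
with open("robots.txt", "rb") as f:
    ftp.storbinary("STOR robots.txt", f)

ftp.quit()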

How to test the robots.txt file?

Once we have created the robots.txt file, we need to test it with a robots.txt testing tool. There are many such tools available; the ones I recommend most are Google Search Console and the Baidu Search Resource Platform.

Take Google Search Console as an example. First, we need to link our website to Google Search Console. We can then test the file with the robots.txt testing tool in Google Search Console.

Just select the site we want to test from the drop-down list. The tool will automatically retrieve the site’s robots.txt file and highlight any errors and warnings it finds.
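
As a quick supplement to these tools, you can also point Python’s urllib.robotparser at the live file and ask whether specific URLs are crawlable. A minimal sketch, assuming the robots.txt has already been published on a hypothetical example.com:

import urllib.robotparser

parser = urllib.robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()  # downloads and parses the live robots.txt

# Check whether a generic crawler may fetch a few representative URLs
for url in ("https://example.com/wp-admin/", "https://example.com/a-sample-post/"):
    print(url, "->", "allowed" if parser.can_fetch("*", url) else "blocked")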

Some personal thoughts

The purpose of optimizing the robots.txt file is to keep search engines from crawling pages that shouldn’t be public, such as pages in the site’s wp-plugins folder or in the WordPress admin folder.

A common misconception is that blocking WordPress category, tag, and archive pages will improve the crawl rate and lead to faster indexing and higher rankings. This is wrong, and it also goes against Google’s webmaster guidelines.

I hope this article will help people who are new to WordPress understand how to optimize the WordPress robots.txt file.

To read more articles, please follow my public account: Undefined Variables