Correct Robots.txt for WordPress in 2021. Several versions for different needs: a simple basic one and an advanced one, with separate rules for each search engine.
Robots.txt is one of the most important parts of building and optimizing a website for search engines. It is a small text file containing crawling rules for search robots.
If the file is configured incorrectly, the site can be indexed incorrectly and lose a large share of its traffic. A well-written file, on the contrary, improves SEO and helps the site reach the top of the results.
Today we are going to talk about setting up Robots.txt for WordPress. I will show you the correct version, the one I use on my own projects.
What is Robots.txt
As I said, robots.txt is a text file that contains rules for search engines. The standard WordPress robots.txt looks like this:
```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```
This is the file generated by the Yoast SEO plugin. Some people think it is enough for proper indexing; I believe a more thorough setup is needed, especially for non-standard projects. Let's take a look at the main directives:
| Directive | Value | Explanation |
|---|---|---|
| User-agent: | Yandex, Googlebot, etc. | Specifies which particular robot the rules below apply to. The values listed are the most commonly used. |
| Disallow: | Relative link | Blocking directive. Links matching it will not be crawled by search robots. |
| Allow: | Relative link | Permissive directive. Links matching it will be crawled and indexed. |
| Sitemap: | Absolute link | Points to the XML sitemap. If you do not specify this directive, you will have to submit the sitemap manually (via Yandex.Webmaster or Google Search Console). |
| Crawl-delay: | Time in seconds (example: 2.0 means 2 seconds) | Sets a timeout between visits by search robots. Useful if the robots put extra load on the hosting. |
| Clean-param: | Dynamic parameter | If the site has URLs like site.ru/statia?uid=32, where ?uid=32 is a parameter, this directive lets you hide them from indexing. |
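If you want to sanity-check how Disallow rules apply to specific URLs, Python's standard urllib.robotparser can help. Below is a minimal sketch of my own, assuming a placeholder site.ru and plain prefix rules only; keep in mind that this parser does not understand * wildcards inside paths and resolves rules in file order rather than by the longest match, so treat it as a rough check, not as a replica of Google's or Yandex's behavior.

```python
from urllib.robotparser import RobotFileParser

# Prefix-only rules similar to the ones discussed above; site.ru is a placeholder.
rules = """
User-agent: *
Disallow: /wp-admin/
Disallow: /?s=
Disallow: /tag/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for url in (
    "https://site.ru/wp-admin/options.php",  # matches Disallow: /wp-admin/
    "https://site.ru/?s=robots",             # matches Disallow: /?s=
    "https://site.ru/tag/seo/",              # matches Disallow: /tag/
    "https://site.ru/sample-post/",          # no rule matches, so it stays crawlable
):
    print(url, "->", "allowed" if parser.can_fetch("*", url) else "disallowed")
```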
In principle, nothing complicated. The Clean-param directive, however, deserves a little more explanation.
Parameters are typically used on dynamic sites. They can pass unnecessary information to search engines and create duplicate pages. To avoid this, we specify the Clean-param directive in Robots.txt, indicating the parameter and the path to which it applies.
In our example, site.ru/statia?uid=32, the page is site.ru/statia and everything after the question mark is a parameter, here uid=32. It is dynamic, which means the uid parameter can take different values.
For example, uid=33, uid=34 … uid=123434. In theory there can be any number of them, so we must close all uid parameters from indexing. To do this, the directive should take the following form:
```
Clean-param: uid /statia  # all uid parameters for /statia will be closed
```

Basic Robots.txt for WordPress

Webmasters disagree about what the perfect Robots.txt looks like. Some prefer a shorter file, specifying the rules for all search engines at once. Others write separate rules for each search engine (mainly for Yandex and Google). Which approach is correct, I cannot say for sure. However, I suggest you start with the basic Robots.txt for WordPress from Clearfy Pro. I edited it a little: added the Sitemap directive and removed the Host directive.

```
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-json/
Disallow: /xmlrpc.php
Disallow: /readme.html
Disallow: /*?
Disallow: /?s=
Allow: /*.css
Allow: /*.js
Sitemap: https://site.ru/sitemap.xml
```

I cannot say that this is the best option for a WordPress blog, but it is better than what Yoast SEO offers us by default.

Advanced Robots.txt for WordPress

Now let's take a look at the advanced version of Robots.txt for WordPress. You probably know that all WP sites have the same structure. The same folder and file names allow specialists to work out the most suitable version of the file. In this article I want to share my own version of Robots.txt. I use it both for my sites and for client sites, and you may have seen it on other sites as well; it has some popularity. So the correct Robots.txt for WordPress looks like this:

```
User-agent: *              # For all search engines except Yandex and Google
Disallow: /cgi-bin
Disallow: /?
Disallow: /wp-
Disallow: *?s=
Disallow: *&s=
Disallow: /search/
Disallow: /author/
Disallow: /users/
Disallow: */trackback
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /xmlrpc.php
Disallow: *utm=
Disallow: *openstat=
Disallow: /tag/            # Close tags
Disallow: /readme.html     # Close the useless WordPress installation manual (sits in the root)
Disallow: *?replytocom
Allow: */uploads

User-agent: GoogleBot      # For Google
Disallow: /cgi-bin
Disallow: /?
Disallow: /wp-
Disallow: *?s=
Disallow: *&s=
Disallow: /search/
Disallow: /author/
Disallow: /users/
Disallow: */trackback
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /xmlrpc.php
Disallow: *utm=
Disallow: *openstat=
Disallow: /tag/            # Close tags
Disallow: /readme.html
Disallow: *?replytocom
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php

User-agent: Yandex         # For Yandex
Disallow: /cgi-bin
Disallow: /?
Disallow: /wp-
Disallow: *?s=
Disallow: *&s=
Disallow: /search/
Disallow: /author/
Disallow: /users/
Disallow: */trackback
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /xmlrpc.php
Disallow: /tag/            # Close tags
Disallow: /readme.html
Disallow: *?replytocom
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php
Clean-Param: utm_source&utm_medium&utm_campaign
Clean-Param: openstat

Sitemap: https://site.com/sitemap_index.xml  # Sitemap, change site.com to your own address
```

Important: previously, Robots.txt used the Host directive, which pointed to the main mirror of the site. This is now done with a redirect.
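The Clean-Param lines in the Yandex block tell the crawler that tracking parameters such as utm_source do not change the page content. As a rough illustration of the duplicate problem they solve, here is a small sketch of my own (not part of any robots.txt tooling) that normalizes such URLs with Python's standard urllib.parse; site.com and the parameter list are just placeholders mirroring the file above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that only track visits and do not change page content.
# The set mirrors the Clean-Param lines above; extend it as needed.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "openstat"}

def canonical_url(url: str) -> str:
    """Drop tracking parameters so duplicate URLs collapse into one."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Three "different" URLs that all serve the same article (site.com is a placeholder).
for url in (
    "https://site.com/statia?utm_source=newsletter&utm_medium=email",
    "https://site.com/statia?utm_campaign=spring&openstat=123",
    "https://site.com/statia",
):
    print(canonical_url(url))   # prints https://site.com/statia for each
```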
Comments (text after #) can be deleted. I specify the Sitemap with the https protocol because most sites now use a secure connection. If you don't have SSL, change the protocol to http.

Please note that I am closing tags. I do this because they create a lot of duplicates, which is bad for SEO. If you want to keep tags open, remove the Disallow: /tag/ line from the file.

Conclusion

Basically, this is what the correct Robots.txt for WordPress looks like. Feel free to copy it into a file and use it. Note that this version is only suitable for standard informational sites; other situations may require individual work.

That's all, thanks for your attention. I would be grateful if you enable notifications and subscribe to the mailing list. It will be cool here :)