Correct Robots.txt for WordPress 2021 – Detailed Customization 🔥


Correct Robots.txt for WordPress in 2021. Several versions for different needs: a simple basic one and an advanced one, with separate rules for each search engine.

Robots.txt is considered one of the most important elements of creating and optimizing a website for search engines: a small file containing the indexing rules for search robots.

If the file is configured incorrectly, the site can be indexed incorrectly and lose a large share of its traffic. A competent setup, on the contrary, improves SEO and helps bring the site to the top of search results.

Today we are going to talk about setting up Robots.txt for WordPress. I will show you the correct option, which I myself use for my projects.

What is Robots.txt

As I said, robots.txt is a text file that contains rules for search engines. The standard WordPress robots.txt looks like this:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

This is how it is created by the Yoast SEO plugin. Some people think that this is enough for proper indexing. I believe that a more detailed configuration is needed, and for non-standard projects even more so. Let's take a look at the main directives:

Directive | Value | Explanation
User-agent: | Yandex, Googlebot, etc. | Specifies which particular robot the rules refer to. The values listed here are the ones usually used.
Disallow: | Relative link | Blocking directive. Links specified in it will be ignored by search engines.
Allow: | Relative link | Permissive directive. Links specified with it will be indexed.
Sitemap: | Absolute link | Link to the XML sitemap. If you do not specify this directive in the file, you will have to add the map manually (via Yandex.Webmaster or Search Console).
Crawl-delay: | Time in seconds (example: 2.0 is 2 seconds) | Sets the timeout between visits by search robots. Needed if the robots create extra load on the hosting.
Clean-param: | Dynamic parameter | If the site has parameters like site.ru/statia?uid=32, where ?uid=32 is a parameter, this directive lets you hide them.
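
To make the table more concrete, here is a small, purely illustrative robots.txt that uses each directive from it; the domain site.ru, the /statia page and the uid parameter are placeholders:

User-agent: *  # rules for all robots
Disallow: /wp-admin/  # do not index the admin area
Allow: /wp-admin/admin-ajax.php  # but keep the AJAX handler accessible
Crawl-delay: 2.0  # wait 2 seconds between requests
Clean-param: uid /statia  # ignore the uid parameter on /statia
Sitemap: https://site.ru/sitemap.xml  # absolute link to the XML sitemap

Keep in mind that Crawl-delay and Clean-param are non-standard directives: Google ignores them, they are understood mainly by Yandex.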

In principle, nothing complicated. The Clean-param directive, however, deserves some additional explanation.

Parameters are typically used on dynamic sites. They can pass unnecessary information to search engines and create duplicate pages. To avoid this, we must specify the Clean-param directive in Robots.txt, indicating the parameter and the link to which it applies.

In our example, site.ru/statia?uid=32, the link is site.ru/statia and everything after the question mark is a parameter, here uid=32. It is dynamic, which means that the uid parameter can take on different values.

For example, uid=33, uid=34 … uid=123434. In theory, there can be any number of them, so we must close all uid parameters from indexing. To do this, the directive should take the following form:

Clean-param: uid /statia # all uid parameters for /statia will be closed
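
If several parameters have to be hidden at once, Clean-param allows listing them separated by &, and the path can be omitted to cover the whole site. A short sketch (the ref and sort parameter names are made up for illustration):

Clean-param: uid&ref /statia  # hide uid and ref on /statia
Clean-param: sort  # hide the sort parameter on any page of the site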

Basic Robots.txt for WordPress
There are many ready-made options out there, and one of them is the supposedly perfect Robots.txt.
How perfect it really is, I don't know; webmasters disagree.
Some prefer shorter versions of the file, specifying the rules for all search engines at once.
Others write separate rules for each search engine (mainly for Yandex and Google).
Which approach is correct, I cannot say for sure.
However, I suggest you check out the basic Robots.txt for WordPress from Clearfy Pro.
I edited it a little: added the Sitemap directive and removed the Host directive.
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-json/
Disallow: /xmlrpc.php
Disallow: /readme.html
Disallow: /*?
Disallow: /?s=
Allow: /*.css
Allow: /*.js
Sitemap: https://site.ru/sitemap.xml
I cannot say that this is the best option for a WordPress blog.
But in any case, it is better than what Yoast SEO offers us by default.
Advanced Robots.txt for WordPress
Now let's take a look at the enhanced version of Robots.txt for WordPress. 
You probably know that all WP sites have the same structure.
The same folder and file names allow specialists to work out the most suitable variant of the file.
In this article, I want to introduce my own version of Robots.txt to you.
I use it both for my own sites and for client sites.
You may have seen this option on other sites as well; it enjoys some popularity.
So the correct Robots.txt for WordPress looks like this:
User-agent: *  # For all search engines, except Yandex and Google
Disallow: /cgi-bin
Disallow: /?
Disallow: /wp-
Disallow: *?s=
Disallow: *&s=
Disallow: /search/
Disallow: /author/
Disallow: /users/
Disallow: */trackback
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /xmlrpc.php
Disallow: *utm=
Disallow: *openstat=
Disallow: /tag/  # Close tags
Disallow: /readme.html  # Close the useless WordPress installation manual (lies at the root)
Disallow: *?replytocom
Allow: */uploads

User-agent: GoogleBot  # For Google
Disallow: /cgi-bin
Disallow: /?
Disallow: /wp-
Disallow: *?s=
Disallow: *&s=
Disallow: /search/
Disallow: /author/
Disallow: /users/
Disallow: */trackback
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /xmlrpc.php
Disallow: *utm=
Disallow: *openstat=
Disallow: /tag/  # Close tags
Disallow: /readme.html
Disallow: *?replytocom
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php

User-agent: Yandex  # For Yandex
Disallow: /cgi-bin
Disallow: /?
Disallow: /wp-
Disallow: *?s=
Disallow: *&s=
Disallow: /search/
Disallow: /author/
Disallow: /users/
Disallow: */trackback
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /xmlrpc.php
Disallow: /tag/  # Close tags
Disallow: /readme.html
Disallow: *?replytocom
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php
Clean-Param: utm_source&utm_medium&utm_campaign
Clean-Param: openstat

Sitemap: https://site.com/sitemap_index.xml  # Sitemap; change site.com to your own address
Important:
Previously, Robots.txt used the Host directive, which pointed to the main mirror of the site. This is now done with a 301 redirect instead; a minimal sketch of such a redirect is shown below.
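The redirect itself is configured outside of robots.txt. Purely as an illustration, here is a minimal sketch of how the main mirror is commonly set for a WordPress site running on Apache via .htaccess; site.ru is a placeholder for your own domain:

# .htaccess: send www and plain-http requests to the main mirror https://site.ru with a 301
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\. [NC,OR]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://site.ru/$1 [R=301,L]

If the site runs on Nginx or your hosting panel has a built-in redirect option, use that instead; the idea is the same.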
Comments (the text after #) can be deleted.
I specify the Sitemap with the https protocol, because most sites now use a secure connection.
If you don't have SSL, change the protocol to http.
Please note that I am closing tags from indexing.
I do this because they create a lot of duplicates, which is bad for SEO.
If you want to keep tags open, remove the Disallow: /tag/ line from the file.
Conclusion
Basically, this is what the correct Robots.txt for WordPress looks like. 
Feel free to copy it into a file and use it.
Note that this option is only suitable for standard informational sites; other situations may require individual tuning.
That's all.
Thanks for your attention.
I would be grateful if you enable bell notifications and subscribe to the mailing list. 
It will be cool here :).
