Better Robots.txt Rules for WordPress

Posted by Jeff Starr in SEO, WordPress

Updated August 28, 2020 • 106 comments

Cleaning up my files during the recent redesign, I realized that several years had somehow passed since the last time I even looked at the site’s robots.txt file. I guess that’s a good thing, but with all of the changes to site structure and content, it was time again for a delightful romp through robots.txt.

This post summarizes my research and gives you a near-perfect robots file, so you can copy/paste it completely “as-is”, or use it as a template to give you a starting point for your own customization.

Robots.txt in 30 seconds

Primarily, robots directives disallow obedient spiders access to specified parts of your site. They can also explicitly “allow” access to specific files and directories. So basically they’re used to let Google, Bing et al know where they can go when visiting your site. You can also do nifty stuff like instruct specific user-agents and declare sitemaps. For just a simple text file, robots.txt wields considerable power. And we want to use whatever power we can get to our greatest advantage.

Better robots.txt for WordPress

Running WordPress, you want search engines to crawl and index your posts and pages, but not your core WP files and directories. You also want to make sure that feeds and trackbacks aren’t included in the search results. It’s also good practice to declare a sitemap. With that in mind, here are the new and improved robots.txt rules for WordPress:

User-agent: *
Disallow: /wp-admin/
Disallow: /trackback/
Disallow: /xmlrpc.php
Disallow: /feed/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap.xml

Only one small edit is required: change the Sitemap to match the location of your sitemap (or remove the line if no sitemap is available).

Important: As of version 5.5, WordPress automatically generates a sitemap for your site. For more information check out this in-depth tutorial on WP Sitemaps.
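If you manage several sites, that one edit can be scripted. Here is a minimal sketch in Python, assuming a hypothetical helper named `build_robots_txt` (it is not part of WordPress or any plugin). It assembles the rules shown above and adds the Sitemap line only when you pass a sitemap URL; the URL in the usage lines is a placeholder.

```python
# Rules copied from the listing above; only the Sitemap line varies per site.
BASE_RULES = [
    "User-agent: *",
    "Disallow: /wp-admin/",
    "Disallow: /trackback/",
    "Disallow: /xmlrpc.php",
    "Disallow: /feed/",
    "Allow: /wp-admin/admin-ajax.php",
]


def build_robots_txt(sitemap_url=None):
    """Return the robots.txt text, with a Sitemap line only if a URL is given."""
    lines = list(BASE_RULES)
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder sitemap location (an assumption); replace with your own.
    print(build_robots_txt("https://example.com/wp-sitemap.xml"), end="")
```

Calling `build_robots_txt()` with no argument gives the same rules without any Sitemap line, which matches the "remove the line if no sitemap is available" case.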

I use this exact code on nearly all of my major sites. It’s also fine to customize the rules, say if you need to exclude any custom directories and/or files, based on your actual site structure and SEO strategy.

Usage

To add the robots rules code to your WordPress-powered site, just copy/paste the code into a blank file named robots.txt. Then add the file to your web-accessible root directory, for example:

https://perishablepress.com/robots.txt

If you take a look at the contents of the robots.txt file for Perishable Press, you’ll notice an additional robots directive that forbids crawl access to the site’s blackhole for bad bots. Let’s have a look:

User-agent: *
Disallow: /wp-admin/
Disallow: /trackback/
Disallow: /xmlrpc.php
Disallow: /feed/
Disallow: /blackhole/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://perishablepress.com/wp-sitemap.xml

Spiders don’t need to be crawling around anything in /wp-admin/, so that’s disallowed. Likewise, trackbacks, xmlrpc, and feeds don’t need to be crawled, so we disallow those as well. Also, notice that we add an explicit Allow directive that allows access to the WordPress Ajax file, so crawlers and bots have access to any Ajax-generated content. Lastly, we make sure to declare the location of our sitemap, just to make it official.
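To see why the Allow line takes effect even though /wp-admin/ is disallowed, here is a rough sketch in Python. It assumes the crawler matches each rule as a prefix of the URL path and applies the longest matching rule, with Allow winning a tie, which is the precedence described in RFC 9309 and in Google's documentation. Crawlers that instead apply the first matching rule in file order may behave differently. The function name `is_allowed` and the sample paths are made up for illustration.

```python
# Rules from the Perishable Press listing above, as (directive, path) pairs.
RULES = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/trackback/"),
    ("disallow", "/xmlrpc.php"),
    ("disallow", "/feed/"),
    ("disallow", "/blackhole/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def is_allowed(url_path, rules=RULES):
    """Apply the longest matching rule; Allow wins a tie; no match means allowed."""
    best_len = -1
    allowed = True
    for directive, path in rules:
        # Plain rules (no wildcards) match when they are a prefix of the path.
        if not url_path.startswith(path):
            continue
        is_allow = directive == "allow"
        if len(path) > best_len or (len(path) == best_len and is_allow):
            best_len = len(path)
            allowed = is_allow
    return allowed


if __name__ == "__main__":
    # Sample paths are hypothetical.
    for p in ["/wp-admin/admin-ajax.php", "/wp-admin/options.php",
              "/feed/", "/2020/08/some-post/"]:
        print(p, "->", "allowed" if is_allowed(p) else "blocked")
```

Under these assumptions the Ajax file stays crawlable because its Allow path is longer (more specific) than the /wp-admin/ Disallow path, while everything else under /wp-admin/ remains blocked.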

Notes & Updates

Update! The following directives have been removed from the tried and true robots.txt rules in order to appease Google’s new requirement that Googlebot always be allowed complete crawl access to any publicly available file.

Disallow: /wp-content/
Disallow: /wp-includes/

Because /wp-content/ and /wp-includes/ include some publicly accessible CSS and JavaScript files, it’s recommended to just allow googlebot complete access to both directories always. Otherwise you’ll be spending valuable time chasing structural and file name changes in WordPress, and trying to keep them synchronized with some elaborate set of robots rules. It’s just easier to allow open access to these directories. Thus the two directives above were removed permanently from robots.txt, and are not recommended in general.

Apparently Google is so hardcore about this new requirement¹ that they actually are penalizing sites (a LOT) for non-compliance². Bad news for hundreds of thousands of site owners who have better things to do than keep up with Google’s constant, often arbitrary changes.

¹ Google demands complete access to all publicly accessible files.
² Note that it may be acceptable to disallow bot access to /wp-content/ and /wp-includes/ for other (non-Google) bots. Do your research though, before making any assumptions.

Previously on robots.txt..

As mentioned, my previous robots.txt file went unchanged for several years (which just vanished in the blink of an eye). The previous rules proved quite effective, especially with compliant spiders like googlebot. Unfortunately, they contain language that only a few of the bigger search engines understand (and thus obey). Consider the following robots rules, which were used here at Perishable Press way back in the day.

Important! Please do not use the following rules on any live site. They are for reference and learning purposes only. For live sites, use the Better robots.txt rules, provided in the previous section.

User-agent: *
Disallow: /mint/
Disallow: /labs/
Disallow: /*/wp-*
Disallow: /*/feed/*
Disallow: /*/*?s=*
Disallow: /*/*.js$
Disallow: /*/*.inc$
Disallow: /transfer/
Disallow: /*/cgi-bin/*
Disallow: /*/blackhole/*
Disallow: /*/trackback/*
Disallow: /*/xmlrpc.php
Allow: /*/20*/wp-*
Allow: /press/feed/$
Allow: /press/tag/feed/$
Allow: /*/wp-content/online/*
Sitemap: https://perishablepress.com/sitemap.xml

User-agent: ia_archiver
Disallow: /

Apparently, the wildcard character isn’t recognized by lesser bots, and I’m thinking that the end-pattern symbol (dollar sign $) is probably not well-supported either, although Google certainly gets it.

These patterns may be better supported in the future, but going forward there is no reason to include them. As seen in the “better robots” rules (above), the same pattern-matching is possible without using wildcards and dollar signs, enabling all compliant bots to understand your crawl preferences.
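For the curious, here is a small Python sketch of how a crawler that does support these symbols might read them. It assumes the common interpretation (used by Google and described in RFC 9309) where `*` stands for any run of characters, a trailing `$` pins the rule to the end of the URL, and every rule is matched from the start of the path. The helper names `rule_to_regex` and `rule_matches` and the sample paths are hypothetical.

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path with * and $ into an anchored regular expression."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def rule_matches(rule_path, url_path):
    """Return True if the rule applies to the given URL path."""
    return rule_to_regex(rule_path).match(url_path) is not None


if __name__ == "__main__":
    # Rules taken from the old file above; the URL paths are made-up examples.
    print(rule_matches("/*/*.js$", "/press/script.js"))
    print(rule_matches("/*/*.js$", "/press/script.js?ver=1"))
    print(rule_matches("/press/feed/$", "/press/feed/atom/"))
    print(rule_matches("/feed/", "/feed/rss/"))
```

A bot that does not understand the symbols would presumably treat `*` and `$` as literal characters, which is why the plain-prefix rules in the newer file are the safer bet.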

Learn more..

Check out the following recommended sources to learn more about robots.txt, SEO, and more:

google robots

About the Author

Jeff Starr = Web Developer. Book Author. Secretly Important.

106 responses to “Better Robots.txt Rules for WordPress”

Jithin Johny George 2012/12/03 10:56 pm

is there any wordpress plugin free/premium to generate a better robots.txt file

Jeff Starr 2012/12/04 1:05 pm • Post Author

I’ve not heard of one, but it’s a great idea. Hmmm… ;)

soquinn 2012/12/04 4:56 pm

hi, if the WP install is not in the root (lets say it’s in /newsite/ like in http://codex.wordpress.org/Giving_WordPress_Its_Own_Directory ) where does the robots.txt go (root or install folder) and what are the paths to be:

Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/

OR

Disallow: newsite/wp-admin/
Disallow: newsite/wp-includes/
Disallow: newsite/wp-content/plugins/

Jeff Starr 2012/12/05 3:02 pm • Post Author

The robots.txt file should always be located in the web-accessible root directory, never a subdirectory. Then for the actual rules, either of your examples should work, and you can test them in Google’s Webmaster tools. The difference between the two examples here is that the first set will “disallow” any URL that contains /wp-admin/ and so on. The second set of rules will only disallow URLs that contain newsite/wp-admin/ et al, which is more specific. Also, I think the rules in your second example should include an initial slash, like so:

Disallow: /newsite/wp-admin/
Disallow: /newsite/wp-includes/
Disallow: /newsite/wp-content/plugins/
- soquinn 2012/12/05 5:05 pm
  
  Thanks Jeff. Still a bit confused… so when wordpress is installed in a subdirectory (i did this basically: http://codex.wordpress.org/Giving_WordPress_Its_Own_Directory#Moving_a_Root_install_to_its_own_directory) but runs the root the robots.txt file (in the root) should be:
  
  Disallow: /newsite/wp-admin/
  
  – point to where the physical files/folders are
  
  or
  
  Disallow: /wp-admin/
  
  – point to virtual root because the index.php tells it to go to /newsite/ anyway?
  
  webmaster tools just shows you what’s written in the robots.txt
- Jeff Starr 2012/12/05 6:35 pm • Post Author
  
  Hmm.. it looks like Google removed their Robots.txt analyzer, which is unfortunate because it was a super-useful tool. There should be something similar online, but I haven’t checked.
  
  For the two different Disallow rules, the rules both work because each path is treated as a regular expression, which means that googlebot will check your site URLs for the presence of either /newsite/wp-admin/ or /wp-admin/ (whichever is used). So for example if you use this:
  
  Disallow: /newsite/wp-admin/
  
  Googlebot will ignore the following URLs:
  
  http://example.com/newsite/wp-admin/some-file.php
  http://example.com/newsite/wp-admin/some-other-file.php
  http://example.com/subfolder/newsite/wp-admin/some-file.php
  http://example.com/subfolder/newsite/wp-admin/some-other-file.php
  http://example.com/subfolder/subfolder/newsite/wp-admin/some-file.php
  http://example.com/subfolder/subfolder/newsite/wp-admin/some-other-file.php
  
  ..and so forth. BUT /newsite/wp-admin/ won’t match these URLs:
  
  http://example.com/wp-admin/some-file.php
  http://example.com/subfolder/wp-admin/some-other-file.php
  
  So when you use Disallow: /wp-admin/, you’re basically blocking any URL that includes /wp-admin/, which is less restrictive than if you use Disallow: /newsite/wp-admin/. So either of these example robots directive will work fine, but it’s easier to just roll with /wp-admin/ because it makes the code more portable, and covers any other installations of WP that might be added in the future.

soquinn 2012/12/06 10:12 am

Thanks Jeff…

Pammi 2012/12/20 6:44 pm

What about the following (and the rest of your txt examples), when WP is installed inside a directory do they also need the directory name added before them?

Disallow: /tag/
Disallow: /category/categories
Disallow: /author/
Disallow: /feed/
Disallow: /trackback/
Disallow: /print/
Disallow: /2001/

TO

Disallow: /wp/tag/
Disallow: /wp/category/categories
Disallow: /wp/author/
Disallow: /wp/feed/
Disallow: /wp/trackback/
Disallow: /wp/print/
Disallow: /wp/2001/

Pammi 2012/12/21 2:30 pm

I assumed everything to do with wordpress needs to have the installation directory before hand so I went with /wp before everything, hope that’s right. Is there anyway to check the robots.txt for errors besides google webmaster tools?
- Jeff Starr 2012/12/25 11:15 pm • Post Author
  
  Actually, see previous comment — there’s no need to add the /wp, but it’s also fine if you do add it. Either way.
  
  To check for robots errors, the robots.txt sites have general validation tools for syntax, etc. At the moment there is no way to check for crawler-specific errors, not even in Google Webmaster Tools (they took it down).
Jeff Starr 2012/12/25 11:12 pm • Post Author

No need to add /wp/, unless you’re running multiple sites and for some reason want to allow crawling of same-name directories in other WP installs. The reason why you don’t have to add the /wp/ is because robots rules are treated similar to regex expressions, where the disallow pattern is checked against each URL. So if you write this for example:

Disallow: /category/categories

..compliant search engines will match any URL that contains the string, “/category/categories”, such as:

http://example.com/wp/category/categories
http://example.com/wp/category/categories/whatever
http://example.com/something/category/categories/whatever
http://example.com/something/else/category/categories/whatever

I hope that helps!

sadek 2013/01/14 10:12 am

I installed WP Robots Txt Plugin in my website and it rocks. Added custom instructions in Robots.txt file.
Thank you very very much for your help.
Best Regards

Suzanne 2013/01/24 12:41 pm

I’ll preface my question with….I’m not a techie at all, so any help you can give would be appreciated.

I noticed you have the following on the robots.txt file:

Disallow: /blackhole/
Disallow: /transfer/
Disallow: /tweets/
Disallow: /mint/

We don’t have these as directories on the system. Should we still have them on the robots.txt file, just in case? Any guidance would be appreciated! Thanks

Jeff Starr 2013/01/24 4:04 pm • Post Author

Excellent question. Those rules won’t hurt anything if the directories don’t exist, but I recommend removing them if not needed.
- Suzanne 2013/01/24 4:19 pm
  
  Thanks so much Jeff for the help! It’s very much appreciated. :)

Pradip 2013/02/24 1:32 am

Hi Thanks for the explanation. I would like to know whether I should block pages, categories and archives for better ranking?

Jeff Starr 2013/02/24 1:03 pm • Post Author

If you don’t need those resources, I suggest using .htaccess to redirect to the home page.

cowboy Mike 2013/03/11 10:47 am

Hi Jeff,
I am a little confused w/ regard to the robots txt file and seo.

For example you have Disallow: /wp-content/

This seems like it would prevent for example google from indexing blog post images, gallery images and so on.

I thought, may be incorrectly, that having google index a site’s images was good for seo.

Your thoughts?

Happy trails, Mike

Jeff Starr 2013/03/11 1:09 pm • Post Author

Good point. I actually keep my images in a folder named /online/, and then allow for crawling/indexing with the following line:

Allow: /wp-content/online/

So if you’re using /uploads/, you would replace that with this:

Allow: /wp-content/uploads/

cowboy Mike 2013/03/11 1:46 pm

Thank you Jeff. Would you then have the following in the robots text?

Disallow: /wp-content/
Allow: /wp-content/uploads/

Does the Allow: /wp-content/uploads/ override the Disallow: /wp-content/ ?

Happy trails, Mike Foate

Jeff Starr 2013/03/11 1:49 pm • Post Author

Yes that is correct, the more specific Allow directive will override the Disallow directive, so the bot will be able to crawl and index the uploads content :)

Stephen Malan 2013/03/18 2:32 pm

Here is what we have…..not very good with setting up robots.txt files….what changes would you make if any?

Thanks

# User-agent: *
# Disallow: /wp-admin/
# Disallow: /wp-includes/
# Disallow: /wp-trackback
# Disallow: /wp-feed
# Disallow: /wp-comments
# Disallow: /wp-content/plugins
# Disallow: /wp-content/themes
# Disallow: /wp-login.php
# Disallow: /wp-register.php
# Disallow: /feed
# Disallow: /trackback
# Disallow: /cgi-bin
# Disallow: /comments
# Disallow: *?s=
Sitemap: http://www.attractionmarketingdirect.com/sitemap.xml
Sitemap: http://www.attractionmarketingdirect.com/sitemap.xml

Jeff Starr 2013/03/19 11:46 am • Post Author

Hi Stephen, I recommend removing the pound signs “#” from the beginning of each line, and also remove one of the “Sitemap” directives, as duplicates may cause issues.
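To illustrate why the pound signs matter, here is a quick sketch using Python's standard-library `urllib.robotparser`. A line starting with `#` is a comment, so the parser sees no rules at all until the signs are removed. Only three of the rules are included to keep it short, and example.com is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Parse robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# A shortened copy of the commented-out file.
COMMENTED = """\
# User-agent: *
# Disallow: /wp-admin/
# Disallow: /feed
"""

# The same lines with the leading "# " removed.
ACTIVE = "\n".join(line[2:] for line in COMMENTED.splitlines())

if __name__ == "__main__":
    url = "https://example.com/wp-admin/"  # placeholder URL
    print("commented out:", parse_rules(COMMENTED).can_fetch("*", url))
    print("uncommented:  ", parse_rules(ACTIVE).can_fetch("*", url))
```

With the rules commented out, nothing is disallowed, so the check reports that /wp-admin/ may be fetched. Once the signs are removed, the same URL is reported as blocked.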

cowboy Mike 2013/03/19 12:24 pm

Howdy,
In Stephens robots.txt file what purpose does the following directive serve?

Disallow: *?s=

I googled it and cant find anything on it.

Happy trails, Mike

Jeff Starr 2013/03/19 12:31 pm • Post Author

That directive is to prevent compliant bots from crawling any search results, which use the “?s=” in the requesting URLs. :-)

prashant 2013/04/03 11:28 pm

this is my robots.txt file,when i update last 10 days before some search result are gone from Google search result especially tag result. i think this is not correct robots.txt file of my site

Sitemap: http://www.youthfundoo.com/sitemap.xml
Sitemap: http://www.youthfundoo.com/sitemap-image.xml

User-agent: Mediapartners-Google
Disallow:

User-Agent: *
Allow: /

User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
Disallow: /blog/wp-content/plugins/
Disallow: /blog/wp-content/themes/
Disallow: /blog/wp-content/upgrade/
Disallow: /blog/page/
disallow: /blog/*?*
Disallow: /blog/comments/feed/
Disallow: /blog/tag
Disallow: /blog/author
Disallow: /blog/trackback
Disallow: /blog/*trackback
Disallow: /blog/*trackback*
Disallow: /blog/*/trackback
Disallow: /blog/*.html/$
Disallow: /blog/feed/
Disallow: /blog/xmlrpc.php
Disallow: /blog/?s=*
Disallow: *?wptheme
Disallow: ?comments=*
Disallow: /blog/?p=*
Disallow: /blog/search?
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /page/
Disallow: /*?*
Disallow: /comments/feed/
Disallow: /tag
Disallow: /author
Disallow: /trackback
Disallow: /*trackback
Disallow: /*trackback*
Disallow: /*/trackback
Disallow: /*.html/$
Disallow: /feed/
Disallow: /xmlrpc.php
Disallow: /?s=*
Disallow: /?p=*
Disallow: /search?

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Jeff Starr 2013/04/04 11:43 am • Post Author

Hey prashant, yes those robots.txt rules could be simplified. Currently there is a lot of redundancy.. such as with the trackback rules, really you only need one, such as /trackback. Until you get it sorted I would remove all of these rules and just add something simple yet effective, such as the rules presented in this article. That will give you time to sort things out while allowing google et al to crawl your content.

Comments are closed for this post. Something to add? Let me know.
