
Robots Notes Plus

About the Robots Exclusion Standard:

The robots exclusion standard, also known as the robots.txt protocol, is a convention that prevents cooperating web spiders and other web robots from accessing all or part of a website. The parts that should not be accessed are specified in a file named robots.txt, located in the top-level directory of the website.
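One consequence of the top-level requirement is that the robots.txt URL for any page can be derived from its scheme and host alone. A minimal Python sketch (the function name and example URL are my own):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the top-level robots.txt URL for any page on a site."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; robots.txt always lives at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/some/deep/page.html"))
# → https://example.com/robots.txt
```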

Notes on the robots.txt Rules:

Rules of specificity apply, not inheritance: a robot follows only the most specific record that matches it. Always include a blank line between records. Note also that not all robots obey robots.txt rules; even Google has been reported to ignore certain directives. Finally, comments are allowed (and recommended) in any robots.txt file when written on a per-line basis: simply begin each comment line with a pound sign “#”.
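For example, a commented record might look like the following (the directory shown is purely illustrative):

# Keep all robots out of the staging directory
User-agent: *
Disallow: /staging/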

Prevent Robots from Indexing the Entire Site:

User-agent: *
Disallow: /

Prevent a Specific Robot from Indexing the Entire Site:

User-agent: Googlebot-Image
Disallow: /

Prevent all Robots from Indexing Specific Pages/Directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.html
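As a sanity check, rules like these can be tested with Python’s standard-library urllib.robotparser; a minimal sketch (example.com and the agent name are placeholders):

```python
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.html
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Disallowed paths are blocked for every user agent...
print(rp.can_fetch("AnyBot", "https://example.com/cgi-bin/script.pl"))
# → False

# ...while everything else remains crawlable.
print(rp.can_fetch("AnyBot", "https://example.com/tutorials/intro.html"))
# → True
```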

A Specific Example:

In this example, no robots are allowed to index anything except for Google, which is allowed to index everything except the specified pages/directories. Note the required blank line between the rules.

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/

Another Specific Example:

In this example, no agents are allowed to index anything except for Alexa’s ia_archiver, which is allowed to index everything. Note that it is the empty Disallow value (nothing after the colon) that allows everything for that agent.

User-agent: *
Disallow: /

User-agent: ia_archiver
Disallow:
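Python’s standard-library urllib.robotparser honors the empty Disallow value; a quick sketch against this record (example.com and the second agent name are placeholders):

```python
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /

User-agent: ia_archiver
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The empty Disallow value permits everything for ia_archiver...
print(rp.can_fetch("ia_archiver", "https://example.com/page.html"))
# → True

# ...while all other agents fall through to the blanket Disallow.
print(rp.can_fetch("SomeOtherBot", "https://example.com/page.html"))
# → False
```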

Prevent all Agents Except for Google:

Here is Google’s preferred way to disallow everything for all agents except Google, which is allowed everything. Note that “Allow” is not a standard directive and therefore is not recommended.

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
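Although “Allow” is nonstandard, some parsers do support it; Python’s urllib.robotparser, for one, honors it, as a quick sketch shows (example.com and the second agent name are placeholders):

```python
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The Allow directive opens everything for Googlebot...
print(rp.can_fetch("Googlebot", "https://example.com/page.html"))
# → True

# ...while all other agents fall through to the blanket Disallow.
print(rp.can_fetch("SomeOtherBot", "https://example.com/page.html"))
# → False
```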

Notes on the “meta robots” Tag:

Certain robots rules may also be included in the head section of a web document. Examine the following examples:

<meta name="robots" content="noindex,nofollow,noarchive" />
<meta name="robots" content="noindex,nofollow" />
<meta name="googlebot" content="none" />
<meta name="alexa" content="all" />

Here is a general list of values available for the “content” attribute of the “meta robots” tag:

  • noindex, index — Determines indexing of site/pages.
  • nofollow, follow — Determines following of links.
  • nosnippet — Do not display excerpts or cached content.
  • noarchive — Do not display or collect cached content.

Additionally, AltaVista supports:

  • noimageindex — Index text but not images.
  • noimageclick — Link to pages but not images.

Jeff Starr
About the Author
Jeff Starr = Web Developer. Security Specialist. WordPress Buff.