Plugin Sale! Save 15% on pro plugins with discount code: NEWYEAR2021
Web Dev + WordPress + Security

Robots Notes Plus

About the Robots Exclusion Standard:

The robots exclusion standard or robots.txt protocol is a convention to prevent cooperating web spiders and other web robots from accessing all or part of a website. The information specifying the parts that should not be accessed is specified in a file called robots.txt in the top-level directory of the website.

Notes on the robots.txt Rules:

Rules of specificity apply, not inheritance. Always include a blank line between rules. Note also that not all robots obey the robots rules — even Google has been reported to ignore certain robots rules. Also, comments are allowed (and recommended) within any robots.txt file when written on a per-line basis. Simply begin each line of comments with a pound sign “#”.

Prevent Robots from Indexing the Entire Site:

User-agent: *
Disallow: /

Prevent a Specific Robot from Indexing the Entire Site:

User-agent: Googlebot-Image
Disallow: /

Prevent all Robots from Indexing Specific Pages/Directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.html

A Specific Example:

In this example, no robots are allowed to index anything except for Google, which is allowed to index everything except the specified pages/directories. Note the required blank line between the rules.

User-agent: *
Disallow: /
User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/

Another Specific Example:

In this example, no agents are allowed to index anything except for Alexa, which is allowed to index anything. Note that there is a blank space after the colon, which enables this rule to work.

User-agent: *
Disallow: /
User-agent: ia_archiver
Disallow: 

Prevent all Agents Except for Google:

Here is Google’s preferred way to disallow all agents anything except Google, which is allowed everything. Note that “Allow” is not a standard parameter and therefore is not recommended.

User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /

Notes on the “meta robots” Tag:

Certain robots rules may also be included in the head section of a web document. Examine the following examples:

<meta name="robots" content="noindex,nofollow,noarchive" />
<meta name="robots" content="noindex,nofollow" />
<meta name="googlebot" content="none" />
<meta name="alexa" content="all" />

Here is a general list of values available for the “content” attribute of the “meta robots” tag:

  • noindex, index — Determines indexing of site/pages.
  • nofollow, follow — Determines following of links.
  • nosnippet — Do not display excerpts or cached content.
  • noarchive — Do not display or collect cached content.

Additionally, Altavista supports:

  • noimageindex — Index text but not images.
  • noimageclick — Link to pages but not images.

Jeff Starr
About the Author
Jeff Starr = Designer. Developer. Producer. Writer. Editor. Etc.
WP Themes In Depth: Build and sell awesome WordPress themes.
Welcome
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »
Blackhole Pro: Trap bad bots in a virtual black hole.
Thoughts
Simply Static is my go-to plugin for generating static HTML versions of WordPress sites. Works flawlessly.
Note to self: never, ever, ever buy any CD or DVD from eBay. Every single time the discs are scratched, damaged, missing, fake, or worse. Never again you clowns.
Find out if a plugin works with the latest version of WordPress @ plugintests.com
Going through all of my data, deleting all the chaff. Going for less than 500 GB total data storage.
Finally deleted all the cool unused placeholder Twitter accounts that I signed up for years ago. I will never use them.
After several years with Dashlane, I've moved on to a simpler, better solution.
After 10+ years, finally moved the last of my sites away from Media Temple.
Newsletter
Get news, updates, deals & tips via email.
Email kept private. Easy unsubscribe anytime.