
Robots Notes Plus

About the Robots Exclusion Standard:

The robots exclusion standard, also known as the robots.txt protocol, is a convention for asking cooperating web spiders and other web robots not to access all or part of a website. The parts that should not be accessed are listed in a file called robots.txt in the top-level directory of the website.

Notes on the robots.txt Rules:

Rules of specificity apply, not inheritance: a robot obeys only the most specific group of rules that matches its name and does not combine that group with the general rules. Always include a blank line between rule groups. Note also that not all robots obey the robots rules; even Google has been reported to ignore certain robots rules. Comments are allowed (and recommended) within any robots.txt file when written on a per-line basis. Simply begin each line of comments with a pound sign “#”.
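
As a rough illustration of the notes above, the sketch below feeds a small commented rule set to Python's standard urllib.robotparser module. The names ExampleBot and OtherBot, the /drafts/ path, and example.com are made up for the example, and real robots may read the rules differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: each comment sits on its own line and starts
# with "#", and a blank line separates the two rule groups.
RULES = """\
# Keep every robot out of the drafts directory
User-agent: *
Disallow: /drafts/

# ExampleBot (a made-up name) is kept out of the whole site
User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The comment lines are skipped; only the rules are applied.
print(parser.can_fetch("OtherBot", "https://example.com/drafts/post.html"))  # False
print(parser.can_fetch("OtherBot", "https://example.com/index.html"))        # True
print(parser.can_fetch("ExampleBot", "https://example.com/index.html"))      # False
```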

Prevent Robots from Indexing the Entire Site:

User-agent: *
Disallow: /

Prevent a Specific Robot from Indexing the Entire Site:

User-agent: Googlebot-Image
Disallow: /
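
To see that a group naming one robot leaves the others unaffected, the same two lines can be checked with Python's standard urllib.robotparser module. This is only a sketch: SomeOtherBot and the example.com URL are placeholders, not real values.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "https://example.com/images/photo.jpg"  # placeholder URL
print(parser.can_fetch("Googlebot-Image", url))  # False: the named robot is blocked
print(parser.can_fetch("SomeOtherBot", url))     # True: no rule applies to it
```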

Prevent all Robots from Indexing Specific Pages/Directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.html
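
The following sketch, again using Python's standard urllib.robotparser module, checks a handful of paths against the rules above. The extra paths, the bot name AnyBot, and the example.com host are invented for the test. In this parser each Disallow value is matched as a prefix of the requested path.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Placeholder paths: three covered by the rules, two not.
paths = [
    "/cgi-bin/form.pl",
    "/privatedir/",
    "/tutorials/blank.html",
    "/tutorials/other.html",
    "/index.html",
]

results = {}
for path in paths:
    results[path] = parser.can_fetch("AnyBot", "https://example.com" + path)
    print(path, "allowed" if results[path] else "blocked")
```

Only the last two paths come back as allowed, because neither starts with one of the disallowed prefixes.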

A Specific Example:

In this example, no robot other than Google is allowed to index anything; Google is allowed to index everything except the specified directories. Note the required blank line between the rules.

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
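
This example is a good place to watch specificity at work. In the sketch below (Python's standard urllib.robotparser module, with OtherBot and example.com as made-up placeholders), Googlebot follows only its own group and does not inherit the blanket "Disallow: /" from the general group.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

base = "https://example.com"  # placeholder host
print(parser.can_fetch("Googlebot", base + "/index.html"))       # True
print(parser.can_fetch("Googlebot", base + "/cgi-bin/form.pl"))  # False
print(parser.can_fetch("OtherBot", base + "/index.html"))        # False
```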

Another Specific Example:

In this example, no agent other than Alexa (ia_archiver) is allowed to index anything; Alexa is allowed to index everything. Note that nothing follows the colon on the last line: an empty Disallow value means that nothing is disallowed for that agent.

User-agent: *
Disallow: /

User-agent: ia_archiver
Disallow:
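
A quick check of the empty Disallow value, sketched with Python's standard urllib.robotparser module. OtherBot and the example.com URL are placeholders, and other parsers are not guaranteed to treat the empty value the same way.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /

User-agent: ia_archiver
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "https://example.com/any/page.html"  # placeholder URL
print(parser.can_fetch("ia_archiver", url))  # True: the empty value disallows nothing
print(parser.can_fetch("OtherBot", url))     # False: falls back to the general group
```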

Prevent all Agents Except for Google:

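Here is Google’s preferred way to disallow everything for all agents except Google, which is allowed everything. Note that “Allow” was not part of the original robots exclusion standard, so older robots may not understand it; for that reason the empty “Disallow:” form shown in the previous example is the safer choice.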

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
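
For a parser that does understand “Allow”, this form and the empty “Disallow:” form should give the same answers. The sketch below compares the two using Python's standard urllib.robotparser module. The helper build(), the name OtherBot, and the example.com URL are made up for the comparison.

```python
from urllib.robotparser import RobotFileParser

def build(last_line):
    """Parse the two-group rule set with the given final Googlebot line."""
    rules = [
        "User-agent: *",
        "Disallow: /",
        "",
        "User-agent: Googlebot",
        last_line,
    ]
    parser = RobotFileParser()
    parser.parse(rules)
    return parser

with_allow = build("Allow: /")
with_empty_disallow = build("Disallow:")

url = "https://example.com/any/page.html"  # placeholder URL
for bot in ("Googlebot", "OtherBot"):
    # Both columns agree: True for Googlebot, False for OtherBot.
    print(bot, with_allow.can_fetch(bot, url), with_empty_disallow.can_fetch(bot, url))
```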

Notes on the “meta robots” Tag:

Certain robots rules may also be included in the head section of a web document. Examine the following examples:

<meta name="robots" content="noindex,nofollow,noarchive" />
<meta name="robots" content="noindex,nofollow" />
<meta name="googlebot" content="none" />
<meta name="alexa" content="all" />

Here is a general list of values available for the “content” attribute of the “meta robots” tag:

  • noindex, index — Determines indexing of site/pages.
  • nofollow, follow — Determines following of links.
  • nosnippet — Do not display excerpts or cached content.
  • noarchive — Do not display or collect cached content.

Additionally, Altavista supports:

  • noimageindex — Index text but not images.
  • noimageclick — Link to pages but not images.
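
To show how a robot might read these tags, here is a minimal sketch that collects the “content” values from a page's head section using Python's standard html.parser module. The class name MetaRobotsParser and its NAMES list are assumptions made for the example; they do not describe how any particular robot works.

```python
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    """Collect the directives found in robots-related meta tags."""

    # Assumption: the meta names of interest, taken from the examples above.
    NAMES = ("robots", "googlebot", "alexa")

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": {"noindex", "nofollow"}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name not in self.NAMES:
            return
        content = attributes.get("content") or ""
        # Split the comma-separated list and normalize each value.
        values = {v.strip().lower() for v in content.split(",") if v.strip()}
        self.directives.setdefault(name, set()).update(values)

HEAD = """
<head>
<meta name="robots" content="noindex,nofollow,noarchive" />
<meta name="googlebot" content="none" />
</head>
"""

meta = MetaRobotsParser()
meta.feed(HEAD)
print(sorted(meta.directives["robots"]))     # ['noarchive', 'nofollow', 'noindex']
print(sorted(meta.directives["googlebot"]))  # ['none']
```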

Jeff Starr
About the Author
Jeff Starr = Designer. Developer. Producer. Writer. Editor. Etc.