Robots Notes Plus

About the Robots Exclusion Standard:

The robots exclusion standard, also known as the robots.txt protocol, is a convention that asks cooperating web spiders and other web robots not to access all or part of a website. The parts that should not be accessed are listed in a file named robots.txt in the top-level directory of the website.
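Because the file always sits in the top-level directory, its location can be derived from any page URL on the site. Here is a minimal sketch in Python; the helper name `robots_url` and the example.com address are assumptions made for illustration, not part of the standard.

```python
from urllib.parse import urljoin

def robots_url(page_url):
    """Return the robots.txt URL for the site that serves page_url."""
    # The leading slash makes urljoin resolve against the site root,
    # discarding whatever path the page itself has.
    return urljoin(page_url, "/robots.txt")

print(robots_url("https://example.com/tutorials/blank.html"))
```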

Notes on the robots.txt Rules:

Rules of specificity apply, not inheritance: a robot follows the most specific group that matches it and ignores the rest. Always include a blank line between rule groups. Note also that not all robots obey the robots rules; even Google has been reported to ignore certain robots rules. Comments are allowed (and recommended) within any robots.txt file when written on a per-line basis. Simply begin each comment line with a pound sign “#”.
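The specificity point can be checked with Python's standard urllib.robotparser module. This is only a sketch: the robot names ExampleBot and OtherBot and the paths are hypothetical, and it shows how this one parser reads the rules, not how every robot behaves.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
# Rules for every robot without a group of its own
User-agent: *
Disallow: /private/

# ExampleBot gets only these rules and inherits nothing from above
User-agent: ExampleBot
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # comment lines are skipped by the parser

# ExampleBot is governed by its own group only.
print(parser.can_fetch("ExampleBot", "/private/page.html"))  # True
print(parser.can_fetch("ExampleBot", "/tmp/page.html"))      # False

# Any other robot falls back to the "*" group.
print(parser.can_fetch("OtherBot", "/private/page.html"))    # False
```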

Prevent Robots from Indexing the Entire Site:

User-agent: *
Disallow: /

Prevent a Specific Robot from Indexing the Entire Site:

User-agent: Googlebot-Image
Disallow: /

Prevent all Robots from Indexing Specific Pages/Directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.html
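Each Disallow value is matched against the start of the requested path. The sketch below feeds the rules above to Python's urllib.robotparser to see which paths end up blocked. The helper `blocked`, the robot name AnyBot, and the test paths are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.html
"""

def blocked(rules, agent, paths):
    """Return the subset of paths that agent may not fetch under rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]

paths = [
    "/cgi-bin/form.cgi",      # inside a disallowed directory
    "/tutorials/blank.html",  # a disallowed page
    "/tutorials/index.html",  # same directory, different page
    "/privatedir",            # no trailing slash, so the prefix does not match
]
print(blocked(RULES, "AnyBot", paths))
```

With this parser, only the first two paths are reported as blocked, which suggests the trailing slash in a directory rule is worth a deliberate choice.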

A Specific Example:

In this example, no robots are allowed to index anything except for Googlebot, which is allowed to index everything except the specified directories. Note the required blank line between the two groups of rules.

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/

Another Specific Example:

In this example, no agents are allowed to index anything except for Alexa (ia_archiver), which is allowed to index everything. Note that the Disallow value for ia_archiver is left empty, which means that nothing is disallowed for that agent.

User-agent: *
Disallow: /

User-agent: ia_archiver
Disallow:

Prevent all Agents Except for Google:

Here is Google’s preferred way to disallow everything for all agents except Googlebot, which is allowed everything. Note that “Allow” was not part of the original robots exclusion standard (it was only standardized later, in RFC 9309), so older robots may ignore it.

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
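An empty Disallow value and “Allow: /” are two ways of writing the same permission. The sketch below compares them using Python's urllib.robotparser. The helper `may_fetch`, the robot name OtherBot, and the default path are assumptions made for illustration, and other parsers may treat “Allow” differently.

```python
from urllib.robotparser import RobotFileParser

def may_fetch(rules, agent, path="/index.html"):
    """Parse rules and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)

# Form 1: an empty Disallow value means nothing is disallowed.
EMPTY_DISALLOW = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""

# Form 2: an explicit Allow for the whole site.
ALLOW_ROOT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

for rules in (EMPTY_DISALLOW, ALLOW_ROOT):
    # Prints "True False" for both forms.
    print(may_fetch(rules, "Googlebot"), may_fetch(rules, "OtherBot"))
```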

Notes on the “meta robots” Tag:

Certain robots rules may also be included in the head section of a web document. Examine the following examples:

<meta name="robots" content="noindex,nofollow,noarchive" />
<meta name="robots" content="noindex,nofollow" />
<meta name="googlebot" content="none" />
<meta name="alexa" content="all" />

Here is a general list of values available for the “content” attribute of the “meta robots” tag:

  • noindex, index — Determines indexing of site/pages.
  • nofollow, follow — Determines following of links.
  • nosnippet — Do not display excerpts or cached content.
  • noarchive — Do not display or collect cached content.

Additionally, AltaVista supports:

  • noimageindex — Index text but not images.
  • noimageclick — Link to pages but not images.
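To see which “meta robots” values a page declares, the head section can be scanned with Python's standard html.parser module. This is a rough sketch: the class name `MetaRobotsParser` and the sample markup are assumptions, and the set of robot names is taken only from the examples in these notes.

```python
from html.parser import HTMLParser

# Robot names used in the meta tag examples in these notes.
ROBOT_NAMES = {"robots", "googlebot", "alexa"}

class MetaRobotsParser(HTMLParser):
    """Collect robots directives from <meta> tags, keyed by robot name."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        if name not in ROBOT_NAMES or not content:
            return
        # Split "noindex,nofollow" into individual lowercase values.
        values = [v.strip().lower() for v in content.split(",") if v.strip()]
        self.directives.setdefault(name, []).extend(values)

HEAD = """
<head>
<meta name="description" content="ignored, not a robots tag" />
<meta name="robots" content="noindex,nofollow" />
<meta name="googlebot" content="none" />
</head>
"""

parser = MetaRobotsParser()
parser.feed(HEAD)
print(parser.directives)
```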

Jeff Starr