Tell Google NOT to Index Certain Parts of Your Web Pages

There are several ways to instruct Google to stay away from various pages in your site, such as robots.txt rules and meta noindex tags. These directives all function in different ways, but they all serve the same basic purpose: controlling how Google crawls and indexes the various pages on your site. For example, you can use meta noindex to instruct Google not to index your sitemap, RSS feed, or any other page you wish. This level of control over which pages are crawled and indexed is helpful, but what if you need to control how Google handles the contents of a specific page? Easy. Google enables us to do this with a set of googleon/googleoff tags.
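
As a point of reference, the page-level meta noindex approach mentioned above usually looks something like the following sketch. The page title is a made-up placeholder, not taken from any real site:

<head>
   <title>Example Feed Page</title>
   <!-- Ask compliant search engines not to index this entire page -->
   <meta name="robots" content="noindex" />
</head>

Note that this applies to the whole page, which is exactly the limitation that the googleon/googleoff tags are meant to get around.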

About googleon and googleoff tags

Put simply, the googleon/googleoff tags tell Google Search Appliance when to start and stop indexing various parts of the web document. Consider the following example:

<p>This is normal (X)HTML content that will be indexed by Google.</p>

<!--googleoff: index-->

<p>This (X)HTML content will NOT be indexed by Google.</p>

<!--googleon: index-->

In this example, we see how the googleon/googleoff tags will prevent the second paragraph from being indexed by Google. Notice the “index” parameter, which may be set to any of the following:

  • index — content surrounded by “googleoff: index” will not be indexed by Google
  • anchor — anchor text for any links within a “googleoff: anchor” area will not be associated with the target page
  • snippet — content surrounded by “googleoff: snippet” will not be used to create snippets for search results
  • all — content surrounded by “googleoff: all” is treated with all attributes: index, anchor, and snippet
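
The other parameters follow the same open/close pattern as the index example. Here is a minimal sketch of the anchor and snippet variants. The link URL, link text, and paragraph content are illustrative placeholders only:

<!-- Anchor text of this link should not be associated with the target page -->
<!--googleoff: anchor-->
<p><a href="http://example.com/widgets.html">cheap widgets</a></p>
<!--googleon: anchor-->

<!-- This paragraph should not be used to build search result snippets -->
<!--googleoff: snippet-->
<p>Standard disclaimer text repeated on every page.</p>
<!--googleon: snippet-->

In each case the googleoff comment is closed by a googleon comment carrying the same parameter, and both are written as complete HTML comments ending in "-->".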

Cool, eh? Let’s have a look at a specific usage example.

Using googleon and googleoff tags

Example 1: Blog Comments
Let’s say your comment threads tend to stray into off-topic conversation. Keeping your pages as tightly focused as possible on the subject at hand is a great way to improve the on-page relevancy of your targeted keywords while improving the accuracy of matching search queries. Thus, to keep the superfluous banter out of the Google index, you could add googleon/googleoff tags as follows:

<!--googleoff: all-->

<div id="comments">

   <p><strong>Nick Mason</strong> - August 2nd, 2009</p>
   <p>From Her Majesty the queen. His boots were very clean.</p>

   <p><strong>Rick Wright</strong> - August 3rd, 2009</p>
   <p>Every year is getting shorter, never seem to find the time.</p>

   <p><strong>David Gilmour</strong> - August 4th, 2009</p>
   <p>By the river holding hands roll me up and lay me down.</p>

   <p><strong>Roger Waters</strong> - August 5th, 2009</p>
   <p>And after a while, you can work on points for style.</p>

</div>

<!--googleon: all-->

We definitely don’t want to see such a mindless thread in Google, and it will be interesting to see if the <pre> example gets dropped from the index.

May the Force be with You

While using this method to control how Google indexes your on-page content, there are a couple of things you should keep in mind. First, there is a difference between indexing and crawling. Google may crawl portions of your page that are demarcated with googleon/googleoff tags. So in case you were thinking that this technique may be a way to deal with Google’s new nofollow policy, forget about it — PageRank will continue to flow through any links contained within a googleoff zone.

Besides this, also keep in mind that this is a proprietary technique supported exclusively by Google Search Appliance. If you have content that you want to keep out of the index of all search engines, then you will need to find another way to do it. Eventually, Yahoo! and MSN/Live/Bing/Whatever may create proprietary “on/off tags” of their own, but chances are slim to none that they will obey the proprietary technology of the mighty Google.

For more information about googleon/googleoff tags, check out Google’s official documentation.

Jeff Starr
About the Author
Jeff Starr = Web Developer. Book Author. Secretly Important.

25 responses to “Tell Google NOT to Index Certain Parts of Your Web Pages”

  1. Al Sefati 2010/07/01 8:53 am

    That seems to be good only if you have Google Search appliance installed according to this doc. http://static.googleusercontent.com/external_content/untrusted_dlcp/www.google.com/en/us/enterprise/pdf/gsa_datasheet.pdf
