Tell Google NOT to Index Certain Parts of Your Web Pages
There are several ways to instruct Google to stay away from various pages in your site:
- Robots.txt directives
- Nofollow attributes on links
- Meta noindex/nofollow directives
- X-Robots noindex/nofollow directives
..and so on. These directives all function in different ways, but they all serve the same basic purpose: control how Google crawls the various pages on your site. For example, you can use meta noindex to instruct Google not to index your sitemap, RSS feed, or any other page you wish. This level of control over which pages are crawled and indexed is helpful, but what if you need to control how Google crawls the contents of a specific page? Easy. Google enables us to do this with a set of googleon/googleoff tags.
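For reference, the page-level and link-level directives listed above look something like this in HTML. This is only a sketch: the URL is a placeholder, and robots.txt rules and X-Robots headers live outside the page markup, so they are not shown here.

```html
<!-- In the <head>: ask search engines not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">

<!-- In the <body>: a single link marked nofollow (example.com is a placeholder) -->
<a href="https://example.com/" rel="nofollow">Some link</a>
```

Both of these apply to a whole page or to an individual link, which is why something finer-grained is needed for parts of a page.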
About googleon and googleoff tags
Put simply, the googleon/googleoff tags tell Google Search Appliance when to start and stop indexing various parts of the web document. Consider the following example:
<p>This is normal (X)HTML content that will be indexed by Google.</p>
<!--googleoff: index-->
<p>This (X)HTML content will NOT be indexed by Google.</p>
<!--googleon: index-->
In this example, we see how the googleon/googleoff tags will prevent the second paragraph from being indexed by Google. Notice the “index” parameter, which may be set to any of the following:
- index: content surrounded by “googleoff: index” will not be indexed by Google
- anchor: anchor text for any links within a “googleoff: anchor” area will not be associated with the target page
- snippet: content surrounded by “googleoff: snippet” will not be used to create snippets for search results
- all: content surrounded by “googleoff: all” is treated with all attributes: index, anchor, and snippet
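As a rough illustration of the other parameters, here is a sketch that uses snippet and anchor on the same page. The URL and wording are placeholders made up for the example, and the behavior described in the comments simply restates the list above rather than anything verified here.

```html
<!-- Hypothetical page fragment; the URL and text are placeholders -->
<p>This paragraph is indexed and may be used in search-result snippets.</p>

<!--googleoff: snippet-->
<p>This paragraph is indexed, but should not be used to build snippets.</p>
<!--googleon: snippet-->

<!--googleoff: anchor-->
<p>The anchor text of <a href="https://example.com/">this link</a>
should not be associated with the target page.</p>
<!--googleon: anchor-->
```

Note that each closing tag repeats the same parameter as its opening tag, and that both are ordinary HTML comments ending in "-->".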
Cool, eh? Let’s have a look at a specific usage example..
Using googleon and googleoff tags
Example 1: Blog Comments
Let’s say your comment threads tend to stray into off-topic conversation. Keeping your pages tightly focused on the subject at hand is a great way to improve the on-page relevancy of your targeted keywords while improving the accuracy of matching search queries. Thus, to keep the superfluous banter out of the Google index, you could add googleon/googleoff tags as follows:
<!--googleoff: all-->
<div id="comments">
<p><strong>Nick Mason</strong> - August 2nd, 2009</p>
<p>From Her Majesty the queen. His boots were very clean.</p>
<p><strong>Rick Wright</strong> - August 3rd, 2009</p>
<p>Every year is getting shorter, never seem to find the time.</p>
<p><strong>David Gilmour</strong> - August 4th, 2009</p>
<p>By the river holding hands roll me up and lay me down.</p>
<p><strong>Roger Waters</strong> - August 5th, 2009</p>
<p>And after a while, you can work on points for style.</p>
</div>
<!--googleon: all-->
We definitely don’t want to see such a mindless thread in Google, and it will be interesting to see if the <pre> example gets dropped from the index..
May the Force be with You
When using this method to control how Google indexes your on-page content, there are a couple of things you should keep in mind. First, there is a difference between indexing and crawling. Google may still crawl portions of your page that are demarcated with googleon/googleoff tags. So in case you were thinking that this technique may be a way to deal with Google’s new nofollow policy, forget about it: PageRank will continue to flow through any links contained within a googleoff zone.
Also keep in mind that this is a proprietary technique supported exclusively by Google Search Appliance. If you have content that you want to keep out of the index of all search engines, then you will need to find another way to do it. Eventually, Yahoo! and MSN/Live/Bing/Whatever may create proprietary “on/off tags” of their own, but chances are slim to none that they will obey the proprietary technology of the mighty Google.
For more information about googleon/googleoff tags, check out Google’s official documentation.
25 responses to “Tell Google NOT to Index Certain Parts of Your Web Pages”
Boycotting Google sounds futile, but it is something I have advocated in the past. Unfortunately, Google is too huge for anything like this to work. We will just have to wait until Google becomes fat and bloated and collapses from its own tremendous weight. Kinda like Microsoft.
In the meantime there’s always JavaScript to keep things away from teh Googlebot :)
That’s an excellent question, and one that I recently covered while pondering a somewhat-related issue: Google’s new nofollow policy. One way to work around the new nofollow dilemma is to use a slice of external JavaScript, which is mostly inaccessible to Googlebot at this time (may change in the future). Check out this post for a discussion of this along with an easy way to use JavaScript for non-indexed content.
Religious use of Scroogle is the extent of my own personal big G boycott.
https://addons.mozilla.org/en-US/firefox/addon/12506
http://userscripts.org/scripts/show/23529
Well, that and a few angry blog posts from back when Chrome came out and they banned my Adsense account, taking ~$500 with it. ;)
Scroogle looks pretty sweet. I installed it and will be trying it out. Thanks!
Sorry to hear about your Adsense account — I had no idea they would do such a thing. Were you able to get the money back and/or get your account re-activated?
heh, no.. I joined the legions of other banned users that got no response whatsoever. I still email them once in a while, but they never reply.
Google’s in a sad state these days, doing what they can to keep the baddies from exploiting their services, while at the same time totally sacrificing their normal user experience. Did you know that you can’t create a Gmail account these days without giving them your cell phone number? It only affects IPs with an account already associated, but think about the impact on college or corporate networks… I guess data mining is yet another industry not affected by our financial depression.
You can adjust the importance of content using section targeting:
“Section targeting allows you to suggest sections of your text and HTML content that you’d like us to emphasize or downplay when matching ads to your site’s content. By providing us with your suggestions, you can assist us in improving your ad targeting. We recommend that only those familiar with HTML attempt to implement section targeting.
To implement section targeting, you’ll need to add a set of special HTML comment tags to your code. These tags will mark the beginning and end of whichever section(s) you’d like to emphasize or de-emphasize for ad targeting.
The HTML tags to emphasize a page section take the following format:
<!-- google_ad_section_start -->
<!-- google_ad_section_end -->
”
https://www.google.com/adsense/support/bin/answer.py?hl=en&answer=23168
I use this on Scamdex – I was unaware of the GSA options.
@fuzion: yes, I sense that Google is peaking these days, meaning that things are about as good as they’re going to get, and that it’s only downhill from here. I have been reading many, many negative reports about Google lately (including yours), something which was pretty scarce even just a couple of years ago. From what I can tell, they are selling out to every big business deal they can get their hands on and have shifted from a proactive strategy to a reactive one. Instead of going out of their way to do their best, they simply make more rules, lock things down, and call it good. This is only my opinion, but this general trend has been seen in many corporations and even celebrity entities, actors, rock stars, etc. With social media, it wouldn’t take much for a new lean, mean search engine to catch fire and cleave a huge chunk out of Google’s pie.
@Mark: Interesting, I was not aware of this either. So it looks like the ad_sections apply only to sites participating with Adsense? Have you seen any positive effects since using these tags? Do you think they are effective?
This is such an amazing piece of information. I would definitely try my hands at this cheeky little tip of controlling the google’s search mechanism. Thanks a lot for sharing this stuff.
Wow, googleon and googleoff tags, I never knew!
Hi Jeff,
I used googleoff to encumber certain expressions I don’t want to be indexed. For instance: In an article about the etymology of the word “w-h-o-r-e” I wanted to prevent people looking for professional women from entering my site.
Now I can see: It doesn’t work like everything Google does not work exactly.
I am turning back to span the expression and spell it backwards and adding “unicode-bidi: bidi-override;” and “direction: rtl;” in CSS.
Bye, Helen
Excellent tip, Helen — thanks for sharing :)
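Helen’s reversed-text idea could be sketched along these lines. The class name and the placeholder word are made up for illustration, and whether this actually keeps a term out of a search index is an assumption, not something tested here.

```html
<style>
  /* Hypothetical class name: forces the span's characters to render right-to-left */
  .reversed {
    unicode-bidi: bidi-override;
    direction: rtl;
  }
</style>

<!-- The markup holds "elpmaxe"; browsers display it as "example" -->
<p>An article about the word <span class="reversed">elpmaxe</span>.</p>
```

Keep in mind that the backwards spelling is what actually lives in the markup, so copy-and-paste and screen readers will likely pick up the reversed text.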