Tell Google NOT to Index Certain Parts of Your Web Pages
There are several ways to instruct Google to stay away from various pages on your site:
- Robots.txt directives
- Nofollow attributes on links
- Meta noindex/nofollow directives
- X-Robots noindex/nofollow directives
...and so on. These directives all function in different ways, but they all serve the same basic purpose: controlling how Google crawls and indexes the various pages on your site. For example, you can use meta noindex to instruct Google not to index your sitemap, RSS feed, or any other page you wish. This level of control over which pages are crawled and indexed is helpful, but what if you need to control how Google handles the contents of a specific page? Easy. Google enables us to do this with a set of googleon and googleoff tags.
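The page-level directives listed above work on whole URLs. As a rough illustration of the robots.txt approach, here is a Python sketch that uses the standard library's `urllib.robotparser` to check which URLs a given robots.txt would let Googlebot fetch. The rules and the example.com URLs are hypothetical, and keep in mind that robots.txt governs crawling (whether a URL may be fetched), not indexing on its own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of the feed and a private directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
Disallow: /private/
"""

parser = RobotFileParser()
# parse() accepts the file as a list of lines, so nothing is fetched over the network.
parser.parse(ROBOTS_TXT.splitlines())
# Mark the rules as loaded; can_fetch() denies every URL until a load time is recorded.
parser.modified()

for url in ("https://example.com/feed/", "https://example.com/about/"):
    # can_fetch() applies the rules for the given user agent to the URL's path.
    print(url, parser.can_fetch("Googlebot", url))
```

Because the rules are declared for `User-agent: *`, they apply to Googlebot as well as every other compliant crawler.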
About googleon and googleoff tags
Put simply, the googleon and googleoff tags tell the Google Search Appliance when to start and stop indexing various parts of the web document. Consider the following example:
<p>This is normal (X)HTML content that will be indexed by Google.</p>
<!--googleoff: index-->
<p>This (X)HTML content will NOT be indexed by Google.</p>
<!--googleon: index-->
In this example, the googleoff and googleon tags prevent the second paragraph from being indexed. Notice the “index” parameter in “googleoff: index”, which may be set to any of the following:
- index: content surrounded by “googleoff: index” will not be indexed by Google
- anchor: anchor text for any links within a “googleoff: anchor” area will not be associated with the target page
- snippet: content surrounded by “googleoff: snippet” will not be used to create snippets for search results
- all: content surrounded by “googleoff: all” is treated with all attributes: index, anchor, and snippet
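To make the four parameters concrete, here is a simplified Python model of how an indexer might honor these markers. It is not the Google Search Appliance's actual implementation: the function names `visible_to` and `expand` are hypothetical, "anchor" and "snippet" are modeled simply as text excluded from that use, and treating each googleon as switching back on only the attributes it names (so `googleoff: all` followed by `googleon: index` leaves anchor and snippet off) is an assumption of this sketch.

```python
import re

# Matches markers such as <!--googleoff: index--> and <!--googleon: all-->.
MARKER = re.compile(r"<!--\s*google(on|off):\s*(index|anchor|snippet|all)\s*-->")
ATTRIBUTES = ("index", "anchor", "snippet")


def expand(param: str) -> set:
    """'all' stands for index, anchor, and snippet together."""
    return set(ATTRIBUTES) if param == "all" else {param}


def visible_to(html: str, attribute: str) -> str:
    """Return the parts of `html` that are NOT switched off for `attribute`.

    The marker comments themselves are dropped from the result.
    """
    suppressed = set()  # attributes currently switched off
    kept = []
    pos = 0
    for match in MARKER.finditer(html):
        # Keep the text between the previous marker and this one, if allowed.
        if attribute not in suppressed:
            kept.append(html[pos:match.start()])
        action, param = match.groups()
        if action == "off":
            suppressed |= expand(param)
        else:
            suppressed -= expand(param)
        pos = match.end()
    # Text after the last marker.
    if attribute not in suppressed:
        kept.append(html[pos:])
    return "".join(kept)


page = ('<p>A</p><!--googleoff: index--><p>B</p>'
        '<!--googleon: index--><p>C</p>')
print(visible_to(page, "index"))    # <p>A</p><p>C</p>
print(visible_to(page, "snippet"))  # <p>A</p><p>B</p><p>C</p>
```

The second call shows that a “googleoff: index” region is still available for snippets in this model; only “all” switches off every attribute at once.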
Cool, eh? Let’s have a look at a specific usage example.
Using googleon and googleoff tags
Example 1: Blog Comments
Let’s say your comment threads tend to stray into off-topic conversation. Keeping your pages tightly focused on the subject at hand is a great way to improve the on-page relevancy of your targeted keywords and the accuracy of matching search queries. Thus, to keep the superfluous banter out of the index, you could add googleoff and googleon tags as follows:
<!--googleoff: all-->
<div id="comments">
<p><strong>Nick Mason</strong> - August 2nd, 2009</p>
<p>From Her Majesty the queen. His boots were very clean.</p>
<p><strong>Rick Wright</strong> - August 3rd, 2009</p>
<p>Every year is getting shorter, never seem to find the time.</p>
<p><strong>David Gilmour</strong> - August 4th, 2009</p>
<p>By the river holding hands roll me up and lay me down.</p>
<p><strong>Roger Waters</strong> - August 5th, 2009</p>
<p>And after a while, you can work on points for style.</p>
</div>
<!--googleon: all-->
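If your comments come from a template, the markers can be added at render time instead of by hand. The Python sketch below uses hypothetical helpers (`wrap_googleoff` and `render_comments` are illustrative names, not part of any Google API) that wrap a rendered comment thread in googleoff: all / googleon: all, reusing the `#comments` container from the example above. Keep in mind that only the Google Search Appliance acts on these markers.

```python
from html import escape

# The four parameter values described earlier in this post.
VALID_PARAMS = {"index", "anchor", "snippet", "all"}


def wrap_googleoff(fragment: str, param: str = "all") -> str:
    """Surround an HTML fragment with a matching googleoff/googleon pair."""
    if param not in VALID_PARAMS:
        raise ValueError(f"unknown googleoff parameter: {param!r}")
    return f"<!--googleoff: {param}-->\n{fragment}\n<!--googleon: {param}-->"


def render_comments(comments) -> str:
    """Render (author, date, text) tuples as the #comments block, wrapped with 'all'."""
    lines = ['<div id="comments">']
    for author, date, text in comments:
        # Escape user-supplied values before inserting them into markup.
        lines.append(f"<p><strong>{escape(author)}</strong> - {escape(date)}</p>")
        lines.append(f"<p>{escape(text)}</p>")
    lines.append("</div>")
    return wrap_googleoff("\n".join(lines), "all")


print(render_comments([("Nick Mason", "August 2nd, 2009",
                        "From Her Majesty the queen. His boots were very clean.")]))
```

Generating both markers from one helper guarantees that every googleoff gets its matching googleon, which is easy to get wrong when the comments are typed by hand.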
We definitely don’t want to see such a mindless thread in Google, and it will be interesting to see whether the example above gets dropped from the index.
May the Force be with You
When using this method to control how Google indexes your on-page content, there are a couple of things you should keep in mind. First, there is a difference between indexing and crawling. Google may still crawl portions of your page that are demarcated with googleoff tags. So in case you were thinking that this technique may be a way to deal with Google’s new nofollow policy, forget about it: PageRank will continue to flow through any links contained within a googleoff area.
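The crawling-versus-indexing point is easy to see at the HTML level: googleon/googleoff markers are ordinary HTML comments, so any parser that is not specifically looking for them skips right past them and still sees every link. This Python sketch uses the standard library's `html.parser.HTMLParser` to collect links from a page; the `LinkCollector` class and the sample URLs are hypothetical, and it illustrates HTML parsing in general, not how Googlebot itself is implemented.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags; comments are ignored by default."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


page = ('<p><a href="/kept/">outside</a></p>'
        '<!--googleoff: all-->'
        '<p><a href="/also-found/">inside googleoff</a></p>'
        '<!--googleon: all-->')

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)  # ['/kept/', '/also-found/']
```

The link inside the googleoff region is discovered just like the one outside it, which is why these markers are no substitute for nofollow.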
Besides this, keep in mind that this is a proprietary technique supported exclusively by the Google Search Appliance. If you have content that you want to keep out of the index of all search engines, you will need to find another way to do it. Eventually, Yahoo! and MSN/Live/Bing/Whatever may create proprietary “on/off tags” of their own, but chances are slim to none that they will obey the proprietary technology of the mighty Google.
For more information about googleon/googleoff tags, check out Google’s official documentation.
That actually sounds pretty awesome! Will need to add it to my ‘much needed improvements’.
It seems that these tags are for Google Search Appliance, whatever that is. Do you have any evidence that they work for Googlebot as well? It would be great if they do.
I got all excited with you, Jeff. Wonder why Googlebot doesn’t have something like this. It would be especially helpful for CMS sites.
My company’s site keeps getting weird Google results that have one title but go to a page with different content, all because we have sidebar modules that have teasers, like “latest reviews” and so on, that appear on many different pages in our site. I wish I could keep Googlebot from indexing those modules.
*Sigh* – I figured it was too good to be true. In my excitement, I failed to realize that this technique only applies to the GSA. Ah well, life goes on.
I did update the post with the correct information, so thank you Jessi for pointing it out. Cheers.
Lol, yeah, sorry about the false alarm.
Google does need to implement something like this for their main index. If you think about it, the googleon/googleoff tags have been deemed worthy for enterprise use, so it seems logical that they would also benefit everyone else as well.
They are probably paranoid about potential “black-hat” abuse if it were available to the masses.
I got all excited for a moment. Alas. I asked for page section exclusion almost two years ago.
Hi Michael, my apologies for the false hope. I have known about this technique for quite a while, but always assumed that it targeted Googlebot. In fact, I had never even heard of the “Google Search Appliance” until today.
Enabling the googleon/googleoff tags only on GSA seems like a perfectly good technique wasted on a rather obscure aspect of search. Quite a disappointment imho.
There are a few other useful features that GSA has but the bot does not, such as meta tag searches. I think their main concern is the potential for black hat SEO abuse in these cases…
Hi Jeff, I’m sure you’re right about this:
“if it threatens their bottom-line they’re not going to use it.”
And unfortunately, those of us adversely affected by lack of this tool need Google too much to try and organize a boycott.
I’ve actually been trying to convince my co-workers that we should design our pages with this problem in mind, and try not to put teaser content on a page we don’t want it associated with in Google results. Doesn’t seem like we should have to do that, but that’s life.
Hi fuzion, that’s what I’m thinking too. Another case where a small group of people ruin it for the rest of us.
Also keep in mind that Google’s main purpose is to make money. Regardless of how useful a tool happens to be (as in the case of googleon/googleoff), if it threatens their bottom-line they’re not going to use it.
Thanks so much, Jeff! I’ll check out that article for sure. Thanks for being so hugely helpful, as always. You are a treasure!