Perishable Press

How to Add Meta Noindex to Your Feeds

Want to make sure that your feeds are not indexed by Google and other compliant search engines? Add the following code to the channel element of your XML-based (RSS, etc.) feeds:

<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />

Here is an example of how I use this tag for Perishable Press feeds (vertical spacing added for emphasis):

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">

<channel>
	<title>Perishable Press</title>
	<link>https://perishablepress.com/</link>
	<description>Digital Design and Dialogue ~</description>
	<pubDate>Mon, 29 Oct 2007 21:38:24</pubDate>
	<language>en</language>


	<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />


	<image>
	   <link>https://perishablepress.com/</link>
	   <url>https://perishablepress.com/_/perishable-press.jpeg</url>
	   <title>Perishable Press</title>
	</image>
	<item>
	   <title>Welcome to Perishable Press</title>
	   <link>https://perishablepress.com/</link>
	   <dc:creator>Perishable</dc:creator>
	   <dc:subject>WordPress</dc:subject>
	   .
	   .
	   .
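
One way to confirm that the tag actually made it into a feed is to parse the feed and look for it. The following is a minimal Python sketch, not part of the original article: the function name `feed_has_noindex` and the sample feed are hypothetical, and it assumes an RSS 2.0 feed whose channel element is not namespaced.

```python
import xml.etree.ElementTree as ET

# Namespace used by the xhtml:meta element shown above.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def feed_has_noindex(feed_xml: str) -> bool:
    """Return True if <channel> holds an XHTML robots meta containing noindex."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:
        return False
    for meta in channel.findall(f"{{{XHTML_NS}}}meta"):
        name = meta.get("name", "").lower()
        # content may list several directives, e.g. "noindex, nofollow"
        directives = [d.strip().lower() for d in meta.get("content", "").split(",")]
        if name == "robots" and "noindex" in directives:
            return True
    return False


# Hypothetical feed used only to exercise the check.
SAMPLE_FEED = """<rss version="2.0">
<channel>
  <title>Example Feed</title>
  <link>https://example.com/</link>
  <description>Sample feed for the noindex check</description>
  <xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
</channel>
</rss>"""

if __name__ == "__main__":
    print(feed_has_noindex(SAMPLE_FEED))
```

The check only reports whether the element is present in the channel; whether a given search engine honors it is a separate question.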

Of course, other meta elements may be added as well, including this one that disallows Yahoo! Pipes from processing your feed:

<meta xmlns="http://pipes.yahoo.com" name="pipes" content="noprocess" />

While we’re at it, what do you think are some other useful meta elements to add to XML/RSS feeds?

Jeff Starr
About the Author Jeff Starr = Designer. Developer. Producer. Writer. Editor. Etc.
15 responses
  1. Another solution is to use the robots.txt file to forbid the indexing of feeds and the like.

    In robots.txt:

    Disallow: /wp-
    Disallow: /feed
    Disallow: /comments/feed
    Disallow: /feed/$
    Disallow: /*/feed/$
    Disallow: /*/feed/rss/$
    Disallow: /*/trackback/$
    Disallow: /*/*/feed/$
    Disallow: /*/*/feed/rss/$
    Disallow: /*/*/trackback/$
    Disallow: /*/*/*/feed/$
    Disallow: /*/*/*/feed/rss/$
    Disallow: /*/*/*/trackback/$
    Disallow: /*?*
    Disallow: /*?
    #Disallow: /theme/*/*
    #Disallow: /tag/*/*

    (Note that I’ve commented out the last two because the HeadSpace plugin already puts a noindex meta tag on theme and tag pages.)

    If anyone wants to debate the robots.txt technique vs. the meta noindex technique, I’m highly interested!

  2. Jeff Starr

    Well, I don’t know about debating you, but it should be pointed out that robots.txt directives function differently than those of the meta noindex variety. As far as I know, disallow rules specified via robots.txt forbid compliant search engines from accessing matching resources entirely. On the other hand, meta noindex rules do not prevent search engines from accessing and crawling the page. This enables search engines to follow links contained within noindex content. A subtle distinction, perhaps, but important nonetheless.

  3. Yes, “debate” was not the word I should have used. That’s not easy to express in another language.

    Thanks for pointing out that noindex allows crawlers to follow links, whereas robots.txt strictly forbids access to those pages.

  4. Isn’t the link to your feed image broken?

    <image>
         <link>https://perishablepress.com/</link>
         <url>https://perishablepress.com/pressburner.jpe</url>
         <title>Perishable Press</title>
    </image>

    https://perishablepress.com/pressburner.jpe leads to a 404.

  5. Oh, I’ve just come across a blog that says Google would understand a noindex statement in robots.txt files. You would write something like:

    Disallow: /wp-
    Noindex: /feed/

    It would be awesome to fight duplicate content from a single robots.txt file!

  6. That would be awesome, especially at a higher scale than a WordPress weblog – imagine the SEO work on a website like Flickr!

    Though I’ve been thinking a lot since I read your post about this dilemma: follow/noindex (meta noindex) vs. nofollow/noindex (robots.txt).

    My point is this: on a typical WordPress weblog, why would one need crawlers to access the categories, tags, search pages, and the feed, if they contain the same content the blog already offers?

    All the links on those pages are already in the posts. Also, crawlers working through duplicate content waste bandwidth. On a big website that gets crawled heavily, that represents a lot of money.

    So again, why would you want bots to crawl the links of your duplicate content pages?

  7. Jeff Starr

    Hi Louis,

    The image path was changed during my latest site overhaul/upgrade project. I consolidated all of the miscellaneous site logos and icons into a single location. These images are available to the public at the official “Link to Perishable Press” page.

    As for the robots.txt noindex trick, yes, that would be awesome. However, as of now, Google would be the only search engine supporting it. Until the others join in, adding meta noindex to your feeds and pages remains highly useful, especially for SEO purposes.

    Eventually, I suspect, robots.txt will evolve into a full-fledged, highly flexible protocol that will replace noindex, noarchive, nofollow, disallow, and other crawl-related directives with its own, specifically developed language.. kind of like CSS for spiders ;)

  8. Jeff Starr

    When it comes to controlling link equity and indexing of content, we have three primary tools, each of which serves a different function.

    Robots.txt directives prevent compliant search engines from accessing specified resources. This is useful for admin pages and other directories that do not need to be included in the search listings.

    Meta tags such as noindex and noarchive assume search-engine access and enable spiders to crawl the pages and follow links. Link equity will also be passed through such pages.

    Nofollow tags as applied directly to links allow search engine access, but forbid the passing of link equity to the target pages. This method is useful for controlling directly the flow of link juice throughout a site.

    Depending on your SEO goals, manipulating the ebb and flow of link juice is greatly facilitated by the functional variety provided by these three techniques.

  9. John Wilberforce December 4, 2007 @ 5:14 am

    Very useful, thank you!
    I’m always on the look out for useful tips like this, and your site is full of them! I’ll be bookmarking you for sure!

  10. Jeff Starr

    Thank you, John! I am glad to be of service ;)

  11. custom web design May 2, 2008 @ 12:47 pm

    I am trying to use this with a Google/Yahoo sitemap. This validates, but will it really work the way it appears?

    Thanks for the great post–only one I could find on the topic.

    Custom web design

  12. Jeff Starr

    Yes, I think this method will work.. hence the article ;) I am glad you found the information useful — thanks for the feedback!

[ Comments are closed for this post ]
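
As a rough illustration of the robots.txt behavior discussed in the comments above, here is a small Python sketch using the standard library's `urllib.robotparser`. It is an assumption-laden example, not something from the original thread: the `User-agent: *` line, the `example.com` host, the `ExampleBot` agent name, and the helper `may_fetch` are all hypothetical, and only the plain prefix rules from the first comment are used, because `urllib.robotparser` matches rules by simple path prefix and does not interpret the `*` and `$` wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Plain prefix rules taken from the first comment; the User-agent line is an
# assumption added here so that the rules apply to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-
Disallow: /feed
Disallow: /comments/feed
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(path: str) -> bool:
    """Return True if a compliant crawler may request this path."""
    # "ExampleBot" and example.com are placeholders for illustration only.
    return parser.can_fetch("ExampleBot", "https://example.com" + path)


if __name__ == "__main__":
    for path in ("/feed/", "/wp-admin/", "/about/"):
        print(path, may_fetch(path))
```

A compliant crawler that is refused here never requests the URL, so it never sees any meta noindex inside it. That is the practical difference from the meta approach, where the resource is fetched and its links can still be followed.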