Perishable Press

Stop RSSing.com from Framing Your Content

This quick post explains how to stop the notorious site scrapers, RSSing.com, from stealing your content. In fact, this technique can be used to stop virtually any site that uses HTML frames to scrape your pages. Once again, the solution is one line of .htaccess to the rescue.

Readers reach out..

Recently a reader asked about stopping RSSing.com from stealing their content:

Do you have anything or even have any interest in building anything that stops the feed scraper RSSing.com? I notice they’ve got some channels going on you, too. […] Google “Perishable Press + Rssing.com” for a typical Google listing. I discovered your channel listings by doing so looking to see if you already had a script out.

People have been stealing my content for over 10 years now, so I’m very used to it. Still, I think it’s bad practice, so I decided to pop on over to the site in question and check it out for myself. Sure enough, there are over 30 article summaries posted, each of which links to a framed version of the complete article. And it’s not just this site; some of my other sites are scraped as well.

Wanting to help, I quickly tried a few of my break-out-of-frames scripts, but to no avail. Apparently, the scraping site is using some heroically advanced anti-frame-busting script to circumvent any attempts at JavaScript-based retaliation. Fortunately, we can invoke the powers of .htaccess to stop the nonsense.

Who/what is RSSing.com

So what is RSSing.com? Who cares. Apparently it’s just another site that likes to steal other people’s content instead of doing something unique or helpful. It doesn’t matter, really, and honestly I’m not even going to block them because I can always use the extra traffic. And besides they don’t outrank me on anything important so double no cares given. I’m sharing this information for my readers and to help fellow seekers of useful security techniques.

Maybe first try asking..

Before pulling out the big guns, maybe first try just “asking” the RSSing folks to kindly stop stealing your stuff. They even have a contact form all set up for this very purpose. Not sure if they honor all requests immediately or what, so if you happen to have experience with this strategy, please share in the comment section. Here is a screenshot to help you find it:

[ RSSing.com Removal Request or Whatever ]

FWIW IMHO they’re the ones who should be asking to use your content in the first place. Not the other way around. Putting the burden on everyone else is just not cool. Anyone who assumes that everyone wants their content to be stolen is utterly clueless.

Knock ’em dead (kid)

If you’re reading this, I assume you want to block RSSing from framing your content. The first thing to understand is that they are using two different methods to scrape:

  • They scrape and post excerpts directly from your feed (cached in their database)
  • They scrape your full post content via HTML frames (not cached in their database)

So the scraping via feed excerpt is not such a huge deal, and really is difficult to prevent since they are housing your content in their own database. Anyone who publishes their content via RSS feed is subject to this sort of thing. Nonetheless, if you are serious about stopping lowlifes from stealing your feed content, check out my article How to Deal with Content Scrapers.

To stop the framed content, on the other hand, a strong JavaScript frame-buster script would do the job in most cases, but only if the framing site has no counter-measures in place. For those cases, and I dare say for all cases, here is a much stronger technique for preventing your pages from being framed by third-party sites. Add the following code to your site’s root .htaccess file:

# break out of frames
<IfModule mod_headers.c>
	Header always append X-Frame-Options SAMEORIGIN
</IfModule>

That little snippet tells the server to include an X-Frame-Options header along with responses to all requests (provided mod_headers is enabled, which is what the IfModule check is for). The value of this header is SAMEORIGIN, which tells the browser to display the page inside a frame only when the framing page has the same origin as your page, meaning the same scheme, host, and port. So you can use HTML frames all day long if they originate from your own site. All other domains, however, will not be able to frame your pages. That is, until some clever lazy content thief figures out a way to bypass the restriction. So apply and be done, but keep an eye on things and stay vigilant.

So for now, it’s bye-bye RSSing.com and bye-bye content framing in general.
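To make the SAMEORIGIN rule described above a bit more concrete, here is a rough Python sketch of the decision a browser makes before rendering a framed page. It is a simplified model, not how any particular browser is implemented: it only compares the framed page with its immediate parent, and the function names and example URLs are made up for illustration.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports, so that https://example.com and https://example.com:443 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that makes up a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def may_render_in_frame(x_frame_options: Optional[str], page_url: str, parent_url: str) -> bool:
    """Simplified model of how X-Frame-Options applies to a framed page.

    x_frame_options is the header value sent with page_url (None if absent);
    parent_url is the page that is trying to frame it.
    """
    if x_frame_options is None:
        return True  # no header: any site may frame the page
    value = x_frame_options.strip().upper()
    if value == "DENY":
        return False  # never render inside a frame
    if value == "SAMEORIGIN":
        return origin(page_url) == origin(parent_url)
    # Assumption: unrecognized values are treated as if the header were absent.
    return True
```

With this model, `may_render_in_frame("SAMEORIGIN", "https://example.com/post/", "https://example.com/archive/")` returns `True`, while the same page framed from `https://scraper.example/frame` returns `False`, which is the blank frame the scraper ends up with.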

Before/after screenshots

For those who are wondering about the effect of the previous .htaccess technique, here is a screenshot showing how my scraped pages were displayed at RSSing.com before applying the prescribed snippet:

[ Perishable Press framed at RSSing.com ]

And here is a screenshot showing how my scraped pages were displayed at RSSing.com after applying the .htaccess snippet:

[ Perishable Press NOT framed at RSSing.com (thanks to .htaccess snippet) ]

As mentioned before, I am not blocking RSSing from framing my content. These screenshots are for demonstration purposes only. Basically if you employ the previous .htaccess technique, all framing pages at RSSing will display blank white pages inside of the frames. That should go a long way toward getting Google to rank your pages higher than those framing your content.
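If you would rather confirm the result without visiting the scraper, one option is to check that your server is actually sending the header. The following is a minimal Python sketch using only the standard library; the function names are my own, and it assumes your server answers HEAD requests the same way it answers normal page requests.

```python
from typing import Mapping, Optional
from urllib.request import Request, urlopen


def frame_policy(headers: Mapping[str, str]) -> Optional[str]:
    """Return the normalized X-Frame-Options value, or None if the header is absent."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "x-frame-options":
            return value.strip().upper()
    return None


def fetch_frame_policy(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request to url and report the X-Frame-Options value it returns."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return frame_policy(dict(response.headers.items()))
```

Calling `fetch_frame_policy()` with the URL of one of your own posts should return `SAMEORIGIN` once the .htaccess snippet is active, and `None` if the header is missing (for example, if mod_headers is not enabled).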

Jeff Starr
About the Author Jeff Starr = Designer. Developer. Producer. Writer. Editor. Etc.
11 responses
  1. Yep gotta hate scrapers like like.

    I’ve had this in my htaccess for a few years now, and checking on rssing still seems to be providing blank pages for someone to read :)

    A small word of warning though. I implemented a third-party questionnaire on my site that pushed the form and results onto a page within my site using javascript within an iframe. As the iframe content was not self-hosted the frame remained blank.

    The workaround was to put the poll’s javascript from the third party (easypolls.net) onto a ‘source’ page on my site (rather than running it from their own server). This source javascript can then be imported into an iframe as the source domains now match.

    Cheers
    Andy

  2. Sorry, in my first line I meant to say ‘like that’, not ‘like like’. Doh

  3. Jelmer Smid March 3, 2016 @ 2:04 am

    for nginx:

    server { add_header X-Frame-Options SAMEORIGIN; }

  4. This would also prevent your images from appearing in Google image results, correct?

    • Jeff Starr

      No I don’t think Google Images is using HTML frames to display images. I think they host/cache their own copies and serve them via base64-encoding. Check out the source code on any Image Search page to see for yourself.

  5. Elad Karako April 12, 2016 @ 1:00 pm

    Try my code, combination of 3 methods (not using “policy” method since it is very nasty!)

    <script type="application/javascript" src="data:application/javascript;base64,KGZ1bmN0aW9uKHQscyl7Cih0Lmhvc3RuYW1lLnRvTG93ZXJDYXNlKCkhPT1zLmhvc3RuYW1lLnRvTG93ZXJDYXNlKCkpJiYodD1zKTtyZXR1cm4gdHJ1ZTt9KHRvcC5sb2NhdGlvbixzZWxmLmxvY2F0aW9uKSk7"></script>
    <noscript><meta http-equiv="X-Frame-Options" content="DENY"></noscript>
    <meta http-equiv="window-target" content="_top">

  6. Thanks for this article; I just found out that they are using my content, too!

    But are you sure that they are generating extra traffic?

    • Jeff Starr

      It all depends on whether or not the scraped pages are ranking higher than the original content. In my case it’s “no”, so most likely the equation benefits this domain, albeit perhaps only marginally. Any small amount of traffic offsets the work required to fight against it, and then what’s the point in that case?

      • Thank you for your answer! It seems that asking them to remove my content via contact form as described above helped: They just sent me a message:

        Your site has been added to our blocked list. No more feeds from your site can be added to our site and existing content will be removed shortly.

      • Jeff Starr

        Awesome, it’s good to get confirmation that RSSing responds to requests. Honestly I don’t think they’re “bad guys” or whatever, just that they go about things in a backwards way: they should be asking for permission to feature people’s content, instead of assuming that everyone is on board and requiring producers to go out of their way to opt-out.

        In other words, the burden should be on the scraper, not the content producer. It’s the difference between being perceived as a “content thief” vs. a “legitimate service”. All about perception. In any case, thanks for the follow-up, glad you got it worked out.
