Tag: scrapers

How to Deal with Content Scrapers

Posted on September 24, 2010 in Websites by Jeff Starr

Chris Coyier of CSS-Tricks recently declared that people should do “nothing” in response to other sites scraping their content. I totally get what Chris is saying here. He is basically saying that the original source of content is better than scrapers because:

  • it’s on a domain with more trust.
  • you published that article first.
  • it’s coded better for SEO than theirs.
  • it’s better designed than theirs.
  • it isn’t at risk for serious penalization from search engines.

If these things are all true, then I agree, you have nothing to worry about. Unfortunately, that’s a tall order for many sites on the Web today. Although most scraping sites are pure and utter crap, the software available for automating the production of decent-quality websites is getting better everyday. More and more I’ve been seeing scraper sites that look and feel authentic because they are using some sweet WordPress theme and a few magical plugins. In the past, it was easy to spot a scraper site, but these days it’s getting harder to distinguish between scraped and original content. Not just for visitors, but for search engines too.

Continue Reading

How to Protect Your Site Against Content Thieves

Posted on September 23, 2009 in Blogging by Jeff Starr

Stolen content is the bane of every blogger who provides a publicly available RSS feed. By delivering your content via feed, you make it easy for scrapers to assimilate and re-purpose your material on their crap Adsense sites. It’s bad enough that someone would re-post your entire feed without credit, but to use it for cheap money-making schemes is about as pathetic as it gets. If you’re lucky, the bastards may leave all the links intact, so at least you will get a few back-links (if you have been linking internally) and get notified of the stolen content as well (via pingback or Google Alert). Lately, however, many of the scraper sites that I have seen are completely removing all links within the stolen content. Incidentally, there are some tell-tale signs that the site you are visiting is a scraper site:

  • No RSS feed available
  • Many quality posts that contain no links
  • Many quality posts but very low subscriber count
  • Great content but with zero comments on any posts
  • Lots of good content but with lots of Adsense or other ads
  • No “About” page or business information
  • And the number one brain-dead giveaway: no contact form or email address

If you pay attention as you surf around, you may want to keep an eye out for some of these dead giveaways. If it looks like the site is profiting from stolen content, it is advisable to leave immediately and locate an original source of information (you could even be cool and report the scraper site to the original author). I.e., help strengthen the legit blogging community and don’t support scrapers in any way. But avoiding scraper sites is merely an afterthought. The real challenge is to have a solid strategy in place that will help you identify, eliminate and prevent stolen content. Unfortunately, there is no “magic cure” that will stop the scrapers from stealing your hard work — apart from running a private site or not providing a feed — but there are many great tools that have proven quite effective in fighting the war against stolen content. While not completely exhaustive, here are some powerful tips and tricks that have served me well over the years:

Continue Reading