If you have yet to encounter the content-scraping site bitacle.org, consider yourself lucky. The scum-sucking worm-holes at bitacle.org are well-known for literally, blatantly, and piggishly stealing blog content and using it for financial gain through advertising. While I am not here to discuss the legal, philosophical, or technical ramifications of illegal bitacle behavior, I am here to provide a few critical tools that will help stop bitacle from stealing your content.
The htaccess Finger
Perhaps the most straightforward and effective way to keep the bitacle thieves away from your site is to block them at the server. The following htaccess rules deny any request that comes from bitacle’s IP address or carries "Bitacle" in its user-agent string, returning a 403 Forbidden response instead (for more information on htaccess files, see our article, Stupid htaccess Tricks). Add this to your site’s root htaccess file:
# Block bitacle by IP address or user agent (403 Forbidden)
RewriteEngine On
RewriteBase /
RewriteCond %{REMOTE_ADDR} ^212\.22\.59\.251$ [OR]
RewriteCond %{HTTP_USER_AGENT} Bitacle
RewriteRule .? - [F]
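If your site does not run on Apache, the same two conditions can be checked in application code. The following is a minimal Python sketch, assuming a WSGI application; the `should_block` helper and `BlockBitacle` middleware are hypothetical names, and the IP address and user-agent pattern are simply copied from the htaccess rules (the IP may well be stale by now). As with `RewriteCond`, the user-agent pattern is an unanchored, case-sensitive regex search, and the two conditions are joined with OR.

```python
import re

# Patterns copied from the htaccess rules; the IP address may be out of date.
BLOCKED_IP = re.compile(r"^212\.22\.59\.251$")
BLOCKED_AGENT = re.compile(r"Bitacle")  # case-sensitive, like the RewriteCond


def should_block(remote_addr, user_agent):
    """Mirror the two RewriteCond lines joined by [OR]."""
    ip_match = BLOCKED_IP.search(remote_addr or "") is not None
    agent_match = BLOCKED_AGENT.search(user_agent or "") is not None
    return ip_match or agent_match


class BlockBitacle:
    """Hypothetical WSGI middleware: answer 403 Forbidden for matching requests."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if should_block(environ.get("REMOTE_ADDR", ""),
                        environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden"
            start_response("403 Forbidden",
                           [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
            return [body]
        # Everyone else passes through to the wrapped application untouched.
        return self.app(environ, start_response)
```

To use it, wrap your existing application object, e.g. `application = BlockBitacle(application)`. Keep in mind that a scraper can change both its IP address and its user agent, so this (like the htaccess version) only stops the requests it recognizes.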
The robots.txt Slap
Next up is another anti-bitacle method, one that instructs the bitacle bots to stay away from your site. This method uses a robots.txt file in your site’s root directory to deny bitacle agents crawl access to all site contents. Keep in mind that robots.txt is purely advisory: it only works for bots that choose to obey it. Simply add the following lines to your site’s root robots.txt file (for more information on robots.txt, see our article, Robots Notes Plus):
User-agent: Bitacle bot/1.1
Disallow: /
User-agent: Bitacle bot
Disallow: /
User-agent: Bitacle *
Disallow: /
User-agent: Bitacle*
Disallow: /
User-agent: Bitacle
Disallow: /
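To check that these records do what you expect, you can feed them to a robots.txt parser. The sketch below uses Python’s standard `urllib.robotparser` module; the URL path and the Googlebot user-agent string are made-up examples. Note that `can_fetch()` only reports what a *polite* crawler would do; it says nothing about whether bitacle actually honors the file.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt records from above, unchanged.
ROBOTS_TXT = """\
User-agent: Bitacle bot/1.1
Disallow: /
User-agent: Bitacle bot
Disallow: /
User-agent: Bitacle *
Disallow: /
User-agent: Bitacle*
Disallow: /
User-agent: Bitacle
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

for agent in ("Bitacle bot/1.1", "Mozilla/5.0 (compatible; Googlebot/2.1)"):
    print(agent, "->", parser.can_fetch(agent, "/2006/11/some-post/"))
# Bitacle bot/1.1 -> False
# Mozilla/5.0 (compatible; Googlebot/2.1) -> True
```

In this particular parser, a record applies when its user-agent value appears (case-insensitively) inside the crawler’s name before the first "/", so the plain "Bitacle" record alone already covers the other variants; the extra records are redundant but harmless.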
Related WordPress Plugins
For more help on the anti-plagiarism front, check out Redalt’s Antileech Plugin and MaxPower’s Digital Fingerprint Plugin. Both of these fine WordPress plugins come highly recommended.
Other Essential Tools
Beyond the essential preventative methods discussed above, many other resources and tools are now available for dealing with site scrapers, content thieves, and other worthless garbage. One worthwhile website is Copyscape, which provides an excellent tool for searching the web for copies of your content. If you find that your content has indeed been plagiarized, read up on how to respond properly and effectively. Finally, try searching for terms such as "plagiarism tools", "content scraping", "copyright protection", "syndication theft", etc. Good luck!


2 Responses
Noel Cower – November 30, 2006
Figured I’d let you know that Bitacle’s bot does not pay attention to robots.txt rules. The most effective way to stop them is to simply ban their user-agent and take some measures to ensure that your content can’t be easily spidered/stolen.
Also, you are at far greater risk by using a service such as FeedBurner.
Perishable – November 30, 2006
Yeah, I have read elsewhere that bitacle ignores robots.txt rules, but I am paranoid enough to include them anyway. It may not be necessary, but it is the formally accepted method, and it definitely won’t hurt anything.
As for FeedBurner (and similar services), the benefits of their service currently outweigh the potential threat of content hounds like bitacle. Nonetheless, I will definitely be looking into it further and perhaps changing my mind if anything serious unfolds. Either way, I appreciate your comment and the heads up concerning our mutual enemy! ;)