Spring Sale! Save 30% on all books w/ code: PLANET24
Web Dev + WordPress + Security
50 posts related to: How to Deal with Content Scrapers

2G Blacklist: Closing the Door on Malicious Attacks

[ 2G Blacklist ]

Since posting the Ultimate htaccess Blacklist and then the Ultimate htaccess Blacklist 2, I find myself dealing with a new breed of malicious attacks. It is no longer useful to simply block nefarious user agents because they are frequently faked. Likewise, blocking individual IP addresses is generally a waste of time because the attacks are coming from a decentralized network of zombie machines. Watching my error and access logs very closely, I have observed the following trends in current attacks: Continue reading »

Over 150 of the Worst Spammers, Scrapers and Crackers from 2007

Over the course of each year, I blacklist a considerable number of individual IP addresses. Every day, Perishable Press is hit with countless numbers of spammers, scrapers, crackers and all sorts of other hapless turds. Weekly examinations of my site’s error logs enable me to filter through the chaff and cherry-pick only the most heinous, nefarious attackers for blacklisting. Minor offenses are generally dismissed, but the evil bastards that insist on wasting resources running redundant automated scripts are immediately investigated […] Continue reading »

Yahoo! Slurp in My Blackhole (Yet Again)

Yup, ‘ol Slurp is at it again, flagrantly disobeying specific robots.txt rules forbidding access to my bad-bot trap, lovingly dubbed the “blackhole.” As many readers know, this is not the first time Yahoo has been caught behaving badly. This time, Yahoo was caught trespassing five different times via three different IPs over the course of four different days. Here is the data recorded in my site’s blackhole log (I know, that sounds terrible): Continue reading »

A Dramatic Week Here at Perishable Press..

..And we’re back. After an insane week spent shopping for a new host, dealing with some Bad Behavior, and transferring Perishable Press to its new home on a virtual private server (VPS), everything is slowly falling back into place. Along the way, there have been some interesting challenges and many lessons learned. Here are a few of the highlights.. Continue reading »

Protect Your Site Against UserCash and Other Scumbags

In this brief article I explain the atrocity that is UserCash and then provide the JavaScript needed to protect your site. Continue reading »

Yahoo! in my Blackhole

Okay, I realize that the title sounds a bit odd, but nowhere near as odd as my recent discovery of Slurp ignoring explicit robots.txt rules and digging around in my highly specialized bot trap, which I have lovingly dubbed “the blackhole”. What is up with that, Yahoo!? — does your Slurp spider obey robots.txt directives or not? I have never seen Google crawling around that side of town, neither has MSN nor even Ask ventured into the forbidden realms. Has […] Continue reading »

How to Verify the Four Major Search Engines

Keeping track of your access and error logs is a critical component of any serious security strategy. Many times, you will see a recorded entry that looks legitimate, such that it may easily be dismissed as genuine Google fare, only to discover upon closer investigation a fraudulent agent. There are many such cloaked or disguised agents crawling around these days, mimicking various search engines to hide beneath the radar. So it’s always a good idea to implement a procedure for […] Continue reading »

Suspicious Behavior from Yahoo! Slurp Crawler

[ Image: Black and white illustration of the upper half of a man's suspicious, paranoid face ]

Most of the time, when I catch scumbags attempting to spam, scrape, leech, or otherwise hack my site, I stitch up a new voodoo doll and let the cursing begin. No, seriously, I just blacklist the idiots. I don’t need their traffic, and so I don’t even blink while slamming the doors in their faces. Of course, this policy presents a bit of a dilemma when the culprit is one of the four major search engines. Slamming the door on […] Continue reading »

Allow Google Reader Access to Hotlink-Protected Images

[ Image: Google Reader Icon ]

In our previous article, we explain the process of allowing Feedburner to access your hotlink-protected images. The article details the entire process, which covers the basics of hotlink protection and involves adding several lines of code to your htaccess file. In this article, we skip the detailed explanations and present only the main points. The discussion is very similar for both Feedburner and Google Reader, and may be extrapolated to serve virtually any purpose. If you are using htaccess to […] Continue reading »

Allow Feedburner Access to Hotlink-Protected Images

[ Image: Feedburner Icon ]

Recently, we installed and configured the excellent WordPress Feedburner plugin by the venerable Steve Smith. The plugin basically redirects our various WordPress-powered content feeds to Feedburner, which then delivers them to subscribers. This method enables us to take advantage of Feedburner’s excellent statistical tools. Further, all of the action happens silently, beneath the surface, and without the subscriber even realizing it. After a few weeks running the plugin with great success, we began hearing reports of broken and missing images […] Continue reading »

How to Block IP Addresses with PHP

[ Image: Skeletor Blocks a Move ]

Figuratively speaking, hunting down and killing spammers, scrapers, and other online scum remains one of our favorite pursuits. Once we have determined that a particular IP address is worthy of banishment, we generally invoke the magical powers of htaccess to lock the gates. When htaccess is not available, we may summon the versatile functionality of PHP to get the job done. This method is straightforward. Simply edit, copy and paste the following code example into the top of any PHP […] Continue reading »

Riding the Wave

del.icio.us

Compared to some of the big players out there on the internet, we here at Perishable Press run a relatively small website. We began this project in September of 2005 with nothing but a domain name and a pocketful of inspiration. During the first several months of development, our traffic statistics looked something like: one unique visitor and 10,000 hits (i.e., nobody but us). And then, suddenly and unexpectedly, everything changed.. Continue reading »

Disobedient Robots and Company

In our never-ending battle against spammers, leeches, scrapers, and other online undesirables, we have implemented several powerful security measures to improve the operational integrity of our perpetual virtual existence. Here is a rundown of the new behind-the-scenes security features of Perishable Press. Continue reading »

Stop Bitacle from Stealing Content

[ Stop bitacle.org ]

If you have yet to encounter the content-scraping site, bitacle.org, consider yourself lucky. The scum-sucking worm-holes at bitacle.org are well-known for literally, blatantly, and piggishly stealing blog content and using it for financial gains through advertising. While I am not here to discuss the legal, philosophical, or technical ramifications of illegal bitacle behavior, I am here to provide a few critical tools that will help stop bitacle from stealing your content. Continue reading »

Stupid htaccess Tricks Redux

One of our most popular posts, Stupid htaccess Tricks, has been completely rewritten and now includes almost twice as many stupid htaccess tricks. Plus, we have added a library of regex character definitions, more information for many of the directives, and several handy references. But wait, there’s more — we even threw in a “quick-jump” Table of Contents and a complete set of “up” links [ ^ ] for easy navigation. Utterly amazing! Continue reading »

Website Attack Recovery

Recently, every website on our primary server was simultaneously attacked. The offending party indiscriminately replaced the contents of every index file, regardless of its extension or location, with a few vulgar lines of code, which indicated intention, identity, and influence. Apparently, the attack occurred via Germany, through a server at the University of Hamburg (uni-hamburg.de). This relatively minor attack resulted in several hours of valuable online education. In this article, it is our intention to share experience with website attack […] Continue reading »

Welcome
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »
Banhammer: Protect your WordPress site against threats.
Thoughts
I live right next door to the absolute loudest car in town. And the owner loves to drive it.
8G Firewall now out of beta testing, ready for use on production sites.
It's all about that ad revenue baby.
Note to self: encrypting 500 GB of data on my iMac takes around 8 hours.
Getting back into things after a bit of a break. Currently 7° F outside. Chillz.
2024 is going to make 2020 look like a vacation. Prepare accordingly.
First snow of the year :)
Newsletter
Get news, updates, deals & tips via email.
Email kept private. Easy unsubscribe anytime.