The 2010 User-Agent Blacklist blocks hundreds of bad bots while ensuring open-access for the major search engines: Google, Bing, Ask, Yahoo, et al. Blocking bad user-agents is an effective addition to any security strategy. It works like this: your site is getting hammered by rogue bots that waste valuable server resources and bandwidth. So you grab a copy of the 2010 UA Blacklist from Perishable Press, include it in your site’s root .htaccess file, and enjoy a more secure and better performing website. It’s that easy.
Proven Security
The 2010 UA Blacklist has been carefully constructed based on rigorous server-log analyses. Obsessive daily log monitoring reveals bad bots scanning for exploits, spamming resources, and wasting bandwidth. While analyzing malicious behavior, evil bots are identified and added to the UA Blacklist. Blocked user-agents are denied access to your site, increasing efficiency and providing safety for your visitors.
Awhile ago, Silvan Mühlemann conducted a 1.5 year experiment whereby different approaches to email obfuscation were tested for effectiveness. Nine different methods were implemented, with each test account receiving anywhere from 1800 to zero spam emails. Here is an excerpt from the article:
When displaying an e-mail address on a website you obviously want to obfuscate it to avoid it getting harvested by spammers. But which obfuscation method is the best one? I drove a test to find out.
After reading through the article and its many findings, here are what seem to be the best methods for obfuscating email addresses displayed publicly on web pages..
Over the course of each year, I blacklist a considerable number of individual IP addresses. Every day, Perishable Press is hit with countless numbers of spammers, scrapers, crackers and all sorts of other hapless turds. Weekly examinations of my site’s error logs enable me to filter through the chaff and cherry-pick only the most heinous, nefarious attackers for blacklisting. Minor offenses are generally dismissed, but the evil bastards that insist on wasting resources running redundant automated scripts are immediately investigated via IP lookup and denied access via simple htaccess directive:
<Limit GET POST PUT>
Order Allow,Deny
Allow from all
Deny from 123.456.789
</LIMIT>
Although many of the worst attacks happen in randomized, zombie-like fashion, I have found that individual IPs that are not blacklisted will return repeatedly until finally blocked. Yet, despite the short-term success enjoyed by denying access to the most malicious IPs, the long-term futility of such blacklisting reflects the temporary nature of this solution.
In other words, I have found that blocking individual IPs is useful only for limited periods of time. Thus, every year, I gather my code and flush the blacklist of all individually blocked IP addresses. I then start fresh, adding the worst villains to the list, blocking entire IP ranges if necessary, and referring to previous versions of my htaccess files to cross-check suspiciously familiar entities. Eventually, a new blacklist emerges and I share it at Perishable Press. Here is the current version for 2010..
For the uninitiated, in teh language of teh Web, a referrer is the online resource from whence a visitor happened to arrive at your site. For example, if Johnny the Wonder Parrot was visiting the Mainstream Media website and happened to follow a link to your site (of all places), you would look at your access logs, notice Johnny’s visit, and speak out loud (slowly): “hmmm.. it looks like the Mainstream Media website referred my good pal Johnny to my Alka-Seltzer sales page.” In such a bizarre case, the Mainstream Media website — or specific page — is referred to as (no pun intended) the referrer.
As discussed in my recent article, Eight Ways to Blacklist with Apache’s mod_rewrite, one method of stopping spammers, scrapers, email harvesters, and malicious bots is to blacklist their associated user agents. Apache enables us to target bad user agents by testing the user-agent string against a predefined blacklist of unwanted visitors. Any bot identifying itself as one of the blacklisted agents is immediately and quietly denied access. While this certainly isn’t the most effective method of securing your site against malicious behavior, it may certainly provide another layer of protection.
Even so, there are several things to consider before choosing to implement an extensive user-agent blacklist on your site. First and most importantly is the transient nature of the user agent itself. On most systems, the user-agent variable is easy to change, making it possible for bot owners to use any user-agent name they wish. Once a bad bot makes the rounds, becomes known, and is blacklisted, the bot owner need only modify or change its declared user agent and they’re back in business. User-agent names are constantly invented, spoofed, or otherwise altered in order to operate beneath — or above — the virtual radar. Thus, a user-agent blacklist is a high-maintenance affair, requiring continuous cultivation in order to maintain relevancy and effectiveness.
In my recent article on blocking proxy servers, I explain how to use HTAccess to deny site access to a wide range of proxy servers. The method works great, but some readers want to know how to allow access for specific proxy servers while denying access to as many other proxies as possible.
Fortunately, the solution is as simple as adding a few lines to my original proxy-blocking method. Specifically, we may allow any requests coming from our whitelist of proxy servers by testing Apache’s HTTP_REFERER variable, like so:
Welcome to the Perishable Press “Blacklist Candidate” series. In this post, we continue our new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags..
From time to time on the show, a contestant places a bid that is so absurd and so asinine that you literally laugh out loud, point at the monitor, and openly ridicule the pathetic loser. On such occasions, even the host of the show will laugh and mock the idiocy. Of course, this same situation happens frequently here at Perishable Press, where the scumbags that manage to escape the 3G Blacklist are proving themselves to be increasingly desperate and pathetic. Such is the case with this month’s official Blacklist Candidate Number 2008-10-19:
Come on down — you’re the next cracker whore to get banished from the site!
An ongoing series of articles on the fine art of malicious exploit detection and prevention. Learn about preventing the sneaky mischievous and deceptive practices of some of the worst spammers, scrapers, crackers, and other scumbags on the Internet.
In my previous article on redirecting 404 requests for favicon files, I presented an HTAccess technique for redirecting all requests for nonexistent favicon.ico files to the actual file located in the site’s web-accessible root directory:
Clearly, none of these URL requests target the “real” favicon.ico file, yet thanks to the previous method they are all happily redirected to the proper location. This is useful for a variety of reasons, including preventing excessive and unnecessary server strain due to malicious scripts.
When these errors first began appearing in the logs several months ago, I didn’t think too much of it — “just another idiot who can’t find my site’s favicon..” As time went on, however, the frequency and variety of these misdirected requests continued to increase. A bit frustrating perhaps, but not serious enough to justify immediate action. After all, what’s the worst that can happen? The idiot might actually find the blasted thing? Wouldn’t that be nice..
Recently, I reactivated an older version (1.16) of Jalenack’s Wordspew Shoutbox plugin for the Dead Letter Artchat forum. The DLa collective (of which I am a member) has been working on a new issue of their ‘zine and needed an easy online chat location for impromptu business dealz (ideas, planning, etc.). Almost immediately after reactivating the Shoutbox plugin, the chat forum was flooded with an endless wave of spam. The rate and volume of spam was so high as to render the forum utterly useless. — Ugh.
Welcome to the Perishable Press “Blacklist Candidate” series. In this post, we continue our new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags..
Just under the wire! Even so, this month’s official Blacklist-Candidate article may be the last monthly installment of the series. Although additional BC articles may appear in the future, it is unlikely that they will continue as a regular monthly feature. Oh sure, I see the tears streaming down your face, but think about it: this is actually good news. After implementing the 3G Blacklist, finding decent blacklist candidates is becoming increasingly difficult. This is a good thing because it indicates that the new blacklist strategy is working. Nonetheless, every now and then, some brainless bozo will magically appear and poke around where it shouldn’t be poking. Such is the case with this month’s Blacklist Candidate Number 2008-05-31:
Come on down! You’re the next worthless turd to get banished from the site!
Welcome to the Perishable Press “Blacklist Candidate” series. In this post, we continue our new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags..
Since the implementation of my 2G Blacklist, I have enjoyed a significant decrease in the overall number and variety of site attacks. In fact, I had to time-travel back to March 1st just to find a candidate worthy of this month’s blacklist spotlight. I felt like Rod Roddy looking over the Price-is-Right audience to announce the next name only to discover a quiet, empty room. And then like Bob gets pissed that nobody showed up and begins to bark and snarl at Rod to go across the street to the clam store and find some damn contestants. Or, ..um, something like that. Needless to say, this month’s data isn’t as fresh as I would have liked it, but I think you’ll find the information fascinating nonetheless. So let’s get on with it then:
Blacklist Candidate number 2008-04-27, come on down! You’re the next clam-store loser to get blacklisted from the site!
Not too long ago, a reader going by the name of bjarbj78 asked about how to block proxy servers from accessing her website. Apparently, bjarbj78 had taken the time to compile a proxy blacklist of over 9,000 domains, only to discover afterwards that the formulated htaccess blacklisting strategy didn’t work as expected:
deny from proxydomain.com proxydomain2.com
Blacklisting proxy servers by blocking individual domains seems like a futile exercise. Although there are a good number of reliable, consistent proxy domains that could be blocked directly, the vast majority of such sites are constantly changing. It would take a team of professionals working around the clock just to keep up with them all.
As explained in my reply to bjarbj78’s comment, requiring Apache to process over 9,000 htaccess entries for every request could prove disastrous:
After investigating some unusual 404 errors the other day, I found myself digging through the WordPress Admin trying to locate the “Subscribe to Comments” options panel. As it turns out, administrative options for the Subscribe to Comments plugin are split into two different areas. First, the S2C plugin provides configuration options under “Options>SubscribetoComments”, which enables users to tweak everything from subscription messages to custom CSS styles. New to me was the other half of the S2C administration area: the Subscription Manager! Carefully hidden under “Manage>Subscriptions”, the Subscription Manager provides several useful ways to filter your email subscribers:
Welcome to the Perishable Press “Blacklist Candidate” series. In this post, we continue our new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags..
Imagine, if you will, an overly caffeinated Bob Barker, hunched over his favorite laptop, feverishly scanning his server access files. Like some underpaid factory worker pruning defective bobble heads from a Taiwanese assembly line, Bob rapidly identifies and isolates suspicious log entries with laser focus. Upon further investigation, affirmed spammers, scrapers and crackers are swiftly blacklisted from future access. For the most heinous offenders, we suddenly hear Rod Roddy’s guzzling voice echo throughout the room:
“Candidate number 2008-03-09, COME ON DOWN!! — you’re the next contestant to get blacklisted from the site!”
Every week, I dig through my access and errorlogs to learn from the spammers, scrapers, and other cracker whores. Typically, attempts to exploit potential security vulnerabilities demonstrate the following characteristics:
Since posting the Ultimate htaccess Blacklist and then the Ultimate htaccess Blacklist 2, I find myself dealing with a new breed of maliciousattacks. It is no longer useful to simply block nefarious user agents because they are frequently faked. Likewise, blocking individual IP addresses is generally a waste of time because the attacks are coming from a decentralized network of zombie machines. Watching my error and access logs very closely, I have observed the following trends in current attacks:
User agents are faked, typically using something generic like “Mozilla/5.0”
Each attack may involve hundreds of compromised IP addresses
Attacks generally target a large number of indexed (i.e., known) pages, posts, etc.
Frequently, attacks utilize query strings appended to variously named PHP files
The target URLs often include a secondary URL appended to the end of a permalink
An increasing number of attacks employ random character strings to probe for holes
Yet despite the apparent complexity of such attacks, they tend to look remarkably similar. Specifically, notice the trends in the following examples of (nonexistent) target URLs, or “attack strings,” as I like to call them:
Update 2010/07/07: Please visit the 2010 IP Blacklist for more current information.
Over the course of each year, I blacklist a considerable number of individual IP addresses. Every day, Perishable Press is hit with countless numbers of spammers, scrapers, crackers and all sorts of other hapless turds. Weekly examinations of my site’s error logs enable me to filter through the chaff and cherry-pick only the most heinous, nefarious attackers for blacklisting. Minor offenses are generally dismissed, but the evil bastards that insist on wasting resources running redundant automated scripts are immediately investigated via IP lookup and denied access via simple htaccess directive:
<Limit GET POST PUT>
order allow,deny
allow from all
deny from 123.456.789
</LIMIT>
Although many of the worst attacks happen in randomized, zombie-like fashion, I have found that individual IPs that are not blacklisted will return repeatedly until finally blocked. Yet, despite the short-term success enjoyed by denying access to the most malicious IPs, the long-term futility of such blacklisting reflects the temporary nature of this solution. In other words, I have found that blocking individual IPs is useful only for limited periods of time. Thus, every year, I gather my code and flush the blacklist of all individually blocked IP addresses. I then start fresh, adding the worst villains to the list, blocking entire IP ranges if necessary, and referring to previous versions of my htaccess files to cross-check suspiciously familiar entities. It is within this context, then, that I present the following manually assembled collection of over 150 of the worst spammers, scrapers, and crackers to hit my site in 2007.
Welcome to the Perishable Press “Blacklist Candidate” series. In this post, we continue our new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags..
Scumbag number 2008-02-10, “COME ON DOWN!!” — you’re the next baboon to get banished from the site!
Like many bloggers, I like to spend a little quality time each week examining my site’s error logs. The data contained in Apache, 404, and even PHP error logs is always enlightening. In addition to suspicious behavior, spam nonsense, and cracker mischief, this site frequently endures automated and even manual attacks targeting various XSS exploits, WordPress vulnerabilities, and other potential security holes. Although the number of successful attacks remains relatively small, the very nature of some of the attacks serves to threaten site performance, security and stability. Such is the case of blacklist candidate number 2008-02-10: IP address 128.111.48.138.
Approximately 30 days ago, I completely uninstalled the Bad Behavior plugin from Perishable Press. As you may recall, many Bad Behavior users were unexpectedly locked out of their own sites and forced to either uninstall or upgrade in order to fix the problem. Of course, in my perpetual battle to optimize and streamline everything, I decided to drop Bad Behavior from the otherwise obligatory WordPress anti-spam trinity.
30 days later..
I am happy to report that Perishable Press has not seen a noticeable increase in comment spam since the removal of the Bad Behavior plugin. Of course, during the past month, two or three trackback spam turds managed to slide through, however, even Bad Behavior failed to stop everything. This is great news because I prefer to avoid unnecessary plugins whenever possible — especially those of the resource-intensive variety.
Come one, come all — today we officially begin a new series of posts here at Perishable Press: the public exposure, humiliation, and banishment of spammers, crackers, and other site attackers. Kicking things off for 2008: blacklist candidate number 2008-01-02!
Every Wednesday, I take a little time to investigate my 404errorlogs. In addition to spam, crack attacks, and other deliberate mischief, the 404 logs for Perishable Press contain errors due to missing resources, mistyped URLs, and the occasional bizarre or even suspiciousbehavior of the search-engine robots. Whenever possible, I attempt to resolve a majority of the “fixable” errors, either by restoring missing resources, adding an htaccess redirect, or by any other means available.
Having exercised this rigorous maintenance practice for well over a year now, my 404 error logs are almost completely devoid of all “fixable” 404 errors, and are filled almost exclusively with spam attacks, XSS attempts, and other miscellaneous cracker nonsense. Fortunately, my site has only fallen victim to such espionage on one occasion, and on a different server.
“Oh no, not again!” It looks like another one of my non-existent bank accounts has been blocked at Bank of America. But that’s cool, because I like, totally graduated from third grade. Knowing best for all grammar and words in email. Let’s examine yet anotheridioticphishingattempt, shall we? First, let’s have a look at the full-meal deal (sans bank logos, links, and other forged minutia):
From : abuse@bankofamerica.com
Date : Wednesday, November 07, 2007 6:19 AM
To : none
Subject : Online Banking Alert
------------------------------
Your Online Banking is Blocked
Because of unusual number of invalid login attempts on you account,
we had to believe that, their might be some security problem on you
account. So we have decided to put an extra verification process to
ensure your identity and your account security. Please click on
sign in to Online Banking to continue to the verification process
and ensure your account security. It is all about your security.
Thank you, and visit the customer service section.
------------------------------
First of all, it needs to be said that, especially in our modern, “phishing-aware” world, it is absolutely critical for would-be phishers to comprehend thoroughly the language in which their bait will be delivered. This is especially true when it comes to the emulation of formal communication from legitimate business establishments such as banks, online shops, and governmental offices.
In our original htaccess blacklist article, we provide an extensive list of bad user agents. This so-called “Ultimate htaccess Blacklist” works great at blocking many different online villains: spammers, scammers, scrapers, scrappers, rippers, leechers — you name it. Yet, despite its usefulness, there is always room for improvement. For example, as reader Greg suggests, a compressed version of the blacklist would be very useful. In this post, we present a compressed version of our Ultimate htaccess Blacklist that features around 50 new agents. Whereas the original blacklist is approximately 8.6KB in size, the compressed version is only 3.4KB, even with the additional agents. Overall, the compressed version requires fewer system resources to block a greater number of bad agents.
This effective triage of free WordPress plugins has served many a WP-blogger well, eliminating virtually 99% of all automated comment-related spam. When spam first became a problem for me, I installed this triple-threat arsenal of anti-spam plugins and immediately enjoyed the results. Although Spam Karma seemed a little invasive and resource-intensive, too much protection seemed far better than not enough.
Even so, during the most recent redesign of the site, one of my goals was to lighten things up as much as possible — fewer scripts, fewer images, fewer plugins, etc. During that process, I decided to drop both Bad Behavior and Spam Karma. What a mistake that turned out to be! At first Akismet held up just fine, but it only took a few weeks before Perishable Press got hit hard: over 300 spam comments, trackbacks and pingbacks snuck through the Akismet gate. Needless to say, I was extremely upset and spent over two hours scouring the database to remove the stench.