Building the 3G Blacklist, Part 3: Improving Site Security by Selectively Blocking Rogue User Agents

[ 3G Stormtroopers ]

In this continuing five-article series, I share insights and discoveries concerning website security and protecting against malicious attacks. In this third article, I discuss targeted user-agent blacklisting and present an alternate approach to preventing site access for the most prevalent and malicious user agents. Subsequent articles will focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. At the conclusion of the series, the five articles will culminate in the release of the next generation 3G Blacklist.

Improving Site Security by Selectively Blocking Rogue User Agents

Several months ago, while developing improved methods for protecting websites against malicious attacks, I decided to remove the Ultimate htaccess Blacklist from the Perishable Press domain. In its place, I have been using the 2G Blacklist, individual IP blocking, and an extremely concise selection of blacklisted user agents. So far, this new, streamlined strategy has proven just as effective as (if not more effective than) the previous “ultimate-blacklist” method.

There are several reasons why using extensive lists of blocked user agents inevitably fails to provide worthwhile protection. First, creating and maintaining a current list of every nasty agent consumes far too much time. Once a list becomes significantly populated, efficient cultivation grows increasingly difficult: an agent added to the list typically remains there indefinitely, unless the required time is spent to periodically verify and edit each and every entry.

Another important reason why such blacklists are becoming increasingly futile involves the mutation rate of user-agent identities. New agents are unleashed every minute, it seems, and old agents are constantly disappearing and reappearing in altered form. Further, many user agents are masked with fake identities, such that “agent-x” gains access even though it is technically blacklisted.

One final (and important) point about the ineffectiveness of trying to maintain a blacklist of every scumbag user agent that crosses your path:

  • Without vigilant cultivation, the “install-and-forget” blacklist will eventually become obsolete due to constantly evolving user agents.
  • With vigilant cultivation, the “ever-expanding” blacklist will eventually compromise the performance of your site.

Although blacklisting hundreds of user agents provides an effective dose of short-term immunization, it is obviously a “no-win” situation in the long-term scheme of things.

A Better Approach to Blacklisting

Although I no longer depend on a massive user-agent blacklist as my primary line of defense, I continue to blacklist a select collection of the worst agents as a way to reinforce a more comprehensive and multifaceted security strategy. This continuously evolving new approach happens to be the focus of this current series of articles. In each of the five articles, I present a different piece of the overall strategy, which will culminate in the release of the complete, next-generation 3G Blacklist.

Within the context of this new multifaceted security strategy, endless blacklists of user agents are replaced with the following, highly selective, hand-picked set of blocked agents:

# BLACKLISTED USER AGENTS
SetEnvIfNoCase User-Agent "Jakarta Commons" keep_out
SetEnvIfNoCase User-Agent "Y!OASIS/TEST"    keep_out
SetEnvIfNoCase User-Agent "libwww-perl"     keep_out
SetEnvIfNoCase User-Agent "MOT-MPx220"      keep_out
SetEnvIfNoCase User-Agent "MJ12bot"         keep_out
SetEnvIfNoCase User-Agent "Nutch"           keep_out
SetEnvIfNoCase User-Agent "cr4nk"           keep_out
<Limit GET POST PUT>
 order allow,deny
 allow from all
 deny from env=keep_out
</Limit>

Are you surprised to see around a dozen lines replace more than 100? It may not seem clear at this point, but in the new blacklisting strategy, blocking endless collections of user agents, IP addresses, and referrer domains is no longer necessary for effective site protection. Personally, I hate having hundreds of lines of htaccess code that may or may not be protecting my website.

It’s time to redefine the purpose of the traditional user-agent blacklist. From this point on, think of user-agent blacklists as a secondary line of defense. Once the complete 3G Blacklist is revealed, blocking individual user agents (or individual anything) may not even be necessary. Nonetheless, retaining a concise, working collection of blocked user agents remains a powerful tool for providing a double-layer of protection against some of the most nefarious offenders.

As is, the concise user-agent blacklist presented above blocks some of the most belligerent, offensive agents currently plaguing the web. The idea here is to have at your disposal a working list of the most heinous bots. Once in place, it is simple to add or remove bots as needed. You may either cultivate the list yourself by pruning your Apache log files, or you may refresh the list upon subsequent updates from Perishable Press ( subscribe! ). Either way, if you are following this series of articles, there is no need to implement anything at this time. Each of the techniques presented in the series will ultimately be integrated into a comprehensive security strategy known as the 3G Blacklist.
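If you choose to cultivate the list from your own Apache logs, a quick survey of which user agents hit your site most often is a good starting point. Here is a minimal sketch, assuming the default “combined” log format; the log filename and sample entries below are hypothetical, for illustration only:

```shell
# Create a tiny sample access log in the default "combined" format
# (hypothetical entries for illustration only).
cat > sample-access.log <<'EOF'
1.2.3.4 - - [10/May/2008:12:00:00 -0500] "GET / HTTP/1.1" 200 512 "-" "libwww-perl/5.805"
1.2.3.4 - - [10/May/2008:12:00:01 -0500] "GET /feed HTTP/1.1" 200 512 "-" "libwww-perl/5.805"
5.6.7.8 - - [10/May/2008:12:00:02 -0500] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.2)"
9.9.9.9 - - [10/May/2008:12:00:03 -0500] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows; U)"
EOF

# In the combined format, the User-Agent is the sixth quote-delimited
# field. Rank agents by request count; frequently appearing agents with
# suspicious names are candidates for the blacklist.
awk -F'"' '{print $6}' sample-access.log | sort | uniq -c | sort -rn
```

Run against your real access log, the noisiest offenders float to the top, ready for copy-and-paste into a new SetEnvIfNoCase line.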

About the Code..

Before we close, let’s examine the process whereby this user-agent blacklist code operates. In each of the first seven lines, we use Apache’s SetEnvIfNoCase directive to perform a case-insensitive match of the quoted string (e.g., libwww-perl) anywhere within the User-Agent header. Each of these lines then sets an environment variable named keep_out for any matching request. In the last five lines, we deny GET, POST, and PUT requests from any user agent tagged with the keep_out variable. Everyone else enjoys access as usual. :)
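One caveat worth noting: the Order/Allow/Deny syntax used above is the classic Apache 2.2-era idiom. If your server runs Apache 2.4 or later (an assumption beyond the original code, which targets 2.2), the same keep_out logic may be sketched with the newer authorization directives:

```apache
# Hypothetical Apache 2.4+ equivalent: Order/Allow/Deny are deprecated
# in favor of Require-based authorization.
SetEnvIfNoCase User-Agent "libwww-perl" keep_out
<RequireAll>
	# Grant access to everyone...
	Require all granted
	# ...except requests tagged with the keep_out variable.
	Require not env keep_out
</RequireAll>
```

The SetEnvIfNoCase lines carry over unchanged; only the access-control block differs between the two Apache generations.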

Open Call for Fresh Entries..

In a sense, this new user-agent blacklist is a reborn version of the ultimate htaccess blacklist: it is as if we wiped the slate clean, reevaluated current conditions, and repopulated the list with a much more concise, customized, and relevant selection of user agents. The bots included in the list at this point are the result of my personal research; however, if you know of other sinister agents that are worthy, please drop a comment and share the news. If possible, provide a link to some form of documentation or other verification (blog post, article, news report, whatever) to demonstrate malevolence. Muchas gracias! ;)

Next..

Stay tuned for the continuation of the Building the 3G Blacklist series, Part 4: Improving the RedirectMatch Directives of the Original 2G Blacklist. If you have yet to do so, I highly encourage you to subscribe to Perishable Press. As always, thank you for your generous attention.