6G Beta

Since releasing the 5G Blacklist earlier this year, malicious server scans and bad requests have surged with more novel attacks than I’ve seen since first getting into this stuff six years ago. In other words, now is the time to beef up security and lock things down. If you’re into monitoring your server and knowing your traffic, you may be observing the same recent spike in malicious activity. In response to these attacks, I’ve been secretly working on the next generation of G-series blacklist, the inevitable 6G Firewall.

Before getting started, take a moment to read through the important notes, which contain information about using blacklists, server requirements, licensing, and other details. Then after presenting the 6G beta, we’ll jog through some of the thinking and strategy going into the code. Even without trying the blacklist, reading through “building the 6G Blacklist” should prove a beneficial exercise in pattern-matching and protecting against malicious HTTP behavior.

6G Blacklist beta

The 6G consists of the following sections:

  • # 6G:[REQUEST STRINGS]
  • # 6G:[QUERY STRINGS]
  • # 6G:[USER AGENTS]
  • # 6G:[REFERRERS]
  • # 6G:[BAD IPS]

Each of these sections works independently of the others, such that you could, say, omit the entire query-string and IP-address blocks and the remaining sections would continue to work just fine. Mix-n-match to suit your needs. This code is formatted for deployment in your site’s root .htaccess file.

# 6G BLACKLIST/FIREWALL (beta)
# @ http://perishablepress.com/6g-beta/

# 6G:[REQUEST STRINGS]
<ifModule mod_alias.c>
 RedirectMatch 403 /(\$|\*)/?$
 RedirectMatch 403 (?i)(<|>|:|;|\'|\s)
 RedirectMatch 403 (?i)([a-zA-Z0-9]{18})
 RedirectMatch 403 (?i)(https?|ftp|php)\:/
 RedirectMatch 403 (?i)(\"|\.|\_|\&|\&amp)$
 RedirectMatch 403 (?i)(\=\\\'|\=\\%27|/\\\'/?)\.
 RedirectMatch 403 (?i)/(author\-panel|submit\-articles)/?$
 RedirectMatch 403 (?i)/(([0-9]{5})|([0-9]{6}))\-([0-9]{10})\.(gif|jpg|png)
 RedirectMatch 403 (?i)(\,|//|\)\+|/\,/|\{0\}|\(/\(|\.\.|\+\+\+|\||\\\"\\\")
 RedirectMatch 403 (?i)/uploads/([0-9]+)/([0-9]+)/(cache|cached|wp-opt|wp-supercache)\.php
 RedirectMatch 403 (?i)\.(asp|bash|cfg|cgi|dll|exe|git|hg|ini|jsp|log|mdb|out|sql|svn|swp|tar|rar|rdf|well)
 RedirectMatch 403 (?i)/(^$|1|addlink|btn_hover|contact?|dkscsearch|dompdf|easyboard|ezooms|formvars|fotter|fpw|i|imagemanager|index1|install|iprober|legacy\-comments|join|js\-scraper|mapcms|mobiquo|phpinfo|phpspy|pingserver|playing|postgres|product|register|scraper|shell|signup|single\-default|t|sqlpatch|test|textboxes.css|thumb|timthumb|topper|tz|ucp_profile|visit|webring.docs|webshell|wp\-lenks|wp\-links|wp\-plugin|wp\-signup|wpcima|zboard|zzr)\.php
 RedirectMatch 403 (?i)/(\=|\$\&|\_mm|administrator|auth|bytest|cachedyou|cgi\-|cvs|config\.|crossdomain\.xml|dbscripts|e107|etc/passwd|function\.array\-rand|function\.parse\-url|livecalendar|localhost|makefile|muieblackcat|release\-notes|rnd|sitecore|tapatalk|wwwroot)
 RedirectMatch 403 (?i)(\$\(this\)\.attr|\&pws\=0|\&t\=|\&title\=|\%7BshopURL\%7Dimages|\_vti\_|\(null\)|$itemURL|ask/data/ask|com\_crop|document\)\.ready\(fu|echo.*kae|eval\(|fckeditor\.htm|function.parse|function\(\)|gifamp|hilton.ch|index.php\&amp\;quot|jfbswww|monstermmorpg|msnbot\.htm|netdefender/hui|phpMyAdmin/config|proc/self|skin/zero_vote|/spaw2?|text/javascript|this.options)
</ifModule>

# 6G:[QUERY STRINGS]
<IfModule mod_rewrite.c>
 RewriteCond %{REQUEST_URI} !^/$ [NC]
 RewriteCond %{QUERY_STRING} (mod|path|tag)= [NC,OR]
 RewriteCond %{QUERY_STRING} ([a-zA-Z0-9]{32}) [NC,OR]
 RewriteCond %{QUERY_STRING} (localhost|loopback|127\.0\.0\.1) [NC,OR]
 RewriteCond %{QUERY_STRING} (\?|\.\./|\.|\*|:|;|<|>|'|"|\)|\[|\]|=\\\'$|%0A|%0D|%22|%27|%3C|%3E|%00|%2e%2e) [NC,OR]
 RewriteCond %{QUERY_STRING} (benchmark|boot.ini|cast|declare|drop|echo.*kae|environ|etc/passwd|execute|input_file|insert|md5|mosconfig|scanner|select|set|union|update) [NC]
 RewriteRule .* - [F,L]
</IfModule>

# 6G:[USER AGENTS]
<ifModule mod_setenvif.c>
 #SetEnvIfNoCase User-Agent ^$ keep_out
 SetEnvIfNoCase User-Agent (<|>|'|&lt;|%0A|%0D|%27|%3C|%3E|%00|href\s) keep_out
 SetEnvIfNoCase User-Agent (archiver|binlar|casper|checkprivacy|clshttp|cmsworldmap|comodo|curl|diavol|dotbot|email|extract|feedfinder|flicky|grab|harvest|httrack|ia_archiver|jakarta|kmccrew|libwww|loader|miner|nikto|nutch|planetwork|purebot|pycurl|python|scan|skygrid|sucker|turnit|vikspider|wget|winhttp|youda|zmeu|zune) keep_out
 <limit GET POST PUT>
  Order Allow,Deny
  Allow from all
  Deny from env=keep_out
 </limit>
</ifModule>

# 6G:[REFERRERS]
<IfModule mod_rewrite.c>
 RewriteCond %{HTTP_REFERER} (<|>|'|%0A|%0D|%27|%3C|%3E|%00) [NC,OR]
 RewriteCond %{HTTP_REFERER} ([a-zA-Z0-9]{32}) [NC]
 RewriteRule .* - [F,L]
</IfModule>

# 6G:[BAD IPS]
<Limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 # uncomment/edit/repeat next line to block IPs
 # Deny from 123.456.789
</Limit>

Whoop, there it is, but only for testing at this point. So let me know in the comments or via email with any discoveries on 6G beta. I’ll give it at least a month or so before rolling out the official release of the 6G. This beta version is admittedly heavy-handed in some areas, so plenty of edits are expected in the process of fine-tuning and dialing it in. Your help in this process is HUGE and appreciated by myself and other 6G users.

Alright, that’s that. New beta version, but how does it work? Let’s continue with some of the thinking and strategy going into the 6G Firewall..

Behind the scenes / development strategy

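Filtering URL requests with Apache involves various modules and directives. The 6G beta relies on mod_alias (RedirectMatch), mod_rewrite (RewriteCond and RewriteRule), and mod_setenvif (SetEnvIfNoCase), along with the Limit, Order, Allow, and Deny directives for access control.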

These modules enable us to filter different parts of the request, such as the user-agent, referrer, and request-string. They operate both autonomously and cumulatively, providing much control over specific HTTP activity and server traffic in general. Apache gives us numerous ways to blacklist bad requests, bad user agents, and bad query strings. To better understand how the 6G Firewall works, let’s zoom in on the different modules & directives and examine some concrete examples..

Front Line: Request strings

Apache’s mod_alias module enables our frontline of defense via the RedirectMatch directive. RedirectMatch (RM) filters the path portion of the requested URL, that is, everything after the domain name and before any query string. Here are some examples of the types of nasty URL requests that are easily blocked via mod_alias/RM:

http://example.com/wp-content/themes/mimboedited/timthumb.php
http://example.com/themes/SimplePress/timthumb.php?src=http%3a%2f
http://example.com/plugins/auto-attachments/timthumb.php?src=http%3A%2F%2Fpicasa.com.ipsupply.com.au%2Fwp-content%2Fuploads%2F2012%2F03%2FIN.php
http://example.com/timthumb.php?src=http%3a%2f
http://example.com/timthumb.php?src=http%3A%2F%2Fflickr.com.bpmohio.com%2Fbad.php
http://example.com/timthumb/timthumb.php?src=http%3A%2F%2Fflickr.com.bpmohio.com%2Fbad.php
http://example.com/timthumb.php?src=http%3A%2F
http://example.com/themes/coda/timtumb.php?src=
http://example.com/timthumb.php?src=http%3A%2F%2Fpicasa.com.ipsupply.com.au%2Fwp-content%2Fuploads%2F2012%2F03%2FIN.php
http://example.com/timthumb.phptimthumb.php?src=
http://example.com/timthumb.phptimthumb.php?src=

http://example.com/wp-content/themes/chapters/thumb.php?src=http%3a%2f%2fpicasa.combos.orgasmguide.org/tmp.php
http://example.com/wp-content/themes/chapters/thumb.php?src=http%3a%2f%2fpicasa.combos.orgasmguide.org/byroe.php

This is a great example as it shows varieties of possibly the most-scanned-for target ever: timthumb.php and its numerous incarnations. Malicious scanners also frequently target files named thumb.php and similar. Recursive scans can mean hundreds or thousands of requests hitting your server in short periods of time. This drains resources and negatively impacts site performance. As if that’s not reason enough to block such activity, if the target vulnerability is actually found on your server, it’s “game over”. So the 6G protects by blocking requests for both thumb.php and timthumb.php, using logic similar to this:

RedirectMatch 403 (?i)/(thumb|timthumb)\.php

That one line in your .htaccess file will block all URL requests that include either thumb.php or timthumb.php (not including the query string). This helps keep many malicious requests at bay, freeing up valuable resources for legit requests. Note that if you are using timthumb or a similar “thumb” script on your site, you will need to remove the thumb|timthumb| string from 6G (REQUEST STRINGS section).
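
If your site does depend on one copy of the script, removing the pattern entirely is not the only option. As a rough sketch (not part of the 6G itself), a negative lookahead can exempt a single trusted path while still blocking every other request for thumb.php or timthumb.php. The theme path below is a hypothetical placeholder, and the pattern assumes your server's RedirectMatch regex engine supports lookaheads (the PCRE engine used by Apache 2.x does):

```apache
# Sketch only: exempt one trusted copy, block all other (tim)thumb requests.
# "/wp-content/themes/mytheme/timthumb.php" is a hypothetical path; use your own.
<IfModule mod_alias.c>
 RedirectMatch 403 (?i)^/(?!wp-content/themes/mytheme/timthumb\.php)(.*/)?(thumb|timthumb)\.php
</IfModule>
```

For this to have any effect, thumb|timthumb| would still need to come out of the long REQUEST STRINGS rule, otherwise that rule blocks the trusted copy as well.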

The first “REQUEST-STRINGS” section in the 6G uses this strategy to block many different types of malicious requests. With each generation of the 6G, the various rules and patterns are further refined and updated to block the most dangerous and relevant types of requests. Pattern-matching with regular expressions enables us to block many different types of threats; however, as precise as we can get, there remain commonly scanned-for targets that are simply too common or too general to block effectively. Consider the following examples:

http://example.com/[path]/share
http://example.com/[path]/login
http://example.com/[path]/signin
http://example.com/[path]/accepted
http://example.com/[path]/feed.php
http://example.com/[path]/form.php
http://example.com/[path]/format.php
http://example.com/[path]/plugin-editor.php
http://example.com/[path]/post.php
http://example.com/[path]/post-new.php
http://example.com/[path]/wp-comments-post.php
http://example.com/[path]/wp-conf.php
http://example.com/[path]/wp-error.php
http://example.com/[path]/wp-library.php
http://example.com/[path]/wp-post.php
http://example.com/[path]/update.php
http://example.com/[path]/upload.php

In these example URLs, the target string is the part appearing immediately after “http://example.com/[path]/”. That prefix is necessary to include in this post because it prevents sloppy search engines and bad bots from following these supposedly “relative” links and generating further 404 errors. But I digress.. the point here is that malicious scans frequently target existing files that are too common to block in a widely distributed firewall such as 6G. If you’re getting hit with many requests for common/well-known files, my best advice is to custom-craft a few rules based on the actual structure and content of your site.

As a quick example, let’s say the server is getting hammered by malicious requests targeting a file named post.php. This file name is too common to blacklist in the 6G, even though it is trivial to block on an individual basis. Here at Perishable Press, I’m running WordPress in a subdirectory named “/wp/”, so I know immediately that I can safely block all requests for post.php that aren’t located in the /wp/ directory.

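# Let the real admin file through (WordPress lives in /wp/ on this site)
RewriteCond %{REQUEST_URI} !^/wp/wp-admin/post\.php [NC]
# Any other request for post.php is bogus, so forbid it
RewriteCond %{REQUEST_URI} /post\.php [NC]
RewriteRule .* - [F,L]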

Similarly, as the real post.php file is located in the /wp/ subdirectory and not in the root, we can use mod_alias RedirectMatch to block all requests for the file at the location it would occupy in a root install of WordPress:

RedirectMatch 403 ^/wp-admin/post\.php

With either of these methods, other common files are easily added to the rule, safely eliminating extraneous requests for non-existent files. This example serves to demonstrate one of the shortcomings of any copy/paste blacklist, while illustrating the importance of customizing and fine-tuning your own security strategy.
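
To illustrate adding other common files to the rule, here is a minimal sketch that extends the mod_rewrite version. It assumes the same setup described above (WordPress installed in /wp/) and borrows a few admin file names from the list of commonly scanned targets; adjust both to match your own site:

```apache
<IfModule mod_rewrite.c>
 RewriteEngine On
 # Assumption: WordPress lives in /wp/, so the real files are under /wp/wp-admin/
 RewriteCond %{REQUEST_URI} !^/wp/wp-admin/ [NC]
 # Forbid requests for these admin file names anywhere else on the site
 RewriteCond %{REQUEST_URI} /(post|post-new|plugin-editor|update|upload)\.php$ [NC]
 RewriteRule .* - [F,L]
</IfModule>
```

Keep in mind that this also blocks any unrelated script that happens to share one of these names outside of /wp/wp-admin/, so check your own file structure before adding a name to the list.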

Filtering Query strings

Some URLs include a query-string, which is appended to the URL via a question mark (?). Query strings tend to look like gibberish or random strings to the uninitiated, but are actually highly specific, well-structured data used to communicate between browser and server. Without knowing what’s happening on your server, it may be difficult to discern between good and bad query-string requests, but there are some things to look for:

  • Unusual and/or unexpected characters such as additional question marks, angle brackets, asterisks, and so on
  • Unencoded characters that should be encoded, such as these: $ & + , / : ; = ? @
  • Super-long random-looking strings of encoded gibberish, alphanumeric or laced with symbols such as %
  • Super-short query strings that may seem to terminate abruptly, often with a single quote ('), double quote ("), or equal sign (=)

There are other signs as well, but ultimately it comes down to whether the request is understood or not by the server. If it’s not, the request could be a simple 404 error or similar, or it could be malicious. Generally the one-off 404s are the result of typos or other human errors, and tend to appear sporadically or infrequently in the server-access logs. Contrast this with malicious query-string requests that occur frequently, in rapid succession, targeting non-existent files with encoded gibberish and other nonsense.

With the 5G Blacklist in place, many evil query-string requests are blocked, but with the recent surge of scanning activity, a new breed of encoded nasty was getting through, looking similar to these examples:

?aHR0cDovL3BlcmlzaGFibGVwcmVzcy5jb20vY3NzLWltYWdlLWNhY2hpbmcv==
?aHR0cDovL3BlcmlzaGFibGVwcmVzcy5jb20vaHRtbDUtdGFibGUtdGVtcGxhdGUv==
?aHR0cDovL3BlcmlzaGFibGVwcmVzcy5jb20vYmFzaWMtZG9zLWNvbW1hbmRzLw==
?aHR0cDovL3BlcmlzaGFibGVwcmVzcy5jb20vd2hhdC1pcy1teS13b3JkcHJlc3MtZmVlZC11cmwv
?aHR0cDovL3BlcmlzaGFibGVwcmVzcy5jb20vcHJlc3MvMjAwNy8wMS8xNi9tYXhpbXVtLWFuZC1taW5pbXVtLWhlaWdodC1hbmQtd2lkdGgtaW4taW50ZXJuZXQtZXhwbG9yZXIv
?actions=get_wp_version%2Cget_plugins%2Cget_themes%2Csupports_backups%2Cget_filesystem_method&wpr_api_key=15644F32D7D80B3150710834D8F406E9&t=1335026415
?actions=get_wp_version%2Cget_plugins%2Cget_themes%2Csupports_backups%2Cget_filesystem_method&wpr_api_key=15644F32D7D80B3150710834D8F406E9&t=1335026385

As you can see, these malicious strings contain numerous common-denominators that could be matched against, such as:

  • %2C matching the URL-encoded (hex) comma (,) would be partially effective
  • == matching two equal signs would be partially effective
  • Other character combinations..?

We could match the hex-encoded comma, but that’s such a common character that it would cause more problems than it would solve (in most cases), so really not an option. Looking closely at other possible character-combinations, suddenly the “least-common denominator” hits you: long, random sequences of alphanumeric characters appear in all of these examples, and many others that I’ve encountered. Thus, in the query-string section of the 6G, excessively long strings of alphanumeric characters are effectively blocked with the following rule:

RewriteCond %{QUERY_STRING} ([a-zA-Z0-9]{32}) [NC,OR]

Yeah.. the trick here is choosing the optimal number of sequential characters to match against. If we set the match to, say, {16}, the number of false positives increases; conversely, if we set the match to a larger number, such as {64}, the number of false negatives increases. So once again it’s all about finding the balance.
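
If a legitimate part of your site does pass long alphanumeric values in the query string, one way to handle the resulting false positives is to exempt that specific case rather than loosen the rule for everyone. The following is only a sketch of that idea; the parameter name "token" is a hypothetical stand-in for whatever your own application uses:

```apache
<IfModule mod_rewrite.c>
 RewriteEngine On
 # Hypothetical exemption: skip requests that carry a known-good "token" parameter
 RewriteCond %{QUERY_STRING} !(^|&)token= [NC]
 # Otherwise forbid any run of 32 or more alphanumeric characters
 RewriteCond %{QUERY_STRING} [a-z0-9]{32} [NC]
 RewriteRule .* - [F,L]
</IfModule>
```

The trade-off is that anyone who guesses the exempted parameter name can slip past this one check, so an exemption like this is best kept as narrow as possible.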

An important note about placement of the 6G query-string rules within the .htaccess file: if the query-string rules don’t seem to be working, try moving them to appear before any other mod_rewrite rules that may be in play. I’m not sure exactly why this is the case, but a likely explanation is that an earlier RewriteRule carrying the [L] flag ends rewrite processing for the request before the 6G conditions are ever evaluated. Any info on this would be appreciated :)
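
As a sketch of that ordering, here is a shortened query-string block placed ahead of other rewrite rules. It assumes the "other mod_rewrite rules" are the standard permalink block from a root install of WordPress; your own rules may differ:

```apache
# 6G-style query-string check goes first (one condition shown for brevity)
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{QUERY_STRING} (localhost|loopback|127\.0\.0\.1) [NC]
 RewriteRule .* - [F,L]
</IfModule>

# BEGIN WordPress (standard permalink rules, assumed here for illustration)
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteBase /
 RewriteRule ^index\.php$ - [L]
 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteCond %{REQUEST_FILENAME} !-d
 RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```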

Blocking Bad User-agents

The next two sections in the 6G protect your site against some of the worst user-agents and referrers. The technique is essentially the same as with the request-string and query-string sections, but filters different properties of the request.

The specified user-agent of a request may consist of multiple elements, and it may be empty. Previous versions of the g-series blacklist block empty (or “blank”) user-agents with the following rule:

SetEnvIfNoCase User-Agent ^$ keep_out

This rule “flags” any request from a blank user-agent, and worked well for many years. These days, however, social-media, mobile apps, PayPal, and certain Ajax requests frequently use an empty string as the user-agent when interacting with the server. For example, Google requires the blank user-agent in order to display thumbnails for Google+. So at this point, weighing the pros and cons of blocking empty user-agents is a no-brainer, and the rule is now “deprecated” (commented out) with a pound sign (#).

Beyond this, the 6G USER-AGENTS section includes new rules for blocking the following agents:

  • binlar
  • nutch
  • sucker
  • zmeu

Plus around 20 other nasty agents are blocked in the 5G, with the entire “USER-AGENT” section included as sort of a template for individual customization. Unfortunately, there are increasing numbers of malicious strings being passed as the user-agent, so the 6G includes more protection in this area. The 6G not only blocks additional well-known bad agents, it protects against encoded strings, forbidden characters, and other malicious garbage. Most of this is accomplished with a single new directive:

SetEnvIfNoCase User-Agent (<|>|'|&lt;|%0A|%0D|%27|%3C|%3E|%00|href\s) keep_out

These character strings have no business appearing in the user-agent string. Most if not all of the widely used browsers such as Firefox, Chrome, Opera, IE, mobile browsers, feed readers, and even borderline/questionable scripts and bots refrain from using any of these forbidden characters in their user-agent description. For example, here is Chrome’s reported user-agent:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5

Legitimate user-agents contain only valid strings, so blocking illegal characters is an effective way to filter directory-traversals, XSS attacks, and other malicious exploits.
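
Since the USER-AGENTS section doubles as a template, adding your own entries is mostly a matter of appending names to the list. Here is a minimal sketch using the same keep_out variable as the 6G; the agent names "examplebot" and "badscraper" are hypothetical placeholders for whatever shows up in your own logs:

```apache
<IfModule mod_setenvif.c>
 # Hypothetical agent names; replace with strings found in your access logs
 SetEnvIfNoCase User-Agent (examplebot|badscraper) keep_out
 <Limit GET POST PUT>
  Order Allow,Deny
  Allow from all
  Deny from env=keep_out
 </Limit>
</IfModule>
```

Because the match is a case-insensitive substring, shorter strings catch more agents but also raise the odds of blocking something legitimate, so prefer the most specific string that still identifies the offender.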

Blocking Bad Referrers

The 6G Firewall/Blacklist also includes new directives for blocking bad referrers. The strategy here is similar to that of the additional QUERY-STRING rules: filtering malicious character-strings to protect against bad referrers. Referrer information isn’t always included with the request, so we don’t want to block blank referrers, but forbidden characters are safely blocked, as are long strings (32 characters or more) of strictly alphanumeric characters. A simple and effective strategy using the following two filters:

RewriteCond %{HTTP_REFERER} (<|>|'|%0A|%0D|%27|%3C|%3E|%00) [NC,OR]
RewriteCond %{HTTP_REFERER} ([a-zA-Z0-9]{32}) [NC]

This also serves as a template for further customization. If you’re seeing lots of weird referrers filling your access/error logs, the REFERRERS section of the 6G will help to curb the riff-raff.
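
As one possible customization, the two filters can be extended with a third condition that targets a specific referring domain. This is a sketch only; "spam-domain.example" is a made-up placeholder, not a real offender:

```apache
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{HTTP_REFERER} (<|>|'|%0A|%0D|%27|%3C|%3E|%00) [NC,OR]
 RewriteCond %{HTTP_REFERER} ([a-zA-Z0-9]{32}) [NC,OR]
 # Hypothetical addition: forbid one referring domain and its subdomains
 RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?spam-domain\.example(/|$) [NC]
 RewriteRule .* - [F,L]
</IfModule>
```

Note that every condition except the last carries the OR flag, so a match on any one of them is enough to trigger the 403.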

Blocking Bad IPs

Blocking by IP address is best used for specific threats, either individual, or by region, country, or similar. With a strong firewall, blocking IPs is unnecessary unless someone or something is attacking you specifically with requests that aren’t being blocked. I’ve heard from a number of people saying that their sites are being targeted/harassed by weird stalkers, enemies, spurned lovers, and it goes on and on. I’ve experienced this through a chat/forum site that had attracted all sorts of low-life, bottom-feeding douche-bags. They would just jump into the chat at random and ruin the conversation with potty humor and juvenile slurs. The PHP blacklist for the chat script wasn’t catching a lot of the garbage, so it was a perfect time to check the logs and ban the fools individually. After a bit of research and a few lines of .htaccess, the idiots were gone and peace was restored to the chat forum.

Thus, for the purpose of blocking individual threats, the “bad-IPs” section of the 6G is entirely optional and intended as a template to use should the need arise. By default, the bad-IPs section in the 6G is empty, but over the past few months I’ve assembled my own private collection of blacklisted IP addresses. These are some of the worst offenders I’ve seen this year:

# 2012 IP Blacklist
 Deny from 24.213.139.114
 Deny from 87.144.218.222
 Deny from 95.5.32.79
 Deny from 213.251.186.27
 Deny from 88.191.93.186
 Deny from 91.121.136.44
 Deny from 50.56.92.47
 Deny from 174.143.148.105
 Deny from 82.170.168.91
 Deny from 61.147.110.14
 Deny from 188.134.42.65
 Deny from 122.164.215.155
 Deny from 65.49.68.173
 Deny from 220.155.1.166
 Deny from 218.38.16.26
 Deny from 91.200.19.84
 Deny from 31.44.199.131
 Deny from 49.50.8.63

Including these IPs is entirely optional; they are provided here mostly for reference, but also for the über-paranoid faction ;)
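
Should the need arise, a filled-in version of the BAD IPS template might look something like the following sketch. The addresses are placeholders taken from the ranges reserved for documentation, not real offenders, and the example shows the three common ways of writing a Deny line with the Order/Allow/Deny syntax used throughout the 6G:

```apache
<Limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 # Single address (placeholder)
 Deny from 192.0.2.15
 # Partial address: everything beginning with 198.51.100
 Deny from 198.51.100
 # CIDR range (placeholder)
 Deny from 203.0.113.0/24
</Limit>
```

Blocking whole ranges is a blunt instrument, so it is usually worth starting with individual addresses and widening only if the same network keeps coming back.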

Further reading..

For more information on blacklisting, regular expressions, and .htaccess methods, there are many more articles in the Perishable Press Archives.

Thanks to..

Thank you to everyone who contributes to the g-series blacklist with feedback, suggestions, test-results, and links. Specifically for the 6G beta, huge thanks go to Ken Dawes and Andy Wrigley for their generous help.

Important notes..

This is the beta release of the 6G Blacklist. There have been many improvements, including optimized code, greater accuracy, and better overall protection. I’ve been running the 6G (in its various incarnations) here at Perishable Press for the past several weeks and have been well-pleased with the results. The 6G is pretty slick stuff, but there are some important things to keep in mind:

It takes more than a blacklist to secure your site
No one single security measure is perfect; good security is the result of many concerted strategic layers of protection. The 6G is designed to better secure your site by adding a strong layer of protection.
Sometimes blacklists block legit requests
A perfect firewall would block only bad traffic, but in reality it’s inevitable that some good requests get blocked. The goal is to keep the number of false positives to a minimum while maximizing the effectiveness of the ruleset. It’s a statistical game of sorts.
Resolving issues..
If/when you do encounter a potential false positive (e.g., you can’t load a certain page), there is a simple way to determine if it’s that crazy chunk of blacklist code you stuck into your .htaccess file. If you remove the blacklist and the page in question begins to work, well, you can either forget about it or take a few moments to locate the offending rule and remove it from the list. I’ve found that it’s better to “comment out” rather than delete as it’s easier to keep track of things when the inevitable next version of the blacklist hits the streets.
This is beta.
And most importantly, this is the beta version of the 6G. As mentioned, there’s a lot of new stuff happening with this blacklist, and it’s super-important for me to thoroughly test via the widest base possible. Only use this code if you are savvy and want to help out by reporting data, errors, logs, or whatever. That said, this “beta” version has been running flawlessly on multiple sites, including one that’s super-complex with many themes, plugins, and customizations (i.e., this site).
It’s all you.
Once the code leaves this site, you assume all responsibility. Always back up your original working .htaccess file and you should be good to go.
Server requirements
Linux/Apache or similar (if adapted). 6G is formatted for deployment in .htaccess files, and also works when formatted for use directly in the Apache main configuration file. The Apache modules used are the ones named in the code above: mod_alias, mod_rewrite, and mod_setenvif.
License
GNU General Public License.

I freely share my work on the g-series blacklist to help the community better protect their sites against malicious activity. If you find it useful, please show support by linking and sharing so others may learn and benefit as well. Thanks.