5G Blacklist 2012

The 5G Blacklist helps reduce the number of malicious URL requests that hit your website. It’s one of many ways to improve the security of your site and protect against evil exploits, bad requests, and other nefarious garbage. If you’re tired of all the slow, bloated security plugins and expensive 3rd-party services, the 5G Blacklist is a solid solution to help protect your Apache-powered site.

Update: Check out the new and improved 6G Firewall.

Evolution

After extensive beta testing, the 5G Blacklist/Firewall is solid and ready to help secure sites hosted on Apache servers. The 5G went through its own beta, and it is the fifth major update of my “G”-series blacklists. Here is a quick overview of the series so far:

  1. Ultimate htaccess Blacklist (Compressed Version)
  2. 2G Blacklist: Closing the Door on Malicious Attacks
  3. Perishable Press 3G Blacklist
  4. The Perishable Press 4G Blacklist
  5. 5G Firewall (Beta)

Along the way, I’ve explored a wide variety of different blacklist techniques. The 5G is the culmination of all these efforts, and will eventually be replaced by the imminent 6G Blacklist/Firewall.

What it does

The 5G Blacklist is a simple, flexible blacklist that checks all URI requests against a series of carefully constructed HTAccess directives. This happens quietly behind the scenes at the server level, so blocked requests never reach PHP or MySQL and waste no resources there.

How it works

Blacklists can block just about any part of a request: IP, user agent, request string, query string, referrer, and everything in between. But IP addresses change constantly, and user agents and referrers are easily spoofed. As discussed, request strings yield the best results: greater protection with fewer false positives.

The 5G works beautifully with WordPress, and should help any site conserve bandwidth and server resources while protecting against malicious activity.

How to use

To install the 5G Firewall, append the following code to your site’s root .htaccess:

# 5G BLACKLIST/FIREWALL
# @ https://perishablepress.com/5g-blacklist-2012/

# 5G:[QUERY STRINGS]
<ifModule mod_rewrite.c>
 RewriteEngine On
 RewriteBase /
 RewriteCond %{QUERY_STRING} (environ|localhost|mosconfig|scanner) [NC,OR]
 RewriteCond %{QUERY_STRING} (menu|mod|path|tag)\=\.?/? [NC,OR]
 RewriteCond %{QUERY_STRING} boot\.ini  [NC,OR]
 RewriteCond %{QUERY_STRING} echo.*kae  [NC,OR]
 RewriteCond %{QUERY_STRING} etc/passwd [NC,OR]
 RewriteCond %{QUERY_STRING} \=\\%27$   [NC,OR]
 RewriteCond %{QUERY_STRING} \=\\\'$    [NC,OR]
 RewriteCond %{QUERY_STRING} \.\./      [NC,OR]
 RewriteCond %{QUERY_STRING} \?         [NC,OR]
 RewriteCond %{QUERY_STRING} \:         [NC,OR]
 RewriteCond %{QUERY_STRING} \[         [NC,OR]
 RewriteCond %{QUERY_STRING} \]         [NC]
 RewriteRule .* - [F]
</ifModule>

# 5G:[USER AGENTS]
<ifModule mod_setenvif.c>
 SetEnvIfNoCase User-Agent ^$ keep_out
 SetEnvIfNoCase User-Agent (casper|cmsworldmap|diavol|dotbot)   keep_out
 SetEnvIfNoCase User-Agent (flicky|ia_archiver|jakarta|kmccrew) keep_out
 SetEnvIfNoCase User-Agent (libwww|planetwork|pycurl|skygrid)   keep_out
 SetEnvIfNoCase User-Agent (purebot|comodo|feedfinder|turnit)   keep_out
 SetEnvIfNoCase User-Agent (zmeu|nutch|vikspider|binlar|sucker) keep_out
 <limit GET POST PUT>
  Order Allow,Deny
  Allow from all
  Deny from env=keep_out
 </limit>
</ifModule>

# 5G:[REQUEST STRINGS]
<ifModule mod_alias.c>
 RedirectMatch 403 (https?|ftp|php)\://
 RedirectMatch 403 /(cgi|https?|ima|ucp)/
 RedirectMatch 403 /(Permanent|Better)$
 RedirectMatch 403 (\=\\\'|\=\\%27|/\\\'/?|\)\.css\()$
 RedirectMatch 403 (\,|//|\)\+|/\,/|\{0\}|\(/\(|\.\.\.|\+\+\+|\||\\\"\\\")
 RedirectMatch 403 \.(cgi|asp|aspx|cfg|dll|exe|jsp|mdb|sql|ini|rar)$
 RedirectMatch 403 /(contac|fpw|install|pingserver|register)\.php$
 RedirectMatch 403 (base64|crossdomain|localhost|wwwroot|e107\_)
 RedirectMatch 403 (eval\(|\_vti\_|\(null\)|echo.*kae|config\.xml)
 RedirectMatch 403 \.well\-known/host\-meta
 RedirectMatch 403 /function\.array\-rand
 RedirectMatch 403 \)\;\$\(this\)\.html\(
 RedirectMatch 403 proc/self/environ
 RedirectMatch 403 msnbot\.htm\)\.\_
 RedirectMatch 403 /ref\.outcontrol
 RedirectMatch 403 com\_cropimage
 RedirectMatch 403 indonesia\.htm
 RedirectMatch 403 \{\$itemURL\}
 RedirectMatch 403 function\(\)
 RedirectMatch 403 labels\.rdf
 RedirectMatch 403 /playing\.php
 RedirectMatch 403 muieblackcat
</ifModule>

# 5G:[BAD IPS]
<limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 # uncomment/edit/repeat next line to block IPs
 # Deny from 123.456.789
</limit>

That’s the golden ticket right there. The 5G Firewall is serious protection for your website: extensively tested, plug-n-play, and completely free. “Grab, gulp, n go” as they say. For more information, see the beta article (and comments).
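
One caveat worth noting: the Order, Allow, and Deny directives in the USER AGENTS and BAD IPS sections are Apache 2.2 syntax. On Apache 2.4 they only work when mod_access_compat is loaded. The following is a minimal sketch of how those two access-control blocks might be expressed with the 2.4 Require directives instead. It is a hedged translation, not part of the official 5G: it assumes your host permits AuthConfig overrides in .htaccess, the IP address is a documentation placeholder, and it drops the limit GET POST PUT wrapper, so it applies to every request method.

```apache
# 5G:[USER AGENTS] + [BAD IPS] -- Apache 2.4 sketch (not part of the original 5G)
# Keep the SetEnvIfNoCase lines from the 5G as they are; only the two
# <limit> ... </limit> access-control blocks are replaced by this.
<IfModule mod_authz_core.c>
 <RequireAll>
  # Allow everyone by default
  Require all granted
  # except requests flagged by the SetEnvIfNoCase rules
  Require not env keep_out
  # uncomment/edit/repeat next line to block IPs (placeholder address)
  # Require not ip 192.0.2.10
 </RequireAll>
</IfModule>
```

Use either this block or the original Order/Allow/Deny blocks, not both; mixing the old and new access-control styles in one file tends to produce confusing results.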

Troubleshooting

Remember, test thoroughly. If something stops working when the 5G is installed, try removing the 5G. If things start working normally again, you can either pass on the 5G or investigate further. Investigating further is straightforward using something like the halving method, where you remove chunks of the 5G until you isolate the rule that causes the problem. Here is a quick example:

  • I’ve installed the 5G, thanks Jeff.
  • Uh-oh, the page at http://example.com/indonesia.html stopped loading
  • Hmm, the URL contains the phrase “indonesia”, so let’s check the 5G for it
  • Yep, there’s a rule that blocks indonesia\.htm
  • Removing that line resolves the issue, thanks me.
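
For trickier cases where the offending rule is not obvious from the URL, one round of the halving method might look like the sketch below. It uses a handful of rules from the REQUEST STRINGS section purely for illustration; which rules land in which half is arbitrary, and in practice you would split the whole section.

```apache
# 5G:[REQUEST STRINGS] -- halving, round 1 (illustrative excerpt)
<IfModule mod_alias.c>
 # Still active: if the page keeps failing, the culprit is in this half.
 RedirectMatch 403 proc/self/environ
 RedirectMatch 403 msnbot\.htm\)\.\_
 RedirectMatch 403 /ref\.outcontrol
 RedirectMatch 403 com\_cropimage

 # Temporarily disabled: if the page loads now, the culprit is in this half.
 # RedirectMatch 403 indonesia\.htm
 # RedirectMatch 403 \{\$itemURL\}
 # RedirectMatch 403 function\(\)
 # RedirectMatch 403 labels\.rdf
</IfModule>
```

Re-enable the half that turned out to be innocent, split the suspect half again, and repeat until a single rule remains.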

Is it okay to remove rules that are blocking your own pages? Yes, the only downside is that malicious requests that would have otherwise been blocked will now get through. The 5G will continue to block a massive volume of malicious requests — it’ll just be a bit less effective. The protective effect is cumulative, not dependent on any one rule. So customization is encouraged. Once you dial it in, you’re all set.
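
As one example of dialing it in, a rule can sometimes be narrowed instead of removed. The sketch below assumes, without confirmation from the original rule set, that the unwanted requests end in indonesia.htm. Anchoring the pattern with $ keeps blocking those while letting /indonesia.html through.

```apache
# Original rule: matches "indonesia.htm" anywhere in the URL path,
# so it also blocks /indonesia.html
# RedirectMatch 403 indonesia\.htm

# Narrowed variant (assumption: bad requests end in "indonesia.htm")
<IfModule mod_alias.c>
 RedirectMatch 403 indonesia\.htm$
</IfModule>
```

If the bad requests turn out to use other endings, the narrowed rule will miss them, so check your logs after making a change like this.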

Disclaimer

The 5G Firewall is provided “as-is”, with the intention of helping site administrators protect their sites against bad requests and other malicious activity. The code is open and free to use and modify as long as the first two credit lines remain intact. By using this code you assume all risk & responsibility for anything that happens, whether good or bad. In short, use wisely, test thoroughly, don’t sue me.

Learn more

To learn more about the theory and development of the 5G Firewall, check out my articles on building the 3G, 4G and 5G Blacklist. A search for “blacklist” in the sidebar should also yield many results.

Happy securing!

About the Author
Jeff Starr = Fullstack Developer. Book Author. Teacher. Human Being.

223 responses to “5G Blacklist 2012”

  1. thanks so much for sharing the awesomeness!!! I’m updating all the htaccess files now :D

  2. Grazie mille!

    Many thanks in Italian (or 1000 thanks?)

  3. @Jeff

    What about jeffM’s comment: “..you might try commenting-out the https blocking sections from 5G…”?

    Thanks

    • Any of the 5G patterns/expressions may be removed if needed. The https at the start of a URL is implied and not considered part of the request string, so the rule’s purpose is to stop further instances of the string from being included in the URI.

  4. Everything seemed to work OK on a WordPress Multisite install. I only had to comment out this line:

    #SetEnvIfNoCase User-Agent ^$ keep_out

    As it interfered with a slider image script on 1 page only. It was Not a timthumb script, just FYI.

    Please let me know what that line does, I assume it is if the User-Agent is blank?

    • Yep, that’s exactly what that does – blocks blank user-agents.

      • I’ve not seen “/” as a UA, only as the referrer field: in Nov/2011 from a fastwebserver.de box hosting two sites, one of which was running Zookabot. An earlier visit (May/2011) from another Zookabot script on the same network showed the referrer empty, “-”. So I wonder ‘why the slash?’

        For the UA, I just think ^-?$ is so cheap and sweet, and lets one make interesting noises at parties ;)

    • Not all empty user agents are truly empty. At least one bot has sent a single hyphen as its UA to dodge that regex. Only detectable in real time, it’s cloaked in the logs.

      Cover that with: ^-?$.

      FWIW, I’ve also seen ‘/’ sent as a referrer string.

      • Jeff Starr 2012/01/24 1:11 pm

        Thanks for teh codez jeffM!

        I’ve seen those request strings, crazy stuffz indeed.. say, what’s up with the “only detectable in real time” bit? That sounds interesting, as far as the cloaking and whatnot..

      • If I send you ‘-’ as my UA, I defeat your ^$. You’ll only know I sent ‘-’ if you read it off the wire (real time).

        In your logs you’ll see “-”, which is exactly what I sent, so you think my UA was empty. It wasn’t. It was ‘-’ (but your access log syntax cloaked it).

      • Jeff Starr 2012/01/24 5:04 pm

        Ah, I see what you mean.. I think this only applies to Apache’s built-in logs, correct? For custom logging I think you’ll pick up the true UA (ie, the dash), but it’s been awhile since I’ve seen anything similar roll thru.. thoughts?

      • Not sure what ‘custom logging’ you have in mind, Jeff. I suspect most hosted Apache setups use Combined Logfile Format, which does show referrer and UA.

        What’s sublime about the hyphen trick is that (in a regular CLF log) the faked UA is hidden in plain sight. You might never figure why ^$ didn’t work.

        My instincts tell me it may be more common than we realize.

      • Jeff Starr 2012/01/25 9:54 am

        Custom as in via PHP variables recorded in a log file.. I’m pretty sure the dash-only UAs are recorded, and if so I can say that I’ve not seen enough dash or slash UAs to be concerned about. If the custom methods are somehow also misrepresenting, then yes, it could be way more common than people think.

  5. Daniel Davidson 2012/01/26 11:38 am

    Thanks for making this available. For some reason the section:

    # 5G:[USER AGENTS]

    Was blocking Google Analytics. Am I just being stupid here, why would this be? Is it also blocking general Google indexing? To get it to do tracking I have had to comment out all the way to

    # 5G:[REQUEST STRINGS]

    Any help or advice would be appreciated!

  6. Jeff, thanks a million for this.

    Aside from having an amazing “easiness:effectiveness” ratio, it also provided a situation where I had to learn a little more PHP. (And that’s a good thing!)

    I have a number of scripts I’ve written running on various trusted hosts that call certain other scripts residing on my webserver and process the results. They all broke the instant I added 5G.

    A little troubleshooting revealed that the culprit was “SetEnvIfNoCase User-Agent ^$ keep_out”.

    Who knew that fopen and CURL don’t have a user agent by default?

    I could have just removed the rule, but I knew that was the cheap and dumb way out. So, thanks to you and a little research, I now know about using stream_context_create() with my fopen commands and CURLOPT_USERAGENT with my CURL commands. Everything is back in working order and all my future scripts will be better netizens.

    Thanks!

  7. Jeff,

    Thanks, we are running your 5G blacklist, along with some of the additions from the comments herein, on a new path based multisite install. If we run into any issues we will holler, but so far this seems great.

    Funnily enough, last night we had an attempted hack via the Turkish telecom IP range – previously it was used by Iranian Muslim hackers targeting Christian school sites – defacing them and bringing them down. This is when we got called in, and it will be interesting to see if the security measures we have implemented, including 5G and some of the other commenter suggestions, help put a stop to some of this crap.

    If we can buy you a beer, let us know!

  8. Is there any way to add logging to the 5G script so that anyone blocked is logged in a custom log file? I would love to be able to monitor the results to ensure I am not blocking legit traffic and also see it blocking bad traffic.

    • Jeff Starr 2012/01/29 3:16 pm

      Yes, you could create a custom script using PHP or just about anything, grab some variables from each request, and print them to a log file on the server. But there’s no need to replicate what’s already available to you in your server log(s). It’s trivial to scan for error codes or even use a search/find to highlight all of the errors. The server logs contain much more information as well.

      • Thank you Jeff, I’ve tried googling for information on a script to add logging, but I find every article but what I need. Do you or any readers have a link or two about how to add this logging to the htaccess file?

  9. This is a good article.

    I’m developing an Apache log reporter (for myself), and I’m sure your information will help me improve how I detect and block web attacks.

    Keep up the good work :)
    (BTW, I’m using an IP filter (e.g. IPBlock, PeerBlock) to kick attackers off the web server.)

  10. One very important problem: I discovered this morning that PayPal IPN is blocked because PayPal doesn’t provide a user agent even though people have been screaming at them for ten years to do so. Add in the fact that they change IP addresses from time to time, and there’s no way to block blank user agents without also breaking your ability to receive notifications from PayPal IPN. Ouch.

    • Does the PayPal IPN API provide a referrer?
      You might be able to create an exception based on that.

    • Yeh that’s lame of PayPal if true.. I would just comment out or remove the rule that blocks blank user-agents, at least until PayPal resolves the issue (i.e., never).

    • MickeyRoush 2012/02/01 4:54 am

      You may need to use a rule with a negated (“!”) condition for the PayPal IPN address. I’m not sure which address it is, but here are the links.

      https://ppmts.custhelp.com/app/answers/detail/a_id/92
      https://ppmts.custhelp.com/app/answers/detail/a_id/883/related/1/session/L2F2LzEvdGltZS8xMzE2NjUzODEyL3NpZC9LSDYtYkhFaw%3D%3D
      https://ppmts.custhelp.com/app/answers/detail/a_id/250/related/1/session/L2F2LzEvdGltZS8xMzE2NjUzODEyL3NpZC9LSDYtYkhFaw%3D%3D

      But you could actually whitelist PayPal from the 5G rules using a
      RewriteCond %{REMOTE_ADDR} !^xxx.xxx.xxx.xxx$

      Where xxx.xxx.xxx.xxx is the PayPal IPN.

      • Unfortunately, PayPal changes IPs from time to time. To make matters worse, a big chunk of them are now handled dynamically by Akamai. Even ignoring the complications introduced by Akamai, whitelisting the IPs becomes an on-going chore, and you lose sales any time you fall behind.

      • MickeyRoush 2012/02/03 3:42 pm

        It’s not so much a task if you whitelist by say, Class B, leaving off the ending anchor.

        The Akamai information is covered within those links as well, so using that information it would be quite easy and straightforward to whitelist based on Class B or even Class C, depending on how specific you wanted it.

    • PP recommend DNS lookups, but your hoster may not permit that. Is there not something in IPN’s REQUEST_URI or QUERY_STRING that you could check?

  11. Jeff

    Can I suggest that the mod_setenvif wrapper is maybe not the right place for the empty user agent filter? It’s brutally (potentially fatally) unforgiving since it ignores context. IMHO, the mod_rewrite wrapper (processed later) is the place to express an intelligent decision about the UA.

    My 2 cents.

  12. I have also noticed this blacklist blocks the script from websitedefender.com. Anyone know a way to fix this?
