Perishable Press 3G Blacklist
After much research and discussion, I have developed a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses.
Over time, these mega-lists became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works by targeting common elements of decentralized server attacks.
Consisting of a mere 37 lines, this “2G” Blacklist provided enough protection to enable me to completely eliminate over 350 blacklisting directives from my site’s root htaccess file. This improvement increased site performance and decreased attack rates; however, many bad hits were still getting through. More work was needed..
The 3G Blacklist
Work on the 3G Blacklist required several weeks of research, testing, and analysis. During the development process, five major improvements were discovered, documented, and implemented. Using pattern recognition, access immunization, and multiple layers of protection, the 3G Blacklist serves as an extremely effective security strategy for preventing the vast majority of common exploits. The list consists of four distinct parts that synergize into a comprehensive defense mechanism. Further, as discussed in previous articles, the 3G Blacklist is designed to be as lightweight and flexible as possible, thereby facilitating periodic cultivation and maintenance. Sound good? Here it is:
# PERISHABLE PRESS 3G BLACKLIST
# PART I: CHARACTER STRINGS
<IfModule mod_alias.c>
RedirectMatch 403 \:
RedirectMatch 403 \;
RedirectMatch 403 \<
RedirectMatch 403 \>
RedirectMatch 403 \/\,
RedirectMatch 403 \/\/
RedirectMatch 403 f\-\.
RedirectMatch 403 \.\.\.
RedirectMatch 403 \.inc
RedirectMatch 403 alt\=
RedirectMatch 403 ftp\:
RedirectMatch 403 ttp\:
RedirectMatch 403 \.\$url
RedirectMatch 403 \/\$url
RedirectMatch 403 \/\$link
RedirectMatch 403 news\.php
RedirectMatch 403 menu\.php
RedirectMatch 403 main\.php
RedirectMatch 403 home\.php
RedirectMatch 403 view\.php
RedirectMatch 403 about\.php
RedirectMatch 403 blank\.php
RedirectMatch 403 block\.php
RedirectMatch 403 order\.php
RedirectMatch 403 search\.php
RedirectMatch 403 errors\.php
RedirectMatch 403 button\.php
RedirectMatch 403 middle\.php
RedirectMatch 403 threads\.php
RedirectMatch 403 contact\.php
RedirectMatch 403 include\.php
RedirectMatch 403 display\.php
RedirectMatch 403 register\.php
RedirectMatch 403 authorize\.php
RedirectMatch 403 \/wp\-signup\.php
RedirectMatch 403 \/classes\/
RedirectMatch 403 \/includes\/
RedirectMatch 403 \/path\_to\_script\/
RedirectMatch 403 ImpEvData\.
RedirectMatch 403 head\_auth\.
RedirectMatch 403 db\_connect\.
RedirectMatch 403 check\_proxy\.
RedirectMatch 403 doeditconfig\.
RedirectMatch 403 submit\_links\.
RedirectMatch 403 change\_action\.
RedirectMatch 403 send\_reminders\.
RedirectMatch 403 comment\-template\.
RedirectMatch 403 syntax\_highlight\.
RedirectMatch 403 admin\_db\_utilities\.
RedirectMatch 403 admin\.webring\.docs\.
RedirectMatch 403 function\.main
RedirectMatch 403 function\.mkdir
RedirectMatch 403 function\.opendir
RedirectMatch 403 function\.require
RedirectMatch 403 function\.array\-rand
RedirectMatch 403 ref\.outcontrol
</IfModule>
# PART II: QUERY STRINGS
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{QUERY_STRING} ftp\: [NC,OR]
RewriteCond %{QUERY_STRING} http\: [NC,OR]
RewriteCond %{QUERY_STRING} https\: [NC,OR]
RewriteCond %{QUERY_STRING} \[ [NC,OR]
RewriteCond %{QUERY_STRING} \] [NC]
RewriteRule .* - [F,L]
</IfModule>
# PART III: USER AGENTS
SetEnvIfNoCase User-Agent "Jakarta Commons" keep_out
SetEnvIfNoCase User-Agent "Y!OASIS/TEST" keep_out
SetEnvIfNoCase User-Agent "libwww-perl" keep_out
SetEnvIfNoCase User-Agent "MOT-MPx220" keep_out
SetEnvIfNoCase User-Agent "MJ12bot" keep_out
SetEnvIfNoCase User-Agent "Nutch" keep_out
SetEnvIfNoCase User-Agent "cr4nk" keep_out
<Limit GET POST PUT>
order allow,deny
allow from all
deny from env=keep_out
</Limit>
# PART IV: IP ADDRESSES
<Limit GET POST PUT>
order allow,deny
allow from all
# blacklist candidate 2008-01-02 = admin-ajax.php attack
deny from 75.126.85.215
# blacklist candidate 2008-02-10 = cryptic character strings
deny from 128.111.48.138
# blacklist candidate 2008-03-09 = block administrative attacks
deny from 87.248.163.54
# blacklist candidate 2008-04-27 = block clam store loser
deny from 84.122.143.99
</Limit>
Installation and Usage
Before using the 3G Blacklist, check the following system requirements:
- Linux server running Apache
- Enabled Apache module: mod_alias
- Enabled Apache module: mod_rewrite
- Ability to edit your site’s root htaccess file, or
- Ability to modify Apache’s server configuration file
With these requirements met, copy and paste the entire 3G Blacklist into either the root htaccess file or the server configuration file. After uploading, visit your site and check proper loading of as many different types of pages as possible. For example, if you are running a blogging platform (such as WordPress), test different page views (single, archive, category, home, etc.), log into and surf the admin pages (plugins, themes, options, posts, etc.), and also check peripheral elements such as individual images, available downloads, and alternate protocols (FTP, HTTPS, etc.).
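For WordPress sites, one common arrangement (just a sketch; your file may differ) is to place the blacklist above the existing WordPress rules in the root htaccess file:

# PERISHABLE PRESS 3G BLACKLIST
# (paste the entire blacklist from above here)

# BEGIN WordPress
# (existing WordPress rewrite rules stay below the blacklist)
# END WordPress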
While the 3G Blacklist is designed to target only the bad guys, the regular expressions used in the list may interfere with legitimate URL access. If this happens, the browsing device will display a 403 Forbidden error. Don’t panic! Simply check the blocked URL, locate the matching blacklist string, and disable the directive by placing a pound sign (#) at the beginning of the associated line. Once the correct line is commented out, the blocked URL should load normally. Also, if you do happen to experience any conflicts involving the 3G Blacklist, please leave a comment or contact me directly. Thank you :)
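For example, if legitimate requests involving search.php were being blocked, you would disable the matching line from Part I like so:

# RedirectMatch 403 search\.php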
Wrap Up..
As my readers know, I am serious about site security. Nothing gets my adrenaline pumping more than the thought of a giant meat grinder squirting out endless chunks of mangled cracker meat. Spam and other exploitative activity on the web has grown exponentially, and targeting and blocking individual agents and IPs is no longer a viable strategy. By recognizing and immunizing against the broadest array of common attack elements, the 3G Blacklist maximizes resources while providing solid defense against malicious attacks.
Updates
Updates to the 3G Blacklist/firewall:
2008/05/14
Removed “RedirectMatch 403 \/scripts\/” from the first part of the blacklist due to a conflict with Mint Statistics.
2008/05/18
Removed the following three directives to facilitate Joomla functionality:
RedirectMatch 403 \/modules\/
RedirectMatch 403 \/components\/
RedirectMatch 403 \/administrator\/
2008/05/31
Removed “RedirectMatch 403 config\.php” from the first part of the list to ensure proper functionality with the “visual-editing” feature of the WordPress Admin Area.
84 responses to “Perishable Press 3G Blacklist”
Thanks for the list. I am running a Joomla! 1.5.8 site (and am not technical, but have read up all I can about security). I installed the list in its entirety and it stops me accessing admin, but I just temporarily comment out the RedirectMatch 403 \/\/ directive using SFTP, then restore it when I am finished.
1) Is it possible to exclude a certain directory and/or file but pick up others?
2) Why does the RedirectMatch 403 \/includes\/ command appear to have no impact in FF3, but cause an error message to appear in IE7 which states “Overlib 4.10…..”? This occurs when the events calendar is opened and mouseovered. I think I have traced the code, and the reason it should error is a call to an overlib JavaScript in the directory public_html/includes/js/. The routine is by no means essential, but it bugs me not understanding why.
Does this mean that FF3 handles this in a different way, or is the command in fact being ignored in FF3?
FYI, I added another robot, “Twiceler” (something to do with a Uni in the USA), to your blacklist because it was hammering my site. I do believe it is malicious but heavy on resources. It stopped it in its tracks.
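In case anyone wants to do the same, I just copied the Part III pattern (swap in whatever agent string you need):

SetEnvIfNoCase User-Agent "Twiceler" keep_out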
I am looking forward to your next blacklist.
Thanks again.
Regards,
Mike
@Mike: For the first question, check out Apache’s Files and FilesMatch directives. They may be used to target specific files and/or file types, and there are additional, similar directives for targeting specific types of requests as well.

For your second question, different browsers handle requests for JavaScript according to different algorithms. I wish I knew the specifics of their protocols for this, but unfortunately that information falls outside my area of expertise. To retain the functionality of any files that are otherwise blocked by the blacklist, simply comment out the interfering directives; the remaining rules will continue to function perfectly well.
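To illustrate the FilesMatch idea with a quick sketch (using the same Apache 2.2 Order/Deny style as Parts III and IV of the blacklist), something like this would deny direct requests for any .inc file:

<FilesMatch "\.inc$">
Order Allow,Deny
Deny from all
</FilesMatch>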
And.. thanks for the heads up on the Twiceler bot — I will definitely be adding it to the blocked user agent section of the upcoming 4G Blacklist.
Thanks for the great feedback!
Cheers,
Jeff
Jeff,
Thx for the quick response. I will see what I can make of the suggested directives; at first glance they appear to be double Dutch.
Just to clarify about Twiceler, I meant to state I did NOT think it was malicious, but was hammering my site. I found a ref to it at the following http://www.tngforum.us/lofiversion/index.php?t2746.html
Regards,
Mike
Hello,
I have a private site, and I don’t want any bots clogging up my log files. I want to block all of the following GET strings:
"GET /robots.txt
"GET / HTTP/
"GET /images
"GET /favicon.ico
Can anyone help me with the exact code I need in the htaccess file to deny the above requests?
I only want to allow real users who type an exact URL or enter by following a specific link to my site.
Thanks
Sleepy22
Hi and thanks for all your hard work.
I’ve just popped the 3g Blacklist into my .htaccess file, and everything seems OK.
However, the Featurific plugin no longer does its thing: http://featurific.com/ffw
Any idea what’s up?
Cheers,
Alex
And you have some lovely juicy stuff here!
Thanks for looking into this Jeff, and for the quick response.
I look forward to hearing your thoughts.
While you are on – do you think munging the name section of the comments code will kill off spam bots?
<input type="text" name="email" id="email" value="" size="22" tabindex="2" />
Mail (will not be published)
And yes, I am not the worlds greatest coder!
Cheers and thanks,
Alex
@Alex: Not sure, but I have downloaded a copy of the plugin and will investigate this weekend. Hopefully something will jump out at me. I will report back here early next week with my findings. Stay tuned..
Hi Jeff, and all!
I had some troubles with Google after using the 3G Blacklist. I wasn’t sure it was because of the list, so I didn’t write here, but I guess it’s something worth investigating. Google deindexed my little blog, every single article, about two months ago, and all I could find about it in Google Webmaster Tools was that there was a 4xx (yes, just 4xx, nothing specific!) error when Googlebot tried accessing the blog. I have another blog on the same server that wasn’t affected, and that blog didn’t have the 3G Blacklist in its .htaccess.
After deleting the 3G entries, googlebot accessed my site, and slowly started reindexing all of the articles.
I wasn’t archiving my monthly logs then (now I do), so I can’t know what really happened! But that unhelpful “4xx error” from Google really drove me crazy!!!
Jeff, please let me know if there’s any way I can find out anything about this problem. For the 4G Blacklist’s sake! I’ll set up a new blog and use the 3G on it if needed.
@Tony: Yikes, I haven’t heard of anything like that happening before. Could you share a few of the URLs that were dropped from Google because of the 4xx error? This would enable me to look for a pattern or other reason why this may have happened. There are many things that can cause generic 4xx errors, but if the Blacklist was blocking access to all of your posts, then it may certainly be the culprit (but I hope not). Drop some of those URLs and let’s have a look.
@Alex: Interesting idea, but probably not effective, especially against spambots that bypass the form altogether and just hit comments.php (or whatever) directly.
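If you want to experiment, something along these lines (a rough sketch, assuming a standard WordPress setup where comments are posted to wp-comments-post.php) denies any comment POST that arrives with a blank referrer, which is typical of direct-hitting spambots:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} wp\-comments\-post\.php [NC]
RewriteCond %{HTTP_REFERER} ^$
RewriteRule .* - [F,L]
</IfModule>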
Jeff, that was in December, and all of the articles disappeared then, but they are back in Google’s index now. I didn’t want to say anything because I wasn’t sure, and nobody here had any problems. I should have emailed you, I guess. Here are a couple of links:
http://www.bikeshake.com/pv-glider-mini-glider-balance-bike-review/
http://www.bikeshake.com/giro-hex-mountain-bike-helmet-review/
Maybe some plugin interfered, or maybe one wasn’t compatible with the 3G. FYI, I used Super Cache then, which is temporarily deactivated now.
I hope you find the culprit. I really am waiting for the 4G Blacklist, and really want to taste it! :-)
Thanks in advance!
@Tony: Ah, Super Cache. That would explain it. Check this out:
http://blog.nerdstargamer.com/2008/05/30/perishable-press-3g-blacklist-and-wp-super-cache/
And, even better than Alissa’s fix is the fact that the Super Cache plugin is reported to have fixed its htaccess code.
Jeff,
The thing is, I had read all of the comments here, and Alissa’s post, before I pasted the 3G in my htaccess. And I also commented here a few months back about Super Cache not conflicting. Apparently only googlebot had problems accessing my site. Who knows, maybe it was drunk for a few days then? :-D