Building the Perishable Press 4G Blacklist

Posted on March 8, 2009 in Websites

[ Building the Hoover Dam, Part 1 ]

Last year, after much research and discussion, I built a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Unfortunately, these mega-lists eventually became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works by targeting common elements of decentralized server attacks. Consisting of a mere 37 lines, this “2G” Blacklist provided enough protection to enable me to completely eliminate over 350 blacklisting directives from my site’s root htaccess file. This improvement increased site performance and decreased attack rates; however, many bad hits were still getting through. More work was needed..

[ Building the Hoover Dam, Part 2 ]

Encouraged by the results of the 2G Blacklist and determined to further improve site security, I continued collecting data, testing directives, and refining my strategy. Work on the next generation of the blacklist, the 3G, required many weeks of research, testing, and analysis. During the development process, five major improvements were implemented. Using pattern recognition, access immunization, and multiple layers of protection, the 3G Blacklist serves as an extremely effective security strategy for preventing a vast majority of common exploits. The list consists of four distinct parts, providing multiple layers of protection that synergize into a comprehensive defense mechanism. Further, as discussed in previous articles, the 3G Blacklist is designed to be as lightweight and flexible as possible, thereby facilitating periodic cultivation and maintenance. Once development and testing were finished, the 3G was finally released for public use. Since then, many people have implemented the 3G Blacklist and the overall results have been very positive.

[ Building the Hoover Dam, Part 3 ]

But the work didn’t stop there. The 3G is very effective at preventing a majority of malicious exploits, but new and/or previously undetected attacks continue to hit the server. As much as I hate to say it, there are people in the world who have nothing better to do than to go around and try to mess with other people’s stuff. Especially on teh Web, this just seems to be a fact of life. Automated attacks, cracker exploits, and spam will never stop. And so I find myself diligently scouring my access and error logs in search of new patterns and methods to defend against. I spend two or three hours each week scanning my logs — line by line — taking notes, following leads, and researching the clues left behind by the brainless lice that continue to plague my defenses. Now, after several months of careful research, analysis and development, I combine this new insight and information with an improved understanding of HTAccess functionality to produce a completely reformulated security strategy referred to as the 4G Blacklist.

[ Building the Hoover Dam, Part 4 ]

For the previous generation of the Blacklist, I spent a great deal of time elucidating the ideas, methods, and data involved with its development. This information remains quite relevant and certainly applies to our current discussion of the upcoming 4G Blacklist. In this article, I share some of the thinking and analysis that went into the creation of the 4G, while outlining the development of its various subsections. All of this foreplay then finally pays off in the next article here at Perishable Press, as the much-anticipated 4G Blacklist is finally released. For now, let’s take a delightful romp through the building of the 4G Blacklist..

Forbidden Characters

One of the most commonly seen exploits involves the use of restricted or forbidden characters to manipulate the environment, trigger errors, or induce vulnerable behavior. These characters are seen both in the root portion of URLs and in query strings. Any character that is not one of the following must be encoded in order to appear legitimately within URLs:

Regular-use characters - allowed unencoded within URLs

$ - _ . + ! * ' ( ) ,

0 1 2 3 4 5 6 7 8 9

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Of these, the dollar sign ( $ ) and the comma ( , ) are commonly used in exploitative attacks, and have thus been blocked in the 4G Blacklist. The period ( . ) itself is required for file-name extensions and should not be blocked; however, multiple periods are frequently used in directory-traversal exploit attempts and may be blocked with zero liability. Thus, the 4G Blacklist employs this directive:

RedirectMatch 403 \.\.

..to account for any and all of the following cases:

http://domain.tld/path/target/../string/
http://domain.tld/path/target/../../string/
http://domain.tld/path/target/../../../string/
http://domain.tld/path/target/../../../../string/
http://domain.tld/path/target/../../../../../string/
.
.
.
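As a rough way to check that coverage, the following Python sketch applies the same regular expression to the example URLs. It is an illustration only: Python's re module stands in for Apache's regex engine, the helper name matches_rule is made up for this example, and the sketch tests the pattern alone without modeling how Apache normalizes or decodes a request before matching.

```python
import re

# The pattern from the RedirectMatch directive: two consecutive periods.
DOT_DOT = re.compile(r"\.\.")

def matches_rule(url: str) -> bool:
    """Return True if the pattern occurs anywhere in the URL (hypothetical helper)."""
    return DOT_DOT.search(url) is not None

samples = [
    "http://domain.tld/path/target/../string/",
    "http://domain.tld/path/target/../../string/",
    "http://domain.tld/path/target/../../../../../string/",
    "http://domain.tld/path/target/file.html",  # single periods only
]

for url in samples:
    print(url, "->", "blocked" if matches_rule(url) else "allowed")
```

Because the pattern is unanchored, one rule covers any depth of traversal, while single periods in host names and file extensions pass through.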

Likewise, the humble comma:

RedirectMatch 403 \,

..eliminating this sort of nonsense:

component/option,com_rss/feed,RSS2.0/no_html,1/
component/option,com_rss/feed,ATOM0.3/no_html,1/
component/option,com_rss/feed,ATOM0.3/no_html,1/

/press/component/option,com_facileforms/Itemid,98/
/press/component/option,com_facileforms/Itemid,109/
/press/component/option,com_facileforms/Itemid,108/
/press/component/option,com_facileforms/Itemid,109/
/press/component/option,com_facileforms/Itemid,109/

/stupid-htaccess-tricks/path/s,/
/stupid-htaccess-tricks/path/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/

And so on and so forth. In addition to these regular-use characters, there are also “special-use” and “unsafe” characters. The special-use characters play a specific role in the URL when present unencoded, and thus need to be encoded when present in literal form.

Special-use characters - literal use must be encoded

$ & + , / : ; = ? @

These characters are typically used in the construction of query strings and are rarely used legitimately in the root portion of the URL. They are commonly seen in the URLs associated with exploitative behavior; however, it is only safe to block most of them, and only from the root portion of the URL. In the 4G Blacklist, all but the ampersand ( & ), plus sign ( + ), and question mark ( ? ) are blocked for root-portion URLs (the forward slash, of course, remains in use as the path separator).

[ Building the Hoover Dam, Part 5 ]

There are also characters that are considered “unsafe” for unencoded use in URLs because of the possibility of misinterpretation. The following characters should never appear in their literal form in any portion of the URL:

Unsafe characters - literal use must always be encoded

space " < > # % { } | \ ^ ~ [ ] `

As you might suspect, these characters are the most commonly seen entities in malicious attacks. The backslash ( \ ) is used to escape various characters, including the carriage return ( \r ) and new line ( \n ) commands used in scripting. Thus, by blocking the backslash, we immediately clean up any exploits involving this sort of crud:

\\

/\\\\

/\\\'/

\n\n\n

\r\r\r

Most of the other characters are also blocked in the 4G Blacklist. For obvious reasons, the only “unsafe” characters that aren’t blocked are the blank space, the pound sign ( # ), and the percentage symbol ( % ). By blocking everything else, we effectively eliminate all future occurrences of this type of nonsense:

||0|0
i==r.K-1
i==r.K-1
l..,~f@nr
directories/title=directory
nu+in+html&btnG=Search&meta=/
/1x2n6l6bx6nt/001mAFC(-~l-xAou6.oCqAjB4ukkmrntoz1A/0011C/uikqijg4InjxGu.k
/1x2n6l6bx6nt/001mAFC(-~l-xAou6.oCqAjB4ukkmrntoz1A/0011C/uikqijg4InjxGu.k
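To make the "everything except space, #, and %" rule concrete, here is a minimal Python sketch that reports which blocked unsafe characters appear in a request string. Collecting the characters into a single character class, and the helper name find_unsafe, are assumptions made for this illustration; the article does not show how the 4G expresses these rules.

```python
import re

# "Unsafe" characters from the list above, minus the three that the article
# leaves unblocked (space, #, %). One character class is an assumption.
BLOCKED_UNSAFE = '"<>{}|\\^~[]`'
UNSAFE_RE = re.compile("[" + re.escape(BLOCKED_UNSAFE) + "]")

def find_unsafe(request: str) -> list:
    """Return the distinct blocked unsafe characters found in the string."""
    return sorted(set(UNSAFE_RE.findall(request)))

for sample in ["||0|0", "l..,~f@nr", "/press/about/"]:
    print(repr(sample), "->", find_unsafe(sample))
```

A request is a candidate for blocking whenever the returned list is non-empty. Note that some of the logged strings above contain no unsafe characters at all and are caught by other rules, such as those for the comma or equals sign.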

Encoded Characters

There are also a number of escaped hexadecimal ASCII characters that have no business appearing in the URL. There are several different groups of these characters that are blocked in the 4G Blacklist. The first group contains hexadecimal-encoded characters beginning with “%0”, for example:


null and other control characters  %00 - %07
backspace                          %08
tab                                %09
\n (line feed)                     %0A
vertical tab                       %0B
form feed                          %0C
\r (carriage return)               %0D

These characters are not used in “normal” URLs and are the tools of many types of malicious exploits. Fortunately, the entire lot of these characters is easily blocked with the following directive:

RedirectMatch 403 \%0
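The reach of that two-character pattern can be checked with a short Python sketch. It applies the same expression to raw, still-encoded request strings; the helper name has_percent_zero is hypothetical, and the sketch does not model any decoding Apache performs before a directive sees the URL.

```python
import re

# Same pattern as the directive: a literal "%0" anywhere in the string.
PERCENT_ZERO = re.compile(r"%0")

def has_percent_zero(raw_request: str) -> bool:
    """True if the raw string contains an escape starting with %0."""
    return PERCENT_ZERO.search(raw_request) is not None

for sample in ["%0a%0a", "%0d", "/page%09name/", "/page%20name/"]:
    print(sample, "->", "blocked" if has_percent_zero(sample) else "allowed")
```

Every code from %00 through %0F shares the "%0" prefix, so a single rule covers the whole group, while an ordinary encoded space (%20) is unaffected.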

The next group of blocked character codes includes the following hexadecimal representations:

+  %2B
<  %3C
>  %3E
?  %3F
[  %5B
\  %5C
]  %5D
{  %7B
|  %7C
}  %7D
"  %22
'  %27
(  %28
)  %29

These hexadecimal entities are blocked from the root portion of the URL, the query string, or both. These encoded characters are not required for proper URL formation, and are commonly seen in malicious attacks. Thus, in blocking this small subset of hexadecimal representations, the 4G Blacklist puts an end to a great amount of malicious behavior, including the following types of exploits (taken from actual log entries):

%0d
\\%27
%0a%0a
/\\%27/
png%5d%7b/
gif%5d%7b/
jpg%5d,div/
png%5d,div/
jpeg%5d,div/
domain%5d%7b/
%27.admin_url(
directory%5d%7b/
15.1Y(m%5B3%5D,a).K
%5BURL%20to%20preload%5D/
/%5BURL%20to%20preload%5D/
png%5d,a%5bhref$=gif%5d%7b/
ng.fromCharCode(c)%7Dkode=x
/sitemap/%22+accesskey=%226/
/contact/%22+accesskey=%227/
/contact/%22+accesskey=%229/
\\%27.%20get_permalink()%20.%20
\\%27%20.%20$referer%20.%20\\%27
/%5BNext%20URL%20in%20series%5D/
/%2B%28200%2Bok%29%2BACCEPTED%2B/
a.12.4l(/\\%27*/\\%27)%5b0%5d==a/
%5BPrevious%20URL%20in%20series%5D/
15.2I(a.12.5p,1,/\\%274d/\\%27)==a/
%22+title=%22Permalink+for+article+/
/%5BPrevious%20URL%20in%20series%5D/
this.options%5Bthis.selectedIndex%5D.value
jpe%5d,a%5bhref$=png%5d,a%5bhref$=gif%5d%7b/
#comment-7327+++++++++++++%09+success+%28from+first+page%29;
jpeg%5d,a%5bhref$=jpe%5d,a%5bhref$=png%5d,a%5bhref$=gif%5d%7b/
jpg%5d,a%5bhref$=jpeg%5d,a%5bhref$=jpe%5d,a%5bhref$=png%5d,a%5bhref$=gif%5d%7b/

These attack strings are a common sight in unfiltered access and error logs. They are generally appended or otherwise inserted into arrays of legitimate URLs, typically occurring in repetitive fashion over various time intervals. Obviously, these types of URLs do not exist on normal, everyday sites, and are better off blocked to conserve valuable resources and protect against malicious exploits.

[ Building the Hoover Dam, Part 6 ]

The third group of hexadecimal codes blocked by the 4G Blacklist consists of all the encoded representations for uncommon entities such as the following:

¡ - %A1
¢ - %A2
£ - %A3
¤ - %A4
¥ - %A5
¦ - %A6
§ - %A7

These characters certainly have their place, but there is typically no reason for them to occur in the URL construct. All of these rare characters are represented by hexadecimal encodings that begin with the first six letters of the alphabet. In the example above, only a handful of the “%A’s” are shown, but rest assured there are oodles of these oddities, each beginning with one of the following character pairs:

%A %B %C %D %E %F

Blocking these little rascals prevents the remainder of commonly employed hex strings from serving the villains. Here are a few strays cherry-picked after installing the 3G:

%7c%7c0%7c0
%d0%b2%d0%be%d1%82
PS%ef%bc%9a*%20%7b/
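To see which escapes in the strays above actually belong to this third group, the Python sketch below pulls out every percent-escape and keeps those whose first hex digit is a letter. The helper name high_range_escapes is made up for the example, and treating the hex digits case-insensitively is an assumption: the logged strings use lowercase, whereas the pairs listed above are written in uppercase.

```python
import re

# One percent-escape: "%" followed by two hex digits, captured separately.
ESCAPE_RE = re.compile(r"%([0-9A-Fa-f])([0-9A-Fa-f])")

def high_range_escapes(raw_request: str) -> list:
    """Return the escapes whose first hex digit is A-F (case-insensitive)."""
    return [
        "%" + first + second
        for first, second in ESCAPE_RE.findall(raw_request)
        if first.upper() in "ABCDEF"
    ]

for sample in ["%7c%7c0%7c0", "%d0%b2%d0%be%d1%82", "PS%ef%bc%9a*%20%7b/"]:
    print(sample, "->", high_range_escapes(sample))
```

The first sample contains no third-group escapes at all; its %7c codes fall under the second group listed earlier. The other two each contain several, which is enough for a rule of this kind to reject the request.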

With the 4G in place, hapless crud like that just bounces off the walls. Moving on..

Common Patterns & Specific Exploits

Identifying common patterns and specific exploits is essential to maintaining an effective firewall. Over the course of the past few months, I have identified, extracted, and consolidated a wide variety of character strings, scripts, and file names commonly used by attackers when scanning for exploits. Common patterns were then identified and targeted via regular expressions in the 4G Blacklist in order to inoculate your site against future attacks. In this section of the article, we examine some of the patterns and exploits commonly appearing in the root portion of the variously targeted URLs. The following section will then examine patterns and trends observed in query strings.

[ Building the Hoover Dam, Part 7 ]

One of the most important things to keep in mind when examining these data is the inherent bias in the sample population due to the presence of the previous, 3G Blacklist. The 3G directives are effective at eliminating a vast portfolio of potential attacks, such that the exploit data cultivated for the 4G Blacklist represents a much narrower spectrum of malicious activity. Thus, the evolution of the Perishable Press Blacklist is in fact cumulative in nature, with each successive generation building on previously existing security.

That said, let’s examine some of the character-string exploits observed within my sphere of online domains. Some of the most commonly observed patterns involve character-strings representing various scripting functions, database queries, and template tags. Here are some examples of frequently seen attack strings (taken from actual error/access logs):

select(
convert(
db_name
sys_cpanel
remoteFile
servername
system_user
option_value
clientrequest
maincore.php
password.php

These types of strings generally appear appended to URLs along with more context-specific vulnerability-scanning characters. These strings represent the common patterns present within many types of exploit scanning and have no business in the URLs of typical websites. Incidentally, all of these items are duly blocked in the 4G Blacklist.

Other commonly seen character strings apparently target various XML vulnerabilities. Here are some typical examples extracted from the access/error logs:

xmlrpc.php
xmlrpc.php
adxmlrpc.php

.XMLHTTP
Msxml2.XMLHTTP
Msxml2.XMLHTTP
Msxml2.XMLHTTP
Microsoft.XMLHTTP
Microsoft.XMLHTTP

Unfortunately, the “xmlrpc.php” string represents an actual file used by WordPress and thus cannot be blocked. While scanning for exploits, attackers will append this file name to variously targeted URLs. The xmlrpc.php file is called consistently via the <head> section of the web document, such that there may be a way to target all illicit references by using some advanced mod_rewrite blacklisting directives. If this string is frequently appearing in your server logs, this may be something to investigate.

Moving on, we bring this section to a close with a smorgasbord of recurring exploit patterns. First up, unexplained requests for URLs appended with the following character strings:

&rptmode=

#comment-55862&rptmode=2
#comment-56320&rptmode=2
#comment-55797&rptmode=2
#comment-55797&rptmode=2
#comment-55872&rptmode=2
#comment-55872&rptmode=2
#comment-55872&rptmode=2

Yeah, good stuff. As a side note, if you are one of the billions of people who live their entire lives without feeling the need to probe innocent websites for “rptmode” vulnerabilities, consider yourself lucky. There are obviously a handful of hungry desperadoes out there who feel compelled to lower themselves into this utterly sad virtual arena. Needless to say, the illuminated careers of these lost souls effectively end with the 4G Blacklist.

[ Building the Hoover Dam, Part 8 ]

Also on the menu, the mysteriously ubiquitous “macromates” probe. It’s like, “WTF” — I pity the poor holes out there who spend their time here on earth searching and scanning for macromates vulnerabilities, of all things. The payoff must be huge! Whatever. Here are a few representative lines, taken from thousands of logged attempted exploits:

macromates.com

macromates.com/screencasts/
macromates.com/screencasts/
macromates.com/screencasts/

Another interesting set of character strings commonly seen in the root portion of URLs targets WordPress database tables directly:

wp_options
wp_posts
wp_terms

Would love to get my hands on the creeps who do this kind of stuff! Oh well, in case that day never comes, the least I can do is prevent the bastards from getting anywhere near my database tables by throwing down the following directive in the 4G Blacklist:

RedirectMatch 404 wp\_

And finally, an otherwise random collection of poisonous little strings that are used in a wide range of vulnerability scans:

_vpi
http%
http;//
/query/
/(null)/
/Table/Latest/index.php

By blocking these arbitrary snippets from URL requests, we effectively eliminate countless attack vectors. Bonus points and honorable mention for identifying which of these strings is not blocked by the 4G Blacklist.

Query Strings

At the heart of any effective firewall technique is the fine art of query-string filtering. Because of the influential and sensitive nature of query strings, cleaning up query-string input is an essential part of any serious website security strategy. By manipulating query strings, the savvy attacker may gain access to, take control of, and ultimately corrupt or destroy your files, database, and even the server itself. As you can imagine, a great deal of time, effort, and research has gone into the development of the “query-string” portion of the 4G Blacklist.

[ Building the Hoover Dam, Part 9 ]

For the average site, when it comes to cleaning cracker crumbs from the query string, we may scrub with broad, sweeping strokes. By blocking a select handful of character strings, we can disinfect a vast majority of maliciously requested query strings. Due to the way in which query strings function, many different types of attacks contain similar characters. Consider this collection of malicious query strings that were harvested from actual access/error logs (note: line breaks inserted for readability):

wordspew-rss.php?id=-998877/**/UNION/**/SELECT/**/0,1,concat
(0x3a,user_login,0x3a,user_pass,0x3a),concat(0x3a,user_login,
0x3a,user_pass,0x3a),4,5/**/FROM/**/wp_users

st_newsletter/stnl_iframe.php?newsletter=-9999+UNION+SELECT+
concat(0x3a,user_login,0x3a,user_pass,0x3a)+FROM+wp_users--

wpSS/ss_load.php?ss_id=1+and+(1=0)+union+select+1,concat(0x3a,
user_login,0x3a,user_pass,0x3a),3,4+from+wp_users--&display=plain

wp-download.php?dl_id=null/**/union/**/all/**/select/**/concat
(0x3a,user_login,0x3a,user_pass,0x3a)/**/from/**/wp_users/*

forums?forum=1&topic=-99999/**/UNION/**/SELECT/**/concat(0x3a,
user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

forum=1&topic=-99999/**/UNION/**/SELECT/**/concat(0x3a,
user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

sf-forum?forum=-99999/**/UNION/**/SELECT/**/concat(0x3a,
user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

sf-forum?forum=-99999/**/UNION/**/SELECT/**/0,concat(0x3a,
user_login,0x3a,user_pass,0x3a),0,0,0,0,0/**/FROM/**/wp_users/*

wordspew-rss.php?id=-998877/**/UNION/**/SELECT/**/0,1,concat
(0x3a,user_login,0x3a,user_pass,0x3a),concat(0x3a,user_login,
0x3a,user_pass,0x3a),4,5/**/FROM/**/wp_users

wp-adserve/adclick.php?id=-1%20union%20select%20concat(0x3a,
user_login,0x3a,user_pass,0x3a)%20from%20wp_users

fim_rss.php?album=-1%20union%20select%201,concat(0x3a,
user_login,0x3a,user_pass,0x3a),3,4,5,6,7%20from%20wp_users--

wp-cal/functions/editevent.php?id=-1%20union%20select%201,concat
(0x3a,user_login,0x3a,user_pass,0x3a),3,4,5,6%20from%20wp_users--

wp-content/plugins/wp-cal/functions/editevent.php?id=-1%20union%20select%201,
concat(0x3a,user_login,0x3a,user_pass,0x3a),3,4,5,6%20from%20wp_users--

wp-content/plugins/fgallery/fim_rss.php?album=-1%20union%20select%201,
concat(0x3a,user_login,0x3a,user_pass,0x3a),3,4,5,6,7%20from%20wp_users--

wp-content/plugins/wp-adserve/adclick.php?id=-1%20union%20select
%20concat(0x3a,user_login,0x3a,user_pass,0x3a)%20from%20wp_users

wp-content/plugins/st_newsletter/stnl_iframe.php?newsletter=-9999
+UNION+SELECT+concat(0x3a,user_login,0x3a,user_pass,0x3a)+FROM+wp_users--

wp-content/plugins/wp-download/wp-download.php?dl_id=null/**/union/**/all
/**/select/**/concat(0x3a,user_login,0x3a,user_pass,0x3a)/**/from/**/wp_users/*

wp-content/plugins/wpSS/ss_load.php?ss_id=1+and+(1=0)+union+select+1,concat
(0x3a,user_login,0x3a,user_pass,0x3a),3,4+from+wp_users--&display=plain

wordspew-rss.php?id=-998877/**/UNION/**/SELECT/**/0,1,concat(0x3a,user_login,0x3a,
user_pass,0x3a),concat(0x3a,user_login,0x3a,user_pass,0x3a),4,5/**/FROM/**/wp_users

sf-forum?forum=-99999/**/UNION/**/SELECT/**/0,concat(0x3a,
user_login,0x3a,user_pass,0x3a),0,0,0,0,0/**/FROM/**/wp_users/*

web/sf-forum?forum=-99999/**/UNION/**/SELECT/**/concat
(0x3a,user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

forums?forum=1&topic=-99999/**/UNION/**/SELECT/**/concat
(0x3a,user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

forum_feed.php?thread=-99999+union+select+1,2,3,concat(char(37),char(95),char(37),char(95),char(37),
user_login,char(37),char(95),char(37),char(95),char(37),user_pass,char(37),char(95),char(37),char(95),
char(37),user_email,char(37),char(95),char(37),char(95),char(37)),5,6,7+from+wp_users/*

These URLs are some of the worst that I have encountered, each one scanning for specific database-related vulnerabilities via uniquely configured query strings. As you can see, this type of exploit scanning utilizes a wide range of character strings. Fortunately, there are common elements that may be identified, targeted, and subsequently blocked via the 4G Blacklist. Thus, by including the following directive in our QUERY_STRING directives, we immediately eliminate the entire array of these types of attacks:

RewriteCond %{QUERY_STRING} ^.*(select|union).* [NC]
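For a quick check of that condition against the logged requests, here is a minimal Python sketch. It assumes that the query string is everything after the first question mark and that the [NC] flag corresponds to case-insensitive matching; the helper name query_is_blocked is hypothetical, and the sketch covers only the condition shown, not the rest of the rewrite block.

```python
import re

# The RewriteCond pattern, with re.IGNORECASE standing in for [NC].
SQL_KEYWORDS = re.compile(r"^.*(select|union).*", re.IGNORECASE)

def query_is_blocked(request_uri: str) -> bool:
    """Apply the pattern to the raw query string only, never to the path."""
    _path, _sep, query = request_uri.partition("?")
    return SQL_KEYWORDS.search(query) is not None

samples = [
    "wp-adserve/adclick.php?id=-1%20union%20select%20concat(0x3a,"
    "user_login,0x3a,user_pass,0x3a)%20from%20wp_users",
    "sf-forum?forum=-99999/**/UNION/**/SELECT/**/concat(0x3a,"
    "user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*",
    "index.php?page=2",
]

for uri in samples:
    print("blocked" if query_is_blocked(uri) else "allowed", uri[:50])
```

Whether the attacker separates keywords with %20, plus signs, or SQL comments makes no difference, because the keywords themselves are what the pattern looks for.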

Powerful stuff. And, as there is no reason for these terms to appear in legitimate query-string constructs, we may secure our sites quietly, effectively, and with zero effect on proper functionality. A similar broad-sweeping blacklist directive for query strings is the highly targeted “mosConfig” parameter. There are many scripts that maliciously attempt to set mosConfig values through the URL. Here are some actual examples (note: line breaks inserted for the last entry):

index.php?option=com_peoplebook&Itemid=&mosConfig_absolute_path=?
index.php?option=com_peoplebook&Itemid=&mosConfig_absolute_path=?
index.php?option=com_joomap&view=google&no_html=&mosConfig_absolute_path=?

/path/?option=com_simpleboard&task=cat_view&gid=28&Itemid=&mosConfig_absolute_path=?
/path/?option=com_remository&task=cat_view&gid=28&Itemid=&mosConfig_absolute_path=?
/path/?option=com_remository&task=cat_view&gid=28&Itemid=&mosConfig_absolute_path=?

http%%20-%2037k%20-http://perishablepress.com/press/2006/08/28/spamless-email-address-via-javascript/
%20-%2037k%20-/http://perishablepress.com/press/2008/03/08/blacklist-candidate-number-2008-03-09/
index.php?option=com_remository&Itemid=&mosConfig_absolute_path=?

Each of these entries was logged as coming from a unique IP address, and each group of entries (first, second, and third) was recorded by way of a different user agent. Further, no referrer information was associated with any of these exploit attempts. But that’s “okay” with us, because we have identified and immunized against the common patterns found among the various query strings themselves. The first and most obvious character string that has been added to the 4G Blacklist is the mosConfig parameter. Revisit the previous examples and see how blocking that single term will prevent all future occurrences of any similar sort of “mosConfig”-type attack.

The astute reader will have also noticed a second common element within our mosConfig collection: the terminating question mark ( ? ). The question mark is not a viable blacklist candidate, however, because it is not reliably present in every mosConfig attack. Even so, blocking the question mark in the query string eliminates an even greater subset of malicious URLs. For example, scrubbing the query string of all question marks is an effective and efficient way to flush all of the following turds (taken from actual access/error logs):

?3,f.@45
?3,f.@45
?3,f.@45
?3,f.@45

index.php?=?
index.php?url=?
login.php?dir=?
setup.php?dir=?
index.php?mode=?
ask_password.php?dir=?
index.php?DOCUMENT_ROOT=?

error.php?error=uid=48(apache)%20gid=48(apache)%20groups=48(apache)%0A?

And that’s just a sampling. Blocking question marks is an excellent way to clean up an enormous number of dangerous exploit attempts. This is the line of thinking that went into the development of the 4G Blacklist. There are many illicit characters not allowed in the query string that are frequently used by attackers while scanning for potential exploits. Thus, by blacklisting their presence, we apply another strong layer of defense to our website.
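The question-mark rule is easy to misread, so here is a small Python sketch of one interpretation. It assumes the query string is everything after the first question mark, so a question mark inside the query string means the request contained at least two. The helper name query_has_question_mark is made up for the example.

```python
def query_has_question_mark(request_uri: str) -> bool:
    """True if a '?' appears inside the query string itself."""
    # The first "?" only separates path from query; it is not part of either.
    _path, _sep, query = request_uri.partition("?")
    return "?" in query

samples = [
    "index.php?url=?",
    "setup.php?dir=?",
    "index.php?DOCUMENT_ROOT=?",
    "index.php?page=2",
]

for uri in samples:
    print(uri, "->", "blocked" if query_has_question_mark(uri) else "allowed")
```

An ordinary request with a single question mark is unaffected; only a second one triggers the check.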

[ Building the Hoover Dam, Part 10 ]

In addition to blocking illicit characters, the QUERY_STRING directives of the 4G Blacklist also protect against a wide variety of more sophisticated attacks. For example, all query-string attempts to leverage base64 encoding, <script> tags, PHP globals, REQUEST variables, and so forth are blocked via 4G. Also neutralized is the threat of this sort of nonsense:

index.php?loopback
index.php?localhost
index.php?127.0.0.1
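The way several query-string conditions combine into one filter can be sketched in Python as a small rule table. Only the first pattern appears verbatim in this article; the spellings of the other two are assumptions based on the terms described here, and the names QUERY_RULES and first_matching_rule are hypothetical.

```python
import re

# (label, pattern) pairs. Only "(select|union)" is quoted from the article;
# the remaining patterns are assumed for the sake of illustration.
QUERY_RULES = [
    ("sql keywords", r"(select|union)"),
    ("mosConfig", r"mosConfig"),
    ("loopback", r"(loopback|localhost|127\.0\.0\.1)"),
]
COMPILED = [(label, re.compile(rx, re.IGNORECASE)) for label, rx in QUERY_RULES]

def first_matching_rule(query_string: str):
    """Return the label of the first rule that matches, or None."""
    for label, rx in COMPILED:
        if rx.search(query_string):
            return label
    return None

for query in ["loopback", "127.0.0.1", "Itemid=&mosConfig_absolute_path=?", "page=2"]:
    print(query, "->", first_matching_rule(query))
```

The request is rejected as soon as any one rule matches, so adding a newly observed pattern means adding one row rather than rewriting the filter.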

Of course, the QUERY_STRING directives of the 4G Blacklist are designed to protect your site against a wide variety of malicious attacks. Needless to say, a more comprehensive exploration of the methods and strategies involved with this part of the Blacklist would be major overkill, if not already ;)

Other Modifications

Briefly, and for the sake of future reference, here are some of my notes concerning the important differences between the 3G and 4G Blacklist.

Updated Directives

RedirectMatch 403 \/\,
RedirectMatch 403 \.\.\.
RedirectMatch 403 \_vpi\.xml
RedirectMatch 403 ImpEvData\.
RedirectMatch 403 blank\.php
RedirectMatch 403 errors\.php
RedirectMatch 403 config\.php
RedirectMatch 403 include\.php
RedirectMatch 403 display\.php
RedirectMatch 403 register\.php
RedirectMatch 403 password\.php
RedirectMatch 403 maincore\.php
RedirectMatch 403 authorize\.php
RedirectMatch 403 doeditconfig\.
RedirectMatch 403 function\.main
RedirectMatch 403 function\.mkdir
RedirectMatch 403 function\.opendir
RedirectMatch 403 function\.require
RedirectMatch 403 \/wp\-signup\.php
RedirectMatch 403 function\.array\-rand
RedirectMatch 403 comment\-template\.php
RedirectMatch 403 function\.require\-once

Removed Directives


RedirectMatch 403 f\-\.
RedirectMatch 403 ftp\:
RedirectMatch 403 ttp\:
RedirectMatch 403 blank\.
RedirectMatch 403 xmlrpc\.
RedirectMatch 403 et\.html
RedirectMatch 403 news\.php
RedirectMatch 403 menu\.php
RedirectMatch 403 main\.php
RedirectMatch 403 home\.php
RedirectMatch 403 view\.php
RedirectMatch 403 about\.php
RedirectMatch 403 block\.php
RedirectMatch 403 order\.php
RedirectMatch 403 search\.php
RedirectMatch 403 button\.php
RedirectMatch 403 middle\.php
RedirectMatch 403 \/login\.php
RedirectMatch 403 contact\.php
RedirectMatch 403 threads\.php
RedirectMatch 403 path\_to\_script
RedirectMatch 403 send\_reminders\.
RedirectMatch 403 syntax\_highlight\.
RedirectMatch 403 \/themes\/
RedirectMatch 403 \/plugins\/
RedirectMatch 403 \/modules\/
RedirectMatch 403 \/classes\/
RedirectMatch 403 \/scripts\/
RedirectMatch 403 \/includes\/
RedirectMatch 403 \/components\/
RedirectMatch 403 \/administrator\/

Redundant Directives

RedirectMatch 403 alt\=
RedirectMatch 403 \.\$url
RedirectMatch 403 \/\$url
RedirectMatch 403 \/\$link

The 5G Blacklist

During the development of the 4G Blacklist, I encountered a number of maliciously employed character strings that I could not block without invoking a more complicated set of rewrite directives. The items discussed below were not included in the 4G version, but may be integrated into the eventual 5G Blacklist.

Dot Nonsense

This type of nonsense is a very prevalent nuisance:

a.1
a.2
a.3
a.n
a.cross-link

..ad nauseam. The general pattern that would be useful in preventing this type of lunacy looks something like this:

"single alphanumeric character" . "single alphanumeric character" (or) "any sequence of alphanumeric characters that contains an invalid or unexpected character"

So, until I find time to craft something along those lines, the “dot-nuisance” entries will continue to plague teh access logz! Oh well..
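As a starting point, the first half of that pattern can be sketched as a regular expression in Python. This is not a 4G or 5G rule: applying the test to the last path segment only is an assumption, the helper name is_dot_nonsense is made up, and the second half of the pattern (sequences containing an unexpected character) is left out because it is not specified precisely enough to encode.

```python
import re

# One alphanumeric character, a period, and one more alphanumeric character.
DOT_PAIR = re.compile(r"[A-Za-z0-9]\.[A-Za-z0-9]")

def is_dot_nonsense(url_path: str) -> bool:
    """True if the last path segment is exactly 'x.y' (single characters)."""
    last_segment = url_path.rstrip("/").rsplit("/", 1)[-1]
    return DOT_PAIR.fullmatch(last_segment) is not None

for path in ["a.1", "/press/some-post/a.n", "a.cross-link", "/index.php"]:
    print(path, "->", "blocked" if is_dot_nonsense(path) else "allowed")
```

The "a.cross-link" variant slips past this half of the pattern, which is exactly the case the second alternative would need to cover.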

Another stinky turd that just won’t go away on its own involves the myriad mutations of this little monster:

-moz-grabbing

How desperate these script kiddies must be to have to resort to such meaningless idiocy! Get a life, losers!

Other character strings that appear frequently but that are not readily blocked via existing methodology include the following:

(
)
-1/
cfrm/
/null

Also, there are a host of exploits referring to files that actually exist for various software packages, blogging platforms, and other web applications. Of course, for sites that do not use these files, blacklisting is a legitimate solution, but we certainly wouldn’t want to block any files that are actually in use. Two possibilities exist for defending against malicious requests that contain actual file names. We could either block any requests for such files that are not coming from our own server, or block any requests that deviate from the actual file path. In either case, the rules required to accomplish this transcend the current functionality provided by the 4G Blacklist.

navmenu.js
external.js
autoclear.js

page.html
info.html
noscript.html
no-javascript.html
forum_summary.html
static-content.html
nested-frameset.html

edit.php
categories.php
edit-comments.php
wp-admin/upgrade.php
wp-admin/wp-login.php
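The second possibility, blocking requests that deviate from the actual file path, can be sketched in Python as follows. Everything specific here is hypothetical: the EXPECTED_PATHS table describes an imaginary site, and the function name is made up. The sketch shows the decision logic only, not how it would be written as rewrite directives.

```python
# Hypothetical map from a real file name to the single path where that
# file is expected to live on this imaginary site.
EXPECTED_PATHS = {
    "xmlrpc.php": "/xmlrpc.php",
    "upgrade.php": "/wp-admin/upgrade.php",
}

def deviates_from_expected(url_path: str) -> bool:
    """True when a known file name is requested from an unexpected path."""
    file_name = url_path.rsplit("/", 1)[-1]
    expected = EXPECTED_PATHS.get(file_name)
    return expected is not None and url_path != expected

for path in ["/xmlrpc.php", "/press/2009/some-post/xmlrpc.php", "/about/"]:
    print(path, "->", "blocked" if deviates_from_expected(path) else "allowed")
```

Requests for file names that are not in the table are left alone, so the check only tightens access to files the site actually uses.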

And finally, I leave you with this question: what are the consequences of blacklisting single versus double backslashes?

\ vs. \\

[ Building the Hoover Dam, Part 11 ]

Somebody Stop Me

That’s it for this fun-filled article. I would be surprised if anyone actually read all the way through, but then again, that’s not really the point of the exercise. After developing the 4G Blacklist, I wanted a clear, concise summary of all the notes and thinking that went into its creation. By combining everything into this single post, I save myself time and effort when referring to this information in the future. And hopefully, by publishing this lengthy diatribe on the Web, it will be of some educational value to others as well. In either case, stay tuned, because the 4G Blacklist is coming up in the next article..

35 Responses

  1. Thom says:

    Awesome, great work, great read!
    I know for sure you’re making a whole lot of people very happy (and equally as much unhappy hopefully ;-))

  2. Louis says:

    I must say that it took me some courage to throw myself into the reading of this post, when I saw the size of the browser scroll bar :d

    At the beginning of the post, you remind us of the 1G blacklist, and how it “eventually became unmanageable and ineffective”. Maintaining a list is not the perfect solution as you have to add more and more items. With the new generations of blacklists, you don’t target the IP or user agents, but focus on the patterns of their attacks. That’s a more powerful solution, but aren’t we still dealing with a list? A list of patterns. You put it in those words:

    Thus, the evolution of the Perishable Press Blacklist is in fact cumulative in nature, with each successive generation building on previously existing security.

    Now, I wonder if the blacklist strategy is the right way to go.

    You say yourself that this deep analysis of the patterns is quite time-consuming (“I spend two or three hours each week scanning my logs”). It will also stay that way because of the cumulative nature of the blacklist. All of this makes me think that the blacklist solution is not the way to go. It simply won’t ever be finished. It will require regular additions, forever. That’s why I wonder if the final solution wouldn’t be a whitelist.

    I don’t know if this is technically feasible, but if it is, that would be the way to go in my opinion. Does someone have some knowledge of past attempts, or technical issues that would prevent us from building such a list?

  3. [ Gravatar Icon ] Jeff Starr says:

    @Thom: Thanks, I definitely hope that this work will help people block out the garbage from their sites. As Louis will tell you, the method isn’t perfect, but it certainly serves well as another layer of protection.

    @Louis: I would be so bold as to say that, for publicly accessible websites, every security technique — even a whitelist — requires periodic attention and maintenance. The bad guys will never stop updating and improving their methods, so any form of blacklist or firewall will need to be improved as well. Likewise, unless you aren’t concerned with maximizing public access to your site, whitelisting will always need to take into account the changing landscape of legitimate user agents in order to minimize the number of false positives. There have been many different whitelists created, but they are all limited in the scope of their engendered accessibility.

  4. [ Gravatar Icon ] Louis says:

    @Jeff:

    The bad guys will never stop updating and improving their methods

    Yes, that’s why I thought a whitelist would be a better solution. These guys won’t be able to find any new method if we allow only what we are sure is okay.

    minimize the number of false positives

    I don’t understand how a false positive would be possible with a functional whitelist. Let’s say the whitelist contains all the URLs that we want to make public. How could someone be forbidden access to content if that content is known to be public? Erroneous URLs are another problem.

    If the whitelist lists all the public URLs of a website, and if this whitelist is updated programmatically, then I don’t see what would be wrong.

  5. [ Gravatar Icon ] Jeff Starr says:

    @Louis: I see your point, but unfortunately there is no way to whitelist every acceptable entity on the Web. If you read my comment carefully, you will see that I am focusing on public websites, which by definition must allow the general public access. Of course a whitelist would work if you only wanted to allow access to a percentage of users, but most of us are working with everyday websites that strive to maximize traffic. And even for private websites, what aspect of the user entity would you be whitelisting? If you allow everyone using Firefox, say, you are still leaving yourself open to exploits. Blocking by IP address is unrealistic as well. There is no way you could whitelist every individual address, and allowing various ranges will also let the bad guys through. The same goes for the referrer: easy to work around. And, your suggested method of whitelisting by URL is a bad idea because it would punish users for 404 and other legitimate errors. Further, even if you ignore all of these things, you would still have to spend time updating and maintaining your whitelist to account for these changing environmental variables. Unless you are talking about a private intranet or something, nothing stays the same and thus work is required to stay current and relevant. I don’t think there is any effective way of using a whitelist to protect a public website.

    As for minimizing the number of false positives, again, keep in mind that my argument clearly addresses the case of “publicly accessible websites.” Any legitimate user that is blocked while trying to access a whitelisted site could be considered a “false positive.”

  6. [ Gravatar Icon ] Louis says:

    @Jeff: I didn’t understand your “false positive” issue because I hadn’t even thought of blocking requests based on user characteristics! My idea really was to list every single valid URL, and make that a whitelist. You say that the problem with that idea is that it “punish[es] users for 404 and other legitimate errors”, but what if we redirected any URL which is not in the whitelist to a 404 page, with Google-powered suggestions for example?

    That sounds all right to me. What am I missing here? :(

  7. [ Gravatar Icon ] Jeff Starr says:

    Yes, I see that your focus was more on the “whitelisted-URL” idea. Sorry for straying into a general argument against whitelisting. Actually, in theory your idea is quite good; however, it would be terribly expensive and/or time-consuming to implement. One reason for this involves the many different URLs that are requested during the loading of a single page, let alone every page on an entire site. It may seem that you only need to whitelist page URLs, but as far as I know you would also need to whitelist every CSS file, JavaScript file, image, include, PHP script, and so on. For every page, theme, and resource. Such a whitelisting system would then have to account for every changed name and location for each of these resources, as well as take into account every new file, page, theme, script, and so on. Even more hideous to think about is the typical e-commerce site, with all of its constantly changing dynamic query-string URLs. Imagine trying to account for every legitimate URL on, say, Amazon or something like that and I will just laugh at you (as a friend, of course). Finally, even if you decided to try doing something like this for your own site, how would you then generalize and internalize it for others? And how on earth would you test something that supposedly blocks URLs that haven’t been invented yet? Again, I think the idea is sound in theory, but in practice it would just be a nightmare to develop, test, and implement.

  8. [ Gravatar Icon ] Louis says:

    @Jeff: listing every static file wouldn’t be very hard, as we have them on the hard drive and can add a new file to the list immediately after it’s been uploaded, or remove it accordingly. Also, includes don’t generate external requests… as far as I know :o

    For the dynamic URLs, I must say that it seems way harder. Though, if we coded the algorithms that create these URLs, we should be able to capture them in a list.

    Of course, the solution wouldn’t be a list of rules that any website could plug in, like your 4G Blacklist is. It would be an internal system written in a chosen language, and specific to the website. It’s a lot more work to implement, but the result is, in theory, a perfect defense.

    Also, for smaller websites that do not have the complexity of Amazon.com, we could write general-purpose code in popular languages (PHP, Ruby, Perl, Python), and make it some sort of “module”, for easier deployment.

  9. [ Gravatar Icon ] Jeff Starr says:

    Basically what you are describing here is the Apache 404 protocol, only instead of delivering a 404 page, you are sending non-existent requests to the home page. The only difference being that you would probably reduce performance significantly by listing every single file and page on the server. My current domain empire includes something like fifty-thousand files! To have to account for each of them with every URL request (including for the files themselves!) would just be insanely resource-intensive.

  10. [ Gravatar Icon ] Myra R. says:

    I read through the entire article. :p

    I love reading about your thoughts and ideas as you put together the Blacklists.

    Your posts are very enlightening and entertaining as well.

    Thank you for sharing your knowledge (and sense of humor) with us. I look forward to reading your next article and putting your 4G Blacklist to use, once it’s released.

  11. [ Gravatar Icon ] Jeff Starr says:

    @Myra: Coolness! That makes my day :) I am guessing that many people will “skim” through and catch the main points, but the article is rather long, so kudos to you for taking the time to read through. Your kind words are encouraging, Myra — thanks for taking the time to respond! :)

  12. [ Gravatar Icon ] Louis says:

    @Jeff: so I guess my Whitelist dream will never be :)

    That’s too bad. I hate when the best solution is the “least bad”. The web is definitely the most hostile environment one can think of.

  13. [ Gravatar Icon ] Jeff Starr says:

    @Louis: Alright, to play devil’s advocate.. It would be possible to create a URL whitelist that essentially says:

    if the URL request is from the server, allow access
    if the URL request is not from the server, then do the following:
         if the request is for a legitimate, existing resource, allow access
         if the request is for anything else, deny access or redirect
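
    A minimal sketch of how that logic might look in an htaccess file. It assumes mod_rewrite is available, that “from the server” means the loopback address, and that a “legitimate, existing resource” means an existing file or directory. None of this is part of the 4G Blacklist:

    ```apache
    <IfModule mod_rewrite.c>
        RewriteEngine On

        # Assumption: "from the server" = requests from the loopback address.
        # Let those through without further checks.
        RewriteCond %{REMOTE_ADDR} ^127\.0\.0\.1$ [OR]
        RewriteCond %{REMOTE_ADDR} ^::1$
        RewriteRule ^ - [L]

        # Everything else: if the request does not map to an existing
        # file or directory, refuse it with 403 Forbidden.
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule ^ - [F]
    </IfModule>
    ```

    On a site that routes pretty permalinks through index.php (as WordPress does), the second rule would also refuse those URLs, so a sketch like this only fits sites where every public URL maps to a real file or directory.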

    So the question is, why wouldn’t something like this work?

  14. [ Gravatar Icon ] Louis says:

    I don’t know if it’s possible to use Apache in combination with a database manager, but if it is, then our performance issue — which seems to be the only issue preventing us from digging into the whitelist solution — will be gone.

  15. [ Gravatar Icon ] Jeff Starr says:

    Yes, using a database would require a scripting language, which then could also be used to control traffic one way or another for various site resources. Such a database-based solution would probably improve performance, but even then I don’t think it would be as elegant or efficient as the logic presented in my previous comment, which may very well be possible to construct from within Apache.

  16. Donace says:

    I was a passenger for a 2.5hr drive yesterday, and reading this article helped me pass some time :p.

    I may have touched on this in a previous comment, but as Louis states, and as you yourself state, a list on its own is not a ‘do-all’, ‘end-of’ defence.

    Security is about layers. I think it’s safe to say that using this, along with blocking IPs from Project Honey Pot, redundant user agents, and so on, will be a defence against miscreants, though only against the ‘un-knowledgeable’ new guys (a significant %); the people who really want to do harm will find a way.

  17. [ Gravatar Icon ] Louis says:

    if the request is for a legitimate, existing resource, allow access

    How do you know if the URL is legitimate without using a list, and thus, a database? Static files are easy to figure out, but how do you verify dynamic URLs?

  18. [ Gravatar Icon ] Jeff Starr says:

    Easy. We take advantage of Apache’s 404 protocol: any request otherwise resulting in a 404 error is not allowed access or redirected. No database necessary :)

  19. [ Gravatar Icon ] Louis says:

    But Apache does not fully handle dynamic URLs. I mean, if we add ?whatever after a valid URL, it becomes a valid URL for Apache too. That may be problematic, as it leaves the door open for pirates.

    Anyways, you seemed enthusiastic in this comment. Does that mean you may consider working on a whitelist solution? :^)

  20. [ Gravatar Icon ] Jeff Starr says:

    @Donace: Awesome — glad I helped you pass some time! ;) And you are absolutely correct that security is all “about layers.” Firewalls and blacklists can be very effective at filtering out bad requests and other external nonsense, but you also have to keep form input clean as well. A solid server configuration combined with a securely developed website is always important, regardless of any extra protection afforded by firewalls.

    As for blocking individual IPs, user agents, and referrers, I think that it is best to do so in moderation, and only for extreme and/or well-known cases of maliciousness. Avoid never-ending lists of appended entities and only block the items for as long as is necessary. My opinion, anyway :P

  21. [ Gravatar Icon ] Jeff Starr says:

    @Louis: Quite honestly, I was more enthused about corresponding with you again. It has been a while and I was beginning to wonder if everything was okay with you (so don’t scare me like that!). So when I saw that you were back in town, so to speak, I dropped everything and threw myself into the discussion. Not to mention the fact that the topic is one of my favorites, as you are well aware. But, even so, you have given me something to think about and I will certainly be doing some experimenting to see if such a whitelist is indeed a lucrative idea. As you say, the query string would be a bit of a challenge, but well worth it.
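
    One rough way to approach the query-string challenge would be to whitelist the query string as well. The sketch below assumes mod_rewrite and a site that uses only two hypothetical parameters, p (numeric) and s (simple search terms); a real site would need its own patterns:

    ```apache
    <IfModule mod_rewrite.c>
        RewriteEngine On

        # Hypothetical whitelist: allow an empty query string, or exactly
        # one of the assumed parameters "p" or "s". Anything else gets 403.
        RewriteCond %{QUERY_STRING} !^$
        RewriteCond %{QUERY_STRING} !^p=[0-9]+$
        RewriteCond %{QUERY_STRING} !^s=[A-Za-z0-9+_-]+$
        RewriteRule ^ - [F]
    </IfModule>
    ```

    RewriteCond lines are ANDed by default, so a request is refused only when its query string is non-empty and matches none of the allowed shapes.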

  22. [ Gravatar Icon ] Louis says:

    Yes, it would be worth it, I guarantee! The blacklist may protect from 99% of the attacks, but you have this little 1% in your head that makes you wake up in the night yelling “AAAaaAaaaah not the spam bots! Leave me alone!!!”. If you manage to get a whitelist mechanism working, the war will be over and you will have won.

    Now, thank you for your kind words; they are really touching. I missed discussing with you too! The thing is, I’ve really lost interest in WordPress and PHP. These are two important themes on PP, so I guess I had fewer comments to make recently. But when this post came up in my RSS reader, I knew I had to write a little something — it’s one of your favorite topics after all :).

    I’m also not as interested in the internet as I used to be — the technical internet I mean, from the builders’ side of things. For example, I loved to write about MooTools on my blog, but now, with the arrival of the new JavaScript engines natively in the browsers, I feel I have wasted my time. The same thing occurred with PHP when I played with Ruby — not even RoR, just Ruby. Simple and powerful solutions are emerging, and I feel like I’ve wasted a lot of time doing some great work, yes, but work on a problem that didn’t have to exist in the first place. I’m not sure if I’m clear here, but the idea is that I enjoy reading tech posts less these days.

    And the worst thing is that you seem to provide some great readings on MindFeed, some readings that don’t deal with 0 and 1, that are more human oriented. But I simply can’t enjoy these, because they are written in English. That’s so sad. I could read them, but it would become a pain to translate that quantity of text, and I don’t want reading you to become a pain :)

    But enough complaining about everything! This post was awesome as usual, and with a little luck, the spammers will stop spamming and the pirates will stop being assholes to people. You say they won’t? Okay, but summer is coming anyways and my batteries are recharging :)

  23. [ Gravatar Icon ] Jeff Starr says:

    I figured something like that was up (either that or school, which can be a huge burden) — I am glad that it wasn’t because of anything serious or traumatic, like illness or something. To tell you the truth, I have been doing a little soul-searching myself about all of this Web stuff. Just as you mention, it seems as if a lot of the stuff that I spend my time on, such as WordPress for example, is just becoming pointless due to the millions of people who are now writing about the same thing.

    But not just that, it seems like, for technical web dev stuff in general, it is just too competitive to make anything worthwhile. I don’t like the feeling that everything I do is sheer and utter vanity, because there will always be a bigger, better site doing work on the exact same stuff. As you know, I do this at my own expense and collect no revenue from my efforts, so when the inherent rewards of providing novel information become increasingly difficult, well, let’s just say that I am also looking into other, more satisfying endeavors.

    I hope you continue reading mindfeed (and Perishable Press), as I will definitely be heading in that direction more as the days unwind.. and don’t give me that crap about not being able to follow along because of the language barrier — you understand English better than half of the people I know.

    In any case, thanks for taking the time to chime in and discuss this post — it is always a pleasure exploring topics via critical and contemplative discussion.

  24. [ Gravatar Icon ] Sam Beale says:

    Wow, thank you!

    That was the most interesting read I have encountered on the internet, probably ever. I’m actually supposed to be working right now but stumbled across your site and haven’t been able to tear myself away from this article or the discussion afterwards.

    Thank you for providing a page that, despite the scarily small scrollbar on the right, kept me hooked from start to finish!

    Cheers,
    Sam.
    P.S. I looked through your previous site designs and this one just blows me away - amazing!

  25. [ Gravatar Icon ] Deb Phillips says:

    Wow, what a scary, vicious Internet world we live in. It’s a world where “Live and let live” is a Pollyanna concept. I found this post content completely compelling and a confirmation of some of my suspicions regarding certain activity I’ve noticed on my server.

    Thank you for all your work, and for sharing it with us. I really look forward to the 4G Blacklist — and I can’t wait to implement it.

    All the best,
    Deb

  26. [ Gravatar Icon ] Jeff Starr says:

    @Sam Beale: Well I’m certainly glad the tiny scrollbar didn’t keep you from reading! Thanks for the excellent feedback — it is my honor to share my work with you. Thanks for reading! :)

    @Deb Phillips: Ah yes, don’t even get me started on what a crazy world this is! It’s too bad that there are people out there who just don’t “get it”, and insist on messing with other people and their stuff. Before I fly off on a tangent here, allow me just to say “thank you” for taking the time to comment on the article — it is much appreciated. And, if all goes well, the 4G should be released early this week.

    Cheers,
    Jeff

  27. [ Gravatar Icon ] Mike says:

    As a non technical person I have persevered and read the whole thing all the way through, referring to various other sources to try to understand better what you are doing.
    I look forward to the 4G list and my attempt at installing it on my Joomla site.
    All I can say is WOW!
    Brilliant work, keep the faith.

  28. [ Gravatar Icon ] Jeff Starr says:

    @Mike: Thanks Mike, much appreciated. The 4G Blacklist is now available:

    http://perishablepress.com/press/2009/03/16/the-perishable-press-4g-blacklist/

    Cheers,
    Jeff

  29. [ Gravatar Icon ] WolfRage says:

    Thank you for this most informative read. I took the time to read this article because I am preparing to harden my framework. I am always trying to make applications that are as interoperable as possible. I am now trying to decide if I should simply implement your blacklist or if I should work to build my own.

    One reason for building it myself, while using your valuable insight, is of course to learn. Another reason is that my framework is hand-coded and does not use WordPress or any other CMS; Drupal is a model for the design, but the implementation is unique to my needs.

    Either way, I thank you for your insight and for taking the time to write such an article.

  30. [ Gravatar Icon ] Jeff Starr says:

    @WolfRage: Thanks for the feedback.

    I encourage anyone who is interested to do their own research and build their own customized blacklist. There is no single “perfect” blacklist that will suit all situations, so assembling one according to your specific needs seems like an ideal strategy.

    If you do end up with something solid, I encourage you to share it with the community to help others who may be running a similar setup.

    Hopefully my work on this and my many other blacklists will prove useful.

  31. [ Gravatar Icon ] WolfRage says:

    I will be sure to share what I learn with others just as you have with us.

    The depth of your site is quite impressive, how long have you been at this? I ask only because every time I come back, I find something new and interesting.

    I will have to work my way through your articles over the coming months, so please excuse me if my comments pop up on older articles.

  32. [ Gravatar Icon ] Jeff Starr says:

    Hi WolfRage, I have been working online since around 2000, and began this site specifically in late 2005. As you can see, I have spent a great deal of time on it.

    Looking forward to hearing more from you on other posts. Some of the older ones are now closed, but there are still plenty open for comments.

  33. [ Gravatar Icon ] michael soriano says:

    Do you recommend pasting this code in the root WP folder only, or in subdirectories as well, such as “wp-admin”, “wp-content”, etc.?

    Also, has anybody tried this with the VaultPress plugin?

  34. [ Gravatar Icon ] Jeff Starr says:

    Yeah the root folder enables the rules to protect everything. Moving it to another location is fine also, but the execution is top-down, so anything outside of the directory won’t be protected by 5G. No need to include multiple copies in any subdirectories (wp-admin, et al) — one copy in root should do the trick.

  35. [ Gravatar Icon ] michael soriano says:

    Thanks Jeff. You can add Vaultpress in your plugin list. It works fine.
