
Building the Perishable Press 4G Blacklist

[ Building the Hoover Dam, Part 1 ]

Last year, after much research and discussion, I built a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Unfortunately, these mega-lists eventually became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works by targeting common elements of decentralized server attacks. Consisting of a mere 37 lines, this “2G” Blacklist provided enough protection to enable me to completely eliminate over 350 blacklisting directives from my site’s root htaccess file. This improvement increased site performance and decreased attack rates; however, many bad hits were still getting through. More work was needed..

[ Building the Hoover Dam, Part 2 ]

Encouraged by the results of the 2G Blacklist and determined to further improve site security, I continued collecting data, testing directives, and refining my strategy. Work on the next generation of the blacklist — the 3G — required many weeks of research, testing, and analysis. During the development process, five major improvements were implemented. Using pattern recognition, access immunization, and multiple layers of protection, the 3G Blacklist serves as an extremely effective security strategy for preventing a vast majority of common exploits. The list consists of four distinct parts, providing multiple layers of protection that synergize into a comprehensive defense mechanism. Further, as discussed in previous articles, the 3G Blacklist is designed to be as lightweight and flexible as possible, thereby facilitating periodic cultivation and maintenance. Once finished with the development and testing of the 3G, it was finally released for public use. Since then, many people have implemented the 3G Blacklist and the overall results have been very positive.

[ Building the Hoover Dam, Part 3 ]

But the work didn’t stop there. The 3G is very effective at preventing a majority of malicious exploits, but new and/or previously undetected attacks continue to hit the server. As much as I hate to say it, there are people in the world who have nothing better to do than to go around and try to mess with other people’s stuff. Especially on the Web, this just seems to be a fact of life. Automated attacks, cracker exploits, and spam will never stop. And so I find myself diligently scouring my access and error logs in search of new patterns and methods to defend against. I spend two or three hours each week scanning my logs — line by line — taking notes, following leads, and researching the clues left behind by the brainless lice that continue to plague my defenses. Now, after several months of careful research, analysis and development, I combine this new insight and information with an improved understanding of HTAccess functionality to produce a completely reformulated security strategy referred to as the 4G Blacklist.

[ Building the Hoover Dam, Part 4 ]

For the previous generation of the Blacklist, I spent a great deal of time elucidating the ideas, methods, and data involved with its development. This information remains quite relevant and certainly applies to our current discussion of the upcoming 4G Blacklist. In this article, I share some of the thinking and analysis that went into the creation of the 4G, while outlining the development of its various subsections. All of this foreplay then pays off in the next article here at Perishable Press, as the much-anticipated 4G Blacklist is finally released. For now, let’s take a delightful romp through the building of the 4G Blacklist..

Forbidden Characters

One of the most commonly seen exploits involves the use of restricted or forbidden characters to manipulate the environment, trigger errors, or induce vulnerable behavior. These characters are seen both in the root portion of URLs and in query strings. Any character that is not one of the following must be encoded in order to appear legitimately within URLs:

Regular-use characters - allowed unencoded within URLs

$ - _ . + ! * ' ( ) ,

0 1 2 3 4 5 6 7 8 9

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Of these, the dollar sign ( $ ) and the comma ( , ) are commonly used in exploitative attacks, and have thus been blocked in the 4G Blacklist. The period ( . ) itself is required for file-name extensions and should not be blocked; however, multiple periods are frequently used in directory-traversal exploit attempts and may be blocked with zero liability. Thus, the 4G Blacklist employs this directive:

RedirectMatch 403 \.\.

..to account for any and all of the following cases:

http://domain.tld/path/target/../string/
http://domain.tld/path/target/../../string/
http://domain.tld/path/target/../../../string/
http://domain.tld/path/target/../../../../string/
http://domain.tld/path/target/../../../../../string/
.
.
.

Likewise, the humble comma:

RedirectMatch 403 \,

..eliminating this sort of nonsense:

component/option,com_rss/feed,RSS2.0/no_html,1/
component/option,com_rss/feed,ATOM0.3/no_html,1/
component/option,com_rss/feed,ATOM0.3/no_html,1/

/press/component/option,com_facileforms/Itemid,98/
/press/component/option,com_facileforms/Itemid,109/
/press/component/option,com_facileforms/Itemid,108/
/press/component/option,com_facileforms/Itemid,109/
/press/component/option,com_facileforms/Itemid,109/

/stupid-htaccess-tricks/path/s,/
/stupid-htaccess-tricks/path/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/

And so on and so forth. In addition to these regular-use characters, there are also “special-use” and “unsafe” characters. The special-use characters serve a specific, reserved purpose when they appear unencoded in the URL, and thus must be encoded whenever they are intended literally.

Special-use characters - literal use must be encoded

$ & + , / : ; = ? @

These characters are typically used in the construction of query strings and are rarely used legitimately in the root portion of the URL. They are commonly seen in the URLs associated with exploitative behavior; however, it is only safe to block most of them from appearing in the root portion of the URL. In the 4G Blacklist, all but the ampersand ( & ), plus sign ( + ), and question mark ( ? ) are blocked for root-portion URLs.
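To make that concrete, here is a minimal sketch of what blocking those characters in the root portion of the URL could look like, written in the same RedirectMatch style used throughout the Blacklist. This is an illustrative assumption rather than the published 4G rules (the forward slash is naturally exempt as the path separator), so audit your own URLs before using anything like it:

# illustrative sketch -- not the published 4G rules
# deny literal special-use characters in the URL path (query strings are handled separately)
RedirectMatch 403 \$
RedirectMatch 403 \,
RedirectMatch 403 \:
RedirectMatch 403 \;
RedirectMatch 403 \=
RedirectMatch 403 \@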

[ Building the Hoover Dam, Part 5 ]

There are also characters that are considered “unsafe” for unencoded use in URLs because of the possibility of misinterpretation. The following characters should never appear in their literal form in any portion of the URL:

Unsafe characters - literal use must always be encoded

space " < > # % { } | \ ^ ~ [ ] `

As you might suspect, these characters are the most commonly seen entities in malicious attacks. The backslash ( \ ) is used to escape various characters, including the carriage return ( \r ) and newline ( \n ) sequences used in scripting. Thus, by blocking the backslash, we immediately clean up any exploits involving this sort of crud:

\\

/\\\\

/\\\'/

\n\n\n

\r\r\r

Most of the other characters are also blocked in the 4G Blacklist. For obvious reasons, the only “unsafe” characters that aren’t blocked are the blank space, the pound sign ( # ), and the percentage symbol ( % ). By blocking everything else, we effectively eliminate all future occurrences of this type of nonsense (a sketch of the blocking follows these examples):

||0|0
i==r.K-1
i==r.K-1
l..,~f@nr
directories/title=directory
nu+in+html&btnG=Search&meta=/
/1x2n6l6bx6nt/001mAFC(-~l-xAou6.oCqAjB4ukkmrntoz1A/0011C/uikqijg4InjxGu.k
/1x2n6l6bx6nt/001mAFC(-~l-xAou6.oCqAjB4ukkmrntoz1A/0011C/uikqijg4InjxGu.k
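For reference, a comparable sketch for the unsafe characters is shown below. Again, this is an assumption offered for illustration rather than the published 4G rules; the blank space, pound sign, and percentage symbol are deliberately left alone, as explained above:

# illustrative sketch -- not the published 4G rules
# deny literal unsafe characters in the URL path
RedirectMatch 403 \"
RedirectMatch 403 \<
RedirectMatch 403 \>
RedirectMatch 403 \{
RedirectMatch 403 \}
RedirectMatch 403 \|
RedirectMatch 403 \\
RedirectMatch 403 \^
RedirectMatch 403 \~
RedirectMatch 403 \[
RedirectMatch 403 \]
RedirectMatch 403 \`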

Encoded Characters

There are also a number of escaped hexadecimal ASCII characters that have no business appearing in the URL. There are several different groups of these characters that are blocked in the 4G Blacklist. The first group contains hexadecimal-encoded characters beginning with “%0”, for example:


null                  %00
control chars         %01 - %07
backspace             %08
tab                   %09
newline (\n)          %0A
vertical tab          %0B
form feed             %0C
carriage return (\r)  %0D

These characters are not used in “normal” URLs and are the tools of many types of malicious exploits. Fortunately, the entire lot of these characters is easily blocked with the following directive:

RedirectMatch 403 \%0

The next group of blocked character codes includes the following hexadecimal representations:

+  %2B
<  %3C
>  %3E
?  %3F
[  %5B
\  %5C
]  %5D
{  %7B
|  %7C
}  %7D
"  %22
'  %27
(  %28
)  %29

These hexadecimal entities are blocked from the root portion of the URL, the query string, or both. These encoded characters are not required for proper URL formation, and are commonly seen in malicious attacks. Thus, in blocking this small subset of hexadecimal representations, the 4G Blacklist puts an end to a great amount of malicious behavior, including the following types of exploits (taken from actual log entries):

%0d
\\%27
%0a%0a
/\\%27/
png%5d%7b/
gif%5d%7b/
jpg%5d,div/
png%5d,div/
jpeg%5d,div/
domain%5d%7b/
%27.admin_url(
directory%5d%7b/
15.1Y(m%5B3%5D,a).K
%5BURL%20to%20preload%5D/
/%5BURL%20to%20preload%5D/
png%5d,a%5bhref$=gif%5d%7b/
ng.fromCharCode(c)%7Dkode=x
/sitemap/%22+accesskey=%226/
/contact/%22+accesskey=%227/
/contact/%22+accesskey=%229/
\\%27.%20get_permalink()%20.%20
\\%27%20.%20$referer%20.%20\\%27
/%5BNext%20URL%20in%20series%5D/
/%2B%28200%2Bok%29%2BACCEPTED%2B/
a.12.4l(/\\%27*/\\%27)%5b0%5d==a/
%5BPrevious%20URL%20in%20series%5D/
15.2I(a.12.5p,1,/\\%274d/\\%27)==a/
%22+title=%22Permalink+for+article+/
/%5BPrevious%20URL%20in%20series%5D/
this.options%5Bthis.selectedIndex%5D.value
jpe%5d,a%5bhref$=png%5d,a%5bhref$=gif%5d%7b/
#comment-7327+++++++++++++%09+success+%28from+first+page%29;
jpeg%5d,a%5bhref$=jpe%5d,a%5bhref$=png%5d,a%5bhref$=gif%5d%7b/
jpg%5d,a%5bhref$=jpeg%5d,a%5bhref$=jpe%5d,a%5bhref$=png%5d,a%5bhref$=gif%5d%7b/

These attack strings are a common sight in unfiltered access and error logs. They are generally appended or otherwise inserted into arrays of legitimate URLs, typically occurring in repetitive fashion over various time intervals. Obviously, these types of URLs do not exist on normal, everyday sites, and are better off blocked to conserve valuable resources and protect against malicious exploits.
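For illustration, encoded characters such as these could be denied using the same convention as the \%0 directive shown earlier. Treat the following as a hedged sketch rather than the actual 4G rules; some of the listed codes (the encoded plus sign and question mark, for example) are too common in legitimate query strings to block everywhere:

# illustrative sketch -- not the published 4G rules
# deny a subset of encoded characters in the requested URL
RedirectMatch 403 (\%22|\%27|\%28|\%29|\%3C|\%3E|\%5B|\%5C|\%5D|\%7B|\%7C|\%7D)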

[ Building the Hoover Dam, Part 6 ]

The third group of hexadecimal codes blocked by the 4G Blacklist consists of the encoded representations for uncommon entities such as the following:

¡ - %A1
¢ - %A2
£ - %A3
¤ - %A4
¥ - %A5
¦ - %A6
§ - %A7

These characters certainly have their place, but there is typically no reason for them to occur in the URL construct. All of these rare characters are represented by hexadecimal encodings that begin with the first six letters of the alphabet. In the example above, only a handful of the “%A’s” are shown, but rest assured there are oodles of these oddities, each beginning with one of the following character pairs:

%A %B %C %D %E %F
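Following the same pattern as the \%0 directive, a sketch for this group might look like the lines below. This is an assumption for illustration only; sites that legitimately serve URLs containing encoded non-ASCII characters (UTF-8 sequences such as %C3%A9, for example) would need to leave some of these alone:

# illustrative sketch -- not the published 4G rules
RedirectMatch 403 \%A
RedirectMatch 403 \%B
RedirectMatch 403 \%C
RedirectMatch 403 \%D
RedirectMatch 403 \%E
RedirectMatch 403 \%F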

Blocking these little rascals prevents the remainder of commonly employed hex strings from serving the villains. Here are a few strays cherry-picked after installing the 3G:

%7c%7c0%7c0
%d0%b2%d0%be%d1%82
PS%ef%bc%9a*%20%7b/

With the 4G in place, hapless crud like that just bounces off the walls. Moving on..

Common Patterns & Specific Exploits

Identifying common patterns and specific exploits is essential to maintaining an effective firewall. Over the course of the past few months, I have identified, extracted, and consolidated a wide variety of character strings, scripts, and file names commonly used by attackers when scanning for exploits. Common patterns were then identified and targeted via regular expressions in the 4G Blacklist in order to inoculate your site against future attacks. In this section of the article, we examine some of the patterns and exploits commonly appearing in the root portion of the variously targeted URLs. The following section will then examine patterns and trends observed in query strings.

[ Building the Hoover Dam, Part 7 ]

One of the most important things to keep in mind when examining these data is the inherent bias in the sample population due to the presence of the previous, 3G Blacklist. The 3G directives are effective at eliminating a vast portfolio of potential attacks, such that the exploit data cultivated for the 4G Blacklist represents a much narrower spectrum of malicious activity. Thus, the evolution of the Perishable Press Blacklist is in fact cumulative in nature, with each successive generation building on previously existing security.

That said, let’s examine some of the character-string exploits observed within my sphere of online domains. Some of the most commonly observed patterns involve character-strings representing various scripting functions, database queries, and template tags. Here are some examples of frequently seen attack strings (taken from actual error/access logs):

select(
convert(
db_name
sys_cpanel
remoteFile
servername
system_user
option_value
clientrequest
maincore.php
password.php

These types of strings generally appear appended to URLs along with more context-specific vulnerability-scanning characters. These strings represent the common patterns present within many types of exploit scanning and have no business in the URLs of typical websites. Incidentally, all of these items are duly blocked in the 4G Blacklist.
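A hedged sketch of how such strings could be denied, again in the style used throughout the Blacklist (an illustration only; audit your own URLs before blocking generic words like these):

# illustrative sketch -- not the published 4G rules
RedirectMatch 403 (select\(|convert\(|db_name|sys_cpanel|remoteFile|servername|system_user|option_value|clientrequest|maincore\.php|password\.php)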

Other commonly seen character strings apparently target various XML vulnerabilities. Here are some typical examples extracted from the access/error logs:

xmlrpc.php
xmlrpc.php
adxmlrpc.php

.XMLHTTP
Msxml2.XMLHTTP
Msxml2.XMLHTTP
Msxml2.XMLHTTP
Microsoft.XMLHTTP
Microsoft.XMLHTTP

Unfortunately, the “xmlrpc.php” string represents an actual file used by WordPress and thus cannot be blocked. While scanning for exploits, attackers will append this file name to variously targeted URLs. The xmlrpc.php file is called consistently via the <head> section of the web document, such that there may be a way to target all illicit references by using some advanced mod_rewrite blacklisting directives. If this string is frequently appearing in your server logs, this may be something to investigate.
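One speculative way to chase down that idea (purely an assumption on my part, not something included in the 4G) would be to deny requests for xmlrpc.php that arrive without a referrer from your own domain. Be careful: legitimate XML-RPC clients and pingbacks generally send no referrer at all, so a rule like this could easily break expected functionality and should be tested thoroughly. Here, example.com is a placeholder for your own domain:

# speculative sketch -- NOT part of the 4G Blacklist
# deny requests for xmlrpc.php lacking a referrer from your own domain
# (may break legitimate XML-RPC clients and pingbacks; test before using)
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{REQUEST_URI} xmlrpc\.php [NC]
 RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com [NC]
 RewriteRule .* - [F,L]
</IfModule>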

Moving on, we bring this section to a close with a smorgasbord of recurring exploit patterns. First up, unexplained requests for URLs appended with the following character strings:

&rptmode=

#comment-55862&rptmode=2
#comment-56320&rptmode=2
#comment-55797&rptmode=2
#comment-55797&rptmode=2
#comment-55872&rptmode=2
#comment-55872&rptmode=2
#comment-55872&rptmode=2

Yeah, good stuff. As a side note, if you are one of the billions of people who live their entire lives without feeling the need to probe innocent websites for “rptmode” vulnerabilities, consider yourself lucky. There are obviously a handful of hungry desperadoes out there who feel compelled to lower themselves into this utterly sad virtual arena. Needless to say, the illuminated careers of these lost souls effectively end with the 4G Blacklist.

[ Building the Hoover Dam, Part 8 ]

Also on the menu, the mysteriously ubiquitous “macromates” probe. It’s like, “WTF” — I pity the poor holes out there who spend their time here on earth searching and scanning for macromates vulnerabilities, of all things. The payoff must be huge! Whatever. Here are a few representative lines, taken from thousands of logged attempted exploits:

macromates.com

macromates.com/screencasts/
macromates.com/screencasts/
macromates.com/screencasts/

Another interesting set of character strings commonly seen in the root portion of URLs targets WordPress database tables directly:

wp_options
wp_posts
wp_terms

Would love to get my hands on the creeps who do this kind of stuff! Oh well, in case that day never comes, the least I can do is prevent the bastards from getting anywhere near my database tables by throwing down the following directive in the 4G Blacklist:

RedirectMatch 404 wp\_

And finally, an otherwise random collection of poisonous little strings that are used in a wide range of vulnerability scans:

_vpi
http%
http;//
/query/
/(null)/
/Table/Latest/index.php

By blocking these arbitrary snippets from URL requests, we effectively eliminate countless attack vectors. Bonus points and honorable mention for identifying which of these strings is not blocked by the 4G Blacklist.

Query Strings

At the heart of any effective firewall technique is the fine art of query-string filtering. Because of the influential and sensitive nature of query strings, cleaning up query-string input is an essential part of any serious website security strategy. By manipulating query strings, the savvy attacker may gain access to, take control of, and ultimately corrupt or destroy your files, your database, and even the server itself. As you can imagine, a great deal of time, effort, and research has gone into the development of the “query-string” portion of the 4G Blacklist.

[ Building the Hoover Dam, Part 9 ]

For the average site, when it comes to cleaning cracker crumbs from the query string, we may scrub with broad, sweeping strokes. By blocking a select handful of character strings, we can disinfect a vast majority of maliciously requested query strings. Due to the way in which query strings function, many different types of attacks contain similar characters. Consider this collection of malicious query strings that were harvested from actual access/error logs (note: line breaks inserted for readability):

wordspew-rss.php?id=-998877/**/UNION/**/SELECT/**/0,1,concat
(0x3a,user_login,0x3a,user_pass,0x3a),concat(0x3a,user_login,
0x3a,user_pass,0x3a),4,5/**/FROM/**/wp_users

st_newsletter/stnl_iframe.php?newsletter=-9999+UNION+SELECT+
concat(0x3a,user_login,0x3a,user_pass,0x3a)+FROM+wp_users--

wpSS/ss_load.php?ss_id=1+and+(1=0)+union+select+1,concat(0x3a,
user_login,0x3a,user_pass,0x3a),3,4+from+wp_users--&display=plain

wp-download.php?dl_id=null/**/union/**/all/**/select/**/concat
(0x3a,user_login,0x3a,user_pass,0x3a)/**/from/**/wp_users/*

forums?forum=1&topic=-99999/**/UNION/**/SELECT/**/concat(0x3a,
user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

forum=1&topic=-99999/**/UNION/**/SELECT/**/concat(0x3a,
user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

sf-forum?forum=-99999/**/UNION/**/SELECT/**/concat(0x3a,
user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

sf-forum?forum=-99999/**/UNION/**/SELECT/**/0,concat(0x3a,
user_login,0x3a,user_pass,0x3a),0,0,0,0,0/**/FROM/**/wp_users/*

wordspew-rss.php?id=-998877/**/UNION/**/SELECT/**/0,1,concat
(0x3a,user_login,0x3a,user_pass,0x3a),concat(0x3a,user_login,
0x3a,user_pass,0x3a),4,5/**/FROM/**/wp_users

wp-adserve/adclick.php?id=-1%20union%20select%20concat(0x3a,
user_login,0x3a,user_pass,0x3a)%20from%20wp_users

fim_rss.php?album=-1%20union%20select%201,concat(0x3a,
user_login,0x3a,user_pass,0x3a),3,4,5,6,7%20from%20wp_users--

wp-cal/functions/editevent.php?id=-1%20union%20select%201,concat
(0x3a,user_login,0x3a,user_pass,0x3a),3,4,5,6%20from%20wp_users--

wp-content/plugins/wp-cal/functions/editevent.php?id=-1%20union%20select%201,
concat(0x3a,user_login,0x3a,user_pass,0x3a),3,4,5,6%20from%20wp_users--

wp-content/plugins/fgallery/fim_rss.php?album=-1%20union%20select%201,
concat(0x3a,user_login,0x3a,user_pass,0x3a),3,4,5,6,7%20from%20wp_users--

wp-content/plugins/wp-adserve/adclick.php?id=-1%20union%20select
%20concat(0x3a,user_login,0x3a,user_pass,0x3a)%20from%20wp_users

wp-content/plugins/st_newsletter/stnl_iframe.php?newsletter=-9999
+UNION+SELECT+concat(0x3a,user_login,0x3a,user_pass,0x3a)+FROM+wp_users--

wp-content/plugins/wp-download/wp-download.php?dl_id=null/**/union/**/all
/**/select/**/concat(0x3a,user_login,0x3a,user_pass,0x3a)/**/from/**/wp_users/*

wp-content/plugins/wpSS/ss_load.php?ss_id=1+and+(1=0)+union+select+1,concat
(0x3a,user_login,0x3a,user_pass,0x3a),3,4+from+wp_users--&display=plain

wordspew-rss.php?id=-998877/**/UNION/**/SELECT/**/0,1,concat(0x3a,user_login,0x3a,
user_pass,0x3a),concat(0x3a,user_login,0x3a,user_pass,0x3a),4,5/**/FROM/**/wp_users

sf-forum?forum=-99999/**/UNION/**/SELECT/**/0,concat(0x3a,
user_login,0x3a,user_pass,0x3a),0,0,0,0,0/**/FROM/**/wp_users/*

web/sf-forum?forum=-99999/**/UNION/**/SELECT/**/concat
(0x3a,user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

forums?forum=1&topic=-99999/**/UNION/**/SELECT/**/concat
(0x3a,user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

forum_feed.php?thread=-99999+union+select+1,2,3,concat(char(37),char(95),char(37),char(95),char(37),
user_login,char(37),char(95),char(37),char(95),char(37),user_pass,char(37),char(95),char(37),char(95),
char(37),user_email,char(37),char(95),char(37),char(95),char(37)),5,6,7+from+wp_users/*

These URLs are some of the worst that I have encountered, each one scanning for specific database-related vulnerabilities via uniquely configured query strings. As you can see, this type of exploit scanning utilizes a wide range of character strings. Fortunately, there are common elements that may be identified, targeted, and subsequently blocked via the 4G Blacklist. Thus, by including the following condition among our QUERY_STRING directives, we immediately eliminate the entire array of these types of attacks:

RewriteCond %{QUERY_STRING} ^.*(select|union).* [NC]
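Note that a RewriteCond takes effect only when paired with a RewriteRule; in context, the condition sits inside a block along these lines (a minimal sketch, assuming mod_rewrite is available):

<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{QUERY_STRING} ^.*(select|union).* [NC]
 RewriteRule .* - [F,L]
</IfModule>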

Powerful stuff. And, as there is no reason for these terms to appear in legitimate query-string constructs, we may secure our sites quietly and effectively, with zero effect on proper functionality. Another broad-sweeping query-string target is the frequently abused “mosConfig” parameter. There are many scripts that maliciously attempt to set mosConfig values through the URL. Here are some actual examples (note: line breaks inserted for the last entry):

index.php?option=com_peoplebook&Itemid=&mosConfig_absolute_path=?
index.php?option=com_peoplebook&Itemid=&mosConfig_absolute_path=?
index.php?option=com_joomap&view=google&no_html=&mosConfig_absolute_path=?

/path/?option=com_simpleboard&task=cat_view&gid=28&Itemid=&mosConfig_absolute_path=?
/path/?option=com_remository&task=cat_view&gid=28&Itemid=&mosConfig_absolute_path=?
/path/?option=com_remository&task=cat_view&gid=28&Itemid=&mosConfig_absolute_path=?

http%%20-%2037k%20-https://perishablepress.com/press/2006/08/28/spamless-email-address-via-javascript/
%20-%2037k%20-/https://perishablepress.com/press/2008/03/08/blacklist-candidate-number-2008-03-09/
index.php?option=com_remository&Itemid=&mosConfig_absolute_path=?

Each of these entries was logged as coming from a unique IP address, and each group of entries (first, second, and third) was recorded by way of a different user agent. Further, no referrer information was associated with any of these exploit attempts. But that’s “okay” with us, because we have identified and immunized against the common patterns found among the various query strings themselves. The first and most obvious character string that has been added to the 4G Blacklist is the mosConfig parameter. Revisit the previous examples and see how blocking that single term will prevent all future occurrences of any similar sort of “mosConfig”-type attack.

The astute reader will have also noticed a second common element within our mosConfig collection: the terminating question mark ( ? ). The question mark is not a viable blacklist candidate, however, because it is not reliably present in every mosConfig attack. Even so, blocking the question mark in the query string eliminates an even greater subset of malicious URLs. For example, scrubbing the query string of all question marks is an effective and efficient way to flush all of the following turds (taken from actual access/error logs):

?3,f.@45
?3,f.@45
?3,f.@45
?3,f.@45

index.php?=?
index.php?url=?
login.php?dir=?
setup.php?dir=?
index.php?mode=?
ask_password.php?dir=?
index.php?DOCUMENT_ROOT=?

error.php?error=uid=48(apache)%20gid=48(apache)%20groups=48(apache)%0A?

And that’s just a sampling. Blocking question marks is an excellent way to clean up an enormous amount of dangerous exploit attempts. This is the line of thinking that went into the development of the 4G Blacklist. Many characters that have no legitimate place in the query string are nonetheless used frequently by attackers while scanning for potential exploits. Thus, by blacklisting their presence, we apply another strong layer of defense to our website.
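As a rough sketch, scrubbing question marks from the query string could be expressed as follows (an illustration under the same mod_rewrite assumptions as above; the published 4G directives may differ):

# illustrative sketch -- not necessarily the published 4G rule
RewriteCond %{QUERY_STRING} \?
RewriteRule .* - [F,L]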

[ Building the Hoover Dam, Part 10 ]

In addition to blocking illicit characters, the QUERY_STRING directives of the 4G Blacklist also protect against a wide variety of more sophisticated attacks. For example, all query-string attempts to leverage base64 encoding, <script> tags, PHP globals, REQUEST variables, and so forth are blocked via 4G. Also neutralized is the threat of this sort of nonsense (a generic sketch of such query-string conditions follows the examples below):

index.php?loopback
index.php?localhost
index.php?127.0.0.1
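For orientation, the sorts of query-string conditions described here commonly take a form like the following in htaccess-based firewalls. This is a generic sketch of the technique, not a copy of the 4G’s exact directives:

# generic sketch of common query-string protections -- not the 4G's exact rules
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [NC,OR]
 RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
 RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%) [NC,OR]
 RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%) [NC,OR]
 RewriteCond %{QUERY_STRING} (localhost|loopback|127\.0\.0\.1) [NC]
 RewriteRule .* - [F,L]
</IfModule>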

Of course, the QUERY_STRING directives of the 4G Blacklist are designed to protect your site against a wide variety of malicious attacks. Needless to say, a more comprehensive exploration of the methods and strategies involved with this part of the Blacklist would be major overkill, if not already ;)

Other Modifications

Briefly, and for the sake of future reference, here are some of my notes concerning the important differences between the 3G and 4G Blacklist.

Updated Directives

RedirectMatch 403 \/\,
RedirectMatch 403 \.\.\.
RedirectMatch 403 \_vpi\.xml
RedirectMatch 403 ImpEvData\.
RedirectMatch 403 blank\.php
RedirectMatch 403 errors\.php
RedirectMatch 403 config\.php
RedirectMatch 403 include\.php
RedirectMatch 403 display\.php
RedirectMatch 403 register\.php
RedirectMatch 403 password\.php
RedirectMatch 403 maincore\.php
RedirectMatch 403 authorize\.php
RedirectMatch 403 doeditconfig\.
RedirectMatch 403 function\.main
RedirectMatch 403 function\.mkdir
RedirectMatch 403 function\.opendir
RedirectMatch 403 function\.require
RedirectMatch 403 \/wp\-signup\.php
RedirectMatch 403 function\.array\-rand
RedirectMatch 403 comment\-template\.php
RedirectMatch 403 function\.require\-once

Removed Directives


RedirectMatch 403 f\-\.
RedirectMatch 403 ftp\:
RedirectMatch 403 ttp\:
RedirectMatch 403 blank\.
RedirectMatch 403 xmlrpc\.
RedirectMatch 403 et\.html
RedirectMatch 403 news\.php
RedirectMatch 403 menu\.php
RedirectMatch 403 main\.php
RedirectMatch 403 home\.php
RedirectMatch 403 view\.php
RedirectMatch 403 about\.php
RedirectMatch 403 block\.php
RedirectMatch 403 order\.php
RedirectMatch 403 search\.php
RedirectMatch 403 button\.php
RedirectMatch 403 middle\.php
RedirectMatch 403 \/login\.php
RedirectMatch 403 contact\.php
RedirectMatch 403 threads\.php
RedirectMatch 403 path\_to\_script
RedirectMatch 403 send\_reminders\.
RedirectMatch 403 syntax\_highlight\.
RedirectMatch 403 \/themes\/
RedirectMatch 403 \/plugins\/
RedirectMatch 403 \/modules\/
RedirectMatch 403 \/classes\/
RedirectMatch 403 \/scripts\/
RedirectMatch 403 \/includes\/
RedirectMatch 403 \/components\/
RedirectMatch 403 \/administrator\/

Redundant Directives

RedirectMatch 403 alt\=
RedirectMatch 403 \.\$url
RedirectMatch 403 \/\$url
RedirectMatch 403 \/\$link

The 5G Blacklist

During the development of the 4G Blacklist, I encountered a number of maliciously employed character strings that I could not block without invoking a more complicated set of rewrite directives. The items discussed below were not included in the 4G version, but may be integrated into the eventual 5G Blacklist.

Dot Nonsense

This type of nonsense is a very prevalent nuisance:

a.1
a.2
a.3
a.n
a.cross-link

..ad nauseam. The general pattern that would be useful in preventing this type of lunacy looks something like this:

"single alphanumeric character" . "single alphanumeric character" (or) "any sequence of alphanumeric characters that contains an invalid or unexpected character"

So, until I find time to craft something along those lines, the “dot-nuisance” entries will continue to plague teh access logz! Oh well..

Another stinky turd that just won’t go away on its own involves the myriad mutations of this little monster:

-moz-grabbing

How desperate these script kiddies must be to have to resort to such meaningless idiocy! Get a life, losers!

Other character strings that appear frequently but that are not readily blocked via existing methodology include the following:

(
)
-1/
cfrm/
/null

Also, there are a host of exploits referring to files that actually exist in various software packages, blogging platforms, and other web applications. Of course, for sites that do not use these files, blacklisting is a legitimate solution, but we certainly wouldn’t want to block any files that are actually in use. Two possibilities exist for defending against malicious requests that contain real file names: block any request for such files that does not come from our own server, or block any request that deviates from the actual file path (a rough sketch of the latter follows the file list below). In either case, the rules required to accomplish this transcend the current functionality provided by the 4G Blacklist.

navmenu.js
external.js
autoclear.js

page.html
info.html
noscript.html
no-javascript.html
forum_summary.html
static-content.html
nested-frameset.html

edit.php
categories.php
edit-comments.php
wp-admin/upgrade.php
wp-admin/wp-login.php
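Here is a rough sketch of the second approach, blocking requests that deviate from a file’s actual path. This is an assumption offered for illustration only; wp-login.php and its location are examples, so adjust the paths to the files actually present on your server and test carefully:

# speculative sketch -- beyond the scope of the 4G Blacklist
# deny requests in which a known file name appears outside its real location
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{REQUEST_URI} !^/wp-login\.php$
 RewriteCond %{REQUEST_URI} wp-login\.php [NC]
 RewriteRule .* - [F,L]
</IfModule>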

And finally, I leave you with this question: what are the consequences of blacklisting single versus double backslashes?

\ vs. \\
[ Building the Hoover Dam, Part 11 ]

Somebody Stop Me

That’s it for this fun-filled article. I would be surprised if anyone actually read all the way through, but then again, that’s not really the point of the exercise. After developing the 4G Blacklist, I wanted a clear, concise summary of all the notes and thinking that went into its creation. By combining everything into this single post, I save myself time and effort when referring to this information in the future. And hopefully, by publishing this lengthy diatribe on the Web, it will be of some educational value to others as well. In either case, stay tuned, because the 4G Blacklist is coming up in the next article..

About the Author
Jeff Starr = Web Developer. Security Specialist. WordPress Buff.

35 responses to “Building the Perishable Press 4G Blacklist”

  1. I must say that it took some courage to throw myself into reading this post, when I saw the size of the browser scroll bar :d

    At the beginning of the post, you remind us of the 1G blacklist, and how it “eventually became unmanageable and ineffective”. Maintaining a list is not the perfect solution, as you have to add more and more items. With the new generations of blacklists, you don’t target the IPs or user agents, but focus on the patterns of their attacks. That’s a more powerful solution, but aren’t we still dealing with a list? A list of patterns. You put it in those words:

    Thus, the evolution of the Perishable Press Blacklist is in fact cumulative in nature, with each successive generation building on previously existing security.

    Now, I wonder if the blacklist strategy is the right way to go.

    You say yourself that having this deep analysis of the patterns is quite time-consuming (“I spend two or three hours each week scanning my logs”). It also will stay that way because of the cumulative nature of the blacklist. All of this makes me think that the blacklist solution is not the way to go. It simply won’t ever be finished. It will require regular additions, forever. That’s why I wonder if the final solution wouldn’t be a whitelist.

    I don’t know if this is technically feasible, but if it is, that would be the way to go in my opinion. Does anyone have knowledge of past attempts, or of technical issues that would prevent us from building such a list?

  2. Awesome, great work, great read!
    I know for sure you’re making a whole lot of people very happy (and, hopefully, just as many unhappy ;-))

  3. Jeff Starr 2009/03/10 7:49 am

    @Thom: Thanks, I definitely hope that this work will help people block out the garbage from their sites. As Louis will tell you, the method isn’t perfect, but it certainly serves well as another layer of protection.

    @Louis: I would be so bold as to say that, for publicly accessible websites, every security technique — even a whitelist — requires periodic attention and maintenance. The bad guys will never stop updating and improving their methods, so any form of blacklist or firewall will need to be improved as well. Likewise, unless you aren’t concerned with maximizing public access to your site, whitelisting will always need to take into account the changing landscape of legitimate user agents in order to minimize the number of false positives. There have been many different whitelists created, but they are all limited in the scope of their engendered accessibility.

  4. @Jeff:

    The bad guys will never stop updating and improving their methods

    Yes, that’s why I thought a whitelist would be a better solution. These guys won’t be able to find any new method if we allow only what we are sure is okay.

    minimize the number of false positives

    I don’t understand how a false positive would be possible with a functional whitelist. Let’s say the whitelist contains all the URLs that we want to make public. How could someone be forbidden access to content that is known to be public? Erroneous URLs are another problem.

    If the whitelist lists all the public URLs of a website, and if this whitelist is updated programmatically, then I don’t see what would be wrong.

  5. @Jeff: listing every static file wouldn’t be very hard, as we have them on the hard drive and can add a new file to the list immediately after it’s been uploaded, or remove it accordingly. Also, includes don’t generate external requests… as far as I know :o

    For the dynamic URLs, I must say that it seems way harder. Though, if we coded the algorithms that create these URLs, we should be able to capture them in a list.

    Of course, the solution wouldn’t be a list of rules that any website could plug in, like your 4G Blacklist does. It would be an internal system written in a chosen language, and specific to the website. It’s a lot more work to implement, but the result is, in theory, a perfect defense.

    Also, on smaller websites that do not have the complexity of Amazon.com, we could write general-purpose code in popular languages (PHP, Ruby, Perl, Python) and make it some sort of “module” for ease of deployment.

  6. Jeff Starr 2009/03/10 9:28 am

    @Louis: I see your point, but unfortunately there is no way to whitelist every acceptable entity on the Web. If you read my comment carefully, you will see that I am focusing on public websites, which by definition must allow the general public access. Of course a whitelist would work if you only wanted to allow access to a percentage of users, but most of us are working with everyday websites that strive to maximize traffic. And even for private websites, what aspect of the user entity would you be whitelisting? If you allow everyone using Firefox, say, you are still leaving yourself open to exploits. Blocking by IP address is unrealistic as well. There is no way you could whitelist every individual address, and allowing various ranges will also let the bad guys through. The same goes for the referrer: easy to work around. And, your suggested method of whitelisting by URL is a bad idea because it would punish users for 404 and other legitimate errors. Further, even if you ignore all of these things, you would still have to spend time updating and maintaining your whitelist to account for these changing environmental variables. Unless you are talking about a private intranet or something, nothing stays the same and thus work is required to stay current and relevant. I don’t think there is any effective way of using a whitelist to protect a public website.

    As for minimizing the number of false positives, again, keep in mind that my argument clearly addresses the case of “publicly accessible websites.” Any legitimate user that is blocked while trying to access a whitelisted site could be considered a “false positive.”

  7. @Jeff: I didn’t understand your “false positive” issue because I hadn’t even thought of blocking requests based on user characteristics! My idea really was to list every single valid URL, and make that a whitelist. You say that the problem with that idea is that it “punish[es] users for 404 and other legitimate errors”, but what if we redirected any URL which is not in the whitelist to a 404 page, with Google-powered suggestions for example?

    That sounds all right to me. What am I missing here? :(

  8. Jeff Starr 2009/03/10 9:52 am

    Yes, I see that your focus was more on the “whitelisted-URL” idea. Sorry for straying into a general argument against whitelisting. Actually, in theory your idea is quite good; however, it would be terribly expensive and/or time-consuming to implement. One reason for this involves the many different URLs that are requested during the loading of a single page, let alone every page on an entire site. It may seem that you only need to whitelist page URLs, but as far as I know you would also need to whitelist every CSS file, JavaScript file, image, include, PHP script, and so on. For every page, theme, and resource. Such a whitelisting system would then have to account for every changed name and location for each of these resources, as well as take into account every new file, page, theme, script, and so on. Even more hideous to think about is the typical e-commerce site, with all of its constantly changing dynamic query-string URLs. Imagine trying to account for every legitimate URL on, say, Amazon, and I will just laugh at you (as a friend, of course). Finally, even if you decided to try doing something like this for your own site, how would you then generalize and internalize it for others? And how on earth would you test something that supposedly blocks URLs that haven’t been invented yet? Again, I think the idea is sound in theory, but in practice would just be a nightmare to develop, test, and implement.

  9. I read through the entire article. :p

    I love reading about your thoughts and ideas as you put together the Blacklists.

    Your posts are very enlightening and entertaining as well.

    Thank you for sharing your knowledge (and sense of humor) with us. I look forward to reading your next article and putting your 4G Blacklist to use, once it’s released.

  10. @Jeff: so I guess my Whitelist dream will never be :)

    That’s too bad. I hate when the best solution is the “least bad”. The web is definitely the most hostile environment one can think of.

  11. I don’t know if it’s possible to use Apache in combination with a database manager, but if it is, then our performance issue – which seems to be the only issue preventing us from digging into the whitelist solution – will be gone.

  12. Basically what you are describing here is the Apache 404 protocol, only instead of delivering a 404 page, you are sending non-existent requests to the home page. The only difference being that you would probably reduce performance significantly by listing every single file and page on the server. My current domain empire includes something like fifty-thousand files! To have to account for each of them with every URL request (including for the files themselves!) would just be insanely resource-intensive.
