Building the Perishable Press 4G Blacklist

♦ Posted by Jeff Starr in .htaccess, Security

Updated January 24, 2019 • 35 comments

Last year, after much research and discussion, I built a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Unfortunately, these mega-lists eventually became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works by targeting common elements of decentralized server attacks. Consisting of a mere 37 lines, this “2G” Blacklist provided enough protection to enable me to completely eliminate over 350 blacklisting directives from my site’s root htaccess file. This improvement increased site performance and decreased attack rates; however, many bad hits were still getting through. More work was needed..

Encouraged by the results of the 2G Blacklist and determined to further improve site security, I continued collecting data, testing directives, and refining my strategy. Work on the next generation of the blacklist, the 3G, required many weeks of research, testing, and analysis. During the development process, five major improvements were implemented. Using pattern recognition, access immunization, and multiple layers of protection, the 3G Blacklist serves as an extremely effective security strategy for preventing a vast majority of common exploits. The list consists of four distinct parts that work together as a comprehensive defense mechanism. Further, as discussed in previous articles, the 3G Blacklist is designed to be as lightweight and flexible as possible, thereby facilitating periodic cultivation and maintenance. Once development and testing were finished, the 3G was finally released for public use. Since then, many people have implemented the 3G Blacklist and the overall results have been very positive.

But the work didn’t stop there. The 3G is very effective at preventing a majority of malicious exploits, but new and/or previously undetected attacks continue to hit the server. As much as I hate to say it, there are people in the world who have nothing better to do than to go around and try to mess with other people’s stuff. Especially on teh Web, this just seems to be a fact of life. Automated attacks, cracker exploits, and spam will never stop. And so I find myself diligently scouring my access and error logs in search of new patterns and methods to defend against. I spend two or three hours each week scanning my logs — line by line — taking notes, following leads, and researching the clues left behind by the brainless lice that continue to plague my defenses. Now, after several months of careful research, analysis and development, I combine this new insight and information with an improved understanding of HTAccess functionality to produce a completely reformulated security strategy referred to as the 4G Blacklist.

For the previous generation of the Blacklist, I spent a great deal of time elucidating the ideas, methods, and data involved with its development. This information remains quite relevant and certainly applies to our current discussion of the upcoming 4G Blacklist. In this article, I share some of the thinking and analysis that went into the creation of the 4G, while outlining the development of its various subsections. All of this foreplay then finally pays off in the next article here at Perishable Press, as the much-anticipated 4G Blacklist is finally released. For now, let’s take a delightful romp through the building of the 4G Blacklist..

Forbidden Characters

One of the most commonly seen exploits involves the use of restricted or forbidden characters to manipulate the environment, trigger errors, or induce vulnerable behavior. These characters are seen both in the root portion of URLs and in query strings. Any character that is not one of the following must be encoded in order to appear legitimately within URLs:

Regular-use characters - allowed unencoded within URLs

$ - _ . + ! * ' ( ) ,

0 1 2 3 4 5 6 7 8 9

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Of these, the dollar sign ( $ ) and the comma ( , ) are commonly used in exploitative attacks, and have thus been blocked in the 4G Blacklist. The period ( . ) itself is required for file-name extensions and should not be blocked; however, multiple periods are frequently used in directory-traversal exploit attempts and may be blocked with zero liability. Thus, the 4G Blacklist employs this directive:

RedirectMatch 403 \.\.

..to account for any and all of the following cases:

http://domain.tld/path/target/../string/
http://domain.tld/path/target/../../string/
http://domain.tld/path/target/../../../string/
http://domain.tld/path/target/../../../../string/
http://domain.tld/path/target/../../../../../string/
.
.
.

Likewise, the humble comma:

RedirectMatch 403 \,

..eliminating this sort of nonsense:

component/option,com_rss/feed,RSS2.0/no_html,1/
component/option,com_rss/feed,ATOM0.3/no_html,1/
component/option,com_rss/feed,ATOM0.3/no_html,1/

/press/component/option,com_facileforms/Itemid,98/
/press/component/option,com_facileforms/Itemid,109/
/press/component/option,com_facileforms/Itemid,108/
/press/component/option,com_facileforms/Itemid,109/
/press/component/option,com_facileforms/Itemid,109/

/stupid-htaccess-tricks/path/s,/
/stupid-htaccess-tricks/path/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/
/stupid-htaccess-tricks/path/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/s,/

And so on and so forth. In addition to these regular-use characters, there are also “special-use” and “unsafe” characters. The special-use characters play a specific role in the URL when present unencoded, and thus need to be encoded when present in literal form.

Special-use characters - literal use must be encoded

$ & + , / : ; = ? @

These characters are typically used in the construction of query strings and are rarely used legitimately in the root portion of the URL. They are commonly seen in the URLs associated with exploitative behavior; however, it is only safe to block most of them from appearing in the root portion of the URL. In the 4G Blacklist, all but the ampersand ( & ), plus sign ( + ), and question mark ( ? ) are blocked for root-portion URLs.
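As a rough illustration, the root-portion blocking described above might be collapsed into a single character class. This is a sketch under assumptions rather than the 4G directives themselves: it assumes mod_alias is loaded, and it leaves out the slash, which has to stay because it separates path segments.

```apache
# Hypothetical consolidation, not quoted from the 4G Blacklist.
# RedirectMatch tests the URL path only, so the query string is untouched.
<IfModule mod_alias.c>
    # Refuse any path containing $ , : ; = or @
    RedirectMatch 403 [$,:;=@]
</IfModule>
```

One pattern is shorter to maintain, but a separate line per character makes it simpler to switch off a single rule if a legitimate URL turns out to need, say, an at sign.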

There are also characters that are considered “unsafe” for unencoded use in URLs because of the possibility of misinterpretation. The following characters should never appear in their literal form in any portion of the URL:

Unsafe characters - literal use must always be encoded

space " < > # % { } | \ ^ ~ [ ] `

As you might suspect, these characters are the most commonly seen entities in malicious attacks. The backslash ( \ ) is used to escape various characters, including the carriage return ( \r ) and newline ( \n ) sequences used in scripting. Thus, by blocking the backslash, we immediately clean up any exploits involving this sort of crud:

\\

/\\\\

/\\\'/

\n\n\n

\r\r\r

Most of the other characters are also blocked in the 4G Blacklist. For obvious reasons, the only “unsafe” characters that aren’t blocked are the blank space, the pound sign ( # ), and the percentage symbol ( % ). By blocking everything else, we effectively eliminate all future occurrences of this type of nonsense:

||0|0
i==r.K-1
i==r.K-1
l..,~f@nr
directories/title=directory
nu+in+html&btnG=Search&meta=/
/1x2n6l6bx6nt/001mAFC(-~l-xAou6.oCqAjB4ukkmrntoz1A/0011C/uikqijg4InjxGu.k
/1x2n6l6bx6nt/001mAFC(-~l-xAou6.oCqAjB4ukkmrntoz1A/0011C/uikqijg4InjxGu.k
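A hedged sketch of what blocking these literals in the path could look like, written as one character class. The grouping is an assumption, not the 4G's actual layout. The backslash and double quote are deliberately left out of this example, because escaping them also involves Apache's own configuration-file parsing and is best verified on your own server.

```apache
<IfModule mod_alias.c>
    # Refuse paths containing < > { } | ^ ~ ` [ or ]
    # The caret is literal here because it is not the first character in the class.
    RedirectMatch 403 [<>{}|^~`\[\]]
</IfModule>
```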

Encoded Characters

There are also a number of escaped hexadecimal ASCII characters that have no business appearing in the URL. There are several different groups of these characters that are blocked in the 4G Blacklist. The first group contains hexadecimal-encoded characters beginning with “%0”, for example:


null and control codes  %00 - %07
backspace               %08
tab                     %09
\n (line feed)          %0A
vertical tab            %0B
form feed               %0C
\r (carriage return)    %0D

These characters are not used in “normal” URLs and are the tools of many types of malicious exploits. Fortunately, the entire lot of these characters is easily blocked with the following directive:

RedirectMatch 403 \%0
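One caveat to test on your own server: mod_alias compares its pattern against the URL path after Apache has decoded it, so a pattern with a literal percent sign may never see the raw escape sequence. A minimal alternative sketch, assuming mod_rewrite is available, checks the undecoded request line instead. The rule below is illustrative and is not taken from the 4G.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # THE_REQUEST holds the raw request line (e.g. "GET /path?x=1 HTTP/1.1"),
    # which Apache does not decode, so %00 through %0F are still visible.
    RewriteCond %{THE_REQUEST} %0[0-9a-f] [NC]
    RewriteRule .* - [F]
</IfModule>
```

Because the request line includes the query string, this variant refuses the escapes there as well as in the path.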

The next group of blocked character codes includes the following hexadecimal representations:

+  %2B
<  %3C
>  %3E
?  %3F
[  %5B
\  %5C
]  %5D
{  %7B
|  %7C
}  %7D
"  %22
'  %27
(  %28
)  %29

These hexadecimal entities are blocked from the root portion of the URL, the query string, or both. These encoded characters are not required for proper URL formation, and are commonly seen in malicious attacks. Thus, in blocking this small subset of hexadecimal representations, the 4G Blacklist puts an end to a great amount of malicious behavior, including the following types of exploits (taken from actual log entries):

%0d
\\%27
%0a%0a
/\\%27/
png%5d%7b/
gif%5d%7b/
jpg%5d,div/
png%5d,div/
jpeg%5d,div/
domain%5d%7b/
%27.admin_url(
directory%5d%7b/
15.1Y(m%5B3%5D,a).K
%5BURL%20to%20preload%5D/
/%5BURL%20to%20preload%5D/
png%5d,a%5bhref$=gif%5d%7b/
ng.fromCharCode(c)%7Dkode=x
/sitemap/%22+accesskey=%226/
/contact/%22+accesskey=%227/
/contact/%22+accesskey=%229/
\\%27.%20get_permalink()%20.%20
\\%27%20.%20$referer%20.%20\\%27
/%5BNext%20URL%20in%20series%5D/
/%2B%28200%2Bok%29%2BACCEPTED%2B/
a.12.4l(/\\%27*/\\%27)%5b0%5d==a/
%5BPrevious%20URL%20in%20series%5D/
15.2I(a.12.5p,1,/\\%274d/\\%27)==a/
%22+title=%22Permalink+for+article+/
/%5BPrevious%20URL%20in%20series%5D/
this.options%5Bthis.selectedIndex%5D.value
jpe%5d,a%5bhref$=png%5d,a%5bhref$=gif%5d%7b/
#comment-7327+++++++++++++%09+success+%28from+first+page%29;
jpeg%5d,a%5bhref$=jpe%5d,a%5bhref$=png%5d,a%5bhref$=gif%5d%7b/
jpg%5d,a%5bhref$=jpeg%5d,a%5bhref$=jpe%5d,a%5bhref$=png%5d,a%5bhref$=gif%5d%7b/

These attack strings are a common sight in unfiltered access and error logs. They are generally appended or otherwise inserted into arrays of legitimate URLs, typically occurring in repetitive fashion over various time intervals. Obviously, these types of URLs do not exist on normal, everyday sites, and are better off blocked to conserve valuable resources and protect against malicious exploits.

The third group of hexadecimal codes blocked by the 4G Blacklist consists of all the encoded representations for uncommon entities such as the following:

¡ - %A1
¢ - %A2
£ - %A3
¤ - %A4
¥ - %A5
¦ - %A6
§ - %A7

These characters certainly have their place, but there is typically no reason for them to occur in the URL construct. All of these rare characters are represented by hexadecimal encodings that begin with the first six letters of the alphabet. In the example above, only a handful of the “%A’s” are shown, but rest assured there are oodles of these oddities, each beginning with one of the following character pairs:

%A %B %C %D %E %F

Blocking these little rascals prevents the remainder of commonly employed hex strings from serving the villains. Here are a few strays cherry-picked after installing the 3G:

%7c%7c0%7c0
%d0%b2%d0%be%d1%82
PS%ef%bc%9a*%20%7b/

With the 4G in place, hapless crud like that just bounces off the walls. Moving on..

Common Patterns & Specific Exploits

Identifying common patterns and specific exploits is essential to maintaining an effective firewall. Over the course of the past few months, I have identified, extracted, and consolidated a wide variety of character strings, scripts, and file names commonly used by attackers when scanning for exploits. Common patterns were then identified and targeted via regular expressions in the 4G Blacklist in order to inoculate your site against future attacks. In this section of the article, we examine some of the patterns and exploits commonly appearing in the root portion of the variously targeted URLs. The following section will then examine patterns and trends observed in query strings.

One of the most important things to keep in mind when examining these data is the inherent bias in the sample population due to the presence of the previous, 3G Blacklist. The 3G directives are effective at eliminating a vast portfolio of potential attacks, such that the exploit data cultivated for the 4G Blacklist represents a much narrower spectrum of malicious activity. Thus, the evolution of the Perishable Press Blacklist is in fact cumulative in nature, with each successive generation building on previously existing security.

That said, let’s examine some of the character-string exploits observed within my sphere of online domains. Some of the most commonly observed patterns involve character-strings representing various scripting functions, database queries, and template tags. Here are some examples of frequently seen attack strings (taken from actual error/access logs):

select(
convert(
db_name
sys_cpanel
remoteFile
servername
system_user
option_value
clientrequest
maincore.php
password.php

These types of strings generally appear appended to URLs along with more context-specific vulnerability-scanning characters. These strings represent the common patterns present within many types of exploit scanning and have no business in the URLs of typical websites. Incidentally, all of these items are duly blocked in the 4G Blacklist.

Other commonly seen character strings apparently target various XML vulnerabilities. Here are some typical examples extracted from the access/error logs:

xmlrpc.php
xmlrpc.php
adxmlrpc.php

.XMLHTTP
Msxml2.XMLHTTP
Msxml2.XMLHTTP
Msxml2.XMLHTTP
Microsoft.XMLHTTP
Microsoft.XMLHTTP

Unfortunately, the “xmlrpc.php” string represents an actual file used by WordPress and thus cannot be blocked. While scanning for exploits, attackers will append this file name to variously targeted URLs. The xmlrpc.php file is called consistently via the <head> section of the web document, such that there may be a way to target all illicit references by using some advanced mod_rewrite blacklisting directives. If this string is frequently appearing in your server logs, this may be something to investigate.
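One way such a mod_rewrite rule might look, as a sketch only: it assumes WordPress is installed in the site root, so that /xmlrpc.php is the single legitimate location, and it assumes the rule is placed ahead of WordPress's own rewrite block.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # The file name appears somewhere in the requested path...
    RewriteCond %{REQUEST_URI} xmlrpc\.php [NC]
    # ...but the request is not for the real file at the site root.
    RewriteCond %{REQUEST_URI} !^/xmlrpc\.php$
    RewriteRule .* - [F]
</IfModule>
```

With these two conditions, a request such as /press/some-post/xmlrpc.php or /adxmlrpc.php is refused while /xmlrpc.php itself still works. For a WordPress install in a subdirectory, the second pattern would need that directory added.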

Moving on, we bring this section to a close with a smorgasbord of recurring exploit patterns. First up, unexplained requests for URLs appended with the following character strings:

&rptmode=

#comment-55862&rptmode=2
#comment-56320&rptmode=2
#comment-55797&rptmode=2
#comment-55797&rptmode=2
#comment-55872&rptmode=2
#comment-55872&rptmode=2
#comment-55872&rptmode=2

Yeah, good stuff. As a side note, if you are one of the billions of people who live their entire lives without feeling the need to probe innocent websites for “rptmode” vulnerabilities, consider yourself lucky. There are obviously a handful of hungry desperadoes out there who feel compelled to lower themselves into this utterly sad virtual arena. Needless to say, the illuminated careers of these lost souls effectively end with the 4G Blacklist.

Also on the menu, the mysteriously ubiquitous “macromates” probe. It’s like, “WTF” — I pity the poor holes out there who spend their time here on earth searching and scanning for macromates vulnerabilities, of all things. The payoff must be huge! Whatever. Here are a few representative lines, taken from thousands of logged attempted exploits:

macromates.com

macromates.com/screencasts/
macromates.com/screencasts/
macromates.com/screencasts/

Another interesting set of character strings commonly seen in the root portion of URLs targets WordPress database tables directly:

wp_options
wp_posts
wp_terms

Would love to get my hands on the creeps who do this kind of stuff! Oh well, in case that day never comes, the least I can do is prevent the bastards from getting anywhere near my database tables by throwing down the following directive in the 4G Blacklist:

RedirectMatch 404 wp\_

And finally, an otherwise random collection of poisonous little strings that are used in a wide range of vulnerability scans:

_vpi
http%
http;//
/query/
/(null)/
/Table/Latest/index.php

By blocking these arbitrary snippets from URL requests, we effectively eliminate countless attack vectors. Bonus points and honorable mention for identifying which of these strings is not blocked by the 4G Blacklist.

Query Strings

At the heart of any effective firewall technique is the fine art of query-string filtering. Because of the influential and sensitive nature of query strings, cleaning up query-string input is an essential part of any serious website security strategy. By manipulating query strings, the savvy attacker may gain access to, take control of, and ultimately corrupt or destroy your files, database, and even the server itself. As you can imagine, a great deal of time, effort, and research has gone into the development of the “query-string” portion of the 4G Blacklist.

For the average site, when it comes to cleaning cracker crumbs from the query string, we may scrub with broad, sweeping strokes. By blocking a select handful of character strings, we can disinfect a vast majority of maliciously requested query strings. Due to the way in which query strings function, many different types of attacks contain similar characters. Consider this collection of malicious query strings that were harvested from actual access/error logs (note: line breaks inserted for readability):

wordspew-rss.php?id=-998877/**/UNION/**/SELECT/**/0,1,concat
(0x3a,user_login,0x3a,user_pass,0x3a),concat(0x3a,user_login,
0x3a,user_pass,0x3a),4,5/**/FROM/**/wp_users

st_newsletter/stnl_iframe.php?newsletter=-9999+UNION+SELECT+
concat(0x3a,user_login,0x3a,user_pass,0x3a)+FROM+wp_users--

wpSS/ss_load.php?ss_id=1+and+(1=0)+union+select+1,concat(0x3a,
user_login,0x3a,user_pass,0x3a),3,4+from+wp_users--&display=plain

wp-download.php?dl_id=null/**/union/**/all/**/select/**/concat
(0x3a,user_login,0x3a,user_pass,0x3a)/**/from/**/wp_users/*

forums?forum=1&topic=-99999/**/UNION/**/SELECT/**/concat(0x3a,
user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

forum=1&topic=-99999/**/UNION/**/SELECT/**/concat(0x3a,
user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

sf-forum?forum=-99999/**/UNION/**/SELECT/**/concat(0x3a,
user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

sf-forum?forum=-99999/**/UNION/**/SELECT/**/0,concat(0x3a,
user_login,0x3a,user_pass,0x3a),0,0,0,0,0/**/FROM/**/wp_users/*

wordspew-rss.php?id=-998877/**/UNION/**/SELECT/**/0,1,concat
(0x3a,user_login,0x3a,user_pass,0x3a),concat(0x3a,user_login,
0x3a,user_pass,0x3a),4,5/**/FROM/**/wp_users

wp-adserve/adclick.php?id=-1%20union%20select%20concat(0x3a,
user_login,0x3a,user_pass,0x3a)%20from%20wp_users

fim_rss.php?album=-1%20union%20select%201,concat(0x3a,
user_login,0x3a,user_pass,0x3a),3,4,5,6,7%20from%20wp_users--

wp-cal/functions/editevent.php?id=-1%20union%20select%201,concat
(0x3a,user_login,0x3a,user_pass,0x3a),3,4,5,6%20from%20wp_users--

wp-content/plugins/wp-cal/functions/editevent.php?id=-1%20union%20select%201,
concat(0x3a,user_login,0x3a,user_pass,0x3a),3,4,5,6%20from%20wp_users--

wp-content/plugins/fgallery/fim_rss.php?album=-1%20union%20select%201,
concat(0x3a,user_login,0x3a,user_pass,0x3a),3,4,5,6,7%20from%20wp_users--

wp-content/plugins/wp-adserve/adclick.php?id=-1%20union%20select
%20concat(0x3a,user_login,0x3a,user_pass,0x3a)%20from%20wp_users

wp-content/plugins/st_newsletter/stnl_iframe.php?newsletter=-9999
+UNION+SELECT+concat(0x3a,user_login,0x3a,user_pass,0x3a)+FROM+wp_users--

wp-content/plugins/wp-download/wp-download.php?dl_id=null/**/union/**/all
/**/select/**/concat(0x3a,user_login,0x3a,user_pass,0x3a)/**/from/**/wp_users/*

wp-content/plugins/wpSS/ss_load.php?ss_id=1+and+(1=0)+union+select+1,concat
(0x3a,user_login,0x3a,user_pass,0x3a),3,4+from+wp_users--&display=plain

wordspew-rss.php?id=-998877/**/UNION/**/SELECT/**/0,1,concat(0x3a,user_login,0x3a,
user_pass,0x3a),concat(0x3a,user_login,0x3a,user_pass,0x3a),4,5/**/FROM/**/wp_users

sf-forum?forum=-99999/**/UNION/**/SELECT/**/0,concat(0x3a,
user_login,0x3a,user_pass,0x3a),0,0,0,0,0/**/FROM/**/wp_users/*

web/sf-forum?forum=-99999/**/UNION/**/SELECT/**/concat
(0x3a,user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

forums?forum=1&topic=-99999/**/UNION/**/SELECT/**/concat
(0x3a,user_login,0x3a,user_pass,0x3a)/**/FROM/**/wp_users/*

forum_feed.php?thread=-99999+union+select+1,2,3,concat(char(37),char(95),char(37),char(95),char(37),
user_login,char(37),char(95),char(37),char(95),char(37),user_pass,char(37),char(95),char(37),char(95),
char(37),user_email,char(37),char(95),char(37),char(95),char(37)),5,6,7+from+wp_users/*

These URLs are some of the worst that I have encountered, each one scanning for specific database-related vulnerabilities via uniquely configured query strings. As you can see, this type of exploit scanning utilizes a wide range of character strings. Fortunately, there are common elements that may be identified, targeted, and subsequently blocked via the 4G Blacklist. Thus, by including the following directive in our QUERY_STRING directives, we immediately eliminate the entire array of these types of attacks:

RewriteCond %{QUERY_STRING} ^.*(select|union).* [NC]
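A RewriteCond does nothing by itself; it only qualifies the RewriteRule that follows it. A minimal sketch of how the condition might be wired up, assuming mod_rewrite is available and a 403 response is wanted (the surrounding lines are illustrative, not quoted from the 4G):

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Same test as the condition above. The leading ^.* and trailing .* can be
    # dropped because an unanchored pattern already matches anywhere in the string.
    RewriteCond %{QUERY_STRING} (select|union) [NC]
    # Apply to every request path, change nothing, and answer 403 Forbidden.
    RewriteRule .* - [F]
</IfModule>
```

Note that the match is a plain substring test, so a harmless query string such as ?action=select would be refused too. Check your own application's parameters before deploying something like this.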

Powerful stuff. And, as there is no reason for these terms to appear in legitimate query-string constructs, we may secure our sites quietly and effectively and with zero effect on proper functionality. A similar broad-sweeping blacklist directive for query strings is the highly targeted “mosConfig” parameter. There are many scripts that maliciously attempt to set mosConfig values through the URL. Here are some actual examples (note: line breaks inserted for the last entry):

index.php?option=com_peoplebook&Itemid=&mosConfig_absolute_path=?
index.php?option=com_peoplebook&Itemid=&mosConfig_absolute_path=?
index.php?option=com_joomap&view=google&no_html=&mosConfig_absolute_path=?

/path/?option=com_simpleboard&task=cat_view&gid=28&Itemid=&mosConfig_absolute_path=?
/path/?option=com_remository&task=cat_view&gid=28&Itemid=&mosConfig_absolute_path=?
/path/?option=com_remository&task=cat_view&gid=28&Itemid=&mosConfig_absolute_path=?

http%%20-%2037k%20-https://perishablepress.com/press/2006/08/28/spamless-email-address-via-javascript/
%20-%2037k%20-/https://perishablepress.com/press/2008/03/08/blacklist-candidate-number-2008-03-09/
index.php?option=com_remository&Itemid=&mosConfig_absolute_path=?

Each of these entries was logged as coming from a unique IP address, and each group of entries (first, second, and third) was recorded by way of a different user agent. Further, no referrer information was associated with any of these exploit attempts. But that’s “okay” with us, because we have identified and immunized against the common patterns found among the various query strings themselves. The first and most obvious character string that has been added to the 4G Blacklist is the mosConfig parameter. Revisit the previous examples and see how blocking that single term will prevent all future occurrences of any similar sort of “mosConfig”-type attack.

The astute reader will have also noticed a second common element within our mosConfig collection: the terminating question mark ( ? ). On its own, the question mark is not a reliable marker for mosConfig attacks, because it is not present in every one of them. Even so, blocking the question mark in the query string eliminates an even greater subset of malicious URLs. For example, scrubbing the query string of all question marks is an effective and efficient way to flush all of the following turds (taken from actual access/error logs):

?3,f.@45
?3,f.@45
?3,f.@45
?3,f.@45

index.php?=?
index.php?url=?
login.php?dir=?
setup.php?dir=?
index.php?mode=?
ask_password.php?dir=?
index.php?DOCUMENT_ROOT=?

error.php?error=uid=48(apache)%20gid=48(apache)%20groups=48(apache)%0A?

And that’s just a sampling. Blocking question marks is an excellent way to clean up an enormous number of dangerous exploit attempts. This is the line of thinking that went into the development of the 4G Blacklist. There are many illicit characters not allowed in the query string that are frequently used by attackers while scanning for potential exploits. Thus, by blacklisting their presence, we apply another strong layer of defense to our website.
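Apache splits the request at the first question mark, so QUERY_STRING holds only what comes after it; any question mark found in that variable is therefore a second one. A minimal sketch of the idea, assuming mod_rewrite is available (illustrative, not the 4G's own rule):

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Matches a literal ? anywhere in the query string,
    # such as the trailing one in index.php?url=?
    RewriteCond %{QUERY_STRING} \?
    RewriteRule .* - [F]
</IfModule>
```

Some applications pass a full, unencoded URL as a parameter value, and that value can legitimately contain a question mark, so this rule deserves a trial run against real traffic first.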

In addition to blocking illicit characters, the QUERY_STRING directives of the 4G Blacklist also protect against a wide variety of more sophisticated attacks. For example, all query-string attempts to leverage base64 encoding, <script> tags, PHP globals, REQUEST variables, and so forth are blocked via 4G. Also neutralized is the threat of this sort of nonsense:

index.php?loopback
index.php?localhost
index.php?127.0.0.1
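Several such tests can share one RewriteRule by chaining conditions with the OR flag. The following is a sketch under assumptions, not the 4G directives: the patterns are simplified stand-ins for the categories named above, and mod_rewrite is assumed to be available.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Loopback references in the query string
    RewriteCond %{QUERY_STRING} (localhost|loopback|127\.0\.0\.1) [NC,OR]
    # Attempts to smuggle in base64-encoded payloads
    RewriteCond %{QUERY_STRING} base64_encode [NC,OR]
    # Literal or percent-encoded <script> tags
    RewriteCond %{QUERY_STRING} (<|%3C).*script.*(>|%3E) [NC,OR]
    # Attempts to set PHP globals or REQUEST variables (case-sensitive)
    RewriteCond %{QUERY_STRING} (GLOBALS|_REQUEST)(=|\[|%5[Bb])
    RewriteRule .* - [F]
</IfModule>
```

The last condition carries no OR flag, which is what ends the chain; if any one of the four conditions matches, the request is refused.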

Of course, the QUERY_STRING directives of the 4G Blacklist are designed to protect your site against a wide variety of malicious attacks. Needless to say, a more comprehensive exploration of the methods and strategies involved with this part of the Blacklist would be major overkill, if not already ;)

Other Modifications

Briefly, and for the sake of future reference, here are some of my notes concerning the important differences between the 3G and 4G Blacklist.

Updated Directives

RedirectMatch 403 \/\,
RedirectMatch 403 \.\.\.
RedirectMatch 403 \_vpi\.xml
RedirectMatch 403 ImpEvData\.
RedirectMatch 403 blank\.php
RedirectMatch 403 errors\.php
RedirectMatch 403 config\.php
RedirectMatch 403 include\.php
RedirectMatch 403 display\.php
RedirectMatch 403 register\.php
RedirectMatch 403 password\.php
RedirectMatch 403 maincore\.php
RedirectMatch 403 authorize\.php
RedirectMatch 403 doeditconfig\.
RedirectMatch 403 function\.main
RedirectMatch 403 function\.mkdir
RedirectMatch 403 function\.opendir
RedirectMatch 403 function\.require
RedirectMatch 403 \/wp\-signup\.php
RedirectMatch 403 function\.array\-rand
RedirectMatch 403 comment\-template\.php
RedirectMatch 403 function\.require\-once

Removed Directives


RedirectMatch 403 f\-\.
RedirectMatch 403 ftp\:
RedirectMatch 403 ttp\:
RedirectMatch 403 blank\.
RedirectMatch 403 xmlrpc\.
RedirectMatch 403 et\.html
RedirectMatch 403 news\.php
RedirectMatch 403 menu\.php
RedirectMatch 403 main\.php
RedirectMatch 403 home\.php
RedirectMatch 403 view\.php
RedirectMatch 403 about\.php
RedirectMatch 403 block\.php
RedirectMatch 403 order\.php
RedirectMatch 403 search\.php
RedirectMatch 403 button\.php
RedirectMatch 403 middle\.php
RedirectMatch 403 \/login\.php
RedirectMatch 403 contact\.php
RedirectMatch 403 threads\.php
RedirectMatch 403 path\_to\_script
RedirectMatch 403 send\_reminders\.
RedirectMatch 403 syntax\_highlight\.
RedirectMatch 403 \/themes\/
RedirectMatch 403 \/plugins\/
RedirectMatch 403 \/modules\/
RedirectMatch 403 \/classes\/
RedirectMatch 403 \/scripts\/
RedirectMatch 403 \/includes\/
RedirectMatch 403 \/components\/
RedirectMatch 403 \/administrator\/

Redundant Directives

RedirectMatch 403 alt\=
RedirectMatch 403 \.\$url
RedirectMatch 403 \/\$url
RedirectMatch 403 \/\$link

The 5G Blacklist

During the development of the 4G Blacklist, I encountered a number of maliciously employed character strings that I could not block without invoking a more complicated set of rewrite directives. The items discussed below were not included in the 4G version, but may be integrated into the eventual 5G Blacklist.

Dot Nonsense

This type of nonsense is a very prevalent nuisance:

a.1
a.2
a.3
a.n
a.cross-link

..ad nauseam. The general pattern that would be useful in preventing this type of lunacy looks something like this:

"single alphanumeric character" . "single alphanumeric character" (or) "any sequence of alphanumeric characters that contains an invalid or unexpected character"

So, until I find time to craft something along those lines, the “dot-nuisance” entries will continue to plague teh access logz! Oh well..
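For what it is worth, a rough and untested starting point for the first half of that pattern (a single alphanumeric character, a dot, and another single alphanumeric character as the final path segment) might look like the following. It is a guess at the shape of such a rule, assuming mod_alias, and it does not cover the “a.cross-link” variety.

```apache
<IfModule mod_alias.c>
    # Refuse paths whose last segment is like a.1, a.2, or a.n,
    # with or without a trailing slash.
    RedirectMatch 403 /[A-Za-z0-9]\.[A-Za-z0-9]/?$
</IfModule>
```

The obvious risk is a false positive on a real file with a one-character name and a one-character extension, so any site that serves such files would need an exception.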

Another stinky turd that just won’t go away on its own involves the myriad mutations of this little monster:

-moz-grabbing

How desperate these script kiddies must be to have to resort to such meaningless idiocy! Get a life, losers!

Other character strings that appear frequently but that are not readily blocked via existing methodology include the following:

(
)
-1/
cfrm/
/null

Also, there are a host of exploits referring to files that actually exist for various software packages, blogging platforms, and other web applications. Of course, for sites that do not use these files, blacklisting is a legitimate solution, but we certainly wouldn’t want to block any files that are actually in use. Two possibilities exist for defending against malicious requests that contain actual file names. We could either block any requests for such files that are not coming from our own server, or block any requests that deviate from the actual file path. In either case, the rules required to accomplish this transcend the current functionality provided by the 4G Blacklist.

navmenu.js
external.js
autoclear.js

page.html
info.html
noscript.html
no-javascript.html
forum_summary.html
static-content.html
nested-frameset.html

edit.php
categories.php
edit-comments.php
wp-admin/upgrade.php
wp-admin/wp-login.php
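The first of those two possibilities could be approximated with a referrer check. This is a sketch only: example.com is a placeholder for your own domain, the three script names are taken from the list above, and mod_rewrite is assumed to be available.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Requests for one of the listed script files...
    RewriteCond %{REQUEST_URI} (navmenu|external|autoclear)\.js$ [NC]
    # ...that do not claim to come from a page on our own site
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
    RewriteRule .* - [F]
</IfModule>
```

The Referer header is supplied by the client, so it can be forged by an attacker and is sometimes stripped by privacy tools. That makes this a filter for lazy scanners rather than a guarantee, and it may turn away the occasional legitimate visitor.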

And finally, I leave you with this question: what are the consequences of blacklisting single versus double backslashes?

\ vs. \\

Somebody Stop Me

That’s it for this fun-filled article. I would be surprised if anyone actually read all the way through, but then again, that’s not really the point of the exercise. After developing the 4G Blacklist, I wanted a clear, concise summary of all the notes and thinking that went into its creation. By combining everything into this single post, I save myself time and effort when referring to this information in the future. And hopefully, by publishing this lengthy diatribe on the Web, it will be of some educational value to others as well. In either case, stay tuned, because the 4G Blacklist is coming up in the next article..


35 responses to “Building the Perishable Press 4G Blacklist”

Donace | The Nexus 2009/03/10 12:17 pm

I was a passenger for a 2.5hr drive yesterday, reading this article helped me pass some time :p.

I may have touched on this before in a previous comment; but as Louis states and you yourself state a list on its own is not a ‘do-all’ ‘end-of’ defence.

Security is about layers. I think it’s safe to say that using this, along with blocking IPs from Project Honey Pot, redundant user agents, and so on, will be a defence against miscreants, though only against the ‘un-knowledgeable’ new guys (a significant %); the people who want to do harm will find a way.

Louis 2009/03/10 12:23 pm

if the request is for a legitimate, existing resource, allow access

How do you know if the URL is legitimate without using a list, and thus, a database? Static files are ok to figure out, but how do you verify a dynamic URL?

Louis 2009/03/10 12:36 pm

But Apache does not handle dynamic URLs fully; I mean, if we add ?whatever after a valid URL, it becomes a valid URL too for Apache. That may be problematic as it leaves the door open for pirates.

Anyways, you seemed enthusiastic in this comment. Does that mean you may consider working on a whitelist solution? :^)

Jeff Starr 2009/03/10 11:40 am • Post Author

@Myra: Coolness! That makes my day :) I am guessing that many people will “skim” through and catch the main points, but the article is rather long, so kudos to you for taking the time to read through. Your kind words are encouraging, Myra; thanks for taking the time to respond! :)

Jeff Starr 2009/03/10 11:55 am • Post Author

@Louis: Alright, to play devil’s advocate.. It would be possible to create a URL whitelist that essentially says:

if the URL request is from the server, allow access
if the URL request is not from the server, then do the following:
if the request is for a legitimate, existing resource, allow access
if the request is for anything else, deny access or redirect

So the question is, why wouldn’t something like this work?
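
As a rough illustration only, the logic above might be sketched in .htaccess along these lines. This is not the 4G Blacklist and not a tested whitelist. It assumes mod_rewrite is available, treats "from the server" as a loopback address, and treats "legitimate, existing resource" as a file or directory that exists on disk:

```apache
# Hypothetical whitelist sketch; every rule here is an assumption for illustration
<IfModule mod_rewrite.c>
    RewriteEngine On

    # 1. Request comes from the server itself (assumed here to mean loopback): allow
    RewriteCond %{REMOTE_ADDR} ^(127\.0\.0\.1|::1)$
    RewriteRule ^ - [L]

    # 2. Request maps to an existing file or directory: allow
    RewriteCond %{REQUEST_FILENAME} -f [OR]
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteRule ^ - [L]

    # 3. Anything else: deny with 403 Forbidden
    RewriteRule ^ - [F]
</IfModule>
```

As written, this would also deny dynamic URLs that do not correspond to a file on disk (rewritten permalinks, for example), and it does nothing about query strings appended to otherwise valid URLs, so it only covers the static case.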
Jeff Starr 2009/03/10 12:14 pm • Post Author

Yes, using a database would require a scripting language, which then could also be used to control traffic one way or another for various site resources. Such a database-based solution would probably improve performance, but even then I don’t think it would be as elegant or efficient as the logic presented in my previous comment, which may very well be possible to construct from within Apache.
Jeff Starr 2009/03/10 12:29 pm • Post Author

Easy. We take advantage of Apache’s 404 handling: any request that would otherwise result in a 404 error is denied access or redirected. No database necessary :)
Louis 2009/03/10 1:31 pm

Yes, it would be worth it, I guarantee! The blacklist may protect from 99% of the attacks, but you have this little 1% in your head that makes you wake up in the night yelling “AAAaaAaaaah not the spam bots! Leave me alone!!!”. If you manage to get a whitelist mechanism working, the war will be over and you will have won.

Now, thank you for your kind words; they are really touching. I missed discussing with you too! The thing is, I’ve really dropped interest in WordPress and PHP. These are two important themes on PP, so I guess I had fewer comments to make recently. But when this post came in my RSS reader, I knew I had to write a little something – it’s one of your favorite topics after all :).

I’m also not as interested in the internet as I used to be – the technical internet I mean, from the builders’ side of things. For example, I loved to write about Mootools on my blog, but now, with the arrival of the new JavaScript engines natively in the browsers, I feel I have wasted my time. The same thing occurred with PHP when I played with Ruby – not even RoR, just Ruby. Simple and powerful solutions are emerging, and I feel like I’ve wasted a lot of time doing some great work, yes, but work on a problem that didn’t have to exist in the first place. I’m not sure if I’m clear here, but the idea is that I enjoy reading tech posts less these days.

And the worst thing is that you seem to provide some great readings on MindFeed, some readings that don’t deal with 0 and 1, that are more human oriented. But I simply can’t enjoy these, because they are written in English. That’s so sad. I could read them, but it would become a pain to translate that quantity of text, and I don’t want reading you to become a pain :)

But enough complaining about everything! This post was awesome as usual, and with a little luck, the spammers will stop spamming and the pirates will stop being assholes to people. You say they won’t? Okay, but summer is coming anyways and my batteries are recharging :)
Jeff Starr 2009/03/10 12:37 pm • Post Author

@Donace: Awesome — glad I helped you pass some time! ;) And you are absolutely correct that security is all “about layers.” Firewalls and blacklists can be very effective at filtering out bad requests and other external nonsense, but you also have to keep form input clean as well. A solid server configuration combined with a securely developed website is always important, regardless of any extra protection afforded by firewalls.

As for blocking individual IPs, user agents, and referrers, I think that it is best to do so in moderation and only for extreme and/or well-known cases of maliciousness. Avoid never-ending lists of appended entities and only block the items for as long as is necessary. My opinion, anyway :P
Jeff Starr 2009/03/10 12:45 pm • Post Author

@Louis: Quite honestly, I was more enthused about corresponding with you again. It has been a while and I was beginning to wonder if everything was okay with you (so don’t scare me like that!). So when I saw that you were back in town, so to speak, I dropped everything and threw myself into the discussion. Not to mention the fact that the topic is one of my favorites, as you are well aware. But, even so, you have given me something to think about and I will certainly be doing some experimenting to see if such a whitelist is indeed a worthwhile idea. As you say, the query string would be a bit of a challenge, but well worth it.
Jeff Starr 2009/03/10 3:19 pm • Post Author

I figured something like that was up (either that or school, which can be a huge burden) — I am glad that it wasn’t because of anything serious or traumatic, like illness or something. To tell you the truth, I have been doing a little soul-searching myself about all of this Web stuff. Just as you mention, it seems as if a lot of the stuff that I spend my time on, such as WordPress for example, is just becoming pointless due to the millions of people who are now writing about the same thing.

But not just that, it seems like, for technical web dev stuff in general, it is just too competitive to make anything worthwhile. I don’t like the feeling that everything I do is sheer and utter vanity, because there will always be a bigger, better site doing work on the exact same stuff. As you know, I do this at my own expense and collect no revenue from my efforts, so when the inherent rewards of providing novel information become increasingly difficult to come by, well, let’s just say that I am also looking into other, more satisfying endeavors.

I hope you continue reading mindfeed (and Perishable Press), as I will definitely be heading in that direction more as the days unwind.. and don’t give me that crap about not being able to follow along because of the language barrier — you understand English better than half of the people I know.

In any case, thanks for taking the time to chime in and discuss this post — it is always a pleasure exploring topics via critical and contemplative discussion.
Sam Beale 2009/03/12 2:16 am

Wow, thank you!

That was the most interesting read I have encountered on the internet, probably ever. I’m actually supposed to be working right now but stumbled across your site and haven’t been able to tear myself away from this article or the discussion afterwards.

Thank you for providing a page that, despite the scarily small scrollbar on the right, kept me hooked from start to finish!

Cheers,
Sam.
P.S. I looked through your previous site designs and this one just blows me away – amazing!
