Celebrating 20 years online :)
Web Dev + WordPress + Security

Building the Perishable Press 4G Blacklist

[ Building the Hoover Dam, Part 1 ]

Last year, after much research and discussion, I built a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Unfortunately, these mega-lists eventually became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works by targeting common elements of decentralized server attacks. Consisting of a mere 37 lines, this “2G” Blacklist provided enough protection to enable me to completely eliminate over 350 blacklisting directives from my site’s root htaccess file. This improvement increased site performance and decreased attack rates, however many bad hits were still getting through. More work was needed..

[ Building the Hoover Dam, Part 2 ]

Encouraged by the results of the 2G Blacklist and determined to further improve site security, I continued collecting data, testing directives, and refining my strategy. Work on the next generation of the blacklist — the 3G — required many weeks of research, testing, and analysis. During the development process, five major improvements were implemented. Using pattern recognition, access immunization, and multiple layers of protection, the 3G Blacklist serves as an extremely effective security strategy for preventing a vast majority of common exploits. The list consists of four distinct parts, providing multiple layers of protection that synergize into a comprehensive defense mechanism. Further, as discussed in previous articles, the 3G Blacklist is designed to be as lightweight and flexible as possible, thereby facilitating periodic cultivation and maintenance. Once finished with the development and testing of the 3G, it was finally released for public use. Since then, many people have implemented the 3G Blacklist and the overall results have been very positive.

[ Building the Hoover Dam, Part 3 ]

But the work didn’t stop there. The 3G is very effective at preventing a majority of malicious exploits, but new and/or previously undetected attacks continue to hit the server. As much as I hate to say it, there are people in the world who have nothing better to do than to go around and try to mess with other people’s stuff. Especially on teh Web, this just seems to be a fact of life. Automated attacks, cracker exploits, and spam will never stop. And so I find myself diligently scouring my access and error logs in search of new patterns and methods to defend against. I spend two or three hours each week scanning my logs — line by line — taking notes, following leads, and researching the clues left behind by the brainless lice that continue to plague my defenses. Now, after several months of careful research, analysis and development, I combine this new insight and information with an improved understanding of HTAccess functionality to produce a completely reformulated security strategy referred to as the 4G Blacklist.

[ Building the Hoover Dam, Part 4 ]

For the previous generation of the Blacklist, I spent a great deal of time elucidating the ideas, methods, and data involved with its development. This information remains quite relevant and certainly applies to our current discussion of the upcoming 4G Blacklist. In this article, I share some of the thinking and analysis that went into the creation of the 4G, while outlining the development of its various subsections. All of this foreplay then finally pays off in the next article here at Perishable Press, as the much-anticipated 4G Blacklist is finally released. For now, let’s take a delightful romp through the building of the 4G Blacklist..

Forbidden Characters

One of the most commonly seen exploits involves the use of restricted or forbidden characters to manipulate the environment, trigger errors, or induce vulnerable behavior. These characters are seen both in the root portion of URLs and in query strings. Any character that is not one of the following must be encoded in order to appear legitimately within URLs:

Regular-use characters - allowed unencoded within URLs

$ - _ . + ! * ' ( ) ,

0 1 2 3 4 5 6 7 8 9

a b c d e f g h i j k l m n o p q r s t u v w x y z

Of these, the dollar sign ( $ ) and the comma ( , ) are commonly used in exploitative attacks, and have thus been blocked in the 4G Blacklist. The period ( . ) itself is required for file-name extensions and should not be blocked, however multiple periods are frequently used in directory-traversal exploit attempts and may be blocked with zero liability. Thus, the 4G Blacklist employs this directive:

RedirectMatch 403 \.\.

..to account for any and all of the following cases:


Likewise, the humble comma:

RedirectMatch 403 \,

..eliminating this sort of nonsense:




And so on and so forth. In addition to these regular-use characters, there are also “special-use” and “unsafe” characters. The special-use characters play a specific role in the URL when present unencoded, and thus need to be encoded when present in literal form.

Special-use characters - literal use must be encoded

$ & + , / : ; = ? @

These characters are typically used in the construction of query strings and are rarely used legitimately in the root portion of the URL. These characters are commonly seen in the URLs associated with exploitative behavior, however it is only safe to block most of them from appearing in the root portion of the URL. In the 4G Blacklist, all but the ampersand ( & ), plus sign ( + ), and question mark ( ? ) are blocked for root-portion URLs.

[ Building the Hoover Dam, Part 5 ]

There are also characters that are considered “unsafe” for unencoded use in URLs because of the possibility of misinterpretation. The following characters should never appear in their literal form in any portion of the URL:

Unsafe characters - literal use must always be encoded

space " < > # % { } | \ ^ ~ [ ] `

As you might suspect, these characters are the most commonly seen entities in malicious attacks. The backslash ( \ ) is used to escape various characters, including the tab return ( \r ) and new line ( \n ) commands used in scripting. Thus, by blocking the backslash, we immediately clean up any exploits involving this sort of crud:






Most of the other characters are also blocked in the 4G Blacklist. For obvious reasons, the only “unsafe” characters that aren’t blocked are the blank space, the pound sign ( # ), and the percentage symbol ( % ). By blocking everything else, we effectively eliminate all future occurrences of this type of nonsense:


Encoded Characters

There are also a number of escaped hexadecimal ASCII characters that have no business appearing in the URL. There are several different groups of these characters that are blocked in the 4G Blacklist. The first group contains hexadecimal-encoded characters beginning with “%0”, for example:

null  %00 - %07
bsp   %08
tab   %09
\n    %0A
null  %0B
null  %0C
\r    %0D

These characters are not used in “normal” URLs and are the tools of many types of malicious exploits. Fortunately, the entire lot of these characters is easily blocked with the following directive:

RedirectMatch 403 \%0

The next group of blocked character codes includes the following hexadecimal representations:

+  %2B
<  %3C
>  %3E
?  %3F
[  %5B
\  %5C
]  %5D
{  %7B
|  %7C
}  %7D
"  %22
'  %27
(  %28
)  %29

These hexadecimal entities are blocked from the root portion of the URL, the query string, or both. These encoded characters are not required for proper URL formation, and are commonly seen in malicious attacks. Thus, in blocking this small subset of hexadecimal representations, the 4G Blacklist puts an end to a great amount of malicious behavior, including the following types of exploits (taken from actual log entries):


These attack strings are a common site in unfiltered access and error logs. They are generally appended or otherwise inserted into arrays of legitimate URLs, typically occurring in repetitive fashion over various time intervals. Obviously, these types of URLs do not exist on normal, everyday sites, and are better off blocked to conserve valuable resources and protect against malicious exploits.

[ Building the Hoover Dam, Part 6 ]

The third group of hexadecimal codes that are blocked by the 4G Blacklist consist of all the encoded representations for uncommon entities such as the following:

¡ - %A1
¢ - %A2
£ - %A3
¤ - %A4
Â¥ - %A5
¦ - %A6
§ - %A7

These characters certainly have their place, but there is typically no reason for them to occur in the URL construct. All of these rare characters are represented by hexadecimal encodings that begin with the first six letters of the alphabet. In the example above, only a handful of the “%A’s” are shown, but rest assured there are oodles of these oddities, each beginning with one of the following character pairs:

%A %B %C %D &E %F

Blocking these little rascals prevents the remainder of commonly employed hex strings from serving the villains. Here are a few strays cherry-picked after installing the 3G:


With the 4G in place, hapless crud like that just bounces off the walls. Moving on..

Common Patterns & Specific Exploits

Identifying common patterns and specific exploits is essential to maintaining an effective firewall. Over the course of the past few months, I have identified, extracted, and consolidated a wide variety of character strings, scripts, and file names commonly used by attackers when scanning for exploits. Common patterns were then identified and targeted via regular expressions in the 4G Blacklist in order to inoculate your site against future attacks. In this section of the article, we examine some of the patterns and exploits commonly appearing in the root portion of the variously targeted URLs. The following section will then examine patterns and trends observed in query strings.

[ Building the Hoover Dam, Part 7 ]

One of the most important things to keep in mind when examining these data is the inherent bias in the sample population due to the presence of the previous, 3G Blacklist. The 3G directives are effective at eliminating a vast portfolio of potential attacks, such that the exploit data cultivated for the 4G Blacklist represents a much narrower spectrum of malicious activity. Thus, the evolution of the Perishable Press Blacklist is in fact cumulative in nature, with each successive generation building on previously existing security.

That said, let’s examine some of the character-string exploits observed within my sphere of online domains. Some of the most commonly observed patterns involve character-strings representing various scripting functions, database queries, and template tags. Here are some examples of frequently seen attack strings (taken from actual error/access logs):


These types of strings generally appear appended to URLs along with more context-specific vulnerability-scanning characters. These strings represent the common patterns present within many types of exploit scanning and have no business in the URLs of typical websites. Incidentally, all of these items are duly blocked in the 4G Blacklist.

Other commonly seen character strings apparently target various XML vulnerabilities. Here are some typical examples extracted from the access/error logs:



Unfortunately, the “xmlrpc.php” string represents an actual file used by WordPress and thus cannot be blocked. While scanning for exploits, attackers will append this file name to variously targeted URLs. The xmlrpc.php file is called consistently via the <head> section of the web document, such that there may be a way to target all illicit references by using some advanced mod_rewrite blacklisting directives. If this string is frequently appearing in your server logs, this may be something to investigate.

Moving on, we bring this section to a close with a smorgasbord of recurring exploit patterns. First up, unexplained requests for URLs appended with the following character strings:



Yeah, good stuff. As a side note, if you are one of the billions of people who lives their entire life without feeling the need to probe innocent websites for “rptmode” vulnerabilities, consider yourself lucky. There are obviously a handful of hungry desperadoes out there who feel compelled to lower themselves into this utterly sad virtual arena. Needless to say, the illuminated careers of these lost souls effectively ends with the 4G Blacklist.

[ Building the Hoover Dam, Part 8 ]

Also on the menu, the mysteriously ubiquitous “macromates” probe. It’s like, “WTF” — I pity the poor holes out there who spend their time here on earth searching and scanning for macromates vulnerabilities, of all things. The payoff must be huge! Whatever. Here are a few representative lines, taken from thousands of logged attempted exploits:



Another interesting set of character strings commonly seen in the root portion of URLs targets WordPress database tables directly:


Would love to get my hands on the creeps who do this kind of stuff! Oh well, in case that day never comes, the least I can do is prevent the bastards from getting anywhere near my database tables by throwing down the following directive in the 4G Blacklist:

RedirectMatch 404 wp\_

And finally, an otherwise random collection of poisonous little strings that are used in a wide range of vulnerability scans:


By blocking these arbitrary snippets from URL requests, we effectively eliminate countless attack vectors. Bonus points and honorable mention for identifying which of these strings is not blocked by the 4G Blacklist.

Query Strings

At the heart of any effective firewall technique is the fine art of query-string filtering. Because of the influential and sensitive nature of query strings, cleaning up query-string input is an essential part of any serious website security strategy. By manipulating query strings, the savvy attacker may gain access to, take control of, and ultimately corrupt or destroy your files, database, and the even server itself. As you can imagine, a great deal of time, effort, and research has gone into the development of the “query-string” portion of the 4G Blacklist.

[ Building the Hoover Dam, Part 9 ]

For the average site, when it comes to cleaning cracker crumbs from the query string, we may scrub with broad, sweeping strokes. By blocking a select handful of character strings, we can disinfect a vast majority of maliciously requested query strings. Due to the way in which query strings function, many different types of attacks contain similar characters. Consider this collection of malicious query strings that were harvested from actual access/error logs (note: line breaks inserted for readability):
























These URLs are some of the worst that I have encountered, each one scanning for specific database-related vulnerabilities via uniquely configured query strings. As you can see, this type of exploit scanning utilizes a wide range of character strings. Fortunately, there are common elements that may be identified, targeted, and subsequently blocked via the 4G Blacklist. Thus, by including the following directive in our QUERY_STRING directives, we immediately eliminate the entire array of these types of attacks:

RewriteCond %{QUERY_STRING} ^.*(select|union).* [NC]

Powerful stuff. And, as there is no reason for these terms to appear in legitimate query-string constructs, we may secure our sites quietly and effectively and with zero affect on proper functionality. A similar broad-sweeping blacklist directive for query strings is the highly targeted “mosConfig” parameter. There are many scripts that maliciously attempt to set mosConfig values through the URL. Here are some actual examples (note: line breaks inserted for the last entry):




Each of these entries was logged as coming from a unique IP address, and each group of entries (first, second, and third) was recorded by way of a different user agent. Further, no referrer information was associated with any of these exploit attempts. But that’s “okay” with us, because we have identified and immunized against the common patterns found among the various query strings themselves. The first and most obvious character string that has been added to the 4G Blacklist is the mosConfig parameter. Revisit the previous examples and see how blocking that single term will prevent all future occurrences of any similar sort of “mosConfig”-type attack.

The astute reader will have also noticed a second common element within our mosConfig collection: the terminating question mark ( ? ). The question mark is not a viable blacklist candidate, however, because it is not reliably present in every mosConfig attack. Even so, blocking the question mark in the query string eliminates an even greater subset of malicious URLs. For example, scrubbing the query string of all question marks is an effective and efficient way to flush all of the following turds (taken from actual access/error logs):




And that’s just a sampling. Blocking question marks is an excellent way to clean up an enormous amount of dangerous exploit attempts. This is the line of thinking that went into the development of the 4G Blacklist. There are many illicit characters not allowed in the query string that are frequently used by attackers while scanning for potential exploits. Thus, by blacklisting their presence, we apply another strong layer of defense to our website.

[ Building the Hoover Dam, Part 10 ]

In addition to blocking illicit characters, the QUERY_STRING directives of the 4G Blacklist also protect against a wide variety of more sophisticated attacks. For example, all query-string attempts to leverage base64 encoding, <script> tags, PHP globals, REQUEST variables, and so forth are blocked via 4G. Also neutralized is the threat of this sort of nonsense:


Of course, the QUERY_STRING directives of the 4G Blacklist are designed to protect your site against a wide variety of malicious attacks. Needless to say, a more comprehensive exploration of the methods and strategies involved with this part of the Blacklist would be major overkill, if not already ;)

Other Modifications

Briefly, and for the sake of future reference, here are some of my notes concerning the important differences between the 3G and 4G Blacklist.

Updated Directives

RedirectMatch 403 \/\,
RedirectMatch 403 \.\.\.
RedirectMatch 403 \_vpi\.xml
RedirectMatch 403 ImpEvData\.
RedirectMatch 403 blank\.php
RedirectMatch 403 errors\.php
RedirectMatch 403 config\.php
RedirectMatch 403 include\.php
RedirectMatch 403 display\.php
RedirectMatch 403 register\.php
Redirectmatch 403 password\.php
RedirectMatch 403 maincore\.php
RedirectMatch 403 authorize\.php
RedirectMatch 403 doeditconfig\.
RedirectMatch 403 function\.main
RedirectMatch 403 function\.mkdir
RedirectMatch 403 function\.opendir
RedirectMatch 403 function\.require
RedirectMatch 403 \/wp\-signup\.php
RedirectMatch 403 function\.array\-rand
RedirectMatch 403 comment\-template\.php
RedirectMatch 403 function\.require\-once

Removed Directives

RedirectMatch 403 f\-\.
RedirectMatch 403 ftp\:
RedirectMatch 403 ttp\:
RedirectMatch 403 blank\.
Redirectmatch 403 xmlrpc\.
RedirectMatch 403 et\.html
RedirectMatch 403 news\.php
RedirectMatch 403 menu\.php
RedirectMatch 403 main\.php
RedirectMatch 403 home\.php
RedirectMatch 403 view\.php
RedirectMatch 403 about\.php
RedirectMatch 403 block\.php
RedirectMatch 403 order\.php
RedirectMatch 403 search\.php
RedirectMatch 403 button\.php
RedirectMatch 403 middle\.php
RedirectMatch 403 \/login\.php
RedirectMatch 403 contact\.php
RedirectMatch 403 threads\.php
RedirectMatch 403 path\_to\_script
RedirectMatch 403 send\_reminders\.
RedirectMatch 403 syntax\_highlight\.
RedirectMatch 403 \/themes\/
RedirectMatch 403 \/plugins\/
RedirectMatch 403 \/modules\/
RedirectMatch 403 \/classes\/
RedirectMatch 403 \/scripts\/
RedirectMatch 403 \/includes\/
RedirectMatch 403 \/components\/
RedirectMatch 403 \/administrator\/

Redundant Directives

RedirectMatch 403 alt\=
RedirectMatch 403 \.\$url
RedirectMatch 403 \/\$url
RedirectMatch 403 \/\$link

The 5G Blacklist

During the development of the 4G Blacklist, I encountered a number of maliciously employed character strings that I could not block without invoking a more complicated set of rewrite directives. The items discussed below were not included in the 4G version, but may be integrated into the eventual 5G Blacklist.

Dot Nonsense

This type of nonsense is a very prevalent nuisance:


..ad nauseam. The general pattern that would be useful in preventing this type of lunacy looks something like this:

"single alphanumeric character" . "single alphanumeric character" (or) "any sequence of alphanumeric characters that contains an invalid or unexpected character"

So, until I find time to craft something along those lines, the “dot-nuisance” entries will continue to plague teh access logz! Oh well..

Another stinky turd that just won’t go away on it’s own involves the myriad mutations of this little monster:


How desperate these script kiddies must be to have to resort to such meaningless idiocy! Get a life, losers!

Other character strings that appear frequently but that are not readily blocked via existing methodology include the following:


Also, there are a host of exploits referring to files that actually exist for various software packages, blogging platforms, and other web applications. Of course, for sites that do not use these files, blacklisting is a legitimate solution, but we certainly wouldn’t want to block any files that are actually in use. Two possibilities exist for defending against malicious requests that contain actual file names. We could either block any requests for such files that are not coming from our own server, or block any requests that deviate from the actual file path. In either case, the rules required to accomplish this transcend the current functionality provided by the 4G Blacklist.




And finally, I leave you with this question: what are the consequences of blacklisting single versus double backslashes?

\ vs. \\
[ Building the Hoover Dam, Part 11 ]

Somebody Stop Me

That’s it for this fun-filled article. I would be surprised if anyone actually read all the way through, but then again, that’s not really the point of the exercise. After developing the 4G Blacklist, I wanted a clear, concise summary of all the notes and thinking that went into its creation. By combining everything into this single post, I save myself time and effort when referring to this information in the future. And hopefully, by publishing this lengthy diatribe on the Web, it will be of some educational value to others as well. In either case, stay tuned, because the 4G Blacklist is coming up in the next article..

About the Author
Jeff Starr = Fullstack Developer. Book Author. Teacher. Human Being.
WP Themes In Depth: Build and sell awesome WordPress themes.

35 responses to “Building the Perishable Press 4G Blacklist”

  1. Deb Phillips 2009/03/12 12:46 pm

    Wow, what a scary, vicious Internet world we live in. It’s a world where “Live and let live” is a Pollyanna concept. I found this post content completely compelling and a confirmation of some of my suspicions regarding certain activity I’ve noticed on my server.

    Thank you for all your work, and for sharing it with us. I really look forward to the 4G Blacklist — and I can’t wait to implement it.

    All the best,

  2. Jeff Starr 2009/03/15 4:52 pm

    @Sam Beale: Well I’m certainly glad the tiny scrollbar didn’t keep you from reading! Thanks for the excellent feedback — it is my honor to share my work with you. Thanks for reading! :)

    @Deb Phillips: Ah yes, don’t even get me started on what a crazy world this is! It’s too bad that there are people out there who just don’t “get it”, and insist on messing with other people and their stuff. Before I fly off on a tangent here, allow me just to say “thank you” for taking the time to comment on the article — it is much appreciated. And, if all goes well, the 4G should be released early this week.


  3. As a non technical person I have persevered and read the whole thing all the way through, referring to various other sources to try to understand better what you are doing.
    I look forward to the 4g list and my attempt at installing it on my Joomla site.
    All I can say is WOW!
    Brilliant work, keep the faith.

  4. Jeff Starr 2009/04/06 1:11 pm

    @Mike: Thanks Mike, much appreciated. The 4G Blacklist is now available.


  5. Thank you for this most informative read. I took the time to read this article because I am preparing to harden my framework. I am always trying to make applications that are as interoperable as possible. I am now trying to decide if I should simply implement your blacklist our if I should work to build my own.

    One reason I have for building it myself, but using your valuable insight is to of course learn. Another reason is that my framework is hand coded and does not use Word Press or any other CMS, although Drupal is a model for the design, but the implementation is unique to my needs.

    Either way i thank you for your insight and for taking the time to write such an article.

  6. Jeff Starr 2009/10/13 8:26 am

    @WolfRage: Thanks for the feedback.

    I encourage anyone who is interested to do their own research and build their own customized blacklist. There is no single “perfect” blacklist that will suit all situations, so assembling one according to your specific needs seems like an ideal strategy.

    If you do end up with something solid, I encourage you to share it with the community to help others who may be running a similar setup.

    Hopefully my work on this and my many other blacklists will prove useful.

  7. I will be sure to share what I learn with others just as you have with us.

    The depth of your site is quite impressive, how long have you been at this? I ask only because every time I come back, I find something new and interesting.

    I will have to work my way through your articles over the coming months so please excuse; if my comments pop up on older articles.

  8. Jeff Starr 2009/10/19 7:47 am

    Hi WolfRage, I have been working online since around 2000, and began this site specifically in late 2005. As you can see, I have spent a great deal of time on it.

    Looking forward to hearing more from you on other posts. Some of the older ones are now closed, but there are still plenty open for comments.

  9. michael soriano 2011/12/12 10:22 am

    do you recomment pasting this code in the root wp folder only? or subdirectories as well such as “wp-admin” “wp-content” etc.

    also, has anybody tried this with vaultpress plugin?

  10. Yeah the root folder enables the rules to protect everything. Moving it to another location is fine also, but the execution is top-down, so anything outside of the directory won’t be protected by 5G. No need to include multiple copies in any subdirectories (wp-admin, et al) — one copy in root should do the trick.

  11. michael soriano 2011/12/13 8:42 am

    Thanks Jeff. You can add Vaultpress in your plugin list. It works fine.

Comments are closed for this post. Something to add? Let me know.
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »
Blackhole Pro: Trap bad bots in a virtual black hole.
Crazy that we’re almost halfway thru 2024.
I live right next door to the absolute loudest car in town. And the owner loves to drive it.
8G Firewall now out of beta testing, ready for use on production sites.
It's all about that ad revenue baby.
Note to self: encrypting 500 GB of data on my iMac takes around 8 hours.
Getting back into things after a bit of a break. Currently 7° F outside. Chillz.
Get news, updates, deals & tips via email.
Email kept private. Easy unsubscribe anytime.