Blank Space / Whitespace Character for .htaccess

Working on the next version of the G-Series Blacklist, I needed a way to match a wide variety of URL-encoded (percent-encoded hex) character strings. Those familiar with their site’s traffic will recognize this particular type of URI request string, which is typically associated with server scanning, exploit attempts, and other malicious behavior. As I explain in this post, pattern-matching and blocking the blank-space (whitespace) character in URL requests is an effective way to improve the security of your website.

Examples of blank-space characters in URL requests

Here is a selection of malicious URL patterns that I want to match and block using 6G blacklist techniques (via the UTF-8 (hex) encoder):

UTF-8 (hex) encoded request | Decoded request
http://example.com/hack%20*/ | http://example.com/hack */
http://example.com/%3Ca%20href= | http://example.com/<a href=
http://example.com/%5bNext%20URL%20in%20series%5d | http://example.com/[Next URL in series]
http://example.com/XHTML%20Document%20Header%20Resource | http://example.com/XHTML Document Header Resource
http://example.com/%22%20title=%22%22%20rel=%22nofollow | http://example.com/" title="" rel="nofollow
http://example.com/Apache%20Module%20mod_authz_host | http://example.com/Apache Module mod_authz_host
http://example.com/%27.%20get_permalink()%20. | http://example.com/'. get_permalink() .
http://example.com/search/%20%20%20/page/13/ | http://example.com/search/   /page/13/
http://example.com/%20%20%20/page/8/ | http://example.com/   /page/8/
http://example.com/%20*/ | http://example.com/ */
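
To double-check the decoded column, here is a minimal Python sketch that percent-decodes a few of the requests listed above. Python is not part of the .htaccess setup; it is only used here for illustration, and the helper name `decode_request` is my own.

```python
from urllib.parse import unquote

# A few of the encoded requests listed above
encoded_requests = [
    "http://example.com/hack%20*/",
    "http://example.com/%3Ca%20href=",
    "http://example.com/%5bNext%20URL%20in%20series%5d",
    "http://example.com/search/%20%20%20/page/13/",
]


def decode_request(url: str) -> str:
    """Percent-decode a URL (hypothetical helper, not an Apache API)."""
    return unquote(url)


for url in encoded_requests:
    # repr() makes runs of consecutive spaces visible in the output
    print(f"{url!r} -> {decode_request(url)!r}")
```

Note that each %20 decodes to exactly one space, so a request such as /search/%20%20%20/page/13/ contains three consecutive spaces once decoded.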

This gives you an idea of what these encoded requests are targeting. According to the URL specification (RFC 1738), only the following characters, along with reserved characters used for their reserved purposes, may appear unencoded within URLs; any other character must be encoded:

Regular-use characters - allowed unencoded within URLs

$ - _ . + ! * ' ( ) ,

0 1 2 3 4 5 6 7 8 9

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Not included in this “safe-character” list, the humble white space (or blank space) must be encoded when included in the URL. As explained by the Network Working Group:

Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs.
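
As a rough illustration of the safe-character rule, the following Python sketch lists the characters in a string that fall outside the set shown above. It is a simplification under my own assumptions: reserved characters such as "/" are deliberately left out of the set, so it is best applied to a single path segment, and the names `SAFE_CHARACTERS` and `needs_encoding` are hypothetical.

```python
import string

# Letters, digits, and the special characters listed above.
# Assumption: reserved characters (such as "/") are not modeled here.
SAFE_CHARACTERS = set(string.ascii_letters + string.digits + "$-_.+!*'(),")


def needs_encoding(segment: str) -> list[str]:
    """Return every character in the segment that is not in the safe set."""
    return [ch for ch in segment if ch not in SAFE_CHARACTERS]


print(needs_encoding("[Next URL in series]"))
print(needs_encoding("mod_authz_host"))
```

The first call reports the two brackets and the three spaces; the second reports nothing, because underscores and letters are all in the safe set.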

Looking back at our target URLs, we find that the common denominator is the encoded whitespace character, %20. Sure, there are plenty of other encoded characters that could be targeted, but zeroing in on blank spaces in the URL is an effective way to catch and block many of these types of malicious requests.
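
The common-denominator claim is easy to verify. This Python sketch (illustrative only; the variable and function names are my own) counts how many of the sample requests contain each percent-encoded sequence and keeps the ones that show up in every request.

```python
import re
from collections import Counter

# The unique encoded requests from the examples above
sample_requests = [
    "http://example.com/hack%20*/",
    "http://example.com/%3Ca%20href=",
    "http://example.com/%5bNext%20URL%20in%20series%5d",
    "http://example.com/XHTML%20Document%20Header%20Resource",
    "http://example.com/%22%20title=%22%22%20rel=%22nofollow",
    "http://example.com/Apache%20Module%20mod_authz_host",
    "http://example.com/%27.%20get_permalink()%20.",
    "http://example.com/search/%20%20%20/page/13/",
    "http://example.com/%20%20%20/page/8/",
    "http://example.com/%20*/",
]

ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def escapes_in(url: str) -> set[str]:
    """Return the distinct percent-escapes in a URL, normalized to uppercase."""
    return {match.upper() for match in ESCAPE.findall(url)}


# Count, for each escape, the number of requests that contain it at least once
coverage = Counter()
for url in sample_requests:
    coverage.update(escapes_in(url))

# Keep only the escapes present in every single request
common = sorted(esc for esc, count in coverage.items() if count == len(sample_requests))
print(common)
```

For this sample, %20 is the only sequence that appears in all ten requests; every other escape appears in just one of them.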

How to match the blank-space/whitespace character with .htaccess

Now that we have a reason to do so, let’s use .htaccess to match and block all URL requests that include one or more whitespace characters. It’s as simple as adding these lines to your root .htaccess file:

<IfModule mod_alias.c>
 RedirectMatch 403 \s
</IfModule>

So the punchline to this diatribe is that an escaped “s” character (\s) is the regex token that matches blank spaces (and other whitespace, such as tabs) when using .htaccess directives via mod_alias (RedirectMatch) and mod_rewrite (RewriteRule). Here is an example using Apache’s mod_rewrite:

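<IfModule mod_rewrite.c>
 # Enable the rewrite engine (omit if it is already enabled elsewhere in the file)
 RewriteEngine On
 # Leave requests for the home page alone
 RewriteCond %{REQUEST_URI} !^/$
 # Match any whitespace in the decoded request URI
 RewriteCond %{REQUEST_URI} \s
 # Redirect matching requests to the home page
 RewriteRule .* http://perishablepress.com/ [R=301,L]
</IfModule>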

This example will redirect any requests that include whitespace to the home page (edit to match your own URL). To block them instead, replace the RewriteRule with this:

RewriteRule .* - [F,L]

Note that it doesn’t matter whether the original request is encoded or not: Apache decodes the requested path before these directives see it, so the end result of any encoded request is the un-encoded, canonical URL (not including the query string), and targeting literal whitespace in the request URI is effective. That said, you should only use this method if you know what you are doing and are certain that none of your own URLs contain whitespace or blank spaces.
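
To make the decode-then-match idea concrete, here is a small Python model of the whitespace check. It is only a sketch of the logic, not of Apache itself: the function name `would_block` is hypothetical, and I am assuming that Python's `\s` behaves closely enough to Apache's regex engine for plain spaces.

```python
import re
from urllib.parse import unquote, urlsplit

WHITESPACE = re.compile(r"\s")


def would_block(request_target: str) -> bool:
    """Hypothetical model of the whitespace rule: decode the path, then test it."""
    # Keep only the path; the query string is not part of this check
    path = urlsplit(request_target).path
    # Decode first, so that %20 and a literal space are treated the same
    return WHITESPACE.search(unquote(path)) is not None


for target in ["/hack%20*/", "/hack */", "/page/8/", "/page/8/?s=a%20b"]:
    print(f"{target!r}: {would_block(target)}")
```

In this model the encoded and literal forms of /hack */ are both flagged, a clean path is not, and a space that appears only in the query string goes unnoticed.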

Matching whitespace in query strings

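Wrapping up, here is how to block blank spaces in the query-string portion of the URL, which neither of the previous two examples can do. Using mod_rewrite, we can target the %{QUERY_STRING} variable. Unlike the request URI, the query string reaches mod_rewrite without being percent-decoded, so the pattern needs to name the encoded form, %20, explicitly. As before, only use this if none of your legitimate query strings contain encoded spaces:

<IfModule mod_rewrite.c>
 # Enable the rewrite engine (omit if it is already enabled elsewhere in the file)
 RewriteEngine On
 RewriteCond %{REQUEST_URI} !^/$
 # The query string is not decoded, so match %20 as well as literal whitespace
 RewriteCond %{QUERY_STRING} (\s|%20)
 RewriteRule .* - [F,L]
</IfModule>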
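
One detail worth checking with query strings is whether the pattern sees them raw or decoded. As far as I know, mod_rewrite hands over %{QUERY_STRING} without percent-decoding it. The Python sketch below models that assumption (the function name `query_matches` is hypothetical) and compares a bare `\s` against a pattern that also names %20.

```python
import re
from urllib.parse import urlsplit


def query_matches(request_target: str, pattern: str) -> bool:
    """Test a regex against the raw, still percent-encoded query string."""
    # Assumption: the query string is matched as received, without decoding
    raw_query = urlsplit(request_target).query
    return re.search(pattern, raw_query) is not None


target = "/page/8/?s=hello%20world"
print(query_matches(target, r"\s"))
print(query_matches(target, r"(\s|%20)"))
```

Under that assumption, the bare `\s` finds nothing in the raw query string, while the pattern that includes %20 matches.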

No editing is required: just drop the code into your .htaccess file and you are good to go. As always, comments and questions are welcome, and thanks for reading! :)