Blank Space / Whitespace Character for .htaccess
Working on the next version of the G-Series Blacklist, I needed a way to match a wide variety of UTF-8-encoded (hex) character strings. Those familiar with their site’s traffic will recognize this type of URI request string, which is typically associated with server scanning, exploit attempts, and other malicious behavior. As I explain in this post, pattern-matching and blocking the blank-space (whitespace) character in URL requests is an effective way to improve the security of your website.
Examples of blank-space characters in URL requests
Here is a selection of malicious URL patterns that I want to match and block using 6G blacklist techniques, shown in both UTF-8 (hex) encoded and decoded form:
UTF-8 encoded | Decoded request |
---|---|
http://example.com/hack%20*/ | http://example.com/hack */ |
http://example.com/%3Ca%20href= | http://example.com/<a href= |
http://example.com/%5bNext%20URL%20in%20series%5d | http://example.com/[Next URL in series] |
http://example.com/XHTML%20Document%20Header%20Resource | http://example.com/XHTML Document Header Resource |
http://example.com/%22%20title=%22%22%20rel=%22nofollow | http://example.com/" title="" rel="nofollow |
http://example.com/Apache%20Module%20mod_authz_host | http://example.com/Apache Module mod_authz_host |
http://example.com/%27.%20get_permalink()%20. | http://example.com/'. get_permalink() . |
http://example.com/search/%20%20%20/page/13/ | http://example.com/search/   /page/13/ |
http://example.com/%20%20%20/page/8/ | http://example.com/   /page/8/ |
http://example.com/%3Ca%20href= | http://example.com/<a href= |
http://example.com/%20*/ | http://example.com/ */ |
This gives you an idea of what these encoded requests are targeting using the UTF-8 (hex)-encoded characters. According to the URL specification (RFC 1738), any character that is not one of the following must be encoded in order to appear legitimately within URLs:
Regular-use characters - allowed unencoded within URLs
$ - _ . + ! * ' ( ) ,
0 1 2 3 4 5 6 7 8 9
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
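To make the safe-character list concrete, here is a minimal Python sketch that reports which characters in a string fall outside the list and percent-encodes them. The helper names `unsafe_characters` and `encode_segment` are made up for illustration, and the sketch leans on `urllib.parse.quote`, which also leaves `~` unencoded even though it is not in the list above, so treat it as an approximation rather than a strict implementation of the spec.

```python
import string
from urllib.parse import quote

# Special characters from the list above, plus digits and letters.
SAFE_SPECIALS = "$-_.+!*'(),"
SAFE_CHARS = set(SAFE_SPECIALS + string.digits + string.ascii_letters)


def unsafe_characters(text):
    """Return characters in text that are not in the safe list,
    in order of first appearance (hypothetical helper)."""
    seen = []
    for ch in text:
        if ch not in SAFE_CHARS and ch not in seen:
            seen.append(ch)
    return seen


def encode_segment(text):
    """Percent-encode everything outside the safe list (hypothetical helper).
    Note: quote() emits uppercase hex digits, e.g. %5B rather than %5b."""
    return quote(text, safe=SAFE_SPECIALS)


if __name__ == "__main__":
    print(unsafe_characters("<a href="))
    print(encode_segment("Next URL in series"))
```

Running it on a few of the decoded requests from the table shows the blank space turning up as an unsafe character in nearly every one.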
Not included in this “safe-character” list, the humble white space (or blank space) must be encoded when included in the URL. As explained by the Network Working Group:
Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs.
Looking back at our target URLs, we find that the least common denominator is the encoded whitespace character, %20. Oh sure, there are plenty of other encoded characters that could be targeted, but zeroing in on blank spaces in the URL is an effective way to catch and block many of these types of malicious requests.
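As a quick sanity check on that claim, the following Python sketch decodes a handful of the sample paths from the table and tests each one for whitespace. The `has_whitespace` helper and the `SAMPLE_PATHS` list are assumptions for illustration only; the decoding is done with the standard library's `urllib.parse.unquote`, not with anything Apache-specific.

```python
import re
from urllib.parse import unquote

# \s matches any whitespace character (space, tab, newline, and so on).
WHITESPACE = re.compile(r"\s")

# A few of the encoded request paths from the table above.
SAMPLE_PATHS = [
    "/hack%20*/",
    "/%3Ca%20href=",
    "/%5bNext%20URL%20in%20series%5d",
    "/%22%20title=%22%22%20rel=%22nofollow",
    "/search/%20%20%20/page/13/",
]


def has_whitespace(encoded_path):
    """Decode a percent-encoded path and report whether it contains whitespace."""
    decoded = unquote(encoded_path)
    return WHITESPACE.search(decoded) is not None


if __name__ == "__main__":
    for path in SAMPLE_PATHS:
        print(repr(unquote(path)), has_whitespace(path))
```

Every sample path decodes to something containing at least one blank space, while an ordinary path such as /page/8/ does not.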
How to match the blank-space/whitespace character with .htaccess
Now that we have a reason to do so, let’s use .htaccess to match and block all URL requests that include one or more whitespace characters. It’s as simple as adding this snippet to your root .htaccess file:
<IfModule mod_alias.c>
RedirectMatch 403 \s
</IfModule>
So the punchline to this diatribe is that an escaped “s” character (\s) is the regex to match blank spaces when using .htaccess directives via mod_alias (RedirectMatch) and mod_rewrite (RewriteRule). Here is an example using Apache’s mod_rewrite:
<IfModule mod_rewrite.c>
RewriteCond %{REQUEST_URI} !^/$
RewriteCond %{REQUEST_URI} \s
RewriteRule .* https://perishablepress.com/ [R=301,L]
</IfModule>
This example will redirect any requests that include whitespace to the home page (edit to match your own URL). To block them instead, replace the RewriteRule with this:
RewriteRule .* - [F,L]
Note that it doesn’t matter whether the initial request is encoded or not: Apache decodes the request path before matching, so the rule sees the un-encoded URL (not including the query string), and targeting literal whitespace in the request URI is effective. That said, you should only use this method if you know what you are doing and are certain that none of your URLs contain whitespace or blank spaces.
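If it helps to see that behavior outside of Apache, here is a rough Python model of it. This is only a sketch under stated assumptions: the `would_block` function is hypothetical, it approximates the matching by splitting off the query string, decoding the path, and applying \s, and it is not how Apache is actually implemented.

```python
import re
from urllib.parse import unquote

WHITESPACE = re.compile(r"\s")


def would_block(request_target):
    """Rough model of the path-based rules above (hypothetical helper):
    drop the query string, decode the path, then look for whitespace."""
    path, _, _query = request_target.partition("?")
    return WHITESPACE.search(unquote(path)) is not None


if __name__ == "__main__":
    for target in ["/hack%20*/", "/hack */", "/page/8/", "/search/?q=two%20words"]:
        print(target, would_block(target))
```

In this model the encoded and un-encoded forms of the same path are treated identically, and a request whose only whitespace sits in the query string passes through, which is the gap the next section addresses.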
Matching whitespace in query strings
Wrapping up, here is how to block blank spaces in the query-string portion of the URL, which is impossible using either of the previous two examples. Using mod_rewrite, we can target the %{QUERY_STRING} variable to catch any whitespace:
<IfModule mod_rewrite.c>
RewriteCond %{REQUEST_URI} !^/$
# QUERY_STRING is not URL-decoded, so match the encoded form (%20) as well
RewriteCond %{QUERY_STRING} (\s|%20)
RewriteRule .* - [F,L]
</IfModule>
No editing required: just drop it into your .htaccess file and you’re good to go. As always, comments and questions welcome, and thanks for reading! :)
2 responses to “Blank Space / Whitespace Character for .htaccess”
Excellent idea!!!
I’m running my sites on a VPS with configserver firewall and was wondering how much nefarious stuff would be caught before .htaccess kicks in.
I would imagine including the wrong things in .htaccess would either expose a person or actually shoot them in the foot.
Excellent code! But I cannot make it work. When someone accesses:
mysite.com/whatever-content/%20%20
they receive an “Access Forbidden”. But if I access: mysite.com/whatever-content/ (invisible space, no “%20%20”), it is redirected properly. Any ideas? This has me somewhat concerned, as there are many people who cannot see my site … =( Thanks!