.htaccess Cleanup
Once again I am cleaning up my sites’ .htaccess files. I do this from time to time to remove old redirects, refresh blacklists, and update security and SEO-related directives. It’s tedious work, but the performance and security benefits make it all worthwhile. This post shares some of the techniques that I added to, removed from, or replaced in my .htaccess files, and explains the reasoning behind each decision. I do this for the sake of reference, and hopefully it will give you some ideas for your own .htaccess cleanups.
Menu
- Control Request Methods
- Redirect Robots.txt Requests
- Block Bad Referrers
- Allow POST for Admin Only
- Log PHP Errors
- Add MIME Types
- Block Bad Hosts and User Agents
- Block POST for User Agent
- Secure Login Page
- Protect xmlrpc.php
- Block Nuisance Requests
Control Request Methods
After years of including this snippet at Perishable Press, I finally decided to remove it. It remains in place on some of my other sites, because it is a useful technique. But for this site, which is over 12 years old, the .htaccess file is quite large, so I need to really scrutinize each and every directive in order to optimize for performance as much as possible.
# Control Request Methods
<IfModule mod_rewrite.c>
RewriteCond %{REQUEST_METHOD} ^(delete|head|trace|track) [NC]
RewriteRule .* - [F,L]
</IfModule>
For more in-depth discussion about this technique, check out my tutorial, Control Request Methods.
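As an aside, if you have access to the server config, the TRACE method can also be disabled globally via Apache’s TraceEnable directive. Note that TraceEnable belongs in httpd.conf or a virtual-host context; it is not valid in .htaccess:

# Disable TRACE server-wide (httpd.conf or vhost only)
TraceEnable off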
Redirect Robots.txt Requests
What could be easier than finding a site’s robots.txt file? By design, it’s located in the root directory of the site, right? Unfortunately this is far too difficult for certain scripts and scanners, which endlessly probe for robots.txt in all the wrong places, for example:
http://example.com/maybe/robots.txt
http://example.com/or/maybe/here/robots.txt
http://example.com/okay/its/gotta/be/here/robots.txt
http://example.com/well/then/surely/its/located/here/robots.txt
http://example.com/please/help/i/cant/find/your/sites/robots.txt
…and so on.
Finally getting sick of idiotic scripts that can’t seem to locate the site’s robots.txt file, I decided to craft an .htaccess technique to redirect all requests for robots.txt to the actual file. Here is the code:
# Redirect Robots.txt Requests
<IfModule mod_alias.c>
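# redirect any robots.txt request that is not already at the site root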
RedirectMatch 301 (?<!^)/robots\.txt$ /robots.txt
</IfModule>
No editing is required; just add it to your site’s root .htaccess file and enjoy the extra server resources. To see how it works, try appending robots.txt to any URL on this domain; you’ll immediately be redirected to the site’s robots.txt file.
Block Bad Referrers
I’ve spent a LOT of time writing .htaccess rules to block bad referrers. While giant blacklists of bad referrers can be effective, they’re not ideal in terms of maintainability or performance. So for the past several years, my strategy has been to block bad referrers on an as-needed basis. For example, if some creepy scumbag keeps spamming my site from “soprotivlenie”, I can immediately block all access with a simple technique:
# Block Bad Referrers
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_REFERER} soprotivlenie [NC]
RewriteRule .* - [F,L]
</IfModule>
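If more than one referrer needs blocking at the same time, the conditions chain together with the OR flag. Here is a quick sketch (the second pattern, another-spammer, is just a hypothetical placeholder):

# Block Bad Referrers (multiple)
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_REFERER} soprotivlenie [NC,OR]
RewriteCond %{HTTP_REFERER} another-spammer [NC]
RewriteRule .* - [F,L]
</IfModule>

Note that every condition except the last one carries the OR flag, so matching any one of the referrers triggers the block.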
This can be very effective, but really is only required for short-term protection. Eventually, most spammy referrers disappear or change their reported referrer information, so blacklisting by referrer isn’t as feasible as other protective methods.
Allow POST for Admin Only
Here is a technique that allows POST requests only for the admin. This is useful when the site only has one or maybe a few users who should be able to make POST requests (i.e., submit forms using the POST method). For example, if I am the only admin, and work from an IP of 123.456.789.000, I can whitelist the address using the following code:
# ADMIN POST ONLY
<IfModule mod_rewrite.c>
RewriteCond %{THE_REQUEST} ^POST(.*)HTTP/(1\.0|1\.1|2\.0|3\.0)$ [NC]
RewriteCond %{REMOTE_ADDR} !^123\.456\.789\.000
RewriteRule .* - [F,L]
</IfModule>
Here we are checking if the request is a POST request. If so, then we check that the IP address is NOT my own. If both conditions are true, the POST request is denied via a 403 “Forbidden” response. This technique ensures that nobody other than you can POST data to your site. It’s a really strong way of blocking bad requests and keeping things secure.
What if you have a contact form for which you want to allow POST requests from all users? That makes perfect sense, and it is easily accomplished by adding a few lines:
# ADMIN POST ONLY
<IfModule mod_rewrite.c>
RewriteCond %{THE_REQUEST} ^POST(.*)HTTP/(1\.0|1\.1|2\.0|3\.0)$ [NC]
RewriteCond %{REMOTE_ADDR} !^123\.456\.789\.000
RewriteCond %{REQUEST_URI} !/contact/ [NC]
RewriteCond %{REQUEST_URI} !/contact-process.php [NC]
# RewriteCond %{HTTP_REFERER} !your-domain.tld [NC]
RewriteRule .* - [F,L]
</IfModule>
Here we are whitelisting POST requests for my contact form. You can edit the URIs to match whatever is required by your contact form (or whatever form). Keep in mind that forms may submit their data to a script/URI that is not the same as the page URI. This is why there are two URIs added to the previous rules:
- /contact/ = the web page that displays the form
- /contact-process.php = the actual script that processes the form
Also, to allow more IPs, simply repeat the entire REMOTE_ADDR line and change the IP to whichever address you want to allow.
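For example, here is a sketch that whitelists two admins (both addresses are placeholders, like the one above):

# ADMIN POST ONLY (two admins)
<IfModule mod_rewrite.c>
RewriteCond %{THE_REQUEST} ^POST(.*)HTTP/(1\.0|1\.1|2\.0|3\.0)$ [NC]
RewriteCond %{REMOTE_ADDR} !^123\.456\.789\.000
RewriteCond %{REMOTE_ADDR} !^111\.222\.333\.444
RewriteRule .* - [F,L]
</IfModule>

Because RewriteCond directives are joined with AND by default, the request is blocked only when it matches neither of the whitelisted addresses.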
Log PHP Errors
This .htaccess snippet was removed from my site and replaced by a PHP-based error/log script. If you want to log PHP errors via .htaccess, add the following code to your site’s root .htaccess file:
# LOG PHP ERRORS
php_flag display_startup_errors off
php_flag display_errors off
php_flag html_errors off
php_flag log_errors on
php_value error_log /var/www/example.com/public/php-errors.log
Remember to edit the path specified for error_log to match your own. Also make sure the file that you are using exists and is writable. Note that php_flag and php_value directives work only when PHP is running as an Apache module (mod_php); on CGI/FastCGI setups they will trigger a server error. For more information on logging errors with PHP and .htaccess, check out these tutorials:
- How to Enable PHP Error Logging via htaccess
- Advanced PHP Error Handling via PHP
- Advanced PHP Error Handling via htaccess
Add MIME Types
Every server is different in terms of which MIME types are supported. On my server, for example, I needed to add support for some video, image, and font formats:
# MIME TYPES
<IfModule mod_mime.c>
# ADD MIME TYPE: VIDEO
AddType video/ogg .ogv
AddType video/mp4 .mp4
AddType video/webm .webm
# ADD MIME TYPE: IMAGES
AddType image/svg+xml .svg
# ADD MIME TYPE: FONTS
AddType application/vnd.ms-fontobject .eot
AddType font/ttf .ttf
AddType font/otf .otf
AddType font/woff .woff
</IfModule>
Your needs will vary depending on server setup. Check out more useful MIME types that can be added via .htaccess.
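For example, depending on what your server already recognizes, a few other common additions might look like this (a sketch; check your server’s mime.types before adding):

# MORE MIME TYPES
<IfModule mod_mime.c>
AddType image/webp .webp
AddType font/woff2 .woff2
AddType application/wasm .wasm
</IfModule>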
Block Bad Hosts and User Agents
Apart from my work with the 6G Firewall and other blacklists, I usually block bad requests as they happen, so each of my sites has a working section in its .htaccess file that looks something like this:
# BLOCK BAD REQUESTS
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_HOST} reverse\.softlayer\.com [NC,OR]
RewriteCond %{HTTP_HOST} \.crimea\.com [NC,OR]
RewriteCond %{HTTP_HOST} s368\.loopia\.se [NC,OR]
RewriteCond %{HTTP_HOST} kanagawa\.ocn [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} QuerySeekerSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} undefined [NC,OR]
RewriteCond %{HTTP_USER_AGENT} siclab [NC]
RewriteRule .* - [F,L]
</IfModule>
Here I am blocking numerous bad hosts and user agents. I use this technique for short-term traffic control on an as-needed basis. Note that it’s easy to block additional hosts and/or user agents by simply duplicating a line and changing the targeted pattern to whatever you would like to block. Be mindful, however, that the last RewriteCond does NOT include the OR flag.
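For example, to also block a (hypothetical) user agent named BadBot, add the OR flag to the previously last condition and append the new one:

RewriteCond %{HTTP_USER_AGENT} siclab [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]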
Note: the rule that blocks ia_archiver is disabled because you may not want to block that particular agent. To block it, remove the # from the beginning of the directive.
Block POST for User Agent
Here is a simple code snippet that blocks the Super_Spam_Sucker (a notorious spammer) from posting any data to the site.
# BLOCK SPAMMER
<IfModule mod_rewrite.c>
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{HTTP_USER_AGENT} Super_Spam_Sucker [NC]
RewriteRule .* - [F,L]
</IfModule>
Blocking POST requests and user agents is covered in previous sections of this tutorial. This code demonstrates how to combine the two techniques. That is all.
Secure Login Page
Your site’s login page is one of the pages most heavily targeted by malicious scripts, spammers, and other resource-wasting scum. For sites where only one or a few users need to log in, you can secure the login page with a simple slice of .htaccess:
# SECURE LOGIN PAGE
<Files wp-login.php>
Order Deny,Allow
Deny from all
Allow from 123.456.789.000
Allow from 000.987.654.321
</Files>
There are two requirements to implement this code:
- Change the IP addresses to match your own
- The code needs to be included in the .htaccess file that is located in your WP root directory (which may be different than the root directory of your site)
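Note also that Order/Deny/Allow is Apache 2.2 syntax; it continues to work on Apache 2.4 only when mod_access_compat is enabled. On a stock 2.4 setup, the equivalent would look like this sketch (same placeholder addresses as above):

# SECURE LOGIN PAGE (Apache 2.4)
<Files wp-login.php>
Require ip 123.456.789.000 000.987.654.321
</Files>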
I use this technique here at Perishable Press and at other sites. It’s super effective at keeping the login page nice and secure.
Protect xmlrpc.php
If your site is not using any XMLRPC functionality, you can tighten security by denying access to your xmlrpc.php file. For example, here at Perishable Press, I do not use xmlrpc.php for anything, so I add the following rule to the WP root .htaccess file:
# PROTECT xmlrpc.php
<IfModule mod_alias.c>
RedirectMatch 403 (?i)/xmlrpc\.php
</IfModule>
This blocks all requests for the WordPress xmlrpc.php file. No editing is required; it’s strictly plug-&-play. For more information about why this technique is useful, check out my articles, Protect Against WordPress Brute Force Amplification Attack and Protection for WordPress Pingback Vulnerability.
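If mod_alias is unavailable for some reason, a Files container (using the same 2.2-style syntax as the login-page technique above) is one possible alternative:

# PROTECT xmlrpc.php (alternative)
<Files xmlrpc.php>
Order Allow,Deny
Deny from all
</Files>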
Block Nuisance Requests
Here are some additional blocking rules that I updated or removed from this site’s .htaccess file. For some unknown reason, I kept getting requests for router.php and ).html(. Knowing for certain that none of the site’s URLs contain either of those strings, I solved the problem with the following code snippet:
# BLOCK IDIOT REQUESTS
<IfModule mod_alias.c>
RedirectMatch 403 router\.php
RedirectMatch 403 /\)\.html\(
</IfModule>
I also kept getting bizarre query-string requests for Google’s humans.txt file. Seriously annoying and quite voluminous, so I blocked the requests with this:
# BLOCK IDIOT REQUESTS
<IfModule mod_rewrite.c>
RewriteCond %{QUERY_STRING} http://www\.google\.com/humans\.txt\? [NC]
RewriteRule .* - [F,L]
</IfModule>
That checks the query string of each request. If it contains the specified string, the request is blocked via a 403 “Forbidden” response. This is a good example of targeting the query string; to target other aspects of the request, check out Eight Ways to Blacklist with Apache’s mod_rewrite.