.htaccess Cleanup


Once again I am cleaning up my sites’ .htaccess files. I do this from time to time to remove old redirects, refresh blacklists, and update security- and SEO-related directives. It’s tedious work, but the performance and security benefits make it all worthwhile. This post shares some of the techniques I added to, removed from, or replaced in my .htaccess files, and explains the reasoning behind each decision. I share it for the sake of reference, and hopefully it will give you some ideas for your own .htaccess cleanups.


Control Request Methods

After years of including this snippet at Perishable Press, I finally decided to remove it. It remains in place on some of my other sites, because it is a useful technique. But for this site, which is over 12 years old, the .htaccess file is quite large, so I need to really scrutinize each and every directive in order to optimize for performance as much as possible.

# Control Request Methods
<IfModule mod_rewrite.c>
	RewriteCond %{REQUEST_METHOD} ^(delete|head|trace|track) [NC]
	RewriteRule .* - [F,L]
</IfModule>

For more in-depth discussion about this technique, check out my tutorial, Control Request Methods.
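An alternative to blacklisting specific methods is to whitelist only the methods your site actually needs, and deny everything else. This sketch assumes GET, POST, and HEAD are sufficient for your site; adjust the list as required:

```apache
# Allow Only GET, POST, and HEAD
<IfModule mod_rewrite.c>
	RewriteCond %{REQUEST_METHOD} !^(GET|POST|HEAD)$ [NC]
	RewriteRule .* - [F,L]
</IfModule>
```

The whitelist approach is stricter: any new or unusual method (PUT, OPTIONS, etc.) is denied automatically, so make sure nothing on your site depends on those methods before using it.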

Redirect Robots.txt Requests

What could be easier than finding a site’s robots.txt file? By design, it’s located in the root directory of the site, right? Unfortunately this is far too difficult for certain scripts and scanners, which endlessly probe for robots.txt in all the wrong places, for example:

http://example.com/maybe/robots.txt
http://example.com/or/maybe/here/robots.txt
http://example.com/okay/its/gotta/be/here/robots.txt
http://example.com/well/then/surely/its/located/here/robots.txt
http://example.com/please/help/i/cant/find/your/sites/robots.txt
.
.
.

Finally getting sick of idiotic scripts that can’t seem to locate the site’s robots.txt file, I decided to craft an .htaccess technique to redirect all requests for robots.txt to the actual file. Here is the code:

# Redirect Robots.txt Requests
<IfModule mod_alias.c>
	RedirectMatch 301 (?<!^)/robots\.txt$ /robots.txt
</IfModule>

No editing is required; just add it to your site’s root .htaccess file and enjoy the extra server resources. To see how it works, try appending robots.txt to any URL on this domain; you’ll immediately be redirected to the site’s robots.txt file.

Block Bad Referrers

I’ve spent a LOT of time writing .htaccess rules to block bad referrers. While giant blacklists of bad referrers can be effective, they’re not ideal in terms of maintainability or performance. So for the past several years, my strategy has been to block bad referrers on an as-needed basis. For example, if some creepy scumbag keeps spamming my site from “soprotivlenie”, I can immediately block all access with a simple technique.

# Block Bad Referrers
<IfModule mod_rewrite.c>
	RewriteCond %{HTTP_REFERER} soprotivlenie [NC]
	RewriteRule .* - [F,L]
</IfModule>

This can be very effective, but really is only required for short-term protection. Eventually, most spammy referrers disappear or change their reported referrer information, so blacklisting by referrer isn’t as feasible as other protective methods.
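When several referrers need blocking at once, the patterns can be combined with regex alternation instead of repeating the condition line. The referrer names below are placeholders for illustration:

```apache
# Block Bad Referrers (multiple)
<IfModule mod_rewrite.c>
	RewriteCond %{HTTP_REFERER} (spam-referrer-one|spam-referrer-two) [NC]
	RewriteRule .* - [F,L]
</IfModule>
```

Alternation keeps the rule compact, but the same caveat applies: referrer-based blocking is best used as a short-term measure.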

Allow POST for Admin Only

Here is a technique that allows POST requests only for the admin. This is useful when the site only has one or maybe a few users who should be able to make POST requests (i.e., submit forms using the POST method). For example, if I am the only admin, and work from an IP of 123.456.789.000, I can whitelist the address using the following code:

# ADMIN POST ONLY
<IfModule mod_rewrite.c>
	RewriteCond %{THE_REQUEST} ^POST(.*)HTTP/1\.(0|1|2|3)$ [NC]
	RewriteCond %{REMOTE_ADDR} !^123\.456\.789\.000
	RewriteRule .* - [F,L]
</IfModule>

Here we are checking if the request is a POST request. If so, then we check that the IP address is NOT my own. If both conditions are true, the POST request is denied via a 403 “Forbidden” response. This technique ensures that nobody other than you can POST data to your site. It’s a really strong way of blocking bad requests and keeping things secure.

What if you have a contact form for which you want to allow POST requests from all users? It only makes sense, and it is easily accomplished by adding a few lines:

# ADMIN POST ONLY
<IfModule mod_rewrite.c>
	RewriteCond %{THE_REQUEST} ^POST(.*)HTTP/1\.1$ [NC]
	RewriteCond %{REMOTE_ADDR} !^123\.456\.789\.000
	RewriteCond %{REQUEST_URI} !/contact/ [NC]
	RewriteCond %{REQUEST_URI} !/contact-process.php [NC]
	# RewriteCond %{HTTP_REFERER} !your-domain.tld [NC]
	RewriteRule .* - [F,L]
</IfModule>

Here we are whitelisting POST requests for my contact form. You can edit the URIs to match whatever is required by your contact form (or whatever form). Keep in mind that forms may submit their data to a script/URI that is not the same as the page URI. This is why there are two URIs added to the previous rules:

/contact/ = web page that displays the form
/contact-process.php = actual script that processes the form

Also, to allow more IPs, simply repeat the entire REMOTE_ADDR line and change the IP to whichever address you want to allow.
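For example, to whitelist a second admin IP, the rules would look like this (both addresses are placeholders; replace them with your own):

```apache
# ADMIN POST ONLY (two whitelisted IPs)
<IfModule mod_rewrite.c>
	RewriteCond %{THE_REQUEST} ^POST(.*)HTTP/1\.1$ [NC]
	RewriteCond %{REMOTE_ADDR} !^123\.456\.789\.000
	RewriteCond %{REMOTE_ADDR} !^111\.222\.333\.444
	RewriteRule .* - [F,L]
</IfModule>
```

Because each negated REMOTE_ADDR condition must be true for the request to be blocked, a POST from either whitelisted address passes through.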

Log PHP Errors

This .htaccess snippet was removed and replaced by a PHP-based error/log script. To log PHP errors, add the following code to your site’s root .htaccess file:

# LOG PHP ERRORS
php_flag display_startup_errors off
php_flag display_errors off
php_flag html_errors off
php_flag log_errors on
php_value error_log /var/www/example.com/public/php-errors.log

Remember to edit the path specified for error_log to match your own. Also make sure the log file exists and is writable by the server. For more information on logging errors with PHP and .htaccess, check out my tutorials on the topic.

Add MIME Types

Every server is different in terms of which MIME types are supported. On my server, for example, I needed to add support for some video, image, and font formats:

# MIME TYPES
<IfModule mod_mime.c>
	# ADD MIME TYPE: VIDEO
	AddType video/ogg .ogv
	AddType video/mp4 .mp4
	AddType video/webm .webm

	# ADD MIME TYPE: IMAGES
	AddType image/svg+xml .svg

	# ADD MIME TYPE: FONTS
	AddType application/vnd.ms-fontobject .eot
	AddType font/ttf .ttf
	AddType font/otf .otf
	AddType font/woff .woff
</IfModule>

Your needs will vary depending on server setup. Check out more useful MIME types that can be added via .htaccess.
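For example, if your server also lacks a registered type for WOFF2 fonts (an assumption; check your server’s existing configuration first), you could add:

```apache
# ADD MIME TYPE: WOFF2 FONTS
<IfModule mod_mime.c>
	AddType font/woff2 .woff2
</IfModule>
```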

Block Bad Hosts and User Agents

Apart from my work with the 6G Firewall and other blacklists, I usually block bad requests as they happen, so each of my sites has a working section in its .htaccess file that looks something like this:

# BLOCK BAD REQUESTS
<IfModule mod_rewrite.c>
	RewriteCond %{HTTP_HOST} reverse\.softlayer\.com [NC,OR]
	RewriteCond %{HTTP_HOST} \.crimea\.com [NC,OR]
	RewriteCond %{HTTP_HOST} s368\.loopia\.se [NC,OR]
	RewriteCond %{HTTP_HOST} kanagawa\.ocn [NC,OR]
	# RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} QuerySeekerSpider [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} undefined [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} siclab [NC]
	RewriteRule .* - [F,L]
</IfModule>

Here I am blocking numerous bad hosts and user agents. I use this technique for short-term traffic control on an as-needed basis. Note that it’s easy to block additional hosts and/or user agents by simply duplicating a line and changing the targeted pattern to whatever you would like to block. Be mindful, however, that the last RewriteCond does NOT include the OR flag.

Note: the rule that blocks ia_archiver is disabled because you may not want to block that particular agent. To block it, remove the # from the beginning of the directive.

Block POST for User Agent

Here is a simple code snippet that blocks the Super_Spam_Sucker (notorious spammer) from posting any data to the site.

# BLOCK SPAMMER
<IfModule mod_rewrite.c>
	RewriteCond %{REQUEST_METHOD} POST
	RewriteCond %{HTTP_USER_AGENT} Super_Spam_Sucker [NC]
	RewriteRule .* - [F,L]
</IfModule>

Blocking POST requests and user agents is covered in previous sections of this tutorial. This code demonstrates how to combine the two techniques. That is all.

Secure Login Page

Your site’s login page is one of the heaviest targets of malicious scripts, spammers, and other resource-wasting scum. For sites where only one or a few users need to log in, you can secure the login page with a simple slice of .htaccess:

# SECURE LOGIN PAGE
<Files wp-login.php>
	Order Deny,Allow
	Deny from all
	Allow from 123.456.789.000
	Allow from 000.987.654.321
</Files>

There are two requirements to implement this code:

  • Change the IP addresses to match your own
  • The code needs to be included in the .htaccess file that is located in your WP root directory (which may be different than the root directory of your site)

I use this technique here at Perishable Press and at other sites. It’s super effective at keeping the login page nice and secure.
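Note that Order/Deny/Allow is Apache 2.2 syntax. If your server runs Apache 2.4 or better (an assumption to verify for your setup), the equivalent uses the Require directive; again, replace the placeholder IPs with your own:

```apache
# SECURE LOGIN PAGE (Apache 2.4+)
<Files wp-login.php>
	Require ip 123.456.789.000
	Require ip 000.987.654.321
</Files>
```

Multiple Require ip lines are treated as alternatives, so a visitor matching either address is allowed through and everyone else gets a 403.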

Protect xmlrpc.php

If your site is not using any XMLRPC functionality, you can tighten security by denying access to your xmlrpc.php file. For example, here at Perishable Press, I do not use xmlrpc.php for anything, so I add the following rule to the WP root .htaccess file:

# PROTECT xmlrpc.php
<IfModule mod_alias.c>
	RedirectMatch 403 (?i)/xmlrpc\.php
</IfModule>

This blocks all requests for the WordPress xmlrpc.php file. No editing is required, strictly plug-&-play. For more information about why this technique is useful, check out my articles, Protect Against WordPress Brute Force Amplification Attack and Protection for WordPress Pingback Vulnerability.

Block Nuisance Requests

Here are some additional blocking rules that I updated or removed from this site’s .htaccess file. For some unknown reason, I kept getting requests for router.php and ).html(. Knowing for certain that none of the site’s URLs contain either of those strings, I solved the problem with the following code snippet:

# BLOCK IDIOT REQUESTS
<IfModule mod_alias.c>
	RedirectMatch 403 router\.php
	RedirectMatch 403 /\)\.html\(
</IfModule>

I also kept getting bizarre query-string requests for Google’s humans.txt file. Seriously annoying and quite voluminous, so I blocked the request with this:

# BLOCK IDIOT REQUESTS
<IfModule mod_rewrite.c>
	RewriteCond %{QUERY_STRING} http://www\.google\.com/humans\.txt\? [NC]
	RewriteRule .* - [F,L]
</IfModule>

That checks the query-string of each request. If it contains the specified string, the request is blocked via 403 “Forbidden” response. This is a good example of targeting the query string; to target other aspects of the request, check out Eight Ways to Blacklist with Apache’s mod_rewrite.