Perishable Press HTAccess Spring Cleaning, Part 2
Before Summer arrives, I need to post the conclusion to my seasonal article, Perishable Press HTAccess Spring Cleaning, Part 1. As explained in the first post, I recently spent some time consolidating and optimizing the Perishable Press site-root and blog-root HTAccess files. Since the makeover, I have enjoyed better performance, fewer errors, and cleaner code. In this article, I share some of the changes made to the blog-root HTAccess file and briefly explain their intended purpose. Granted, most of the blog-root directives affected by the renovation involve redirecting broken or missing URLs, but there are some other gems mixed in as well. In sharing these deprecated excerpts, I hope to inspire others to improve their own HTAccess and/or configuration files. What an excellent way to wrap up this delightful Spring season! :)
Step 1: Eliminate Duplicate Code
Comparing my site-root HTAccess file to my blog-root HTAccess file, I noticed several repetitious code blocks. Because HTAccess directives affect all subordinate directories, the following directives are no longer necessary in the blog root; they are already included in the HTAccess file in the site's root web directory:
PHP error display and logging rules:
# disable display of php errors
php_flag display_startup_errors off
php_flag display_errors off
php_flag html_errors off
# PHP error logging
php_flag log_errors on
php_value error_log /the/path/to/php_error.log
Basic configuration and default character set:
# basic configuration
Options +FollowSymLinks
Options All -Indexes
ServerSignature Off
RewriteEngine on
# default character set
AddDefaultCharset UTF-8
AddLanguage en-US .html .htm .css .js
Step 2: Rethink Existing Code
An excellent way to stop comment spam involves blocking no-referrer requests. The following code had been in place since the publication of the corresponding article. Somewhere along the way, the method was revamped to exclude several major user agents from the block. Of course, this is just silly, and I honestly don't remember why I felt it was important. Needless to say, I replaced the following bloated method with the original version (a sketch of which follows the block):
Block comment spam by denying access to no-referrer requests
# block no-referrer requests
<IfModule mod_rewrite.c>
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} .*wp-comments-post\.php
RewriteCond %{HTTP_REFERER} !.*perishablepress.com.* [OR,NC]
RewriteCond %{HTTP_USER_AGENT} !^.*mozilla.* [OR,NC]
RewriteCond %{HTTP_USER_AGENT} !^.*google.* [OR,NC]
RewriteCond %{HTTP_USER_AGENT} !^.*slurp.* [OR,NC]
RewriteCond %{HTTP_USER_AGENT} !^.*msn.* [NC]
RewriteCond %{HTTP_USER_AGENT} ^$ [NC]
RewriteRule .* - [F,L]
</IfModule>
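The original version is not reproduced here, but a minimal no-referrer block of this kind looks something like the following sketch, trimmed to the essential referrer and empty-user-agent checks (the domain is site-specific):
# block no-referrer comment requests (minimal sketch)
<IfModule mod_rewrite.c>
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} wp-comments-post\.php [NC]
RewriteCond %{HTTP_REFERER} !perishablepress\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule .* - [F,L]
</IfModule>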
Step 3: Optimize Existing Code
Several years ago, while beginning my journey into the fascinating realms of HTAccess, I embraced the importance of protecting my image and multimedia content from the grubby hands of unscrupulous hotlinkers. At the time, I had established an admittedly barbaric anti-hotlinking strategy, which gradually evolved into this ghastly behemoth:
Anti-hotlinking directives
# anti-hotlinking
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https://monzillamedia.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^https://monzillamedia.com$ [NC]
RewriteCond %{HTTP_REFERER} !^https://www.monzillamedia.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^https://www.monzillamedia.com$ [NC]
RewriteCond %{HTTP_REFERER} !^https://perishablepress.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^https://perishablepress.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://labs.perishablepress.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://labs.perishablepress.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.perishablepress.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.perishablepress.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://planetwordpress.planetozh.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://planetwordpress.planetozh.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.google.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.google.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.netvibes.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.netvibes.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.google.com/reader/view/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.google.com/reader/m/view/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.feedburner.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^https://feeds.feedburner.com/yourfeedname$ [NC]
RewriteCond %{HTTP_REFERER} !^https://feeds.feedburner.com/yourfeednamecomments$ [NC]
RewriteRule .*\.(gif|jpg|jpeg|png|bmp|js|css|zip|mp3|avi|wmv|mpg|mpeg|tif|tiff|raw|swf)$ https://perishablepress.com/hotlink.jpe [R,NC,L]
# RewriteRule .*\.(gif|jpg|jpeg|png|bmp|js|css|zip|mp3|avi|wmv|mpg|mpeg|tif|tiff|raw|swf)$ - [F,NC]
Without getting into it, suffice it to say that this is major overkill. Since implementing and expanding this madness, I have studied the technique in depth and developed an optimal anti-hotlinking strategy. Thus, this HTAccess nightmare has been replaced with a much leaner, more accurate ruleset.
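The replacement ruleset itself is not shown here, but a leaner version can be sketched along these lines: one case-insensitive substring check per whitelisted domain covers every scheme, subdomain, and path variation enumerated in the behemoth above (the whitelisted domains and blocked file types remain site-specific choices):
# lean anti-hotlinking (sketch)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !perishablepress\.com [NC]
RewriteCond %{HTTP_REFERER} !monzillamedia\.com [NC]
RewriteRule \.(gif|jpe?g|png|bmp|mp3|avi|wmv|mpe?g|tiff?|swf)$ - [F,NC]
</IfModule>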
Step 4: Remove Deprecated Redirects
Over the course of time, any responsible webmaster inevitably will be faced with an excessively large number of HTAccess redirects. Whether permanent or temporary, individual redirects are important for resolving 404 errors resulting from misplaced or relocated files, misdirected external links, and so on. At some point, webmasters have two options: leave the rules in place and keep adding to them as needed, or purge and prune the rules as much as possible. Needless to say, I decided to clean things up a bit. After much testing and research, I managed to reduce my collection of HTAccess redirects by around 75%. Granted, after clearing things out, I experienced a significant increase in 404 errors; however, the situation is slowly improving as the search engines continue to update their databases. Here is a peek at the kind of 301 redirects that have been removed from my blog-root HTAccess file:
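Each purged entry was an individual 301 directive of this general shape; the paths below are invented for illustration:
# hypothetical examples; the actual rules targeted real post URLs
Redirect 301 /press/old-post-name/ https://perishablepress.com/new-post-name/
RedirectMatch 301 ^/press/2006/([0-9]+)/(.*)$ https://perishablepress.com/$2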
Audi 5000 G
That’s it for this fun-filled adventure into HTAccess land. Hopefully, this article has inspired you to optimize and streamline your own HTAccess strategy. With a little time, a focused mind, and a knowledgeable guide ;), improving your site’s performance is easily accomplished. — Happy Spring Cleaning! :)
8 responses to “Perishable Press HTAccess Spring Cleaning, Part 2”
Hey Jeff,
Couldn't those two lines be merged into one?
Options +FollowSymLinks
Options All -Indexes
to something like this?
Options All -Indexes +FollowSymLinks
Also, I don't have the time to develop the idea much right now, but in short: I'm starting to wonder about the conditional calls you place all over your HTAccess rules.
I mean, which behavior do we actually want in case of emergency: the page is not rendered at all (Internal Server Error), or the page is rendered but the HTAccess rules are silently broken?
It seems to me that the first case is a lot more secure. What do you think?
Hi Louis, good call on the merging of Options parameters. I will be adopting the practice henceforth. Thank you!
As for using "conditional calls" throughout HTAccess, you raise an interesting point. Perhaps conditional calls should not be used for security-related directives? For less-critical functionality, however, it may come down to personal preference. For example, many of the conditional calls I use "all over" my HTAccess files involve redirects and other cosmetic enhancements. For things such as caching (mod_expires), setting headers (mod_headers), and checking spelling (mod_speling), I would rather not have my entire site crash and remain unavailable until I fix it. Instead, I would much rather deliver pages to visitors and worry about the missing modules after checking my error logs. For security-related directives, however, it may be better to deliver 500 errors than to leave a gaping security hole for the world to see.
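For example, a minimal sketch of such a conditional container (the cache lifetimes are arbitrary values): when mod_expires is absent, everything inside the wrapper is silently skipped instead of bringing down the entire site with a 500 error.
# non-critical caching rules degrade gracefully if mod_expires is missing
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType text/css "access plus 1 month"
ExpiresByType application/javascript "access plus 1 month"
</IfModule>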
I'm glad you share my point of view on the security issue. I think it's underestimated; I mean, I've never read a post on that topic on any important blog. We build these rules (my HTAccess is starting to get pretty big) for security reasons most of the time, so having them disabled without any warning is a serious problem.
Apache has terrible error handling. It's like unobtrusive JavaScript, but in the wrong context! I don't want visitors to be served content that does not correspond to what I've decided it should be. The wrong headers, the wrong URL rewriting; these are not critical security issues, but for a serious website I think they're just as bad.
You may argue that a serious website should pay for a hosting plan that does not suffer such Apache errors (they often happen when server farms are rebooted).
Mmm… I'm having the greatest difficulty expressing myself in English these days. Sorry if I'm not very clear :(
I want to give an example of where I think your approach finds its limits.
Let's say you are using my static caching technique on your site; your CSS and JS files are therefore served compressed to visitors.
Now let's say Apache is recompiled by your hosting company and the HTAccess rules break. You have chosen the conditional approach, so your site doesn't show any errors, but the files are served uncompressed.
The site is not broken, but it is not as responsive as it should be. This lack of responsiveness may make you lose visitors who get bored of waiting.
This example itself could be criticized, but you get the idea: some rules that are not security-oriented may still be essential.
But don't get me wrong: though I think your approach is not the most demanding in general, I'm convinced that it's the best compromise you can have on shared hosting.
No, you are making perfect sense here, Louis. I am in the process of rethinking my approach to using the <IfModule> conditional directives. Of course, the best-case scenario is knowing exactly which modules are installed on your server (and that they won't unexpectedly and suddenly change in the middle of the night without so much as a peep of warning, but I digress); however, I think the next best thing would be to omit such conditions for anything remotely related to site security. Given your thinking, I understand your zeal to stop using <IfModule> containers completely, but I am still convinced that they remain beneficial for "cosmetic" and/or "aesthetic" Apache functionality. Especially for multiple-domain accounts, where a central configuration or root HTAccess file sets directives for several different sites, checking for the required Apache module would prevent all associated sites from crashing should the module fail to exist. Given this, I think your reasoning becomes more generally applicable as you move deeper into the directory structure.
I hear you, Louis, and your point is well taken; however, I still think that you have a better chance of receiving, helping, and retaining visitors if your site is configured to load without all of the bells and whistles than not at all. At least with uncompressed content, for example, users are still able to get the information and content they need without having to go elsewhere. Conversely, visitors will have no choice if the site is not loading at all. Many webmasters have neither the time nor the resources to babysit all of their sites in order to catch such errors and resolve them as they happen. If a site goes down because of a missing or corrupt Apache module, it could be many hours, days, or longer before proper attention may be applied. Especially for e-commerce sites, this is not an acceptable strategy. Further, as for compression and other optimization features, sites should be configured to load efficiently and operate optimally without relying on the wondrous powers of Apache. I think the principle of progressive enhancement applies here.
I’m 100% okay with all that, except for the very last bit.
A site should "operate optimally" without Apache? But how the heck do you do URL rewriting, compression, expires headers, and so on, without Apache?
Either I haven't understood what you meant, or there is a secret way I've never been told of…
Anyway, thanks for detailing your point of view :-)
No problem. I am basically saying that sites should be built to load as quickly as possible to begin with: avoid code bloat, follow web standards, optimize CSS and JavaScript, and so on. Such content then benefits further from the compression provided via Apache. Then, when and if circumstances change and compression through Apache fails (for whatever reason), the site will still load quickly because it has been built to do so. If you read carefully, you will see that I was referring specifically to compression and other optimization features, meaning anything and everything that we can control before applying similarly functioning Apache directives. Again, I think this concept is best grasped if you think of progressive enhancement. For example, this is demonstrated when a JavaScript-heavy site still works fine in browsers without JavaScript support. I hope that makes sense! ;)