Universal www-Canonicalization via htaccess

During my previous rendezvous involving comprehensive canonicalization for WordPress, I offered my personally customized technique for ensuring consistently precise and accurate URL delivery. That particular method targets WordPress exclusively (although the logic could be modified for general use), and requires a bit of editing to adapt the code to each particular configuration. In this follow-up tutorial, I present a basic www-canonicalization technique that accomplishes the following:

  • requires or removes the www prefix for all URLs
  • absolutely no editing when requiring the www prefix
  • minimal amount of editing when removing the www prefix
  • minimal amount of code used to execute either technique

I have found this “universal” www-canonicalization technique extremely useful in its simplicity and elegance. Especially when requiring the www prefix, nothing could be easier: simply copy, paste, done — absolutely no hard-coding necessary!

Require the www prefix

To ensure that all URLs of a given domain present with the www prefix, open the domain’s root htaccess file and add the following chunk of code (no editing required!):

# universal www canonicalization via htaccess
# require www prefix for all urls of any domain - no editing required
# http://perishablepress.com/press/2008/04/30/universal-www-canonicalization-via-htaccess/

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.               [NC]
RewriteCond %{HTTP_HOST} ^([^.]+\.[a-z]{2,6})$ [NC]
RewriteRule ^(.*)$       http://www.%1/$1      [R=301,L]

Again, this htaccess code will ensure that all of your URLs display with the www prefix. The first three lines are comments explaining the purpose of the code. The next two lines enable the rewriting engine of Apache’s mod_rewrite module and set the base URL path for per-directory rewrites. Note that you may not need to include the RewriteEngine directive if it already appears earlier in the htaccess file. The final three lines of code provide the desired canonical functionality as follows:

RewriteCond %{HTTP_HOST} !^www\. [NC]
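This condition checks whether the requested host name (the HTTP_HOST value) begins with the www prefix. Because the pattern is negated with !, the condition succeeds only when the prefix is missing; if the host already begins with www., the condition fails and the rule is skipped. The [NC] flag makes the comparison case-insensitive.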
RewriteCond %{HTTP_HOST} ^([^.]+\.[a-z]{2,6})$ [NC]
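This condition matches the general pattern of a bare domain name. The regular expression matches one or more characters other than a dot, followed by a literal dot ( . ) and an alphabetic string of two to six characters. For example, the common placeholder domain name, domain.tld, is matched by the regex, as is any other name of that form, thus the term “universal” in the title of this post. ;) The parentheses capture the matched host, which the RewriteRule reuses as %1. Note that hosts with a subdomain (sub.domain.tld), a port number (domain.tld:8080), a multi-part suffix such as co.uk, or a top-level domain longer than six letters do not match, so requests for them are left alone.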
RewriteRule ^(.*)$ http://www.%1/$1 [R=301,L]
This directive is where the actual rewriting takes place. Whenever both of the previous conditions prove true, the RewriteRule directs Apache to redirect the request to a URL that includes the www prefix. The ^(.*)$ pattern matches the entire requested path following the domain name (in an htaccess file, without its leading slash) and captures it as $1. The substitution http://www.%1/$1 then builds the new URL from the captured domain (%1, taken from the last matched RewriteCond) and the captured path ($1). Finally, the [R=301,L] flags mark the redirect as permanent (i.e., 301) and tell mod_rewrite to stop processing any further rules in this pass.
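
To see how the two conditions and the rule interact, here is a minimal Python sketch that mimics them for a single request. It is an illustration of the logic only, not how Apache actually evaluates htaccess: the function name `require_www` and its parameters are hypothetical, re.IGNORECASE stands in for the [NC] flag, and the path is assumed to arrive with its leading slash, as in the request line.

```python
import re
from typing import Optional

# Mirrors: RewriteCond %{HTTP_HOST} !^www\. [NC]
WWW_PATTERN = re.compile(r"^www\.", re.IGNORECASE)
# Mirrors: RewriteCond %{HTTP_HOST} ^([^.]+\.[a-z]{2,6})$ [NC]
DOMAIN_PATTERN = re.compile(r"^([^.]+\.[a-z]{2,6})$", re.IGNORECASE)


def require_www(host: str, path: str, query: str = "") -> Optional[str]:
    """Return the 301 target for a request, or None if no redirect applies.

    Hypothetical model of the require-www block; `path` is the requested
    path with its leading slash, e.g. "/blog/post".
    """
    # First condition: the host must NOT already start with www.
    if WWW_PATTERN.match(host):
        return None  # already canonical, rule skipped
    # Second condition: the host must look like name.tld
    m = DOMAIN_PATTERN.match(host)
    if m is None:
        return None  # e.g. sub.domain.tld or domain.co.uk, rule skipped
    # In a root htaccess file the rule sees the path without its leading
    # slash, so ^(.*)$ captures e.g. "blog/post" as $1.
    rel = path[1:] if path.startswith("/") else path
    target = f"http://www.{m.group(1)}/{rel}"  # http://www.%1/$1
    # mod_rewrite passes the original query string through by default.
    if query:
        target += "?" + query
    return target


for host, path in [
    ("domain.tld", "/blog/post"),
    ("www.domain.tld", "/blog/post"),
    ("sub.domain.tld", "/"),
]:
    print(host, path, "->", require_www(host, path))
```

The third request shows the limitation of the second condition: sub.domain.tld is simply left alone rather than redirected.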

Remove the www prefix

To ensure that all URLs of a given domain present without the www prefix, open the domain’s root htaccess file and add the following chunk of code:

# universal www canonicalization via htaccess
# remove www prefix for all urls - replace all domain and tld with yours
# http://perishablepress.com/press/2008/04/30/universal-www-canonicalization-via-htaccess/

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^domain\.tld$       [NC]
RewriteRule ^(.*)$       http://domain.tld/$1 [R=301,L]

To remove the www prefix, this technique requires two simple edits: replace both instances of “domain” and “tld” with your own domain name and top-level domain, respectively. For example, if your site were located at “sweetdomain.com”, you would edit the code as follows:

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^sweetdomain\.com$ [NC]
RewriteRule ^(.*)$ http://sweetdomain.com/$1 [R=301,L]
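
If you maintain several sites, the two edits can also be generated rather than typed by hand. A minimal sketch, assuming a hypothetical helper named `remove_www_rules` that takes the bare domain: it escapes the dots for the RewriteCond pattern and leaves them literal in the redirect URL, as in the sweetdomain.com example. It outputs only the two edited lines, so the RewriteEngine and RewriteBase lines still go above them.

```python
def remove_www_rules(domain: str) -> str:
    """Build the two edited lines of the remove-www block (hypothetical helper).

    `domain` is the bare host without www, e.g. "sweetdomain.com".
    """
    # In the RewriteCond pattern each dot must be escaped so it matches
    # a literal "." rather than any character.
    escaped = domain.replace(".", "\\.")
    # Doubled braces produce the literal %{HTTP_HOST} in the f-string.
    cond = f"RewriteCond %{{HTTP_HOST}} !^{escaped}$ [NC]"
    # The substitution is a plain URL, so its dots stay unescaped.
    rule = f"RewriteRule ^(.*)$ http://{domain}/$1 [R=301,L]"
    return cond + "\n" + rule


print(remove_www_rules("sweetdomain.com"))
```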

And that’s all there is to it. Again, this code will remove the www prefix from your URLs. It works along the same general lines as the previous method, except that this time the condition matches any host name that is not exactly domain.tld. That includes the www version, and also any other host name that reaches the site, such as a subdomain or the server’s IP address. For all such matches, the code then rewrites the URL in non-www format.
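
The same kind of Python sketch makes the difference from the previous method clear. Again, this only models the logic under stated assumptions: `remove_www` and its `canonical` parameter (standing in for domain.tld) are hypothetical, re.escape handles the dot escaping, and re.IGNORECASE stands in for [NC].

```python
import re
from typing import Optional


def remove_www(canonical: str, host: str, path: str, query: str = "") -> Optional[str]:
    """Return the 301 target for a request, or None if no redirect applies.

    Hypothetical model of the remove-www block; `canonical` plays the role
    of domain.tld and `path` includes its leading slash.
    """
    # Mirrors: RewriteCond %{HTTP_HOST} !^domain\.tld$ [NC]
    pattern = re.compile("^" + re.escape(canonical) + "$", re.IGNORECASE)
    if pattern.match(host):
        return None  # host is exactly the canonical name, rule skipped
    # Any other host (www.domain.tld, sub.domain.tld, an IP address) is redirected.
    rel = path[1:] if path.startswith("/") else path
    target = f"http://{canonical}/{rel}"  # http://domain.tld/$1
    if query:
        target += "?" + query  # query string passed through by default
    return target


for host in ["www.domain.tld", "domain.tld", "sub.domain.tld"]:
    print(host, "->", remove_www("domain.tld", host, "/about"))
```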

Ground Control to Major Tom

After uploading either of the methods, remember to test your URLs vigorously. If you haven’t already discovered the immense power of Apache’s mod_rewrite, rest assured that even the slightest error will immediately crash your website and celebrate by serving an unlimited supply of 500-Error hors d'oeuvres to all of your visitors. Or something. The point, again, is to upload and test as many different URL configurations as possible. Nothing should go wrong, but never assume that it won’t! ;)
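
One way to test many URL configurations quickly is a small script that requests each host and path combination without following redirects, then reports the status code and Location header. A minimal sketch using Python’s standard http.client module; the helper names `url_variants` and `check` are hypothetical, and it tests plain http:// only, matching the rules above.

```python
import http.client
from itertools import product
from typing import List, Optional, Tuple


def url_variants(domain: str, paths: List[str]) -> List[Tuple[str, str]]:
    """Host/path pairs worth requesting after uploading the rules.

    `domain` is the bare domain (e.g. "domain.tld"). Hosts are tried with
    and without the www prefix, and in upper case to exercise [NC].
    """
    hosts = [domain, "www." + domain, domain.upper(), ("www." + domain).upper()]
    return list(product(hosts, paths))


def check(host: str, path: str, timeout: float = 5.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request (redirects are not followed); return (status, Location)."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, pass each pair from `url_variants("sweetdomain.com", ["/", "/about/"])` to `check`: every non-canonical host should come back with a 301 pointing at the canonical URL, the canonical host should not be redirected, and a 500 means the htaccess file needs another look.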