Perishable Press

Universal www-Canonicalization via htaccess

In my previous article on comprehensive canonicalization for WordPress, I offered my personally customized technique for ensuring consistently precise and accurate URL delivery. That method targets WordPress exclusively (although the logic could be adapted for general use) and requires a bit of editing to fit each particular configuration. In this follow-up tutorial, I present a basic www-canonicalization technique that accomplishes the following:

  • requires or removes the www prefix for all URLs
  • absolutely no editing when requiring the www prefix
  • minimal amount of editing when removing the www prefix
  • minimal amount of code used to execute either technique

I have found this “universal” www-canonicalization technique extremely useful in its simplicity and elegance. Especially when requiring the www prefix, nothing could be easier: simply copy, paste, done — absolutely no hard-coding necessary!

Require the www prefix

To ensure that all URLs of a given domain present with the www prefix, open the domain’s root htaccess file and add the following chunk of code (no editing required!):

# universal www canonicalization via htaccess
# require www prefix for all urls of any domain - no editing required
# https://perishablepress.com/press/2008/04/30/universal-www-canonicalization-via-htaccess/

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.               [NC]
RewriteCond %{HTTP_HOST} ^([^.]+\.[a-z]{2,6})$ [NC]
RewriteRule ^(.*)$       http://www.%1/$1      [R=301,L]

Again, this htaccess code ensures that all of your URLs display with the www prefix. The first three lines are comments explaining the purpose of the code. The next two lines enable Apache’s mod_rewrite engine and specify the base path for the operation. Note that you may not need to include the RewriteEngine directive if it already appears earlier in the htaccess document. The final three lines of code provide the desired canonical functionality as follows:

RewriteCond %{HTTP_HOST} !^www\. [NC]
This directive is a condition that checks whether the requested hostname already begins with the www prefix. If it does, the condition fails and the rule below is skipped. The [NC] flag makes the match case-insensitive.
RewriteCond %{HTTP_HOST} ^([^.]+\.[a-z]{2,6})$ [NC]
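This directive is a condition that matches the general pattern of a domain name. The regular expression matches one or more characters other than a dot, followed by a literal dot ( . ) and an alphabetic string of two to six characters. For example, the common form of a domain name, domain.tld, will be matched by the regex, and the matched hostname is captured for use as %1 in the rewrite rule. Hostnames with more than one dot (such as sub.domain.tld or domain.co.uk) and top-level domains longer than six letters are not matched, so they are left untouched. Within those limits, the condition is designed to match any domain name, thus the term “universal” in the title of this post. ;)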
RewriteRule ^(.*)$ http://www.%1/$1 [R=301,L]
This directive is where the actual URL rewriting takes place. Whenever both of the previous conditions prove true, the RewriteRule directs Apache to redirect to a URL that includes the www prefix. The ^(.*)$ pattern matches any character string following the domain name (and top-level domain), i.e., the requested path. The substitution http://www.%1/$1 then serves as the pattern for the rewritten URL: %1 is the hostname captured by the second RewriteCond, and $1 is the path captured by the RewriteRule pattern. In the [R=301,L] flags, R=301 signals that the redirect is permanent, and L marks this as the last rule to be processed in this pass.
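
To make the walkthrough above concrete, here is a sketch showing how the two conditions treat a few hostnames, followed by a looser variant. The hostnames (example.com and friends) are placeholders of my own, and the variant is not part of the original technique: it assumes you want www prepended to every hostname that lacks it, subdomains included.

```apache
# How the original pair of conditions treats some hypothetical hosts:
#   example.com       -> both conditions match; redirect to http://www.example.com/...
#   www.example.com   -> first condition fails; no redirect
#   example.co.uk     -> second condition fails (more than one dot); no redirect
#   blog.example.com  -> second condition fails (more than one dot); no redirect

# Looser variant (an assumption, not the article's rule set):
# prepend www to whatever hostname was requested, as long as it lacks the prefix.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
```

Use one version or the other rather than both. Note that this variant would also turn blog.example.com into www.blog.example.com, which may or may not be what you want, and that both versions redirect to http:// exactly as written.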

Remove the www prefix

To ensure that all URLs of a given domain present without the www prefix, open the domain’s root htaccess file and add the following chunk of code:

# universal www canonicalization via htaccess
# remove www prefix for all urls - replace all domain and tld with yours
# https://perishablepress.com/press/2008/04/30/universal-www-canonicalization-via-htaccess/

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^domain\.tld$       [NC]
RewriteRule ^(.*)$       http://domain.tld/$1 [R=301,L]

When using this code to remove the www prefix, this technique requires two simple edits: change both instances of “domain” and “tld” to match the target domain name and top-level domain name, respectively. For example, if your domain was located at “sweetdomain.com”, you would edit the code as follows:

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^sweetdomain\.com$ [NC]
RewriteRule ^(.*)$ http://sweetdomain.com/$1 [R=301,L]

And that’s all there is to it. Again, this code will remove the www prefix from your URLs. Essentially, this works along the same general lines as the previous method, only this time the condition matches any request whose hostname is not exactly your bare domain. That includes the www version, as well as any other subdomain or alias served from the same directory. For all such matches, the code then redirects to the URL in non-www format.
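
If you would rather avoid the hard-coded domain here as well, the following is a minimal sketch of a no-edit alternative. It is an assumption on my part, not the method described above: instead of checking for your exact domain, it captures whatever follows a leading www. and redirects there.

```apache
# Sketch: strip a leading "www." from whatever hostname was requested.
# %1 is the part of the hostname captured after "www." in the RewriteCond;
# $1 is the requested path captured by the RewriteRule pattern.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
```

The trade-off is that this version only acts on hostnames beginning with www. and leaves every other hostname alone, whereas the hard-coded version redirects anything that isn't your bare domain.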

Ground Control to Major Tom

After uploading either of the methods, remember to test your URLs vigorously. If you haven’t already discovered the immense power of Apache’s mod_rewrite, rest assured that even the slightest error will immediately crash your website and celebrate by serving an unlimited supply of 500-Error hors d'oeuvres to all of your visitors. Or something. The point, again, is to upload and test as many different URL configurations as possible. Nothing should go wrong, but never assume that it won’t! ;)
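
One common precaution, offered here as a sketch rather than as part of the technique itself, is to wrap the directives in an IfModule container. Apache then skips the block when mod_rewrite isn't loaded instead of answering every request with a 500 error. It does nothing for typos inside the block, so the testing advice above still applies.

```apache
# Only process these directives when mod_rewrite is available.
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteCond %{HTTP_HOST} !^www\.               [NC]
    RewriteCond %{HTTP_HOST} ^([^.]+\.[a-z]{2,6})$ [NC]
    RewriteRule ^(.*)$       http://www.%1/$1      [R=301,L]
</IfModule>
```

While testing, you might also temporarily change R=301 to R=302, since browsers tend to cache permanent redirects and can keep following an old rule after you have fixed it.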

Awesome space image courtesy of NASA.

Jeff Starr
About the Author Jeff Starr = Web Developer. Security Specialist. WordPress Buff.
12 responses
  1. Geld Lenen May 6, 2008 @ 9:30 am

    I use this code in my .htaccess too. It works like a charm!

  2. Jeff Starr

    Thanks for the code confirmation, Geld — glad to know that everything is functioning properly. Cheers!

  3. I have a problem with my mod_rewrite:
    Google has indexed this page
    http://www.domain.com///index.html and I would like it with only a single slash… any idea?

  4. Jeff Starr

    Try this:

    RedirectMatch 301 ^///index\.html$ http://www.domain.com/index.html

    Add that to your root htaccess (or Apache config file) and check the results. Of course, I haven’t tested this code specifically, but it should solve the issue. You may need to remove one of the forward slashes from the match condition for it to work.

    Cheers,
    Jeff

  5. Cezary Tomczyk December 30, 2008 @ 6:35 am

    There is a simpler way:

    RewriteRule ^ - [E=via:http]
    RewriteCond %{HTTPS} =on
    RewriteRule ^ - [E=via:https]
    RewriteCond %{HTTP_HOST} !^www
    RewriteRule (.*) %{ENV:via}://www.%{HTTP_HOST}/$1 [L,R=301]

    Rules ensure http, https and about.

  6. Jeff Starr

    @Cezary Tomczyk: Thanks for sharing! Looking forward to trying it out! :)

  7. August Klotz January 14, 2009 @ 4:53 pm

    Here is the method I use, similar to the one provided in the article:

    RewriteCond %{HTTP_HOST} !^www\.[a-z-]+\.[a-z]{2,6} [NC]
    RewriteCond %{HTTP_HOST} ([a-z-]+\.[a-z]{2,6})$ [NC]
    RewriteRule ^/(.*)$ http://www.%1/$1 [R=301,L]

    Works like a dream.

  8. Cezary Tomczyk January 15, 2009 @ 12:14 am

    @August Klotz: Yes, your method works well, but only with domains of the form http://www.one.two, nothing more. Domains sometimes contain more parts ;-)


    Cezary Tomczyk

  9. Persoonlijke lening February 3, 2009 @ 11:35 am

    Thanx for this great and efficient code. And if I understand it correctly it will automatically 301 redirect my original (old) urls?

  10. Hi Jeff! :D

    I tried this rule but Firefox says (I am translating into English):
    “This site does not redirect in correct mode”.

    Where do you think could be the problem?

    Here the rule I tried to use:

    RewriteEngine On
    RewriteBase /
    RewriteCond %{HTTP_HOST} !^aldolat.it$ [NC]
    RewriteRule ^(.*)$ http://aldolat.it/$1 [R=301,L]

  11. Jeff Starr

    @Persoonlijke lening: Yes, precisely. 301 redirect according to your canonicalizational preferences (either remove or add www prefix).

    @Aldo: Try it without the RewriteBase directive:

    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^domain\.tld$ [NC]
    RewriteRule ^(.*)$ http://domain.tld/$1 [R=301,L]

  12. lening bkr code March 30, 2009 @ 7:27 am

    Thanx for this technique; very useful to me.
