Bulletproof Sitemap Redirects via .htaccess

Sitemaps have been shown to help search engines and other visitors understand and navigate your website. This tutorial gives you a simple yet powerful .htaccess technique for ensuring that search engines and other visitors can easily find your sitemap files. So even if they are looking for your sitemap in the wrong location, they’ll always be redirected to the actual, existing sitemap for your site. This strategy helps to improve consistency, minimize 404 errors, and save server resources. So it’s good for performance and SEO.

Update! WordPress now provides its own sitemaps baked right into core. For more information, check the update section later in this article.

Help visitors find your sitemap

Typically, your sitemap is located in the root directory of your website, for example:

https://perishablepress.com/sitemap.xml

In general, sitemaps are served in XML format, but they also come in a compressed flavor, typically gzip (.gz) format. So for most sites, it is common to provide two versions of your sitemap: one with an .xml extension, and another with an .xml.gz extension.

https://perishablepress.com/sitemap.xml
https://perishablepress.com/sitemap.xml.gz
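
If your sitemap generator only writes the XML version, the compressed copy can be produced with any gzip tool. The following is a minimal Python sketch of that step, not part of the original technique. The sample sitemap content and the function name are made up for illustration.

```python
import gzip

def gzip_sitemap(xml_bytes: bytes) -> bytes:
    """Return a gzip-compressed copy of the given sitemap bytes."""
    return gzip.compress(xml_bytes)

# Hypothetical one-URL sitemap, used only to demonstrate the round trip.
SITEMAP_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://perishablepress.com/</loc></url>
</urlset>
"""

compressed = gzip_sitemap(SITEMAP_XML)

# The compressed bytes are what would be saved as sitemap.xml.gz;
# decompressing them should give back the original XML unchanged.
restored = gzip.decompress(compressed)
print(restored == SITEMAP_XML)
```

In practice most sitemap plugins write both files for you, so this is only needed for hand-rolled setups.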

Yet regardless of where it’s located or how it’s formatted, your sitemap is useless unless search engines and other visitors can find it. Thus, it is highly recommended that your site include a robots.txt file that includes the location of your sitemap. Something like this added to robots.txt works great:

Sitemap: https://perishablepress.com/sitemap.xml

For compliant search engines that know what they’re doing, this simple robots declaration is all that’s needed. The spiders will hit your robots.txt file before the crawl, locate your sitemap, and continue to crawl your site accordingly. Sounds easy, right?
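
To make that concrete, here is a rough sketch of what a well-behaved crawler might do with the declaration, using Python's standard `urllib.robotparser` module (its `site_maps()` method needs Python 3.8 or later). The robots.txt content below is a made-up example, and `sitemaps_from_robots` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return every Sitemap URL declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://perishablepress.com/sitemap.xml
"""

print(sitemaps_from_robots(ROBOTS_TXT))
```

A crawler that does this never has to guess at the sitemap location.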

Unfortunately, things don’t always go according to plan. The spiders and scripts crawling your site may not recognize or obey your robots.txt directives. In my experience, very few actually do. Bad bots and malicious scripts will relentlessly pound your server looking for the elusive sitemap in myriad subdirectories, hoping to detect a vulnerability. And, if you happen to keep your sitemap in an unconventional location, the confusion can get real ugly. Here are some examples of sitemap URLs requested by “lost” bots that are scanning for vulnerabilities (or just being stupid):

http://example.com/something/random_sitemap.htm
http://example.com/another/guess/whatever_sitemap.html
http://example.com/just/cant/find/the/elusive_sitemap.xml
.
.
.

If you examine your site’s access and error logs, you may find all sorts of 404 “Not Found” errors for these sorts of requests. This type of apparently random scanning for “secret” sitemaps happens constantly around the Web, wasting valuable server resources like memory and bandwidth. It’s the sort of malicious activity that’s a real nuisance for anyone paying attention. Fortunately there is no need to tolerate such lunacy..
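
If you want to see how much of this is going on, a small script can tally the sitemap-related 404s in an access log. This is only a sketch: it assumes the Common/Combined Log Format, the sample log lines are invented for illustration, and `count_sitemap_404s` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common/Combined Log Format line,
# for example: "GET /path HTTP/1.1" 404
REQUEST = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3})\b')

def count_sitemap_404s(log_lines):
    """Count 404 responses for request paths that mention "sitemap"."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        path = match.group("path")
        if match.group("status") == "404" and "sitemap" in path.lower():
            counts[path] += 1
    return counts

# Invented sample lines (assumption: not taken from any real log).
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Mar/2016:06:25:01 +0000] "GET /something/random_sitemap.htm HTTP/1.1" 404 209',
    '203.0.113.7 - - [10/Mar/2016:06:25:02 +0000] "GET /something/random_sitemap.htm HTTP/1.1" 404 209',
    '198.51.100.4 - - [10/Mar/2016:06:25:03 +0000] "GET /sitemap.xml HTTP/1.1" 200 5120',
    '203.0.113.7 - - [10/Mar/2016:06:25:04 +0000] "GET /just/cant/find/the/elusive_sitemap.xml HTTP/1.1" 404 209',
    '198.51.100.4 - - [10/Mar/2016:06:25:05 +0000] "GET /about/ HTTP/1.1" 404 209',
]

for path, hits in count_sitemap_404s(SAMPLE_LOG).most_common():
    print(hits, path)
```

To run it against a real log, pass an open file object instead of `SAMPLE_LOG`.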

.htaccess to the rescue

With a little .htaccess, we can make absolutely certain that our sitemaps are always found by anyone or anything requesting them from any location on our site. All you need is the ability to create and/or edit your site’s root .htaccess file (or server configuration file). If this is possible, you’re in business. Here are the requirements for the “bulletproof” sitemap technique provided in this tutorial:

  • Apache 2.x server (the regex relies on PCRE features such as lookbehind) with .htaccess and mod_alias enabled
  • Sitemap(s) located in the site’s root directory (e.g., example.com/sitemap.xml)
  • Sitemap(s) served only in XML and g-zip format

These requirements cover most setups. For example, many WordPress sites use a plugin to automatically generate their sitemap(s) in two formats: XML and g-zip, exactly what’s required for the bulletproof technique to work properly. Here at Perishable Press, I use the popular Google XML Sitemaps plugin, which generates the following set of sitemaps (and sub-sitemaps):

/sitemap.xml
/sitemap.xml.gz

/sitemap-pt-post-2016-03.xml
/sitemap-pt-post-2016-03.xml.gz
.
.
.

So in the site root directory is my main sitemap, which includes lots of “sub-sitemaps”. Pretty sure this is the most common (and recommended) structure for sitemaps, but let me know if I’m sorely mistaken about this. Hopefully this scenario covers your own setup; if unsure, you can examine your sitemap(s) and verify accordingly.

So if that sounds like you, make a quick backup of your site’s root .htaccess file and get ready for the magic bullet..

Bulletproof sitemap redirects

The actual implementation of this redirect technique couldn’t be easier. Simply add the following directive to your site’s root .htaccess file:

# Bulletproof sitemap redirects
<IfModule mod_alias.c>
	RedirectMatch 301 (?i)(?<!^)/(.*)?sitemap(.*)?\.(htm|html|xml)(\.gz)? /sitemap.xml$4
</IfModule>

Then save changes, upload to the server, and you’re good to go. No modifications are required; it’s strictly plug-and-play. Here is how this code works:

  1. Checks if mod_alias is available
  2. Sets the regex to case-insensitive
  3. Skips the redirect if at site root
  4. Matches any request that includes “sitemap” followed by “.htm”, “.html”, or “.xml”
  5. Optionally matches if the request is appended with “.gz”
  6. If the request fits the conditions, it is redirected to the root sitemap (either XML or g-zip version)
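
The steps above can be checked offline with a small simulation. The sketch below runs the same pattern through Python's `re` module, which is not PCRE but should behave the same way for this expression. `redirect_target` is a hypothetical helper, and the real redirect is still performed by Apache.

```python
import re

# Same pattern as the RedirectMatch directive; the replacement is built by hand.
PATTERN = re.compile(r"(?i)(?<!^)/(.*)?sitemap(.*)?\.(htm|html|xml)(\.gz)?")

def redirect_target(path):
    """Return the path the rule would redirect to, or None if it does not match."""
    match = PATTERN.search(path)
    if match is None:
        return None
    # $4 in the directive is the optional ".gz" group (empty when absent).
    return "/sitemap.xml" + (match.group(4) or "")

for path in [
    "/sitemap.xml",                             # None (already at the root)
    "/sitemap-pt-post-2016-03.xml",             # None (root-level sub-sitemap)
    "/random/sitemap.htm",                      # /sitemap.xml
    "/random/SITEMAP.XML",                      # /sitemap.xml
    "/random/sitemap.xml.gz",                   # /sitemap.xml.gz
    "/just/cant/find/the/elusive_sitemap.xml",  # /sitemap.xml
    "/about/",                                  # None
]:
    print(path, "->", redirect_target(path))
```

One detail the simulation surfaces: because “htm” is listed before “html” in the alternation, a request ending in “.html.gz” matches as “.htm” and lands on /sitemap.xml rather than /sitemap.xml.gz. It still reaches an existing sitemap, so the redirect does its job either way.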

Looks simple, but this directive literally is years in the making. I’ve been fine-tuning the technique since, oh, back in 2008, after I wrote Redirect All Requests for a Nonexistent File to the Actual File. I tweaked things a little bit with each iteration of my .htaccess file, until finally now it’s perfect and ready for public consumption :)

Test before going live

After implementing this technique, you can (and should) verify that everything is working properly by requesting the following URLs:

http://example.com/sitemap.xml
http://example.com/sitemap.xml.gz

http://example.com/random/sitemap.htm
http://example.com/random/sitemap.html
http://example.com/random/sitemap.xml

http://example.com/random/sitemap.htm.gz
http://example.com/random/sitemap.html.gz
http://example.com/random/sitemap.xml.gz

http://example.com/random/random_sitemap.htm
http://example.com/random/random_sitemap.html
http://example.com/random/random_sitemap.xml

http://example.com/random/random_sitemap.htm.gz
http://example.com/random/random_sitemap.html.gz
http://example.com/random/random_sitemap.xml.gz

http://example.com/random/random_sitemap_random.htm
http://example.com/random/random_sitemap_random.html
http://example.com/random/random_sitemap_random.xml

http://example.com/random/random_sitemap_random.htm.gz
http://example.com/random/random_sitemap_random.html.gz
http://example.com/random/random_sitemap_random.xml.gz

With these examples, you can replace each instance of “random” with any string. You can also prepend more directories to each path. Go ahead and try to break it: you can’t because the code is bulletproof (insert maniacal laughter). Also, remember to edit the “example.com” to match your own domain. Test until satisfied ;)
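
If you’d rather not edit the list by hand, the same set of test URLs can be generated. This is a minimal sketch: `build_test_urls` and its `domain` and `filler` parameters are hypothetical names, and it only builds the URLs; requesting them is left to your browser or HTTP client of choice.

```python
from itertools import product

def build_test_urls(domain="example.com", filler="random"):
    """Build the 20 test URLs listed above for a given domain and filler string."""
    # The two root sitemaps, which should NOT be redirected.
    urls = [f"http://{domain}/sitemap.xml", f"http://{domain}/sitemap.xml.gz"]
    names = ["sitemap", f"{filler}_sitemap", f"{filler}_sitemap_{filler}"]
    compressions = ["", ".gz"]
    extensions = ["htm", "html", "xml"]
    # Every combination of file name, compression, and extension in a subdirectory.
    for name, gz, ext in product(names, compressions, extensions):
        urls.append(f"http://{domain}/{filler}/{name}.{ext}{gz}")
    return urls

for url in build_test_urls():
    print(url)
```

Pass your own domain and any filler string, for example `build_test_urls("yoursite.example", "whatever")`, to vary the paths.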

Update: WP Sitemaps

As explained in these posts, WordPress 5.5 and beyond features built-in sitemaps that are enabled by default. And because WordPress handles redirection for “near-miss” URL requests, it should redirect all sitemap requests to the correct location, with no extra .htaccess necessary. At least, that’s how it should work; in practice your results may vary.

For example, I did some testing on my own sites, to see if WordPress was automatically redirecting near-miss requests to the correct sitemaps. For most of the sites tested, the redirection is working fine. For this site however, sitemap redirects were not working as expected.

So if you’re running WordPress 5.5 or later, check whether your site is redirecting sitemap requests properly. It should be working fine. If not, here is an updated bulletproof technique that you can add to your site’s root/public .htaccess file:

RedirectMatch 301 (?i)^/sitemap(.*)\.(html|xml(\.gz)?)/? /wp-sitemap.xml

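That code redirects any request whose path begins with /sitemap and includes an .html or .xml extension (optionally followed by .gz) to the new WordPress sitemap at /wp-sitemap.xml. As always, test thoroughly before going live.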
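
As a quick sanity check, the rule can be simulated the same way as any other regex. The sketch below uses Python's `re` module as a stand-in for Apache's PCRE engine and uses the pattern with the dot before the extension escaped (`\.`), so it matches a literal period. `wp_redirect_target` is a hypothetical helper name.

```python
import re

# Pattern from the WordPress sitemap rule, anchored to the site root.
WP_PATTERN = re.compile(r"(?i)^/sitemap(.*)\.(html|xml(\.gz)?)/?")
WP_TARGET = "/wp-sitemap.xml"

def wp_redirect_target(path):
    """Return the redirect target if the rule matches the path, else None."""
    return WP_TARGET if WP_PATTERN.search(path) else None

for path in [
    "/sitemap.xml",         # /wp-sitemap.xml
    "/sitemap.xml.gz",      # /wp-sitemap.xml
    "/sitemap_index.xml",   # /wp-sitemap.xml
    "/SITEMAP.HTML",        # /wp-sitemap.xml
    "/wp-sitemap.xml",      # None (so no redirect loop)
    "/random/sitemap.xml",  # None (not at the site root)
    "/sitemap.htm",         # None (.htm is not in this pattern)
]:
    print(path, "->", wp_redirect_target(path))
```

Note that this rule is anchored to the site root, so requests for sitemaps in subdirectories are not covered by it.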

About the Author
Jeff Starr = Web Developer. Security Specialist. WordPress Buff.

2 responses to “Bulletproof Sitemap Redirects via .htaccess”

  1. Monique Vidal 2017/06/22 1:43 pm

    Dear Jeff,

    I’m facing a problem that may be related to .htaccess, and I’ve read your posts trying to decipher whether that’s the problem.

    The permalinks of many images that I have on my website are not the same as the file URLs, and it’s creating many 404 errors. I’ve checked the sitemap and I don’t understand why this is happening.

    Example:

    Permalink: http://www.globalaircrafts.com/globalaircrafts/aircrafts/2011-agusta-a109s-grand/agusta-a109s-gra…aft-venda-sale-8/

    FILE URL: http://www.globalaircrafts.com/globalaircrafts/wp-content/uploads/2017/06/AGUSTA-A109S-GRAND-HELICOPTER-TURBINE-GLOBAL-AIRCRAFT-VENDA-SALE-8.jpg

    Any ideas how can I fix it?

    Thanks

    Ps: I’m not very familiar with coding.. But I guess you already noticed that :)

    • Jeff Starr 2017/06/23 8:10 am

      Hi Monique, It looks like something is interfering with normal functionality. My best advice would be to troubleshoot your plugins and theme to determine if there is any issue. Also investigate any .htaccess rules that you may have in place, to see if any of them affect related URLs.
