Perishable Press

Stop 404s for Mobile Versions of Your Site

If you’ve been keeping an eye on your 404 errors recently, you have probably noticed an increase in requests for nonexistent mobile files and directories, especially over the past year or so. The scripts and bots requesting these files seem to be looking for a mobile version of your site, and they waste bandwidth and server resources in the process. It has become common to see the following 404 errors repeated constantly in log files:

  • http://domain.tld/apple-touch-icon.png
  • http://domain.tld/iphone
  • http://domain.tld/mobile
  • http://domain.tld/mobi
  • http://domain.tld/m

So some bot comes along, assumes that your site includes a mobile version, and then tries its hand at guessing the location. In the common request-set listed above, we see the bot looking first for an “apple-touch icon,” and then for mobile content in various directories. If this only happens once in a while, it’s no big deal. But these days I’ve been seeing many different bots requesting these nonexistent resources.

Even worse, these mobile-hungry bots can’t seem to remember where they’ve been – they typically request the same resources repeatedly, and in multiple locations within the directory structure. I frequently see hundreds of these types of requests in my weekly error-log analyses. Needless to say, this is an incredible waste of time, bandwidth, and server resources.

It would be so nice..

So what’s the best solution? Well, obviously the ideal scenario would involve bots and scripts stopping this malicious behavior. Here are just a few ideas for confused bot masters:

  • Stop programming your bots to “assume and guess.”
  • Perform the crawl using a recognized “mobile” user-agent.
  • Remember the results of your initial crawl to avoid repeat 404 requests.

Basically, the engineers programming this sort of behavior into their bots need to realize:

  • If I take the time to set up a mobile site, rest assured that I’ll tell you where it’s located. Otherwise it doesn’t exist, so stop guessing.
  • From the server’s perspective, there is no difference between guessing for directories and scanning for exploits.
  • By constantly scanning websites for nonexistent directories, you are wasting everyone’s time, money, and resources.

Unfortunately, we both know that this sort of nefarious scumbaggery is not going to stop. That means it is up to us as administrators to protect against this sort of maliciousness and resolve the issue ourselves.

Robots.txt vs Bad Bots

In a perfect world, 404 errors don’t exist and all bots obey robots.txt directives. But the world isn’t perfect, and something like this, which should work, doesn’t:

User-agent: *
Disallow: /*/iphone/$
Disallow: /*/iphone$
Disallow: /*/mobile/$
Disallow: /*/mobile$
Disallow: /*/mobi/$
Disallow: /*/mobi$
Disallow: /*/m/$
Disallow: /*/m$

Unfortunately, very few bots obey robots.txt rules. Google does. Yahoo certainly doesn’t, and neither do other bad bots. It’s pretty much impossible to stop malicious requests using the robots.txt file. Fortunately, it’s easy to do with a few lines of HTAccess.. ;)

How to resolve the “I’m a confused bot that can’t find your mobile site” problem

First of all, note that there are probably many ways of dealing with this nonsense. It doesn’t make sense to block IPs or user-agents because they are always changing and/or easily spoofed. But we can either block any requests for nonexistent “mobile-ish” resources, or else redirect such requests to a common location, such as the Home Page. Let’s examine both of these techniques – they are quite similar.

Deny all requests for non-existent mobile content

To use this technique, you’ll need access to your site’s root HTAccess file. In it, place the following code (see footnote 1):

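# BLOCK 404 MOBILE REQUESTS
<IfModule mod_rewrite.c>
 # Rewrite rules are ignored unless the engine is on (harmless if already enabled in this file)
 RewriteEngine On
 # Match URIs ending in /iphone, /mobile, /mobi, or /m (trailing slash optional)
 RewriteCond %{REQUEST_URI} /iphone/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobile/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobi/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /m/?$ [NC]
 # F = answer with 403 Forbidden
 RewriteRule (.*) - [F,L]
</IfModule>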

Properly placed, this code will deny all requests for any of the following resources:

  • http://domain.tld/iphone
  • http://domain.tld/mobile
  • http://domain.tld/mobi
  • http://domain.tld/m
  • http://domain.tld/iphone/
  • http://domain.tld/mobile/
  • http://domain.tld/mobi/
  • http://domain.tld/m/
  • http://domain.tld/any/path/iphone
  • http://domain.tld/any/path/mobile
  • http://domain.tld/any/path/mobi
  • http://domain.tld/any/path/m
  • http://domain.tld/any/path/iphone/
  • http://domain.tld/any/path/mobile/
  • http://domain.tld/any/path/mobi/
  • http://domain.tld/any/path/m/

Basically, the rules match any request whose URI ends with /mobile et al., with or without the trailing slash. If you would rather match /mobile et al. only at the root of the site, that is, when it starts right at the beginning of the request URI, modify the original rules like so:

 RewriteCond %{REQUEST_URI} ^/iphone/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} ^/mobile/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} ^/mobi/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} ^/m/?$ [NC]

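These rewrite conditions are identical to the previous, with the exception of the caret symbol, ^, which is prepended to each regex pattern. The caret instructs Apache to match only when the pattern is present at the beginning of the request string. Combined with the $ at the end, each pattern now has to match the entire URI, so only root-level requests such as /mobile or /mobile/ are caught.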
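If you prefer fewer lines, the four root-level conditions can probably be folded into one pattern using alternation. This is only a sketch of that idea, not something from the original rule set, and it should be tested the same way as the longer version:

```apache
# BLOCK 404 MOBILE REQUESTS (single-pattern variant, root-level only)
<IfModule mod_rewrite.c>
 RewriteEngine On
 # One alternation replaces the four OR'd conditions:
 # matches /iphone, /mobile, /mobi, or /m, with or without a trailing slash
 RewriteCond %{REQUEST_URI} ^/(iphone|mobile|mobi|m)/?$ [NC]
 # F = answer with 403 Forbidden
 RewriteRule (.*) - [F,L]
</IfModule>
```

Dropping the caret from the pattern gives the "anywhere in the path" behavior of the first version. The separate-line form is easier to extend one directory at a time, so which one to use is mostly a matter of taste.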

Note that I’m not including the apple-touch icon because it is better to actually create that file for users who would like to bookmark your site on an Apple device. Additional directories and/or files are easily added to the list by emulating this pattern:

RewriteCond %{REQUEST_URI} ^/whatever/?$ [NC,OR]

Replace the “whatever” string with, well, whatever you would like to match against. Then include the line before the others in your code. Because it ends with the [OR] flag, it must not be the last condition in the list.

I have tested this on Apache 2.2.14 and it works as intended. Even so, test it on your own setup before relying on it.
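One thing to keep in mind: as written, the conditions look only at the URI, so they also deny a matching path that really does exist (say you later create a /m/ directory). A hedged sketch of a guard against that is to check the filesystem first, so the rule fires only when no file or directory is there. Note that these checks see only real files and directories; URLs generated by a CMS through its own rewrite rules would still be denied.

```apache
# BLOCK 404 MOBILE REQUESTS (only when nothing exists at that path)
<IfModule mod_rewrite.c>
 RewriteEngine On
 # Both of these must pass: the request maps to no real file and no real directory
 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteCond %{REQUEST_FILENAME} !-d
 # ...and at least one of these must match (same patterns as before)
 RewriteCond %{REQUEST_URI} /iphone/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobile/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobi/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /m/?$ [NC]
 RewriteRule (.*) - [F,L]
</IfModule>
```

The two filesystem checks add a little work per request, so this variant trades a bit of speed for not having to remember to edit the rules when a real mobile directory appears.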

Redirect all requests for non-existent mobile content

Rather than denying access to mobile-ish requests, you can redirect them to the page of your choice. Here is how to redirect such requests to your Home Page:

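# REDIRECT 404 MOBILE REQUESTS
<IfModule mod_rewrite.c>
 # Rewrite rules are ignored unless the engine is on (harmless if already enabled in this file)
 RewriteEngine On
 # Match URIs ending in /iphone, /mobile, /mobi, or /m (trailing slash optional)
 RewriteCond %{REQUEST_URI} /iphone/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobile/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobi/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /m/?$ [NC]
 # R=301 = permanent redirect to the Home Page
 RewriteRule (.*) http://domain.tld/ [R=301,L]
</IfModule>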

Same as before, only here we change the RewriteRule in the last line to redirect to “http://domain.tld/”. Simply change the domain to that of your own and you’re all set.

As before, and always, test thoroughly.
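If you do have a mobile version of your site, the same approach can point the guesses there instead of at the Home Page. The sketch below assumes, purely for illustration, that the mobile site lives at /m/; that location is a placeholder, not something your site necessarily has. The one catch is that the redirect target must not match any of the conditions, or the redirect would loop, which is why /m is left out of the list here.

```apache
# REDIRECT MOBILE GUESSES TO AN EXISTING MOBILE SITE
# Assumes the mobile version lives at /m/ (hypothetical; adjust to your setup)
<IfModule mod_rewrite.c>
 RewriteEngine On
 # /m is deliberately NOT listed: the target /m/ would match it and loop
 RewriteCond %{REQUEST_URI} /iphone/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobile/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobi/?$ [NC]
 RewriteRule (.*) http://domain.tld/m/ [R=301,L]
</IfModule>
```

As with the Home Page version, change domain.tld to your own domain and test thoroughly.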

How does it work?

The logic for either of these methods goes something like this:

  1. If the request is for /iphone or /iphone/ OR…
  2. If the request is for /mobile or /mobile/ OR…
  3. If the request is for /mobi or /mobi/ OR…
  4. If the request is for /m or /m/
  5. Then either deny the request or redirect it to the Home Page (depending on which method you are using)

Pretty simple, but very effective for eliminating malicious mobile requests.
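A third option that follows the same logic is to answer with 410 Gone rather than 403 Forbidden or a redirect. The sketch below uses mod_rewrite’s G flag for that. Whether any given bot actually takes the hint and stops asking is an assumption, not something I can promise.

```apache
# ANSWER MOBILE GUESSES WITH 410 GONE
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{REQUEST_URI} /iphone/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobile/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobi/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /m/?$ [NC]
 # G = respond "410 Gone"; no further rules are processed for this request
 RewriteRule (.*) - [G]
</IfModule>
```

Only the final RewriteRule differs from the deny and redirect versions, so switching between the three behaviors is a one-line change.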

El Wrapz

This technique is useful for saving bandwidth and server resources, not just for non-existent mobile-ish requests, but also for any resource that you would like to block – just add a RewriteCond with the target character string of your choice. Hopefully this technique will help you run a cleaner, safer, and more secure website.

Footnotes

  • 1 If you are using WordPress, place the HTAccess rules before the permalink rules.

Jeff Starr
About the Author Jeff Starr = Web Developer. Security Specialist. WordPress Buff.
23 responses
  1. Nice article. I would have mentioned one more possibility though: feed the bots, make them happy, create a mobile version of your site! :)

    If you’re using WordPress, there are more and more easy ways to do that (see WPTouch, mobify, WordPress mobile edition, …)

  2. @Jeff Starr You’re totally right.

    I’d have another question related to that topic then: when you create a mobile version of your site on a different subdomain, should you allow or disallow crawling on that optimized version? Would a duplicated version of your site have a bad effect on SEO, would it be seen as duplicated content?

  3. Jeff Starr

    @jeherve: Yes, certainly. I should have mentioned that in the article. But keep in mind that even if you set up a mobile version of your site at, say, domain.tld/m/, it’s not going to stop bad bots from requesting other nonexistent mobile resources. In this case, I would suggest redirecting to your mobile site instead of the home page.

  4. Okay, this is all nice and what not, but I’m not a programmer/ backend guy and this made only a little bit of sense to me, so in an “Idiots Guide”, what do I need to do to make my site mobile friendly (seen without the bots screwing around?). My reading this gives me the impression that if the bot comes and can’t find anything, then a mobile user can’t find your site. If you place these items, then you’re going to deny any listing for mobile users looking for a site like yours.

  5. Jeff Starr

    Hi Arlen, as jeherve points out in the first comment, there are some easy ways to set up a mobile version of your site, including WordPress plugins, third-party services, or even a simple stylesheet.

    Bots that are requesting nonexistent files do so for their own purposes – whether or not they locate a mobile version of your site has absolutely no impact on the accessibility of your site to human visitors.

    All we’re doing here is preventing bad bots from wasting your bandwidth snooping around for files that don’t exist.

  6. Jeff Starr

    @jeherve: Great question. I am pretty sure that Google is capable of distinguishing between mobile content and regular web-content, but can’t say for sure at this point.

    If you block the mobile version of your site, then it won’t be crawlable and thus won’t appear in the search engines. The question is, does Google et al have two different indexes – one for mobile and one for non-mobile content?

    On the other side of the coin, mobile content could be considered duplicate content. So again the question gets at how the search engines treat mobile content. If they are indexing it along with regular, non-mobile content, it might be considered duplicate. Otherwise they either aren’t crawling the mobile version or else aren’t indexing the results along with the regular non-mobile content.

  7. Thomas Scholz April 26, 2010 @ 12:30 pm

    I don’t waste time with a regex for these cases. And I prefer to communicate very clearly that any further request is useless. So I’ve set up 410 messages:

    Redirect gone /mobile/
    Redirect gone /mobi/
    Redirect gone /iphone/

    Fast, clean and clear. :)

  8. Jeff Starr

    Nice, and maybe we can use RedirectMatch instead to catch the same requests without the trailing slash, which are just as common:

    RedirectMatch gone /mobile/?$
    RedirectMatch gone /mobi/?$
    RedirectMatch gone /iphone/?$

    This also ensures clean-up of non-root/subdirectory requests for the same patterns.

    Cheers :)

  9. Thomas Scholz April 26, 2010 @ 1:00 pm

    Hm … assuming we want to catch these requests …

    /mobi/
    /mobi/index
    /mobile

    … but not …

    /mobile-css-explained/

    … leads me to the following rules:

    RedirectMatch gone /mobi(le)?($|/)
    RedirectMatch gone /iphone($|/)

    Not so clear anymore but more flexible. Objections?

  10. Jeff Starr

    Interesting, but the only one not accounted for with my set is the trailing index, which I have not seen. Also, we’re terminating the match at either the trailing slash or the last character in each string. So stuff like /mobile-css-explained/ isn’t an issue.

    Does your latest code here match against /index or /m? I don’t see how.. It looks like you are saying:

    1. match /mobi or /mobile with or without the trailing slash
    2. match for /iphone with or without the trailing slash

    But with RedirectMatch, you will also catch subdirectory requests:

    /mobi/about/
    /mobi/welcome-post/
    /mobi/redirecting-reverse-proxy

    ..which seems like a good improvement. With a little fine-tuning, we can include all of the original patterns, including /m:

    RedirectMatch gone /m(obi|obile)?($|/)
    RedirectMatch gone /iphone($|/)

    Clear is relative :)

  11. Jeff Starr

    Hold up! Looks like I hit the bong a little too hard this morning — notice that I managed to think through your code while writing, and ended up figuring it out in the second half. It obviously does match against index and all other subdirectory content. I just didn’t see that at first read. Either way, the conversation is nice ;)

  12. I’m with you on the redirecting version. But checking _every_ request with several regexes just to send a 404 with mod_rewrite before Apache sends it itself anyway doesn’t sound very resource-saving to me. I’d rather have a couple hundred log entries a month and save the extra CPU cycles on preprocessing thousands/hundreds of thousands/millions of requests.
