Stop 404 Requests for Mobile Versions of Your Site

by Jeff Starr on Monday, April 26, 2010

If you’ve been keeping an eye on your 404 errors recently, you will have noticed an increase in requests for nonexistent mobile files and directories, especially over the past year or so. The scripts and bots requesting these files from your server seem to be looking for a mobile version of your site. Unfortunately, they are wasting bandwidth and resources in the process. It has become common to see the following 404 errors constantly repeated in your log files:

  • http://domain.tld/apple-touch-icon.png
  • http://domain.tld/iphone
  • http://domain.tld/mobile
  • http://domain.tld/mobi
  • http://domain.tld/m

So some bot comes along, assumes that your site includes a mobile version, and then tries its hand at guessing the location. In the common request-set listed above, we see the bot looking first for an “apple-touch icon,” and then for mobile content in various directories. If this only happens once in a while, it’s no big deal. But these days I’ve been seeing many different bots requesting these nonexistent resources.

Even worse, these mobile-hungry bots can’t seem to remember where they’ve been – they typically request the same resources repeatedly, and in multiple locations within the directory structure. I frequently see hundreds of these types of requests in my weekly error-log analyses. Needless to say, this is an incredible waste of time, bandwidth, and server resources.

It would be so nice..

So what’s the best solution? Well, obviously the ideal scenario would involve bots and scripts stopping this malicious behavior. Here are just a few ideas for confused bot masters:

  • Stop programming your bots to “assume and guess.”
  • Perform the crawl using a recognized “mobile” user-agent.
  • Remember the results of your initial crawl to avoid repeat 404 requests.

Basically, the engineers programming this sort of behavior into their bots need to realize:

  • If I take the time to set up a mobile site, rest assured that I’ll tell you where it is.
  • From the server’s perspective, there is no difference between guessing for directories and scanning for exploits.
  • By constantly scanning websites for nonexistent directories, you are wasting everyone’s time, money, and resources.

Unfortunately, we both know that this sort of nefarious scumbaggery is not going to stop. So that means it is up to us as administrators to protect against this sort of maliciousness and resolve the issue ourselves.

Robots.txt vs Bad Bots

In a perfect world, 404 errors don’t exist and all bots obey robots.txt directives. But the world isn’t perfect, and something like this, which should work, doesn’t:

User-agent: *
Disallow: /*/iphone/$
Disallow: /*/iphone$
Disallow: /*/mobile/$
Disallow: /*/mobile$
Disallow: /*/mobi/$
Disallow: /*/mobi$
Disallow: /*/m/$
Disallow: /*/m$

Unfortunately, very few bots obey robots.txt rules. Google does. Yahoo certainly doesn’t, and neither do other bad bots. It’s pretty much impossible to stop malicious requests using the robots.txt file. Fortunately, it’s easy to do with a few lines of HTAccess.. ;)

How to resolve the “I’m a confused bot that can’t find your mobile site” problem

First of all, note that there are probably many ways of dealing with this nonsense. It doesn’t make sense to block IPs or user-agents because they are always changing and/or easily spoofed. But we can either block any requests for nonexistent “mobile-ish” resources, or else redirect such requests to a common location, such as the Home Page. Let’s examine both of these techniques – they are quite similar.

Deny all requests for non-existent mobile content

To use this technique, you’ll need access to your site’s root HTAccess file. In it, place the following code (see note 1 below):

# BLOCK 404 MOBILE REQUESTS
<IfModule mod_rewrite.c>
 # Enable the rewrite engine (harmless if it is already on elsewhere in the file)
 RewriteEngine On
 RewriteCond %{REQUEST_URI} /iphone/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobile/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobi/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /m/?$ [NC]
 RewriteRule (.*) - [F,L]
</IfModule>

Properly placed, this code will deny all requests for the following resources:

  • http://domain.tld/iphone
  • http://domain.tld/mobile
  • http://domain.tld/mobi
  • http://domain.tld/m

Note that I’m not including the apple-touch icon here because it is better to actually create that file for users who would like to bookmark your site on an Apple device. Additional directories and/or files are easily added to the list by emulating this pattern:

RewriteCond %{REQUEST_URI} ^/whatever/?$ [NC,OR]

Replace the “whatever” string with, well, whatever you would like to match against. Then include the line before the others in your code, so that the final condition remains the one without the OR flag.
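Before adding a new pattern, it can help to try it against a few sample paths. Here is a minimal sketch that uses Python's re module as a stand-in for mod_rewrite's regex engine. That is an assumption: the two should agree for simple patterns like these, but this is not a substitute for testing on the server. The helper name and the sample paths are made up for illustration. It shows that a pattern beginning with ^, like the example line above, matches only at the site root, while the unanchored form used by the conditions in the main block also matches inside subdirectories.

```python
import re

def matches(pattern, path):
    """Return True if the pattern matches the path, ignoring case like [NC]."""
    return re.search(pattern, path, re.IGNORECASE) is not None

anchored = r"^/whatever/?$"    # form shown in the example line above
unanchored = r"/whatever/?$"   # form used by the conditions in the main block

# Hypothetical request paths, printed with the result for each form.
for path in ["/whatever", "/Whatever/", "/blog/whatever", "/whatever-else/"]:
    print(path, matches(anchored, path), matches(unanchored, path))
```

Which form to use depends on whether you want to catch the request only at the root or anywhere in the directory structure.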

I have tested this on Apache 2.2.14 and it works perfectly. Even so, I recommend testing it on your particular setup to be sure it works as advertised.

Redirect all requests for non-existent mobile content

Rather than denying access to mobile-ish requests, we can always redirect them to the page of your choice. Here is how to redirect such requests to your Home Page:

# REDIRECT 404 MOBILE REQUESTS
<IfModule mod_rewrite.c>
 # Enable the rewrite engine (harmless if it is already on elsewhere in the file)
 RewriteEngine On
 RewriteCond %{REQUEST_URI} /iphone/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobile/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /mobi/?$ [NC,OR]
 RewriteCond %{REQUEST_URI} /m/?$ [NC]
 RewriteRule (.*) http://domain.tld/ [R=301,L]
</IfModule>

Same as before, only here we change the RewriteRule in the last line to redirect to “http://domain.tld/”. Simply change the domain to that of your own and you’re all set.

As before, and always, test thoroughly.

How does it work?

The logic for either of these methods goes something like this:

  1. If the request is for /iphone or /iphone/ OR…
  2. If the request is for /mobile or /mobile/ OR…
  3. If the request is for /mobi or /mobi/ OR…
  4. If the request is for /m or /m/
  5. Then either deny the request or redirect it to the Home Page (depending on which method you are using)
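The steps above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how Apache implements it. The function name, the method argument, and the returned strings are hypothetical, and Python's re module is assumed to behave like mod_rewrite's regex engine for these simple patterns.

```python
import re

# The four conditions from the rules above; [NC] becomes re.IGNORECASE.
MOBILE_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"/iphone/?$", r"/mobile/?$", r"/mobi/?$", r"/m/?$")
]

def handle(path, method="deny"):
    """Mimic the rule set: return the action taken for a request path."""
    # The [OR] flags chain the conditions, so any single match triggers the rule.
    if any(p.search(path) for p in MOBILE_PATTERNS):
        if method == "deny":
            return "403 Forbidden"
        return "301 -> http://domain.tld/"
    # No condition matched, so the request is left for normal processing.
    return "pass through"

print(handle("/mobile"))
print(handle("/M/", method="redirect"))
print(handle("/mobile-css-explained/"))
```

Because the patterns are not anchored to the start of the path, a request such as /blog/m is caught as well, while a longer name such as /mobile-css-explained/ is left alone.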

Pretty simple, but very effective for eliminating malicious mobile requests.

El Wrapz

This technique is useful for saving bandwidth and server resources, not just for non-existent mobile-ish requests, but also for any resource that you would like to block – just add a RewriteCond with the target character string of your choice. Hopefully this technique will help you run a cleaner, safer, and more secure website.

Note

1 Note: If you are using WordPress, place the HTAccess rules before the permalink rules.

About the author

Jeff Starr is a web developer, graphic designer and content producer with over 10 years of experience and a passion for quality and detail. Jeff is co-author of the book Digging into WordPress and strives to help people be the best they can be on the Web.


21 Responses

jeherve#1

Nice article. I would have mentioned one more possibility though: feed the bots, make them happy, create a mobile version of your site! :)

If you’re using WordPress, there are more and more easy ways to do that (see WPTouch, mobify, WordPress mobile edition, …)

Jeff Starr#2

@jeherve: Yes, certainly. I should have mentioned that in the article. But keep in mind that even if you set up a mobile version of your site at, say, domain.tld/m/, it’s not going to stop bad bots from requesting other nonexistent mobile resources. In this case, I would suggest redirecting to your mobile site instead of the home page.

Arlen#3

Okay, this is all nice and whatnot, but I’m not a programmer/backend guy and this made only a little bit of sense to me, so in an “Idiots Guide”, what do I need to do to make my site mobile friendly (seen without the bots screwing around?). My reading this gives me the impression that if the bot comes and can’t find anything, then a mobile user can’t find your site. If you place these items, then you’re going to deny any listing for mobile users looking for a site like yours.

Jeff Starr#4

Hi Arlen, as jeherve points out in the first comment, there are some easy ways to set up a mobile version of your site, including WordPress plugins, third-party services, or even a simple stylesheet.

Bots that are requesting nonexistent files do so for their own purposes – whether or not they locate a mobile version of your site has absolutely no impact on the accessibility of your site to human visitors.

All we’re doing here is preventing bad bots from wasting your bandwidth snooping around for files that don’t exist.

jeherve#5

@Jeff Starr You’re totally right.

I’d have another question related to that topic then: when you create a mobile version of your site on a different subdomain, should you allow or disallow crawling on that optimized version? Would a duplicated version of your site have a bad effect on SEO, would it be seen as duplicated content?

Jeff Starr#6

@jeherve: Great question. I am pretty sure that Google is capable of distinguishing between mobile content and regular web-content, but can’t say for sure at this point.

If you block the mobile version of your site, then it won’t be crawlable and thus won’t appear in the search engines. The question is, does Google et al have two different indexes – one for mobile and one for non-mobile content?

On the other side of the coin, mobile content could be considered duplicate content. So again the question gets at how the search engines treat mobile content. If they are indexing it along with regular, non-mobile content, it might be considered duplicate. Otherwise they either aren’t crawling the mobile version or else aren’t indexing the results along with the regular non-mobile content.

Thomas Scholz#7

I don’t waste time with a regex for these cases. And I prefer to communicate very clearly that any further request is useless. So I’ve set up 410 messages:

Redirect gone /mobile/
Redirect gone /mobi/
Redirect gone /iphone/

Fast, clean and clear. :)

Jeff Starr#8

Nice, and maybe we can use RedirectMatch instead to catch the same requests without the trailing slash, which are just as common:

RedirectMatch gone /mobile/?$
RedirectMatch gone /mobi/?$
RedirectMatch gone /iphone/?$

This also ensures clean-up of non-root/subdirectory requests for the same patterns.

Cheers :)

Thomas Scholz#9

Hm … assuming we want to catch these requests …

/mobi/
/mobi/index
/mobile

… but not …

/mobile-css-explained/

… leads me to the following rules:

RedirectMatch gone /mobi(le)?($|/)
RedirectMatch gone /iphone($|/)

Not so clear anymore but more flexible. Objections?

Jeff Starr#10

Interesting, but the only one not accounted for with my set is the trailing index, which I have not seen. Also, we’re terminating the match at either the trailing slash or the last character in each string. So stuff like /mobile-css-explained/ isn’t an issue.

Does your latest code here match against /index or /m? I don’t see how.. It looks like you are saying:

  1. match /mobi or /mobile with or without the trailing slash
  2. match for /iphone with or without the trailing slash

But with RedirectMatch, you will also catch subdirectory requests:

/mobi/about/
/mobi/welcome-post/
/mobi/redirecting-reverse-proxy

..which seems like a good improvement. With a little fine-tuning, we can include all of the original patterns, including /m:

RedirectMatch gone /m(obi|obile)?($|/)
RedirectMatch gone /iphone($|/)

Clear is relative :)

Jeff Starr#11

Hold up! Looks like I hit the bong a little too hard this morning — notice that I managed to think through your code while writing, and ended up figuring it out in the second half. It obviously does match against index and all other subdirectory content. I just didn’t see that at first read. Either way, the conversation is nice ;)

MK#12

I’m with you on the redirecting version. But checking _every_ request with several regexes just to send a 404 with mod_rewrite before Apache sends it itself anyway doesn’t sound very resource saving to me. I’d rather have a couple hundred log entries a month and save the extra CPU cycles on preprocessing thousands/hundreds of thousands/millions of requests.

Jeff Starr#13

@MK: I hear your point, and I agree with it in general, but in this case, we are either redirecting via 301 or denying via 403. Apache’s default behavior is as you say to simply 404 such requests. This method provides more flexibility in terms of handling traffic and blocking mischief. Redirecting is useful for keeping mobile users where you want them. Denying access is similarly useful for sending a clear message. In fact, as most sites deliver elaborate 404-Error pages, kicking things out at the server level may save bandwidth and other resources. It’s been a while since I’ve seen a custom 403-Forbidden page ;)

Thomas Scholz#14

@MK: My WordPress 404 page does a lot more than just saying ›no‹. It searches the database up and down for every possible match, and if it doesn’t find anything I get a mail. That is slow.

Besides, my .htaccess is currently 86 kiB big – without any measurable performance effects. As long as you avoid mod_rewrite and expensive regexes it’s better to let Apache handle the errors. The server alone is always faster than the server plus WordPress plus MySQL.

Kent#15

I found a website that makes the apple icon for you
http://www.flavorstudios.com/iphone-icon-generator

TheAL#16

Much thanks!

Logic#17

Just FYI, for the rewrite rule you don’t need “(.*)”; you can save some CPU cycles just by using “.?”

Al Kamal Md. Razib#18

Thank you everyone for your nice compliments.

Timothy Warren#19

For those of us who use lighttpd, nginx, or any other web servers, I just want to clarify what you’re doing with these requests.

For the first example, you are blocking requests? Does this mean sending a 403, or something else?

Jeff Starr#20

Hi Timothy,

Yes, we’re denying the request and issuing a 403 response.
