Stop 404s for Mobile Versions of Your Site
If you’ve been keeping an eye on your 404 errors recently, you will have noticed an increase in requests for nonexistent mobile files and directories, especially over the past year or so. The scripts and bots requesting these files from your server seem to be looking for a mobile version of your site. Unfortunately, they are wasting bandwidth and resources in the process. It has become common to see the following 404 errors constantly repeated in your log files:
http://domain.tld/apple-touch-icon.png
http://domain.tld/iphone
http://domain.tld/mobile
http://domain.tld/mobi
http://domain.tld/m
So some bot comes along, assumes that your site includes a mobile version, and then tries its hand at guessing the location. In the common request-set listed above, we see the bot looking first for an “apple-touch icon,” and then for mobile content in various directories. If this only happens once in a while, it’s no big deal. But these days I’ve been seeing many different bots requesting these nonexistent resources.
Even worse, these mobile-hungry bots can’t seem to remember where they’ve been – they typically request the same resources repeatedly, and in multiple locations within the directory structure. I frequently see hundreds of these types of requests in my weekly error-log analyses. Needless to say, this is an incredible waste of time, bandwidth, and server resources.
It would be so nice..
So what’s the best solution? Well, obviously the ideal scenario would involve bots and scripts stopping this malicious behavior. Here are just a few ideas for confused bot masters:
- Stop programming your bots to “assume and guess.”
- Perform the crawl using a recognized “mobile” user-agent.
- Remember the results of your initial crawl to avoid repeat 404 requests.
Basically, the engineers programming this sort of behavior into their bots need to realize:
- If I take the time to set up a mobile site, rest assured that I’ll tell you where it’s located. Otherwise it doesn’t exist, so stop guessing.
- From the server’s perspective, there is no difference between guessing for directories and scanning for exploits.
- By constantly scanning websites for nonexistent directories, you are wasting everyone’s time, money, and resources.
Unfortunately, we both know that this sort of nefarious scumbaggery is not going to stop. So that means it is up to us as administrators to protect against this sort of maliciousness and resolve the issue ourselves.
Robots.txt vs Bad Bots
In a perfect world, 404 errors don’t exist and all bots obey robots.txt directives. But the world isn’t perfect, and something like this, which should work, doesn’t:
User-agent: *
Disallow: /*/iphone/$
Disallow: /*/iphone$
Disallow: /*/mobile/$
Disallow: /*/mobile$
Disallow: /*/mobi/$
Disallow: /*/mobi$
Disallow: /*/m/$
Disallow: /*/m$
Unfortunately, very few bots obey robots.txt rules. Google does. Yahoo certainly doesn’t, and neither do other bad bots. It’s pretty much impossible to stop malicious requests using the robots.txt file. Fortunately, it’s easy to do with a few lines of HTAccess.. ;)
How to resolve the “I’m a confused bot that can’t find your mobile site” problem
First of all, note that there are probably many ways of dealing with this nonsense. It doesn’t make sense to block IPs or user-agents because they are always changing and/or easily spoofed. But we can either block any requests for nonexistent “mobile-ish” resources, or else redirect such requests to a common location, such as the Home Page. Let’s examine both of these techniques – they are quite similar.
Deny all requests for non-existent mobile content
To use this technique, you’ll need access to your site’s root HTAccess file. In it, place the following code [1]:
# BLOCK 404 MOBILE REQUESTS
<IfModule mod_rewrite.c>
# Make sure the rewrite engine is enabled for this file
RewriteEngine On
RewriteCond %{REQUEST_URI} /iphone/?$ [NC,OR]
RewriteCond %{REQUEST_URI} /mobile/?$ [NC,OR]
RewriteCond %{REQUEST_URI} /mobi/?$ [NC,OR]
RewriteCond %{REQUEST_URI} /m/?$ [NC]
RewriteRule (.*) - [F,L]
</IfModule>
Properly placed, this code will deny all requests for any of the following resources:
http://domain.tld/iphone
http://domain.tld/mobile
http://domain.tld/mobi
http://domain.tld/m
http://domain.tld/iphone/
http://domain.tld/mobile/
http://domain.tld/mobi/
http://domain.tld/m/
http://domain.tld/any/path/iphone
http://domain.tld/any/path/mobile
http://domain.tld/any/path/mobi
http://domain.tld/any/path/m
http://domain.tld/any/path/iphone/
http://domain.tld/any/path/mobile/
http://domain.tld/any/path/mobi/
http://domain.tld/any/path/m/
Basically, the rules will match any request where /mobile et al. is included at the end of the requested URI, and it does not matter whether the trailing slash is included. If you would rather match /mobile et al. only when it occurs at the beginning of the request URI (that is, directly under the site root), we can modify the original rules like so:
RewriteCond %{REQUEST_URI} ^/iphone/?$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/mobile/?$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/mobi/?$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/m/?$ [NC]
These rewrite conditions are identical to the previous ones, with the exception of the caret symbol, ^, which is prepended to each regex pattern. The caret instructs Apache to match only when the pattern is present at the beginning of the request string, so /mobile is matched but /any/path/mobile is not.
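Since the four conditions differ only in the directory name, they can also be collapsed into a single condition using regex alternation. The following is just a sketch of that idea, assuming the same four names as above and that nothing else in your HTAccess file depends on the separate conditions:

```
# BLOCK 404 MOBILE REQUESTS (single-condition sketch)
<IfModule mod_rewrite.c>
RewriteEngine On
# Matches /iphone, /mobile, /mobi, or /m at the root, with or without a trailing slash
RewriteCond %{REQUEST_URI} ^/(iphone|mobile|mobi|m)/?$ [NC]
RewriteRule .* - [F]
</IfModule>
```

Remove the caret to match the same names at the end of any path instead. Whether you prefer one condition per name or a single alternation is mostly a matter of readability; the separate lines are easier to extend one at a time.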
Note that I’m not including the apple-touch icon because it is better to actually create that file for users who would like to bookmark your site on an Apple device. Additional directories and/or files are easily added to the list by emulating this pattern:
RewriteCond %{REQUEST_URI} ^/whatever/?$ [NC,OR]
Replace the “whatever” string with, well, whatever you would like to match against. Then, include the line before the others in your code.
I have tested this on Apache 2.2.14 and it works perfectly. Even so, I recommend testing it on your particular setup to be sure it works as advertised.
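One thing to keep in mind: the conditions above look only at the requested URI, so they deny a matching request whether or not something actually exists at that location. If you think you might create a real /mobile or /m directory someday, a possible variation is to add two filesystem checks so the rule only fires for genuinely nonexistent resources. This is a sketch rather than part of the tested code above, so treat it as an assumption to verify on your own server:

```
# BLOCK 404 MOBILE REQUESTS (only when nothing exists at the path)
<IfModule mod_rewrite.c>
RewriteEngine On
# Continue only if the request does not map to a real file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /iphone/?$ [NC,OR]
RewriteCond %{REQUEST_URI} /mobile/?$ [NC,OR]
RewriteCond %{REQUEST_URI} /mobi/?$ [NC,OR]
RewriteCond %{REQUEST_URI} /m/?$ [NC]
RewriteRule .* - [F]
</IfModule>
```

The two filesystem checks are combined with the URI checks using the implicit AND, while the four URI checks remain joined by OR, so the logic reads: nothing exists at this path, and the path ends in one of the mobile-ish names.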
Redirect all requests for non-existent mobile content
Rather than denying access to mobile-ish requests, we can always redirect them to the page of your choice. Here is how to redirect such requests to your Home Page:
# REDIRECT 404 MOBILE REQUESTS
<IfModule mod_rewrite.c>
# Make sure the rewrite engine is enabled for this file
RewriteEngine On
RewriteCond %{REQUEST_URI} /iphone/?$ [NC,OR]
RewriteCond %{REQUEST_URI} /mobile/?$ [NC,OR]
RewriteCond %{REQUEST_URI} /mobi/?$ [NC,OR]
RewriteCond %{REQUEST_URI} /m/?$ [NC]
RewriteRule (.*) http://domain.tld/ [R=301,L]
</IfModule>
Same as before, only here we change the RewriteRule in the last line to redirect to “http://domain.tld/”. Simply change the domain to that of your own and you’re all set.
As before, and always, test thoroughly.
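While testing, it may help to keep in mind that browsers tend to cache 301 redirects, which can make a mistake hard to undo. Here is a sketch of a more cautious variation, assuming you are happy to redirect to the root of whichever host served the request: it uses a temporary 302 status and a root-relative target instead of a hard-coded domain.

```
# REDIRECT 404 MOBILE REQUESTS (testing sketch)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} /(iphone|mobile|mobi|m)/?$ [NC]
# "/" sends the visitor to the home page of the current host;
# R=302 is temporary, so switch to R=301 once the behavior is confirmed
RewriteRule .* / [R=302,L]
</IfModule>
```

Once you are satisfied that only the intended requests are redirected, changing the flag to R=301 makes the redirect permanent, same as in the code above.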
How does it work?
The logic for either of these methods goes something like this:
- If the request is for /iphone or /iphone/ OR…
- If the request is for /mobile or /mobile/ OR…
- If the request is for /mobi or /mobi/ OR…
- If the request is for /m or /m/ …
- Then either deny the request or redirect it to the Home Page (depending on which method you are using)
Pretty simple, but very effective for eliminating malicious mobile requests.
El Wrapz
This technique is useful for saving bandwidth and server resources, not just for non-existent mobile-ish requests, but also for any resource that you would like to block – just add a RewriteCond with the target character string of your choice. Hopefully this technique will help you run a cleaner, safer, and more secure website.
Footnotes
- [1] If you are using WordPress, place the HTAccess rules before the permalink rules.
23 responses to “Stop 404s for Mobile Versions of Your Site”
@MK: I hear your point, and I agree with it in general, but in this case, we are either redirecting via 301 or denying via 403. Apache’s default behavior is, as you say, to simply 404 such requests. This method provides more flexibility in terms of handling traffic and blocking mischief. Redirecting is useful for keeping mobile users where you want them. Denying access is similarly useful for sending a clear message. In fact, as most sites deliver elaborate 404-Error pages, kicking things out at the server level may save bandwidth and other resources. It’s been a while since I’ve seen a custom 403-Forbidden page ;)
@MK: My WordPress 404 page does a lot more than just saying “no”. It searches the database up and down for every possible match, and if it doesn’t find anything I get a mail. That is slow.
Besides, my .htaccess is currently 86 kiB big — without any measurable performance effects. As long as you avoid mod_rewrite and expensive regexes it’s better to let Apache handle the errors. The server alone is always faster than the server plus WordPress plus MySQL.
I found a website that makes the apple icon for you
http://www.flavorstudios.com/iphone-icon-generator
Much thanks!
Just FYI, for the rewrite rule you don’t need “(.*)”; you can save some CPU cycles just by using “.?”
Thank you everyone for your nice compliments.
For those of us who use lighttpd, nginx, or any other web servers, I just want to clarify what you’re doing with these requests.
For the first example, you are blocking requests? Does this mean sending a 403, or something else?
Hi Timothy,
Yes, we’re denying the request and issuing a 403 response.
Is it possible to send blocked IPs, char strings & bots to a different 403 error page than the original, while still keeping a custom error page for them?
I’ve added a .htaccess in my error page folder that allows access to all, but I want the blocked stuff to go to a different 403 page. Is that possible?
Bro, isn’t there any easy way ? I just couldn’t set it up. I get almost 1000+ 404 errors everyday for appletouchicon.png
Can’t I make an image file named that and use it for that purpose?
Yes absolutely, if that’s the file that is being requested :)