2013 User Agent Blacklist
The 2013 User Agent Blacklist blocks hundreds of the worst bots while ensuring open access for normal traffic, major search engines (Google, Bing, et al), good browsers (Chrome, Firefox, Opera, et al), and everyone else. Compared to blocking threats by IP, blocking by user agent is more effective as a general security strategy. Although it’s trivial to spoof any user agent, many bad requests continue to report user-agent strings that are known to be associated with malicious activity. For example, the notorious “httrack” user agent has been widely blocked since at least 2007, yet it continues to plague sites to this day. Fortunately, it doesn’t matter whether it’s the “real” httrack harassing your site or something pretending to be httrack — you don’t want anything to do with it. Implementing a user-agent blacklist is a free and simple way to filter out a large percentage of bad traffic while freeing up valuable server resources for legitimate visitors.
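As a minimal sketch of the idea (an illustration only, not part of the actual blacklist), blocking the “httrack” user agent with mod_rewrite looks like this:

```apache
# Minimal sketch: deny (403 Forbidden) any request whose
# user-agent string contains "httrack", matched case-insensitively [NC].
<IfModule mod_rewrite.c>
	RewriteEngine On
	RewriteCond %{HTTP_USER_AGENT} httrack [NC]
	RewriteRule .* - [F,L]
</IfModule>
```

The [F] flag sends a 403 Forbidden response, and [L] stops further rewrite processing for the blocked request.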
The 2013 UA Blacklist has been carefully constructed based on rigorous server-log analyses. Obsessive daily log monitoring reveals bad bots scanning for exploits, spamming resources, and wasting bandwidth. While analyzing malicious behavior, evil bots are identified and added to the UA Blacklist. Blocked user-agents are denied access to your site, increasing efficiency and providing safety for your visitors.
Better Performance, Better SEO
Search engines such as Google are placing more weight on speedy, fast-loading websites. If your site is plagued with resource-devouring, bandwidth-wasting bots, its performance is probably not as good as it should be. Even if your site looks fine on the surface, without proper protection bad bots can gobble your bandwidth and leech your server resources. A single malicious bot can make hundreds or thousands of requests in a very short period of time while scanning and probing for vulnerabilities. If Google visits while bad bots are hitting your site, your site’s SEO could suffer. Fortunately, the 2013 UA Blacklist protects your site against hundreds of nefarious bots, thereby fostering maximum performance for the search engines.
2013 User Agent Blacklist
A few months ago, pentag0 posted an effective user-agent blacklist that I’ve incorporated into the 2013 User Agent Blacklist. Essentially, this new 2013 UA Blacklist is the combination of pentag0’s list, the 2010 UA Blacklist, and the worst user agents I’ve personally collected over the past few years.
Note that this blacklist should replace any similar lists to avoid redundancy and optimize performance (see more important notes below). So here it is, presented as three sets of .htaccess directives with 100% no editing required.
To implement the UA Blacklist, simply paste the code into your site’s root .htaccess file (or, even better, the Apache configuration file). Upload, test, and stay current with updates and news.
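For reference, each of the three sets of directives follows the general mod_rewrite shape sketched below. The pattern groups shown here are placeholders, not the actual blacklist strings; use the full list as provided:

```apache
# Skeleton only: (bad|bot|patterns) and (more|bad|patterns) are
# placeholders for the real pattern groups in the full blacklist.
<IfModule mod_rewrite.c>
	RewriteEngine On
	RewriteCond %{HTTP_USER_AGENT} (bad|bot|patterns) [NC,OR]
	RewriteCond %{HTTP_USER_AGENT} (more|bad|patterns) [NC]
	RewriteRule .* - [G]
</IfModule>
```

The [G] flag sends a 410 Gone response to blocked agents; the [NC,OR] flags chain multiple case-insensitive conditions together, with the final condition taking [NC] alone.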
Some notes about differences between 2013 and 2010 versions:
- After much deliberation and feedback, libwww is removed from the 2013 UA Blacklist.
- To accommodate traffic from Facebook (and others), the empty/blank user agent is no longer blocked, which is unfortunate given its effectiveness at blocking bad requests.
- Plus the usual optimizing via regex consolidation and syntax improvements.
The UA Blacklist uses hundreds of regular expressions to block bad bots based on their user-agent. Each of these regular expressions can match many different user-agents. Care has been taken to ensure that only bad bots are blocked, but false positives are inevitable. If you know of a user-agent that should be removed from the list, please let me know. I will do my best to update things asap.
Bottom line: Only use this code if you know what you are doing. It’s not a “fix-it-and-forget” situation, especially for production sites. It’s more like a “fix-it-and-keep-an-eye-on-it” kind of thing, meant for those who understand how it works. As mentioned in the comments, the User Agent Blacklist is a work in progress. As with any code, please use with caution and at your own risk. By obtaining this freely offered code, you assume all responsibility for its use.
As with IP blacklists, user-agent blacklists are meant to be updated in order to stay effective. But with user agents, rather than replacing the entire list, it’s more a matter of updating it by adding new strings or removing any that have fallen by the wayside. In general, a good user-agent blacklist will remain beneficial for years, but its efficiency will inevitably decrease unless updated to stay current.
More Security Tools
For those new to Perishable Press, please check out some of my other security resources:
Security is an important part of what I do around here, so please chime in with any suggestions, ideas, and comments. Thank you for visiting Perishable Press.
Thank you to the following people for the input, feedback, and help in improving the 2013 UA Blacklist:
- Jeremy Fairbrass
Bonus: Mini UA Blacklist
Here is an alternate “mini” version of the UA Blacklist:
# 2013 MINI UA BLACKLIST
<IfModule mod_setenvif.c>
	SetEnvIfNoCase User-Agent (\<|\>|\'|\$x0|\%0A|\%0D|\%27|\%3C|\%3E|\%00|\+select|\+union|\<) keep_out
	SetEnvIfNoCase User-Agent (binlar|casper|checkprivacy|cmsworldmap|comodo|curious|diavol|doco) keep_out
	SetEnvIfNoCase User-Agent (dotbot|feedfinder|flicky|ia_archiver|kmccrew|libwww|nutch) keep_out
	SetEnvIfNoCase User-Agent (planetwork|purebot|pycurl|skygrid|sucker|turnit|vikspid|zmeu|zune) keep_out
	<limit GET POST PUT>
		Order Allow,Deny
		Allow from all
		Deny from env=keep_out
		# Deny from all
		# Deny from 111.222.333
	</limit>
</IfModule>
Note: this should be used separately, as a lightweight alternative to the full UA Blacklist. It also serves as an example of an alternate syntax/method that may be used for blocking user agents on Apache. If mod_rewrite isn’t available on your server, try this method instead: replace the (user|agent|patterns) in the mini list with the (user|agent|patterns) from the full list.
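For example, a pattern group swapped into the SetEnvIf method might look like this (the (user|agent|patterns) group shown is a placeholder; substitute a real group copied from the full list):

```apache
# (user|agent|patterns) is a placeholder; replace it with a pattern
# group from the full 2013 UA Blacklist. Matching agents are tagged
# with the keep_out variable and then denied.
<IfModule mod_setenvif.c>
	SetEnvIfNoCase User-Agent (user|agent|patterns) keep_out
	<Limit GET POST PUT>
		Order Allow,Deny
		Allow from all
		Deny from env=keep_out
	</Limit>
</IfModule>
```

Additional SetEnvIfNoCase lines can be stacked, one per pattern group, all setting the same keep_out variable.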
For me, with the first 2013 UA Blacklist, Opera Next 16 can’t access the website and gets a 401….
What is the exact user agent? I’ll update the list asap..
Thanks for your efforts!
Should I delete section # 5G:[USER AGENTS] from my .htaccess file if I wish to add the UA List?
Yes, this UA list provides much more coverage than the default 5G UA stuff :)
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36 OPR/16.0.1196.55 (Edition Next)
The Opera browser was built on its own core, Presto.
UA type: Browser
UA name: Opera 16.0.1196.55
UA family: Opera
UA producer: Opera Software ASA
OS name: Windows 7
OS family: Windows
OS producer: Microsoft Corporation
Known UA fragments:
- Mozilla/5.0: they claim it is based on the Mozilla user agent (only true for Gecko browsers); it is now used only for historical reasons
- Windows NT 6.1: OS signature
- WOW64: Windows running on a 64-bit processor signature
- AppleWebKit/537.36: open-source application framework, ver. 537.36
- (KHTML, like Gecko): HTML layout engine developed by the KDE project
- Chrome/29.0.1547.57: browser signature
- Safari/537.36: browser signature
- OPR/16.0.1196.55: browser signature
Unknown UA fragments:
- (Edition Next): unknown
Compared to other services:
- Browser Capabilities Project: Chrome 29.0 run on Win7
- Browserscope project: Chrome 29.0.1547 run on Windows 7
- UserAgentString.com: Chrome 29.0.1547.57 run on Windows 7
Excellent, thanks for the info.. I’ve removed some potentially problematic strings from the list.. please check again with new version and let me know. Thank you.
I have added the UA List and can no longer access my WordPress login section with the Firefox browser.
I have looked at the list and there are various browser UAs there: Opera, Firefox, IE. I presume that access from these browsers is blocked. Moreover, we have a mobile version of the website. What about the UAs of mobile devices? Are they also listed there?
Also, I can see Googlebot and Yandex (the biggest RU search engine) in the list. There is also an “uptime” UA, i.e., all website uptime-monitoring tools will be rejected, which is not cool.
Presuming can get you into trouble.. Please test before commenting and then let me know, as belline has done, specifically which browser/agent is being blocked.. that will save everyone time without causing any unnecessary alarm. Also keep in mind that this is like the 7th incarnation of this list, and is used on many sites without issue. Yes, occasionally user agent information changes, and the list needs to be updated, which is why I definitely appreciate your feedback.
As for Yandex, that is deliberate based on negative experience with that particular user agent. Feel free to remove it, and any others, to accommodate your specific needs.
I think removing uptime from the list is a good call, and will do so in the morning (it’s late here, but I wanted to respond).
Lastly, “Googlebot” is definitely not on the list — please take a closer look at the patterns, and again, feel free to remove any that you disagree with. Thanks.
Hi Jeff, thank you for your comments. Noted.
I’ve tried the latest Opera – it’s OK
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 OPR/15.0.1147.153
Tried the latest Firefox – can’t access the login section, and the main page template doesn’t load correctly.
Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0
Tried latest IE – the same problems
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0; BOIE9;RURU)
Tried a few mobile devices – it’s OK
Thanks Alexander, that info is very helpful. I’ve updated the list to allow for these browsers. Please check again and let me know if any further issues with anything.
Also, “uptime” now removed from the list :)
I’ve tried the updated list. The same problems with FF and IE. However, when I delete section 1/3 from the list, everything goes well.
Meantime, why does your template cut words in comments? It looks odd and for sure doesn’t help readability, especially for non-English people like me.
I’m not sure what this is referring to..
For Firefox and IE, I’ll take another look and see if I can find the matching string.. Or even better, if you get a chance, try halving the code to identify any problematic strings in minutes.
Ok it’s working fine now for Fx and IE.. The offending strings:
I used an online user-agent simulator to verify both browsers.
I installed your list, and I think it may be a tad overzealous. I get 410 Gone when I try to access my homepage from inside our work network with Firefox.
If I use IE from inside our work network it works. If I use my iPhone and 3G I can get my homepage to load.
I don’t like spammers, scrapers, and other online assholes, but I can’t really have my website show 410 Gone. It is only the main domain that can’t be loaded; my blog, which is on a subdomain with its own .htaccess file, is still accessible via Firefox from work.
Oddly enough I’m posting from Firefox at work right now, so maybe it isn’t the .htaccess script, but that is the only thing I can imagine that I changed on the weekend that could result in a 410 Gone.
Lol, yes the UA list is overzealous! Seriously though it sounds like it’s not a good fit for some servers, for a number of potential reasons.. if you are experiencing any issues, please remove the entire list, or maybe just add one or two sections instead of the whole enchilada. It should not be producing any 410 errors at all, so that’s a good sign that it’s just too much for that particular server.
Also, others have reported issues with Firefox et al, and the list has been updated accordingly.. if you decide to try again, please use the latest version.
Thanks for the feedback!
Do you have a clear definition of what qualifies a bot to be included on the blacklist, or any record of the reasons for blocking individual entries? I’ve got bots appearing in my log that I’m curious about; they’re on your list, but I haven’t seen them doing anything abusive yet. I’d rather not block them without knowing why I’m doing so.
I don’t have time to keep a changelog for the list, but Google is your friend, and can reveal lots of information about most any bot that you are wondering about ;)
Thank you for sharing your hard work with everyone.
Recently my prod WordPress sites have been hit with brute-force attacks, so I decided to upgrade my 3G Blacklist to 5G, and I also added your 5G UA to my prod and staging .htaccess files.
I also upgraded WordPress to 3.6 on most of my sites.
After doing all this, everything seemed to be working fine until a couple of days ago, when I was testing one of my staging sites using Win 7 SP1 and IE10 and was getting 410 errors.
Unfortunately the same thing happened when I tested my prod sites, which are hosted somewhere else, using a different OS, etc.
After a lot of troubleshooting using different Win 7 machines from different geographical locations, I narrowed the problem down to your second block: # 2013 UA BLACKLIST [2/3]
As soon as this block was removed from all prod and test sites, we were able to access the sites using IE10 on Win 7 SP1.
Let me know if you want more details to recreate this problem.
Yes please, if you can get the specific user agent I can check and remove it from the list. And if you have a few minutes to isolate the blocked string that would be even better. Here is a technique for halving code to troubleshoot things.
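The halving technique can be sketched with the SetEnvIf syntax, using pattern groups from the mini list above purely for illustration: comment out half of the pattern lines, reload, and retest, then repeat on whichever half contains the problem.

```apache
# Halving sketch: half the pattern lines are commented out. If the
# block disappears, the offending string is in the commented half;
# otherwise it is in the active half. Repeat on the suspect half
# until the single problematic string is isolated.
<IfModule mod_setenvif.c>
	SetEnvIfNoCase User-Agent (binlar|casper|checkprivacy|cmsworldmap|comodo|curious|diavol|doco) keep_out
	SetEnvIfNoCase User-Agent (dotbot|feedfinder|flicky|ia_archiver|kmccrew|libwww|nutch) keep_out
	# SetEnvIfNoCase User-Agent (planetwork|purebot|pycurl|skygrid|sucker|turnit|vikspid|zmeu|zune) keep_out
	<Limit GET POST PUT>
		Order Allow,Deny
		Allow from all
		Deny from env=keep_out
	</Limit>
</IfModule>
```

With the mod_rewrite syntax, one caveat: commenting out the final RewriteCond of a section leaves a dangling [NC,OR] on the preceding line, so change that line’s flags to [NC] before testing.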
Hi Jeff, this is the entry on the apache logs:
192.168.X.XX - - [26/Aug/2013:23:22:13 -0400] "GET / HTTP/1.1" 410 295 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"
Let me know if you need more help.
Using a user-agent simulator, I’m unable to replicate the issue with that particular UA string.. it seems to allow open access with no errors.
Also, the list is updated thanks to feedback from Alexander and others, so if you decide to troubleshoot further, please grab a copy of the latest version. Thank you.
Hi Jeff, other people on the WordPress support forum were able to replicate the problem on their Windows 7 IE10 machines accessing my prod website. I don’t think it will take me long to figure out what the problem is..
That’s great, thanks. Like I said, it works fine using the straightforward emulator, but once IE10 itself gets involved, who knows what it’s doing/sending behind the scenes.. and to make things more difficult, IE10 isn’t working on Mac Parallels (Win emulator) on my machine, so your help is definitely appreciated.
Sorry, I am not an expert on .htaccess coding, and I am not sure if this solves anything, but I took out the following chunk of code from the beginning of the block and got section 2 working on Win 7, IE10:
Windows\ NT\ 6\.1\;\ tr\;\ rv\:1\.9\.2\.6\)|mozilla\/0|mozilla\/1|mozilla\/2|mozilla\/3|mozilla\/4\.61\ \[en\]|mozilla\/firefox|mpf|msie\ 1|msie\ 2|msie\ 3|msie\ 4|msie\ 5|msie\ 6\.0\-|msie\ 6\.0b|msie\ 7\.0a1\;|msie\ 7\.0b\;|
Maybe you can help me narrow it down from here?
I am sorry, I wasn’t clear enough: I removed that code from section 2 of the UA Blacklist, and the remainder of the code works on Win 7 IE10.
Are you using the latest version as suggested? I ask because some of those strings (e.g., Windows\ NT) have been removed..