Perishable Press

2013 User Agent Blacklist

The 2013 User Agent Blacklist blocks hundreds of the worst bots while ensuring open access for normal traffic: major search engines (Google, Bing, et al), good browsers (Chrome, Firefox, Opera, et al), and everyone else. Compared to blocking threats by IP address, blocking by user agent is a more effective general security strategy. Although it’s trivial to spoof any user agent, many bad requests continue to report user-agent strings that are known to be associated with malicious activity. For example, the notorious “httrack” user agent has been widely blocked since at least 2007, yet it continues to plague sites to this day. Fortunately, it doesn’t matter whether it’s the “real” httrack harassing your site or something pretending to be httrack; either way, you don’t want anything to do with it. Implementing a user-agent blacklist is a free and simple way to filter out a large percentage of bad traffic while freeing up valuable server resources for legitimate visitors.
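As a minimal sketch of the technique (illustrative only, not part of the blacklist itself), a single mod_rewrite condition is enough to deny any request whose user-agent string contains a blacklisted substring such as “httrack”:

```apache
# Minimal sketch: respond "410 Gone" to any UA containing "httrack"
<IfModule mod_rewrite.c>
	RewriteCond %{HTTP_USER_AGENT} httrack [NC]
	RewriteRule .* - [G]
</IfModule>
```

The [NC] flag makes the match case-insensitive, and the [G] flag returns a “410 Gone” response. The full blacklist below works the same way, just with hundreds of alternatives combined into each pattern.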

Proven Security

The 2013 UA Blacklist has been carefully constructed based on rigorous server-log analyses. Obsessive daily log monitoring reveals bad bots scanning for exploits, spamming resources, and wasting bandwidth. While analyzing malicious behavior, evil bots are identified and added to the UA Blacklist. Blocked user-agents are denied access to your site, increasing efficiency and providing safety for your visitors.

Better Performance, Better SEO

Search engines such as Google are placing more weight on fast-loading websites. If your site is plagued with resource-devouring, bandwidth-wasting bots, its performance is probably not as good as it should be. Even if your site looks fine on the surface, without proper protection bad bots can gobble your bandwidth and leech your server resources. A single malicious bot can make hundreds or even thousands of requests in a very short period of time while scanning and probing for vulnerabilities. If Google visits while bad bots are hitting your site, your site’s SEO could suffer. Fortunately, the 2013 UA Blacklist protects your site against hundreds of nefarious bots, thereby fostering maximum performance for the search engines.

2013 User Agent Blacklist

A few months ago, pentag0 posted an effective user-agent blacklist, which I’ve incorporated into the 2013 User Agent Blacklist. Essentially, this new 2013 UA Blacklist combines pentag0’s list, the 2010 UA Blacklist, and the worst user agents I’ve personally collected over the past few years.

Note that this blacklist should replace any similar lists to avoid redundancy and optimize performance (see important notes below). So here it is, presented as three sets of .htaccess directives, with zero editing required:

# 2013 UA BLACKLIST [1/3]
<IfModule mod_rewrite.c>
	RewriteCond %{HTTP_HOST} !^(127\.0\.0\.1|localhost) [NC]
	RewriteCond %{HTTP_USER_AGENT} (\<|\>|\'|\$x0E|\%0A|\%0D|\%27|\%3C|\%3E|\%00|\@\$x|\!susie|\_irc|\_works|\+select\+|\+union\+|\&lt;\?|1\,\1\,1\,|3gse|4all|4anything|5\.1\;\ xv6875\)|59\.64\.153\.|85\.17\.|88\.0\.106\.|a\_browser|a1\ site|abac|abach|abby|aberja|abilon|abont|abot|accept|access|accoo|accoon|aceftp|acme|active|address|adopt|adress|advisor|agent|ahead|aihit|aipbot|alarm|albert|alek|alexa\ toolbar\;\ \(r1\ 1\.5\)|alltop|alma|alot|alpha|america\ online\ browser\ 1\.1|amfi|amfibi|anal|andit|anon|ansearch|answer|answerbus|answerchase|antivirx|apollo|appie|arach|archive|arian|aboutoil|asps|aster|atari|atlocal|atom|atrax|atrop|attrib|autoh|autohot|av\ fetch|avsearch|axod|axon|baboom|baby|back|bali|bandit|barry|basichttp|batch|bdfetch|beat|beaut|become|bee|beij|betabot|biglotron|bilgi|binlar|bison|bitacle|bitly|blaiz|blitz|blogl|blogscope|blogzice|bloob|blow|bord|bond|boris|bost|bot\.ara|botje|botw|bpimage|brand|brok|broth|browseabit|browsex|bruin|bsalsa|bsdseek|built|bulls|bumble|bunny|busca|busi|buy|bwh3|cafek|cafi|camel|cand|captu|casper|catch|ccbot|ccubee|cd34|ceg|cfnetwork|cgichk|cha0s|chang|chaos|char|char\(|chase\ x|check\_http|checker|checkonly|checkprivacy|chek|chill|chttpclient|cipinet|cisco|cita|citeseer|clam|claria|claw|cloak|clshttp|clush|coast|cmsworldmap|code\.com|cogent|coldfusion|coll|collect|comb|combine|commentreader|common|comodo|compan|compatible\-|conc|conduc|contact|control|contype|conv|cool|copi|copy|coral|corn|cosmos|costa|cowbot|cr4nk|craft|cralwer|crank|crap|crawler0|crazy|cres|cs\-cz|cshttp|cuill|CURI|curl|curry|custo|cute|cyber|cz3|czx|daily|dalvik|daobot|dark|darwin|data|daten|dcbot|dcs|dds\ explorer|deep|deps|detect|dex|diam|diavol|diibot|dillo|ding|disc|disp|ditto|dlc|doco|dotbot|drag|drec|dsdl|dsok|dts|duck|dumb|eag|earn|earthcom|easydl|ebin|echo|edco|egoto|elnsb5|email|emer|empas|encyclo|enfi|enhan|enterprise\_search|envolk|erck|erocr|eventax|evere|evil|ewh|exac|exploit|expre|extra|eyen|fang|fast|fastbug|faxo|fdse|feed24|feeddisc|feedfinder|feedhub|fetch|filan|fileboo|fimap|find|firebat|firedownload\/1\.2pre\ firefox\/3\.6|firefox\/0|firs|flam|flash|flexum|flicky|flip|fly|focus|fooky|forum|forv|fost|foto|foun|fount|foxy\/1\;|free|friend|frontpage|fuck|fuer|futile|fyber|gais|galbot|gbpl|gecko\/2001|gecko\/2002|gecko\/2006|gecko\/2009042316|gener|geni|geo|geona|geth|getr|getw|ggl|gira|gluc|gnome|go\!zilla|goforit|goldfire|gonzo|google\ wireless|gosearch|got\-it|gozilla|grab|graf|greg|grub|grup|gsa\-cra|gsearch|gt\:\:www|guidebot|guruji|gyps|haha|hailo|harv|hash|hatena|hax|head|helm|herit|heritrix|hgre|hippo|hloader|hmse|hmview|holm|holy|hotbar\ 4\.4\.5\.0|hpprint|href\s|httpclient|httpconnect|httplib|httrack|human|huron|hverify|hybrid|hyper|ia_archiver|iaskspi|ibm\ evv|iccra|ichiro|icopy|ics\)|ida|ie\/5\.0|ieauto|iempt|iexplore\.exe|ilium|ilse|iltrov|indexer|indy|ineturl|infonav|innerpr|inspect|insuran|intellig|interget|internet\_explorer|internet\x|intraf|ip2|ipsel|irlbot|isc\_sys|isilo|isrccrawler|isspi|jady|jaka|jam|jenn|jet|jiro|jobo|joc|jupit|just|jyx|jyxo|kash|kazo|kbee|kenjin|kernel|keywo|kfsw|kkma|kmc|know|kosmix|krae|krug|ksibot|ktxn|kum|labs|lanshan|lapo|larbin|leech|lets|lexi|lexxe|libby|libcrawl|libcurl|libfetch|libweb|light|linc|lingue|linkcheck|linklint|linkman|lint|list|litefeeds|livedoor|livejournal|liveup|lmq|loader|locu|london|lone|loop|lork|lth\_|lwp|mac\_f|magi|magp|mail\.ru|main|majest|mam|mama|mana|marketwire|masc|mass|mata|mvi|mcbot|mecha|mechanize|mediapartners|metadata|metalogger|metaspin|metauri|mete|mib\/2\.2|microsoft\.url|microsoft\_internet\_explorer|mido|miggi|miix|mindjet|mindman|miner|mips|mira|mire|miss|mist|mizz|mj12|mlbot|mlm|mnog|moge|moje|mooz|more|mouse|mozdex) [NC]
	RewriteRule .* - [G]
</IfModule>

# 2013 UA BLACKLIST [2/3]
<IfModule mod_rewrite.c>
	RewriteCond %{HTTP_USER_AGENT} (mozilla\/0|mozilla\/1|mozilla\/4\.61\ \[en\]|mozilla\/firefox|mpf|msie\ 2|msie\ 3|msie\ 4|msie\ 5|msie\ 6\.0\-|msie\ 6\.0b|msie\ 7\.0a1\;|msie\ 7\.0b\;|msie6xpv1|msiecrawler|msnbot\-media|msnbot\-products|msnptc|msproxy|msrbot|musc|mvac|mwm|my\_age|myapp|mydog|myeng|myie2|mysearch|myurl|nag|name|naver|navr|near|netants|netcach|netcrawl|netfront|netinfo|netmech|netsp|netx|netz|neural|neut|newsbreak|newsgatorinbox|newsrob|newt|next|ng\-s|ng\/2|nice|nikto|nimb|ninja|ninte|nog|noko|nomad|norb|note|npbot|nuse|nutch|nutex|nwsp|obje|ocel|octo|odi3|oegp|offby|offline|omea|omg|omhttp|onfo|onyx|openf|openssl|openu|opera\ 2|opera\ 3|opera\ 4|opera\ 5|opera\ 6|opera\ 7|orac|orbit|oreg|osis|our|outf|owl|p3p\_|page2rss|pagefet|pansci|parser|patw|pavu|pb2pb|pcbrow|pear|peer|pepe|perfect|perl|petit|phoenix\/0\.|phras|picalo|piff|pig|pingd|pipe|pirs|plag|planet|plant|platform|playstation|plesk|pluck|plukkie|poe\-com|poirot|pomp|post|postrank|powerset|preload|privoxy|probe|program\_shareware|protect|protocol|prowl|proxie|proxy|psbot|pubsub|puf|pulse|punit|purebot|purity|pyq|pyth|query|quest|qweer|radian|rambler|ramp|rapid|rawdog|rawgrunt|reap|reeder|refresh|reget|relevare|repo|requ|request|rese|retrieve|rip|rix|rma|roboz|rocket|rogue|rpt\-http|rsscache|ruby|ruff|rufus|rv\:0\.9\.7\)|salt|sample|sauger|savvy|sbcyds|sbider|sblog|sbp|scagent|scan|scej\_|sched|schizo|schlong|schmo|scorp|scott|scout|scrawl|screen|screenshot|script|seamonkey\/1\.5a|search17|searchbot|searchme|sega|semto|sensis|seop|seopro|sept|sezn|seznam|share|sharp|shaz|shell|shelo|sherl|shim|shopwiki|silurian|simple|simplepie|siph|sitekiosk|sitescan|sitevigil|sitex|skam|skimp|skygrid|sledink|sleip|slide|sly|smag|smurf|snag|snapbot|snapshot|snif|snip|snoop|sock|socsci|sogou|sohu|solr|some|soso|spad|span|spbot|speed|sphere|spin|sproose|spurl|sputnik|spyder|squi|sqwid|sqworm|ssm\_ag|stack|stamp|statbot|state|steel|stilo|strateg|stress|strip|style|subot|such|suck|sume|sunos\ 5\.7|sunrise|superbot|superbro|supervi|surf4me|surfbot|survey|susi|suza|suzu|sweep|swish|sygol|synapse|sync2it|systems|szukacz|tagger|tagoo|tagyu|take|talkro|tamu|tandem|tarantula|tbot|tcf|tcs\/1|teamsoft|tecomi|teesoft|teleport|telesoft|tencent|terrawiz|test|texnut|thomas|tiehttp|timebot|timely|tipp|tiscali|titan|tmcrawler|tmhtload|tocrawl|todobr|tongco|toolbar\;\ \(r1|topic|topyx|torrent|track|translate|traveler|treeview|tricus|trivia|trivial|true|tunnel|turing|turnitin|tutorgig|twat|tweak|twice|tygo|ubee|uchoo|ultraseek|unavail|unf|universal|unknown|upg1|urlbase|urllib|urly|user\-agent\:|useragent|usyd|vagabo|valet|vamp|vci|veri\~li|verif|versus|via|vikspider|virtual|visual|void|voyager|vsyn|w0000t|w3search|walhello|walker|wand|waol|watch|wavefire|wbdbot|weather|web\.ima|web2mal|webarchive|webbot|webcat|webcor|webcorp|webcrawl|webdat|webdup|webgo|webind|webis|webitpr|weblea|webmin|webmoney|webp|webql|webrobot|webster|websurf|webtre|webvac|webzip|wells|wep\_s|wget|whiz|widow|win67|windows\-rss|windows\ 2000|windows\ 3|windows\ 95|windows\ 98|windows\ ce|windows\ me|winht|winodws|wish|wizz|worio|works|world|worth|wwwc|wwwo|wwwster|xaldon|xbot|xenu|xirq|y\!tunnel|yacy|yahoo\-mmaudvid|yahooseeker|yahooysmcm|yamm|yand|yandex|yang|yoono|yori|yotta|yplus\ |ytunnel|zade|zagre|zeal|zebot|zerx|zeus|zhuaxia|zipcode|zixy|zmao|zmeu|zune) [NC]
	RewriteRule .* - [G]
</IfModule>

# 2013 UA BLACKLIST [3/3] (pentag0)
<IfModule mod_rewrite.c>
	RewriteCond %{HTTP_USER_AGENT} (black\ hole|titan|webstripper|netmechanic|cherrypicker|emailcollector|emailsiphon|webbandit|emailwolf|extractorpro|copyrightcheck|crescent|wget|sitesnagger|prowebwalker|cheesebot|teleport|teleportpro|miixpc|telesoft|website\ quester|webzip|moget/2\.1|webzip/4\.0|websauger|webcopier|netants|mister\ pix|webauto|thenomad|www-collector-e|rma|libweb/clshttp|asterias|httplib|turingos|spanner|infonavirobot|harvest/1\.5|bullseye/1\.0|mozilla/4\.0\ \(compatible;\ bullseye;\ windows\ 95\)|crescent\ internet\ toolpak\ http\ ole\ control\ v\.1\.0|cherrypickerse/1\.0|cherrypicker\ /1\.0|webbandit/3\.50|nicerspro|microsoft\ url\ control\ -\ 5\.01\.4511|dittospyder|foobot|webmasterworldforumbot|spankbot|botalot|lwp-trivial/1\.34|lwp-trivial|wget/1\.6|bunnyslippers|microsoft\ url\ control\ -\ 6\.00\.8169|urly\ warning|wget/1\.5\.3|linkwalker|cosmos|moget|hloader|humanlinks|linkextractorpro|offline\ explorer|mata\ hari|lexibot|web\ image\ collector|the\ intraformant|true_robot/1\.0|true_robot|blowfish/1\.0|jennybot|miixpc/4\.2|builtbottough|propowerbot/2\.14|backdoorbot/1\.0|tocrawl/urldispatcher|webenhancer|tighttwatbot|suzuran|vci\ webviewer\ vci\ webviewer\ win32|vci|szukacz/1\.4|queryn\ metasearch|openfind\ data\ gathere|openfind|xenu\'s\ link\ sleuth\ 1\.1c|xenu's|zeus|repomonkey\ bait\ &\ tackle/v1\.01|repomonkey|zeus\ 32297\ webster\ pro\ v2\.9\ win32|webster\ pro|erocrawler|linkscan/8\.1a\ unix|keyword\ density/0\.9|kenjin\ spider|cegbfeieh) [NC]
	RewriteRule .* - [G]
</IfModule>

To implement the UA Blacklist, simply paste into your site’s root .htaccess file (or even better, the Apache configuration file). Upload, test, and stay current with updates and news.
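Note that mod_rewrite directives only take effect when the rewrite engine is enabled. If your .htaccess or server config does not already enable it somewhere (an assumption about your particular setup), add this once before the blacklist blocks:

```apache
# Enable the rewrite engine (required once for any RewriteCond/RewriteRule)
<IfModule mod_rewrite.c>
	RewriteEngine On
</IfModule>
```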

Details

Some notes about differences between 2013 and 2010 versions:

  • After much deliberation and feedback, libwww is removed from the 2013 UA Blacklist.
  • To accommodate traffic from Facebook (and others), the empty/blank user agent is no longer blocked. This is unfortunate, given its effectiveness at blocking bad requests.
  • Plus the usual optimizations via regex consolidation and syntax improvements.

Boilerplate

The UA Blacklist uses hundreds of regular expressions to block bad bots based on their user-agent. Each of these regular expressions can match many different user-agents. Care has been taken to ensure that only bad bots are blocked, but false positives are inevitable. If you know of a user-agent that should be removed from the list, please let me know. I will do my best to update things asap.
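If a legitimate user agent is being blocked and you would rather not hunt down the exact offending pattern, one workaround is a pass-through rule placed before the blacklist blocks. The “WordPress/” string here is just a hypothetical example of an agent you might want to allow:

```apache
# Hypothetical whitelist: let a known-good UA skip the blacklist rules below
<IfModule mod_rewrite.c>
	RewriteCond %{HTTP_USER_AGENT} ^WordPress/ [NC]
	RewriteRule .* - [L]
</IfModule>
```

The [L] flag ends the current pass through the rewrite rules, so matching requests never reach the blacklist’s [G] rules.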

Bottom line: Only use this code if you know what you are doing. It’s not a “fix-it-and-forget-it” situation, especially for production sites. It’s more of a “fix-it-and-keep-an-eye-on-it” kind of thing, meant for those who understand how it works. As mentioned in the comments, the User Agent Blacklist is a work in progress. As with any code, please use with caution and at your own risk. By obtaining this freely offered code, you assume all responsibility for its use.

Stay Current

As with IP blacklists, user-agent blacklists must be updated to stay effective. With user agents, though, rather than replacing the entire list, it’s more a matter of adding new strings and removing any that have fallen by the wayside. In general, a good user-agent blacklist will remain beneficial for years, but its effectiveness will inevitably decrease unless it is kept current.
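When adding new strings, it may be easier to append them in a small block of your own rather than editing the long patterns in place. The patterns shown here (“badbot”, “evilscanner”) are hypothetical placeholders:

```apache
# Hypothetical additions: append new UA patterns in a separate block
<IfModule mod_rewrite.c>
	RewriteCond %{HTTP_USER_AGENT} (badbot|evilscanner) [NC]
	RewriteRule .* - [G]
</IfModule>
```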

More Security Tools

For those new to Perishable Press, please check out some of my other security resources.

Security is an important part of what I do around here, so please chime in with any suggestions, ideas, and comments. Thank you for visiting Perishable Press.

Shouts out

Thank you to the following people for the input, feedback, and help in improving the 2013 UA Blacklist:

  • Jeremy Fairbrass
  • belline
  • Muskie
  • Alexander
  • GD

Bonus: Mini UA Blacklist

Here is an alternate “mini” version of the UA Blacklist:

# 2013 MINI UA BLACKLIST
<IfModule mod_setenvif.c>
	SetEnvIfNoCase User-Agent (\<|\>|\'|\$x0|\%0A|\%0D|\%27|\%3C|\%3E|\%00|\+select|\+union|\&lt) keep_out
	SetEnvIfNoCase User-Agent (binlar|casper|checkprivacy|cmsworldmap|comodo|curious|diavol|doco) keep_out
	SetEnvIfNoCase User-Agent (dotbot|feedfinder|flicky|ia_archiver|kmccrew|libwww|nutch) keep_out
	SetEnvIfNoCase User-Agent (planetwork|purebot|pycurl|skygrid|sucker|turnit|vikspid|zmeu|zune) keep_out
	<limit GET POST PUT>
		Order Allow,Deny
		Allow from all
		Deny from env=keep_out
		# Deny from all
		# Deny from 111.222.333
	</limit>
</IfModule>

Note: this mini list should be used on its own, as a lightweight alternative to the full UA Blacklist. It also serves as an example of an alternate syntax/method that may be used for blocking user agents on Apache. If mod_rewrite isn’t available on your server, try this method instead (replace the (user|agent|patterns) in the mini list with the patterns from the full list).
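One caveat: the Order/Allow/Deny directives used in the mini list belong to Apache 2.2 (or to mod_access_compat on newer servers). On Apache 2.4, a sketch of the equivalent access control, reusing the same keep_out variable set by the SetEnvIfNoCase lines above, would be:

```apache
# Apache 2.4 sketch: deny requests flagged with the keep_out variable
<IfModule mod_authz_core.c>
	<RequireAll>
		Require all granted
		Require not env keep_out
	</RequireAll>
</IfModule>
```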

About the Author: Jeff Starr = Fullstack Developer. Book Author. Teacher. Human Being.
59 responses
  1. Hi Jeff
    i will try as you suggested.
    My first instinct was to remove the “Windows\ NT” etc. code, but that didn’t work. Then I removed the code I pasted above and that worked, and when I put the “Windows\ NT” code back it still worked, so the code above seems to be the culprit.

    Thanks

  2. Hi Jeff
    The new code you have posted now works.
    Thanks for your help!

  3. “cuts words in comments”

    This is a screenshot showing how it looks in the latest IE and FF: incorrect wrapping at the end of the line. The latest Opera is OK.

    http://www.ex.ua/591123853065

  4. Hi Jeff,
    The updated list you have posted now works. Tested with FF, IE & Opera. Thanks for your time and efforts! I like your blog.

  5. Hi Jeff,

    I tried with this block list and I too get an error 410.

    It’s strange, because I tried simulating from various sources (user agents). I can access my website directly and see the full content (Chrome, Chromium, Safari, Opera, Mozilla).

    The problem is that when I run a test with Google PageSpeed, it returns error 410. The same happens with speed tests from webpagetest & GTmetrix, and with some whois services.

    Finally, if I try with pingdom service everything is ok!

    I do not understand why the error occurs. Any suggestions?

    Many thanks!

    • Jeff Starr

      It sounds like the user agents reported by PageSpeed and GTmetrix are getting blocked.. the best way to resolve the issue is to locate the string/pattern in the list and remove it. Or if you can report the user-agent(s), I would be glad to take a look and see what’s up..

  6. I believe that when WP runs cron jobs e.g. REQUEST_URI: /wp-cron.php?doing_wp_cron=blah blah the user agent is blocked by 2013 UA BLACKLIST [2/3] as it contains “wordp” HTTP_USER_AGENT: WordPress/3.6;http://mydomain.com.

    When I deleted that test in the RewriteCond I didn’t see it blocked again, and on replacing it it was blocked again.

  7. Juan Pablo Laborde September 3, 2013 @ 6:41 am

    I’m having problems with Firefox. I had to change to the “MINI UA BLACKLIST” and now it’s working.
    Do you know why?

    • Jeff Starr

      Hi Juan, you’ll have to be a bit more specific before anyone can help. Please read through the first several comments on the post to get a sense of how things work and then try again with specific information (e.g., URL, UA+version, etc.). Thanks.

      • Juan Pablo Laborde September 5, 2013 @ 7:13 am

        Hi Jeff. The problem is when using Firefox 23.0.1.
        I think it’s the latest version of Firefox.

      • Jeff Starr

        Um, nope not latest Firefox (v23.0.1).. which is working fine on Mac and Windows..

        More than likely it has something to do with a specific URL(s). If you can narrow it down to a specific URL(s) that would be most helpful I think..

      • Juan Pablo Laborde September 5, 2013 @ 12:07 pm

        Jeff, you are making me think :-)

        The page has some scripts using Ajax to load content. Something like that:

        jQuery(function() {
        	jQuery("#blabla").load("/blabla/bla.php");
        });

        Is it possible that the problem is with this content?

      • Jeff Starr

        It could be.. it really depends on the request that is being made, which is entirely up to you to figure out. To help people with diagnosing issues, I’ve written a post that explains how to troubleshoot .htaccess directives. That explains it all, but feel free to post the offending string once you’ve found it; that way I can update the UA Blacklist and everyone can benefit :)

      • Juan Pablo Laborde September 5, 2013 @ 1:00 pm

        I will check, Jeff. Thanks for the info. I checked my user agent with Fiddler and it is:
        User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0

  8. Just so I understand correctly, the 5G Blacklist and the UA Blacklist handle different issues, right? Neither is a superset of the other?

  9. Jeff,

    I really, really, really appreciate all your hard work — and continuing updates! — on this, the 5G Blacklist and the IP Blacklist.

    THANK YOU VERY, VERY MUCH!

    Leslie

    P.S. Yeah, I know caps are shouty, but that’s my point! =)

    (And, yeah, I know I copied’n’pasted this comment from another thread; I want to make it clear how appreciative I am. Where’s your PayPal donate button?)

  10. I just copied in this list. I also saw a Blackhole trap on your site from a post in 2010. Is it recommended to use the user-agent blacklist, the IP blacklist, and the Blackhole together to defend a WordPress site?

    I’m really new to WordPress, and since someone else linked to me, I’m getting hit by bots like crazy. I tried adding some stuff to the .htaccess to block ezoom, for one, and they are still getting through.

    I just removed what I had and just put this list in. Thank you for your help. I’m really glad I found this site.

  11. Jesse Smith October 14, 2013 @ 4:23 pm

    Hi Jeff,
    I was just wondering why you eventually elected to omit “libwww” from the 2013 User-Agent Blacklist? I understand that on the one hand, some legitimate services use it, but on the other hand so do a lot of bad actors. Was there a primary factor that eventually influenced your decision? (And btw I see that “libwww” is still included in the “Mini UA Blacklist” at the bottom there.)

    Thanks for all you have done through the years to help keep the world’s web servers a little safer. Cheers!

    • Jeff Starr

      The decision to remove libwww from the blacklist was mostly based on user feedback.. although I continue to block it for some of my sites, as you’ve seen in the mini UA blacklist :)

  12. Thomas Oliver October 22, 2013 @ 7:39 pm

    Hi Jeff, I wanted to comment on your 5G Blacklist 2013, but you have comments closed. I have a suggestion. How about changing this:

    RewriteCond %{QUERY_STRING} (javascript:).*(\;) [NC,OR]

    to this:

    RewriteCond %{QUERY_STRING} script:[^;]*; [NC,OR]

    Should be an improvement in efficiency and speed, because you have items in capturing groups; this causes the regex parser to capture them when it really doesn’t need to. Also, a semicolon has no meaning in regex or mod_rewrite, so there is no need to escape it. And there is no need for the catch-all .* when you can use what I suggested.
