2013 User Agent Blacklist

[ 2013 User Agent Blacklist ] The 2013 User Agent Blacklist blocks hundreds of the worst bots while ensuring open-access for normal traffic, major search engines (Google, Bing, et al), good browsers (Chrome, Firefox, Opera, et al), and everyone else. Compared to blocking threats by IP, blocking by user-agent is more effective as a general security strategy. Although it’s trivial to spoof any user agent, many bad requests continue to report user-agent strings that are known to be associated with malicious activity. For example, the notorious “httrack” user agent has been widely blocked since at least 2007, yet it continues to plague sites to this day. Fortunately, it doesn’t matter if it’s the “real” httrack harassing your site or something pretending to be httrack — you don’t want anything to do it. Implementing a user-agent blacklist is a free and simple way to filter out a large percentage of bad traffic while freeing up valuable server resources for legitimate visitors.

Proven Security

The 2013 UA Blacklist has been carefully constructed based on rigorous server-log analyses. Obsessive daily log monitoring reveals bad bots scanning for exploits, spamming resources, and wasting bandwidth. While analyzing malicious behavior, evil bots are identified and added to the UA Blacklist. Blocked user-agents are denied access to your site, increasing efficiency and providing safety for your visitors.

Better Performance, Better SEO

Search engines such as Google are placing more weight on speedy, fast-loading websites. If your site is plagued with resource-devouring, bandwidth-wasting bots, it’s performance is probably not as good as it should be. Even if your site looks fine on the surface, without proper protection bad bots can gobble your bandwidth and leech your server resources. A single malicious bot can make hundreds and thousands of requests in a very short period of time while scanning and probing for vulnerabilities. If Google visits while bad bots are hitting your site, your site’s SEO could suffer. Fortunately, the 2013 UA Blacklist protects your site against hundreds of nefarious bots, thereby fostering maximum performance for the search engines.

2013 User Agent Blacklist

A few months ago, pentag0 posted an effective user-agent blacklist (404 link removed 2013/11/13) that I’ve went ahead and incorporated into the 2013 User Agent Blacklist. Essentially this new 2013 UA blacklist is the combination of pentag0’s list, the 2010 UA Blacklist, and the latest and worst ua’s personally collected over the past few years.

Note that this blacklist should replace any similar lists to avoid redundancy and optimize performance (see more important notes below). So here it is, presented as three sets of .htaccess directives with 100% no editing required:

# 2013 UA BLACKLIST [1/3]
<IfModule mod_rewrite.c>
	RewriteCond %{HTTP_HOST} !^(127\.0\.0\.0|localhost) [NC]
	RewriteCond %{HTTP_USER_AGENT} (\<|\>|\'|\$x0E|\%0A|\%0D|\%27|\%3C|\%3E|\%00|\@\$x|\!susie|\_irc|\_works|\+select\+|\+union\+|\&lt;\?|1\,\1\,1\,|3gse|4all|4anything|5\.1\;\ xv6875\)|59\.64\.153\.|85\.17\.|88\.0\.106\.|98|a\_browser|a1\ site|abac|abach|abby|aberja|abilon|abont|abot|accept|access|accoo|accoon|aceftp|acme|active|address|adopt|adress|advisor|agent|ahead|aihit|aipbot|alarm|albert|alek|alexa\ toolbar\;\ \(r1\ 1\.5\)|alltop|alma|alot|alpha|america\ online\ browser\ 1\.1|amfi|amfibi|anal|andit|anon|ansearch|answer|answerbus|answerchase|antivirx|apollo|appie|arach|archive|arian|aboutoil|asps|aster|atari|atlocal|atom|atrax|atrop|attrib|autoh|autohot|av\ fetch|avsearch|axod|axon|baboom|baby|back|baid|bali|bandit|barry|basichttp|batch|bdfetch|beat|beaut|become|bee|beij|betabot|biglotron|bilgi|binlar|bison|bitacle|bitly|blaiz|blitz|blogl|blogscope|blogzice|bloob|blow|bord|bond|boris|bost|bot\.ara|botje|botw|bpimage|brand|brok|broth|browseabit|browsex|bruin|bsalsa|bsdseek|built|bulls|bumble|bunny|busca|busi|buy|bwh3|cafek|cafi|camel|cand|captu|casper|catch|ccbot|ccubee|cd34|ceg|cfnetwork|cgichk|cha0s|chang|chaos|char|char\(|chase\ x|check\_http|checker|checkonly|checkprivacy|chek|chill|chttpclient|cipinet|cisco|cita|citeseer|clam|claria|claw|cloak|clshttp|clush|coast|cmsworldmap|code\.com|cogent|coldfusion|coll|collect|comb|combine|commentreader|common|comodo|compan|compatible\-|conc|conduc|contact|control|contype|conv|cool|copi|copy|coral|corn|cosmos|costa|cowbot|cr4nk|craft|cralwer|crank|crap|crawler0|crazy|cres|cs\-cz|cshttp|cuill|CURI|curl|curry|custo|cute|cyber|cz3|czx|daily|dalvik|daobot|dark|darwin|data|daten|dcbot|dcs|dds\ explorer|deep|deps|detect|dex|diam|diavol|diibot|dillo|ding|disc|disp|ditto|dlc|doco|dotbot|drag|drec|dsdl|dsok|dts|duck|dumb|eag|earn|earthcom|easydl|ebin|echo|edco|egoto|elnsb5|email|emer|empas|encyclo|enfi|enhan|enterprise\_search|envolk|erck|erocr|eventax|evere|evil|ewh|exac|exploit|expre|extra|eyen|fang|fast|fastbug|faxo|fdse|feed24|feeddisc|feedfinder|feedhub|fetch|filan|fileboo|fimap|find|firebat|firedownload\/1\.2pre\ firefox\/3\.6|firefox\/0|firs|flam|flash|flexum|flicky|flip|fly|focus|fooky|forum|forv|fost|foto|foun|fount|foxy\/1\;|free|friend|frontpage|fuck|fuer|futile|fyber|gais|galbot|gbpl|gecko\/2001|gecko\/2002|gecko\/2006|gecko\/2009042316|gener|geni|geo|geona|geth|getr|getw|ggl|gira|gluc|gnome|go\!zilla|goforit|goldfire|gonzo|google\ wireless|gosearch|got\-it|gozilla|grab|graf|greg|grub|grup|gsa\-cra|gsearch|gt\:\:www|guidebot|guruji|gyps|haha|hailo|harv|hash|hatena|hax|head|helm|herit|heritrix|hgre|hippo|hloader|hmse|hmview|holm|holy|hotbar\ 4\.4\.5\.0|hpprint|href\s|httpclient|httpconnect|httplib|httrack|human|huron|hverify|hybrid|hyper|ia_archiver|iaskspi|ibm\ evv|iccra|ichiro|icopy|ics\)|ida|ie\/5\.0|ieauto|iempt|iexplore\.exe|ilium|ilse|iltrov|indexer|indy|ineturl|infonav|innerpr|inspect|insuran|intellig|interget|internet\_explorer|internet\x|intraf|ip2|ipsel|irlbot|isc\_sys|isilo|isrccrawler|isspi|jady|jaka|jam|jenn|jet|jiro|jobo|joc|jupit|just|jyx|jyxo|kash|kazo|kbee|kenjin|kernel|keywo|kfsw|kkma|kmc|know|kosmix|krae|krug|ksibot|ktxn|kum|labs|lanshan|lapo|larbin|leech|lets|lexi|lexxe|libby|libcrawl|libcurl|libfetch|libweb|light|linc|lingue|linkcheck|linklint|linkman|lint|list|litefeeds|livedoor|livejournal|liveup|lmq|loader|locu|london|lone|loop|lork|lth\_|lwp|mac\_f|magi|magp|mail\.ru|main|majest|mam|mama|mana|marketwire|masc|mass|mata|mvi|mcbot|mecha|mechanize|mediapartners|metadata|metalogger|metaspin|metauri|mete|mib\/2\.2|microsoft\.url|microsoft\_internet\_explorer|mido|miggi|miix|mindjet|mindman|miner|mips|mira|mire|miss|mist|mizz|mj12|mlbot|mlm|mnog|moge|moje|mooz|more|mouse|mozdex) [NC]
	RewriteRule .* - [G]
</IfModule>

# 2013 UA BLACKLIST [2/3]
<IfModule mod_rewrite.c>
	RewriteCond %{HTTP_USER_AGENT} (mozilla\/0|mozilla\/1|mozilla\/4\.61\ \[en\]|mozilla\/firefox|mpf|msie\ 2|msie\ 3|msie\ 4|msie\ 5|msie\ 6\.0\-|msie\ 6\.0b|msie\ 7\.0a1\;|msie\ 7\.0b\;|msie6xpv1|msiecrawler|msnbot\-media|msnbot\-products|msnptc|msproxy|msrbot|musc|mvac|mwm|my\_age|myapp|mydog|myeng|myie2|mysearch|myurl|nag|name|naver|navr|near|netants|netcach|netcrawl|netfront|netinfo|netmech|netsp|netx|netz|neural|neut|newsbreak|newsgatorinbox|newsrob|newt|next|ng\-s|ng\/2|nice|nikto|nimb|ninja|ninte|nog|noko|nomad|norb|note|npbot|nuse|nutch|nutex|nwsp|obje|ocel|octo|odi3|oegp|offby|offline|omea|omg|omhttp|onfo|onyx|openf|openssl|openu|opera\ 2|opera\ 3|opera\ 4|opera\ 5|opera\ 6|opera\ 7|orac|orbit|oreg|osis|our|outf|owl|p3p\_|page2rss|pagefet|pansci|parser|patw|pavu|pb2pb|pcbrow|pear|peer|pepe|perfect|perl|petit|phoenix\/0\.|phras|picalo|piff|pig|pingd|pipe|pirs|plag|planet|plant|platform|playstation|plesk|pluck|plukkie|poe\-com|poirot|pomp|post|postrank|powerset|preload|press|privoxy|probe|program\_shareware|protect|protocol|prowl|proxie|proxy|psbot|pubsub|puf|pulse|punit|purebot|purity|pyq|pyth|query|quest|qweer|radian|rambler|ramp|rapid|rawdog|rawgrunt|reap|reeder|refresh|reget|relevare|repo|requ|request|rese|retrieve|rip|rix|rma|roboz|rocket|rogue|rpt\-http|rsscache|ruby|ruff|rufus|rv\:0\.9\.7\)|salt|sample|sauger|savvy|sbcyds|sbider|sblog|sbp|scagent|scan|scej\_|sched|schizo|schlong|schmo|scorp|scott|scout|scrawl|screen|screenshot|script|seamonkey\/1\.5a|search17|searchbot|searchme|sega|semto|sensis|seop|seopro|sept|sezn|seznam|share|sharp|shaz|shell|shelo|sherl|shim|shopwiki|silurian|simple|simplepie|siph|sitekiosk|sitescan|sitevigil|sitex|skam|skimp|skygrid|sledink|sleip|slide|sly|smag|smurf|snag|snapbot|snapshot|snif|snip|snoop|sock|socsci|sogou|sohu|solr|some|soso|spad|span|spbot|speed|sphere|spin|sproose|spurl|sputnik|spyder|squi|sqwid|sqworm|ssm\_ag|stack|stamp|statbot|state|steel|stilo|strateg|stress|strip|style|subot|such|suck|sume|sunos\ 5\.7|sunrise|superbot|superbro|supervi|surf4me|surfbot|survey|susi|suza|suzu|sweep|swish|sygol|synapse|sync2it|systems|szukacz|tagger|tagoo|tagyu|take|talkro|tamu|tandem|tarantula|tbot|tcf|tcs\/1|teamsoft|tecomi|teesoft|teleport|telesoft|tencent|terrawiz|test|texnut|thomas|tiehttp|timebot|timely|tipp|tiscali|titan|tmcrawler|tmhtload|tocrawl|todobr|tongco|toolbar\;\ \(r1|topic|topyx|torrent|track|translate|traveler|treeview|tricus|trivia|trivial|true|tunnel|turing|turnitin|tutorgig|twat|tweak|twice|tygo|ubee|uchoo|ultraseek|unavail|unf|universal|unknown|upg1|urlbase|urllib|urly|user\-agent\:|useragent|usyd|vagabo|valet|vamp|vci|veri\~li|verif|versus|via|vikspider|virtual|visual|void|voyager|vsyn|w0000t|w3search|walhello|walker|wand|waol|watch|wavefire|wbdbot|weather|web\.ima|web2mal|webarchive|webbot|webcat|webcor|webcorp|webcrawl|webdat|webdup|webgo|webind|webis|webitpr|weblea|webmin|webmoney|webp|webql|webrobot|webster|websurf|webtre|webvac|webzip|wells|wep\_s|wget|whiz|widow|win67|windows\-rss|windows\ 2000|windows\ 3|windows\ 95|windows\ 98|windows\ ce|windows\ me|winht|winodws|wish|wizz|worio|works|world|worth|wwwc|wwwo|wwwster|xaldon|xbot|xenu|xirq|y\!tunnel|yacy|yahoo\-mmaudvid|yahooseeker|yahooysmcm|yamm|yand|yandex|yang|yoono|yori|yotta|yplus\ |ytunnel|zade|zagre|zeal|zebot|zerx|zeus|zhuaxia|zipcode|zixy|zmao|zmeu|zune) [NC]
	RewriteRule .* - [G]
</IfModule>

# 2013 UA BLACKLIST [3/3] (pentag0)
<IfModule mod_rewrite.c>
	RewriteCond %{HTTP_USER_AGENT} (black\ hole|titan|webstripper|netmechanic|cherrypicker|emailcollector|emailsiphon|webbandit|emailwolf|extractorpro|copyrightcheck|crescent|wget|sitesnagger|prowebwalker|cheesebot|teleport|teleportpro|miixpc|telesoft|website\ quester|webzip|moget/2\.1|webzip/4\.0|websauger|webcopier|netants|mister\ pix|webauto|thenomad|www-collector-e|rma|libweb/clshttp|asterias|httplib|turingos|spanner|infonavirobot|harvest/1\.5|bullseye/1\.0|mozilla/4\.0\ \(compatible;\ bullseye;\ windows\ 95\)|crescent\ internet\ toolpak\ http\ ole\ control\ v\.1\.0|cherrypickerse/1\.0|cherrypicker\ /1\.0|webbandit/3\.50|nicerspro|microsoft\ url\ control\ -\ 5\.01\.4511|dittospyder|foobot|webmasterworldforumbot|spankbot|botalot|lwp-trivial/1\.34|lwp-trivial|wget/1\.6|bunnyslippers|microsoft\ url\ control\ -\ 6\.00\.8169|urly\ warning|wget/1\.5\.3|linkwalker|cosmos|moget|hloader|humanlinks|linkextractorpro|offline\ explorer|mata\ hari|lexibot|web\ image\ collector|the\ intraformant|true_robot/1\.0|true_robot|blowfish/1\.0|jennybot|miixpc/4\.2|builtbottough|propowerbot/2\.14|backdoorbot/1\.0|tocrawl/urldispatcher|webenhancer|tighttwatbot|suzuran|vci\ webviewer\ vci\ webviewer\ win32|vci|szukacz/1\.4|queryn\ metasearch|openfind\ data\ gathere|openfind|xenu\'s\ link\ sleuth\ 1\.1c|xenu's|zeus|repomonkey\ bait\ &\ tackle/v1\.01|repomonkey|zeus\ 32297\ webster\ pro\ v2\.9\ win32|webster\ pro|erocrawler|linkscan/8\.1a\ unix|keyword\ density/0\.9|kenjin\ spider|cegbfeieh) [NC]
	RewriteRule .* - [G]
</IfModule>

View text format

To implement the UA Blacklist, simply paste into your site’s root .htaccess file (or even better, the Apache configuration file). Upload, test, and stay current with updates and news.

Details

Some notes about differences between 2013 and 2010 versions:

  • After much deliberation and feedback, libwww is removed from the 2013 UA Blacklist.
  • To accommodate Facebook (and others) traffic, the empty/blank user-agent is no longer blocked, which is unfortunate because of its effectiveness at blocking bad requests.
  • Plus the usual optimizing via regex consolidation and syntax improvements.

Bolierplate

The UA Blacklist uses hundreds of regular expressions to block bad bots based on their user-agent. Each of these regular expressions can match many different user-agents. Care has been taken to ensure that only bad bots are blocked, but false positives are inevitable. If you know of a user-agent that should be removed from the list, please let me know. I will do my best to update things asap.

Bottom line: Only use this code if you know what you are doing. It’s not a “fix-it-and-forget” situation, especially for production sites. It’s more like a “fix-it-and-keep-an-eye-on-it” kind of thing, meant for those who understand how it works. As mentioned in the comments, the User Agent Blacklist is a work in progress. As with any code, please use with caution and at your own risk. By obtaining this freely offered code, you assume all responsibility for its use.</boilerplate>

Stay Current

As with IP blacklists, user-agent blacklists are meant to be updated in order to stay effective. But with user-agents, rather than replacing the entire list, it’s more a matter of updating it by adding new strings or removing any that have fallen by the wayside. In general, a good user-agent blacklist will remain beneficial for years, but it’s efficiency will inevitably decrease unless updated to stay current.

More Security Tools

For those new to Perishable Press, please check out some of my other security resources:

Security is an important part of what I do around here, so please chime in with any suggestions, ideas, and comments. Thank you for visiting Perishable Press.

Shouts out

Thank you to the following people for the input, feedback, and help in improving the 2013 UA Blacklist:

  • Jeremy Fairbrass
  • belline (404 link removed 2014/05/30)
  • Muskie
  • Alexander
  • GD

Bonus: Mini UA Blacklist

Here is an alternate “mini” version of the UA Blacklist:

# 2013 MINI UA BLACKLIST
<IfModule mod_setenvif.c>
	SetEnvIfNoCase User-Agent (\<|\>|\'|\$x0|\%0A|\%0D|\%27|\%3C|\%3E|\%00|\+select|\+union|\&lt) keep_out
	SetEnvIfNoCase User-Agent (binlar|casper|checkprivacy|cmsworldmap|comodo|curious|diavol|doco) keep_out
	SetEnvIfNoCase User-Agent (dotbot|feedfinder|flicky|ia_archiver|jakarta|kmccrew|libwww|nutch) keep_out
	SetEnvIfNoCase User-Agent (planetwork|purebot|pycurl|skygrid|sucker|turnit|vikspid|zmeu|zune) keep_out
	<limit GET POST PUT>
		Order Allow,Deny
		Allow from all
		Deny from env=keep_out
		# Deny from all
		# Deny from 111.222.333
	</limit>
</IfModule>

Note: this should be used separately as a lightweight alternative to the full UA Blacklist. It also serves as an example of an alternate syntax/method that may be used for blocking user agents on Apache. If mod_rewrite isn’t available on your server, try this method instead (replace the (user|agent|patterns) in the mini list with the (user|agent|patterns) from the full list.