Ultimate htaccess Blacklist 2 (Compressed Version)
Published Monday, October 15, 2007 @ 11:14 am • 62 Responses
In our original htaccess blacklist article, we provide an extensive list of bad user agents. This so-called “Ultimate htaccess Blacklist” works great at blocking many different online villains: spammers, scammers, scrapers, scrappers, rippers, leechers — you name it. Yet, despite its usefulness, there is always room for improvement. For example, as reader Greg suggests, a compressed version of the blacklist would be very useful. In this post, we present a compressed version of our Ultimate htaccess Blacklist that features around 50 new agents. Whereas the original blacklist is approximately 8.6KB in size, the compressed version is only 3.4KB, even with the additional agents. Overall, the compressed version requires fewer system resources to block a greater number of bad agents.
# Ultimate htaccess Blacklist 2 from Perishable Press
# Deny domain access to spammers and other scumbags
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ADSARobot|ah-ha|almaden|aktuelles|Anarchie|amzn_assoc|ASPSeek|ASSORT|ATHENS|Atomz|attach|attache|autoemailspider|BackWeb|Bandit|BatchFTP|bdfetch|big.brother|BlackWidow|bmclient|Boston\ Project|BravoBrian\ SpiderEngine\ MarcoPolo|Bot\ mailto:craftbot@yahoo.com|Buddy|Bullseye|bumblebee|capture|CherryPicker|ChinaClaw|CICC|clipping|Collector|Copier|Crescent|Crescent\ Internet\ ToolPak|Custo|cyberalert|DA$|Deweb|diagem|Digger|Digimarc|DIIbot|DISCo|DISCo\ Pump|DISCoFinder|Download\ Demon|Download\ Wonder|Downloader|Drip|DSurf15a|DTS.Agent|EasyDL|eCatch|ecollector|efp@gmx\.net|Email\ Extractor|EirGrabber|email|EmailCollector|EmailSiphon|EmailWolf|Express\ WebPictures|ExtractorPro|EyeNetIE|FavOrg|fastlwspider|Favorites\ Sweeper|Fetch|FEZhead|FileHound|FlashGet\ WebWasher|FlickBot|fluffy|FrontPage|GalaxyBot|Generic|Getleft|GetRight|GetSmart|GetWeb!|GetWebPage|gigabaz|Girafabot|Go\!Zilla|Go!Zilla|Go-Ahead-Got-It|GornKer|gotit|Grabber|GrabNet|Grafula|Green\ Research|grub-client|Harvest|hhjhj@yahoo|hloader|HMView|HomePageSearch|http\ generic|HTTrack|httpdown|httrack|ia_archiver|IBM_Planetwide|Image\ Stripper|Image\ Sucker|imagefetch|IncyWincy|Indy*Library|Indy\ Library|informant|Ingelin|InterGET|Internet\ Ninja|InternetLinkagent|Internet\ Ninja|InternetSeer\.com|Iria|Irvine|JBH*agent|JetCar|JOC|JOC\ Web\ Spider|JustView|KWebGet|Lachesis|larbin|LeechFTP|LexiBot|lftp|libwww|likse|Link|Link*Sleuth|LINKS\ ARoMATIZED|LinkWalker|LWP|lwp-trivial|Mag-Net|Magnet|Mac\ Finder|Mag-Net|Mass\ Downloader|MCspider|Memo|Microsoft.URL|MIDown\ tool|Mirror|Missigua\ Locator|Mister\ PiX|MMMtoCrawl\/UrlDispatcherLLL|^Mozilla$|Mozilla.*Indy|Mozilla.*NEWT|Mozilla*MSIECrawler|MS\ FrontPage*|MSFrontPage|MSIECrawler|MSProxy|multithreaddb|nationaldirectory|Navroad|NearSite|NetAnts|NetCarta|NetMechanic|netprospector|NetResearchServer|NetSpider|Net\ Vampire|NetZIP|NetZip\ Downloader|NetZippy|NEWT|NICErsPRO|Ninja|NPBot|Octopus|Offline\ Explorer|Offline\ Navigator|OpaL|Openfind|OpenTextSiteCrawler|OrangeBot|PageGrabber|Papa\ Foto|PackRat|pavuk|pcBrowser|PersonaPilot|Ping|PingALink|Pockey|Proxy|psbot|PSurf|puf|Pump|PushSite|QRVA|RealDownload|Reaper|Recorder|ReGet|replacer|RepoMonkey|Robozilla|Rover|RPT-HTTPClient|Rsync|Scooter|SearchExpress|searchhippo|searchterms\.it|Second\ Street\ Research|Seeker|Shai|Siphon|sitecheck|sitecheck.internetseer.com|SiteSnagger|SlySearch|SmartDownload|snagger|Snake|SpaceBison|Spegla|SpiderBot|sproose|SqWorm|Stripper|Sucker|SuperBot|SuperHTTP|Surfbot|SurfWalker|Szukacz|tAkeOut|tarspider|Teleport\ Pro|Templeton|TrueRobot|TV33_Mercator|UIowaCrawler|UtilMind|URLSpiderPro|URL_Spider_Pro|Vacuum|vagabondo|vayala|visibilitygap|VoidEYE|vspider|Web\ Downloader|w3mir|Web\ Data\ Extractor|Web\ Image\ Collector|Web\ Sucker|Wweb|WebAuto|WebBandit|web\.by\.mail|Webclipping|webcollage|webcollector|WebCopier|webcraft@bea|webdevil|webdownloader|Webdup|WebEMailExtrac|WebFetch|WebGo\ IS|WebHook|Webinator|WebLeacher|WEBMASTERS|WebMiner|WebMirror|webmole|WebReaper|WebSauger|Website|Website\ eXtractor|Website\ Quester|WebSnake|Webster|WebStripper|websucker|webvac|webwalk|webweasel|WebWhacker|WebZIP|Wget|Whacker|whizbang|WhosTalking|Widow|WISEbot|WWWOFFLE|x-Tractor|^Xaldon\ WebSpider|WUMPUS|Xenu|XGET|Zeus.*Webster|Zeus [NC]
RewriteRule ^.* - [F,L]
For more information, please see our original htaccess blacklist article, the Ultimate htaccess Blacklist.
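To illustrate where the size savings come from, here is a minimal sketch using three agents taken from the list. It assumes the original blacklist used one condition per agent, chained with [OR]; that layout is an assumption for illustration, not a quote from the original article. Note that the two styles are not strictly equivalent: the expanded conditions shown here are anchored and case-sensitive, while the compressed pattern matches anywhere in the user agent and ignores case.

```apache
RewriteEngine On

# Expanded style (assumed layout): one condition per agent, joined with [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP
RewriteRule ^.* - [F,L]

# Compressed style: one condition, agents separated by the regex "|" operator,
# with [NC] making the whole pattern case-insensitive
RewriteCond %{HTTP_USER_AGENT} BlackWidow|HTTrack|WebZIP [NC]
RewriteRule ^.* - [F,L]
```

In practice you would use one style or the other, not both; they appear together here only for comparison.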
Update: (April 30th, 2008) the blacklist has been edited to remove the DA character string. This is to prevent blocking of certain validation services such as those provided via the W3C. Thanks to John S. Britsios for identifying and sharing this information. :)
Update: (May 4th, 2008) the blacklist has been edited to (re)include the DA$ character string. Previously, the DA string matched various validation services because of the “da” string found in the terms “validator”, “validation”, etc. As reader Max explains, we can avoid this problem by appending a $ onto DA. Thus the blacklist has been edited to include the DA$ character string, which protects against the DA bot while allowing us to use various validation services. Thanks Max! ;)
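The difference between the two strings can be sketched with a shortened pattern. Only DA and Wget are taken from the blacklist; the two-agent pattern itself is just for illustration. Because the condition is unanchored and uses [NC], a bare DA matches the "da" inside a user agent such as "Jigsaw/2.2.5 W3C_CSS_Validator_JFouffa/2.0", whereas DA$ matches only when the user agent ends in those two letters.

```apache
RewriteEngine On

# Too broad: "DA" matches anywhere, in any letter case, so any user agent
# containing "Validator" would be blocked (left commented out on purpose)
# RewriteCond %{HTTP_USER_AGENT} DA|Wget [NC]

# Narrower: "$" anchors DA to the end of the user-agent string.
# The anchor applies only to DA, not to the other alternatives.
RewriteCond %{HTTP_USER_AGENT} DA$|Wget [NC]
RewriteRule ^.* - [F,L]
```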
Related articles
- Ultimate htaccess Blacklist
- Series Summary: Building the 3G Blacklist
- 4G Series: The Ultimate User-Agent Blacklist, Featuring Over 1200 Bad Bots
- 4G Series: The Ultimate Referrer Blacklist, Featuring Over 8000 Banned Referrers
- 2G Blacklist: Closing the Door on Malicious Attacks
- Building the 3G Blacklist, Part 3: Improving Site Security by Selectively Blocking Rogue User Agents
- Building the 3G Blacklist, Part 4: Improving the RedirectMatch Directives of the Original 2G Blacklist
Dialogue
62 Responses
November 9, 2007 at 11:17 am
How does this blacklist affect Google crawling your site? Are there no side effects?
December 11, 2007 at 7:00 pm
Excellent! This is much better than the first list you made. I didn’t know that the user-agent names can all be on one line.
This may be off-topic, but I have a question. Can I use the same technique (one line) to block IP addresses using the “deny from ip” code?
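As a rough sketch of the one-line approach asked about here: the Apache 2.2-style Deny directive accepts several space-separated addresses on a single line. The addresses below are reserved documentation addresses used as placeholders, not real offenders. On Apache 2.4 these directives are provided by mod_access_compat, and the newer Require syntax is generally preferred.

```apache
# Allow everyone by default, then deny the listed addresses
Order Allow,Deny
Allow from all

# Several addresses on one line, separated by spaces (placeholder IPs)
Deny from 192.0.2.10 198.51.100.25 203.0.113.7
```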
January 21, 2008 at 11:09 pm
Hei.
Thanks for this list.
I have my own .htaccess file that is included in my PHP-Nuke site, and I don’t want to mess it up, so I’m not sure what this is:
RewriteBase / ?
And where should I put the RewriteEngine On? At the top of my .htaccess, or below some RewriteCond and RewriteRule?
It is very important that I don’t mess up my .htaccess, hehe
Thanks for the very good homepage you’ve got.
Best Regards
bjarbj78
January 23, 2008 at 9:57 am
Thank you very much Jeff :)
Do you know how to block proxy servers? I made a big list of about 9139 proxy domains, but it didn’t work when I used:
deny from proxydomain.com proxydomain2.com and so on.
Regards
bjarbj78
January 24, 2008 at 6:14 am
Hi Jeff.
Thank you very much. It works :)
lol, and I spent almost 2 - 3 hours completing my list, hehe
Regards
Bjørn
January 24, 2008 at 6:32 am
Hi Jeff.
I found these lines too:
rewritecond %{HTTP:Forwarded-For} !^$ [OR]
rewritecond %{HTTP:X-Forwarded} !^$ [OR]
Can I use those lines too?
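For context, the %{HTTP:header} form reads an arbitrary request header, and each of the quoted lines ends in [OR], so they need a final condition without [OR] and a rule to complete the set. A minimal sketch follows; the choice of Via and X-Forwarded-For as additional headers is an assumption. Be aware that a rule like this also blocks legitimate visitors behind corporate proxies and similar gateways, and that many proxies send none of these headers.

```apache
RewriteEngine On

# Block the request if any of these proxy-related headers is present (non-empty)
RewriteCond %{HTTP:Via} !^$ [OR]
RewriteCond %{HTTP:Forwarded-For} !^$ [OR]
RewriteCond %{HTTP:X-Forwarded} !^$ [OR]
# The last condition in the chain carries no [OR]
RewriteCond %{HTTP:X-Forwarded-For} !^$
RewriteRule ^.* - [F,L]
```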
January 25, 2008 at 11:10 am
Hi again Jeff :)
I have a question: can I use a wildcard to block an IP, e.g.:
deny from 74.54.143.*
or
deny from 74.54.*.*
The above IP ranges have been scraping some of my blog content and I’m getting tired of blocking them one by one.
I’ve Googled for an answer but can’t seem to find any hint on wildcards.
Hope you can help me ;)
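The Deny directive does not use asterisk wildcards, but the same ranges can be expressed with a partial address or with CIDR notation. The following is a sketch for the two ranges mentioned in the question, in Apache 2.2-style syntax:

```apache
Order Allow,Deny
Allow from all

# Partial address: every host in 74.54.143.x
Deny from 74.54.143

# CIDR notation: every host in 74.54.x.x
Deny from 74.54.0.0/16
```

The second line already covers the first range, so in practice only one of the two would be needed. Blocking a /16 denies 65,536 addresses, so it is worth checking who else shares the range first.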
January 27, 2008 at 9:38 am
Thanks Jeff :) You’re a life saver!!
January 30, 2008 at 4:33 am
It would be cool if you could provide
just the last 2 lines in a downloadable file,
that you would update and others could download periodically to
construct their .htaccess files from
cheers,
Pádraig.
January 30, 2008 at 4:39 am
Hey you have httrack and ia_archiver in there?
March 5, 2008 at 5:50 am
I’m testing this list and I’m having a problem with RewriteBase /
I’d prefer to use this list in my httpd.conf, but RewriteBase / is throwing an error: “only valid in per-directory config files”. Is it required for this list to work?
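RewriteBase is only meaningful in per-directory context (.htaccess files and Directory sections), so in server or virtual-host context it can simply be left out. Here is a sketch of the same technique inside a virtual host; the host name and document root are hypothetical, and the agent list is shortened to three entries from the blacklist:

```apache
<VirtualHost *:80>
    # Hypothetical site settings
    ServerName example.com
    DocumentRoot /var/www/example

    RewriteEngine On
    # No RewriteBase here: it is not valid outside per-directory context
    RewriteCond %{HTTP_USER_AGENT} BlackWidow|HTTrack|WebZIP [NC]
    RewriteRule ^.* - [F,L]
</VirtualHost>
```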
March 5, 2008 at 12:00 pm
:) COOL!!
Thanks
March 15, 2008 at 12:34 pm
After including this list in my file, I noticed that the W3 HTML and CSS validator would no longer work on my site, presumably because agent “Jigsaw/2.2.5 W3C_CSS_Validator_JFouffa/2.0” etc. has “DA” in the title. This can be resolved by changing agent “DA” to “DA$”.
April 21, 2008 at 2:04 am
Hi!
Add “anonymouse” to the http_user_agent list
April 23, 2008 at 7:58 am
anonymouse is a proxy site loaded with scumbags. I had added it to my list awhile ago along with SurveyBot|Nikto|MEGAUPLOAD|anonymouse|Java/1.0|CMS\ Spider
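One way to add extra agents such as these without editing the long line is a second condition joined with [OR]. This is only a sketch: the first condition stands in for the full blacklist, shortened here to three entries. Spaces are escaped with a backslash, as in the main list, and the dot in Java/1.0 is escaped so that it matches a literal dot.

```apache
RewriteEngine On

# Stand-in for the main blacklist line; note the added OR flag
RewriteCond %{HTTP_USER_AGENT} BlackWidow|HTTrack|Wget [NC,OR]

# Extra agents suggested in this comment
RewriteCond %{HTTP_USER_AGENT} SurveyBot|Nikto|MEGAUPLOAD|anonymouse|Java/1\.0|CMS\ Spider [NC]
RewriteRule ^.* - [F,L]
```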
April 29, 2008 at 12:58 pm
I very much appreciate your great work, and I read your tutorial with great interest. I have experienced only one problem that I could not figure out how to solve.
When I use your blocking list, I cannot use the W3C HTML and CSS validator for validating my site.
Also I cannot use the http://www.htmlhelp.com/cgi-bin/validate.cgi validator either.
Can you tell which user agent is in the list that blocks the above validators?
Thank you very much in advance.
April 30, 2008 at 11:57 am
Thanks Jeff for the reply.
I found out which was blocking W3C Validator.
It was: DA
I just thought of sharing.
Thanks again for the great work.
Cheers,
John
May 1, 2008 at 2:51 am
John; you’re right - DA does block WWW validators - see my post #24 above!
May 12, 2008 at 11:26 am
Thanks for the great information. Much to learn about this .htaccess stuff!
June 10, 2008 at 5:31 pm
This is off topic but I want to know how to block leeching programs that download whole sites or all images according to sizes. They kill tons of bandwidth. How can I block them?
thanks
June 23, 2008 at 12:34 am
Hey thanks for the great reminder. I of all people should be reviewing my logs regularly just to make sure there aren’t any such hackers trying to get into the sensitive “admin” areas of my e-commerce site! Just imagine the danger that lurks around when we are not being watchful of this kind of stuff!
June 25, 2008 at 11:08 am
Great list. I’ve added it to one of my sites.
I’m also thinking of adding the following but would like your opinion on whether or not it’s worthwhile
as documented here by andrew
http://www.andrewjmorris.com/site-hijacking-part-2.htm#comments
June 30, 2008 at 3:17 pm
Hi, I’m trying to block all web proxies from seeing my web page, but it’s impossible. It does not work for me.
I updated my .htaccess file with the code in this article, and I can still access my home page perfectly using hidemyass.com
Any idea?
I need to block this because I recently banned abusive users from my home page, and they are using web proxies to spam my forum and register a lot of troll accounts.
July 9, 2008 at 2:16 pm
@Perishable: Web proxies are very hard, almost impossible, to block. The only thing I could come up with is checking the referer (if it is not blocked) and then making a script to visit it in the future and see if it looks like a proxy script, and if so, block that referer. If a proxy blocks the referer from being passed (as most do), then it’s tough luck. Actually, I just remembered you can probably grab the URL location via JavaScript and compare it to your domains. That might actually be the best solution for web proxies.
July 23, 2008 at 2:49 pm
Thanks for updating the list to include validators. I was wondering why w3c was having issues.
January 13, 2009 at 11:01 am
when i google my site, it gets re-directed to a proxy site tshake.com
google has indexed this page, for the past 2 weeks.
any tips on how to fix this
January 13, 2009 at 11:37 am
sorry jeff, i thought this was on topic.
my apologies.
March 11, 2009 at 11:16 am
This doesn’t seem to help me, I may even have the wrong idea… but could I use version 1 and version 2 at the same time?
I seem to have a very intelligent sort of Spam system attacking my site, almost the human type.
Cheers
March 11, 2009 at 12:15 pm
@Jeff Starr: Thank you.
Cheers
March 11, 2009 at 10:00 pm
I found a testing system, YIPPEEE!
http://www.botsvsbrowsers.com/SimulateUserAgent.asp
This is neat! Enter any agent that exists in the “Ultimate htaccess Blacklist 1 OR 2” and you get your “Test Page” instead. I kept searching, believing there was one.
Cheers
March 16, 2009 at 12:31 am
@Jeff Starr: Anytime…
I am still a bit lost as to the “/”, “\”, “./” or “.\” (without quotes)
Some of the pages I read have it with “/” or “./” others have it with “\” or “.\”
March 16, 2009 at 10:06 am
Sorry, it’s this one;
Block Spam by Denying Access to No-Referrer Requests
I can’t figure out if the folder is needed and which slash to use?
RewriteCond %{REQUEST_URI} .folder/wp-comments-post\.php*
OR
RewriteCond %{REQUEST_URI} .folder\wp-comments-post\.php*
Cheers
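On the slash question: URL paths always use forward slashes, while a backslash inside a pattern is the regex escape character, so folder\wp would be read as "folder" followed by the \w character class instead of a path separator. Also, php* means "ph" followed by any number of "p" characters, which is probably not what was intended. Below is a sketch of a corrected condition in a complete rule set. It assumes WordPress lives in a hypothetical /folder/ directory, and it is not the code from the referenced article.

```apache
RewriteEngine On

# Only consider POST requests to the comment handler
RewriteCond %{REQUEST_METHOD} POST
# Forward slashes for the path; backslash only to escape the literal dot
RewriteCond %{REQUEST_URI} /folder/wp-comments-post\.php$
# Deny when no referrer was sent
RewriteCond %{HTTP_REFERER} ^$
RewriteRule ^.* - [F,L]
```

If WordPress is installed in the site root, the /folder part of the pattern would simply be dropped.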
March 16, 2009 at 10:18 am
Thanks Jeff,
I am still very new to .htaccess, the little things make a huge difference.
I appreciate your site and your help.
Cheers
1 • Simeon_seo
October 16, 2007 at 9:26 am
Why do you block “Yandex”? It is the spider of the most popular Russian search engine.