Spring Sale! Save 30% on all books w/ code: PLANET24
Web Dev + WordPress + Security

Ultimate .htaccess Blacklist 2: Compressed Version

[ Image: Lunar Eclipse ] In our original htaccess blacklist article, we provide an extensive list of bad user agents. This so-called “Ultimate htaccess Blacklist” works great at blocking many different online villains: spammers, scammers, scrapers, scrappers, rippers, leechers — you name it. Yet, despite its usefulness, there is always room for improvement.

For example, as reader Greg suggests, a compressed version of the blacklist would be very useful. In this post, we present a compressed version of our Ultimate htaccess Blacklist that features around 50 new agents. Whereas the original blacklist is approximately 8.6KB in size, the compressed version is only 3.4KB, even with the additional agents. Overall, the compressed version requires fewer system resources to block a greater number of bad agents.

# Ultimate htaccess Blacklist 2 from Perishable Press
# Deny domain access to spammers and other scumbags
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ADSARobot|ah-ha|almaden|aktuelles|Anarchie|amzn_assoc|ASPSeek|ASSORT|ATHENS|Atomz|attach|attache|autoemailspider|BackWeb|Bandit|BatchFTP|bdfetch|big.brother|BlackWidow|bmclient|Boston\ Project|BravoBrian\ SpiderEngine\ MarcoPolo|Bot\ mailto:craftbot@yahoo.com|Buddy|Bullseye|bumblebee|capture|CherryPicker|ChinaClaw|CICC|clipping|Collector|Copier|Crescent|Crescent\ Internet\ ToolPak|Custo|cyberalert|DA$|Deweb|diagem|Digger|Digimarc|DIIbot|DISCo|DISCo\ Pump|DISCoFinder|Download\ Demon|Download\ Wonder|Downloader|Drip|DSurf15a|DTS.Agent|EasyDL|eCatch|ecollector|efp@gmx\.net|Email\ Extractor|EirGrabber|email|EmailCollector|EmailSiphon|EmailWolf|Express\ WebPictures|ExtractorPro|EyeNetIE|FavOrg|fastlwspider|Favorites\ Sweeper|Fetch|FEZhead|FileHound|FlashGet\ WebWasher|FlickBot|fluffy|FrontPage|GalaxyBot|Generic|Getleft|GetRight|GetSmart|GetWeb!|GetWebPage|gigabaz|Girafabot|Go\!Zilla|Go!Zilla|Go-Ahead-Got-It|GornKer|gotit|Grabber|GrabNet|Grafula|Green\ Research|grub-client|Harvest|hhjhj@yahoo|hloader|HMView|HomePageSearch|http\ generic|HTTrack|httpdown|httrack|ia_archiver|IBM_Planetwide|Image\ Stripper|Image\ Sucker|imagefetch|IncyWincy|Indy*Library|Indy\ Library|informant|Ingelin|InterGET|Internet\ Ninja|InternetLinkagent|Internet\ Ninja|InternetSeer\.com|Iria|Irvine|JBH*agent|JetCar|JOC|JOC\ Web\ Spider|JustView|KWebGet|Lachesis|larbin|LeechFTP|LexiBot|lftp|libwww|likse|Link|Link*Sleuth|LINKS\ ARoMATIZED|LinkWalker|LWP|lwp-trivial|Mag-Net|Magnet|Mac\ Finder|Mag-Net|Mass\ Downloader|MCspider|Memo|Microsoft.URL|MIDown\ tool|Mirror|Missigua\ Locator|Mister\ PiX|MMMtoCrawl\/UrlDispatcherLLL|^Mozilla$|Mozilla.*Indy|Mozilla.*NEWT|Mozilla*MSIECrawler|MS\ FrontPage*|MSFrontPage|MSIECrawler|MSProxy|multithreaddb|nationaldirectory|Navroad|NearSite|NetAnts|NetCarta|NetMechanic|netprospector|NetResearchServer|NetSpider|Net\ Vampire|NetZIP|NetZip\ Downloader|NetZippy|NEWT|NICErsPRO|Ninja|NPBot|Octopus|Offline\ Explorer|Offline\ Navigator|OpaL|Openfind|OpenTextSiteCrawler|OrangeBot|PageGrabber|Papa\ Foto|PackRat|pavuk|pcBrowser|PersonaPilot|Ping|PingALink|Pockey|Proxy|psbot|PSurf|puf|Pump|PushSite|QRVA|RealDownload|Reaper|Recorder|ReGet|replacer|RepoMonkey|Robozilla|Rover|RPT-HTTPClient|Rsync|Scooter|SearchExpress|searchhippo|searchterms\.it|Second\ Street\ Research|Seeker|Shai|Siphon|sitecheck|sitecheck.internetseer.com|SiteSnagger|SlySearch|SmartDownload|snagger|Snake|SpaceBison|Spegla|SpiderBot|sproose|SqWorm|Stripper|Sucker|SuperBot|SuperHTTP|Surfbot|SurfWalker|Szukacz|tAkeOut|tarspider|Teleport\ Pro|Templeton|TrueRobot|TV33_Mercator|UIowaCrawler|UtilMind|URLSpiderPro|URL_Spider_Pro|Vacuum|vagabondo|vayala|visibilitygap|VoidEYE|vspider|Web\ Downloader|w3mir|Web\ Data\ Extractor|Web\ Image\ Collector|Web\ Sucker|Wweb|WebAuto|WebBandit|web\.by\.mail|Webclipping|webcollage|webcollector|WebCopier|webcraft@bea|webdevil|webdownloader|Webdup|WebEMailExtrac|WebFetch|WebGo\ IS|WebHook|Webinator|WebLeacher|WEBMASTERS|WebMiner|WebMirror|webmole|WebReaper|WebSauger|Website|Website\ eXtractor|Website\ Quester|WebSnake|Webster|WebStripper|websucker|webvac|webwalk|webweasel|WebWhacker|WebZIP|Wget|Whacker|whizbang|WhosTalking|Widow|WISEbot|WWWOFFLE|x-Tractor|^Xaldon\ WebSpider|WUMPUS|Xenu|XGET|Zeus.*Webster|Zeus [NC]
RewriteRule ^.* - [F,L]

For more information, please see our original htaccess blacklist article, the Ultimate htaccess Blacklist. And you also may be interested checking out the new and improved 6G Firewall.

Update: 2008/04/30

The blacklist has been edited to remove the DA character string. This is to prevent blocking of certain validation services such as those provided via the W3C. Thanks to John S. Britsios for identifying and sharing this information. :)

Update: 2008/05/04

The blacklist has been edited to (re)include the DA$ character string. Previously, the DA string matched various validation services because of the “da” string found in the terms “validator”, “validation”, etc. As reader Max explains, we can avoid this problem by appending a $ onto DA. Thus the blacklist has been edited to include the DA$ character string, which protects against the DA bot while allowing us to use various validation services. Thanks Max! ;)

About the Author
Jeff Starr = Web Developer. Book Author. Secretly Important.
The Tao of WordPress: Master the art of WordPress.

61 responses to “Ultimate .htaccess Blacklist 2: Compressed Version”

  1. Perishable 2008/03/15 5:54 pm

    Thanks for the tip, Max. I will definitely include this modification in the next iteration of the blacklist. Cheers! ;)

  2. Hi!
    Add “anonymouse” to the http_user_agent list

  3. Perishable 2008/04/21 7:51 am

    Sure, I am more than willing to add it the next incarnation, provided some sort of explanation as to why “anonymouse” (or any other user agent for that matter) deserves to be blacklisted. Drop a link and we’ll go from there.. ;)

  4. anonymouse is a proxy site loaded with scumbags. I had added it to my list awhile ago along with SurveyBot|Nikto|MEGAUPLOAD|anonymouse|Java/1.0|CMS Spider

  5. Perishable 2008/04/23 8:33 am

    Yes, I have also read elsewhere that blocking anonymouse is a wise move. I will add the directive to the next incarnation of the blacklist. Thanks for helping to improve the list! :)

  6. John S. Britsios 2008/04/29 12:58 pm

    I appreciate very much your great work, and I read your tutorial with great interest. I only have a experienced one problem that I could figure out how to solve it.

    When I use your blocking list, I cannot use the W3C HTML and CSS validator for validating my site.

    Can you tell which user agent is in the list that blocks the above validators?

    Thanks you very much in advance.

  7. Perishable 2008/04/29 2:14 pm

    Hi John, I’m glad you enjoy the article. As for the blacklist, are you sure that it is the cause of the problem? It may very well be, however, many people use the list without any similar issues. Nonetheless, if you have determined that the source of the conflict involves the blacklist rules, a bit of further testing should help you target the problematic user agents. First, remove the first half of the user agents and test if things are working. If so, replace half of the removed agents and test again. Likewise, if the issue did not resolve after removing the first half, repeat the process with the second half. By repeating this process a few times (it shouldn’t take too long), you should be able to identify the conflicting agents. I hope that makes sense. I would do this exercise myself, but I am literally swamped with a million other tasks and chores that I have to get done. In any case, I hope this information is useful, as I definitely want to help you implement successfully the ultimate htaccess blacklist. Finally, if you do find problematic user agent(s), please drop a comment and share with the community here at Perishable Press.

    Cheers,
    Jeff

  8. John S. Britsios 2008/04/30 11:57 am

    Thanks Jeff for the reply.
    I found out which was blocking W3C Validator.

    It was: DA

    I just thought of sharing.

    Thanks again for the great work.

    Cheers,

    John

  9. Perishable 2008/04/30 1:32 pm

    Hi John, thank you for taking the time to share this information. I have removed the DA character string from the list and updated the article so that others may benefit from your work (and generosity). Thank you for helping to improve the quality of the blacklist! :)

    Kind regards,
    Jeff

  10. John; you’re right – DA does block WWW validators – see my post #24 above!

  11. Perishable 2008/05/01 5:28 am

    Doh! I completely forgot about Max’s comment concerning the DA string! Apparently age is directly proportional to forgetfulness (I should have scrolled thru the comments before responding) — would’ve saved everyone some time. Max’s solution also enables us to continue blocking the DA agent while allowing access to the various validation services. Once I return to the computer (this weekend), I will update the article and the blacklist with this information. Huge apologies to Max — thanks for the gentle reminder ;)

  12. Perishable 2008/05/04 8:30 am

    The blacklist and article have been updated to (re)include the DA$ character string. Many thanks to Max for pointing this out. Cheers!

Comments are closed for this post. Something to add? Let me know.
Welcome
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »
Digging Into WordPress: Take your WordPress skills to the next level.
Thoughts
I live right next door to the absolute loudest car in town. And the owner loves to drive it.
8G Firewall now out of beta testing, ready for use on production sites.
It's all about that ad revenue baby.
Note to self: encrypting 500 GB of data on my iMac takes around 8 hours.
Getting back into things after a bit of a break. Currently 7° F outside. Chillz.
2024 is going to make 2020 look like a vacation. Prepare accordingly.
First snow of the year :)
Newsletter
Get news, updates, deals & tips via email.
Email kept private. Easy unsubscribe anytime.