Latest TweetsGreat post about the latest power grab: www.eff.org/deeplinks/2018/09/…
Perishable Press

Ultimate .htaccess Blacklist 2: Compressed Version

[ Image: Lunar Eclipse ] In our original htaccess blacklist article, we provide an extensive list of bad user agents. This so-called “Ultimate htaccess Blacklist” works great at blocking many different online villains: spammers, scammers, scrapers, scrappers, rippers, leechers — you name it. Yet, despite its usefulness, there is always room for improvement.

For example, as reader Greg suggests, a compressed version of the blacklist would be very useful. In this post, we present a compressed version of our Ultimate htaccess Blacklist that features around 50 new agents. Whereas the original blacklist is approximately 8.6KB in size, the compressed version is only 3.4KB, even with the additional agents. Overall, the compressed version requires fewer system resources to block a greater number of bad agents.

# Ultimate htaccess Blacklist 2 from Perishable Press
# Deny domain access to spammers and other scumbags
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ADSARobot|ah-ha|almaden|aktuelles|Anarchie|amzn_assoc|ASPSeek|ASSORT|ATHENS|Atomz|attach|attache|autoemailspider|BackWeb|Bandit|BatchFTP|bdfetch|big.brother|BlackWidow|bmclient|Boston\ Project|BravoBrian\ SpiderEngine\ MarcoPolo|Bot\ mailto:craftbot@yahoo.com|Buddy|Bullseye|bumblebee|capture|CherryPicker|ChinaClaw|CICC|clipping|Collector|Copier|Crescent|Crescent\ Internet\ ToolPak|Custo|cyberalert|DA$|Deweb|diagem|Digger|Digimarc|DIIbot|DISCo|DISCo\ Pump|DISCoFinder|Download\ Demon|Download\ Wonder|Downloader|Drip|DSurf15a|DTS.Agent|EasyDL|eCatch|ecollector|efp@gmx\.net|Email\ Extractor|EirGrabber|email|EmailCollector|EmailSiphon|EmailWolf|Express\ WebPictures|ExtractorPro|EyeNetIE|FavOrg|fastlwspider|Favorites\ Sweeper|Fetch|FEZhead|FileHound|FlashGet\ WebWasher|FlickBot|fluffy|FrontPage|GalaxyBot|Generic|Getleft|GetRight|GetSmart|GetWeb!|GetWebPage|gigabaz|Girafabot|Go\!Zilla|Go!Zilla|Go-Ahead-Got-It|GornKer|gotit|Grabber|GrabNet|Grafula|Green\ Research|grub-client|Harvest|hhjhj@yahoo|hloader|HMView|HomePageSearch|http\ generic|HTTrack|httpdown|httrack|ia_archiver|IBM_Planetwide|Image\ Stripper|Image\ Sucker|imagefetch|IncyWincy|Indy*Library|Indy\ Library|informant|Ingelin|InterGET|Internet\ Ninja|InternetLinkagent|Internet\ Ninja|InternetSeer\.com|Iria|Irvine|JBH*agent|JetCar|JOC|JOC\ Web\ Spider|JustView|KWebGet|Lachesis|larbin|LeechFTP|LexiBot|lftp|libwww|likse|Link|Link*Sleuth|LINKS\ ARoMATIZED|LinkWalker|LWP|lwp-trivial|Mag-Net|Magnet|Mac\ Finder|Mag-Net|Mass\ Downloader|MCspider|Memo|Microsoft.URL|MIDown\ tool|Mirror|Missigua\ Locator|Mister\ PiX|MMMtoCrawl\/UrlDispatcherLLL|^Mozilla$|Mozilla.*Indy|Mozilla.*NEWT|Mozilla*MSIECrawler|MS\ FrontPage*|MSFrontPage|MSIECrawler|MSProxy|multithreaddb|nationaldirectory|Navroad|NearSite|NetAnts|NetCarta|NetMechanic|netprospector|NetResearchServer|NetSpider|Net\ Vampire|NetZIP|NetZip\ Downloader|NetZippy|NEWT|NICErsPRO|Ninja|NPBot|Octopus|Offline\ Explorer|Offline\ Navigator|OpaL|Openfind|OpenTextSiteCrawler|OrangeBot|PageGrabber|Papa\ Foto|PackRat|pavuk|pcBrowser|PersonaPilot|Ping|PingALink|Pockey|Proxy|psbot|PSurf|puf|Pump|PushSite|QRVA|RealDownload|Reaper|Recorder|ReGet|replacer|RepoMonkey|Robozilla|Rover|RPT-HTTPClient|Rsync|Scooter|SearchExpress|searchhippo|searchterms\.it|Second\ Street\ Research|Seeker|Shai|Siphon|sitecheck|sitecheck.internetseer.com|SiteSnagger|SlySearch|SmartDownload|snagger|Snake|SpaceBison|Spegla|SpiderBot|sproose|SqWorm|Stripper|Sucker|SuperBot|SuperHTTP|Surfbot|SurfWalker|Szukacz|tAkeOut|tarspider|Teleport\ Pro|Templeton|TrueRobot|TV33_Mercator|UIowaCrawler|UtilMind|URLSpiderPro|URL_Spider_Pro|Vacuum|vagabondo|vayala|visibilitygap|VoidEYE|vspider|Web\ Downloader|w3mir|Web\ Data\ Extractor|Web\ Image\ Collector|Web\ Sucker|Wweb|WebAuto|WebBandit|web\.by\.mail|Webclipping|webcollage|webcollector|WebCopier|webcraft@bea|webdevil|webdownloader|Webdup|WebEMailExtrac|WebFetch|WebGo\ IS|WebHook|Webinator|WebLeacher|WEBMASTERS|WebMiner|WebMirror|webmole|WebReaper|WebSauger|Website|Website\ eXtractor|Website\ Quester|WebSnake|Webster|WebStripper|websucker|webvac|webwalk|webweasel|WebWhacker|WebZIP|Wget|Whacker|whizbang|WhosTalking|Widow|WISEbot|WWWOFFLE|x-Tractor|^Xaldon\ WebSpider|WUMPUS|Xenu|XGET|Zeus.*Webster|Zeus [NC]
RewriteRule ^.* - [F,L]

For more information, please see our original htaccess blacklist article, the Ultimate htaccess Blacklist. And you also may be interested checking out the new and improved 6G Firewall.

Update: 2008/04/30

The blacklist has been edited to remove the DA character string. This is to prevent blocking of certain validation services such as those provided via the W3C. Thanks to John S. Britsios for identifying and sharing this information. :)

Update: 2008/05/04

The blacklist has been edited to (re)include the DA$ character string. Previously, the DA string matched various validation services because of the “da” string found in the terms “validator”, “validation”, etc. As reader Max explains, we can avoid this problem by appending a $ onto DA. Thus the blacklist has been edited to include the DA$ character string, which protects against the DA bot while allowing us to use various validation services. Thanks Max! ;)

Jeff Starr
About the Author Jeff Starr = Web Developer. Security Specialist. WordPress Buff.
Archives
61 responses
  1. Why you block “Yandex”? It is a spider of the russian most popular search engine.

  2. Jeff Starr

    Good question.. adding it to the list was suggested by a fellow forum member awhile ago. I sniffed around Google a bit but found no evidence of maleficence, so I have removed it from the list. Thanks for pointing it out to us ;)

  3. How does this affect blacklist affect Google crawling your site? There is no side effect?

  4. Jeff Starr

    John, rest assured that there are absolutely no googlebots mentioned in the blacklist. We are targeting the bad guys here, not legitimate agents such as Google, Yahoo!, and MSN. No worries! ;)

  5. Excellent! This is much more better than the first list you made. I didn’t know that the user-agents name can all be in one line.

    This may be offtopic but I have a question. Can I use the same technique (one line) to block ip address using the “deny from ip” code?

  6. Jeff Starr

    Lisa you are wearing me out! :)

    Anyway, to answer your question, yes, you can list multiple IP addresses in one line when using a “deny from ip” directive. Here is the syntax that you would use:

    deny from 123.123.123 456.456.456 789.789.789

    ..etc. Notice that we are using spaces to separate the individual IPs — no commas allowed ;)

  7. Hei.

    Thanks for this list.
    I have my own .htaccess file that are included in my PHP-Nuke site. And i won’t mess it up, so not sure what is
    RewriteBase / ?

    And where should i but the RewriteEngine On ?On the top of my .htaccess or belove some RewriteCond and RewriteRule ?

    Is very important that I don’t mess up my .htaccess, hehe

    Thanks for very good homepage you got.

    Best Regards

    bjarbj78

  8. Jeff Starr

    Hei bjarbj78,

    RewriteBase specifies the default URL for a directory. It is often used to remove the local directory prefix in order for the rewrite rules to act upon the correct file path. Don’t worry, if it sounds confusing, it is — I still find myself referring to the source when trying to explain it to people :)

    Regards,
    Jeff

  9. Thank you very much Jeff :)
    Do you know how to block proxy servers? I make a big list about 9139 proxy domains. But didn’t work when i use:

    deny from proxydomain.com proxydomain2.com and so on.

    Regards
    bjarbj78

  10. Jeff Starr

    The question is, even if you could use htaccess to block over 9,000 domains, would you really want to? If you consider the potential performance hit and excessive load on server resources associated with the perpetual processing of such a monstrous list, it may inspire you to seek a healthier, perhaps more effective alternative..

    For example, here is the code that I use for stopping 99% of the proxies that attempt to access one of my sites:

    # prevent proxy access
    RewriteEngine on
    RewriteCond %{HTTP:VIA} !^$ [OR]
    RewriteCond %{HTTP:FORWARDED} !^$ [OR]
    RewriteCond %{HTTP:USERAGENT_VIA} !^$ [OR]
    RewriteCond %{HTTP:X_FORWARDED_FOR} !^$ [OR]
    RewriteCond %{HTTP:PROXY_CONNECTION} !^$ [OR]
    RewriteCond %{HTTP:XPROXY_CONNECTION} !^$ [OR]
    RewriteCond %{HTTP:HTTP_PC_REMOTE_ADDR} !^$ [OR]
    RewriteCond %{HTTP:HTTP_CLIENT_IP} !^$
    RewriteRule .* - [F]

    Place in root htaccess and take ’er for a spin via any proxy service(s) of your choice.. – it may not be perfect, but it’s lightweight, concise, and very effective ;)

  11. Hi Jeff.

    Thank you very much. It’s works :)
    lol, and i used almost 2 – 3 hours to complete my list, hehe

    Regards

    Bjorn

  12. Hi Jeff.

    I found those lines to:

    rewritecond %{HTTP:Forwarded-For} !^$ [OR]
    rewritecond %{HTTP:X-Forwarded} !^$ [OR]

    Can I use those lines too?

[ Comments are closed for this post ]