Ultimate .htaccess Blacklist 2: Compressed Version
In our original htaccess blacklist article, we provide an extensive list of bad user agents. This so-called “Ultimate htaccess Blacklist” works great at blocking many different online villains: spammers, scammers, scrapers, rippers, leechers — you name it. Yet, despite its usefulness, there is always room for improvement.
For example, as reader Greg suggests, a compressed version of the blacklist would be very useful. In this post, we present a compressed version of our Ultimate htaccess Blacklist that features around 50 new agents. Whereas the original blacklist is approximately 8.6KB in size, the compressed version is only 3.4KB, even with the additional agents. Overall, the compressed version requires fewer system resources to block a greater number of bad agents.
# Ultimate htaccess Blacklist 2 from Perishable Press
# Deny domain access to spammers and other scumbags
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ADSARobot|ah-ha|almaden|aktuelles|Anarchie|amzn_assoc|ASPSeek|ASSORT|ATHENS|Atomz|attach|attache|autoemailspider|BackWeb|Bandit|BatchFTP|bdfetch|big.brother|BlackWidow|bmclient|Boston\ Project|BravoBrian\ SpiderEngine\ MarcoPolo|Bot\ mailto:craftbot@yahoo.com|Buddy|Bullseye|bumblebee|capture|CherryPicker|ChinaClaw|CICC|clipping|Collector|Copier|Crescent|Crescent\ Internet\ ToolPak|Custo|cyberalert|DA$|Deweb|diagem|Digger|Digimarc|DIIbot|DISCo|DISCo\ Pump|DISCoFinder|Download\ Demon|Download\ Wonder|Downloader|Drip|DSurf15a|DTS.Agent|EasyDL|eCatch|ecollector|efp@gmx\.net|Email\ Extractor|EirGrabber|email|EmailCollector|EmailSiphon|EmailWolf|Express\ WebPictures|ExtractorPro|EyeNetIE|FavOrg|fastlwspider|Favorites\ Sweeper|Fetch|FEZhead|FileHound|FlashGet\ WebWasher|FlickBot|fluffy|FrontPage|GalaxyBot|Generic|Getleft|GetRight|GetSmart|GetWeb!|GetWebPage|gigabaz|Girafabot|Go\!Zilla|Go!Zilla|Go-Ahead-Got-It|GornKer|gotit|Grabber|GrabNet|Grafula|Green\ Research|grub-client|Harvest|hhjhj@yahoo|hloader|HMView|HomePageSearch|http\ generic|HTTrack|httpdown|httrack|ia_archiver|IBM_Planetwide|Image\ Stripper|Image\ Sucker|imagefetch|IncyWincy|Indy*Library|Indy\ Library|informant|Ingelin|InterGET|Internet\ Ninja|InternetLinkagent|Internet\ Ninja|InternetSeer\.com|Iria|Irvine|JBH*agent|JetCar|JOC|JOC\ Web\ Spider|JustView|KWebGet|Lachesis|larbin|LeechFTP|LexiBot|lftp|libwww|likse|Link|Link*Sleuth|LINKS\ ARoMATIZED|LinkWalker|LWP|lwp-trivial|Mag-Net|Magnet|Mac\ Finder|Mag-Net|Mass\ Downloader|MCspider|Memo|Microsoft.URL|MIDown\ tool|Mirror|Missigua\ Locator|Mister\ PiX|MMMtoCrawl\/UrlDispatcherLLL|^Mozilla$|Mozilla.*Indy|Mozilla.*NEWT|Mozilla*MSIECrawler|MS\ FrontPage*|MSFrontPage|MSIECrawler|MSProxy|multithreaddb|nationaldirectory|Navroad|NearSite|NetAnts|NetCarta|NetMechanic|netprospector|NetResearchServer|NetSpider|Net\ Vampire|NetZIP|NetZip\ Downloader|NetZippy|NEWT|NICErsPRO|Ninja|NPBot|Octopus|Offline\ Explorer|Offline\ Navigator|OpaL|Openfind|OpenTextSiteCrawler|OrangeBot|PageGrabber|Papa\ Foto|PackRat|pavuk|pcBrowser|PersonaPilot|Ping|PingALink|Pockey|Proxy|psbot|PSurf|puf|Pump|PushSite|QRVA|RealDownload|Reaper|Recorder|ReGet|replacer|RepoMonkey|Robozilla|Rover|RPT-HTTPClient|Rsync|Scooter|SearchExpress|searchhippo|searchterms\.it|Second\ Street\ Research|Seeker|Shai|Siphon|sitecheck|sitecheck.internetseer.com|SiteSnagger|SlySearch|SmartDownload|snagger|Snake|SpaceBison|Spegla|SpiderBot|sproose|SqWorm|Stripper|Sucker|SuperBot|SuperHTTP|Surfbot|SurfWalker|Szukacz|tAkeOut|tarspider|Teleport\ Pro|Templeton|TrueRobot|TV33_Mercator|UIowaCrawler|UtilMind|URLSpiderPro|URL_Spider_Pro|Vacuum|vagabondo|vayala|visibilitygap|VoidEYE|vspider|Web\ Downloader|w3mir|Web\ Data\ Extractor|Web\ Image\ Collector|Web\ Sucker|Wweb|WebAuto|WebBandit|web\.by\.mail|Webclipping|webcollage|webcollector|WebCopier|webcraft@bea|webdevil|webdownloader|Webdup|WebEMailExtrac|WebFetch|WebGo\ IS|WebHook|Webinator|WebLeacher|WEBMASTERS|WebMiner|WebMirror|webmole|WebReaper|WebSauger|Website|Website\ eXtractor|Website\ Quester|WebSnake|Webster|WebStripper|websucker|webvac|webwalk|webweasel|WebWhacker|WebZIP|Wget|Whacker|whizbang|WhosTalking|Widow|WISEbot|WWWOFFLE|x-Tractor|^Xaldon\ WebSpider|WUMPUS|Xenu|XGET|Zeus.*Webster|Zeus [NC]
RewriteRule ^.* - [F,L]
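Since the pattern is matched case-insensitively (the [NC] flag), you can sanity-check which user-agent strings would trip the rule from the command line with grep -Ei, which uses comparable regex semantics. A rough sketch, using only a small illustrative slice of the pattern (the slice and the sample user-agent strings are mine, not part of the blacklist itself):

```shell
# Simulate the case-insensitive RewriteCond match with grep -Ei,
# using a short slice of the blacklist's alternation for illustration.
pattern='HTTrack|Wget|EmailSiphon|WebZIP'

match() {
  # Prints "blocked" if the user-agent string matches, "allowed" otherwise.
  if printf '%s' "$1" | grep -Eiq "$pattern"; then
    echo "blocked"
  else
    echo "allowed"
  fi
}

match "Wget/1.21.3 (linux-gnu)"        # blocked
match "Mozilla/5.0 (Windows NT 10.0)"  # allowed
```

This is only a stand-in for mod_rewrite, not an exact replica, but it is handy for spotting accidental matches before deploying a new pattern.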
For more information, please see our original htaccess blacklist article, the Ultimate htaccess Blacklist. You may also be interested in checking out the new and improved 6G Firewall.
Update: 2008/04/30
The blacklist has been edited to remove the DA character string. This is to prevent blocking of certain validation services such as those provided via the W3C. Thanks to John S. Britsios for identifying and sharing this information. :)
Update: 2008/05/04
The blacklist has been edited to (re)include the DA$ character string. Previously, the DA string matched various validation services because of the “da” string found in the terms “validator”, “validation”, etc. As reader Max explains, we can avoid this problem by appending a $ onto DA. Thus the blacklist has been edited to include the DA$ character string, which protects against the DA bot while allowing us to use various validation services. Thanks Max! ;)
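The effect of the $ anchor is easy to check from the command line with grep -Ei, which behaves similarly to the case-insensitive RewriteCond match (a rough stand-in, not mod_rewrite itself):

```shell
# "da" hides inside "Validator", so an unanchored, case-insensitive
# "DA" matches the W3C validator's user agent. Anchoring with "$"
# only matches user agents that END with "DA".
ua='W3C_CSS_Validator_JFouffa/2.0'

printf '%s' "$ua" | grep -Eiq 'DA'  && echo 'DA matches the validator'
printf '%s' "$ua" | grep -Eiq 'DA$' || echo 'DA$ does not match the validator'
printf '%s' 'DA'  | grep -Eiq 'DA$' && echo 'DA$ still catches the DA bot'
```

All three lines print, confirming that the anchored pattern spares the validator while still blocking an agent identifying itself as plain “DA”.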
61 responses to “Ultimate .htaccess Blacklist 2: Compressed Version”
Hi Bjorn,
Glad it worked for you! I have found it to be much more efficient than trying to block the endless swarm of proxy servers individually — things change way too quickly for that. As for the additional rewrite conditions you mention, I have not used them. Perhaps you could test their effectiveness and share the results with us? ;)
Regards,
Jeff
Hi again Jeff :)
I have a question: can I use a wildcard to block an IP, e.g.:
deny from 74.54.143.*
or
deny from 74.54.*.*
The above IP ranges have been scraping some of my blog content and I’m getting tired of blocking them one by one.
I’ve Googled for an answer but can’t seem to find any hint on wildcards.
Hope you can help me ;)
Thanks Jeff :) You’re a life saver!!
Hi Lisa :)
Using htaccess, wildcards are not necessary when specifying ranges of IP addresses. Simply truncate the address to the target range, for example:
deny from 74.54.
..would block every IP beginning with the
74.54.
prefix:
74.54.1
.
.
.
74.54.255
74.54.255.1
.
.
.
74.54.255.255
..etc. Of course, you could also use wildcards if so desired:
deny from 74.54.*
..which would block the exact same range as before. It’s totally your choice! :)
Ah, one more thing that I should point out.. Be careful when using the “dot” (.) in your blocked IP ranges. If we omit the dot from the example in the previous comment, we would block a different set of IP addresses:
74.540
.
.
.
74.549
74.540.1
.
.
.
74.549.255
..etc. Inclusion or exclusion of the dot can make all of the difference!
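The pitfall can be sketched as a plain string-prefix check in shell (a glob-based illustration of the matching being described, not Apache’s actual address parser):

```shell
# Compare a partial address against candidate IPs as a string prefix.
# Prints "match" when the candidate begins with the given prefix.
match_prefix() {
  case "$2" in
    "$1"*) echo "match" ;;
    *)     echo "no match" ;;
  esac
}

match_prefix "74.54." "74.54.1.1"   # match: inside the intended range
match_prefix "74.54"  "74.540.1.1"  # match: the unintended catch
match_prefix "74.54." "74.540.1.1"  # no match: the trailing dot prevents it
```

The middle call shows exactly the problem: without the dot, the shorter prefix also swallows addresses you never meant to block.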
It would be cool if you could provide
just the last 2 lines in a downloadable file,
that you would update and others could download periodically to
construct their .htaccess files from
cheers,
Padraig.
Hey, you have httrack and ia_archiver in there?
Padraig Brady,
Yes, that is an excellent idea. I am currently in the process of building the next version of the Ultimate htaccess Blacklist (v.3), and will definitely post an article once it is finished. As you can see, this post features the second version of the Blacklist, and I continue to post many useful tips and tricks concerning all things htaccess, PHP, XHTML, CSS, and all of the other web-related acronyms ;) Thus, my advice to people who want to stay current with the Ultimate htaccess Blacklist as well as other security and anti-spam information is to subscribe to my feed — I guarantee you won’t regret it!
Regards,
Jeff
I’m testing this list and I’m having a problem with RewriteBase /
I’d prefer to use this list in my httpd.conf, but RewriteBase / is throwing an error: “only valid in per-directory config files”. Is it required for this list to work?
Hi Peter, you are correct — you must remove the RewriteBase / directive when using the blacklist in your httpd.conf file. It is meant for per-directory htaccess files and is not needed when working with httpd.conf. :)
COOL!!
Thanks
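For anyone moving the list into httpd.conf, one way the rules might be placed is inside a <Directory> block, with RewriteBase dropped. A minimal sketch, assuming a document root of /var/www/html and a shortened pattern for readability (substitute the full alternation from the blacklist above):

```apache
# Sketch: the blacklist in httpd.conf rather than .htaccess.
# RewriteBase is omitted; it is not needed here.
<Directory "/var/www/html">
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} HTTrack|Wget|EmailSiphon [NC]
    RewriteRule ^.* - [F,L]
</Directory>
```

Adjust the directory path to your own setup, and test against a known-bad user agent before relying on it.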
After including this list in my file, I noticed that the W3C HTML and CSS validators would no longer work on my site, presumably because agents such as “Jigsaw/2.2.5 W3C_CSS_Validator_JFouffa/2.0” contain the string “DA”. This can be resolved by changing agent “DA” to “DA$”.