Ultimate htaccess Blacklist
Published Thursday, June 28, 2007 @ 10:46 am • 62 Responses
For those of us running Apache, htaccess rewrite rules provide an excellent way to block spammers, scrapers, and other scumbags easily and effectively. While there are many htaccess tricks involving blocking domains, preventing access, and redirecting traffic, Apache’s mod_rewrite module enables us to target bad agents by testing the user-agent string against a predefined blacklist of unwanted visitors. Any matches are immediately and quietly denied access.
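The mechanism described above can be sketched in a few lines. This is a minimal illustration, not part of the blacklist itself: the agent names `ExampleRipper` and `examplescraper` are hypothetical placeholders, and the `<IfModule>` wrapper is an added precaution for servers where mod_rewrite may not be loaded.

```apache
# Minimal sketch: deny two hypothetical agents by user-agent string.
<IfModule mod_rewrite.c>
    RewriteEngine On

    # ^ anchors the pattern to the start of the user-agent string;
    # [OR] chains this condition to the next one.
    RewriteCond %{HTTP_USER_AGENT} ^ExampleRipper [OR]

    # No anchor: matches anywhere in the string; [NC] ignores case.
    # The last condition in the chain carries no [OR].
    RewriteCond %{HTTP_USER_AGENT} examplescraper [NC]

    # If any condition matched, leave the URL unchanged (-) and
    # answer with 403 Forbidden ([F]).
    RewriteRule .* - [F]
</IfModule>
```

Every other request passes through untouched, because the rule fires only when at least one of the conditions matches.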
There are many ways to obtain an effective htaccess blacklist. Several excellent forums around the web offer a wealth of htaccess advice and are well worth reading. After copying your favorite forum blacklist examples into your domain’s root htaccess file, you will want to keep developing the list by tracking bandwidth thieves, comment spammers, and site scrapers and adding them as you find them. Or, you may wish to skip the tedious grunt work and simply grab a copy of the Ultimate htaccess Blacklist!
The Ultimate htaccess Blacklist began as a short list of only the most heinous offenders. Blocking scum was such an enjoyable activity that we soon added every nasty agent we could find. The result has been a very low-stress, spam-free site with virtually zero stolen bandwidth. The list is fairly comprehensive and attempts to blacklist as many site rippers, grabbers, spammers, and bad bots as possible. While no blacklist could ever block them all (nor would you want it to, using this method) [1], an elaborate htaccess blacklist can do wonders to improve overall performance, decrease site maintenance, and reduce server expense. Overall, we consider this blacklist a great foundation on which to build and customize your own ultimate htaccess blacklist! [2]
So without further ado, here is our version of the ultimate htaccess blacklist, as promised. Simply copy and paste the following code into the root htaccess file of your site to enjoy a serious reduction in wasted bandwidth, stolen resources, and comment spam. Don’t forget to back up your data and test everything first. After that, you’re good to go!
The Ultimate htaccess Blacklist
# Ultimate htaccess Blacklist from Perishable Press
# Deny domain access to spammers and other scumbags
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} almaden [OR]
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} ^bumblebee [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^CICC [OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
RewriteCond %{HTTP_USER_AGENT} ^DSurf15a [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EasyDL/2\.99 [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} email [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^gigabaz [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^httpdown [OR]
RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy.*Library [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetLinkagent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
RewriteCond %{HTTP_USER_AGENT} ^JBH.*agent [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^LexiBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link.*Sleuth [OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MS\ FrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSProxy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^PingALink [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^QRVA [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} ^Seeker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SlySearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} ^sproose [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Szukacz [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^URLSpiderPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^webcollage [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebHook [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMiner [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^x-Tractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
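For readers concerned about the length of the list, the same technique can be written more compactly by grouping agents with regex alternation. The following is only a sketch under stated assumptions: it covers a small subset of the agents above, the grouping is illustrative rather than the authors’ own condensed version, and the `<IfModule>` wrapper is an added precaution.

```apache
# Condensed sketch covering a subset of the agents listed above.
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Agents matched at the start of the user-agent string (case-sensitive).
    RewriteCond %{HTTP_USER_AGENT} ^(Anarchie|ASPSeek|BackWeb|Bandit|BlackWidow|WebZIP|Wget) [OR]

    # Agents matched anywhere in the string, ignoring case.
    # The space in "Indy Library" must be escaped inside the pattern.
    RewriteCond %{HTTP_USER_AGENT} (httrack|Indy\ Library|FrontPage) [NC]

    # [F] returns 403 Forbidden and implies [L].
    RewriteRule .* - [F]
</IfModule>
```

Anchored and unanchored patterns are kept in separate conditions so that each group retains the matching behavior of the original one-agent-per-line entries.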
Footnotes
- 1 Note: although this blacklist is highly effective at eliminating unwanted scum, its immense length requires extra processing and may affect the performance of your server. In our experience, employing this list (along with several other htaccess directives) for over two years has resulted in zero noticeable performance issues. Nonetheless, this may not be an ideal solution for sites with extreme levels of visitor traffic.
- 2 To begin building your own customized blacklist, you may want to check out the excellent list offered at joemaller.com. Thanks, Joe!
- 3 Update (October 14, 2007): To reduce confusion and consolidate htaccess rules, the last two lines have been removed from the blacklist. These two lines are not required for the blacklist to work as intended:
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.perishablepress.com.* - [F,L]
Related articles
- Ultimate htaccess Blacklist 2 (Compressed Version)
- 4G Series: The Ultimate Referrer Blacklist, Featuring Over 8000 Banned Referrers
- How to Block Proxy Servers via htaccess
- 4G Series: The Ultimate User-Agent Blacklist, Featuring Over 1200 Bad Bots
- Series Summary: Building the 3G Blacklist
- Hacking WordPress: The Ultimate Nofollow Blacklist
- Creating the Ultimate htaccess Anti-Hotlinking Strategy
Dialogue
62 Responses
July 16, 2007 at 2:01 pm
OK, thanks.
I ran into a problem with the blacklist provided, however, and think there might be a typo.
The following line causes an internal error on my server. If I add a space after the DISCo slash, then it works fine again:
RewriteCond %{HTTP_USER_AGENT} ^DISCo\Pump [OR]
Thanks for the great blacklist. I’m now using it in place of the less comprehensive version I obtained from here:
http://www.javascriptkit.com/howto/htaccess13.shtml
Phil.
July 22, 2007 at 3:52 pm
Excellent blacklist for the htaccess file!
It even had a few on there I haven’t seen before! :)
I rant on this subject on my blog all the time:
http://www.a-daily-rant.com/
October 6, 2007 at 2:41 am
Thanks so much. Without any doubt, the best list ever. Your site is fantastic!
I’m wondering if you could provide the same list, but in a “compressed version”?
Like this, for example: RewriteCond %{HTTP_USER_AGENT} ADSARobot|Anarchie|ASPSeek|Atomz|BackWeb|Bandit|…and so much more… with the [OR] and/or the [NC,OR] at the ends of lines.
Thank you again!
October 6, 2007 at 2:46 am
You can see what I’m talking about here: http://www.toulouse-renaissance.net/c_outils/c_htaccess_compact.htm
;-)
October 15, 2007 at 12:09 am
Thanks - just a heads up that this line causes the whole list to fail on my server (it ignores everything and lets every bot through). Comment it out and there is no problem:
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
October 15, 2007 at 1:15 pm
Well done!
You’re welcome :-)
October 25, 2007 at 6:19 pm
Hi,
glad to find up-to-date info about referer spam. If I get the point, referer spam is created using forged referer info in the HTTP request, but can’t the user agent be forged also?
October 27, 2007 at 3:08 am
I set up the compressed list and am gonna check my logs ;)
Thanx
October 27, 2007 at 8:19 pm
great list thnx
December 11, 2007 at 6:04 pm
Wow! This is one big list :) By default, my .htaccess file already has the lines “RewriteEngine on” and “RewriteBase /”. I don’t need to rewrite them, right?
I’m a little confused with how .htaccess works.
January 21, 2008 at 4:38 am
Hi.
Thanks for your list :) Is your list up to date?
January 21, 2008 at 10:56 pm
Hi Perishable :)
Thanks for your answers:)
I’m gonna take a look at your link.
Thanks
February 24, 2008 at 11:03 pm
Hello Jef,
I suffer a lot from site scrapers.
I have this list in my site.
For test purposes I tried to scrape my site using Offline Explorer, but it was not blocked.
I also tried the same thing with your site (just for 30 sec) and Offline Explorer was not blocked.
What should I do to block it?
How can I determine which software the scraper is using? Does the raw log file show that?
Thanks,
Rasheed.
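On the question of identifying a scraper’s software from the logs: Apache’s conventional “combined” log format records the User-Agent header of every request as its last field. A minimal sketch, assuming access to the main server or virtual-host configuration (these directives are not permitted in .htaccess) and a hypothetical log path:

```apache
# Server or virtual-host configuration only; not valid in .htaccess.
# "combined" is the conventional nickname; the final quoted field
# is the User-Agent header sent by the client.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# Hypothetical log location; adjust to your server's layout.
CustomLog "logs/access_log" combined
```

A tool that reports itself as an ordinary browser will appear in the log under that browser’s name, so it cannot be singled out by user-agent matching alone.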
February 26, 2008 at 7:12 pm
Hi Jeff,
I found they changed the agent name to IE.
If I block it, Internet Explorer will also be blocked (403).
Frustrating!
February 28, 2008 at 12:24 pm
If you have RSS feeds, something in this list blocks Google and Yahoo from being able to read your RSS XML. Feedburner was still able to update. I put this on my site on Feb-16 and just today checked my feeds on My-Google and Yahoo. Both stopped updating on Feb-16. I commented out just this blacklist from my .htaccess file, waited about 30 minutes and sure enough, Google and Yahoo both updated to today. :\
March 3, 2008 at 9:43 am
Hi
Does this even have to go into a .htaccess file? Can’t it be used more globally by having it in an Apache config file?
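On the question of server-wide use: the same conditions can live in the server configuration instead of .htaccess. A minimal sketch, assuming a hypothetical virtual host (the `ServerName` and `DocumentRoot` values are placeholders) and only two agents from the list for brevity:

```apache
# Hypothetical virtual host; ServerName and DocumentRoot are placeholders.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/var/www/example"

    RewriteEngine On
    # RewriteBase is omitted: it applies only in per-directory
    # (.htaccess or <Directory>) context.
    RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Wget
    RewriteRule .* - [F]
</VirtualHost>
```

By default, rewrite rules defined in the main server context are not inherited by virtual hosts, so with several domains the rules would typically need to be repeated or shared through an included file in each virtual host.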
March 3, 2008 at 10:01 am
Thanks for the follow-up. With several domains it’s much easier to maintain doing this globally.
I log all activity to a database, and the number of 403s I’m getting is impressive between this and the 2G Blacklist.
:)
July 9, 2008 at 2:16 pm
Thanks for this list and the updated list. I love watching all the denials in the logs.. Is that creepy?
September 12, 2008 at 6:46 am
I have noticed that comment spammers are bypassing this security step by simply forging my site’s own referer ID. Has anyone discovered a way to detect a forged referer ID?
January 9, 2009 at 7:00 am
My ISP just upgraded to PHP5, and I discovered that an “old” version of this list was causing 500 server errors. I replaced it with the new list, and it’s smiles all around again. I’m just leaving this note as a thank you and as a “heads up” to anyone else running into a PHP5 panic attack.
Thanks again !!!
March 11, 2009 at 2:26 am
Hi Mark
I get the same with both lists.
Which list did you use?
Cheers
March 11, 2009 at 2:45 am
Fixed, it was a syntax issue!
“Order allow,deny” not “order allow,deny” Case sensitive.
Cheers
March 11, 2009 at 6:09 am
Hello !
I can’t remember now, but it seems you located the error already anyway,
so it’s all good now, I suppose :)
Thanks again,
Mark
March 11, 2009 at 7:16 am
@Jeff Starr: Sorry I didn’t post any code. My 500 error was from the wrong case in “order”; it should be “Order”:
Order allow,deny
allow from all
deny from eudora.com
deny from bravenet.com
deny from tripod.com
deny from lethadinan.pib.ir
deny from xanga.com
deny from iblogme.com
March 11, 2009 at 8:33 am
Cool will do for next time. I still can’t figure out how to test it though.
I have searched for a .htaccess testing system but couldn’t find one.
Cheers
June 9, 2009 at 8:55 pm
Silly question: couldn’t the spammers just set a user-agent of Mozilla? Or are you assuming they are not that smart?
June 10, 2009 at 2:54 pm
Hi Jeff,
Good deal, makes sense.
Thanks!
-John
June 13, 2009 at 9:25 am
This seems very useful, thanks.
Unfortunately, my hosting provider (GoDaddy) doesn’t provide detailed logs, so it is hard to know how much difference this is making.
Are there any disadvantages to having a long .htaccess file, in terms of site performance?
1 • Phil
July 15, 2007 at 3:32 pm
Probably just being dim, but I don’t get the final two lines (where the condition is that the referrer is http://www.iaea.org). Is there some spambot that pretends to be an incoming link from the IAEA?