Web Dev + WordPress + Security
116 posts related to: Ultimate Block List to Stop AI Bots

Blacklist Candidate 2008-10-19

[ Photo: Television Flashback ]

Welcome to the Perishable Press “Blacklist Candidate” series. In this post, we continue our new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags. From time to time on the show, a contestant places a bid that is so absurd and so asinine that you literally laugh out loud, point at the monitor, and openly ridicule the pathetic loser. On such occasions, even the host of the show will laugh and mock the idiocy. Of course, this […] Continue reading »

Blacklist Candidate Series Summary

An ongoing series of articles on the fine art of malicious exploit detection and prevention. Learn about preventing the sneaky, mischievous, and deceptive practices of some of the worst spammers, scrapers, crackers, and other scumbags on the Internet. Continue reading »

Evil Incarnate, but Easily Blocked

As my readers know, I spend a lot of time digging through error logs, preventing attacks, and reporting results. Occasionally, some moron will pull a stunt that deserves exposure, public humiliation, and banishment. There is certainly no lack of this type of nonsense, as many of you are well aware. Even so, I have to admit that I am very happy with my latest strategy against crackers, spammers, and other scumbags, namely, the 3G Blacklist. Since implementing this effective […] Continue reading »

Yahoo! Once Again Caught Disobeying Robots.txt Rules

Hmmm... Let’s see here. Google can do it. MSN/Live can do it. Even Ask can do it. So why oh why can’t Yahoo’s grubby Slurp crawler manage to adhere to robots.txt crawl directives? Just when I thought Yahoo! had finally figured it out, I discover more Slurp tracks in my Blackhole trap for bad spiders: Continue reading »
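A blackhole trap relies on a simple rule: a compliant crawler never requests a path that robots.txt disallows, so any bot that does is ignoring the rules. The rough Python sketch below checks a logged request against such rules using the standard library's `urllib.robotparser`. The `/blackhole/` path and the helper names are hypothetical. They are not the trap's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /blackhole/ stands in for the trap directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blackhole/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def violates_robots(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """True if fetching `url` as `user_agent` breaks the Disallow rules."""
    return not parser.can_fetch(user_agent, url)


rules = load_rules(ROBOTS_TXT)
print(violates_robots(rules, "Slurp", "https://example.com/blackhole/"))  # True
print(violates_robots(rules, "Slurp", "https://example.com/press/"))      # False
```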

Redirect All Requests for a Nonexistent File to the Actual File

In my previous article on redirecting 404 requests for favicon files, I presented an HTAccess technique for redirecting all requests for nonexistent favicon.ico files to the actual file located in the site’s web-accessible root directory:

# REDIRECT FAVICONZ
<IfModule mod_rewrite.c>
 RewriteEngine On
 # skip the real root file, which would otherwise be redirected to itself
 RewriteCond %{REQUEST_URI} !^/favicon\.ico$ [NC]
 RewriteCond %{THE_REQUEST} favicon\.ico [NC]
 RewriteRule (.*) http://domain.tld/favicon.ico [R=301,L]
</IfModule>

As discussed in the article, this code is already in effect here at Perishable Press, as may be seen by clicking on any of the following links: Update: I’ve removed the […] Continue reading »
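The Python sketch below shows only the redirect decision. It assumes the server passes in a bare URL path, and `redirect_target` is a hypothetical name. The sketch mimics the rule and does not reproduce mod_rewrite:

- It matches the filename only in the final path segment. A `THE_REQUEST` condition, by contrast, matches the string anywhere in the request line.
- It skips the root copy explicitly, since a rule that matched the real `/favicon.ico` would redirect that file to itself.

```python
import posixpath
from typing import Optional


def redirect_target(path: str, filename: str = "favicon.ico") -> Optional[str]:
    """Return the root URL path for a misplaced request for `filename`,
    or None when no redirect is needed."""
    canonical = "/" + filename
    # Case-insensitive comparisons, in the spirit of mod_rewrite's [NC] flag.
    if path.lower() == canonical.lower():
        return None  # the real root file: redirecting it would loop
    if posixpath.basename(path).lower() == filename.lower():
        return canonical
    return None


print(redirect_target("/2007/06/12/favicon.ico"))  # /favicon.ico
print(redirect_target("/favicon.ico"))             # None
print(redirect_target("/2007/06/12/"))             # None
```

Passing a different `filename` applies the same logic to any other file that should only ever be served from the root.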

Stop the Madness: Redirect those Ridiculous Favicon 404 Requests

For the last several months, I have been seeing an increasing number of 404 errors requesting “favicon.ico” appended onto various URLs:

http://example.com/favicon.ico
http://example.com/2007/06/12/favicon.ico
http://example.com/2007/09/25/absolute-horizontal-and-vertical-centering-via-css/favicon.ico
http://example.com/2007/08/01/temporary-site-redirect-for-visitors-during-site-updates/favicon.ico
http://example.com/2007/01/16/maximum-and-minimum-height-and-width-in-internet-explorer/favicon.ico

When these errors first began appearing in the logs several months ago, I didn’t think too much of it: “just another idiot who can’t find my site’s favicon...” As time went on, however, the frequency and variety of these misdirected requests continued to increase. A bit frustrating perhaps, but not serious enough to […] Continue reading »

Unexplained Crawl Behavior Involving Tagged Query Strings

I need your help! I am losing my mind trying to solve another baffling mystery. For the past three or four months, I have been recording many 404 errors generated from msnbot, Yahoo-Slurp, and other spider crawls. These errors result from invalid requests for URLs containing query strings such as the following:

https://example.com/press/page/2/?tag=spam
https://example.com/press/page/3/?tag=code
https://example.com/press/page/2/?tag=email
https://example.com/press/page/2/?tag=xhtml
https://example.com/press/page/4/?tag=notes
https://example.com/press/page/2/?tag=flash
https://example.com/press/page/2/?tag=links
https://example.com/press/page/3/?tag=theme
https://example.com/press/page/2/?tag=press

Note: For these example URLs, I replaced my domain, perishablepress.com, with the generic example.com. Turns out that listing the plain-text […] Continue reading »
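Before guessing at a cause, it can help to isolate these requests in the logs. The small Python sketch below flags paged archive URLs that carry a `tag` query parameter. It assumes the `/press/page/N/` prefix seen in the examples above, and the helper name is hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Paged archive paths like /press/page/2/ (assumed from the logged examples).
PAGED_ARCHIVE = re.compile(r"^/press/page/\d+/$")


def is_tagged_page_request(url: str) -> bool:
    """True for paged archive URLs that also carry a ?tag= parameter."""
    parts = urlsplit(url)
    return bool(PAGED_ARCHIVE.match(parts.path)) and "tag" in parse_qs(parts.query)


logged = [
    "https://example.com/press/page/2/?tag=spam",
    "https://example.com/press/page/3/",
    "https://example.com/press/?tag=code",
]
print([u for u in logged if is_tagged_page_request(u)])
# ['https://example.com/press/page/2/?tag=spam']
```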

Taking Advantage of the X-Robots Tag

Controlling the spidering, indexing and caching of your (X)HTML-based web pages is possible with meta robots directives such as these:

<meta name="googlebot" content="index,archive,follow,noodp"/>
<meta name="robots" content="all,index,follow"/>
<meta name="msnbot" content="all,index,follow"/>

I use these directives here at Perishable Press, and they continue to serve me well for controlling how the “big bots”1 crawl and represent my (X)HTML-based content in search results. For other, non-(X)HTML types of content, however, using meta robots directives to control indexing and caching is not an option. An […] Continue reading »
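For non-HTML files, the same kind of directives can be sent as an `X-Robots-Tag` HTTP response header instead of a meta tag. As an illustration only, the Python WSGI middleware below adds that header for a few file types. The extension list, the `noindex, noarchive` value, and the function names are assumptions. They are not the article's configuration.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed set of non-(X)HTML file types to keep out of search indexes.
NOINDEX_EXTENSIONS = (".pdf", ".doc", ".txt")


def x_robots_middleware(app: Callable, directives: str = "noindex, noarchive") -> Callable:
    """Wrap a WSGI app so matching files are served with an X-Robots-Tag header."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "").lower()

        def start_with_header(status: str, headers: List[Tuple[str, str]], exc_info=None):
            # Append the header only for the file types listed above.
            if path.endswith(NOINDEX_EXTENSIONS):
                headers = list(headers) + [("X-Robots-Tag", directives)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def file_app(environ, start_response):
    """Stand-in application that serves every path as a binary file."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"file contents"]


seen = {}
app = x_robots_middleware(file_app)
app({"PATH_INFO": "/docs/manual.pdf"}, lambda status, headers, exc_info=None: seen.update(headers))
print(seen["X-Robots-Tag"])  # noindex, noarchive
```

On Apache, the equivalent is usually a mod_headers `Header set X-Robots-Tag "noindex, noarchive"` directive placed inside a `<FilesMatch>` block.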

Blacklist Candidate Number 2008-05-31

[ Photo: Bob Barker waves at you ]

Welcome to the Perishable Press “Blacklist Candidate” series. In this post, we continue our new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags. Just under the wire! Even so, this month’s official Blacklist-Candidate article may be the last monthly installment of the series. Although additional BC articles may appear in the future, it is unlikely that they will continue as a regular monthly feature. Oh sure, I see the tears streaming down your face, but think […] Continue reading »

Series Summary: Building the 3G Blacklist

[ 3G Stormtrooper ]

In the now-complete series, Building the 3G Blacklist, I share insights and discoveries concerning website security and protection against malicious attacks. Each article in the series focuses on unique blacklist strategies designed to protect sites transparently, effectively, and efficiently. The five articles culminate in the release of the next generation 3G Blacklist. Here is a quick summary of the entire Building the 3G Blacklist series: Continue reading »

Perishable Press 3G Blacklist

[ 3G Stormtroopers ]

After much research and discussion, I have developed a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Over time, these mega-lists became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works by […] Continue reading »

Building the 3G Blacklist, Part 5: Improving Site Security by Selectively Blocking Individual IPs

[ 3G Stormtroopers (Red Version) ]

In this continuing five-article series, I share insights and discoveries concerning website security and protecting against malicious attacks. Wrapping up the series with this article, I provide the final key to our comprehensive blacklist strategy: selectively blocking individual IPs. Previous articles also focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. In the next article, these five articles will culminate in the release of the next generation 3G Blacklist. Continue reading »
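The minimal Python sketch below shows the underlying check, using the standard `ipaddress` module. The blocklist uses reserved documentation addresses as placeholders. It is not the 3G Blacklist's actual IP list.

```python
import ipaddress

# Placeholder blocklist from the RFC 5737 documentation ranges, not real offenders.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.15/32"),    # a single address
    ipaddress.ip_network("198.51.100.0/24"),  # a whole range
]


def is_blocked(remote_addr: str) -> bool:
    """True if the client address falls inside any blocked network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in BLOCKED_NETWORKS)


print(is_blocked("192.0.2.15"), is_blocked("192.0.2.16"), is_blocked("198.51.100.77"))
# True False True
```

In Apache itself, per-IP blocking is commonly written as `Deny from` (2.2) or as `Require not ip` inside `<RequireAll>` (2.4).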

Building the 3G Blacklist, Part 4: Improving RedirectMatch in the Original 2G Blacklist

[ 3G Stormtroopers (Team Aqua) ]

In this continuing five-article series, I share insights and discoveries concerning website security and protecting against malicious attacks. In this fourth article, I build upon previous ideas and techniques by improving the directives contained in the original 2G Blacklist. Subsequent articles will focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. At the conclusion of the series, the five articles will culminate in the release of the next generation 3G Blacklist. Continue reading »
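`RedirectMatch` tests a regular expression against the requested URL path (not the query string) and can answer with a non-redirect status such as 403. The Python sketch below mimics that behavior. Its patterns are hypothetical placeholders, not entries from the 2G or 3G Blacklist.

```python
import re

# Hypothetical path signatures, compiled case-sensitively as RedirectMatch
# patterns are unless told otherwise.
BLOCKED_PATH_PATTERNS = [
    re.compile(r"/\.git/"),           # exposed version-control directories
    re.compile(r"\.(bak|sql|swp)$"),  # stray backup and dump files
    re.compile(r"/etc/passwd"),       # traversal targets
]


def redirectmatch_status(url_path: str) -> int:
    """Mimic `RedirectMatch 403 <regex>`: forbid when any pattern matches the path."""
    for pattern in BLOCKED_PATH_PATTERNS:
        if pattern.search(url_path):  # unanchored, so a match anywhere counts
            return 403
    return 200


print(redirectmatch_status("/site/.git/config"))  # 403
print(redirectmatch_status("/backup/db.sql"))     # 403
print(redirectmatch_status("/2008/05/31/post/"))  # 200
```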

Building the 3G Blacklist, Part 3: Improving Security by Blocking Rogue User Agents

[ 3G Stormtroopers (Deep Purple) ]

In this continuing five-article series, I share insights and discoveries concerning website security and protecting against malicious attacks. In this third article, I discuss targeted, user-agent blacklisting and present an alternate approach to preventing site access for the most prevalent and malicious user agents. Subsequent articles will focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. At the conclusion of the series, the five articles will culminate in the release of the next generation 3G […] Continue reading »
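User-agent blacklisting comes down to matching the `User-Agent` request header against known-bad signatures. The Python sketch below shows the idea. Its three signatures are hypothetical examples of scanner tool names, not the article's list. Matching is case-insensitive, in the spirit of Apache's `[NC]` flag or `SetEnvIfNoCase`.

```python
import re

# Hypothetical signatures of scanning tools, for illustration only.
ROGUE_AGENTS = re.compile(r"libwww-perl|sqlmap|nikto", re.IGNORECASE)


def is_rogue_agent(user_agent: str) -> bool:
    """True if the User-Agent header contains a blacklisted signature."""
    return ROGUE_AGENTS.search(user_agent) is not None


print(is_rogue_agent("libwww-perl/6.05"))                           # True
print(is_rogue_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
```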

Building the 3G Blacklist, Part 2: Improving Security by Preventing Query-String Exploits

[ 3G Stormtroopers (Green Machine) ]

In this continuing five-article series, I share insights and discoveries concerning website security and protecting against malicious attacks. In this second article, I present an incredibly powerful method for eliminating malicious query string exploits. Subsequent articles will focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. At the conclusion of the series, the five articles will culminate in the release of the next generation 3G Blacklist.

Improving Security by Preventing Query String Exploits

A vast […] Continue reading »
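The same pattern-matching approach applies to query strings. The Python sketch below decodes the query and searches it for a few classic exploit signatures: directory traversal, script injection, SQL `UNION SELECT`, and `base64_encode`. These patterns are hypothetical stand-ins, not the 3G Blacklist's rules.

```python
import re
from urllib.parse import unquote, urlsplit

# Hypothetical exploit signatures, not the 3G Blacklist's actual patterns.
QUERY_EXPLOITS = re.compile(
    r"\.\./"                # directory traversal
    r"|<script"             # script injection
    r"|union[^a-z]+select"  # SQL injection
    r"|base64_encode",      # encoded PHP payloads
    re.IGNORECASE,
)


def query_is_malicious(url: str) -> bool:
    """Decode the query string, then search it for exploit signatures."""
    query = unquote(urlsplit(url).query)
    return QUERY_EXPLOITS.search(query) is not None


print(query_is_malicious("http://example.com/?file=../../etc/passwd"))  # True
print(query_is_malicious("http://example.com/press/?tag=spam"))         # False
```

In `.htaccess`, checks like these are typically expressed as `RewriteCond %{QUERY_STRING}` lines followed by a `RewriteRule` with the `[F]` flag.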

Building the 3G Blacklist, Part 1: Improving Security by Exploiting Server Attack Patterns

[ 3G Stormtroopers (Blue Dream) ]

In this series of five articles, I share insights and discoveries concerning website security and protecting against malicious attacks. In this first article of the series, I examine the process of identifying attack trends and using them to immunize against future attacks. Subsequent articles will focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. At the conclusion of the series, the five articles will culminate in the release of the next generation 3G Blacklist. Improving […] Continue reading »
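Spotting attack trends starts with counting which requests keep failing. The Python sketch below tallies the request paths that most often end in 403 or 404. It assumes Apache Common Log Format input. The `top_error_paths` helper and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Request path and status code from an Apache Common Log Format line.
LOG_LINE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})')


def top_error_paths(lines, n=5):
    """Return the n request paths that most often ended in 403 or 404."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") in ("403", "404"):
            counts[match.group("path")] += 1
    return counts.most_common(n)


sample = [
    '192.0.2.1 - - [31/May/2008:10:00:00 -0500] "GET /wp-login.php HTTP/1.1" 404 512',
    '192.0.2.2 - - [31/May/2008:10:01:00 -0500] "GET /wp-login.php HTTP/1.1" 404 512',
    '192.0.2.3 - - [31/May/2008:10:02:00 -0500] "GET /index.php HTTP/1.1" 200 1024',
]
print(top_error_paths(sample))  # [('/wp-login.php', 2)]
```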

Welcome
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »
Thoughts
Finally finished my ultimate block list to stop AI bots :) Blocks over 100 AI bots!
After 10 years working late at night, my schedule has changed. I am now a “morning person”, starting my day at 6am or earlier.
Nice update for Wutsearch search engine launchpad. Now with 19 engines including Luxxle AI-powered search.
New version of 8G Firewall (v1.4) now available for download :)
Wishing everyone a prosperous and bright New Year!
I disabled AI in Google search results. It was making me lazy.
Went out walking today and soaked up some sunshine. It felt good.