Save $10 on my new book, Wizard’s SQL Recipes with code: WIZARD2022
Web Dev + WordPress + Security
142 posts related to: How to Block Bad Bots

Yahoo! Slurp too Stupid to be a Robot

I really hate bad robots. When a web crawler, spider, bot — or whatever you want to call it — behaves in a way that is contrary to expected and/or accepted protocols, we say that the bot is acting suspiciously, behaving badly, or just acting stupid in general. Unfortunately, there are thousands — if not hundreds of thousands — of nefarious bots violating our sites every minute of the day. For the most part, there are effective methods available enabling […] Continue reading »

Building the Perishable Press 4G Blacklist

[ Building the Hoover Dam, Part 1 ]

Last year, after much research and discussion, I built a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Unfortunately, these mega-lists eventually became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works […] Continue reading »

Controlling Proxy Access with HTAccess

In my recent article on blocking proxy servers, I explain how to use HTAccess to deny site access to a wide range of proxy servers. The method works great, but some readers want to know how to allow access for specific proxy servers while denying access to as many other proxies as possible. Fortunately, the solution is as simple as adding a few lines to my original proxy-blocking method. Specifically, we may allow any requests coming from our whitelist of […] Continue reading »

Eight Ways to Block and Redirect with Apache’s mod_rewrite

[ #1 ]

With the imminent release of the next series of (4G) blacklist articles here at Perishable Press, now is the perfect time to examine eight of the most commonly employed blacklisting methods achieved with Apache’s incredible rewrite module, mod_rewrite. In addition to facilitating site security, the techniques presented in this article will improve your understanding of the different rewrite methods available with Apache mod_rewrite. Note: I changed the title of this post from “Eight Ways to Blacklist..” to “Eight Ways to […] Continue reading »

Redirect All (Broken) Links from any Domain via HTAccess

Here’s the scene: you have been noticing a large number of 404 requests coming from a particular domain. You check it out and realize that the domain in question has a number of misdirected links to your site. The links may resemble legitimate URLs, but because of typographical errors, markup errors, or outdated references, they are broken, leading to nowhere on your site and producing a nice 404 error for every request. Ugh. Or, another painful scenario would be a […] Continue reading »

Yahoo! Lies about Obeying Robots.txt Directives

There are two possibilities here: Yahoo!’s Slurp crawler is broken or Yahoo! lies about obeying Robots directives. Either case isn’t good. Slurp just can’t seem to keep its nose out of my private business. And, as I’ve discussed before, this happens all the time. Here are the two most recent offenses, as recorded in the log file for my blackhole spider trap: Continue reading »

Better Default Directory Views with HTAccess

[ Screenshot: Default Directory View ]

Beautify your default directory listings! Displaying index-less file views is a great way to share files, but the drab, bare-bones interface is difficult to integrate into existing designs. While there are many scripts available to customize the appearance and functionality of default directory navigation, most of these methods are either too complicated, too invasive, or otherwise insufficient for expedient directory styling. In this comprehensive tutorial, you will learn how to use the built-in functionality of Apache’s mod_autoindex module to style […] Continue reading »

Blacklist Candidate 2008-10-19

[ Photo: Television Flashback ]

Welcome to the Perishable Press “Blacklist Candidate” series. In this post, we continue our new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags.. From time to time on the show, a contestant places a bid that is so absurd and so asinine that you literally laugh out loud, point at the monitor, and openly ridicule the pathetic loser. On such occasions, even the host of the show will laugh and mock the idiocy. Of course, this […] Continue reading »

Redirecting Subdirectories to the Root Directory via HTAccess

One of the most useful techniques in my HTAccess toolbox involves URL redirection using Apache’s RedirectMatch directive. With RedirectMatch, you get the powerful regex pattern matching available in the mod_alias module combined with the simplicity and effectiveness of the Redirect directive. This hybrid functionality makes RedirectMatch the ideal method for highly specific redirection. In this tutorial, we will explore the application of RedirectMatch as it applies to one of the most difficult redirect scenarios: redirecting all requests for a specific […] Continue reading »

Blacklist Candidate Series Summary

An ongoing series of articles on the fine art of malicious exploit detection and prevention. Learn about preventing the sneaky mischievous and deceptive practices of some of the worst spammers, scrapers, crackers, and other scumbags on the Internet. Continue reading »

Evil Incarnate, but Easily Blocked

As my readers know, I spend a lot of time digging through error logs, preventing attacks, and reporting results. Occasionally, some moron will pull a stunt that deserves exposure, public humiliation, and banishment. There is certainly no lack of this type of nonsense, as many of you are well-aware. 3G Blacklist Even so, I have to admit that I am very happy with my latest strategy against crackers, spammers, and other scumbags, namely, the 3G Blacklist. Since implementing this effective […] Continue reading »

Yahoo! Once Again Caught Disobeying Robots.txt Rules

Hmmm.. Let’s see here. Google can do it. MSN/Live can do it. Even Ask can do it. So why oh why can’t Yahoo’s grubby Slurp crawler manage to adhere to robots.txt crawl directives? Just when I thought Yahoo! finally figured it out, I discover more Slurp tracks in my Blackhole trap for bad spiders: Continue reading »

Redirect All Requests for a Nonexistent File to the Actual File

In my previous article on redirecting 404 requests for favicon files, I presented an HTAccess technique for redirecting all requests for nonexistent favicon.ico files to the actual file located in the site’s web-accessible root directory: # REDIRECT FAVICONZ <ifmodule mod_rewrite.c> RewriteCond %{THE_REQUEST} favicon.ico [NC] RewriteRule (.*) http://domain.tld/favicon.ico [R=301,L] </ifmodule> As discussed in the article, this code is already in effect here at Perishable Press, as may be seen by clicking on any of the following links: Update: I’ve removed the […] Continue reading »

Stop the Madness: Redirect those Ridiculous Favicon 404 Requests

For the last several months, I have been seeing an increasing number of 404 errors requesting “favicon.ico” appended onto various URLs: http://example.com/favicon.ico http://example.com/2007/06/12/favicon.ico http://example.com/2007/09/25/absolute-horizontal-and-vertical-centering-via-css/favicon.ico http://example.com/2007/08/01/temporary-site-redirect-for-visitors-during-site-updates/favicon.ico http://example.com/2007/01/16/maximum-and-minimum-height-and-width-in-internet-explorer/favicon.ico When these errors first began appearing in the logs several months ago, I didn’t think too much of it — “just another idiot who can’t find my site’s favicon..” As time went on, however, the frequency and variety of these misdirected requests continued to increase. A bit frustrating perhaps, but not serious enough to […] Continue reading »

Perishable Press HTAccess Spring Cleaning, Part 2

[ Spring Cleaning ]

Before Summer arrives, I need to post the conclusion to my seasonal article, Perishable Press HTAccess Spring Cleaning, Part 1. As explained in the first post, I recently spent some time to consolidate and optimize the Perishable Press site-root and blog-root HTAccess files. Since the makeover, I have enjoyed better performance, fewer errors, and cleaner code. In this article, I share some of the changes made to the blog-root HTAccess file and provide a brief explanation as to their intended […] Continue reading »

Blacklist Candidate Number 2008-05-31

[ Photo: Bob Barker waves at you ]

Welcome to the Perishable Press “Blacklist Candidate” series. In this post, we continue our new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags.. Just under the wire! Even so, this month’s official Blacklist-Candidate article may be the last monthly installment of the series. Although additional BC articles may appear in the future, it is unlikely that they will continue as a regular monthly feature. Oh sure, I see the tears streaming down your face, but think […] Continue reading »

Welcome
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »
WP Themes In Depth: Deep dive into WP theme development.
Thoughts
Wrapping up book launch. Now focusing on plugin updates for WP 5.9, due Jan. 25th.
Soft launch for my new book: Wizard’s SQL Recipes for WordPress
New book around 200 pages, almost done!
Making great strides on my new book. Planned release in December :)
To organize my life, I keep it simple. online: plain text files, offline: sticky notes.
Official list of Googlebot IP addresses.
Lot of 1s in today’s date 20211111.
Newsletter
Get news, updates, deals & tips via email.
Email kept private. Easy unsubscribe anytime.