Save 40% on Pro WordPress plugins with discount code: BLACKFRIDAY21
Web Dev + WordPress + Security
142 posts related to: How to Block Bad Bots

Canonical URLs and Subdomains with Plesk

I am in the process of migrating my sites from A Small Orange to Media Temple. Part of that process involves canonicalizing domain URLs to help maximize SEO strategy. At ASO, URL canonicalization required just a few htaccess directives: # enforce no www prefix <ifmodule mod_rewrite.c> RewriteCond %{HTTP_HOST} !^domain\.tld$ [NC] RewriteRule ^(.*)$ http://domain.tld/$1 [R=301,L] </ifmodule> When placed in the web-accessible root directory’s htaccess file, that snippet will ensure that all requests for your site are not prefixed with www. There’s […] Continue reading »

Latest Blacklist Entries

Recently cleared several megabytes of log files, detecting patterns, recording anomalies, and blacklisting gross offenders. Gonna break it down into three sections: User Agents Character Strings IP Addresses User Agents User-agents come and go, and are easily spoofed, but it’s worth a few lines of htaccess to block the more persistent bots that repeatedly scan your site with malicious requests. # Nov 2010 User Agents SetEnvIfNoCase User-Agent "MaMa " keep_out SetEnvIfNoCase User-Agent "choppy" keep_out SetEnvIfNoCase User-Agent "heritrix" keep_out SetEnvIfNoCase User-Agent […] Continue reading »

2010 User-Agent Blacklist

[ 2010 User-Agent Blacklist ]

The 2010 User-Agent Blacklist blocks hundreds of bad bots while ensuring open-access for the major search engines: Google, Bing, Ask, Yahoo, et al. Blocking bad user-agents is an effective addition to any security strategy. It works like this: your site is getting hammered by rogue bots that waste valuable server resources and bandwidth. So you grab a copy of the 2010 UA Blacklist from Perishable Press, include it in your site’s root .htaccess file, and enjoy better security and performance. […] Continue reading »

Protect Your Site with a Blackhole for Bad Bots

[ Black Hole (Vector) ]

One of my favorite security measures here at Perishable Press is the site’s virtual Blackhole trap for bad bots. The concept is simple: include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the honeypot trap, which then performs a WHOIS Lookup and records the event in the blackhole data file. Once added to the blacklist data file, bad bots immediately are denied […] Continue reading »

2010 IP Blacklist

Over the course of each year, I blacklist a considerable number of individual IP addresses. Every day, Perishable Press is hit with countless numbers of spammers, scrapers, crackers and all sorts of other hapless turds. Weekly examinations of my site’s error logs enable me to filter through the chaff and cherry-pick only the most heinous, nefarious attackers for blacklisting. Minor offenses are generally dismissed, but the evil bastards that insist on wasting resources running redundant automated scripts are immediately investigated […] Continue reading »

htaccess Redirect to Maintenance Page

Redirecting visitors to a maintenance page or other temporary page is an essential tool to have in your tool belt. Using HTAccess, redirecting visitors to a temporary maintenance page is simple and effective. All you need to redirect your visitors is the following code placed in your site’s root HTAccess: # MAINTENANCE-PAGE REDIRECT <ifmodule mod_rewrite.c> RewriteEngine on RewriteCond %{REMOTE_ADDR} !^123\.456\.789\.000 RewriteCond %{REQUEST_URI} !/maintenance.html$ [NC] RewriteCond %{REQUEST_URI} !\.(jpe?g?|png|gif) [NC] RewriteRule .* /maintenance.html [R=302,L] </ifmodule> That is the official copy-&-paste goodness right […] Continue reading »

Stop 404s for Mobile Versions of Your Site

[ Stop 404 Requests for Mobile Sites ]

If you’ve been keeping an eye on your 404 errors recently, you will have noticed an increase in requests for nonexistent mobile files and directories, especially over the past year or so. The scripts and bots requesting these files from your server seem to be looking for a mobile version of your site. Unfortunately, they are wasting bandwidth and resources in the process. It has become common to see the following 404 errors constantly repeated in your log files: http://domain.tld/apple-touch-icon.png […] Continue reading »

Is it Secret? Is it Safe?

[ Enjoying the Evening ]

Whenever I find myself working with PHP or messing around with server settings, I nearly always create a phpinfo.php file and place it in the root directory of whatever domain I happen to be working on. These types of informational files employ PHP’s handy phpinfo() function to display a concise summary of all of your server’s variables, which may then be referenced for debugging purposes, bragging rights, and so on. While this sort of thing is normally okay, I frequently […] Continue reading »

HTAccess Privacy for Specific IPs

Running a private site is all about preventing unwanted visitors. Here is a quick and easy way to allow access to multiple IP addresses while redirecting everyone else to a custom message page. To do this, all you need is an HTAccess file and a list of IPs for which you would like to allow access. Continue reading »

HTAccess Password-Protection Tricks

Recently a reader asked about how to password-protect a directory for every specified IP while allowing open access to everyone else. In my article, Stupid htaccess Tricks, I show how to password-protect a directory for every IP except the one specified, but not for the reverse case. In this article, I will demonstrate this technique along with a wide variety of other useful password-protection tricks, including a few from my Stupid htaccess Tricks article. Before getting into the juicy stuff, […] Continue reading »

Secure Visitor Posting for WordPress

Normally, when visitors post a comment to your site, specific types of client data are associated with the request. Commonly, a client will provide a user agent, a referrer, and a host header. When any of these variables is absent, there is good reason to suspect foul play. For example, virtually all browsers provide some sort of user-agent name to identify themselves. Conversely, malicious scripts directly posting spam and other payloads to your site frequently operate without specifying a user […] Continue reading »

HTAccess Spring Cleaning 2009

Just like last year, this Spring I have been taking some time to do some general maintenance here at Perishable Press. This includes everything from fixing broken links and resolving errors to optimizing scripts and eliminating unnecessary plugins. I’ll admit, this type of work is often quite dull, however I always enjoy the process of cleaning up my HTAccess files. In this post, I share some of the changes made to my HTAccess files and explain the reasoning behind each […] Continue reading »

Best Practices for Error Monitoring

Given my propensity to discuss matters involving error log data (e.g., monitoring malicious behavior, setting up error logs, and creating extensive blacklists), I am often asked about the best way to go about monitoring 404 and other types of server errors. While I consider myself to be a novice in this arena (there are far brighter people with much greater experience), I do spend a lot of time digging through log entries and analyzing data. So, when asked recently about […] Continue reading »

4G Series: The Ultimate Referrer Blacklist, Featuring Over 8000 Banned Referrers

You have seen user-agent blacklists, IP blacklists, 4G Blacklists, and everything in between. Now, in this article, for your sheer and utter amusement, I present a collection of over 8000 blacklisted referrers. Shortcut: skip the article and jump to Disclaimer and Download » Referrer Spam Sucks For the uninitiated, in teh language of teh Web, a referrer is the online resource from whence a visitor happened to arrive at your site. For example, if Johnny the Wonder Parrot was visiting the […] Continue reading »

4G Series: The Ultimate User-Agent Blacklist, Featuring Over 1200 Bad Bots

[ Image: Inverted Eclipse ]

As discussed in my recent article, Eight Ways to Blacklist with Apache’s mod_rewrite, one method of stopping spammers, scrapers, email harvesters, and malicious bots is to blacklist their associated user agents. Apache enables us to target bad user agents by testing the user-agent string against a predefined blacklist of unwanted visitors. Any bot identifying itself as one of the blacklisted agents is immediately and quietly denied access. While this certainly isn’t the most effective method of securing your site against […] Continue reading »

The Perishable Press 4G Blacklist

[ 4G Stormtrooper ]

At last! After many months of collecting data, crafting directives, and testing results, I am thrilled to announce the release of the 4G Blacklist! The 4G Blacklist is a next-generation protective firewall that secures your site against a wide range of automated attacks and other malicious activity. Continue reading »

Welcome
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »
Banhammer: Protect your WordPress site against threats.
Thoughts
Making great strides on my new book. Planned release in December :)
To organize my life, I keep it simple. online: plain text files, offline: sticky notes.
Official list of Googlebot IP addresses.
Lot of 1s in today’s date 20211111.
Working on a new book :)
I enjoy listening to original Star Trek and NG episodes while working online. After a while it feels like I’m working on the ship as part of the crew, going on adventures.
New version (2.6) of my shapeSpace starter theme now available! Always free & open source for everyone :)
Newsletter
Get news, updates, deals & tips via email.
Email kept private. Easy unsubscribe anytime.