Spring Sale! Save 30% on all books w/ code: PLANET24
Web Dev + WordPress + Security

Canonical URLs and Subdomains with Plesk

I am in the process of migrating my sites from A Small Orange to Media Temple. Part of that process involves canonicalizing domain URLs to help maximize SEO strategy. At ASO, URL canonicalization required just a few htaccess directives: # enforce no www prefix <ifmodule mod_rewrite.c> RewriteCond %{HTTP_HOST} !^domain\.tld$ [NC] RewriteRule ^(.*)$ http://domain.tld/$1 [R=301,L] </ifmodule> When placed in the web-accessible root directory’s htaccess file, that snippet will ensure that all requests for your site are not prefixed with www. There’s […] Continue reading »

Protect Your Site with a Blackhole for Bad Bots

[ Black Hole (Vector) ]

One of my favorite security measures here at Perishable Press is the site’s virtual Blackhole trap for bad bots. The concept is simple: include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the honeypot trap, which then performs a WHOIS Lookup and records the event in the blackhole data file. Once added to the blacklist data file, bad bots immediately are denied […] Continue reading »

Stop 404s for Mobile Versions of Your Site

[ Stop 404 Requests for Mobile Sites ]

If you’ve been keeping an eye on your 404 errors recently, you will have noticed an increase in requests for nonexistent mobile files and directories, especially over the past year or so. The scripts and bots requesting these files from your server seem to be looking for a mobile version of your site. Unfortunately, they are wasting bandwidth and resources in the process. It has become common to see the following 404 errors constantly repeated in your log files: http://domain.tld/apple-touch-icon.png […] Continue reading »

Fixing WordPress Infinite Duplicate Content Issue

Jeff Morris recently demonstrated a potential issue with the way WordPress handles multipaged posts and comments. The issue involves WordPress’ inability to discern between multipaged posts and comments that actually exist and those that do not. By redirecting requests for nonexistent numbered pages to the original post, WordPress creates an infinite amount of duplicate content for your site. In this article, we explain the issue, discuss the implications, and provide an easy, working solution. Understanding the “infinite duplicate content” issue […] Continue reading »

HTAccess Privacy for Specific IPs

Running a private site is all about preventing unwanted visitors. Here is a quick and easy way to allow access to multiple IP addresses while redirecting everyone else to a custom message page. To do this, all you need is an HTAccess file and a list of IPs for which you would like to allow access. Continue reading »

HTAccess Password-Protection Tricks

Recently a reader asked about how to password-protect a directory for every specified IP while allowing open access to everyone else. In my article, Stupid htaccess Tricks, I show how to password-protect a directory for every IP except the one specified, but not for the reverse case. In this article, I will demonstrate this technique along with a wide variety of other useful password-protection tricks, including a few from my Stupid htaccess Tricks article. Before getting into the juicy stuff, […] Continue reading »

Secure Visitor Posting for WordPress

Normally, when visitors post a comment to your site, specific types of client data are associated with the request. Commonly, a client will provide a user agent, a referrer, and a host header. When any of these variables is absent, there is good reason to suspect foul play. For example, virtually all browsers provide some sort of user-agent name to identify themselves. Conversely, malicious scripts directly posting spam and other payloads to your site frequently operate without specifying a user […] Continue reading »

HTAccess Spring Cleaning 2009

Just like last year, this Spring I have been taking some time to do some general maintenance here at Perishable Press. This includes everything from fixing broken links and resolving errors to optimizing scripts and eliminating unnecessary plugins. I’ll admit, this type of work is often quite dull, however I always enjoy the process of cleaning up my HTAccess files. In this post, I share some of the changes made to my HTAccess files and explain the reasoning behind each […] Continue reading »

Best Practices for Error Monitoring

Given my propensity to discuss matters involving error log data (e.g., monitoring malicious behavior, setting up error logs, and creating extensive blacklists), I am often asked about the best way to go about monitoring 404 and other types of server errors. While I consider myself to be a novice in this arena (there are far brighter people with much greater experience), I do spend a lot of time digging through log entries and analyzing data. So, when asked recently about […] Continue reading »

4G Series: The Ultimate Referrer Blacklist, Featuring Over 8000 Banned Referrers

You have seen user-agent blacklists, IP blacklists, 4G Blacklists, and everything in between. Now, in this article, for your sheer and utter amusement, I present a collection of over 8000 blacklisted referrers. Shortcut: skip the article and jump to Disclaimer and Download » Referrer Spam Sucks For the uninitiated, in teh language of teh Web, a referrer is the online resource from whence a visitor happened to arrive at your site. For example, if Johnny the Wonder Parrot was visiting the […] Continue reading »

4G Series: The Ultimate User-Agent Blacklist, Featuring Over 1200 Bad Bots

[ Image: Inverted Eclipse ]

As discussed in my recent article, Eight Ways to Blacklist with Apache’s mod_rewrite, one method of stopping spammers, scrapers, email harvesters, and malicious bots is to blacklist their associated user agents. Apache enables us to target bad user agents by testing the user-agent string against a predefined blacklist of unwanted visitors. Any bot identifying itself as one of the blacklisted agents is immediately and quietly denied access. While this certainly isn’t the most effective method of securing your site against […] Continue reading »

The Perishable Press 4G Blacklist

[ 4G Stormtrooper ]

At last! After many months of collecting data, crafting directives, and testing results, I am thrilled to announce the release of the 4G Blacklist! The 4G Blacklist is a next-generation protective firewall that secures your site against a wide range of automated attacks and other malicious activity. Continue reading »

Yahoo! Slurp too Stupid to be a Robot

I really hate bad robots. When a web crawler, spider, bot — or whatever you want to call it — behaves in a way that is contrary to expected and/or accepted protocols, we say that the bot is acting suspiciously, behaving badly, or just acting stupid in general. Unfortunately, there are thousands — if not hundreds of thousands — of nefarious bots violating our sites every minute of the day. For the most part, there are effective methods available enabling […] Continue reading »

Building the Perishable Press 4G Blacklist

[ Building the Hoover Dam, Part 1 ]

Last year, after much research and discussion, I built a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Unfortunately, these mega-lists eventually became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works […] Continue reading »

Eight Ways to Block and Redirect with Apache’s mod_rewrite

[ #1 ]

With the imminent release of the next series of (4G) blacklist articles here at Perishable Press, now is the perfect time to examine eight of the most commonly employed blacklisting methods achieved with Apache’s incredible rewrite module, mod_rewrite. In addition to facilitating site security, the techniques presented in this article will improve your understanding of the different rewrite methods available with Apache mod_rewrite. Note: I changed the title of this post from “Eight Ways to Blacklist..” to “Eight Ways to […] Continue reading »

Yahoo! Lies about Obeying Robots.txt Directives

There are two possibilities here: Yahoo!’s Slurp crawler is broken or Yahoo! lies about obeying Robots directives. Either case isn’t good. Slurp just can’t seem to keep its nose out of my private business. And, as I’ve discussed before, this happens all the time. Here are the two most recent offenses, as recorded in the log file for my blackhole spider trap: Continue reading »

Welcome
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »
USP Pro: Unlimited front-end forms for user-submitted posts and more.
Thoughts
I live right next door to the absolute loudest car in town. And the owner loves to drive it.
8G Firewall now out of beta testing, ready for use on production sites.
It's all about that ad revenue baby.
Note to self: encrypting 500 GB of data on my iMac takes around 8 hours.
Getting back into things after a bit of a break. Currently 7° F outside. Chillz.
2024 is going to make 2020 look like a vacation. Prepare accordingly.
First snow of the year :)
Newsletter
Get news, updates, deals & tips via email.
Email kept private. Easy unsubscribe anytime.