Okay, I realize that the title sounds a bit odd, but nowhere near as odd as my recent discovery of Slurp ignoring explicit robots.txt rules and digging around in my highly specialized bot trap, which I have lovingly dubbed “the blackhole”. What is up with that, Yahoo!? Does your Slurp spider obey robots.txt directives or not? I have never seen Google crawling around that side of town, and neither MSN nor even Ask has ventured into the forbidden realms. Has […] Continue reading »
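For reference, a crawler that honors robots.txt checks each URL against the Disallow rules before requesting it. Here is a minimal sketch of that check using Python's standard urllib.robotparser; the catch-all record and the /blackhole/ path are assumptions for illustration, not the site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/blackhole/" is an assumed trap path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blackhole/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from memory, no network
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/", "https://example.com/blackhole/"):
        print(url, may_fetch("Slurp", url))
```

A well-behaved spider skips any URL for which this check returns False; a spider found inside the trap is one that fetched the URL anyway.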
In our original htaccess blacklist article, we provide an extensive list of bad user agents. This so-called “Ultimate htaccess Blacklist” works great at blocking many different online villains: spammers, scammers, scrapers, scrappers, rippers, leechers — you name it. Yet, despite its usefulness, there is always room for improvement. Continue reading »
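The blacklist itself lives in htaccess, but the matching idea is simple: refuse a request if its User-Agent header contains any listed fragment, ignoring case. A minimal Python sketch of that logic; the four fragments are hypothetical sample entries, not the contents of the actual blacklist.

```python
import re

# Hypothetical sample fragments only; a real blacklist is much longer.
BAD_AGENT_FRAGMENTS = ["EmailCollector", "WebStripper", "HTTrack", "libwww-perl"]

# One case-insensitive alternation, roughly what an htaccess
# RewriteCond %{HTTP_USER_AGENT} (a|b|c) [NC] line expresses.
BAD_AGENT_RE = re.compile(
    "|".join(re.escape(fragment) for fragment in BAD_AGENT_FRAGMENTS),
    re.IGNORECASE,
)


def is_blacklisted(user_agent: str) -> bool:
    """Return True if any blacklisted fragment appears in the User-Agent string."""
    return BAD_AGENT_RE.search(user_agent or "") is not None


if __name__ == "__main__":
    for ua in ("HTTrack Website Copier/3.x", "Mozilla/5.0 (Windows NT 10.0)"):
        print(ua, "->", "blocked" if is_blacklisted(ua) else "allowed")
```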
Keeping track of your access and error logs is a critical component of any serious security strategy. Often you will see an entry that looks legitimate enough to be dismissed as genuine Googlebot traffic, only to discover upon closer investigation that it came from a fraudulent agent. There are many such cloaked or disguised agents crawling around these days, mimicking various search engines to fly under the radar. So it’s always a good idea to implement a procedure for […] Continue reading »
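One widely used way to unmask impostors is forward-confirmed reverse DNS: resolve the visitor's IP to a hostname, check that the hostname belongs to the search engine, then resolve that hostname forward and confirm it maps back to the same IP. A minimal sketch, assuming IPv4 and Googlebot hostnames ending in googlebot.com or google.com (verify the suffixes against the search engine's current documentation). The resolver functions are injectable so the logic can be tested offline.

```python
import socket
from typing import Callable, Iterable, List


def _reverse_lookup(ip: str) -> str:
    # PTR lookup: IP address -> hostname
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> List[str]:
    # Forward lookup: hostname -> list of IPv4 addresses
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str] = (".googlebot.com", ".google.com"),
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed search-engine crawler."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    # Step 1: the reverse name must belong to the search engine's domain.
    if not host.endswith(tuple(allowed_suffixes)):
        return False
    # Step 2: the forward lookup must map that name back to the same IP.
    try:
        return ip in forward(host)
    except OSError:
        return False
```

An agent string claiming to be Googlebot from an IP that fails this check is a good candidate for the fraudulent category described above.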
Recently, I discussed some suspicious behavior exhibited by the Yahoo! Slurp crawler. As revealed by the site’s closely watched 404-error logs, Yahoo! had been requesting a series of nonexistent resources. Although most of the 404 errors came exclusively from Slurp, several of these requests also came from Google, Live, and even Ask. Initially, these distinct errors were misdiagnosed as existing URLs appended with various JavaScript functions. Here are a few typical examples […] Continue reading »
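Watching 404s per crawler is easy to automate. A minimal sketch, assuming Apache's combined log format (the user agent is the last quoted field): it tallies 404 responses by user agent so that a crawler requesting many nonexistent URLs stands out. Real logs can contain escaped quotes inside fields, which this simple regex does not handle.

```python
import re
from collections import Counter
from typing import Iterable

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_404s_by_agent(lines: Iterable[str]) -> Counter:
    """Tally 404 responses per user agent; lines that don't parse are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if match and match.group("status") == "404":
            counts[match.group("agent")] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python count404.py < access.log  (file names are placeholders)
    for agent, n in count_404s_by_agent(sys.stdin).most_common(10):
        print(n, agent)
```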
Most of the time, when I catch scumbags attempting to spam, scrape, leech, or otherwise hack my site, I stitch up a new voodoo doll and let the cursing begin. No, seriously, I just blacklist the idiots. I don’t need their traffic, and so I don’t even blink while slamming the doors in their faces. Of course, this policy presents a bit of a dilemma when the culprit is one of the four major search engines. Slamming the door on […] Continue reading »
In our article Stupid htaccess Tricks, we present the htaccess code required for redirecting visitors temporarily during periods of site maintenance. Although the article provides everything needed to implement the temporary redirect, I think readers would benefit from a more thorough examination of the process — nothing too serious, just enough to get it right. After discussing temporary redirects via htaccess, I’ll also explain how to accomplish the same thing using only a small slice of PHP. It’s like two […] Continue reading »
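The article does this with htaccess and PHP. The same logic, sketched in Python as hypothetical WSGI middleware: every request gets a 302 temporary redirect to the maintenance page, except requests from an allowed IP and requests for the maintenance page itself (which would otherwise loop). The page path and IP address are placeholders.

```python
from typing import Callable, Iterable

# Placeholder values (assumptions, not from the article).
MAINTENANCE_PATH = "/maintenance.html"
ALLOWED_IPS = {"203.0.113.10"}  # e.g. the site owner's address

WSGIApp = Callable[[dict, Callable], Iterable[bytes]]


def maintenance_redirect(app: WSGIApp) -> WSGIApp:
    """Temporarily redirect all visitors except ALLOWED_IPS to the maintenance page."""
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        ip = environ.get("REMOTE_ADDR", "")
        path = environ.get("PATH_INFO", "/")
        # Let the owner through, and never redirect the maintenance page itself.
        if ip in ALLOWED_IPS or path == MAINTENANCE_PATH:
            return app(environ, start_response)
        # 302 marks the move as temporary: the original URLs remain valid.
        start_response("302 Found", [("Location", MAINTENANCE_PATH)])
        return [b""]
    return wrapper
```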
Recently, I needed to find and replace all instances of “http://website” in the wp_comments table of the WordPress database. Fortunately, SQL provides a simple way to find and replace data: the wonderful UPDATE statement, used together with the REPLACE() string function. Continue reading »
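The core statement looks like UPDATE wp_comments SET comment_content = REPLACE(comment_content, 'old', 'new'). The sketch below runs it through Python's sqlite3 module so it can be tried on a throwaway in-memory table; it assumes the text lives in the comment_content column and uses a hypothetical replacement string. The same UPDATE/REPLACE() statement is valid in MySQL, but back up the WordPress database before running it for real.

```python
import sqlite3


def replace_in_comments(conn: sqlite3.Connection, old: str, new: str) -> int:
    """Replace every occurrence of `old` with `new` in wp_comments.comment_content.

    Only rows that contain `old` are touched; returns the number of rows changed.
    """
    cur = conn.execute(
        "UPDATE wp_comments "
        "SET comment_content = REPLACE(comment_content, ?, ?) "
        "WHERE INSTR(comment_content, ?) > 0",
        (old, new, old),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_comments (comment_ID INTEGER PRIMARY KEY, comment_content TEXT)"
    )
    conn.execute(
        "INSERT INTO wp_comments (comment_content) VALUES (?)",
        ("Visit http://website for more",),
    )
    # "https://example.com" is a hypothetical replacement value.
    print(replace_in_comments(conn, "http://website", "https://example.com"))
```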
For over a year now, I have been using the WP-ShortStat plugin for WordPress by Markus Kämmerer (Happy Arts Blog). The plugin is relatively well maintained and remains one of my favorite admin tools, great for popping in on stats without logging into Mint. Nonetheless, due to its IP/country-detection functionality, WP-ShortStat has experienced its share of difficulties (see the change log on the plugin’s home page). In this article, I describe how WP-ShortStat slows down the root index page of a site, […] Continue reading »
Figuratively speaking, hunting down and killing spammers, scrapers, and other online scum remains one of our favorite pursuits. Once we have determined that a particular IP address is worthy of banishment, we generally invoke the magical powers of htaccess to lock the gates. When htaccess is not available, we may summon the versatile functionality of PHP to get the job done. This method is straightforward. Simply edit, copy and paste the following code example into the top of any PHP […] Continue reading »
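The article's method is PHP; the underlying check is just a comparison of the visitor's address against a ban list before anything else runs. A minimal Python sketch as WSGI middleware, using the standard ipaddress module so both single addresses and whole ranges can be banned; the listed networks are documentation ranges standing in for real offenders.

```python
import ipaddress
from typing import Callable, Iterable

# Hypothetical ban list using documentation address ranges.
BANNED_NETWORKS = [
    ipaddress.ip_network("192.0.2.15/32"),    # a single address
    ipaddress.ip_network("198.51.100.0/24"),  # a whole range
]


def is_banned(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any banned network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:  # missing or malformed address
        return False
    # Membership across IP versions (IPv6 address, IPv4 network) is simply False.
    return any(addr in net for net in BANNED_NETWORKS)


def ban_middleware(app: Callable) -> Callable:
    """WSGI middleware that answers banned clients with 403 before the app runs."""
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if is_banned(environ.get("REMOTE_ADDR", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return wrapper
```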
In this article, we adapt our favorite CSS-compression technique to JavaScript. Below, we outline the steps required to auto-compress your JavaScript documents via gzip and PHP. Two different compression methods are presented. The first method does not require htaccess but involves manually editing the JavaScript files. The second method employs htaccess to do all the work for you, and thus requires much less effort to implement. In either case, the result is the same: automatically compressed content delivered only […] Continue reading »
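Whatever the server-side language, one reasonable approach is to compress only when the client's Accept-Encoding header lists gzip, and to label the result with Content-Encoding and Vary headers. A minimal Python sketch of that negotiation using the standard gzip module; it honors q=0 but deliberately ignores the "*" wildcard to stay short.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """True if the Accept-Encoding header lists gzip with a nonzero q-value.

    Simplification: the "*" wildcard is not considered.
    """
    for part in accept_encoding.split(","):
        name, _, params = part.partition(";")
        if name.strip().lower() != "gzip":
            continue
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def serve_js(body: bytes, accept_encoding: str) -> tuple[bytes, dict]:
    """Return (payload, headers) for a JavaScript file, gzipped when allowed."""
    headers = {
        "Content-Type": "text/javascript",
        # Tell caches that the response depends on Accept-Encoding.
        "Vary": "Accept-Encoding",
    }
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers
```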
In our never-ending battle against spammers, leeches, scrapers, and other online undesirables, we have implemented several powerful security measures to improve the operational integrity of our perpetual virtual existence. Here is a rundown of the new behind-the-scenes security features of Perishable Press. Continue reading »
Recently, every website on our primary server was simultaneously attacked. The offending party indiscriminately replaced the contents of every index file, regardless of its extension or location, with a few vulgar lines of code that indicated intention, identity, and influence. Apparently, the attack was routed through a server at the University of Hamburg (uni-hamburg.de) in Germany. This relatively minor attack resulted in several hours of valuable online education. In this article, we intend to share our experience with website attack […] Continue reading »
A list of HTTP status codes and their corresponding definitions. Informational codes: 100 Continue; 101 Switching Protocols. Successful client requests: 200 OK; 201 Created; 202 Accepted; 203 Non-Authoritative Information; 204 No Content; 205 Reset Content; 206 Partial Content. Client request redirected: 300 Multiple Choices; 301 Moved Permanently; 302 Moved Temporarily (named Found in HTTP/1.1); 303 See Other; 304 Not Modified; 305 Use Proxy; 307 Temporary Redirect. Client request errors: […] Continue reading »
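Python's standard http.HTTPStatus enum carries the current RFC reason phrases, which makes it handy for checking a list like this one (note that it names 302 "Found", the HTTP/1.1 phrase, rather than the older "Moved Temporarily"). A short sketch that prints each code with its phrase and its class, where the class is simply the first digit of the code:

```python
from http import HTTPStatus

# Class names for the first digit of a status code (RFC 7231 / RFC 9110 terms).
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return e.g. '302 Found (Redirection)'; raises ValueError for unknown codes."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({STATUS_CLASSES[code // 100]})"


if __name__ == "__main__":
    for code in (100, 200, 206, 301, 302, 304, 307):
        print(describe(code))
```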