Tag: blacklist

Latest Blacklist Entries

Posted on November 9, 2010 in Websites by Jeff Starr

Recently cleared several megabytes of log files, detecting patterns, recording anomalies, and blacklisting gross offenders. Gonna break it down into three sections:

User Agents

User-agents come and go, and are easily spoofed, but it’s worth a few lines of htaccess to block the more persistent bots that repeatedly scan your site with malicious requests.

# Nov 2010 User Agents
SetEnvIfNoCase User-Agent "MaMa " keep_out
SetEnvIfNoCase User-Agent "choppy" keep_out
SetEnvIfNoCase User-Agent "heritrix" keep_out
SetEnvIfNoCase User-Agent "Purebot" keep_out
SetEnvIfNoCase User-Agent "PostRank" keep_out
SetEnvIfNoCase User-Agent "archive.org_bot" keep_out
SetEnvIfNoCase User-Agent "msnbot.htm)._" keep_out

<Limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 Deny from env=keep_out
</Limit>

Continue Reading

2010 User-Agent Blacklist

Posted on August 9, 2010 in Websites by Jeff Starr

[ 2010 User-Agent Blacklist ] The 2010 User-Agent Blacklist blocks hundreds of bad bots while ensuring open-access for the major search engines: Google, Bing, Ask, Yahoo, et al. Blocking bad user-agents is an effective addition to any security strategy. It works like this: your site is getting hammered by rogue bots that waste valuable server resources and bandwidth. So you grab a copy of the 2010 UA Blacklist from Perishable Press, include it in your site’s root .htaccess file, and enjoy a more secure and better performing website. It’s that easy.

Proven Security

The 2010 UA Blacklist has been carefully constructed based on rigorous server-log analyses. Obsessive daily log monitoring reveals bad bots scanning for exploits, spamming resources, and wasting bandwidth. While analyzing malicious behavior, evil bots are identified and added to the UA Blacklist. Blocked user-agents are denied access to your site, increasing efficiency and providing safety for your visitors.

Continue Reading

Protect Your Site with a Blackhole for Bad Bots

Posted on July 14, 2010 in Websites by Jeff Starr

[ Black Hole ] One of my favorite security measures here at Perishable Press is the site’s virtual Blackhole trap for bad bots. The concept is simple: include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the trap, which then performs a WHOIS Lookup and records the event in the blackhole data file. Once added to the blacklist data file, bad bots immediately are denied access to your site. I call it the “one-strike” rule: bots have one chance to follow the robots.txt protocol, check the site’s robots.txt file, and obey its directives. Failure to comply results in immediate banishment. The best part is that the Blackhole only affects bad bots: normal users never see the hidden link, and good bots obey the robots rules in the first place.

In five easy steps, you can set up your own Blackhole to trap bad bots and protect your site from evil scripts, bandwidth thieves, content scrapers, spammers, and other malicious behavior.

[ Blackhole Directory with Files ] The Blackhole is built with PHP, and uses a bit of .htaccess to protect the blackhole directory. The blackhole script combines heavily modified versions of the Kloth.net script (for the bot trap) and the Network Query Tool (for the whois lookups). Refined over the years and completely revamped for this tutorial, the Blackhole consists of a single plug-&-play directory that contains the following four files:

Continue Reading

2010 IP Blacklist

Posted on July 6, 2010 in Websites by Jeff Starr

Over the course of each year, I blacklist a considerable number of individual IP addresses. Every day, Perishable Press is hit with countless numbers of spammers, scrapers, crackers and all sorts of other hapless turds. Weekly examinations of my site’s error logs enable me to filter through the chaff and cherry-pick only the most heinous, nefarious attackers for blacklisting. Minor offenses are generally dismissed, but the evil bastards that insist on wasting resources running redundant automated scripts are immediately investigated via IP lookup and denied access via simple htaccess directive:

<Limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 Deny from 123.456.789
</LIMIT>

Although many of the worst attacks happen in randomized, zombie-like fashion, I have found that individual IPs that are not blacklisted will return repeatedly until finally blocked. Yet, despite the short-term success enjoyed by denying access to the most malicious IPs, the long-term futility of such blacklisting reflects the temporary nature of this solution.

In other words, I have found that blocking individual IPs is useful only for limited periods of time. Thus, every year, I gather my code and flush the blacklist of all individually blocked IP addresses. I then start fresh, adding the worst villains to the list, blocking entire IP ranges if necessary, and referring to previous versions of my htaccess files to cross-check suspiciously familiar entities. Eventually, a new blacklist emerges and I share it at Perishable Press. Here is the current version for 2010..

Continue Reading

Stop 404 Requests for Mobile Versions of Your Site

Posted on April 26, 2010 in Function by Jeff Starr

If you’ve been keeping an eye on your 404 errors recently, you will have noticed an increase in requests for nonexistent mobile files and directories, especially over the past year or so. The scripts and bots requesting these files from your server seem to be looking for a mobile version of your site. Unfortunately, they are wasting bandwidth and resources in the process. It has become common to see the following 404 errors constantly repeated in your log files:

  • http://domain.tld/apple-touch-icon.png
  • http://domain.tld/iphone
  • http://domain.tld/mobile
  • http://domain.tld/mobi
  • http://domain.tld/m

So some bot comes along, assumes that your site includes a mobile version, and then tries its hand at guessing the location. In the common request-set listed above, we see the bot looking first for an “apple-touch icon,” and then for mobile content in various directories. If this only happens once in awhile, it’s no big deal. But these days I’ve been seeing many different bots requesting these nonexistent resources.

Even worse, these mobile-hungry bots can’t seem to remember where they’ve been – they typically request the same resources repeatedly, and in multiple locations within the directory structure. I frequently see hundreds of these types of requests in my weekly error-log analyses. Needless to say, this is an incredible waste of time, bandwidth, and server resources.

Continue Reading

Protect WordPress Against Malicious URL Requests

Posted on December 22, 2009 in WordPress by Jeff Starr

A few months ago, many WordPress sites were attacked with some extremely malicious code. While searching for a good solution, I discovered the following gem of a plugin in the pastebin repository:

<?php /* Plugin Name: Block Bad Queries */

if (strlen($_SERVER['REQUEST_URI']) > 255 || 
	strpos($_SERVER['REQUEST_URI'], "eval(") || 
	strpos($_SERVER['REQUEST_URI'], "base64")) {
		@header("HTTP/1.1 414 Request-URI Too Long");
		@header("Status: 414 Request-URI Too Long");
		@header("Connection: Close");
		@exit;
} ?>

This script checks for excessively long request strings (i.e., greater than 255 characters), as well as the presence of either “eval(” or “base64” in the request URI. These sorts of nefarious requests were implicated in the September 2009 WordPress attacks.

Continue Reading

HTAccess Privacy for Specific IPs

Posted on October 12, 2009 in Function by Jeff Starr

Running a private site is all about preventing unwanted visitors. Here is a quick and easy way to allow access to multiple IP addresses while redirecting everyone else to a custom message page.

To do this, all you need is an HTAccess file and a list of IPs for which you would like to allow access.

Edit the following code according to the proceeding instructions and place into the root HTAccess file of your domain:

# ALLOW ONLY MULTIPLE IPs
<Limit GET POST PUT>
 Order Deny,Allow
 Deny from all
 Allow from 123.456.789
 Allow from 456.789.123
 Allow from 789.123.456
</Limit>
ErrorDocument 403 path/custom-message.html
<Files path/custom-message.html>
 Order Allow,Deny
 Allow from all
</Files>

To prepare this code for use on your site, do these three things:

  1. Edit the three IP addresses to suit your needs. Feel free to add more IPs or remove any that aren’t needed.
  2. Edit both instances of “path/custom-message.html” to match the path and file name of the file that will contain your custom message. This may be anything, anywhere, with any functionality you desire.
  3. That’s it. Copy/paste into your site’s root htaccess file, upload, test, and get out!

Continue Reading

Block Multiple IP Addresses with PHP

Posted on June 8, 2009 in Function by Jeff Starr

[ Screenshot: The Legion of Doom ] Let’s face it. There’s just as much scum on the Internet as there is out there in the “real world.” Maybe even more, who knows. From scammers and spammers to scrapers and crackers, the Web is just crawling with all sorts of pathetic scumbags. As predictably random as much of the malicious activity happens to be, it is virtually guaranteed that you will be hounded by at least a few persistent IP addresses that, for whatever reason, have latched on and just won’t let go. Like satanic parasites, they plague you night and day, haunting you and making your online life a living hell. Perhaps they leave endless spam comments; perhaps they are just mindless trolls giving you grief; or perhaps they continue to take flying stabs at the security of your website. Whatever the behavior, once you have determined that you need to block a collection of evil IPs, you have many choices. Here is a simple way to blacklist multiple IP addresses with a little PHP magic..

Continue Reading

Best Practices for Error Monitoring

Posted on May 3, 2009 in Websites by Jeff Starr

Given my propensity to discuss matters involving error log data (e.g., monitoring malicious behavior, setting up error logs, and creating extensive blacklists), I am often asked about the best way to go about monitoring 404 and other types of server errors. While I consider myself to be a novice in this arena (there are far brighter people with much greater experience), I do spend a lot of time digging through log entries and analyzing data. So, when asked recently about my error monitoring practices, I decided to share my response here at Perishable Press, and hopefully get some good feedback concerning best practices for error monitoring. Here is my email response to the question:

Continue Reading

4G Series: The Ultimate Referrer Blacklist, Featuring Over 8000 Banned Referrers

Posted on April 21, 2009 in Websites by Jeff Starr

You have seen user-agent blacklists, IP blacklists, 4G Blacklists, and everything in between. Now, in this article, for your sheer and utter amusement, I present a collection of over 8000 blacklisted referrers.

For the uninitiated, in teh language of teh Web, a referrer is the online resource from whence a visitor happened to arrive at your site. For example, if Johnny the Wonder Parrot was visiting the Mainstream Media website and happened to follow a link to your site (of all places), you would look at your access logs, notice Johnny’s visit, and speak out loud (slowly): “hmmm.. it looks like the Mainstream Media website referred my good pal Johnny to my Alka-Seltzer sales page.” In such a bizarre case, the Mainstream Media website — or specific page — is referred to as (no pun intended) the referrer.

Continue Reading

4G Series: The Ultimate User-Agent Blacklist, Featuring Over 1200 Bad Bots

Posted on March 29, 2009 in Websites by Jeff Starr

[ Image: Inverted Eclipse ] As discussed in my recent article, Eight Ways to Blacklist with Apache’s mod_rewrite, one method of stopping spammers, scrapers, email harvesters, and malicious bots is to blacklist their associated user agents. Apache enables us to target bad user agents by testing the user-agent string against a predefined blacklist of unwanted visitors. Any bot identifying itself as one of the blacklisted agents is immediately and quietly denied access. While this certainly isn’t the most effective method of securing your site against malicious behavior, it may certainly provide another layer of protection.

Even so, there are several things to consider before choosing to implement an extensive user-agent blacklist on your site. First and most importantly is the transient nature of the user agent itself. On most systems, the user-agent variable is easy to change, making it possible for bot owners to use any user-agent name they wish. Once a bad bot makes the rounds, becomes known, and is blacklisted, the bot owner need only modify or change its declared user agent and they’re back in business. User-agent names are constantly invented, spoofed, or otherwise altered in order to operate beneath — or above — the virtual radar. Thus, a user-agent blacklist is a high-maintenance affair, requiring continuous cultivation in order to maintain relevancy and effectiveness.

Continue Reading

The Perishable Press 4G Blacklist

Posted on March 16, 2009 in Websites by Jeff Starr

[ 4G Stormtrooper ] At last! After many months of collecting data, crafting directives, and testing results, I am thrilled to announce the release of the 4G Blacklist! The 4G Blacklist is a next-generation protective firewall that secures your website against a wide range of malicious activity. Like its 3G predecessor, the 4G Blacklist is designed for use on Apache servers and is easily implemented via HTAccess or the httpd.conf configuration file. In order to function properly, the 4G Blacklist requires two specific Apache modules, mod_rewrite and mod_alias. As with the third generation of the blacklist, the 4G Blacklist consists of multiple parts:

Update Feb 22, 2011: The 5G version of the blacklist is available now in beta.

Continue Reading

Yahoo! Slurp too Stupid to be a Robot

Posted on March 15, 2009 in Nonsense, Websites by Jeff Starr

I really hate bad robots. When a web crawler, spider, bot — or whatever you want to call it — behaves in a way that is contrary to expected and/or accepted protocols, we say that the bot is acting suspiciously, behaving badly, or just acting stupid in general. Unfortunately, there are thousands — if not hundreds of thousands — of nefarious bots violating our websites every minute of the day.

For the most part, there are effective methods available enabling us to protect our sites against the endless hordes of irrelevant and mischievous bots. Such evil is easily blocked with virtually zero side-effects because their presence is simply irrelevant.

But what about bad bots that aren’t exactly irrelevant, such as Yahoo’s mindless Slurp crawler? By disobeying the robots.txt protocol as promised, Yahoo’s Slurp clearly falls into the “bad-bot” category. Unlike typical “nonsense” bots, Slurp is not exactly irrelevant (yet), so simply blocking them is not a reasonable solution.

Continue Reading

Building the Perishable Press 4G Blacklist

Posted on March 8, 2009 in Websites by Jeff Starr

[ Building the Hoover Dam, Part 1 ]

Last year, after much research and discussion, I built a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Unfortunately, these mega-lists eventually became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works by targeting common elements of decentralized server attacks. Consisting of a mere 37 lines, this “2G” Blacklist provided enough protection to enable me to completely eliminate over 350 blacklisting directives from my site’s root htaccess file. This improvement increased site performance and decreased attack rates, however many bad hits were still getting through. More work was needed..

Continue Reading

Controlling Proxy Access with HTAccess

Posted on February 22, 2009 in Function by Jeff Starr

In my recent article on blocking proxy servers, I explain how to use HTAccess to deny site access to a wide range of proxy servers. The method works great, but some readers want to know how to allow access for specific proxy servers while denying access to as many other proxies as possible.

Fortunately, the solution is as simple as adding a few lines to my original proxy-blocking method. Specifically, we may allow any requests coming from our whitelist of proxy servers by testing Apache’s HTTP_REFERER variable, like so:

RewriteCond %{HTTP_REFERER} !(.*)allowed-proxy-01.domain.tld(.*)
RewriteCond %{HTTP_REFERER} !(.*)allowed-proxy-02.domain.tld(.*)
RewriteCond %{HTTP_REFERER} !(.*)allowed-proxy-03.domain.tld(.*)

Continue Reading

Eight Ways to Blacklist with Apache’s mod_rewrite

Posted on February 3, 2009 in Function by Jeff Starr

With the imminent release of the next series of (4G) blacklist articles here at Perishable Press, now is the perfect time to examine eight of the most commonly employed blacklisting methods achieved with Apache’s incredible rewrite module, mod_rewrite. In addition to facilitating site security, the techniques presented in this article will improve your understanding of the different rewrite methods available with mod_rewrite.

Blacklist via Request Method

[ #1 ] This first blacklisting method evaluates the client’s request method. Every time a client attempts to connect to your server, it sends a message indicating the type of connection it wishes to make. There are many different types of request methods recognized by Apache. The two most common methods are GET and POST requests, which are required for “getting” and “posting” data to and from the server. In most cases, these are the only request methods required to operate a dynamic website. Allowing more request methods than are necessary increases your site’s vulnerability. Thus, to restrict the types of request methods available to clients, we use this block of Apache directives:

Continue Reading

Yahoo! Lies about Obeying Robots.txt Directives

Posted on November 16, 2008 in Websites by Jeff Starr

There are two possibilities here: Yahoo!’s Slurp crawler is broken or Yahoo! lies about obeying Robots directives. Either case isn’t good. Slurp just can’t seem to keep its nose out of my private business. And, as I’ve discussed before, this happens all the time. Here are the two most recent offenses, as recorded in the log file for my blackhole spider trap:

Continue Reading

Blacklist Candidate 2008-10-19

Posted on October 19, 2008 in Function by Jeff Starr

Welcome to the Perishable Press “Blacklist Candidate” series. In this post, we continue our new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags..

[ Photo: Television Flashback ] From time to time on the show, a contestant places a bid that is so absurd and so asinine that you literally laugh out loud, point at the monitor, and openly ridicule the pathetic loser. On such occasions, even the host of the show will laugh and mock the idiocy. Of course, this same situation happens frequently here at Perishable Press, where the scumbags that manage to escape the 3G Blacklist are proving themselves to be increasingly desperate and pathetic. Such is the case with this month’s official Blacklist Candidate Number 2008-10-19:

Come on down — you’re the next cracker whore to get banished from the site!

Continue Reading

Evil Incarnate, but Easily Blocked

Posted on September 15, 2008 in Websites by Jeff Starr

As my readers know, I spend a lot of time digging through error logs, preventing attacks, and reporting results. Occasionally, some moron will pull a stunt that deserves exposure, public humiliation, and banishment. There is certainly no lack of this type of nonsense, as many of you are well-aware.

3G Blacklist

Even so, I have to admit that I am very happy with my latest strategy against crackers, spammers, and other scumbags, namely, the 3G Blacklist. Since implementing this effective HTAccess security method, I have seen a dramatic decrease in the overall volume of malicious activity recorded in my Apache, PHP, and 404 error logs. Thankfully, the 3G Blacklist has kept things pretty quiet around here, at least as far as malicious activity is concerned. In fact, it has been awhile since I’ve seen anything worthy of sharing with you — until now.

Continue Reading

Redirect All Requests for a Nonexistent File to the Actual File

Posted on August 12, 2008 in Function by Jeff Starr

In my previous article on redirecting 404 requests for favicon files, I presented an HTAccess technique for redirecting all requests for nonexistent favicon.ico files to the actual file located in the site’s web-accessible root directory:

# REDIRECT FAVICONZ
<ifmodule mod_rewrite.c>
 RewriteCond %{THE_REQUEST} favicon.ico [NC]
 RewriteRule (.*) http://domain.tld/favicon.ico [R=301,L] 
</ifmodule>

As discussed in the article, this code is already in effect here at Perishable Press, as may be seen by clicking on any of the following links:

Clearly, none of these URL requests target the “real” favicon.ico file, yet thanks to the previous method they are all happily redirected to the proper location. This is useful for a variety of reasons, including preventing excessive and unnecessary server strain due to malicious scripts.

Continue Reading

Stop the Madness: Redirect those Ridiculous Favicon 404 Requests

Posted on August 11, 2008 in Function by Jeff Starr

For the last several months, I have been seeing an increasing number of 404 errors requesting “favicon.ico” appended onto various URLs:

http://perishablepress.com/press/favicon.ico
http://perishablepress.com/press/2007/06/12/favicon.ico
http://perishablepress.com/press/2007/09/25/absolute-horizontal-and-vertical-centering-via-css/favicon.ico
http://perishablepress.com/press/2007/08/01/temporary-site-redirect-for-visitors-during-site-updates/favicon.ico
http://perishablepress.com/press/2007/01/16/maximum-and-minimum-height-and-width-in-internet-explorer/favicon.ico

When these errors first began appearing in the logs several months ago, I didn’t think too much of it — “just another idiot who can’t find my site’s favicon..” As time went on, however, the frequency and variety of these misdirected requests continued to increase. A bit frustrating perhaps, but not serious enough to justify immediate action. After all, what’s the worst that can happen? The idiot might actually find the blasted thing? Wouldn’t that be nice..

Continue Reading

Blacklist Candidate Number 2008-05-31

Posted on May 31, 2008 in Function by Jeff Starr

Welcome to the Perishable Press “Blacklist Candidate” series. In this post, we continue our new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags..

[ Photo: Bob Barker waves at you ] Just under the wire! Even so, this month’s official Blacklist-Candidate article may be the last monthly installment of the series. Although additional BC articles may appear in the future, it is unlikely that they will continue as a regular monthly feature. Oh sure, I see the tears streaming down your face, but think about it: this is actually good news. After implementing the 3G Blacklist, finding decent blacklist candidates is becoming increasingly difficult. This is a good thing because it indicates that the new blacklist strategy is working. Nonetheless, every now and then, some brainless bozo will magically appear and poke around where it shouldn’t be poking. Such is the case with this month’s Blacklist Candidate Number 2008-05-31:

Come on down! You’re the next worthless turd to get banished from the site!

Continue Reading

Series Summary: Building the 3G Blacklist

Posted on May 25, 2008 in Websites by Jeff Starr

[ 3G Stormtrooper ] In the now-complete series, Building the 3G Blacklist, I share insights and discoveries concerning website security and protection against malicious attacks. Each article in the series focuses on unique blacklist strategies designed to protect sites transparently, effectively, and efficiently. The five articles culminate in the release of the next generation 3G Blacklist.

For the record, here is a quick summary of the entire Building the 3G Blacklist series:

Continue Reading