Tag: ip

Latest Blacklist Entries

Posted on November 9, 2010 in Websites by Jeff Starr

Recently cleared several megabytes of log files, detecting patterns, recording anomalies, and blacklisting gross offenders. Gonna break it down into three sections:

User Agents

User-agents come and go, and are easily spoofed, but it’s worth a few lines of htaccess to block the more persistent bots that repeatedly scan your site with malicious requests.

# Nov 2010 User Agents
SetEnvIfNoCase User-Agent "MaMa " keep_out
SetEnvIfNoCase User-Agent "choppy" keep_out
SetEnvIfNoCase User-Agent "heritrix" keep_out
SetEnvIfNoCase User-Agent "Purebot" keep_out
SetEnvIfNoCase User-Agent "PostRank" keep_out
SetEnvIfNoCase User-Agent "archive.org_bot" keep_out
SetEnvIfNoCase User-Agent "msnbot.htm)._" keep_out

<Limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 Deny from env=keep_out
</Limit>

Continue Reading

2010 User-Agent Blacklist

Posted on August 9, 2010 in Websites by Jeff Starr

[ 2010 User-Agent Blacklist ] The 2010 User-Agent Blacklist blocks hundreds of bad bots while ensuring open-access for the major search engines: Google, Bing, Ask, Yahoo, et al. Blocking bad user-agents is an effective addition to any security strategy. It works like this: your site is getting hammered by rogue bots that waste valuable server resources and bandwidth. So you grab a copy of the 2010 UA Blacklist from Perishable Press, include it in your site’s root .htaccess file, and enjoy a more secure and better performing website. It’s that easy.

Proven Security

The 2010 UA Blacklist has been carefully constructed based on rigorous server-log analyses. Obsessive daily log monitoring reveals bad bots scanning for exploits, spamming resources, and wasting bandwidth. While analyzing malicious behavior, evil bots are identified and added to the UA Blacklist. Blocked user-agents are denied access to your site, increasing efficiency and providing safety for your visitors.

Continue Reading

2010 IP Blacklist

Posted on July 6, 2010 in Websites by Jeff Starr

Over the course of each year, I blacklist a considerable number of individual IP addresses. Every day, Perishable Press is hit with countless numbers of spammers, scrapers, crackers and all sorts of other hapless turds. Weekly examinations of my site’s error logs enable me to filter through the chaff and cherry-pick only the most heinous, nefarious attackers for blacklisting. Minor offenses are generally dismissed, but the evil bastards that insist on wasting resources running redundant automated scripts are immediately investigated via IP lookup and denied access via simple htaccess directive:

<Limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 Deny from 123.456.789
</LIMIT>

Although many of the worst attacks happen in randomized, zombie-like fashion, I have found that individual IPs that are not blacklisted will return repeatedly until finally blocked. Yet, despite the short-term success enjoyed by denying access to the most malicious IPs, the long-term futility of such blacklisting reflects the temporary nature of this solution.

In other words, I have found that blocking individual IPs is useful only for limited periods of time. Thus, every year, I gather my code and flush the blacklist of all individually blocked IP addresses. I then start fresh, adding the worst villains to the list, blocking entire IP ranges if necessary, and referring to previous versions of my htaccess files to cross-check suspiciously familiar entities. Eventually, a new blacklist emerges and I share it at Perishable Press. Here is the current version for 2010..

Continue Reading

Is it Secret? Is it Safe?

Posted on March 17, 2010 in Function by Jeff Starr

[ Enjoying the Evening ] Whenever I find myself working with PHP or messing around with server settings, I nearly always create a phpinfo.php file and place it in the root directory of whatever domain I happen to be working on. These types of informational files employ PHP’s handy phpinfo() function to display a concise summary of all of your server’s variables, which may then be referenced for debugging purposes, bragging rights, and so on.

While this sort of thing is normally okay, I frequently forget to remove the file and just leave it sitting there for the entire world to look at. This of course is a big “no-no” for site security, because the phpinfo.php file contains a hefty amount of information about my server, including stuff like:

  • The web server version
  • The IP address of the host
  • The version of the operating system
  • The root directory of the web server
  • Configuration information about the remote PHP installation
  • The username of the user who installed php and if they are a SUDO user

Continue Reading

HTAccess Privacy for Specific IPs

Posted on October 12, 2009 in Function by Jeff Starr

Running a private site is all about preventing unwanted visitors. Here is a quick and easy way to allow access to multiple IP addresses while redirecting everyone else to a custom message page.

To do this, all you need is an HTAccess file and a list of IPs for which you would like to allow access.

Edit the following code according to the proceeding instructions and place into the root HTAccess file of your domain:

# ALLOW ONLY MULTIPLE IPs
<Limit GET POST PUT>
 Order Deny,Allow
 Deny from all
 Allow from 123.456.789
 Allow from 456.789.123
 Allow from 789.123.456
</Limit>
ErrorDocument 403 path/custom-message.html
<Files path/custom-message.html>
 Order Allow,Deny
 Allow from all
</Files>

To prepare this code for use on your site, do these three things:

  1. Edit the three IP addresses to suit your needs. Feel free to add more IPs or remove any that aren’t needed.
  2. Edit both instances of “path/custom-message.html” to match the path and file name of the file that will contain your custom message. This may be anything, anywhere, with any functionality you desire.
  3. That’s it. Copy/paste into your site’s root htaccess file, upload, test, and get out!

Continue Reading

Block Multiple IP Addresses with PHP

Posted on June 8, 2009 in Function by Jeff Starr

[ Screenshot: The Legion of Doom ] Let’s face it. There’s just as much scum on the Internet as there is out there in the “real world.” Maybe even more, who knows. From scammers and spammers to scrapers and crackers, the Web is just crawling with all sorts of pathetic scumbags. As predictably random as much of the malicious activity happens to be, it is virtually guaranteed that you will be hounded by at least a few persistent IP addresses that, for whatever reason, have latched on and just won’t let go. Like satanic parasites, they plague you night and day, haunting you and making your online life a living hell. Perhaps they leave endless spam comments; perhaps they are just mindless trolls giving you grief; or perhaps they continue to take flying stabs at the security of your website. Whatever the behavior, once you have determined that you need to block a collection of evil IPs, you have many choices. Here is a simple way to blacklist multiple IP addresses with a little PHP magic..

Continue Reading

Yahoo! Slurp too Stupid to be a Robot

Posted on March 15, 2009 in Nonsense, Websites by Jeff Starr

I really hate bad robots. When a web crawler, spider, bot — or whatever you want to call it — behaves in a way that is contrary to expected and/or accepted protocols, we say that the bot is acting suspiciously, behaving badly, or just acting stupid in general. Unfortunately, there are thousands — if not hundreds of thousands — of nefarious bots violating our websites every minute of the day.

For the most part, there are effective methods available enabling us to protect our sites against the endless hordes of irrelevant and mischievous bots. Such evil is easily blocked with virtually zero side-effects because their presence is simply irrelevant.

But what about bad bots that aren’t exactly irrelevant, such as Yahoo’s mindless Slurp crawler? By disobeying the robots.txt protocol as promised, Yahoo’s Slurp clearly falls into the “bad-bot” category. Unlike typical “nonsense” bots, Slurp is not exactly irrelevant (yet), so simply blocking them is not a reasonable solution.

Continue Reading

Building the Perishable Press 4G Blacklist

Posted on March 8, 2009 in Websites by Jeff Starr

[ Building the Hoover Dam, Part 1 ]

Last year, after much research and discussion, I built a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Unfortunately, these mega-lists eventually became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works by targeting common elements of decentralized server attacks. Consisting of a mere 37 lines, this “2G” Blacklist provided enough protection to enable me to completely eliminate over 350 blacklisting directives from my site’s root htaccess file. This improvement increased site performance and decreased attack rates, however many bad hits were still getting through. More work was needed..

Continue Reading

Eight Ways to Blacklist with Apache’s mod_rewrite

Posted on February 3, 2009 in Function by Jeff Starr

With the imminent release of the next series of (4G) blacklist articles here at Perishable Press, now is the perfect time to examine eight of the most commonly employed blacklisting methods achieved with Apache’s incredible rewrite module, mod_rewrite. In addition to facilitating site security, the techniques presented in this article will improve your understanding of the different rewrite methods available with mod_rewrite.

Blacklist via Request Method

[ #1 ] This first blacklisting method evaluates the client’s request method. Every time a client attempts to connect to your server, it sends a message indicating the type of connection it wishes to make. There are many different types of request methods recognized by Apache. The two most common methods are GET and POST requests, which are required for “getting” and “posting” data to and from the server. In most cases, these are the only request methods required to operate a dynamic website. Allowing more request methods than are necessary increases your site’s vulnerability. Thus, to restrict the types of request methods available to clients, we use this block of Apache directives:

Continue Reading

Temporary PHP Redirect: Allow Multiple IP Access and Redirect Everyone Else

Posted on January 14, 2009 in Function by Jeff Starr

[ Image: Abstract Mathematical Diagram ] In my previous article on temporarily redirecting visitors during site updates, I present numerous PHP and HTAccess methods for handling traffic during site maintenance, updates, and other temporary periods of downtime. Each of the PHP methods presented in the article allow for access from a single IP while redirecting everyone else. In this article, we modify our previous techniques to allow access for multiple IP addresses while temporarily redirecting everyone else to the page of our choice. Plus, while we’re at it, we’ll explore a few additional ways to adapt and use the general technique.

Continue Reading

Yahoo! Lies about Obeying Robots.txt Directives

Posted on November 16, 2008 in Websites by Jeff Starr

There are two possibilities here: Yahoo!’s Slurp crawler is broken or Yahoo! lies about obeying Robots directives. Either case isn’t good. Slurp just can’t seem to keep its nose out of my private business. And, as I’ve discussed before, this happens all the time. Here are the two most recent offenses, as recorded in the log file for my blackhole spider trap:

Continue Reading

Unexplained Crawl Behavior Involving Tagged Query Strings

Posted on June 4, 2008 in Websites by Jeff Starr

I need your help! I am losing my mind trying to solve another baffling mystery. For the past three or four months, I have been recording many 404 Errors generated from msnbot, Yahoo-Slurp, and other spider crawls. These errors result from invalid requests for URLs containing query strings such as the following:

  • http://perishablepress.com/press/page/2/?tag=spam
  • http://perishablepress.com/press/page/3/?tag=code
  • http://perishablepress.com/press/page/2/?tag=email
  • http://perishablepress.com/press/page/2/?tag=xhtml
  • http://perishablepress.com/press/page/4/?tag=notes
  • http://perishablepress.com/press/page/2/?tag=flash
  • http://perishablepress.com/press/page/2/?tag=links
  • http://perishablepress.com/press/page/3/?tag=theme
  • http://perishablepress.com/press/page/2/?tag=press

..plus hundreds and hundreds more 1. The URL pattern is always the same: a different page number followed by a query string containing one of the tags used here at Perishable Press, for example: “/?tag=something”. The problem is that there are no such links anywhere on the site. The site employs permalink format for all WordPress-generated links (e.g., post/page links, search queries, tag queries, etc.). In an effort to locate the source of these URLs, I have performed multiple, thorough searches through every aspect of my site (files, database, code, examples, etc.), and the results are always the same: nothing. UGH! What is the source of these misguided URLs? Hopefully, you will be able to help shed some light on this unexplained crawl behavior (please, I beg you!!).

Continue Reading

Series Summary: Building the 3G Blacklist

Posted on May 25, 2008 in Websites by Jeff Starr

[ 3G Stormtrooper ] In the now-complete series, Building the 3G Blacklist, I share insights and discoveries concerning website security and protection against malicious attacks. Each article in the series focuses on unique blacklist strategies designed to protect sites transparently, effectively, and efficiently. The five articles culminate in the release of the next generation 3G Blacklist.

For the record, here is a quick summary of the entire Building the 3G Blacklist series:

Continue Reading

Building the 3G Blacklist, Part 5: Improving Site Security by Selectively Blocking Individual IPs

Posted on May 13, 2008 in Websites by Jeff Starr

[ 3G Stormtroopers ]

In this continuing five-article series, I share insights and discoveries concerning website security and protecting against malicious attacks. Wrapping up the series with this article, I provide the final key to our comprehensive blacklist strategy: selectively blocking individual IPs. Previous articles also focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. In the next article, these five articles will culminate in the release of the next generation 3G Blacklist.

Improving Site Security by Selectively Blocking Individual IPs

The final component of the 3G Blacklist establishes a vehicle through which individual IPs may be blocked. As previously discussed, the goal of the 3G Blacklist involves moving away from extensive lists of blacklisted user agents, IP addresses, and other identifying attributes. Nonetheless, a comprehensive security strategy benefits from the ability to quickly and easily neutralize threats by directly blocking individual IP addresses.

In my experience, an effective IP blacklist should be kept as current as possible. Unfortunately, once lists grow to over 100 entries, they become quite unmanageable. Periodic verification of blocked IPs becomes increasingly difficult as lists grow in size. Uncultivated blacklists that continue to grow in size will eventually do more harm than good. Beyond the burden placed on server resources, constantly changing IP ownership inevitably results in unintentional blocking of legitimate traffic.

Just as with individual blocking of user agents, blocking individual (or ranges of) IP addresses provides a secondary layer of protection against direct attacks, corrupt networks, and other case-specific situations. This level of granular control enables defensive fine-tuning for targeting and blocking specific machines. Known repeat offenders should remain permanently blacklisted, whereas short-term offenders should be blocked temporarily. Examples of known repeat offenders typically include:

Continue Reading

Over 150 of the Worst Spammers, Scrapers and Crackers from 2007

Posted on February 24, 2008 in Websites by Jeff Starr

Update 2010/07/07: Please visit the 2010 IP Blacklist for more current information.

Over the course of each year, I blacklist a considerable number of individual IP addresses. Every day, Perishable Press is hit with countless numbers of spammers, scrapers, crackers and all sorts of other hapless turds. Weekly examinations of my site’s error logs enable me to filter through the chaff and cherry-pick only the most heinous, nefarious attackers for blacklisting. Minor offenses are generally dismissed, but the evil bastards that insist on wasting resources running redundant automated scripts are immediately investigated via IP lookup and denied access via simple htaccess directive:

<Limit GET POST PUT>
 order allow,deny
 allow from all
 deny from 123.456.789
</LIMIT>

Although many of the worst attacks happen in randomized, zombie-like fashion, I have found that individual IPs that are not blacklisted will return repeatedly until finally blocked. Yet, despite the short-term success enjoyed by denying access to the most malicious IPs, the long-term futility of such blacklisting reflects the temporary nature of this solution. In other words, I have found that blocking individual IPs is useful only for limited periods of time. Thus, every year, I gather my code and flush the blacklist of all individually blocked IP addresses. I then start fresh, adding the worst villains to the list, blocking entire IP ranges if necessary, and referring to previous versions of my htaccess files to cross-check suspiciously familiar entities. It is within this context, then, that I present the following manually assembled collection of over 150 of the worst spammers, scrapers, and crackers to hit my site in 2007.

Continue Reading

Yahoo! Slurp in My Blackhole (Yet Again)

Posted on December 16, 2007 in Websites by Jeff Starr

Yup, ‘ol Slurp is at it again, flagrantly disobeying specific robots.txt rules forbidding access to my bad-bot trap, lovingly dubbed the “blackhole.” As many readers know, this is not the first time Yahoo has been caught behaving badly. This time, Yahoo was caught trespassing five different times via three different IPs over the course of four different days. Here is the data recorded in my site’s blackhole log (I know, that sounds terrible):

Continue Reading

A Dramatic Week Here at Perishable Press..

Posted on December 10, 2007 in Perishable, Websites by Jeff Starr

..And we’re back. After an insane week spent shopping for a new host, dealing with some Bad Behavior, and transferring Perishable Press to its new home on a virtual private server (VPS), everything is slowly falling back into place. Along the way, there have been some interesting challenges and many lessons learned. Here are a few of the highlights..

The tide may be turning for A Small Orange

I am certainly not alone when I say that shopping for a new hosting provider and transferring websites is one of my least favorite aspects of web development. In my experience, switching hosts requires waay too much time and rarely unfolds without significant problems. Nonetheless, when service and/or support turns sour, upgrading to a better host is well worth the effort. In my case, A Small Orange just wasn’t working out.

Everything was going fine for the first several months — excellent service, fantastic support, and consistent, reliable server uptime. However, during the last several months, server uptime frequently dipped below the 98% level, an unacceptable amount of downtime, especially since it generally happened during critical times: peak hours or while I was trying to work on the site. When I finally submitted a support ticket addressing the “unacceptable levels of downtime,” ASO support staff put my mind at ease by moving my site to a “a more stable server.”

Continue Reading

Yahoo! in my Blackhole

Posted on November 25, 2007 in Websites by Jeff Starr

Okay, I realize that the title sounds a bit odd, but nowhere near as odd as my recent discovery of Slurp ignoring explicit robots.txt rules and digging around in my highly specialized bot trap, which I have lovingly dubbed “the blackhole”. What is up with that, Yahoo!? — does your Slurp spider obey robots.txt directives or not? I have never seen Google crawling around that side of town, neither has MSN nor even Ask ventured into the forbidden realms. Has anyone else experienced such unexpected behavior from one the four major search engines? Hmmm.. let’s dig a little further..

Here is the carefully formulated, highly specific, properly placed robots.txt rule that explicitly and strictly forbids all agents from accessing my blackhole bot trap:

Continue Reading

How to Verify the Four Major Search Engines

Posted on October 9, 2007 in Websites by Jeff Starr

Keeping track of your access and error logs is a critical component of any serious security strategy. Many times, you will see a recorded entry that looks legitimate, such that it may easily be dismissed as genuine Google fare, only to discover upon closer investigation a fraudulent agent. There are many such cloaked or disguised agents crawling around these days, mimicking various search engines to hide beneath the radar. Thus, it is a good idea to implement a procedure for scanning and checking select agents for authenticity. In general, the verification process involves a “forward/reverse” DNS lookup, which is then cross-verified with the search engine in question. Let’s have a quick look at how to do this..

First, visit and bookmark the following articles (and/or this article). These resources explain how to identify and verify the agents for each of the four major search engines: Google, Yahoo!, MSN/Live, and Ask.

Continue Reading

Temporary Site Redirect for Visitors during Site Updates

Posted on August 1, 2007 in Function by Jeff Starr

[ Image: Abstract Mathematical Diagram ] In our article Stupid htaccess Tricks, we present the htaccess code required for redirecting visitors temporarily during periods of site maintenance. Although the article provides everything needed to implement the temporary redirect, I think readers would benefit from a more thorough examination of the process — nothing too serious, just enough to get it right. After discussing temporary redirects via htaccess, I’ll also explain how to accomplish the same thing using only PHP.

Continue Reading

WP-ShortStat Slowing Down Root Index Pages

Posted on July 17, 2007 in Function, WordPress by Jeff Starr

For over a year now, I have been using Markus Kämmerer’s (Happy Arts Blog) WP-ShortStat plugin for WordPress. The plugin is relatively well-maintained and remains one of my favorite admin tools. Great for popping in on stats without logging into Mint. Nonetheless, due to its IP/country-detection functionality, WP-ShortStat has experienced its share of difficulties (e.g., read through the change log on the plugin’s home page). In this article, I describe how WP-Shortstat slows down the root index-page of a site, and then suggest a (temporary) fix for the issue.

Continue Reading

How to Block IP Addresses with PHP

Posted on July 3, 2007 in Function by Jeff Starr

[ Image: Skeletor Blocks a Move ] Figuratively speaking, hunting down and killing spammers, scrapers, and other online scum remains one of our favorite pursuits. Once we have determined that a particular IP address is worthy of banishment, we generally invoke the magical powers of htaccess to lock the gates. When htaccess is not available, we may summon the versatile functionality of PHP to get the job done.

This method is relatively straightforward. Simply edit, copy and paste the following code example into the top of any PHP for which you wish to block access:

Continue Reading

Harvesting cPanel Raw Access Logs

Posted on May 28, 2007 in Websites by Jeff Starr

[ Image: Harvesting the Land ]
Harvesting Raw Logs
For those of us using cPanel as the control panel for our websites, a wealth of information is readily available via cPanel ‘Raw Access Logs’. These logs are perpetually updated with data involving user agents, IP addresses, HTTP activity, resource access, and a whole lot more. Here is a quick tutorial on accessing and interpreting your cPanel raw access logs.

Part One: Grab ‘em

To grab a copy of your raw access logs, log into cPanel and click on the "Raw Access Logs" icon. Within the Raw Access Log interface, scroll through the list of available log files and download the raw access log(s) of your choice.

Exit cPanel and navigate to your local copy of the raw access log, which should have been downloaded as a zipped/g-zipped file (i.e., .zip or .gz file extension), with a name similar to accesslog_your-domain.com_4_20_2007.gz.

Unzip the file and extract its contents, which should be a single file named your-domain.com. Rename the file by appending a .log or .txt extension to the file name. Alternatively, if the file is not named with a .com, .net, or .whatever extension, no rename is necessary, as it also may be opened via right-click » ‘Open With…’.

That’s all there is to it. If you understand how to interpret the contents of your Raw Access Log, you’re solid gold, baby. Otherwise, continue reading for a breif tutorial to get you started with the basics..

Continue Reading

Disobedient Robots and Company

Posted on January 1, 2007 in Perishable, Websites by Jeff Starr

In our never-ending battle against spammers, leeches, scrapers, and other online undesirables, we have implemented several powerful security measures to improve the operational integrity of our perpetual virtual existence. Here is a rundown of the new behind-the-scenes security features of Perishable Press:

  • Automated spambot trap, designed to identify bots (and/or stupid people) that disobey rules specified in the site’s robots.txt file.
  • Automated disobedient-robot identification (via reverse IP lookup), admin-notification (via email) and blacklist inclusion (via htaccess).
  • Automated inclusion of disobedient robot identification on our now public "Disobedient Robots" page.
  • Imroved htaccess rules, designed to eliminate scum-sucking worms and other useless vermin.
  • Automated tracking tools, designed to keep a close eye on any suspicious or questionable activity.
  • Automated 404-error statistics, designed to optimize the elimination of 404 errors.
  • Plus a few other secret-agent tricks that we are not at liberty to discuss ;)

As you can see, we have been pretty busy around here — fortunately, the new security features have been working flawlessly, reducing stolen bandwidth, potential spam, disobedient robots, and 404 errors. Hopefully, the end result of these new features will involve smoother site functionality and better browsing for everyone.