Tag: mod_rewrite

htaccess Redirect to Maintenance Page

Posted on May 19, 2010 in Function by Jeff Starr

Redirecting visitors to a maintenance page or other temporary page is an essential tool to have in your tool belt. Using HTAccess, redirecting visitors to a temporary maintenance page is simple and effective. All you need to redirect your visitors is the following code placed in your site’s root HTAccess:

# MAINTENANCE-PAGE REDIRECT
<IfModule mod_rewrite.c>
 RewriteEngine on
 RewriteCond %{REMOTE_ADDR} !^123\.456\.789\.000
 RewriteCond %{REQUEST_URI} !/maintenance.html$ [NC]
 RewriteCond %{REQUEST_URI} !\.(jpe?g?|png|gif) [NC]
 RewriteRule .* /maintenance.html [R=302,L]
</IfModule>

That is the official copy-&-paste goodness right there. Just grab, gulp and go. Of course, there are a few more details for those who may be unfamiliar with the process. Let’s look at all the gory details..

Continue Reading

Stop 404 Requests for Mobile Versions of Your Site

Posted on April 26, 2010 in Function by Jeff Starr

If you’ve been keeping an eye on your 404 errors recently, you will have noticed an increase in requests for nonexistent mobile files and directories, especially over the past year or so. The scripts and bots requesting these files from your server seem to be looking for a mobile version of your site. Unfortunately, they are wasting bandwidth and resources in the process. It has become common to see the following 404 errors constantly repeated in your log files:

  • http://domain.tld/apple-touch-icon.png
  • http://domain.tld/iphone
  • http://domain.tld/mobile
  • http://domain.tld/mobi
  • http://domain.tld/m

So some bot comes along, assumes that your site includes a mobile version, and then tries its hand at guessing the location. In the common request-set listed above, we see the bot looking first for an “apple-touch icon,” and then for mobile content in various directories. If this only happens once in awhile, it’s no big deal. But these days I’ve been seeing many different bots requesting these nonexistent resources.

Even worse, these mobile-hungry bots can’t seem to remember where they’ve been – they typically request the same resources repeatedly, and in multiple locations within the directory structure. I frequently see hundreds of these types of requests in my weekly error-log analyses. Needless to say, this is an incredible waste of time, bandwidth, and server resources.

Continue Reading

Is it Secret? Is it Safe?

Posted on March 17, 2010 in Function by Jeff Starr

[ Enjoying the Evening ] Whenever I find myself working with PHP or messing around with server settings, I nearly always create a phpinfo.php file and place it in the root directory of whatever domain I happen to be working on. These types of informational files employ PHP’s handy phpinfo() function to display a concise summary of all of your server’s variables, which may then be referenced for debugging purposes, bragging rights, and so on.

While this sort of thing is normally okay, I frequently forget to remove the file and just leave it sitting there for the entire world to look at. This of course is a big “no-no” for site security, because the phpinfo.php file contains a hefty amount of information about my server, including stuff like:

  • The web server version
  • The IP address of the host
  • The version of the operating system
  • The root directory of the web server
  • Configuration information about the remote PHP installation
  • The username of the user who installed php and if they are a SUDO user

Continue Reading

Secure Visitor Posting for WordPress

Posted on June 1, 2009 in WordPress by Jeff Starr

[ ~{*}~ ] Normally, when visitors post a comment to your site, specific types of client data are associated with the request. Commonly, a client will provide a user agent, a referrer, and a host header. When any of these variables is absent, there is good reason to suspect foul play. For example, virtually all browsers provide some sort of user-agent name to identify themselves. Conversely, malicious scripts directly posting spam and other payloads to your site frequently operate without specifying a user agent. In the Ultimate User-Agent Blacklist, we account for the “no-user-agent” case in the very first directive, preventing a host of anonymous visitors from hitting the site.

In addition to empty user-agent strings, malicious requests for site content frequently fail to provide any referrer information. Unless special privacy software is being used, the web page from which a visitor has arrived at your site will be specified in the header information for that request. Likewise, when a visitor posts a comment at your site, the referrer string for that post request will be the URL of that particular page. Thus, as with blank user-agent requests, no-referrer requests are frequently indicative of spam and other malicious behavior.

Another important piece of information provided by all legitimate clients is the host request header. The host header specifies the Internet host and port number of the requested resource. This information is required for all clients making HTTP/1.1 requests. Thus, requiring the host request-header field for all posts to your site safely eliminates illicit requests from hitting your server.

Continue Reading

HTAccess Spring Cleaning 2009

Posted on May 11, 2009 in Function by Jeff Starr

Just like last year, this Spring I have been taking some time to do some general maintenance here at Perishable Press. This includes everything from fixing broken links and resolving errors to optimizing scripts and eliminating unnecessary plugins. I’ll admit, this type of work is often quite dull, however I always enjoy the process of cleaning up my HTAccess files. In this post, I share some of the changes made to my HTAccess files and explain the reasoning behind each modification. Some of the changes may surprise you! ;)

Continue Reading

4G Series: The Ultimate Referrer Blacklist, Featuring Over 8000 Banned Referrers

Posted on April 21, 2009 in Websites by Jeff Starr

You have seen user-agent blacklists, IP blacklists, 4G Blacklists, and everything in between. Now, in this article, for your sheer and utter amusement, I present a collection of over 8000 blacklisted referrers.

For the uninitiated, in teh language of teh Web, a referrer is the online resource from whence a visitor happened to arrive at your site. For example, if Johnny the Wonder Parrot was visiting the Mainstream Media website and happened to follow a link to your site (of all places), you would look at your access logs, notice Johnny’s visit, and speak out loud (slowly): “hmmm.. it looks like the Mainstream Media website referred my good pal Johnny to my Alka-Seltzer sales page.” In such a bizarre case, the Mainstream Media website — or specific page — is referred to as (no pun intended) the referrer.

Continue Reading

4G Series: The Ultimate User-Agent Blacklist, Featuring Over 1200 Bad Bots

Posted on March 29, 2009 in Websites by Jeff Starr

[ Image: Inverted Eclipse ] As discussed in my recent article, Eight Ways to Blacklist with Apache’s mod_rewrite, one method of stopping spammers, scrapers, email harvesters, and malicious bots is to blacklist their associated user agents. Apache enables us to target bad user agents by testing the user-agent string against a predefined blacklist of unwanted visitors. Any bot identifying itself as one of the blacklisted agents is immediately and quietly denied access. While this certainly isn’t the most effective method of securing your site against malicious behavior, it may certainly provide another layer of protection.

Even so, there are several things to consider before choosing to implement an extensive user-agent blacklist on your site. First and most importantly is the transient nature of the user agent itself. On most systems, the user-agent variable is easy to change, making it possible for bot owners to use any user-agent name they wish. Once a bad bot makes the rounds, becomes known, and is blacklisted, the bot owner need only modify or change its declared user agent and they’re back in business. User-agent names are constantly invented, spoofed, or otherwise altered in order to operate beneath — or above — the virtual radar. Thus, a user-agent blacklist is a high-maintenance affair, requiring continuous cultivation in order to maintain relevancy and effectiveness.

Continue Reading

The Perishable Press 4G Blacklist

Posted on March 16, 2009 in Websites by Jeff Starr

[ 4G Stormtrooper ] At last! After many months of collecting data, crafting directives, and testing results, I am thrilled to announce the release of the 4G Blacklist! The 4G Blacklist is a next-generation protective firewall that secures your website against a wide range of malicious activity. Like its 3G predecessor, the 4G Blacklist is designed for use on Apache servers and is easily implemented via HTAccess or the httpd.conf configuration file. In order to function properly, the 4G Blacklist requires two specific Apache modules, mod_rewrite and mod_alias. As with the third generation of the blacklist, the 4G Blacklist consists of multiple parts:

Update Feb 22, 2011: The 5G version of the blacklist is available now in beta.

Continue Reading

Building the Perishable Press 4G Blacklist

Posted on March 8, 2009 in Websites by Jeff Starr

[ Building the Hoover Dam, Part 1 ]

Last year, after much research and discussion, I built a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Unfortunately, these mega-lists eventually became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works by targeting common elements of decentralized server attacks. Consisting of a mere 37 lines, this “2G” Blacklist provided enough protection to enable me to completely eliminate over 350 blacklisting directives from my site’s root htaccess file. This improvement increased site performance and decreased attack rates, however many bad hits were still getting through. More work was needed..

Continue Reading

Controlling Proxy Access with HTAccess

Posted on February 22, 2009 in Function by Jeff Starr

In my recent article on blocking proxy servers, I explain how to use HTAccess to deny site access to a wide range of proxy servers. The method works great, but some readers want to know how to allow access for specific proxy servers while denying access to as many other proxies as possible.

Fortunately, the solution is as simple as adding a few lines to my original proxy-blocking method. Specifically, we may allow any requests coming from our whitelist of proxy servers by testing Apache’s HTTP_REFERER variable, like so:

RewriteCond %{HTTP_REFERER} !(.*)allowed-proxy-01.domain.tld(.*)
RewriteCond %{HTTP_REFERER} !(.*)allowed-proxy-02.domain.tld(.*)
RewriteCond %{HTTP_REFERER} !(.*)allowed-proxy-03.domain.tld(.*)

Continue Reading

Eight Ways to Blacklist with Apache’s mod_rewrite

Posted on February 3, 2009 in Function by Jeff Starr

With the imminent release of the next series of (4G) blacklist articles here at Perishable Press, now is the perfect time to examine eight of the most commonly employed blacklisting methods achieved with Apache’s incredible rewrite module, mod_rewrite. In addition to facilitating site security, the techniques presented in this article will improve your understanding of the different rewrite methods available with mod_rewrite.

Blacklist via Request Method

[ #1 ] This first blacklisting method evaluates the client’s request method. Every time a client attempts to connect to your server, it sends a message indicating the type of connection it wishes to make. There are many different types of request methods recognized by Apache. The two most common methods are GET and POST requests, which are required for “getting” and “posting” data to and from the server. In most cases, these are the only request methods required to operate a dynamic website. Allowing more request methods than are necessary increases your site’s vulnerability. Thus, to restrict the types of request methods available to clients, we use this block of Apache directives:

Continue Reading

Redirect All (Broken) Links from any Domain via HTAccess

Posted on December 31, 2008 in Function by Jeff Starr

Here’s the scene: you have been noticing a large number of 404 requests coming from a particular domain. You check it out and realize that the domain in question has a number of misdirected links to your site. The links may resemble legitimate URLs, but because of typographical errors, markup errors, or outdated references, they are broken, leading to nowhere on your site and producing a nice 404 error for every request. Ugh. Or, another painful scenario would be a single broken link on a highly popular site. For example, you may have one of your best posts mentioned in the SitePoint forums, but the person leaving the link completely botched the job:

Continue Reading

Redirecting Subdirectories to the Root Directory via HTAccess

Posted on October 6, 2008 in Function by Jeff Starr

One of the most useful techniques in my HTAccess toolbox involves URL redirection using Apache’s RedirectMatch directive. With RedirectMatch, you get the powerful regex pattern matching available in the mod_alias module combined with the simplicity and effectiveness of the Redirect directive. This hybrid functionality makes RedirectMatch the ideal method for highly specific redirection. In this tutorial, we will explore the application of RedirectMatch as it applies to one of the most difficult redirect scenarios: redirecting all requests for a specific subdirectory (or any subordinate directory or file) to the root (or any parent) directory. We will explore how to accomplish this redirect using PHP in a subsequent article.

Continue Reading

Redirect All Requests for a Nonexistent File to the Actual File

Posted on August 12, 2008 in Function by Jeff Starr

In my previous article on redirecting 404 requests for favicon files, I presented an HTAccess technique for redirecting all requests for nonexistent favicon.ico files to the actual file located in the site’s web-accessible root directory:

# REDIRECT FAVICONZ
<ifmodule mod_rewrite.c>
 RewriteCond %{THE_REQUEST} favicon.ico [NC]
 RewriteRule (.*) http://domain.tld/favicon.ico [R=301,L] 
</ifmodule>

As discussed in the article, this code is already in effect here at Perishable Press, as may be seen by clicking on any of the following links:

Clearly, none of these URL requests target the “real” favicon.ico file, yet thanks to the previous method they are all happily redirected to the proper location. This is useful for a variety of reasons, including preventing excessive and unnecessary server strain due to malicious scripts.

Continue Reading

Stop the Madness: Redirect those Ridiculous Favicon 404 Requests

Posted on August 11, 2008 in Function by Jeff Starr

For the last several months, I have been seeing an increasing number of 404 errors requesting “favicon.ico” appended onto various URLs:

http://perishablepress.com/press/favicon.ico
http://perishablepress.com/press/2007/06/12/favicon.ico
http://perishablepress.com/press/2007/09/25/absolute-horizontal-and-vertical-centering-via-css/favicon.ico
http://perishablepress.com/press/2007/08/01/temporary-site-redirect-for-visitors-during-site-updates/favicon.ico
http://perishablepress.com/press/2007/01/16/maximum-and-minimum-height-and-width-in-internet-explorer/favicon.ico

When these errors first began appearing in the logs several months ago, I didn’t think too much of it — “just another idiot who can’t find my site’s favicon..” As time went on, however, the frequency and variety of these misdirected requests continued to increase. A bit frustrating perhaps, but not serious enough to justify immediate action. After all, what’s the worst that can happen? The idiot might actually find the blasted thing? Wouldn’t that be nice..

Continue Reading

Perishable Press HTAccess Spring Cleaning, Part 2

Posted on June 17, 2008 in Function by Jeff Starr

[ ~:{*}:~ ] Before Summer arrives, I need to post the conclusion to my seasonal article, Perishable Press HTAccess Spring Cleaning, Part 1. As explained in the first post, I recently spent some time to consolidate and optimize the Perishable Press site-root and blog-root HTAccess files. Since the makeover, I have enjoyed better performance, fewer errors, and cleaner code. In this article, I share some of the changes made to the blog-root HTAccess file and provide a brief explanation as to their intended purpose. Granted, most of the blog-root directives affected by the renovation involve redirecting broken/missing URLs, but there are some other gems mixed in as well. In sharing these deprecated excerpts, I hope to inspire others to improve their own HTAccess and/or configuration files. What an excellent way to wrap up this delightful Spring season! :)

Continue Reading

Series Summary: Building the 3G Blacklist

Posted on May 25, 2008 in Websites by Jeff Starr

[ 3G Stormtrooper ] In the now-complete series, Building the 3G Blacklist, I share insights and discoveries concerning website security and protection against malicious attacks. Each article in the series focuses on unique blacklist strategies designed to protect sites transparently, effectively, and efficiently. The five articles culminate in the release of the next generation 3G Blacklist.

For the record, here is a quick summary of the entire Building the 3G Blacklist series:

Continue Reading

Perishable Press HTAccess Spring Cleaning, Part 1

Posted on May 20, 2008 in Function by Jeff Starr

[ Psychedelic Blossom ] While developing the 3G Blacklist, I completely renovated the Perishable Press site-root and blog-root HTAccess files. Since the makeover, I have enjoyed better performance, fewer errors, and cleaner code. In this article, I share some of the changes made to the root HTAccess file and provide a brief explanation as to their intended purpose and potential benefit. In sharing this information, I hope to inspire others to improve their own HTAccess and/or configuration files. In the next article, I will cover some of the changes made to the blog-root HTAccess file. As always, suggestions and questions are always welcome — just drop a comment below! Have fun!! :)

Continue Reading

Perishable Press 3G Blacklist

Posted on May 13, 2008 in Websites by Jeff Starr

[ 3G Stormtroopers ]

After much research and discussion, I have developed a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Over time, these mega-lists became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works by targeting common elements of decentralized server attacks. Consisting of a mere 37 lines, this “2G” Blacklist provided enough protection to enable me to completely eliminate over 350 blacklisting directives from my site’s root htaccess file. This improvement increased site performance and decreased attack rates, however many bad hits were still getting through. More work was needed..

The 3G Blacklist

Work on the 3G Blacklist required several weeks of research, testing, and analysis. During the development process, five major improvements were discovered, documented, and implemented. Using pattern recognition, access immunization, and multiple layers of protection, the 3G Blacklist serves as an extremely effective security strategy for preventing a vast majority of common exploits. The list consists of four distinct parts, providing multiple layers of protection while synergizing into a comprehensive defense mechanism. Further, as discussed in previous articles, the 3G Blacklist is designed to be as lightweight and flexible as possible, thereby facilitating periodic cultivation and maintenance. Sound good? Here it is:

Continue Reading

Building the 3G Blacklist, Part 5: Improving Site Security by Selectively Blocking Individual IPs

Posted on May 13, 2008 in Websites by Jeff Starr

[ 3G Stormtroopers ]

In this continuing five-article series, I share insights and discoveries concerning website security and protecting against malicious attacks. Wrapping up the series with this article, I provide the final key to our comprehensive blacklist strategy: selectively blocking individual IPs. Previous articles also focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. In the next article, these five articles will culminate in the release of the next generation 3G Blacklist.

Improving Site Security by Selectively Blocking Individual IPs

The final component of the 3G Blacklist establishes a vehicle through which individual IPs may be blocked. As previously discussed, the goal of the 3G Blacklist involves moving away from extensive lists of blacklisted user agents, IP addresses, and other identifying attributes. Nonetheless, a comprehensive security strategy benefits from the ability to quickly and easily neutralize threats by directly blocking individual IP addresses.

In my experience, an effective IP blacklist should be kept as current as possible. Unfortunately, once lists grow to over 100 entries, they become quite unmanageable. Periodic verification of blocked IPs becomes increasingly difficult as lists grow in size. Uncultivated blacklists that continue to grow in size will eventually do more harm than good. Beyond the burden placed on server resources, constantly changing IP ownership inevitably results in unintentional blocking of legitimate traffic.

Just as with individual blocking of user agents, blocking individual (or ranges of) IP addresses provides a secondary layer of protection against direct attacks, corrupt networks, and other case-specific situations. This level of granular control enables defensive fine-tuning for targeting and blocking specific machines. Known repeat offenders should remain permanently blacklisted, whereas short-term offenders should be blocked temporarily. Examples of known repeat offenders typically include:

Continue Reading

Building the 3G Blacklist, Part 4: Improving the RedirectMatch Directives of the Original 2G Blacklist

Posted on May 12, 2008 in Websites by Jeff Starr

[ 3G Stormtroopers ]

In this continuing five-article series, I share insights and discoveries concerning website security and protecting against malicious attacks. In this fourth article, I build upon previous ideas and techniques by improving the directives contained in the original, 2G Blacklist. Subsequent articles will focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. At the conclusion of the series, the five articles will culminate in the release of the next generation 3G Blacklist.

Improving the RedirectMatch Directives of the Original 2G Blacklist

In the first version (2G) of the next-generation blacklist, a collection of malicious attack strings were compiled from access logs, blocked via Apache’s RedirectMatch directive, tested for effectiveness and transparency (i.e., non-interference), and finally converted into the 2G Blacklist. This 2G Blacklist continues to serve as my primary line of defense here at Perishable Press, replacing several other lists, including a million individual IP addresses and even the Ultimate htaccess Blacklist itself. So far, the results have extraordinary.

Of course, as discussed in Building the 3G Blacklist, Part 3, static, “fix-it-and-forget-it”-type strategies ultimately fail when dealing with the perpetually evolving nature of the enemy (i.e., malicious attackers, crackers, and other scumbags). Thus, as stated in the original 2G article, the blacklist is a work in progress, designed for periodic updates as new attack patterns and trends are identified. In the remainder of this article, we consider various aspects of the previous blacklist and consider various improvements. Note that it is not necessary to assemble and transcribe any of the new blacklist directives presented herein. At the conclusion of this series, each of the different strategies will be combined into a single, comprehensive security strategy: The 3G Blacklist!

Continue Reading

Building the 3G Blacklist, Part 3: Improving Site Security by Selectively Blocking Rogue User Agents

Posted on May 6, 2008 in Websites by Jeff Starr

[ 3G Stormtroopers ]

In this continuing five-article series, I share insights and discoveries concerning website security and protecting against malicious attacks. In this third article, I discuss targeted, user-agent blacklisting and present an alternate approach to preventing site access for the most prevalent and malicious user agents. Subsequent articles will focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. At the conclusion of the series, the five articles will culminate in the release of the next generation 3G Blacklist.

Improving Site Security by Selectively Blocking Rogue User Agents

Several months ago, while developing improved methods for protecting websites against malicious attacks, I decided to remove the Ultimate htaccess Blacklist from the Perishable Press domain. In its place, I have been using the 2G Blacklist, individual IP blocking, and an extremely concise selection of blacklisted user-agents. So far, this new, streamlined strategy has proven just as (if not more) effective as the previous “ultimate-blacklist” method.

There are several reasons why using extensive lists of blocked user agents inevitably fails to provide worthwhile protection. First, creating and maintaining a current list of every nasty agent consumes far too much time. Once lists become significantly populated, efficient cultivation becomes increasingly difficult. Once an agent is added to the list, it typically remains there indefinitely, unless the required time is spent to periodically verify and edit each and every entry.

Continue Reading

Building the 3G Blacklist, Part 2: Improving Site Security by Preventing Malicious Query-String Exploits

Posted on May 5, 2008 in Websites by Jeff Starr

[ 3G Stormtroopers ]

In this continuing five-article series, I share insights and discoveries concerning website security and protecting against malicious attacks. In this second article, I present an incredibly powerful method for eliminating malicious query string exploits. Subsequent articles will focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. At the conclusion of the series, the five articles will culminate in the release of the next generation 3G Blacklist.

Improving Site Security by Preventing Malicious Query String Exploits

A vast majority of website attacks involves appending malicious query strings onto legitimate, indexed URLs. Any webmaster serious about site security is well-familiar with the following generalized access-log entries:

Continue Reading

Building the 3G Blacklist, Part 1: Improving Site Security by Recognizing and Exploiting Server Attack Patterns

Posted on May 4, 2008 in Websites by Jeff Starr

[ 3G Stormtroopers ]

In this series of five articles, I share insights and discoveries concerning website security and protecting against malicious attacks. In this first article of the series, I examine the process of identifying attack trends and using them to immunize against future attacks. Subsequent articles will focus on key blacklist strategies designed to protect your site transparently, effectively, and efficiently. At the conclusion of the series, the five articles will culminate in the release of the next generation 3G Blacklist.

Improving Site Security by Recognizing and Exploiting Server Attack Patterns

Crackers, spammers, scrapers, and other attackers are getting smarter and more creative with their methods. It is becoming increasingly difficult to identify elements of an attack that may be effectively manipulated to prevent further attempts. For example, attackers almost exclusively employ decentralized botnets to execute their attacks. Such decentralized networks consist of large numbers of compromised computer systems that are remotely controlled to post data, run scripts, and request resources. Each compromised computer used in an attack is associated with a unique IP, thus making it impossible to prevent future attempts by blocking one or more addresses:

Continue Reading

How to Block Proxy Servers via htaccess

Posted on April 20, 2008 in Function by Jeff Starr

Not too long ago, a reader going by the name of bjarbj78 asked about how to block proxy servers from accessing her website. Apparently, bjarbj78 had taken the time to compile a proxy blacklist of over 9,000 domains, only to discover afterwards that the formulated htaccess blacklisting strategy didn’t work as expected:

deny from proxydomain.com proxydomain2.com

Blacklisting proxy servers by blocking individual domains seems like a futile exercise. Although there are a good number of reliable, consistent proxy domains that could be blocked directly, the vast majority of such sites are constantly changing. It would take a team of professionals working around the clock just to keep up with them all.

As explained in my reply to bjarbj78’s comment, requiring Apache to process over 9,000 htaccess entries for every request could prove disastrous:

Continue Reading