Tag: htaccess

Canonical URLs and Subdomains with Plesk

Posted on January 6, 2011 in Websites by Jeff Starr

I am in the process of migrating my sites from A Small Orange to Media Temple. Part of that process involves canonicalizing domain URLs to help maximize SEO strategy. At ASO, URL canonicalization required just a few htaccess directives:

# enforce no www prefix
<IfModule mod_rewrite.c>
 # rewriting must be switched on before the rules take effect
 RewriteEngine On
 RewriteCond %{HTTP_HOST} !^domain\.tld$ [NC]
 RewriteRule ^(.*)$ http://domain.tld/$1 [R=301,L]
</IfModule>

When placed in the htaccess file of the web-accessible root directory, that snippet redirects any request whose hostname is not exactly domain.tld, including www.domain.tld, to the non-www URL. There’s also a force-www technique if that’s how you roll. Either way, the point is that on most shared hosting, URL canonicalization is simple.
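For comparison, the force-www case can be sketched by inverting the host test. This is a minimal sketch rather than the exact technique referenced above, and domain.tld is a placeholder:

```apache
# enforce www prefix (sketch; domain.tld is a placeholder)
<IfModule mod_rewrite.c>
 RewriteEngine On
 # redirect any hostname that is not exactly www.domain.tld
 RewriteCond %{HTTP_HOST} !^www\.domain\.tld$ [NC]
 RewriteRule ^(.*)$ http://www.domain.tld/$1 [R=301,L]
</IfModule>
```

Only one of the two snippets should be active at a time; enabling both would send requests back and forth between the two hostnames.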

Continue Reading

Latest Blacklist Entries

Posted on November 9, 2010 in Websites by Jeff Starr

Recently cleared several megabytes of log files, detecting patterns, recording anomalies, and blacklisting gross offenders. Gonna break it down into three sections:

User Agents

User-agents come and go, and are easily spoofed, but it’s worth a few lines of htaccess to block the more persistent bots that repeatedly scan your site with malicious requests.

# Nov 2010 User Agents
SetEnvIfNoCase User-Agent "MaMa " keep_out
SetEnvIfNoCase User-Agent "choppy" keep_out
SetEnvIfNoCase User-Agent "heritrix" keep_out
SetEnvIfNoCase User-Agent "Purebot" keep_out
SetEnvIfNoCase User-Agent "PostRank" keep_out
SetEnvIfNoCase User-Agent "archive.org_bot" keep_out
SetEnvIfNoCase User-Agent "msnbot\.htm\)\._" keep_out

<Limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 Deny from env=keep_out
</Limit>
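The same kind of block can also be expressed with mod_rewrite. The following is a sketch of that alternative, not the method used above; it reuses four of the names from the list and answers matching requests with 403 Forbidden:

```apache
# Sketch: deny a few user agents via mod_rewrite instead of SetEnvIf
<IfModule mod_rewrite.c>
 RewriteEngine On
 # case-insensitive substring match against the User-Agent header
 RewriteCond %{HTTP_USER_AGENT} (choppy|heritrix|Purebot|PostRank) [NC]
 RewriteRule .* - [F,L]
</IfModule>
```

Either approach is only as good as the list behind it, since user-agent strings are trivial to change.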

Continue Reading

How to Deal with Content Scrapers

Posted on September 24, 2010 in Websites by Jeff Starr

Chris Coyier of CSS-Tricks recently declared that people should do “nothing” in response to other sites scraping their content. I totally get what Chris is saying here. He is basically saying that the original source of content is better than scrapers because:

  • it’s on a domain with more trust.
  • you published that article first.
  • it’s coded better for SEO than theirs.
  • it’s better designed than theirs.
  • it isn’t at risk for serious penalization from search engines.

If these things are all true, then I agree, you have nothing to worry about. Unfortunately, that’s a tall order for many sites on the Web today. Although most scraping sites are pure and utter crap, the software available for automating the production of decent-quality websites is getting better every day. More and more I’ve been seeing scraper sites that look and feel authentic because they are using some sweet WordPress theme and a few magical plugins. In the past, it was easy to spot a scraper site, but these days it’s getting harder to distinguish between scraped and original content. Not just for visitors, but for search engines too.

Continue Reading

2010 User-Agent Blacklist

Posted on August 9, 2010 in Websites by Jeff Starr

[ 2010 User-Agent Blacklist ] The 2010 User-Agent Blacklist blocks hundreds of bad bots while ensuring open-access for the major search engines: Google, Bing, Ask, Yahoo, et al. Blocking bad user-agents is an effective addition to any security strategy. It works like this: your site is getting hammered by rogue bots that waste valuable server resources and bandwidth. So you grab a copy of the 2010 UA Blacklist from Perishable Press, include it in your site’s root .htaccess file, and enjoy a more secure and better performing website. It’s that easy.

Proven Security

The 2010 UA Blacklist has been carefully constructed based on rigorous server-log analyses. Obsessive daily log monitoring reveals bad bots scanning for exploits, spamming resources, and wasting bandwidth. While analyzing malicious behavior, evil bots are identified and added to the UA Blacklist. Blocked user-agents are denied access to your site, increasing efficiency and providing safety for your visitors.

Continue Reading

Protect Your Site with a Blackhole for Bad Bots

Posted on July 14, 2010 in Websites by Jeff Starr

[ Black Hole ] One of my favorite security measures here at Perishable Press is the site’s virtual Blackhole trap for bad bots. The concept is simple: include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the trap, which then performs a WHOIS Lookup and records the event in the blackhole data file. Once added to the blacklist data file, bad bots immediately are denied access to your site. I call it the “one-strike” rule: bots have one chance to follow the robots.txt protocol, check the site’s robots.txt file, and obey its directives. Failure to comply results in immediate banishment. The best part is that the Blackhole only affects bad bots: normal users never see the hidden link, and good bots obey the robots rules in the first place.

In five easy steps, you can set up your own Blackhole to trap bad bots and protect your site from evil scripts, bandwidth thieves, content scrapers, spammers, and other malicious behavior.

[ Blackhole Directory with Files ] The Blackhole is built with PHP, and uses a bit of .htaccess to protect the blackhole directory. The blackhole script combines heavily modified versions of the Kloth.net script (for the bot trap) and the Network Query Tool (for the whois lookups). Refined over the years and completely revamped for this tutorial, the Blackhole consists of a single plug-&-play directory that contains the following four files:

Continue Reading

htaccess Code for WordPress Multisite

Posted on July 7, 2010 in WordPress by Jeff Starr

For the upcoming Digging into WordPress update for WordPress 3.0, I have been working with WordPress’ multisite functionality. Prior to version 3.0, WordPress came in two flavors: “original” and “multisite” (MU). Most designers probably work with regular, one-blog installations of “regular” WordPress. The htaccess rules for all single-blog installations of WordPress haven’t changed. They are the same for WordPress 3.0 as they are for all previous versions.
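For reference, the long-standing single-blog permalink rules look roughly like the following. This sketch assumes WordPress is installed in the web root; a subdirectory install would need a different RewriteBase and index.php path:

```apache
# Sketch of the classic single-blog WordPress rules (root install assumed)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# pass through requests for real files and directories
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# everything else is handled by WordPress
RewriteRule . /index.php [L]
</IfModule>
```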

But now that multisite has merged with regular-flavored WordPress, we can stick with single-blog installs (which is how things are set up by default), or we can activate multisite functionality and create an unlimited network of sites. The process is still new and there are bugs that need to be worked out, but eventually it will be a widely used WordPress feature. That said, the htaccess rules used for WordPress Multisite may change as the software continues to evolve.

Continue Reading

2010 IP Blacklist

Posted on July 6, 2010 in Websites by Jeff Starr

Over the course of each year, I blacklist a considerable number of individual IP addresses. Every day, Perishable Press is hit with countless numbers of spammers, scrapers, crackers and all sorts of other hapless turds. Weekly examinations of my site’s error logs enable me to filter through the chaff and cherry-pick only the most heinous, nefarious attackers for blacklisting. Minor offenses are generally dismissed, but the evil bastards that insist on wasting resources running redundant automated scripts are immediately investigated via IP lookup and denied access via simple htaccess directive:

<Limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 Deny from 123.456.789
</Limit>

Although many of the worst attacks happen in randomized, zombie-like fashion, I have found that individual IPs that are not blacklisted will return repeatedly until finally blocked. Yet, despite the short-term success enjoyed by denying access to the most malicious IPs, the long-term futility of such blacklisting reflects the temporary nature of this solution.

In other words, I have found that blocking individual IPs is useful only for limited periods of time. Thus, every year, I gather my code and flush the blacklist of all individually blocked IP addresses. I then start fresh, adding the worst villains to the list, blocking entire IP ranges if necessary, and referring to previous versions of my htaccess files to cross-check suspiciously familiar entities. Eventually, a new blacklist emerges and I share it at Perishable Press. Here is the current version for 2010..
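As a rough illustration of mixing single addresses with whole ranges, here is a sketch in the same Apache 2.2-style syntax. The addresses are taken from the reserved documentation ranges and are placeholders, not entries from the actual blacklist:

```apache
# Sketch: block one address, one CIDR range, and one partial-address range
<Limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 # a single IP address
 Deny from 203.0.113.5
 # an entire range in CIDR notation
 Deny from 198.51.100.0/24
 # a partial address matches everything that begins with it
 Deny from 192.0.2
</Limit>
```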

Continue Reading

htaccess Redirect to Maintenance Page

Posted on May 19, 2010 in Function by Jeff Starr

Redirecting visitors to a maintenance page or other temporary page is an essential tool to have in your tool belt. Using HTAccess, redirecting visitors to a temporary maintenance page is simple and effective. All you need to redirect your visitors is the following code placed in your site’s root HTAccess:

# MAINTENANCE-PAGE REDIRECT
<IfModule mod_rewrite.c>
 RewriteEngine on
 RewriteCond %{REMOTE_ADDR} !^123\.456\.789\.000
 RewriteCond %{REQUEST_URI} !/maintenance.html$ [NC]
 RewriteCond %{REQUEST_URI} !\.(jpe?g?|png|gif) [NC]
 RewriteRule .* /maintenance.html [R=302,L]
</IfModule>

That is the official copy-&-paste goodness right there. Just grab, gulp and go. Of course, there are a few more details for those who may be unfamiliar with the process. Let’s look at all the gory details..
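A variation worth knowing about, sketched here as an assumption rather than as part of the method above, is to answer with a 503 status instead of a 302 redirect, so that crawlers treat the downtime as temporary. It relies on Apache accepting a non-3xx code in the R flag, and 203.0.113.5 stands in for your own IP address:

```apache
# Sketch: serve the maintenance page with a 503 status (assumed variant)
ErrorDocument 503 /maintenance.html
<IfModule mod_rewrite.c>
 RewriteEngine On
 # let your own IP through (placeholder address)
 RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.5$
 # do not intercept the maintenance page itself or its images
 RewriteCond %{REQUEST_URI} !^/maintenance\.html$ [NC]
 RewriteCond %{REQUEST_URI} !\.(jpe?g|png|gif)$ [NC]
 RewriteRule .* - [R=503,L]
</IfModule>
```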

Continue Reading

Stop 404 Requests for Mobile Versions of Your Site

Posted on April 26, 2010 in Function by Jeff Starr

If you’ve been keeping an eye on your 404 errors recently, you will have noticed an increase in requests for nonexistent mobile files and directories, especially over the past year or so. The scripts and bots requesting these files from your server seem to be looking for a mobile version of your site. Unfortunately, they are wasting bandwidth and resources in the process. It has become common to see the following 404 errors constantly repeated in your log files:

  • http://domain.tld/apple-touch-icon.png
  • http://domain.tld/iphone
  • http://domain.tld/mobile
  • http://domain.tld/mobi
  • http://domain.tld/m

So some bot comes along, assumes that your site includes a mobile version, and then tries its hand at guessing the location. In the common request-set listed above, we see the bot looking first for an “apple-touch icon,” and then for mobile content in various directories. If this only happens once in a while, it’s no big deal. But these days I’ve been seeing many different bots requesting these nonexistent resources.

Even worse, these mobile-hungry bots can’t seem to remember where they’ve been – they typically request the same resources repeatedly, and in multiple locations within the directory structure. I frequently see hundreds of these types of requests in my weekly error-log analyses. Needless to say, this is an incredible waste of time, bandwidth, and server resources.
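One way to cut the cost of these requests, sketched here under the assumption that your site has no real URLs ending in these names, is to have Apache refuse them directly with mod_alias before any dynamic 404 handling runs:

```apache
# Sketch: refuse guesses at mobile directories anywhere in the tree
<IfModule mod_alias.c>
 # a non-3xx status takes no target URL
 RedirectMatch 403 /(iphone|mobile|mobi|m)/?$
</IfModule>
```

The apple-touch-icon request is left out on purpose, since simply adding that file to the site root is another way to stop those errors.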

Continue Reading

Is it Secret? Is it Safe?

Posted on March 17, 2010 in Function by Jeff Starr

[ Enjoying the Evening ] Whenever I find myself working with PHP or messing around with server settings, I nearly always create a phpinfo.php file and place it in the root directory of whatever domain I happen to be working on. These types of informational files employ PHP’s handy phpinfo() function to display a concise summary of all of your server’s variables, which may then be referenced for debugging purposes, bragging rights, and so on.

While this sort of thing is normally okay, I frequently forget to remove the file and just leave it sitting there for the entire world to look at. This of course is a big “no-no” for site security, because the phpinfo.php file contains a hefty amount of information about my server, including stuff like:

  • The web server version
  • The IP address of the host
  • The version of the operating system
  • The root directory of the web server
  • Configuration information about the remote PHP installation
  • The username of the user who installed PHP and whether they are a sudo user
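If the file has to stay on the server for a while, access to it can at least be limited. A minimal sketch, assuming the file is named phpinfo.php, Apache 2.2-style access directives, and 203.0.113.5 as a placeholder for your own IP address:

```apache
# Sketch: hide phpinfo.php from everyone except one address
<Files phpinfo.php>
 Order Deny,Allow
 Deny from all
 # placeholder; replace with your own IP
 Allow from 203.0.113.5
</Files>
```

Deleting the file when you are done remains the safer habit.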

Continue Reading

Protect WordPress Against Malicious URL Requests

Posted on December 22, 2009 in WordPress by Jeff Starr

A few months ago, many WordPress sites were attacked with some extremely malicious code. While searching for a good solution, I discovered the following gem of a plugin in the pastebin repository:

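<?php /* Plugin Name: Block Bad Queries */

// Reject overly long request URIs and those containing "eval(" or "base64".
// strpos() returns false when nothing is found, so compare strictly.
if (strlen($_SERVER['REQUEST_URI']) > 255 || 
	strpos($_SERVER['REQUEST_URI'], "eval(") !== false || 
	strpos($_SERVER['REQUEST_URI'], "base64") !== false) {
		@header("HTTP/1.1 414 Request-URI Too Long");
		@header("Status: 414 Request-URI Too Long");
		@header("Connection: Close");
		exit;
} ?>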

This script checks for excessively long request strings (i.e., greater than 255 characters), as well as the presence of either “eval(” or “base64” in the request URI. These sorts of nefarious requests were implicated in the September 2009 WordPress attacks.
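A similar check can be approximated at the server level. The sketch below is an assumption, not part of the plugin: it tests the path and the query string separately because Apache's REQUEST_URI variable excludes the query string, it matches case-insensitively (the PHP check is case-sensitive), and it omits the length test:

```apache
# Sketch: refuse requests containing "eval(" or "base64"
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{REQUEST_URI} (eval\(|base64) [NC,OR]
 RewriteCond %{QUERY_STRING} (eval\(|base64) [NC]
 RewriteRule .* - [F,L]
</IfModule>
```

This version answers with 403 Forbidden rather than the 414 status sent by the plugin.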

Continue Reading

HTAccess Privacy for Specific IPs

Posted on October 12, 2009 in Function by Jeff Starr

Running a private site is all about preventing unwanted visitors. Here is a quick and easy way to allow access to multiple IP addresses while redirecting everyone else to a custom message page.

To do this, all you need is an HTAccess file and a list of IPs for which you would like to allow access.

Edit the following code according to the instructions below and place it into the root HTAccess file of your domain:

# ALLOW ONLY MULTIPLE IPs
<Limit GET POST PUT>
 Order Deny,Allow
 Deny from all
 Allow from 123.456.789
 Allow from 456.789.123
 Allow from 789.123.456
</Limit>
# ErrorDocument takes a root-relative URL (leading slash)
ErrorDocument 403 /path/custom-message.html
# <Files> matches the file name only, not its path
<Files custom-message.html>
 Order Allow,Deny
 Allow from all
</Files>

To prepare this code for use on your site, do these three things:

  1. Edit the three IP addresses to suit your needs. Feel free to add more IPs or remove any that aren’t needed.
  2. Edit “/path/custom-message.html” in the ErrorDocument line to the root-relative URL of the file that will contain your custom message, and edit “custom-message.html” in the <Files> line to that file’s name. The file may be anything, anywhere, with any functionality you desire.
  3. That’s it. Copy/paste into your site’s root htaccess file, upload, test, and get out!

Continue Reading

Disable Trace and Track for Better Security

Posted on September 6, 2009 in Function by Jeff Starr

The shared server on which I host Perishable Press was recently scanned by security software that revealed a significant security risk. Namely, the HTTP request methods TRACE and TRACK were found to be enabled on my webserver. The TRACE and TRACK protocols are HTTP methods used in the debugging of webserver connections.

Although these methods are useful for legitimate purposes, they may compromise the security of your server by enabling cross-site tracing (XST) attacks. By exploiting certain browser vulnerabilities, an attacker may manipulate the TRACE and TRACK methods to intercept your visitors’ sensitive data. The solution, of course, is to disable these methods on your webserver.
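On shared hosting, where only htaccess is available, a common way to do this is with mod_rewrite. A minimal sketch:

```apache
# Sketch: refuse TRACE and TRACK requests with 403 Forbidden
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK) [NC]
 RewriteRule .* - [F,L]
</IfModule>
```

With access to the main server configuration, the core directive `TraceEnable off` handles TRACE at the server level, but it is not permitted in htaccess files.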

Continue Reading

HTAccess Password-Protection Tricks

Posted on July 13, 2009 in Function by Jeff Starr

Recently a reader asked about how to password-protect a directory for every specified IP while allowing open access to everyone else. In my article, Stupid htaccess Tricks, I show how to password-protect a directory for every IP except the one specified, but not for the reverse case. In this article, I will demonstrate this technique along with a wide variety of other useful password-protection tricks, including a few from my Stupid htaccess Tricks article. Before getting into the juicy stuff, we’ll review a few basics of HTAccess password protection.
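The case the reader asked about, a password prompt for one IP and open access for everyone else, can be sketched as follows. This assumes Apache 2.2-style access control; the password file path and the IP address are placeholders:

```apache
# Sketch: require a password only from the listed IP
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /full/path/to/.htpasswd
Require valid-user
# everyone passes the host check except the listed IP
Order Allow,Deny
Allow from all
Deny from 203.0.113.5
# a request gets through if it passes either the host check or the login
Satisfy Any
```

Because of `Satisfy Any`, visitors who pass the host check never see the login prompt, while the denied address has to authenticate.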

Continue Reading

Secure Visitor Posting for WordPress

Posted on June 1, 2009 in WordPress by Jeff Starr

[ ~{*}~ ] Normally, when visitors post a comment to your site, specific types of client data are associated with the request. Commonly, a client will provide a user agent, a referrer, and a host header. When any of these variables is absent, there is good reason to suspect foul play. For example, virtually all browsers provide some sort of user-agent name to identify themselves. Conversely, malicious scripts directly posting spam and other payloads to your site frequently operate without specifying a user agent. In the Ultimate User-Agent Blacklist, we account for the “no-user-agent” case in the very first directive, preventing a host of anonymous visitors from hitting the site.

In addition to empty user-agent strings, malicious requests for site content frequently fail to provide any referrer information. Unless special privacy software is being used, the web page from which a visitor has arrived at your site will be specified in the header information for that request. Likewise, when a visitor posts a comment at your site, the referrer string for that post request will be the URL of that particular page. Thus, as with blank user-agent requests, no-referrer requests are frequently indicative of spam and other malicious behavior.

Another important piece of information provided by all legitimate clients is the host request header. The host header specifies the Internet host and port number of the requested resource. This information is required for all clients making HTTP/1.1 requests. Thus, requiring the host request-header field for all posts to your site safely eliminates illicit requests from hitting your server.
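Taken together, the three checks could be sketched in htaccess like this. It is an illustration under assumptions, not the article's final code: it targets WordPress's wp-comments-post.php and uses domain.tld as a placeholder for your own site:

```apache
# Sketch: refuse comment posts lacking a user agent, host, or local referrer
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{REQUEST_METHOD} ^POST$
 RewriteCond %{REQUEST_URI} /wp-comments-post\.php$ [NC]
 # any one of the following three is enough to block the request
 RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
 RewriteCond %{HTTP_HOST} ^$ [OR]
 RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.tld/ [NC]
 RewriteRule .* - [F,L]
</IfModule>
```

Keep in mind that the referrer test also turns away legitimate visitors whose privacy software strips the referrer header.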

Continue Reading

HTAccess Spring Cleaning 2009

Posted on May 11, 2009 in Function by Jeff Starr

Just like last year, this Spring I have been taking some time to do some general maintenance here at Perishable Press. This includes everything from fixing broken links and resolving errors to optimizing scripts and eliminating unnecessary plugins. I’ll admit, this type of work is often quite dull, however I always enjoy the process of cleaning up my HTAccess files. In this post, I share some of the changes made to my HTAccess files and explain the reasoning behind each modification. Some of the changes may surprise you! ;)

Continue Reading

4G Series: The Ultimate Referrer Blacklist, Featuring Over 8000 Banned Referrers

Posted on April 21, 2009 in Websites by Jeff Starr

You have seen user-agent blacklists, IP blacklists, 4G Blacklists, and everything in between. Now, in this article, for your sheer and utter amusement, I present a collection of over 8000 blacklisted referrers.

For the uninitiated, in teh language of teh Web, a referrer is the online resource from whence a visitor happened to arrive at your site. For example, if Johnny the Wonder Parrot was visiting the Mainstream Media website and happened to follow a link to your site (of all places), you would look at your access logs, notice Johnny’s visit, and speak out loud (slowly): “hmmm.. it looks like the Mainstream Media website referred my good pal Johnny to my Alka-Seltzer sales page.” In such a bizarre case, the Mainstream Media website — or specific page — is referred to as (no pun intended) the referrer.
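In htaccess terms, blocking by referrer comes down to testing the Referer header against a list of patterns. A tiny sketch with two made-up domains (both hypothetical, not entries from the actual blacklist):

```apache
# Sketch: refuse requests referred by two hypothetical domains
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{HTTP_REFERER} (bad-referrer-one\.tld|bad-referrer-two\.tld) [NC]
 RewriteRule .* - [F,L]
</IfModule>
```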

Continue Reading

4G Series: The Ultimate User-Agent Blacklist, Featuring Over 1200 Bad Bots

Posted on March 29, 2009 in Websites by Jeff Starr

[ Image: Inverted Eclipse ] As discussed in my recent article, Eight Ways to Blacklist with Apache’s mod_rewrite, one method of stopping spammers, scrapers, email harvesters, and malicious bots is to blacklist their associated user agents. Apache enables us to target bad user agents by testing the user-agent string against a predefined blacklist of unwanted visitors. Any bot identifying itself as one of the blacklisted agents is immediately and quietly denied access. While this certainly isn’t the most effective method of securing your site against malicious behavior, it may certainly provide another layer of protection.

Even so, there are several things to consider before choosing to implement an extensive user-agent blacklist on your site. First and most importantly is the transient nature of the user agent itself. On most systems, the user-agent variable is easy to change, making it possible for bot owners to use any user-agent name they wish. Once a bad bot makes the rounds, becomes known, and is blacklisted, the bot owner need only modify or change its declared user agent and they’re back in business. User-agent names are constantly invented, spoofed, or otherwise altered in order to operate beneath — or above — the virtual radar. Thus, a user-agent blacklist is a high-maintenance affair, requiring continuous cultivation in order to maintain relevancy and effectiveness.

Continue Reading

The Perishable Press 4G Blacklist

Posted on March 16, 2009 in Websites by Jeff Starr

[ 4G Stormtrooper ] At last! After many months of collecting data, crafting directives, and testing results, I am thrilled to announce the release of the 4G Blacklist! The 4G Blacklist is a next-generation protective firewall that secures your website against a wide range of malicious activity. Like its 3G predecessor, the 4G Blacklist is designed for use on Apache servers and is easily implemented via HTAccess or the httpd.conf configuration file. In order to function properly, the 4G Blacklist requires two specific Apache modules, mod_rewrite and mod_alias. As with the third generation of the blacklist, the 4G Blacklist consists of multiple parts:

Update Feb 22, 2011: The 5G version of the blacklist is available now in beta.

Continue Reading

Building the Perishable Press 4G Blacklist

Posted on March 8, 2009 in Websites by Jeff Starr

[ Building the Hoover Dam, Part 1 ]

Last year, after much research and discussion, I built a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Unfortunately, these mega-lists eventually became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works by targeting common elements of decentralized server attacks. Consisting of a mere 37 lines, this “2G” Blacklist provided enough protection to enable me to completely eliminate over 350 blacklisting directives from my site’s root htaccess file. This improvement increased site performance and decreased attack rates, however many bad hits were still getting through. More work was needed..

Continue Reading

Controlling Proxy Access with HTAccess

Posted on February 22, 2009 in Function by Jeff Starr

In my recent article on blocking proxy servers, I explain how to use HTAccess to deny site access to a wide range of proxy servers. The method works great, but some readers want to know how to allow access for specific proxy servers while denying access to as many other proxies as possible.

Fortunately, the solution is as simple as adding a few lines to my original proxy-blocking method. Specifically, we may allow any requests coming from our whitelist of proxy servers by testing Apache’s HTTP_REFERER variable, like so:

RewriteCond %{HTTP_REFERER} !allowed-proxy-01\.domain\.tld
RewriteCond %{HTTP_REFERER} !allowed-proxy-02\.domain\.tld
RewriteCond %{HTTP_REFERER} !allowed-proxy-03\.domain\.tld

Continue Reading

Eight Ways to Blacklist with Apache’s mod_rewrite

Posted on February 3, 2009 in Function by Jeff Starr

With the imminent release of the next series of (4G) blacklist articles here at Perishable Press, now is the perfect time to examine eight of the most commonly employed blacklisting methods achieved with Apache’s incredible rewrite module, mod_rewrite. In addition to facilitating site security, the techniques presented in this article will improve your understanding of the different rewrite methods available with mod_rewrite.

Blacklist via Request Method

[ #1 ] This first blacklisting method evaluates the client’s request method. Every time a client attempts to connect to your server, it sends a message indicating the type of connection it wishes to make. There are many different types of request methods recognized by Apache. The two most common methods are GET and POST requests, which are required for “getting” and “posting” data to and from the server. In most cases, these are the only request methods required to operate a dynamic website. Allowing more request methods than are necessary increases your site’s vulnerability. Thus, to restrict the types of request methods available to clients, we use this block of Apache directives:
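As a rough sketch of the idea, and not necessarily the block of directives the article goes on to present, the restriction can be written as a whitelist. Including HEAD alongside GET and POST is an assumption made here, since many well-behaved clients use it:

```apache
# Sketch: refuse every request method except GET, POST, and HEAD
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{REQUEST_METHOD} !^(GET|POST|HEAD)$
 RewriteRule .* - [F,L]
</IfModule>
```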

Continue Reading

PHP Short Open Tag: Convenient Shortcut or Short Changing Security?

Posted on January 12, 2009 in Function by Bill Brown

[ Echo Shortcut Code ] Most of us learned how to use “echo()” in one of our very first PHP tutorials. That was certainly the case for me. As a consequence, I never really had a need to visit PHP’s documentation page for echo(). On a recent visit to Perishable Press, I saw a Tumblr post from Jeff about the use of PHP’s shortcut syntax for echo() but somewhere deep in my memory, there lurked a warning about its use. I decided to investigate.

Continue Reading

Redirect All (Broken) Links from any Domain via HTAccess

Posted on December 31, 2008 in Function by Jeff Starr

Here’s the scene: you have been noticing a large number of 404 requests coming from a particular domain. You check it out and realize that the domain in question has a number of misdirected links to your site. The links may resemble legitimate URLs, but because of typographical errors, markup errors, or outdated references, they are broken, leading to nowhere on your site and producing a nice 404 error for every request. Ugh. Or, another painful scenario would be a single broken link on a highly popular site. For example, you may have one of your best posts mentioned in the SitePoint forums, but the person leaving the link completely botched the job:

Continue Reading

Redirect WordPress Individual Category Feeds to Feedburner via HTAccess

Posted on December 15, 2008 in WordPress by Jeff Starr

Time for another Feedburner redirect tutorial! In our previous FeedBurner-redirect post, I provide an improved HTAccess method for redirecting your site’s main feed and comment feed to their respective Feedburner URLs. In this tutorial, we are redirecting individual WordPress category feeds to their respective FeedBurner URLs. We will also look at the complete code required to redirect all of the above: the main feed, comments feed, and of course any number of individual category feeds. Let’s jump into it..
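To give a feel for the approach, here is a sketch of a single category-feed redirect. The category slug "business" and the FeedBurner feed name are hypothetical, and the rule assumes the common /category/name/feed/ permalink pattern:

```apache
# Sketch: send one category feed to FeedBurner (names are placeholders)
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{REQUEST_URI} ^/category/business/feed/?$ [NC]
 # let FeedBurner itself fetch the original feed
 RewriteCond %{HTTP_USER_AGENT} !(FeedBurner|FeedValidator) [NC]
 RewriteRule .* http://feeds.feedburner.com/example-business-feed [R=302,L]
</IfModule>
```

The user-agent exclusion matters: without it, FeedBurner's own fetcher would be redirected back to itself.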

Continue Reading