
Invite Only: Traffic Control via Whitelist

Web developers trying to control comment spam, bandwidth theft, and content scraping must choose between two fundamentally different approaches: selectively deny targeted offenders (the “blacklist” method) or selectively allow desirable agents (the “opt-in”, or “whitelist” method).

According to various online forums and discussion boards, the blacklist method is currently the more popular of the two. It requires the webmaster to create and maintain a working list of undesirable agents, usually blocking their access via htaccess or PHP. The downside of blacklisting is the considerable effort needed to stay current with the rapidly growing number of ever-evolving threats, which demand exceedingly long lists for an effective response.
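
To make the blacklist approach concrete, here is a minimal sketch of an htaccess blacklist, assuming Apache 2.2-style access directives and mod_setenvif. The agent names EvilScraper and SpamHarvester and the variable name bad_bot are hypothetical placeholders, not real offenders; a working list would come from your own access logs.

```apache
# Hypothetical blacklist sketch: flag unwanted user agents
# (EvilScraper and SpamHarvester are placeholder names)
BrowserMatchNoCase EvilScraper   bad_bot
BrowserMatchNoCase SpamHarvester bad_bot

# Allow everyone, then deny any request flagged as bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Every newly discovered offender means another BrowserMatchNoCase line, which is where the maintenance burden comes from.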

Although time-consuming and potentially work-intensive (automated methods of blacklisting bad bots do exist), blacklisting maximizes legitimate traffic by allowing site access to anyone not on the blacklist. Unfortunately for blacklisters, it has become relatively trivial to disguise bots by using standard user-agent strings, so the bad guys bypass the blacklist and slip into your site incognito. Besides, nobody wants to waste valuable time digging through endless access logs. Whereas blacklisting is reactive, whitelisting is proactive.

The whitelist approach

Growing in popularity now is the exclusive, invitation-only “opt-in”, or “whitelist” method. The upside of the whitelist method is that it delivers a devastating blow to online scum trying to spam your site, steal your bandwidth, and scrape your content. Whitelisting is also gaining favor because it eliminates the time-consuming task of maintaining and updating rapidly growing, incredibly long blacklists, which, incidentally, may also affect server performance. With the opt-in method, you set up a small block of htaccess rules and you’re done. Nonetheless, the argument against whitelisting is that spambots can still disguise themselves as whitelisted agents and gain access to your site, not to mention the inevitable consequences of denying access to legitimate users, browsers, and other agents.

It all comes down to choosing the lesser of two evils: false negatives (bad agents that slip past a blacklist) or false positives (legitimate visitors turned away by a whitelist). If security is an issue, and you aren’t terribly concerned about allowing absolutely everyone into your site, then whitelisting is probably the best option. Not only will you end up saving massive amounts of time, but you will also eliminate a great deal of spam, bandwidth theft, and stolen content. On the other hand, if you are running a business or operating a public-service oriented site, government site, etc., unintentionally denying access to legitimate users is an absolutely horrifying prospect.

Whitelist example

Here is an example of an extremely restrictive whitelist that blocks everyone except for the major search engines (Google, Yahoo, MSN, Ask) and popular browsers (Firefox, Internet Explorer, Opera, Safari). Everyone else, and we mean everyone (e.g., Lynx, cell phones, PDAs, anonymous agents, etc.), is denied access. Also be advised that the following whitelist, although common, is far from perfect: search engines are constantly evolving, multiplying, and experimenting. Further research is always advisable ;)

# Exclusive whitelist for search engines and browsers

# Google agents
BrowserMatchNoCase Googlebot            good_pass
BrowserMatchNoCase Mediapartners-Google good_pass

# Yahoo agents
BrowserMatchNoCase Slurp                good_pass
BrowserMatchNoCase Yahoo-MMCrawler      good_pass

# MSN agents
BrowserMatchNoCase ^msnbot              good_pass
BrowserMatchNoCase SandCrawler          good_pass

# Ask agents
BrowserMatchNoCase Teoma                good_pass
BrowserMatchNoCase Jeeves               good_pass

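# Popular browsers
# Note: Internet Explorer and Safari send user-agent strings that begin
# with "Mozilla", so the ^Mozilla rule is the one that matches them
BrowserMatchNoCase ^Mozilla             good_pass
BrowserMatchNoCase ^Opera               good_pass
BrowserMatchNoCase ^MSIE                good_pass
BrowserMatchNoCase ^Safari              good_pass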

# The bouncer: deny every request unless good_pass was set above
<Files ~ ".*\..*">
	Order Deny,Allow
	Deny from all
	Allow from env=good_pass
</Files>

This example is meant as a starting point, to give you an idea about how to create and implement your own traffic whitelist. For more information, check out my tutorial on whitelisting good bots.
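
The bouncer block relies on the Order, Deny, and Allow directives from Apache 2.2. On Apache 2.4 those directives are only available through mod_access_compat, so here is a rough equivalent using Require env from mod_authz_core. This is a sketch that assumes the BrowserMatchNoCase lines above are still in place and set good_pass; try it on a non-production site first.

```apache
# Apache 2.4+ variant of the bouncer (assumes mod_authz_core is loaded
# and the BrowserMatchNoCase rules above have set good_pass)
<Files ~ ".*\..*">
	# Grant access only when the good_pass variable is set
	Require env good_pass
</Files>
```

Requests whose user agent matched none of the whitelist patterns never get good_pass set, so the Require line refuses them.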

Jeff Starr

2 responses to “Invite Only: Traffic Control via Whitelist”

  1. Seems just like what I was looking for, but isn’t this blackhat SEO?

  2. Jeff Starr
    Perishable 2007/04/03 9:33 am

    Never! ;)

Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience.