
Invite Only: Traffic Control via Whitelist

Web developers trying to control comment spam, bandwidth theft, and content scraping must choose between two fundamentally different approaches: selectively deny targeted offenders (the “blacklist” method) or selectively allow desirable agents (the “opt-in”, or “whitelist” method).

According to various online forums and discussion boards, the blacklist method is currently the more popular of the two. It requires the webmaster to create and maintain a working list of undesirable agents, usually blocking their access via .htaccess or PHP. The downside of blacklisting is the considerable effort required to keep pace with a rapidly growing number of ever-evolving threats, which demands exceedingly long lists for an effective response.

Although it can be time-consuming and work-intensive (automated methods of blacklisting bad bots do exist), blacklisting maximizes hits by allowing site access to anyone not on the blacklist. Unfortunately for blacklisters, it has become relatively trivial to disguise bots by sending standard browser user-agent strings, so the bad guys bypass the blacklist and slip into your site incognito. Besides, nobody wants to waste valuable time digging through endless access logs. Whereas blacklisting is reactive, whitelisting is proactive.
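To see why a disguised bot defeats a blacklist, here is a minimal Python sketch of blacklist logic (an illustration, not how Apache implements it). The pattern list is hypothetical and purely illustrative: a request is allowed unless its user agent matches a blacklisted pattern, so a scraper that sends an ordinary browser string passes untouched.

```python
import re

# Hypothetical blacklist of bad-agent patterns (illustrative only, not a vetted list).
BLACKLIST = [r"EmailSiphon", r"WebCopier", r"libwww-perl"]

def blacklist_allows(user_agent: str) -> bool:
    """Allow the request unless the user agent matches any blacklisted pattern."""
    return not any(re.search(p, user_agent, re.IGNORECASE) for p in BLACKLIST)

# A scraper that honestly identifies itself is caught...
honest = "WebCopier v4.6"
# ...but the same scraper sending a standard browser string slips through.
spoofed = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0"

print(blacklist_allows(honest))   # False
print(blacklist_allows(spoofed))  # True
```

Every new disguise means another pattern to add, which is exactly the maintenance burden described above.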

The whitelist approach

Growing in popularity now is the exclusive, invitation-only “opt-in”, or “whitelist”, method. Its upside is that it delivers a devastating blow to online scum trying to spam your site, steal your bandwidth, and scrape your content. Whitelisting is also gaining favor because it eliminates the time-consuming task of maintaining and updating rapidly growing, incredibly long blacklists, which may also affect server performance. With the opt-in method, you set a single block of htaccess rules and you’re done. Nonetheless, whitelisting has its critics: spambots can still disguise themselves as allowed agents and gain access to your site, and some legitimate users, browsers, and other agents will inevitably be denied access.
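For contrast, here is a minimal Python sketch of the opt-in logic, again using a hypothetical, deliberately short pattern list: a request is denied unless its user agent matches a whitelisted pattern. The last two calls illustrate both objections above: a bot claiming to be Firefox still gets in, while a legitimate text browser such as Lynx is shut out.

```python
import re

# Hypothetical whitelist of allowed-agent patterns (illustrative, not complete).
WHITELIST = [r"Googlebot", r"Slurp", r"^Mozilla", r"^Opera"]

def whitelist_allows(user_agent: str) -> bool:
    """Deny the request unless the user agent matches a whitelisted pattern."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in WHITELIST)

print(whitelist_allows("Mozilla/5.0 (compatible; Googlebot/2.1)"))        # True
print(whitelist_allows("EmailSiphon"))                                    # False: unknown agent blocked
print(whitelist_allows("Mozilla/5.0 (X11; Linux x86_64) Firefox/80.0"))  # True, even if a bot sent it
print(whitelist_allows("Lynx/2.8.9rel.1 libwww-FM/2.14"))                 # False: legitimate browser blocked
```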

It all comes down to choosing the lesser of two evils: false negatives (bad agents that a blacklist lets through) or false positives (legitimate visitors that a whitelist blocks). If security is a priority and you aren’t terribly concerned about letting absolutely everyone into your site, then whitelisting is probably the best option. Not only will you save massive amounts of time, but you will also eliminate a great deal of spam, bandwidth theft, and stolen content. On the other hand, if you are running a business or operating a public-service site, government site, etc., unintentionally denying access to legitimate users is an absolutely horrifying prospect.

Whitelist example

Here is an example of an extremely restrictive whitelist that blocks everyone except the major search engines (Google, Yahoo, MSN, Ask) and popular browsers (Firefox, Internet Explorer, Opera, Safari). Everyone else (and we mean everyone: Lynx, cell phones, PDAs, anonymous agents, etc.) is denied access. Also be advised that the following whitelist, although common, is far from perfect: search engines are constantly evolving, multiplying, and experimenting. Further research is always advisable ;)

# Exclusive whitelist for search engines and browsers

# Google agents
BrowserMatchNoCase Googlebot            good_pass
BrowserMatchNoCase Mediapartners-Google good_pass

# Yahoo agents
BrowserMatchNoCase Slurp                good_pass
BrowserMatchNoCase Yahoo-MMCrawler      good_pass

# MSN agents
BrowserMatchNoCase ^msnbot              good_pass
BrowserMatchNoCase SandCrawler          good_pass

# Ask agents
BrowserMatchNoCase Teoma                good_pass
BrowserMatchNoCase Jeeves               good_pass

# Popular browsers
# (IE, Firefox, and Safari all send user agents beginning with "Mozilla")
BrowserMatchNoCase ^Mozilla             good_pass
BrowserMatchNoCase ^Opera               good_pass
BrowserMatchNoCase ^MSIE                good_pass
BrowserMatchNoCase ^Safari              good_pass

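# The bouncer: deny everyone, then allow only requests flagged good_pass
# (Apache 2.2 syntax; on Apache 2.4 this requires mod_access_compat,
# or use "Require env good_pass" instead)
<Files ~ ".*\..*">
	Order Deny,Allow
	Deny from all
	Allow from env=good_pass
</Files>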
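Before uploading rules like these, it can help to check which user agents they would admit. The following Python sketch approximates the whitelist above; it assumes BrowserMatchNoCase behaves like a case-insensitive, unanchored regex search on the User-Agent header (Python's `re` stands in for Apache's PCRE), and that the bouncer admits a request only when `good_pass` is set. The function names `sets_good_pass` and `bouncer` are hypothetical, and the sample user-agent strings are illustrative.

```python
import re

# Patterns from the whitelist above. BrowserMatchNoCase is approximated as a
# case-insensitive regex search; "^" anchors to the start of the User-Agent.
GOOD_PASS_PATTERNS = [
    r"Googlebot", r"Mediapartners-Google",         # Google
    r"Slurp", r"Yahoo-MMCrawler",                  # Yahoo
    r"^msnbot", r"SandCrawler",                    # MSN
    r"Teoma", r"Jeeves",                           # Ask
    r"^Mozilla", r"^Opera", r"^MSIE", r"^Safari",  # browsers
]

def sets_good_pass(user_agent: str) -> bool:
    """Return True if any whitelist rule would set the good_pass variable."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in GOOD_PASS_PATTERNS)

def bouncer(user_agent: str) -> int:
    """Mimic 'Deny from all' plus 'Allow from env=good_pass': 200 if allowed, else 403."""
    return 200 if sets_good_pass(user_agent) else 403

samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",  # 200
    "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",                           # 200
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",                       # 200
    "Lynx/2.8.5rel.1 libwww-FM/2.14",                                           # 403
    "",                                                                         # 403 (blank agent)
]
for ua in samples:
    print(bouncer(ua), repr(ua))
```

Note that Internet Explorer identifies itself as "Mozilla/4.0 (compatible; MSIE ...)", so it is admitted by the `^Mozilla` rule; the `^MSIE` pattern on its own would rarely match anything.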

This example is meant as a starting point, to give you an idea of how to create and implement your own traffic whitelist. For more information, check out my tutorial on whitelisting good bots.

Jeff Starr
About the Author
Jeff Starr = Web Developer. Book Author. Secretly Important.

2 responses to “Invite Only: Traffic Control via Whitelist”

  1. Seems just like what I was looking for, but isn’t this blackhat SEO?

  2. Jeff Starr
    Perishable 2007/04/03 9:33 am

    Never! ;)
