As discussed in my recent article, Eight Ways to Blacklist with Apache’s mod_rewrite, one method of stopping spammers, scrapers, email harvesters, and malicious bots is to blacklist their associated user agents. Apache enables us to target bad user agents by testing the user-agent string against a predefined blacklist of unwanted visitors. Any bot identifying itself as one of the blacklisted agents is immediately and quietly denied access. While this certainly isn’t the most effective method of securing your site against malicious behavior, it may certainly provide another layer of protection.
Even so, there are several things to consider before choosing to implement an extensive user-agent blacklist on your site. First and most importantly is the transient nature of the user agent itself. On most systems, the user-agent variable is easy to change, making it possible for bot owners to use any user-agent name they wish. Once a bad bot makes the rounds, becomes known, and is blacklisted, the bot owner need only modify or change its declared user agent and they’re back in business. User-agent names are constantly invented, spoofed, or otherwise altered in order to operate beneath — or above — the virtual radar. Thus, a user-agent blacklist is a high-maintenance affair, requiring continuous cultivation in order to maintain relevancy and effectiveness.
Continue Reading
Check out this sweet composition of aural styles discovered in the stylesheet for the W3C’s website:
/* AURAL STYLES (via W3C) */
@media aural {
h1, h2, h3,
h4, h5, h6 { voice-family: paul, male; stress: 20; richness: 90 }
h1 { pitch: x-low; pitch-range: 90 }
h2 { pitch: x-low; pitch-range: 80 }
h3 { pitch: low; pitch-range: 70 }
h4 { pitch: medium; pitch-range: 60 }
h5 { pitch: medium; pitch-range: 50 }
h6 { pitch: medium; pitch-range: 40 }
li, dt, dd { pitch: medium; richness: 60 }
dt { stress: 80 }
pre, code, tt { pitch: medium; pitch-range: 0; stress: 0; richness: 80 }
em { pitch: medium; pitch-range: 60; stress: 60; richness: 50 }
strong { pitch: medium; pitch-range: 60; stress: 90; richness: 90 }
dfn { pitch: high; pitch-range: 60; stress: 60 }
s, strike { richness: 0 }
i { pitch: medium; pitch-range: 60; stress: 60; richness: 50 }
b { pitch: medium; pitch-range: 60; stress: 90; richness: 90 }
u { richness: 0 }
a:link { voice-family: harry, male }
a:visited { voice-family: betty, female }
a:active { voice-family: betty, female; pitch-range: 80; pitch: x-high }
}
Not bad. Listening to this cascading orchestra, I would imagine the sound of a relaxed-yet-formal standards-compliance gentleman carefully articulating the contents of the page.
At last! After many months of collecting data, crafting directives, and testing results, I am thrilled to announce the release of the 4G Blacklist! The 4G Blacklist is a next-generation protective firewall that secures your website against a wide range of malicious activity. Like its 3G predecessor, the 4G Blacklist is designed for use on Apache servers and is easily implemented via HTAccess or the httpd.conf configuration file. In order to function properly, the 4G Blacklist requires two specific Apache modules, mod_rewrite and mod_alias. As with the third generation of the blacklist, the 4G Blacklist consists of multiple parts:
Update Feb 22, 2011: The 5G version of the blacklist is available now in beta.
Continue Reading
I really hate bad robots. When a web crawler, spider, bot — or whatever you want to call it — behaves in a way that is contrary to expected and/or accepted protocols, we say that the bot is acting suspiciously, behaving badly, or just acting stupid in general. Unfortunately, there are thousands — if not hundreds of thousands — of nefarious bots violating our websites every minute of the day.
For the most part, there are effective methods available enabling us to protect our sites against the endless hordes of irrelevant and mischievous bots. Such evil is easily blocked with virtually zero side-effects because their presence is simply irrelevant.
But what about bad bots that aren’t exactly irrelevant, such as Yahoo’s mindless Slurp crawler? By disobeying the robots.txt protocol as promised, Yahoo’s Slurp clearly falls into the “bad-bot” category. Unlike typical “nonsense” bots, Slurp is not exactly irrelevant (yet), so simply blocking them is not a reasonable solution.
Continue Reading
Last year, after much research and discussion, I built a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Unfortunately, these mega-lists eventually became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works by targeting common elements of decentralized server attacks. Consisting of a mere 37 lines, this “2G” Blacklist provided enough protection to enable me to completely eliminate over 350 blacklisting directives from my site’s root htaccess file. This improvement increased site performance and decreased attack rates, however many bad hits were still getting through. More work was needed..
Continue Reading