6G Firewall Beta

♦ Posted by Jeff Starr in .htaccess, Security

Updated August 9, 2022 • 185 comments

Since releasing the 5G Blacklist earlier this year, malicious server scans and bad requests have surged with more novel attacks than I’ve seen since first getting into this stuff six years ago. In other words, now is the time to beef up security and lock things down. If you’re into monitoring your server and knowing your traffic, you may be observing the same recent spike in malicious activity. In response to these attacks, I’ve been secretly working on the next generation of G-series blacklist, the inevitable 6G Firewall.

Update: Check out the new and improved 6G Firewall »

Featured in this jam-packed post:

The 6G Firewall – beta version
Development strategy (building the 6G)
Additional resources (article series)
Credits and Thanks
Important notes.. (read first!)

Before getting started, take a moment to read through the important notes, which contain information about using blacklists, server requirements, licensing, and other details. Then after presenting the 6G beta, we’ll jog through some of the thinking and strategy going into the code. Even without trying the blacklist, reading through “building the 6G Blacklist” should prove a beneficial exercise in pattern-matching and protecting against malicious HTTP behavior.

6G Blacklist beta

The 6G consists of the following sections:

# 6G:[REQUEST STRINGS]
# 6G:[QUERY STRINGS]
# 6G:[USER AGENTS]
# 6G:[REFERRERS]
# 6G:[BAD IPS]

Each of these sections works independently of the others, such that you could, say, omit the entire query-string and IP-address blocks and the remaining sections would continue to work just fine. Mix-n-match to suit your needs. This code is formatted for deployment in your site’s root .htaccess file.

# 6G BLACKLIST/FIREWALL (beta)
# @ https://perishablepress.com/6g-beta/

# 6G:[REQUEST STRINGS]
<ifModule mod_alias.c>
 RedirectMatch 403 /(\$|\*)/?$
 RedirectMatch 403 (?i)(<|>|:|;|\'|\s)
 RedirectMatch 403 (?i)([a-zA-Z0-9]{18})
 RedirectMatch 403 (?i)(https?|ftp|php)\:/
 RedirectMatch 403 (?i)(\"|\.|\_|\&|\&amp)$
 RedirectMatch 403 (?i)(\=\\\'|\=\\%27|/\\\'/?)\.
 RedirectMatch 403 (?i)/(author\-panel|submit\-articles)/?$
 RedirectMatch 403 (?i)/(([0-9]{5})|([0-9]{6}))\-([0-9]{10})\.(gif|jpg|png)
 RedirectMatch 403 (?i)(\,|//|\)\+|/\,/|\{0\}|\(/\(|\.\.|\+\+\+|\||\\\"\\\")
 RedirectMatch 403 (?i)/uploads/([0-9]+)/([0-9]+)/(cache|cached|wp-opt|wp-supercache)\.php
 RedirectMatch 403 (?i)\.(asp|bash|cfg|cgi|dll|exe|git|hg|ini|jsp|log|mdb|out|sql|svn|swp|tar|rar|rdf|well)
 RedirectMatch 403 (?i)/(^$|1|addlink|btn_hover|contact?|dkscsearch|dompdf|easyboard|ezooms|formvars|fotter|fpw|i|imagemanager|index1|install|iprober|legacy\-comments|join|js\-scraper|mapcms|mobiquo|phpinfo|phpspy|pingserver|playing|postgres|product|register|scraper|shell|signup|single\-default|t|sqlpatch|test|textboxes.css|thumb|timthumb|topper|tz|ucp_profile|visit|webring.docs|webshell|wp\-lenks|wp\-links|wp\-plugin|wp\-signup|wpcima|zboard|zzr)\.php
 RedirectMatch 403 (?i)/(\=|\$\&|\_mm|administrator|auth|bytest|cachedyou|cgi\-|cvs|config\.|crossdomain\.xml|dbscripts|e107|etc/passwd|function\.array\-rand|function\.parse\-url|livecalendar|localhost|makefile|muieblackcat|release\-notes|rnd|sitecore|tapatalk|wwwroot)
 RedirectMatch 403 (?i)(\$\(this\)\.attr|\&pws\=0|\&t\=|\&title\=|\%7BshopURL\%7Dimages|\_vti\_|\(null\)|$itemURL|ask/data/ask|com\_crop|document\)\.ready\(fu|echo.*kae|eval\(|fckeditor\.htm|function.parse|function\(\)|gifamp|hilton.ch|index.php\&amp\;quot|jfbswww|monstermmorpg|msnbot\.htm|netdefender/hui|phpMyAdmin/config|proc/self|skin/zero_vote|/spaw2?|text/javascript|this.options)
</ifModule>

# 6G:[QUERY STRINGS]
<IfModule mod_rewrite.c>
 RewriteCond %{REQUEST_URI} !^/$ [NC]
 RewriteCond %{QUERY_STRING} (mod|path|tag)= [NC,OR]
 RewriteCond %{QUERY_STRING} ([a-zA-Z0-9]{32}) [NC,OR]
 RewriteCond %{QUERY_STRING} (localhost|loopback|127\.0\.0\.1) [NC,OR]
 RewriteCond %{QUERY_STRING} (\?|\.\./|\.|\*|:|;|<|>|'|"|\)|\[|\]|=\\\'$|%0A|%0D|%22|%27|%3C|%3E|%00|%2e%2e) [NC,OR]
 RewriteCond %{QUERY_STRING} (benchmark|boot.ini|cast|declare|drop|echo.*kae|environ|etc/passwd|execute|input_file|insert|md5|mosconfig|scanner|select|set|union|update) [NC]
 RewriteRule .* - [F,L]
</IfModule>

# 6G:[USER AGENTS]
<ifModule mod_setenvif.c>
 #SetEnvIfNoCase User-Agent ^$ keep_out
 SetEnvIfNoCase User-Agent (<|>|'|<|%0A|%0D|%27|%3C|%3E|%00|href\s) keep_out
 SetEnvIfNoCase User-Agent (archiver|binlar|casper|checkprivacy|clshttp|cmsworldmap|comodo|curl|diavol|dotbot|email|extract|feedfinder|flicky|grab|harvest|httrack|ia_archiver|kmccrew|libwww|loader|miner|nikto|nutch|planetwork|purebot|pycurl|python|scan|skygrid|sucker|turnit|vikspider|wget|winhttp|youda|zmeu|zune) keep_out
 <limit GET POST PUT>
  Order Allow,Deny
  Allow from all
  Deny from env=keep_out
 </limit>
</ifModule>

# 6G:[REFERRERS]
<IfModule mod_rewrite.c>
 RewriteCond %{HTTP_REFERER} (<|>|'|%0A|%0D|%27|%3C|%3E|%00) [NC,OR]
 RewriteCond %{HTTP_REFERER} ([a-zA-Z0-9]{32}) [NC]
 RewriteRule .* - [F,L]
</IfModule>

# 6G:[BAD IPS]
<Limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 # uncomment/edit/repeat next line to block IPs
 # Deny from 123.456.789
</Limit>

Whoop, there it is, but only for testing at this point. So let me know in the comments or via email with any discoveries on 6G beta. I’ll give it at least a month or so before rolling out the official release of the 6G. This beta version is admittedly heavy-handed in some areas, so plenty of edits are expected in the process of fine-tuning and dialing it in. Your help in this process is HUGE and appreciated by myself and other 6G users.

Alright, that’s that. New beta version, but how does it work? Let’s continue with some of the thinking and strategy going into the 6G Firewall..

Behind the scenes / development strategy

Filtering URL requests with Apache involves various modules and directives:

# 6G:[REQUEST STRINGS] -> mod_alias (RedirectMatch)
# 6G:[QUERY STRINGS] -> mod_rewrite (RewriteCond/RewriteRule)
# 6G:[USER AGENTS] -> mod_setenvif (SetEnvIfNoCase User-Agent)
# 6G:[REFERRERS] -> mod_rewrite (RewriteCond/RewriteRule)
# 6G:[BAD IPS] -> core functionality via Limit (Order Allow,Deny)

These modules enable us to filter different parts of the request, such as the user-agent, referrer, and request-string. They operate both autonomously and cumulatively, providing much control over specific HTTP activity and server traffic in general. Apache gives us numerous ways to blacklist bad requests and block bad user agents, requests & queries to prevent hacking. To better understand how the 6G Firewall works, let’s “zoom-in” on the different modules & directives and examine some concrete examples..

Front Line: Request strings

Apache’s mod_alias module enables our frontline of defense via the RedirectMatch directive. RM is used to filter the actual base part of the URL that is requested on the server. Here are some examples of the types of nasty URL requests that are easily blocked via mod_alias/RM:

http://example.com/wp-content/themes/mimboedited/timthumb.php
http://example.com/themes/SimplePress/timthumb.php?src=http%3a%2f
http://example.com/plugins/auto-attachments/timthumb.php?src=http%3A%2F%2Fpicasa.com.ipsupply.com.au%2Fwp-content%2Fuploads%2F2012%2F03%2FIN.php
http://example.com/timthumb.php?src=http%3a%2f
http://example.com/timthumb.php?src=http%3A%2F%2Fflickr.com.bpmohio.com%2Fbad.php
http://example.com/timthumb/timthumb.php?src=http%3A%2F%2Fflickr.com.bpmohio.com%2Fbad.php
http://example.com/timthumb.php?src=http%3A%2F
http://example.com/themes/coda/timtumb.php?src=
http://example.com/timthumb.php?src=http%3A%2F%2Fpicasa.com.ipsupply.com.au%2Fwp-content%2Fuploads%2F2012%2F03%2FIN.php
http://example.com/timthumb.phptimthumb.php?src=
http://example.com/timthumb.phptimthumb.php?src=

http://example.com/wp-content/themes/chapters/thumb.php?src=http%3a%2f%2fpicasa.combos.orgasmguide.org/tmp.php
http://example.com/wp-content/themes/chapters/thumb.php?src=http%3a%2f%2fpicasa.combos.orgasmguide.org/byroe.php

This is a great example as it shows varieties of possibly the most-scanned-for target ever: timthumb.php and its numerous incarnations. Malicious scanners also frequently target files named thumb.php and similar. Recursive scans can mean hundreds or thousands of requests hitting your server in short periods of time. This drains resources and negatively impacts site performance. As if that’s not reason enough to block such activity, if the target vulnerability is actually found on your server, it’s “game over”. So the 6G protects by blocking requests for both thumb.php and timthumb.php, using logic similar to this:

RedirectMatch 403 (?i)/(thumb|timthumb)\.php

That one line in your .htaccess file will block all URL requests that include either thumb.php or timthumb.php (not including the query string). This helps keep many malicious requests at bay, freeing up valuable resources for legit requests. Note that if you are using timthumb or a similar “thumb” script for your site, you will need to remove the thumb|timthumb| string from 6G (REQUEST STRINGS section).
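
If you do rely on a thumbnail script, one alternative to dropping the pattern entirely is to allow only the copy you actually use and keep blocking the rest. Here is a minimal sketch of that idea using mod_rewrite. The theme path is hypothetical (it is not part of the 6G), so swap in the real location of your script. You would still need to remove thumb|timthumb| from the REQUEST STRINGS section, because that RedirectMatch rule has no way to make an exception.

```apache
# Sketch only: allow one known-good timthumb.php, block every other copy.
# The path in the first condition is hypothetical; replace it with your own.
<IfModule mod_rewrite.c>
 RewriteEngine On
 # Skip the rule for the single script we trust
 RewriteCond %{REQUEST_URI} !^/wp-content/themes/my-theme/timthumb\.php$ [NC]
 # Match any other request for thumb.php or timthumb.php
 RewriteCond %{REQUEST_URI} /(thumb|timthumb)\.php [NC]
 RewriteRule .* - [F,L]
</IfModule>
```

The two conditions are ANDed together, so a request is forbidden only when it targets a thumb script and is not the whitelisted one.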

The first “REQUEST-STRINGS” section in the 6G uses this strategy to block many different types of malicious requests. With each generation of the G-series blacklist, the various rules and patterns are further refined and updated to block the most dangerous and relevant types of requests. Pattern-matching with regular expressions enables us to block many different types of threats; however, as precise as we can get, there remain commonly scanned-for targets that are simply too common or too general to block effectively. Consider the following examples:

http://example.com/[path]/share
http://example.com/[path]/login
http://example.com/[path]/signin
http://example.com/[path]/accepted
http://example.com/[path]/feed.php
http://example.com/[path]/form.php
http://example.com/[path]/format.php
http://example.com/[path]/plugin-editor.php
http://example.com/[path]/post.php
http://example.com/[path]/post-new.php
http://example.com/[path]/wp-comments-post.php
http://example.com/[path]/wp-conf.php
http://example.com/[path]/wp-error.php
http://example.com/[path]/wp-library.php
http://example.com/[path]/wp-post.php
http://example.com/[path]/update.php
http://example.com/[path]/upload.php

In these example URLs, the target string is the part appearing immediately after the “http://example.com/[path]/” prefix. The prefix is included in this post because it prevents sloppy search engines and bad bots from following these supposedly “relative” links and generating further 404 errors. But I digress.. the point here is that malicious scans frequently target existing files that are too common to block in a widely distributed firewall such as 6G. If you’re getting hit with many requests for common/well-known files, my best advice is to custom-craft a few rules based on the actual structure and content of your site.

As a quick example, let’s say the server is getting hammered by malicious requests targeting a file named post.php. This file name is common enough to warrant not blacklisting in the 6G, even though it is trivial to block on an individual basis. Here at Perishable Press, I’m running WordPress in a subdirectory named “/wp/”, so I know immediately that I can safely block all requests for post.php that aren’t located in the /wp/ directory.

RewriteCond %{REQUEST_URI} !^/wp/wp-admin/post\.php [NC]
RewriteCond %{REQUEST_URI} /post\.php [NC]
RewriteRule .* - [F,L]

Similarly, because my copy of post.php lives in the /wp/ subdirectory and not in the site root, we can use mod_alias’ RedirectMatch to block all requests that assume a root install of WordPress:

RedirectMatch 403 ^/wp-admin/post\.php

With either of these methods, other common files are easily added to the rule, safely eliminating extraneous requests for non-existent files. This example serves to demonstrate one of the shortcomings of any copy/paste blacklist, while illustrating the importance of customizing and fine-tuning your own security strategy.
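
To show what “adding other common files” might look like, here is a rough sketch that extends the mod_rewrite version to several WordPress admin files at once. It assumes the same setup described above (WordPress installed in /wp/); the particular list of file names is only an illustration, so build yours from what actually shows up in your logs.

```apache
# Sketch only: block requests for common admin files outside of /wp/wp-admin/.
# Assumes WordPress lives in /wp/; the file list is an example, not part of 6G.
<IfModule mod_rewrite.c>
 RewriteEngine On
 # Leave the real admin directory alone
 RewriteCond %{REQUEST_URI} !^/wp/wp-admin/ [NC]
 # Any of these file names anywhere else is a scan for a non-existent file
 RewriteCond %{REQUEST_URI} /(post|post-new|plugin-editor)\.php [NC]
 RewriteRule .* - [F,L]
</IfModule>
```

Because the pattern requires a slash directly before the file name, a legit file such as wp-comments-post.php is not caught by the “post” alternative.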

Filtering Query strings

Some URLs include a query-string, which is appended to the URL via question mark (?). Query strings tend to look like gibberish or random strings to the uninitiated, but are actually highly specific, well-structured data used to communicate between browser and server. Without knowing what’s happening on your server, it may be difficult to discern between good and bad query-string requests, but there are some things to look for:

Unusual and/or unexpected characters such as additional question marks, angle brackets, asterisks, and so on
Unencoded characters that should be encoded, such as these: $ & + , / : ; = ? @
Super-long random-looking strings of encoded gibberish, alphanumeric or laced with symbols such as %
Super-short query strings that may seem to terminate abruptly, often with a single quote ('), double quote ("), or equal sign (=)

There are other signs as well, but ultimately it comes down to whether the request is understood or not by the server. If it’s not, the request could be a simple 404 error or similar, or it could be malicious. Generally the one-off 404s are the result of typos or other human errors, and tend to appear sporadically or infrequently in the server-access logs. Contrast this with malicious query-string requests that occur frequently, in rapid succession, targeting non-existent files with encoded gibberish and other nonsense.

With the 5G Blacklist in place, many evil query-string requests are blocked, but with the recent surge of scanning activity, a new breed of encoded nasty was getting through, looking similar to these examples:

?aHR0cDovL3BlcmlzaGFibGVwcmVzcy5jb20vY3NzLWltYWdlLWNhY2hpbmcv==
?aHR0cDovL3BlcmlzaGFibGVwcmVzcy5jb20vaHRtbDUtdGFibGUtdGVtcGxhdGUv==
?aHR0cDovL3BlcmlzaGFibGVwcmVzcy5jb20vYmFzaWMtZG9zLWNvbW1hbmRzLw==
?aHR0cDovL3BlcmlzaGFibGVwcmVzcy5jb20vd2hhdC1pcy1teS13b3JkcHJlc3MtZmVlZC11cmwv
?aHR0cDovL3BlcmlzaGFibGVwcmVzcy5jb20vcHJlc3MvMjAwNy8wMS8xNi9tYXhpbXVtLWFuZC1taW5pbXVtLWhlaWdodC1hbmQtd2lkdGgtaW4taW50ZXJuZXQtZXhwbG9yZXIv
?actions=get_wp_version%2Cget_plugins%2Cget_themes%2Csupports_backups%2Cget_filesystem_method&wpr_api_key=15644F32D7D80B3150710834D8F406E9&t=1335026415
?actions=get_wp_version%2Cget_plugins%2Cget_themes%2Csupports_backups%2Cget_filesystem_method&wpr_api_key=15644F32D7D80B3150710834D8F406E9&t=1335026385

As you can see, these malicious strings contain numerous common-denominators that could be matched against, such as:

%2C matching the URL-encoded (hex) comma (,) would be partially effective
== matching two equal signs would be partially effective
Other character combinations..?

We could match the hex-encoded comma, but that’s such a common character that it would cause more problems than it would solve (in most cases), so really not an option. Looking closely at other possible character-combinations, suddenly the “least-common denominator” hits you: long, random sequences of alphanumeric characters appear in all of these examples, and many others that I’ve encountered. Thus, in the query-string section of the 6G, excessively long strings of alphanumeric characters are effectively blocked with the following rule:

RewriteCond %{QUERY_STRING} ([a-zA-Z0-9]{32}) [NC,OR]

Yeah.. the trick here is choosing the optimal number of sequential characters to match against. If we set the match to, say, {16}, the number of false positives increases; conversely, if we set the match to a larger number, such as {64}, the number of false negatives increases. So once again it’s all about finding the balance.
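
One way to keep the {32} threshold while avoiding a known false positive is to exempt the specific script that legitimately receives a long token. Below is a minimal standalone sketch of that approach. The script path /tools/verify.php is hypothetical and stands in for whatever legit endpoint on your site needs the exception. Note also that the wpr_api_key value in the examples above is exactly 32 characters long, so raising the number above {32} would let those particular requests through.

```apache
# Sketch only: block long alphanumeric runs in the query string,
# except for one (hypothetical) script that legitimately uses them.
<IfModule mod_rewrite.c>
 RewriteEngine On
 # Hypothetical endpoint that receives a 32-character token
 RewriteCond %{REQUEST_URI} !^/tools/verify\.php$ [NC]
 # 32 or more consecutive alphanumeric characters anywhere in the query
 RewriteCond %{QUERY_STRING} [a-zA-Z0-9]{32} [NC]
 RewriteRule .* - [F,L]
</IfModule>
```

If you use this as a separate block, remove the matching {32} line from the main QUERY STRINGS section so the exemption is not overridden there.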

An important note about placement of the 6G query-string rules within the .htaccess file: if the query-string rules don’t seem to be working, try moving them to appear before any other mod_rewrite rules that may be in play. I’m not sure why this is the case, but I think it has something to do with the query-string data being unavailable for processing after the first encounter with mod_rewrite. Any info on this would be appreciated :)
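
One possible explanation, offered as an assumption and not a confirmed diagnosis: mod_rewrite processes rules in the order they appear, and a rule flagged with [L] ends processing for that pass. If a catch-all rule such as the standard WordPress permalink block comes first, it matches the request and stops, so blacklist rules placed after it never get evaluated for those URLs. The sketch below shows the ordering that avoids this, using a single 6G query-string condition as a stand-in for the full section.

```apache
# Sketch only: blacklist rules first, catch-all rewrites afterwards.

# 1) 6G query-string rules (one condition shown as a stand-in)
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{QUERY_STRING} (localhost|loopback|127\.0\.0\.1) [NC]
 RewriteRule .* - [F,L]
</IfModule>

# 2) Standard WordPress permalink rules (root install assumed)
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteBase /
 RewriteRule ^index\.php$ - [L]
 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteCond %{REQUEST_FILENAME} !-d
 RewriteRule . /index.php [L]
</IfModule>
```

With the blocks reversed, a permalink request would be rewritten to /index.php and then stopped by the `^index\.php$` rule before the blacklist conditions are ever checked.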

Blocking Bad User-agents

The next two sections in the 6G protect your site against some of the worst user-agents and referrers. The technique is essentially the same as with the request-string and query-string sections, but filters different properties of the URI request.

The specified user-agent of a request may consist of multiple elements, and it may be empty. Previous versions of the g-series blacklist block empty (or “blank”) user-agents with the following rule:

SetEnvIfNoCase User-Agent ^$ keep_out

This rule “flags” any request from a blank user-agent, and worked well for many years. These days, however, social-media, mobile apps, PayPal, and certain Ajax requests frequently use an empty string as the user-agent when interacting with the server. For example, Google requires the blank user-agent in order to display thumbnails for Google+. So at this point, weighing the pros and cons of blocking empty user-agents is a no-brainer, and the rule is now “deprecated” (commented-out) with a pound-sign (#).

Beyond this, the 6G USER-AGENTS section includes new rules to block malicious character-strings operating via the user-agent string. The 5G blocks some of the “worst of the worst” known bad user-agents, stuff like:

binlar
nutch
sucker
zmeu

Plus around 20 other nasty agents are blocked in the 5G, with the entire “USER-AGENT” section included as sort of a template for individual customization. Unfortunately, there are increasing numbers of malicious strings being passed as the user-agent, so the 6G includes more protection in this area. The 6G not only blocks additional well-known bad agents, it protects against encoded strings, forbidden characters, and other malicious garbage. Most of this is accomplished with a single new directive:

SetEnvIfNoCase User-Agent (<|>|'|<|%0A|%0D|%27|%3C|%3E|%00|href\s) keep_out

These character strings have no business appearing in the user-agent string. Most if not all of the widely used browsers such as Firefox, Chrome, Opera, IE, mobile browsers, feed readers, and even borderline/questionable scripts and bots refrain from using any of these forbidden characters in their user-agent description. For example, here is Chrome’s reported user-agent:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5

Legitimate user-agents contain only valid strings, so blocking illegal characters is an effective way to filter directory-traversals, XSS attacks, and other malicious exploits.
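
The USER-AGENTS section as written uses the Order/Allow/Deny directives. On Apache 2.4 those are deprecated and handled by the mod_access_compat compatibility module, so if that module is not loaded on your server, the same keep_out flag can be checked with the newer Require syntax. The following is a sketch of that equivalent, not part of the 6G beta itself; the shortened agent list is only there to keep the example brief.

```apache
# Sketch only: Apache 2.4 style check of the keep_out flag.
<IfModule mod_setenvif.c>
 # Flag requests whose user-agent contains forbidden characters
 SetEnvIfNoCase User-Agent (<|>|'|%0A|%0D|%27|%3C|%3E|%00|href\s) keep_out
 # Flag a few known bad agents (shortened list for illustration)
 SetEnvIfNoCase User-Agent (binlar|nutch|sucker|zmeu) keep_out
</IfModule>
<IfModule mod_authz_core.c>
 <RequireAll>
  # Everyone is allowed...
  Require all granted
  # ...except requests that were flagged above
  Require not env keep_out
 </RequireAll>
</IfModule>
```

A negated Require cannot grant access on its own, which is why it sits inside RequireAll alongside “Require all granted”.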

Blocking Bad Referrers

The 6G Firewall/Blacklist also includes new directives for blocking bad referrers. The strategy here is similar to that of the additional QUERY-STRING rules: filtering malicious character-strings to protect against bad referrers. Referrer information isn’t always included with the request, so we don’t want to block blank referrers, but forbidden characters are safely blocked, as are long strings (32 characters or more) of strictly alphanumeric characters. A simple and effective strategy using the following two filters:

RewriteCond %{HTTP_REFERER} (<|>|'|%0A|%0D|%27|%3C|%3E|%00) [NC,OR]
RewriteCond %{HTTP_REFERER} ([a-zA-Z0-9]{32}) [NC]

This also serves as a template for further customization. If you’re seeing lots of weird referrers filling your access/error logs, the REFERRERS section of the 6G will help to curb the riff-raff.
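
As a rough illustration of that kind of customization, the sketch below adds one extra condition for specific referrer domains on top of the two 6G filters. The domain names are made-up placeholders (reserved .test names), so replace them with whatever actually appears in your logs.

```apache
# Sketch only: 6G referrer filters plus a custom list of referrer domains.
# The .test domains are placeholders, not real offenders.
<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{HTTP_REFERER} (<|>|'|%0A|%0D|%27|%3C|%3E|%00) [NC,OR]
 RewriteCond %{HTTP_REFERER} ([a-zA-Z0-9]{32}) [NC,OR]
 # Custom addition: block by referrer host name
 RewriteCond %{HTTP_REFERER} (spam-example\.test|junk-example\.test) [NC]
 RewriteRule .* - [F,L]
</IfModule>
```

Every condition except the last carries the OR flag, so a match on any one of them is enough to trigger the 403.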

Blocking Bad IPs

Blocking by IP address is best used for specific threats, either individual, or by region, country, or similar. With a strong firewall, blocking IPs is unnecessary unless someone or something is attacking you specifically with requests that aren’t being blocked. I’ve heard from a number of people saying that their sites are being targeted/harassed by weird stalkers, enemies, spurned lovers, and it goes on and on. I’ve experienced this through a chat/forum site that had attracted all sorts of low-life, bottom-feeding douche-bags. They would just jump into the chat at random and ruin the conversation with potty humor and juvenile slurs. The PHP blacklist for the chat script wasn’t catching a lot of the garbage, so it was a perfect time to check the logs and ban the fools individually. After a bit of research and a few lines of .htaccess, the idiots were gone and peace was restored to the chat forum.

Thus, for the purpose of blocking individual threats the “bad-IPs” section of the 6G is entirely optional and intended as a template to use should the need arise. By default, the bad-IPs section in the 6G is empty, but over the past few months I’ve assembled my own private collection of blacklisted IP addresses. These are some of the worst offenders I’ve seen this year:

# 2012 IP Blacklist

 Deny from 24.213.139.114
 Deny from 87.144.218.222
 Deny from 95.5.32.79
 Deny from 213.251.186.27
 Deny from 88.191.93.186
 Deny from 91.121.136.44
 Deny from 50.56.92.47
 Deny from 174.143.148.105
 Deny from 82.170.168.91
 Deny from 61.147.110.14
 Deny from 188.134.42.65
 Deny from 122.164.215.155
 Deny from 65.49.68.173
 Deny from 220.155.1.166
 Deny from 218.38.16.26
 Deny from 91.200.19.84
 Deny from 31.44.199.131
 Deny from 49.50.8.63

Including these IPs is entirely optional; they are provided here mostly for reference, but also for the über-paranoid faction ;)
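
For anyone filling in the BAD IPS template, here is a small sketch of the forms that the Deny directive accepts: a single address, a network in CIDR notation, and a partial address. The addresses below come from ranges reserved for documentation, so they are placeholders and not real offenders.

```apache
# Sketch only: three ways to write a Deny rule in the BAD IPS section.
# All addresses are reserved documentation ranges used as placeholders.
<Limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 # A single address
 Deny from 203.0.113.45
 # A whole network in CIDR notation
 Deny from 198.51.100.0/24
 # A partial address: matches everything beginning with 192.0.2.
 Deny from 192.0.2
</Limit>
```

Blocking by range is a blunt tool, so it’s worth checking the logs first to confirm that the whole range is actually misbehaving.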

Thanks to..

Thank you to everyone who contributes to the g-series blacklist with feedback, suggestions, test-results, and links. Specifically for the 6G beta, huge thanks go to Ken Dawes and Andy Wrigley for their generous help.

Important notes..

This is the beta release of the 6G Blacklist. There have been many improvements, including optimized code, greater accuracy, and better overall protection. I’ve been running the 6G (in its various incarnations) here at Perishable Press for the past several weeks and have been well-pleased with the results. The 6G is pretty slick stuff, but there are some important things to keep in mind:

It takes more than a blacklist to secure your site: No one single security measure is perfect; good security is the result of many concerted strategic layers of protection. The 6G is designed to better secure your site by adding a strong layer of protection.
Sometimes blacklists block legit requests: A perfect firewall would block only bad traffic, but in reality it’s inevitable that some good requests get blocked. The goal is to keep the number of false positives to a minimum while maximizing the effectiveness of the ruleset. It’s a statistical game of sorts.
Resolving issues..: If/when you do encounter a potential false positive (e.g., you can’t load a certain page), there is a simple way to determine if it’s that crazy chunk of blacklist code you stuck into your .htaccess file. If you remove the blacklist and the page in question begins to work, well, you can either forget about it or take a few moments to locate the offending rule and remove it from the list. I’ve found that it’s better to “comment out” rather than delete as it’s easier to keep track of things when the inevitable next version of the blacklist hits the streets.
This is beta.: And most importantly, this is the beta version of the 6G. As mentioned, there’s a lot of new stuff happening with this blacklist, and it’s super-important for me to thoroughly test via the widest base possible. Only use this code if you are savvy and want to help out by reporting data, errors, logs, or whatever. That said, this “beta” version has been running flawlessly on multiple sites, including one that’s super-complex with many themes, plugins, and customizations (i.e., this site).
It’s all you.: Once the code leaves this site, you assume all responsibility. Always back up your original working .htaccess file and you should be good to go.
Server requirements: Linux/Apache or similar (if adapted). 6G is formatted for deployment in .htaccess files, and also works when formatted for use directly in the Apache main configuration file. For the required Apache modules, see this list.
License: GNU General Public License.
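
To illustrate the “comment out” approach from the note on resolving issues, here is a small sketch. The rule chosen is just an example of one that might need disabling on a site with long alphanumeric file names; which rule (if any) causes trouble on your site is something only your own testing can tell you.

```apache
# Sketch only: disable one suspect rule with a pound sign, keep the rest.
<IfModule mod_alias.c>
 RedirectMatch 403 /(\$|\*)/?$
 # Disabled while testing: suspected of blocking a legit long file name
 # RedirectMatch 403 (?i)([a-zA-Z0-9]{18})
 RedirectMatch 403 (?i)(https?|ftp|php)\:/
</IfModule>
```

Leaving a short comment next to the disabled line makes it easier to remember why it was turned off when the next version arrives.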

I freely share my work on the g-series blacklist to help the community better protect their sites against malicious activity. If you find it useful, please show support by linking and sharing so others may learn and benefit as well. Thanks.

185 responses to “6G Firewall Beta”

Gizmoscoop 2012/10/17 9:53 am

I have Link Checker plugin installed, and it’s showing a lot of the 403 Forbidden errors in my links. I went back to 5G list, did a recheck of Link Checker and everything went back to normal. This is also the reason why Googlebot can’t crawl my website. I got the same 403 errors in Google Toolmaster as well. I suspect something in the [QUERY STRINGS] module was causing this, just haven’t figured out the culprits yet.
- Jeff Starr 2012/10/17 11:48 am • Post Author
  
  I will look into this if you can provide some URLs that were leading to 403 errors. The last I checked everything was working great, but something may have changed..
  - Gizmoscoop 2012/10/17 9:36 pm
    
    Here are a few, even when I installed the 5G list. Something is definitely blocking the Googlebot from crawling. I just don’t get it, I was able to view the post normally elsewhere, but Googlebot got a 403 access denial when crawling. This causes my organic search to tank drastically.
    
    1) http://carguideblog.com/12227/2011-chevrolet-camaro-2ls-model-30-mpg-coming/
    
    2) http://carguideblog.com/9755/porsche-plans-unveil-midengine-sports-car-911-models/
  - Gizmoscoop 2012/10/17 9:38 pm
    
    There were about 1,500 urls on my website that got this 403 access denial error.
  - Jeff Starr 2012/10/17 10:23 pm • Post Author
    
    Are you using the 5G or 6G? The 6G is still being tested, but the 5G is known to play nice with the major search engines.
    
    Can you paste a few 403 requests from the server’s access/error log? (plz omit/edit any sensitive data before posting) That would enable a closer look. Based on just the URL examples, there is no reason that 5G/6G would block.
    
    As for Broken Link Checker, I’ll do some testing and see what’s up. Again, the trick is examining the BLC requests via server access logs. All shall be revealed :)
Gizmoscoop 2012/10/17 10:41 pm

At first I used the 6G, then switched to the 5G since last night, but I still got the 403 errors in Google Toolmaster today. So, I am taking them off for the time being to see if the 5G or 6G list is the culprit, and I’ll start eliminating some of the modules from there on. I’ll go check the server access logs. I do hate looking at those logs; they are long small. :(
- Jeff Starr 2012/10/17 10:54 pm • Post Author
  
  Yes, server data would be great, even better if you can narrow it down :)
  
  Also, when referring to the “403 errors in Google Toolmaster”, are you referring to something like a “live” URL checker or the actual crawl errors (as fetched by googlebot)?
  
  Thanks for the help, btw.
  - Gizmoscoop 2012/10/18 8:20 am
    
    The actual crawl errors.
  - Gizmoscoop 2012/10/22 1:30 am
    
    Here are the crawl errors I found. Does this mean that the list is working or there is something wrong? My website is functioning normally.
    
    [Mon Oct 22 03:12:31 2012] [error] [client 74.125.17.209] SoftException in Application.cpp:574: Could not execute script "/home/xxxxx/public_html/www.carguideblog.com/index.php", referer: http://carguideblog.com/
    [Mon Oct 22 03:12:31 2012] [error] [client 74.125.17.209] SoftException in Application.cpp:574: Could not execute script "/home/xxxxx/public_html/www.carguideblog.com/wp-content/plugins/wordpress-popular-posts/timthumb.php", referer: http://carguideblog.com/
    [Mon Oct 22 02:41:04 2012] [error] [client 195.149.120.10] SoftException in Application.cpp:574: Could not execute script "/home/xxxxx/public_html/www.carguideblog.com/wp-content/themes/Cayon/timthumb.php"
    [Mon Oct 22 02:41:03 2012] [error] [client 195.149.120.10] SoftException in Application.cpp:574: Could not execute script "/home/xxxxx/public_html/www.carguideblog.com/index.php"
    
    Editor’s note: removed some redundant log items.
  - Jeff Starr 2012/10/22 11:33 pm • Post Author
    
    Thanks for posting. I’m unable to find anything that the 5G would block in these requests. The 5G denies access based on request strings, query strings, and user agents. These log entries include no query-strings, and the only user-agents that are blocked are known to be malicious (i.e., not googlebot). So that leaves request strings, which for example would be the /home/xxxxx/public_html/www.carguideblog.com/index.php portion of the logged requests. From what I can tell, the 5G doesn’t match anything in this part of the requests, but the xxxxx is unknown (to me) and may indeed be the culprit.
    
    I recommend including only the REQUEST STRINGS section of the 5G and see what happens. If the error persists, remove the entire section and then check again. The problem could be 5G related, or it could be something more complicated. Some reading on that particular error generates several potential causes:
    
    CPU limitations
    
    Improper file permissions
    
    Interfering .htaccess rules
    
    Server misconfiguration
    
    Missing files/resources
    
    ..among others. Either way, I recommend removing the 5G until you’re able to investigate further to isolate and resolve the issue.
Peter Mumford 2012/10/23 5:39 am

One of Gizmo’s errors contains “could not execute script … timthumb.php”.

I would hope not! Doesn’t 5G block all attempts to run timthumb?
- Jeff Starr 2012/10/23 12:02 pm • Post Author
  
  5G does not block timthumb requests, 6G does.. also, it’s not known at this point whether Gizmoscoop’s issues are 5G-related, trying to get there.
Gizmoscoop 2012/10/25 6:45 pm

I have been switching back and forth on 5G and 6G, but these last errors came from 5G. The “xxxxx” is the user/database name, which I am blocking out for security purposes. I am still getting these “Googlebot couldn’t crawl your URL because your server either requires login to access the page, or is blocking Googlebot from accessing your site.” errors, and I have been talking with Hostgator, but they couldn’t solve these errors even after they white-listed timthumb.
- Jeff Starr 2012/10/25 9:51 pm • Post Author
  
  I’m unable to replicate this issue with 5G or 6G, but will keep trying and report back with any news. In the meantime, please disable the 5G/6G unless for testing purposes.
  - Gizmoscoop 2012/10/30 12:24 am
    
    I just want to update on the status. It seems that Hostgator has been blocking some of the bots through their mod_security on the server side. This also causes some false positives for the googlebot and timthumb. So they whitelisted both. I don’t get those errors from googlebot anymore, but other bots still get them when accessing my site, which I don’t really care about. I think the mod_security setup Hostgator runs on the server side is similar to your 6G. I managed to get the link to its mod_security. Here is the link if you are interested in it. http://updates.atomicorp.com/channels/rules/delayed/modsec/
  - Jeff Starr 2012/10/30 1:07 am • Post Author
    
    Well that’s good news. Glad to hear you’ve resolved the issue and that it’s not 5G/6G related. Thanks for the update!
photocurio 2012/10/26 6:00 am

I hope you don’t mind me pointing out something obvious, but when I had a persistent 403 error problem, my helpful hosting company pointed out I had 3 .htaccess files, in different folders. The problem was not in my root .htaccess, but a different one.
- Gizmoscoop 2012/10/26 8:50 am
  
  So, did you delete the other two and keep only one in the root?
  - photocurio 2012/10/26 8:58 am
    
    I can’t remember what I did, but I realized I had to check several .htaccess files, not just one.
Philippe 2012/10/28 2:08 am

Hello Jeff

I just noticed that some instructions in the # 6G:[QUERY STRINGS] are conflicting with the blackhole protection. It drives to a 403 forbidden.
I didn’t find what to change to make the process.php? file accessible.
- Jeff Starr 2012/10/29 12:55 am • Post Author
  
  Hi Philippe,
  
  Which URL(s) result in 403 errors and I’ll check it out..
Philippe 2012/10/29 1:18 am

It happens when trying to access the page “You have fallen into a trap!” via this URL:

process.php?h=ddd93bf16b7bdb97d916f9833a6e3456
- Jeff Starr 2012/10/29 1:03 pm • Post Author
  
  It looks like it’s triggering the block for strings of 32 characters or more, which will be removed (or modified) in the 6G final. For now you can simply comment out (or remove) the following line:
  
  RewriteCond %{QUERY_STRING} ([a-zA-Z0-9]{32}) [NC,OR]
Philippe 2012/10/29 11:50 pm

Done.
I forgot it was still on beta.
Thank you so much Jeff for helping, for the fantastic job you’re doing, and, much more, for your kindness.
- Jeff Starr 2012/10/30 1:10 am • Post Author
  
  You’re most welcome, Philippe! :)
Zach Bui 2012/11/02 7:30 am

Thank you very much Jeff, I just found this site, in trying to read up on securing my websites.

Keep up the good work!
Marcel 2012/11/05 1:41 am

Entry:

RedirectMatch 403 (?i)([a-zA-Z0-9]{18})

Gives problems with google Webmaster tools site verification file: google12xx3456c7de8910.html
Francois 2012/11/13 1:39 pm

Hi

Just a huge thank you for all the valuable info on how to block nasties I have learnt here. The only problem I now have is since I implemented the blocking of proxies as well I have received some complaints from clean visitors getting a 403 when they try and access my site. Hmmm so I had to remove the proxy blocking for now. Not too happy about it but it gets them to visit OK now. My view on proxies has always been if you’re hiding behind a proxy WHY ? Normal decent no bad intention visitors don’t hide behind proxies if they are just kinda surfing and looking at sites. Correct ? The most extreme blocking I implemented is of all, let’s say, undesirable countries using the CIDR list. I have a small cultural site with nice clean content and the crap I get in the bigger picture from these country visitors warrants blocking the whole bloody country fullstop no gaps open. Is that overkill perhaps or paranoid ? Since 2001 I have had every possible thing that can harm the site have a go at it. So extreme attacks warrant extreme measures I reckon ?

Worse one ? And its not even a hijack was scorecardresearch that basically killed my server to a halt and was extremely difficult to remove and block.

Thanks Jeff keep up the good work its much appreciated.

Cheers
Pvman 2012/11/16 2:31 pm

I was looking to increase the security of my website with htaccess rules and stumble upon your hot stuff.

I decided to give it a try at the 6G beta and give you feedbacks. I noticed 2 problems. The line :
– RewriteRule .* - [F,L] : prevent the PayPal banner image to load (loaded from a https url)
– RewriteRule .* - [F,L] : prevent to load an iframe from same domain (haven’t had the time to figure why)
I’ll get back to you when I’ll know a bit more.

Thank you again to share all your precious work !

Cheers
- Jeff Starr 2012/11/19 2:56 am • Post Author
  
  Thanks for the feedback, Pvman. Each of the RewriteRules takes into account each of its respective RewriteConds, many of which block multiple strings. The key to resolving issues is catching the URL of any blocked/non-working pages. If you can provide some, it would be very useful for testing and revising the 6G for the final version.