Protect Against Humans.txt Query-String Scans
I woke up this morning to the sound of thousands of 404 requests hitting the server. It’s sad that there are kiddies out there who have nothing better to do than buy some pathetic $50 script and then sit there like an imbecile harassing people for hours on end. But alas, that is the world we live in; fortunately, blocking the entire scan is trivial with just a few lines of good old .htaccess.
About the scans
Before getting to the code, a bit more about the scans that it protects against. Apparently some desperate lowlife released yet another script for scanning sites for vulnerabilities and exploit opportunities. So suddenly there are waves of scans probing for a vulnerability related to the following query-string parameter:
?whatever=http://www.google.com/humans.txt?
Unless this means something to your server, thousands of scans for it are simply wasting time, bandwidth, energy, money, and everything else. Responding with 404 is fine if you’ve optimized the response, otherwise they’re basically leeches sucking the lifeblood from your business.
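If you do keep serving 404s, one cheap way to keep them light is a plain-text error document rather than a full theme-rendered page. A minimal sketch, assuming you don’t already have a custom 404 page you care about:
# serve a lightweight 404 instead of a full theme-rendered page
ErrorDocument 404 "Not Found"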
In a few months (or less) the script should be much less active, but right now there are too many idiots running it, which means that untold numbers of sites are being assaulted with relentless, malicious URL requests by the thousands, because, apparently, it’s not enough to run the script only once per site: they just keep it on auto-loop and repeat the same set of several hundred queries over and over and..
Here is a sampling of the requests made by the “humans.txt” scans:
http://example.com/admin.php?lang=http://www.google.com/humans.txt
http://example.com/zoomstats/libs/dbmax/mysql.php?GLOBALS[\'lib\'][\'db\'][\'path\']=http://www.google.com/humans.txt?
http://example.com/ytb/cuenta/cuerpo.php?base_archivo=http://www.google.com/humans.txt?
http://example.com/yabbse/Sources/Packages.php?sourcedir=http://www.google.com/humans.txt?
http://example.com/xt_counter.php?server_base_dir=http://www.google.com/humans.txt?
http://example.com/xarg_corner_top.php?xarg=http://www.google.com/humans.txt?
http://example.com/xoopsgallery/init_basic.php?GALLERY_BASEDIR=http://www.google.com/humans.txt?&2093085906=1&995617320=2
http://example.com/xarg_corner_bottom.php?xarg=http://www.google.com/humans.txt?
http://example.com/wsk/wsk.php?wsk=http://www.google.com/humans.txt?
http://example.com/xarg_corner.php?xarg=http://www.google.com/humans.txt?
http://example.com/wp-content/plugins/wp-table/js/wptable-button.phpp?wpPATH=http://www.google.com/humans.txt?
http://example.com/wp-content/plugins/wordtube/wordtube-button.php?wpPATH=http://www.google.com/humans.txt?
http://example.com/wp-content/plugins/mygallery/myfunctions/mygallerybrowser.php?myPath=http://www.google.com/humans.txt?
http://example.com/wp-cache-phase1.php?plugin=http://www.google.com/humans.txt?
http://example.com/worldpay_notify.php?mosConfig_absolute_path=http://www.google.com/humans.txt?
http://example.com/wordpress/wp-content/plugins/sniplets/modules/syntax_highlight.php?libpath=http://www.google.com/humans.txt?
...
This sort of scan is malicious for a number of reasons, but most annoying is the sheer volume of requests now hitting servers looking for a “humans.txt”-related response. Utterly pathetic how many times this scanning script is being executed with nothing more than a single potential payload (i.e., the one made possible via the http://www.google.com/humans.txt? vulnerability). What a waste.
Until the usage of this scan script slows down, you may want to take a moment and check if your server logs show any signs of getting hit — and you’ll know immediately because thousands upon thousands of humans.txt requests are hard to miss.
I could sit here all day and discuss the matter, but my time is limited, and 50-dollar kiddies are simply not worth the effort. Stopping malicious nonsense, however, IS worth my time, so let’s look at a drop-dead simple way to stop the entire “humans.txt” scan cold.
Block the humans.txt scanning with .htaccess
As mentioned, all of the requests for this particular scan are targeting, via query string, “humans.txt”, which makes my job super-easy. Crack open your root .htaccess file and add the following snippet:
# block humans.txt scans
<IfModule mod_rewrite.c>
RewriteCond %{QUERY_STRING} http\:\/\/www\.google\.com\/humans\.txt\? [NC]
RewriteRule .* - [F,L]
</IfModule>
I have this code installed on most of my sites now, and thankfully the pesky scans have stopped completely. Of course, they’ll be back with a new $50 script next month, but until then it’s nice to conserve server resources and keep error logs clear of nonsense.
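Quick note on the flags: [F] answers with 403 Forbidden, which is all most people need. If you would rather respond with 410 Gone (to make it clear the requested resource never existed and never will), a variant like the following should do it. Consider it a sketch and test it on your own setup:
# block humans.txt scans with 410 Gone
<IfModule mod_rewrite.c>
RewriteCond %{QUERY_STRING} http\:\/\/www\.google\.com\/humans\.txt\? [NC]
RewriteRule .* - [G]
</IfModule>
Either way, the bogus request is refused before WordPress (or whatever else you happen to be running) has to deal with it.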
I can hear it already: “wait, why are you blocking requests for the humans.txt file?” — I’m not; this technique blocks requests that include http://www.google.com/humans.txt? in the query string. Consider:
# good request, never blocked.
http://example.com/humans.txt
# bad request, always blocked.
http://example.com/?some_path=http://www.google.com/humans.txt?
Hopefully that clears up any potential confusion regarding this simple yet effective solution.
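Going a step further: these scans are really just remote-file-inclusion probes, so the same trick generalizes to blocking any query string that tries to pass a full URL. Something along these lines could serve as a starting point, but treat it as a rough sketch and test carefully, because some legitimate plugins and services do pass full URLs via the query string:
# block query strings that pass a full URL (RFI probes)
<IfModule mod_rewrite.c>
RewriteCond %{QUERY_STRING} https?\:\/\/[a-z0-9] [NC]
RewriteRule .* - [F,L]
</IfModule>
If memory serves, the query-string rules in the 5G/6G Blacklist take a similar approach, only more thoroughly.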
Happy hunting people.
21 responses to “Protect Against Humans.txt Query-String Scans”
Thanks for this. I’m always concerned about these kinds of scans looking for vulnerabilities.
You said you “woke up this morning to the sound of thousands of 404 requests hitting the server”, but I am interested how you knew about these requests in the first place? Do you routinely look through your server logs, and if so, how do you do it?
The issue is that I have over 60 websites on our server and it would be very time consuming to look through each raw log in detail. Is there an easier way?
Yes actually I continually monitor my sites in real-time while awake, and then spend about 30 minutes each morning going over the details and putting together block lists, etc. There are programs that help in the analysis of logs, but I use my own system. Try a search for “server log analysis program” or similar and you should get some good results.
Thanks. I’d be interested to know more about your system. I’ll look into some log analysis programs. It’s also about knowing what to look for!
I’ve been working on further automating my own system (which currently is too hands-on), and have plans for a plugin/script as well. Right now though it’s all about finding the time for another project :)
Sounds great, I’d be interested in that. However I totally understand the time thing. I’m still lamenting not being able to spend time developing my Twitter app.
Is 5G/6G ever going to become a WP plugin? I use Better WP Security, and I wonder whether the two would play ball or whether there could be some kind of integration? Sorry, just my mind working in overdrive…
Hmm.. I hadn’t thought of making 5G/6G into a plugin as it’s entirely .htaccess, just copy/paste. But it’s an interesting idea that I’ll be rolling around in my head.. I would have to look at Better WP Security, but unless it’s doing something weird, should work fine with a 5G plugin.. What happened with your Twitter app, is what I want to know..
Yes, pasting in to .htaccess is the best way, but less techie people might find that hard. I know that Better WP Security has an option to ban bad bots using a list. I never use that option because it blocks image and summary previews with the likes of LinkedIn. I think an option to use 5G/6G would be good there. However, the great thing about pasting it into .htaccess is that you don’t have to include all lines if you don’t want.
My Twitter app is called Twools. It was my first proper PHP app, let alone Twitter app. I also made it into a WordPress plugin. It offers quite powerful filters for your Twitter streams and also outputs RSS feeds, so that you can use them with the likes of IFTTT. You can see more at http://iag.me/twools/
I need some time to clean up the code and get it on GitHub. I’ll get there!
Hi,
Can you tell me how to rewrite URLs on GoDaddy hosting? I’m facing lots of issues with this. Please do write an article about it.
Awaiting your kind support.
I don’t use WordPress (but love/use some of the tips on this site). I found these scans on February 4th as well, so far only that day from 1 IP address (I also do real-time monitoring of 404, 403, etc. 24 hours a day – yes, I get up at 3am to deal with these at times).
I have had some fun with these scanners who look for common WordPress files, like wp-login.php. I recently created files with those filenames with custom code that automatically blocks their IP and the scans have reduced significantly.
I tried adding your code (I had come up with something similar) to my .htaccess, but it doesn’t seem to actually work. I still end up on a 404 page (I’d rather they get denied) and one site keeps getting hit with these.
Hmmm.. sounds like something may be interfering.. what is the exact code that you tried?
Originally, the code exactly as you posted. Pared down to current test:
RewriteCond %{QUERY_STRING} www\.google\.com/humans\.txt [NC]
RewriteRule .* - [F,L]
Not working, from what I can tell; I get my custom 404 page when this is requested (from my logs):
/administrator/admin.php?site_absolute_path=http://www.google.com/humans.txt?
It should work, but on some servers I’ve noticed that query-string directives need to be placed *before* any other mod_rewrite rules. For example, here at Perishable Press, I have to include the query-string portions of my blacklists before the WP permalink rules and other blacklist rules, etc.
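For a typical WordPress site, the ordering might look something like this (just a sketch; your permalink block may differ):
# block humans.txt scans (keep above everything else)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{QUERY_STRING} http\:\/\/www\.google\.com\/humans\.txt\? [NC]
RewriteRule .* - [F,L]
</IfModule>

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress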
Very odd. I have this rule much further down in the list:
RewriteEngine On
RewriteCond %{QUERY_STRING} m=1 [NC]
RewriteRule (.*) $1? [L,R]
and it is working just fine (I have a responsive theme set up, so mobile requests should go to the same URLs, rather than creating a bunch of 404s as they had been doing).
I’ve also had trouble getting a match on URL login attempts (values with colons in them), which was why I had tried the shorter version. Both types of hits land on my sites daily, trying to break in or find weaknesses.
Yes if that is working, then you have a foothold for further testing of the humans.txt query string. Make a backup of the .htaccess file and then try modifying that snippet like so:
RewriteEngine On
RewriteCond %{QUERY_STRING} m=1 [NC,OR]
RewriteCond %{QUERY_STRING} www\.google\.com/humans\.txt [NC]
RewriteRule (.*) $1? [L,R]
That should provide some further clues to work with..
The real question is, how many legit queries end in a question mark? I only allow myself to pass a question mark in a query string. As far as I know, no one else should have to, unless they’re putting a question mark in the search box when looking for something. Which would be pretty useless in my opinion anyways.
I’m wondering if this would be better?
RewriteCond %{QUERY_STRING} \?$
@Karen, are you performing a query? Or are you just typing the string in your browser as a URL? Because it needs to have a leading question mark at the beginning if you’re going to try to use it in a browser URL field to represent a query. Try performing a search in a search box with that string on your site instead.
THANK YOU for this… you saved me a lot of headaches, Jeff. These are pesky and just a pain (for the most part).
I use Wordfence to keep the really bad stuff away and so far, it has been excellent. No glitches with hacks – which I had before I installed it.
Hello,
Can you explain how it works?
I can’t quite grasp the concept of this hack method.
If my site is www.mysite.com
and someone requests www.mysite.com?query=http://www.google.com/humans.txt
What is the prob here?
Why is this bad?
And what’s the difference between
www.mysite.com?query=http://www.google.com/humans.txt
and simply this:
www.mysite.com?query=http://www.google.com/another.ext
Thank you
It’s all about control. The problem is the sheer volume of such requests, as explained in the article. The technique helps give control back to the person being attacked. There is no problem with your server returning 404 Not Found, but there is a chance that the scan will reveal a vulnerability; perhaps not directly, but let’s say that any server that responds to the humans.txt scan is also known to be vulnerable to a certain hack. This technique protects against such scenarios.
Okay, thank you very much