
Protect Against Humans.txt Query-String Scans

I woke up this morning to the sound of thousands of 404 requests hitting the server. It’s sad that there are kiddies out there who have nothing better to do than buy some pathetic $50 script and then sit there like an imbecile harassing people for hours on end. But alas, that is the world we live in. Fortunately, it takes only a few lines of good old .htaccess to block the entire scan.

About the scans

Before getting to the code, a bit more about the scans it protects against. Apparently some desperate lowlife released yet another script for scanning sites for vulnerabilities and exploit opportunities. So suddenly there are waves of scans probing for a vulnerability related to the following query-string parameter:

?whatever=http://www.google.com/humans.txt?

Unless this means something to your server, thousands of scans for it are simply wasting time, bandwidth, energy, money, and everything else. Responding with 404 is fine if you’ve optimized the response, otherwise they’re basically leeches sucking the lifeblood from your business.
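
If you do want to keep answering these requests with a 404, one way to optimize the response is to serve a short plain-text message instead of a full, themed error page. Here is a minimal sketch for Apache (note that if your CMS already routes requests for missing files to its own 404 handler, that handler will take precedence):

# serve a lightweight plain-text 404 instead of a full error page
ErrorDocument 404 "Not Found"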

In a few months (or less) the script should be much less active, but right now there are too many idiots running it, which means that untold numbers of sites are being assaulted with relentless, malicious URL requests by the thousands. Apparently it’s not enough to run the script only once per site: they just keep it on auto-loop, repeating the same set of several hundred queries over and over again.

Here is a sampling of the requests made by the “humans.txt” scans:

http://example.com/admin.php?lang=http://www.google.com/humans.txt
http://example.com/zoomstats/libs/dbmax/mysql.php?GLOBALS[\'lib\'][\'db\'][\'path\']=http://www.google.com/humans.txt?
http://example.com/ytb/cuenta/cuerpo.php?base_archivo=http://www.google.com/humans.txt?
http://example.com/yabbse/Sources/Packages.php?sourcedir=http://www.google.com/humans.txt?
http://example.com/xt_counter.php?server_base_dir=http://www.google.com/humans.txt?
http://example.com/xarg_corner_top.php?xarg=http://www.google.com/humans.txt?
http://example.com/xoopsgallery/init_basic.php?GALLERY_BASEDIR=http://www.google.com/humans.txt?&2093085906=1&995617320=2
http://example.com/xarg_corner_bottom.php?xarg=http://www.google.com/humans.txt?
http://example.com/wsk/wsk.php?wsk=http://www.google.com/humans.txt?
http://example.com/xarg_corner.php?xarg=http://www.google.com/humans.txt?
http://example.com/wp-content/plugins/wp-table/js/wptable-button.phpp?wpPATH=http://www.google.com/humans.txt?
http://example.com/wp-content/plugins/wordtube/wordtube-button.php?wpPATH=http://www.google.com/humans.txt?
http://example.com/wp-content/plugins/mygallery/myfunctions/mygallerybrowser.php?myPath=http://www.google.com/humans.txt?
http://example.com/wp-cache-phase1.php?plugin=http://www.google.com/humans.txt?
http://example.com/worldpay_notify.php?mosConfig_absolute_path=http://www.google.com/humans.txt?
http://example.com/wordpress/wp-content/plugins/sniplets/modules/syntax_highlight.php?libpath=http://www.google.com/humans.txt?
...

This sort of scan is malicious for a number of reasons, but most annoying is the sheer volume of requests now hitting servers looking for a “humans.txt”-related response. It’s utterly pathetic how many times this scanning script is being executed with nothing more than a single potential payload (i.e., the one made possible via the http://www.google.com/humans.txt? vulnerability). What a waste.

Until the usage of this scan script slows down, you may want to take a moment and check if your server logs show any signs of getting hit — and you’ll know immediately because thousands upon thousands of humans.txt requests are hard to miss.

I could sit here all day and discuss the matter, but my time is limited, and 50-dollar kiddies are simply not worth the effort. Stopping malicious nonsense, however, IS worth my time, so let’s look at a drop-dead simple way to stop the entire “humans.txt” scan cold.

Block the humans.txt scan with .htaccess

As mentioned, all of the requests for this particular scan are targeting, via query string, “humans.txt”, which makes my job super-easy. Crack open your root .htaccess file and add the following snippet:

# block humans.txt scans
<IfModule mod_rewrite.c>
	RewriteCond %{QUERY_STRING} http\:\/\/www\.google\.com\/humans\.txt\? [NC]
	RewriteRule .* - [F,L]
</IfModule>
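
Note that this assumes the rewrite engine is already switched on somewhere in the same .htaccess file (the standard WordPress permalink rules do this). If it isn’t, you would also need something like:

# enable the rewrite engine if not already enabled elsewhere in the file
<IfModule mod_rewrite.c>
	RewriteEngine On
</IfModule>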

I have this code installed on most of my sites now, and thankfully the pesky scans have stopped completely. Of course, they’ll be back with a new $50 script next month, but until then it’s nice to conserve server resources and keep error logs clear of nonsense.

I can hear it already: “Wait, why are you blocking requests for the humans.txt file?” I’m not; this technique blocks requests that include http://www.google.com/humans.txt? in the query string. Consider:

# good request, never blocked.
http://example.com/humans.txt

# bad request, always blocked.
http://example.com/?some_path=http://www.google.com/humans.txt?

Hopefully that clears up any potential confusion regarding this simple yet effective solution.

Happy hunting people.

Note: I’ve also added this snippet to the recently posted 2014 Micro Blacklist, so no need to add both ;)

21 responses to “Protect Against Humans.txt Query-String Scans”

  1. Ian Anderson Gray 2014/02/11 7:43 am

    Thanks for this. I’m always concerned about these kinds of scans, looking for vulnerabilities.
    You said you “woke up this morning to the sound of thousands of 404 requests hitting the server”, but I’m interested in how you knew about these requests in the first place. Do you routinely look through your server logs, and if so, how do you do it?
    The issue is that I have over 60 websites on our server and it would be very time consuming to look through each raw log in detail. Is there an easier way?

    • Jeff Starr 2014/02/11 1:30 pm

      Yes actually I continually monitor my sites in real-time while awake, and then spend about 30 minutes each morning going over the details and putting together block lists, etc. There are programs that help in the analysis of logs, but I use my own system. Try a search for “server log analysis program” or similar and you should get some good results.

      • Ian Anderson Gray 2014/02/11 1:32 pm

        Thanks. I’d be interested to know more about your system. I’ll look into some log analysis programs. It’s also about knowing what to look for!

      • Jeff Starr 2014/02/11 1:39 pm

        I’ve been working on further automating my own system (which currently is too much hands-on), and have plans for a plugin/script as well. Right now though it’s all about finding the time for another project :)

  2. Ian Anderson Gray 2014/02/11 1:50 pm

    Sounds great, I’d be interested in that. However I totally understand the time thing. I’m still lamenting not being able to spend time developing my Twitter app.
    Is 5G/6G ever going to become a WP plugin? I use Better WP Security, and I wonder whether the two would play ball or whether there could be some kind of integration. Sorry, just my mind working in overdrive…

    • Jeff Starr 2014/02/11 2:42 pm

      Hmm.. I hadn’t thought of making 5G/6G into a plugin as it’s entirely .htaccess, just copy/paste. But it’s an interesting idea that I’ll be rolling around in my head.. I would have to look at Better WP Security, but unless it’s doing something weird, should work fine with a 5G plugin.. What happened with your Twitter app, is what I want to know..

  3. Ian Anderson Gray 2014/02/11 2:52 pm

    Yes, pasting into .htaccess is the best way, but less techie people might find that hard. I know that Better WP Security has an option to ban bad bots using a list. I never use that option because it blocks image and summary previews with the likes of LinkedIn. I think an option to use 5G/6G would be good there. However, the great thing about pasting it into .htaccess is that you don’t have to include all the lines if you don’t want to.

    My Twitter app is called Twools. It was my first proper PHP app let alone Twitter app. I also made it into a WordPress plugin. It offers quite powerful filters for your Twitter streams and also outputs RSS feeds, so that you can use with the likes of IFTTT. You can see more at http://iag.me/twools/

    I need some time to clean up the code and get it on GitHub. I’ll get there!

  4. Hi,

    Can you tell me how to rewrite URLs on GoDaddy hosting? I’m facing lots of issues with this. Please do write an article about it.

    Awaiting your kind support.

  5. I don’t use WordPress (but love/use some of the tips on this site). I found these scans on February 4th as well, so far only that day from 1 IP address (I also do real-time monitoring of 404, 403, etc. 24 hours a day – yes, I get up at 3am to deal with these at times).

    I have had some fun with these scanners who look for common WordPress files, like wp-login.php. I recently created files with those filenames with custom code that automatically blocks their IP and the scans have reduced significantly.

  6. I tried adding your code (I had come up with something similar) to my .htaccess, but it doesn’t seem to actually work. I still end up on a 404 page (I’d rather they get denied) and one site keeps getting hit with these.

  7. Originally I used the code exactly as you posted it. I’ve pared it down to this current test:

    RewriteCond %{QUERY_STRING} www\.google\.com/humans\.txt [NC]
    RewriteRule .* - [F,L]

    It’s not working, from what I can tell; I still get my custom 404 page when trying this (from my logs):

    /administrator/admin.php?site_absolute_path=http://www.google.com/humans.txt?

    • Jeff Starr 2014/02/13 5:49 pm

      It should work, but on some servers I’ve noticed that query-string directives need to be placed *before* any other mod_rewrite rules. For example, here at Perishable Press, I have to include the query-string portions of my blacklists before the WP permalink rules and other blacklist rules, etc.
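
      For example, here’s a rough sketch of the ordering I mean, assuming the standard WordPress permalink block:

      # humans.txt query-string block goes first
      <IfModule mod_rewrite.c>
      RewriteEngine On
      RewriteCond %{QUERY_STRING} http\:\/\/www\.google\.com\/humans\.txt\? [NC]
      RewriteRule .* - [F,L]
      </IfModule>

      # BEGIN WordPress
      <IfModule mod_rewrite.c>
      RewriteEngine On
      RewriteBase /
      RewriteRule ^index\.php$ - [L]
      RewriteCond %{REQUEST_FILENAME} !-f
      RewriteCond %{REQUEST_FILENAME} !-d
      RewriteRule . /index.php [L]
      </IfModule>
      # END WordPress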

  8. Very odd. I have this rule much further down in the list:

    RewriteEngine On
    RewriteCond %{QUERY_STRING} m=1 [NC]
    RewriteRule (.*) $1? [L,R]

    and it is working just fine (I have a responsive theme set up, so mobile requests should go to the same URLs, rather than creating a bunch of 404s as they had been doing).

    I’ve also had trouble getting a match on URL login attempts (values with colons in them), which is why I had tried the shorter version. Both types of hits land on my sites daily, trying to break in or find weaknesses.

    • Jeff Starr 2014/02/13 6:03 pm

      Yes if that is working, then you have a foothold for further testing of the humans.txt query string. Make a backup of the .htaccess file and then try modifying that snippet like so:

      RewriteEngine On
      RewriteCond %{QUERY_STRING} m=1 [NC,OR]
      RewriteCond %{QUERY_STRING} www\.google\.com/humans\.txt [NC]
      RewriteRule (.*) $1? [L,R]

      That should provide some further clues to work with..

  9. Thomas Oliver 2014/02/15 8:45 am

    The real question is, how many legit queries end in a question mark? I only allow myself to pass a question mark in a query string. As far as I know, no one else should have to, unless they’re putting a question mark in the search box when looking for something. Which would be pretty useless in my opinion anyways.

    I’m wondering if this would be better?

    RewriteCond %{QUERY_STRING} \?$

    @Karen, are you performing a query? Or are you just typing the string in your browser as a URL? Because it needs to have a preceding question mark at the beginning if you’re going to use it in a browser URL field to represent a query. Try performing a search in a search box with that string on your site instead.

  10. Jocelyn Myers 2014/02/17 11:04 pm

    THANK YOU for this… you saved me a lot of headaches, Jeff. These are pesky and just a pain (for the most part).

    I use Wordfence to keep the really bad stuff away and so far, it has been excellent. No glitches with hacks – which I had before I installed it.

  11. George Garchagudashvili 2014/02/19 4:38 am

    Hello,
    Can you explain how this works?
    I can’t get the concept of this hack method.

    If my site is www.mysite.com and it receives a request for
    www.mysite.com?query=http://www.google.com/humans.txt
    what is the problem here?
    Why is this bad?

    And what’s the difference between
    www.mysite.com?query=http://www.google.com/humans.txt
    And simply to this:
    www.mysite.com?query=http://www.google.com/another.ext

    Thank you

    • It’s all about control. The problem is the sheer volume of such requests, as explained in the article. The technique helps give control back to the person being attacked. There is no problem with your server returning 404 Not Found, but there is a chance that the scan will reveal a vulnerability; perhaps not directly, but let’s say that any server that responds to the humans.txt scan is also known to be vulnerable to a certain hack. This technique protects against such scenarios.

  12. George Garchagudashvili 2014/02/22 11:42 am

    Okay, thank you very much
