Perishable Press

404 Fix: Block Nuisance Requests for Non-Existent Files

[ Han Solo shutting up C-3PO in Empire Strikes Back ] As I’ve written before, blocking nuisance requests can help save you money by cutting down on wasted server resources, memory, and so forth. It also saves you time, as your server access and error logs won’t be full of nuisance request spam. So you will have more resources and time for things that matter, like running your business, helping customers, improving code, etc. To continue the proud tradition of blocking malicious traffic, this post builds upon previous blocking techniques to stop some additional nuisance requests for non-existent files.

Meet the nuisance requests

Here are examples of the types of never-ending requests for files that don’t exist; every one of these requests triggers a 404 error on the server.

https://example.com/{random}.php.suspected

https://example.com/.well-known/assetlinks.json
https://example.com/.well-known/dnt-policy.txt
https://example.com/.well-known/apple-app-site-association

https://example.com/.git/config
https://example.com/.git/index
https://example.com/.gitignore
https://example.com/.git/HEAD

https://example.com/apple-app-site-association
https://example.com/autodiscover/autodiscover.xml

This is a concise summary of the nuisance requests that have been plaguing servers around the Web. You’ve got the mysterious .php.suspected, which is super popular these days. Then you’ve got the ol’ .well-known collection that doesn’t exist. And of course bots gotta scan for all sorts of .git files, and then also throw in apple-app-site-association and autodiscover.xml to boot.

There are people (bots) scanning endlessly for these files. I can understand if bots visit once in a while looking for common files, for whatever purpose, even if the files don’t exist. Totally normal and expected to get that sort of request.

The problem I have is when stupid bots and scripts continue pestering sites, making the same 404 requests over and over and over again. It’s like, come ON people: program your bot to be “response aware”, so that it “remembers” the response code returned for each request. Doing that would save everyone’s time, energy, and money (including the bot owner).
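To illustrate what “response aware” might look like, here is a minimal Python sketch (all names hypothetical, not from any real crawler): the bot caches the status code returned for each URL and never re-requests a known 404.

```python
class ResponseAwareFetcher:
    """Toy sketch of a "response aware" bot: it remembers the status
    code returned for each URL and never re-requests a known 404."""

    def __init__(self, transport):
        # transport: any callable taking a URL and returning (status, body)
        self.transport = transport
        self.seen = {}  # url -> last status code observed

    def fetch(self, url):
        if self.seen.get(url) == 404:
            return None  # known dead: don't pester the server again
        status, body = self.transport(url)
        self.seen[url] = status
        return body if status == 200 else None
```

With something like that in place, repeated passes over the same URL list would cost the server exactly one 404 per missing file instead of thousands.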

We’re talking bots 101 here folks. Any bot that keeps making the same requests for non-existent resources is either incompetent or malicious. Neither scenario bodes well for literally anyone. So let’s go ahead and block this sort of nonsense and save everyone the headache.

..shut them all down!

FIRST: make sure the files don’t actually exist on your site. That is, if you are using Git, anything under .well-known, or any of the other files listed above, you do NOT want to implement this technique. For example, if you are using Git for version control on your site and then block all requests for .git resources.. things are gonna stop working.
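If you’re not sure, a quick script can check for you. Here is a rough Python sketch (the docroot path is just a made-up example — point it at your actual document root) that reports which of the listed files actually exist on disk:

```python
import os

# Files/directories targeted by the blocking technique in this post.
CANDIDATES = [
    ".git",
    ".gitignore",
    ".well-known",
    "apple-app-site-association",
    "autodiscover/autodiscover.xml",
]

def files_in_use(docroot):
    """Return the candidate paths that actually exist under docroot.
    If this returns anything, do NOT apply the blocking rules as-is."""
    return [p for p in CANDIDATES
            if os.path.exists(os.path.join(docroot, p))]

# Example (hypothetical docroot):
# files_in_use("/var/www/example.com")
```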

For everyone else not using any of the previously mentioned files, here is the .htaccess to block all previously discussed nuisance requests:

# BLOCK NUISANCE REQUESTS
# perishablepress.com/block-nuisance-requests
<IfModule mod_alias.c>
	RedirectMatch 403 (?i)\.php\.suspected
	RedirectMatch 403 (?i)\.(git|well-known)
	RedirectMatch 403 (?i)apple-app-site-association
	RedirectMatch 403 (?i)/autodiscover/autodiscover\.xml
</IfModule>
Note: This technique replaces the previous nuisance-blocking technique.

You can implement that code via your Apache configuration file, or add it directly to the .htaccess file in your site’s web-accessible root directory. For most sites that’s the same directory that contains robots.txt, humans.txt, and other common root-level files.

How does the code work? Glad you asked. Basically, it uses mod_alias’ RedirectMatch to check the requested URI and serve a 403 Forbidden response to any matching request. Here is more of a step-by-step explanation:

  1. First, check if the alias module is available via <IfModule mod_alias.c>
  2. Then we have four RedirectMatch directives, with each matching against a different set of regular expressions. The 403 sets the response code, and the (?i) specifies that pattern matching should be case-insensitive.
  3. Close the <IfModule> container

As a challenge, try to figure out which regular expression (regex) blocks each of the nuisance requests listed above. They should all be covered — no more, no less. In other words, there should be zero false positives with this code. And also zero false negatives ;)
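If you want to check your answers, the same regular expressions can be exercised outside of Apache. Here is a quick Python sketch that mirrors how RedirectMatch scans each request URI (the .php.suspected filename and the “legit” URIs are made-up examples):

```python
import re

# The four patterns from the .htaccess snippet;
# (?i) is expressed via re.IGNORECASE instead.
PATTERNS = [
    r"\.php\.suspected",
    r"\.(git|well-known)",
    r"apple-app-site-association",
    r"/autodiscover/autodiscover\.xml",
]

def blocked(uri):
    """True if any pattern matches the URI, i.e. Apache would serve 403."""
    return any(re.search(p, uri, re.IGNORECASE) for p in PATTERNS)

nuisance = [
    "/random.php.suspected",  # hypothetical filename
    "/.well-known/assetlinks.json",
    "/.well-known/dnt-policy.txt",
    "/.well-known/apple-app-site-association",
    "/.git/config",
    "/.git/index",
    "/.gitignore",
    "/.git/HEAD",
    "/apple-app-site-association",
    "/autodiscover/autodiscover.xml",
]

legit = ["/", "/index.php", "/robots.txt", "/blog/hello-world/"]
```

Every URI in the nuisance list matches at least one pattern, and none of the ordinary ones do.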

Once implemented, this simple .htaccess technique will put an end to the endless nuisance requests for non-existent (404) resources. And that my friends saves you time, energy, resources, and money.

Jeff Starr
About the Author Jeff Starr = Fullstack Developer. Book Author. Teacher. Human Being.
6 responses
  1. Super helpful snippet Jeff, thank you for posting this!

  2. Forgot to ask, is this snippet included in the 6G list?

    • Jeff Starr

      Thanks Pieter! And yes, some aspects of this technique are included via various 5G/6G rules, but nothing comprehensive that covers everything here.

  3. Kristina Ponting August 27, 2018 @ 9:15 pm

    Love you:)
    I was wondering, what in h`ll was asking non files on my server:)

  4. Hi,
    I’ve been seeing hundreds of 404’s in my server logs caused by a malicious bot, I’m guessing. I need some help to get a handle on this excessive traffic. Here’s an example from a recent access log:

    27.115.124.2 - - [11/Oct/2018:00:40:37 -0600] "GET /get?show_env=1 HTTP/1.1" 404 6925 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"

I can’t block the numerous IPs that seem to be causing these 404s, because in doing a reverse lookup, I get NXDOMAIN. The IP says it’s from servers in China. Just in case, I blocked the various IPs, but this has not prevented the abuse (NXDOMAIN).

    Any suggestions you can offer would be greatly appreciated!

    • Jeff Starr

      As I explain in previous posts, the best way to stop rogue server requests is to target the request string itself. In this case, the request string is:

      /get?show_env=1

      So you could either block /get in the request URI, or you could block the show_env=1 in the query string, or you could block both. Blocking /get would result in more false positives than show_env=1 (unless your site actually uses that particular query parameter), so the best thing to block in this case would be the query string.
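To make the false-positive tradeoff concrete, here is a quick Python comparison (the innocent URIs are made-up examples): the broad /get pattern snags any URL that merely contains “/get”, while matching show_env= in the query string only catches the rogue request.

```python
import re

uri_pattern = re.compile(r"/get")         # broad: matches the request URI
query_pattern = re.compile(r"show_env=")  # narrow: matches the query string

rogue = "/get?show_env=1"
innocent = ["/gettysburg-address", "/widgets/getting-started"]

# Both patterns catch the rogue request...
assert uri_pattern.search(rogue) and query_pattern.search(rogue)
# ...but only the broad /get pattern also hits the innocent URLs.
assert all(uri_pattern.search(u) for u in innocent)
assert not any(query_pattern.search(u) for u in innocent)
```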

[ Comments are closed for this post ]