Perishable Press

404 Fix: Block Nuisance Requests for Non-Existent Files

[Image: Han Solo shutting up C-3PO in Empire Strikes Back]

As I’ve written before, blocking nuisance requests can help save you money by cutting down on wasted server resources, memory, and so forth. It also saves you time, as your server access and error logs won’t be full of nuisance request spam. So you will have more resources and time for things that matter, like running your business, helping customers, improving code, etc. So to continue the proud tradition of blocking malicious traffic, this post builds upon previous blocking techniques to stop some additional nuisance requests for non-existent files.

Meet the nuisance requests

Here are examples of the types of never-ending requests for files that don’t exist. Every one of these requests triggers a 404 error on the server.

https://example.com/{random}.php.suspected

https://example.com/.well-known/assetlinks.json
https://example.com/.well-known/dnt-policy.txt
https://example.com/.well-known/apple-app-site-association

https://example.com/.git/config
https://example.com/.git/index
https://example.com/.gitignore
https://example.com/.git/HEAD

https://example.com/apple-app-site-association
https://example.com/autodiscover/autodiscover.xml

This is a concise summary of the nuisance requests that have been plaguing servers around the Web. The mysterious .php.suspected is super popular these days. Then you’ve got the ol’ .well-known collection, requested on sites where those files don’t exist. And of course the bots gotta scan for all sorts of .git files, and then throw in apple-app-site-association and autodiscover.xml to boot.

There are people (bots) scanning endlessly for these files. I can understand if bots visit once in a while looking for common files, for whatever purpose, even if the files don’t exist. It’s totally normal and expected to get that sort of request.

The problem I have is when stupid bots and scripts continue pestering sites, making the same 404 requests over and over and over again. It’s like, come ON people: program your bot to be “response aware”, so that it “remembers” the response code returned for each request. Doing that would save everyone’s time, energy, and money (including the bot owner).
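To make “response aware” concrete, here is a minimal Python sketch of the idea. Everything in it is hypothetical and for illustration only: the ResponseAwareBot class, the DEAD_STATUSES set, and the fake_fetch function that stands in for a real HTTP client. It is one possible way to remember response codes, not how any particular bot works.

```python
from typing import Callable, Dict

# Assumption: these status codes mean "stop asking for this URL"
DEAD_STATUSES = {403, 404, 410}


class ResponseAwareBot:
    """Hypothetical bot that remembers the status code returned for each URL."""

    def __init__(self, fetch: Callable[[str], int]) -> None:
        self.fetch = fetch              # any function that returns an HTTP status code
        self.seen: Dict[str, int] = {}  # URL -> last status code received

    def request(self, url: str) -> int:
        # If this URL already came back as a dead end, do not hit the server again
        status = self.seen.get(url)
        if status in DEAD_STATUSES:
            return status
        status = self.fetch(url)
        self.seen[url] = status
        return status


# Usage: a fake fetch function that records each call and always answers 404
calls = []

def fake_fetch(url: str) -> int:
    calls.append(url)
    return 404

bot = ResponseAwareBot(fake_fetch)
bot.request("https://example.com/.git/config")
bot.request("https://example.com/.git/config")  # answered from memory, no second call
print(len(calls))
```

The second request never reaches the server, which is the whole point: one 404 is enough to learn that the file isn’t there.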

We’re talking bots 101 here folks. Any bot that keeps making the same requests for non-existent resources is either incompetent or malicious. Neither scenario bodes well for literally anyone. So let’s go ahead and block this sort of nonsense and save everyone the headache.

..shut them all down!

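FIRST: make sure the files don’t actually exist on your site. That is, if your site serves anything from .git, .well-known, or any of the other locations listed above, you do NOT want to implement this technique as-is. For example, if your setup relies on web access to .git resources and you block all requests for .git.. things are gonna stop working. Same goes for .well-known: it is the standard location for things like Let’s Encrypt validation files (/.well-known/acme-challenge/), so blocking it can break certificate renewal.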

For everyone else not using any of the previously mentioned files, here is the .htaccess code to block all of the nuisance requests discussed above:

# BLOCK NUISANCE REQUESTS
# perishablepress.com/block-nuisance-requests
<IfModule mod_alias.c>
	RedirectMatch 403 (?i)\.php\.suspected
	RedirectMatch 403 (?i)\.(git|well-known)
	RedirectMatch 403 (?i)apple-app-site-association
	RedirectMatch 403 (?i)/autodiscover/autodiscover\.xml
</IfModule>

Note: This technique replaces the previous nuisance-blocking technique.

You can implement that code via the Apache config file, or add it to the .htaccess file in your site’s web-accessible root directory. For most sites, that is the same directory that contains robots.txt, humans.txt, and other common root-level files.

How does the code work? Glad you asked. Basically, it uses mod_alias’ RedirectMatch to check the requested URI and serve a 403 Forbidden response whenever the URI matches. Here is more of a step-by-step explanation:

  1. First, check if the alias module is available via <IfModule mod_alias.c>
  2. Then we have four RedirectMatch directives, each matching the requested URL path against a different regular expression. The 403 sets the response code, and the (?i) specifies that pattern matching should be case-insensitive.
  3. Close the <IfModule> container

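As a challenge, try to figure out which regular expression (regex) blocks each of the nuisance requests listed above. They should all be covered, so there are zero false negatives for that list. As for false positives: the patterns are not anchored, so any URL that contains one of these strings (say, a path that includes .git) gets blocked too. On most sites that never happens, but check your own URLs to be sure ;)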
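One way to check your answer to the challenge is to run the patterns against the example URLs. Here is a minimal Python sketch, assuming that Python’s re module treats these simple patterns the same way Apache’s regex engine does. It uses the four patterns from the rules above (with the dot in autodiscover.xml escaped) and an unanchored search against the URL path. The matching_rules helper and the abc123 filename standing in for {random} are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# The four patterns from the RedirectMatch rules
PATTERNS = [
    r"(?i)\.php\.suspected",
    r"(?i)\.(git|well-known)",
    r"(?i)apple-app-site-association",
    r"(?i)/autodiscover/autodiscover\.xml",
]

# The nuisance requests listed earlier; abc123 is a stand-in for {random}
URLS = [
    "https://example.com/abc123.php.suspected",
    "https://example.com/.well-known/assetlinks.json",
    "https://example.com/.well-known/dnt-policy.txt",
    "https://example.com/.well-known/apple-app-site-association",
    "https://example.com/.git/config",
    "https://example.com/.git/index",
    "https://example.com/.gitignore",
    "https://example.com/.git/HEAD",
    "https://example.com/apple-app-site-association",
    "https://example.com/autodiscover/autodiscover.xml",
]


def matching_rules(url: str) -> list:
    """Return every pattern that matches anywhere in the URL's path."""
    path = urlsplit(url).path
    return [p for p in PATTERNS if re.search(p, path)]


for url in URLS:
    print(url, "->", matching_rules(url))
```

Add a few of your own site’s real URLs to the list: anything that comes back with a non-empty result would be blocked by the rules.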

Once implemented, this simple .htaccess technique shuts down the endless nuisance requests for non-existent (404) resources with a quick 403 response. And that, my friends, saves you time, energy, resources, and money.

Jeff Starr
About the Author Jeff Starr = Web Developer. Book Author. Secretly Important.
4 responses
  1. Pieter, August 25, 2018 @ 3:18 am

    Super helpful snippet Jeff, thank you for posting this!

  2. Pieter, August 25, 2018 @ 3:19 am

    Forgot to ask, is this snippet included in the 6G list?

  3. Kristina Ponting, August 27, 2018 @ 9:15 pm

    Love you:)
    I was wondering, what in h`ll was asking non files on my server:)
