404 Fix: Block Nuisance Requests for Non-Existent Files
As I’ve written before, blocking nuisance requests can help save you money by cutting down on wasted server resources, memory, and so forth. It also saves you time, as your server access and error logs won’t be full of nuisance-request spam. That leaves more resources and time for things that matter, like running your business, helping customers, improving code, etc. So to continue the proud tradition of blocking malicious traffic, this post builds upon previous blocking techniques to stop some additional nuisance requests for non-existent files.
Important: the technique below blocks .well-known requests (among other things), which are required for sites using SSL/HTTPS Let’s Encrypt certificates. So if your site uses Let’s Encrypt, either skip this technique or remove any lines that block .well-known requests.
Meet the nuisance requests
Here are examples of the types of never-ending requests for files that don’t exist, as in every request triggers a 404 error on the server.
https://example.com/{random}.php.suspected
https://example.com/.well-known/assetlinks.json
https://example.com/.well-known/dnt-policy.txt
https://example.com/.well-known/apple-app-site-association
https://example.com/.git/config
https://example.com/.git/index
https://example.com/.gitignore
https://example.com/.git/HEAD
https://example.com/apple-app-site-association
https://example.com/autodiscover/autodiscover.xml
This is a concise summary of the nuisance requests that have been plaguing servers around the Web. You’ve got the mysterious .php.suspected, which is super popular these days. Then you’ve got the ol’ .well-known collection that doesn’t exist. And of course gotta scan for all sorts of .git files, and then also throw in apple-app-site-association and autodiscover.xml to boot.
There are people (bots) scanning endlessly for these files. I can understand if bots visit once in a while looking for common files, for whatever purpose, even if the files don’t exist. Totally normal and expected to get that sort of request.
The problem I have is when stupid bots and scripts continue pestering sites, making the same 404 requests over and over and over again. It’s like, come ON people: program your bot to be “response aware”, so that it “remembers” the response code returned for each request. Doing that would save everyone’s time, energy, and money (including the bot owner).
We’re talking bots 101 here folks. Any bot that keeps making the same requests for non-existent resources is either incompetent or malicious. Neither scenario bodes well for literally anyone. So let’s go ahead and block this sort of nonsense and save everyone the headache.
..shut them all down!
FIRST: make sure the files don’t actually exist on your site. That is, if you are using Git, anything in .well-known, or whatever, you do NOT want to implement this technique. For example, if you are using Git for version control on your site and then block all requests for .git resources, things are gonna stop working.
For everyone else not using any of the previously mentioned files, here is the .htaccess to block all previously discussed nuisance requests:
# BLOCK NUISANCE REQUESTS
# perishablepress.com/block-nuisance-requests
<IfModule mod_alias.c>
RedirectMatch 403 (?i)\.php\.suspected
RedirectMatch 403 (?i)\.(git|well-known)
RedirectMatch 403 (?i)apple-app-site-association
RedirectMatch 403 (?i)/autodiscover/autodiscover.xml
</IfModule>
You can implement that code via the Apache config file, or add it to an .htaccess file in your site’s web-accessible root directory. For most sites, that’s the same directory that contains robots.txt, humans.txt, and other common root-level files.
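If you go the config-file route, here is a minimal sketch of how the same rules might look inside a vhost (the domain and paths are placeholders, and this assumes Apache with mod_alias enabled):
# Hypothetical vhost example; ServerName and DocumentRoot are placeholders
<VirtualHost *:80>
ServerName example.com
DocumentRoot /var/www/example.com
# same nuisance-blocking rules as the .htaccess version
RedirectMatch 403 (?i)\.php\.suspected
RedirectMatch 403 (?i)\.(git|well-known)
RedirectMatch 403 (?i)apple-app-site-association
RedirectMatch 403 (?i)/autodiscover/autodiscover.xml
</VirtualHost>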
How does the code work? Glad you asked. Basically we are using mod_alias’ RedirectMatch to check the requested URI and serve a 403 Forbidden response. Here is more of a step-by-step explanation:
- First, check if the alias module is available via <IfModule mod_alias.c>
- Then we have four RedirectMatch directives, each matching against a different regular expression. The 403 sets the response code, and the (?i) specifies that pattern matching should be case-insensitive.
- Close the <IfModule> container
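Quick note for Let’s Encrypt users: as warned at the top of this post, the second RedirectMatch pattern also blocks .well-known requests, which Let’s Encrypt requires. A possible variant for such sites (my sketch, same technique with the well-known pattern removed) looks like this:
# BLOCK NUISANCE REQUESTS (Let's Encrypt-friendly variant)
<IfModule mod_alias.c>
# note: .well-known is NOT blocked here
RedirectMatch 403 (?i)\.php\.suspected
RedirectMatch 403 (?i)\.git
RedirectMatch 403 (?i)apple-app-site-association
RedirectMatch 403 (?i)/autodiscover/autodiscover.xml
</IfModule>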
As a challenge, try to figure out which regular expression (regex) blocks each of the nuisance requests listed above. They should all be covered — no more, no less. In other words, there should be zero false positives with this code. And also zero false negatives ;)
Once implemented, this simple .htaccess technique will put an end to the endless nuisance requests for non-existent (404) resources. And that, my friends, saves you time, energy, resources, and money.
6 responses to “404 Fix: Block Nuisance Requests for Non-Existent Files”
Super helpful snippet Jeff, thank you for posting this!
Forgot to ask, is this snippet included in the 6G list?
Thanks Pieter! And yes, some aspects of this technique are included via various 5G/6G rules, but nothing comprehensive that covers everything here.
Love you:)
I was wondering what in h`ll was requesting non-existent files on my server :)
Hi,
I’ve been seeing hundreds of 404’s in my server logs caused by a malicious bot, I’m guessing. I need some help to get a handle on this excessive traffic. Here’s an example from a recent access log:
27.115.124.2 - - [11/Oct/2018:00:40:37 -0600] "GET /get?show_env=1 HTTP/1.1" 404 6925 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
I can’t block the numerous IPs that seem to be causing these 404s, because in doing a reverse lookup, I get NXDOMAIN. The IPs appear to be from servers in China. Just in case, I blocked the various IPs, but this has not prevented this abuse (NXDOMAIN).
Any suggestions you can offer would be greatly appreciated!
As I explain in previous posts, the best way to stop rogue server requests is to target the request string itself. In this case, the request string is:
/get?show_env=1
So you could either block /get in the request URI, or you could block show_env=1 in the query string, or you could block both. Blocking /get would result in more false positives than show_env=1 (unless your site actually uses that particular query parameter), so the best thing to block in this case would be the query string.
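For example, here is a minimal .htaccess sketch of blocking that query string (my own example, so test before relying on it; assumes Apache with mod_rewrite enabled). Note that RedirectMatch only matches the URL path, so blocking a query string requires mod_rewrite:
# BLOCK show_env IN QUERY STRING
<IfModule mod_rewrite.c>
RewriteEngine On
# match a show_env parameter anywhere in the query string, case-insensitive
RewriteCond %{QUERY_STRING} (^|&)show_env= [NC]
# deny the request with 403 Forbidden
RewriteRule .* - [F,L]
</IfModule>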