Anyone who pays attention to their server access and error logs has probably noticed that Google and other bots have been making endless requests for .well-known, apple-app-site-association, and various related files. This quick post explains how to save some server bandwidth and resources by blocking such repetitive requests, and also looks at a related problem with certain search engines <cough> not respecting a standard “410 Gone” server response.
Note: this technique blocks .well-known requests (among other things), which are required for sites using SSL/HTTPS Let’s Encrypt certificates. So if your site uses Let’s Encrypt, either skip this technique or remove any lines that block .well-known.
For the past several months I’ve noticed an uptick in requests for the following resources: anything under .well-known, and apple-app-site-association.
Googlebot especially is continually snooping for these files, even if there is nothing that actually links to them. I first noticed this trend while examining my sites in Google Webmaster Tools. Every site, every crawl, Googlebot and others are requesting these files.
And that’s not a bad thing IF the files actually exist. But they don’t on my server, and I am getting tired of Googlebot not heeding a simple “410 Gone” response, which I serve here on this site, for example, for any and all requests for any of the above files.
And why is Google reporting 410 responses as if they were 404? By definition a 410 response is designed to convey a clear message that the resource no longer exists; i.e., it’s GONE. 410 is meant to provide webmasters with a way to clean up their servers. According to the HTTP specification:
The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent.
So please pay attention and make a note, googlebot. 410 does NOT mean please keep checking over and over and over again because the resource is not found — it means that the resource is GONE. Please wake up, Google.
So after a few months of getting these endless requests for this particular set of files, I finally decided to do something about it. Here is a quick snippet that I’ve been adding to my sites, basically telling unruly bots to shut up:
# Block nuisance requests
RedirectMatch 403 (?i)\.well-known
RedirectMatch 403 (?i)apple-app-site-association
That will block all requests for .well-known and apple-app-site-association. So only implement these directives if you’re sure that these files do not exist on your server. Notice that we’re serving a crystal-clear 403 Forbidden response. At the time of this writing, Google seems to understand and respect the meaning of this particular response code, and thus the requests do not appear in the “errors” section of Webmaster Tools.
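For sites that use Let’s Encrypt, one possible variation is to keep blocking .well-known but exempt the acme-challenge directory used for HTTP-based certificate validation. This is only a sketch and not something I run on the sites mentioned here: it assumes that your certificate client uses the standard /.well-known/acme-challenge/ path, and that your Apache build accepts a PCRE negative lookahead in RedirectMatch patterns.

```apache
# Block nuisance requests, but leave Let's Encrypt validation alone
# (assumption: the ACME client serves tokens from /.well-known/acme-challenge/)
RedirectMatch 403 (?i)\.well-known(?!/acme-challenge/)
RedirectMatch 403 (?i)apple-app-site-association
```

With this variation, a request for /.well-known/acme-challenge/some-token should pass through untouched, while any other .well-known request still gets the 403. Test it against a certificate renewal before relying on it.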
Case in point: these stupid 410 URLs are IMPOSSIBLE to get rid of because Webmaster Tools doesn’t respect 410
Technically a 410 Gone response would be more accurate but, as explained, Google doesn’t seem to comprehend the meaning of an explicit message telling them that the requested resource is GONE. Most good bots understand and respect 410, and remove the resource from memory so that they don’t keep endlessly requesting it. You know, so they’re not wasting time, energy, and resources.
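For anyone who prefers the technically accurate response anyway, the same directives can return 410 instead of 403. This is a minimal sketch that changes only the status code and keeps the same two patterns; expect Google to keep reporting the URLs as described above.

```apache
# Alternative: answer with 410 Gone instead of 403 Forbidden
RedirectMatch 410 (?i)\.well-known
RedirectMatch 410 (?i)apple-app-site-association
```

Use one set of rules or the other, not both, since the first matching directive decides the response.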
Thus the whole point and not-so-hidden moral of the story:
410 was once used to erase a resource from memory, but now alas it’s meaningless because the largest search engine in the world treats it like a common 404.
And there’s nothing that any of us can do about it.