Eliminate 404 Errors for PHP Functions
Recently, I discussed the suspicious behavior exhibited by the Yahoo! Slurp crawler. As revealed by the site’s closely watched 404-error logs, Yahoo! had been requesting a series of nonexistent resources. Although a majority of the 404 errors were exclusive to the Slurp crawler, there were several instances of requests that were also coming from Google, Live, and even Ask. Initially, these distinct errors were misdiagnosed as existing URLs appended with various JavaScript functions. Here are a few typical examples of these frequently observed log entries:
https://perishablepress.com/press/category/websites/feed/function.opendir
https://perishablepress.com/press/category/websites/feed/function.array-rand
https://perishablepress.com/press/category/websites/feed/function.mkdir
https://perishablepress.com/press/category/websites/feed/ref.outcontrol
Fortunately, an insightful reader named Bas pointed out that the errors were actually PHP functions. Bas explains:
The two functions (array_rand and opendir) you define as javascript functions are PHP functions. Some servers generate clickable links to the php manual (which uses function.NAMEOFFUNCTION in their URL’s) in php scripting error messages. Maybe that’s also the cause of these problems.
Using this information to investigate the issue, I learned that PHP provides a configuration directive called html_errors that “produces hypertext links that direct the user to a page describing the error or function in causing the error.”1 Together with docref_ext and docref_root, the html_errors directive controls the presence and formatting of PHP’s docref error messages.
The error messages formatted by the html_errors directive appear in several locations, including PHP log files, user error handlers, and $php_errormsg variables. Of course, the display of sensitive error information should be disabled in the server’s PHP configuration settings because it reveals potential security vulnerabilities. Live reporting of error information may be useful during the development stage, but it is wise to disable such functionality on production servers.
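To see how one of these links turns into a 404 request, here is a minimal Python sketch. It assumes a simplified model of how PHP assembles the link target (docref_root, then "function." plus the function name with underscores written as hyphens, then docref_ext) and resolves the result the way a crawler would. The helper names docref_href and crawled_url are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin


def docref_href(function_name, docref_root="", docref_ext=""):
    """Build the link target attached to an HTML error message.

    Simplified model (an assumption): root + "function.<name>" + ext,
    with underscores in the function name written as hyphens.
    """
    target = "function." + function_name.lower().replace("_", "-")
    return docref_root + target + docref_ext


def crawled_url(page_url, href):
    """Resolve the link relative to the page, as a crawler would."""
    return urljoin(page_url, href)


page = "https://perishablepress.com/press/category/websites/feed/"

# No docref_root configured: the link is relative to the current page.
print(crawled_url(page, docref_href("array_rand")))

# docref_root pointing at a manual: the link leaves the site entirely.
print(crawled_url(page, docref_href("array_rand", "http://php.net/")))
```

With an empty docref_root, the first call yields a URL of the same shape as the log entries listed earlier, which is why crawlers end up requesting pages that do not exist.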
robots.txt + .htaccess
Equipped with this information, we return to our previously discussed 404 errors. As we may now see, the bizarre errors that have been baffling us for many months turn out to be caused by the relatively linked function references that are produced by PHP’s html_errors directive. It all makes sense now. Before we realized this, we were forced to throw down a tough set of disallow rules via robots.txt to prevent search engines from following these automatically generated function links:
User-agent: *
Disallow: */function.array-rand
Disallow: */function.require
Disallow: */function.opendir
Disallow: */function.mkdir
Disallow: */ref.outcontrol
Disallow: */function.main
That seemed effective enough — at least it stopped us from seeing the symptoms of the underlying problem. Now that we have discovered the source of the misdirected links, we may focus our attention on eliminating them entirely. Once again, thanks to the functional symbiosis between Apache and PHP, we summon the mysterious powers of htaccess to forge a true solution. And, despite the confusing nature of our bizarre 404 errors, the answer is ridiculously simple:
# disable display of php errors
php_flag display_startup_errors off
php_flag display_errors off
php_flag html_errors off
php_value docref_root 0
php_value docref_ext 0
Simply copy & paste the previous code into your site’s root htaccess file and say goodbye to the pointless 404 errors. This snippet of htaccess code employs a series of php_flag directives to override the settings in your server’s php.ini file. Here, we are disabling display_startup_errors, display_errors, and html_errors.
Update: the last two lines use php_value directives (docref_root and docref_ext) to disable the formatting of the HTML error links. Technically, this isn’t necessary because we are disabling the links themselves with the previous three lines, so feel free to omit these last two directives at will.
Using these htaccess directives is ideal for shared hosting environments where access to php.ini is not available. Note that php_flag and php_value are only recognized in htaccess files when PHP runs as an Apache module. In situations where php.ini is accessible, a better solution is to adjust the configuration settings directly:
display_startup_errors = Off
display_errors = Off
html_errors = Off
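After applying either configuration, one way to confirm that no error links remain is to scan a page's HTML for relative link targets of the function.* or ref.* form. The Python sketch below does this with a regular expression. The sample markup is hypothetical, modeled on the 404 entries shown earlier, and the helper name find_docref_links is an assumption for illustration. It only catches relative targets, which are the ones that lead to on-site 404 errors.

```python
import re

# Matches href values such as 'function.opendir' or 'ref.outcontrol',
# optionally followed by an extension like '.html' or '.php'.
DOCREF_HREF = re.compile(
    r"""href=["']((?:function|ref)\.[a-z0-9-]+(?:\.[a-z]+)?)["']""",
    re.IGNORECASE,
)


def find_docref_links(html):
    """Return the docref-style link targets found in a page's HTML."""
    return DOCREF_HREF.findall(html)


# Hypothetical page output containing an HTML-formatted PHP warning.
sample = (
    "<b>Warning</b>: opendir(/tmp/missing) "
    "[<a href='function.opendir'>function.opendir</a>]: "
    "failed to open dir"
)

print(find_docref_links(sample))
print(find_docref_links("<a href='/about/'>About</a>"))
```

An empty result for every public page suggests that crawlers no longer have any of these links to follow.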
Of course, it is always a good idea to maintain a private log of PHP errors. To keep an eye on our now-suppressed errors, place a copy of the following code into your root htaccess file:
# log php errors
php_flag log_errors on
php_value error_log /home/domain/private/php-errors.log
Remember to edit the path to php-errors.log and ensure that the file is writable by the server.
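Once the private log is collecting entries, a short script can summarize what is being suppressed. The Python sketch below tallies entries by severity. It assumes each line has the shape "[timestamp] PHP Severity:  message", which may differ on your server. The sample entries and file paths are hypothetical, and in practice the lines would be read from php-errors.log.

```python
import re
from collections import Counter

# Assumed line shape: "[timestamp] PHP <Severity>:  message"
LOG_LINE = re.compile(
    r"^\[(?P<when>[^\]]+)\] PHP (?P<level>[A-Za-z ]+?):\s+(?P<message>.*)$"
)


def count_by_level(lines):
    """Tally log entries by severity, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group("level")] += 1
    return counts


# Hypothetical entries for illustration only.
sample_log = [
    "[12-Mar-2008 10:15:32] PHP Warning:  opendir(/tmp/missing): "
    "failed to open dir in /home/domain/public/index.php on line 12",
    "[12-Mar-2008 10:15:40] PHP Notice:  Undefined variable: foo "
    "in /home/domain/public/index.php on line 20",
    "[12-Mar-2008 10:16:02] PHP Warning:  mkdir(): File exists "
    "in /home/domain/public/index.php on line 31",
]

for level, total in count_by_level(sample_log).most_common():
    print(level, total)
```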
Finally, a big thanks to Bas and Sick of Debt for taking the time to respond to my article on the suspicious crawler behavior. With their insightful help, I was able to locate the information required to solve the issue. Thank you!
Footnotes
- 1 Check out more information about PHP’s html_errors directive
13 responses to “Eliminate 404 Errors for PHP Functions”
No problem, I learned something from this article too.
Hah. I’ve been getting these as well. Really annoying…
Thanks for the fix!
My pleasure!
This was just what I needed! Thanks for the insight. I was afraid it might’ve been a hacker or something.
Erika
Happy to help! :)
Although in an ideal world you would:
– make sure that the errors were enabled and displayed during development or to developers
– try to solve them before pushing the code to production (I know, it doesn’t really work that way)
And finally… You can make those links point to PHP of course. Might be because I’m tired but it seemed you glossed over it. I point all my error links at the PHP manual now. To be honest, I’ve never actually used that functionality, but it feels quite complete to have it done…
Perhaps you can help me understand this.. Do the hyperlinks for PHP errors point directly at the corresponding page in the PHP manual? Are the links absolute? Do they point to the official PHP site, or do they point to a local copy of the manual?
As far as I recall, the errors that I am trying to eliminate in this article are the result of enabling the auto-generation of error hyperlinks. When these errors appear to spiders, and the links are relatively incomplete, they are crawled to their logical conclusion: a 404 page. Such errors quickly add up, consuming system resources and requiring additional attention.
(If this comment is submitted and looks weird, it’s because I was greeted with a blank screen the first few times I tried to submit it and I am hacking it up to try to get over whatever the problem was. Please remove any “” characters from my comment in your mind as you read it.)
Ahh! Yes, drop this into your php.ini (adjust for us or whatever if desired)
docref_root = "http://nz2.php.net/"
I have an extension xdebug that desires it to be set AGAIN in a different directive but yeah, if you do that then all the links will point to a real manual.
What you’re actually supposed to do is point them to your local manual for php, but frankly I’m not sure if everyone even has one let alone the path to it.
So when an error comes up for function.in-array it is just making a hyperlink to it. Now with this line in your config, you will instead be linking to http://nz2.php.net/function.in-array which is most likely the name of a valid page on php.net explaining the function that caused the error.
So through Apache you would say this, I think:
php_value docref_root http://nz2.php.net/
Ohhhh I apologize for the triple post :( Don’t kill me… But after scrolling up and seeing your citation I thought I would go find one and add it. Here it is:
http://us.php.net/manual/en/errorfunc.configuration.php
That’s all from me for now, I think :)
Thanks for this information! It explains many questions that I had regarding PHP errors, log entries, relative links, and 404 errors. Now that I understand this, I think the problem that I was having way back when I wrote this article involved the fact that the relative error links were pointing to virtually nowhere. I am on a shared server and I don’t think it’s configured to automatically generate proper error links to the server’s local PHP manual (most likely to conserve server resources, etc.). The link you provide to the error function page is also great. Thanks!
Hello,
I am new to blogging. Google Webmaster Tools is showing 404 errors for links that do not exist. Here is one such link: http://www.indianomics.com/category/india/page/9/about This is not the actual path to the about page, so why is it generating a 404 error?
please help me
Hi sailu, in general, there are many possible sources for 404 errors — moved pages, incoming spam links, referral spam, faulty scripts, mistyped external links, and so on. If I recall, Google provides a way to check the source of various types of crawl errors within Webmaster Tools. Look for a link next to each error — there should be enough specific information provided there to help get you started. Good luck!
Hi jeff,
I resolved the problem. I found it through Google Webmaster Tools. One more thing: is there any way to remove old errors from the Google crawl?
Thank you