
Eliminate 404 Errors for PHP Functions

Recently, I discussed the suspicious behavior exhibited by the Yahoo! Slurp crawler. As revealed by the site’s closely watched 404-error logs, Yahoo! had been requesting a series of nonexistent resources. Although a majority of the 404 errors were exclusive to the Slurp crawler, several of the same requests were also coming from Google, Live, and even Ask. Initially, these distinct errors were misdiagnosed as existing URLs appended with various JavaScript functions. Here are a few typical examples of these frequently observed log entries:

https://perishablepress.com/press/category/websites/feed/function.opendir
https://perishablepress.com/press/category/websites/feed/function.array-rand
https://perishablepress.com/press/category/websites/feed/function.mkdir
https://perishablepress.com/press/category/websites/feed/ref.outcontrol
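
To pull entries like these out of a longer list of 404 URLs, one option is to match the final path segment against the function.NAME / ref.NAME shape seen above. The following Python sketch does that; the pattern and the helper name is_docref_request are assumptions made for illustration, based only on the log entries listed here.

```python
import re
from urllib.parse import urlsplit

# Assumed shape of the suspicious final path segment, e.g. "function.opendir"
# or "ref.outcontrol". Derived from the log entries above, not from any spec.
DOCREF_SEGMENT = re.compile(r"^(function|ref)\.[a-z0-9-]+$")


def is_docref_request(url: str) -> bool:
    """Return True if the URL's final path segment looks like function.NAME or ref.NAME."""
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    return DOCREF_SEGMENT.match(last_segment) is not None


requested_urls = [
    "https://perishablepress.com/press/category/websites/feed/function.opendir",
    "https://perishablepress.com/press/category/websites/feed/function.array-rand",
    "https://perishablepress.com/press/category/websites/feed/ref.outcontrol",
    "https://perishablepress.com/press/category/websites/feed/",  # a normal URL
]

# Keep only the requests that match the suspicious shape.
suspects = [url for url in requested_urls if is_docref_request(url)]
for url in suspects:
    print(url)
```

In practice the list of URLs would come from your own 404 log rather than being hard-coded.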

Fortunately, an insightful reader named Bas pointed out that the errors were actually PHP functions. Bas explains:

The two functions (array_rand and opendir) you define as javascript functions are PHP functions. Some servers generate clickable links to the php manual (which uses function.NAMEOFFUNCTION in their URL’s) in php scripting error messages. Maybe that’s also the cause of these problems.

Using this information to investigate the issue, I learned that PHP contains a configuration directive called html_errors that “produces hypertext links that direct the user to a page describing the error or function in causing the error.” [1] Together with docref_ext and docref_root, the html_errors directive controls the presence and formatting of PHP’s docref error messages.

The error messages shaped by the html_errors directive appear in several locations, including PHP log files, user error handlers, and $php_errormsg variables. Of course, the display of error information should be disabled in the server’s PHP configuration because it can expose details that help attackers find security vulnerabilities. Live reporting of error information may be useful during development, but it is wise to disable it on production servers.
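
To see how such a link can end up as a request for a nonexistent resource, it helps to resolve it the way a crawler would. The Python sketch below assumes that, with no docref_root configured, the href in the error message is just the bare manual page name (function.opendir), which is consistent with the logged URLs. The docref_root and docref_ext values in the second half are hypothetical and only show how those two directives would change the link target.

```python
from urllib.parse import urljoin

# Page on which the PHP error message was displayed (taken from the log entries).
page_url = "https://perishablepress.com/press/category/websites/feed/"

# Assumption: with docref_root unset, the link href is the bare page name.
docref_href = "function.opendir"

# A crawler resolves the relative href against the URL of the current page.
resolved = urljoin(page_url, docref_href)
print(resolved)

# Hypothetical settings: a local copy of the manual served from /phpmanual/.
docref_root = "/phpmanual/"
docref_ext = ".html"
resolved_with_root = urljoin(page_url, docref_root + docref_href + docref_ext)
print(resolved_with_root)
```

The first result is exactly the kind of URL that shows up in the 404 log: the page URL with the function reference appended.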

robots.txt + .htaccess

Equipped with this information, we return to our previously discussed 404 errors. As we can now see, the bizarre errors that have been baffling us for many months are caused by the relative links to function references that PHP produces when html_errors is enabled. It all makes sense now. Before we realized this, we were forced to throw down a tough set of disallow rules via robots.txt to prevent search engines from following these automatically generated function links:

User-agent: *
Disallow: */function.array-rand
Disallow: */function.require
Disallow: */function.opendir
Disallow: */function.mkdir
Disallow: */ref.outcontrol
Disallow: */function.main

That seemed effective enough — at least it stopped us from seeing the symptoms of the underlying problem. Now that we have discovered the source of the misdirected links, we may focus our attention on eliminating them entirely. Once again, thanks to the functional symbiosis between Apache and PHP, we summon the mysterious powers of htaccess to forge a true solution. And, despite the confusing nature of our bizarre 404 errors, the answer is ridiculously simple:

# disable display of php errors
php_flag display_startup_errors off
php_flag display_errors off
php_flag html_errors off
php_value docref_root 0
php_value docref_ext 0

Simply copy & paste the previous code into your site’s root htaccess file and say goodbye to the pointless 404 errors. This snippet of htaccess code employs a series of php_flag directives to override the settings in your server’s php.ini file. Here, we are disabling display_startup_errors, display_errors, and html_errors. Note that php_flag and php_value are available only when PHP runs as an Apache module.

Update: I have also added two directives (docref_root and docref_ext) to disable the formatting of the HTML error links. Technically, this isn’t necessary because we are disabling the links themselves with the previous three lines, so feel free to omit these last two directives.

Using these htaccess directives is ideal for shared hosting environments where access to php.ini is not available. However, in situations where php.ini is accessible, a better solution is to adjust the configuration settings directly:

display_startup_errors = Off
display_errors = Off
html_errors = Off

Of course, it is always a good idea to maintain a private log of PHP errors. To keep an eye on our now-suppressed errors, place a copy of the following code into your root htaccess file:

# log php errors
php_flag log_errors on
php_value error_log /home/domain/private/php-errors.log

Remember to edit the path to php-errors.log and ensure that the file is writable by the server.
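
Once the log is filling up, a short script can summarize what it contains. The Python sketch below counts entries by severity level. It assumes each entry starts with a bracketed timestamp followed by "PHP <Level>:", and the sample lines are made up for illustration, not taken from a real log.

```python
import re
from collections import Counter

# Assumed line shape: "[timestamp] PHP <Level>:  message"
LEVEL = re.compile(r"^\[[^\]]+\] PHP ([A-Za-z ]+?):")


def count_levels(lines):
    """Count log entries per severity level (Warning, Notice, Fatal error, ...)."""
    counts = Counter()
    for line in lines:
        match = LEVEL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample entries standing in for the contents of php-errors.log.
sample_lines = [
    "[10-Jan-2008 09:15:02] PHP Warning:  opendir(/tmp/missing): failed to open dir in /home/domain/public/index.php on line 12",
    "[10-Jan-2008 09:15:03] PHP Notice:  Undefined variable: foo in /home/domain/public/index.php on line 20",
    "[10-Jan-2008 09:16:44] PHP Warning:  mkdir(): File exists in /home/domain/public/cache.php on line 7",
]

summary = count_levels(sample_lines)
for level, total in summary.most_common():
    print(level, total)
```

To run it against the real log, pass an open file object for your own php-errors.log path to count_levels in place of sample_lines.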

Finally, a big thanks to Bas and Sick of Debt for taking the time to respond to my article on the suspicious crawler behavior. With their insightful help, I was able to locate the information required to solve the issue. Thank you!

Footnotes
