Pimp Your 404: Presentation and Functionality
I have been wanting to write about 404 error pages for quite a while now. They have always been very important to me, with customized error pages playing an integral part in every well-rounded web-design strategy. Rather than try to re-invent the wheel with this, I think I will just go through and discuss some thoughts about 404 error pages, share some useful code snippets, and highlight some suggested resources along the way. In a sense, this post is nothing more than a giant “brain-dump” of all things 404 for future reference. Hopefully you will find it useful in pimping your own 404.
When requested page is not found by server, error message is returned; this is the essence of the 404 — Ancient Chinese proverb
What is a 404 and why should I care?
Technically, you don’t even need to think about 404 errors. They are handled automatically by the server. But the problem is that, by default, your server is going to deliver one sick-looking message:
Default Apache 404 error page
While that sort of message is fine for uber-geeky tech sites, chances are that “normal” visitors are going to crap themselves if they see such a thing, thinking:
What does that mean? Did I break something? I don’t have time for this. I’m outta here!
You don’t want that, and neither do your visitors. Granted, most visitors are experienced enough to “deal with it,” but there are many that just don’t understand. This is why many designers take the time to customize their 404 error pages to make them a little more “user-friendly” and not so bloody frightening. It’s not like we’re a bunch of robots, after all.
A nice, “user-friendly” 404 error page @
http://fryewiles.com/templateserrors/404.html
Bottom line, here are several reasons why you should give a flying flip about 404 error pages:
- Default 404 errors are ugly and scary to regular folks
- Default 404 errors result in higher bounce rates, because folks are scared
- Custom 404 pages tell the reader that you care so much about them, and that it’s “all good”
- User-friendly 404 pages draw the reader into your site, instead of scaring them away
- Default 404 errors are useless, but your custom 404 pages can actually help the user find what they are looking for
- You get the idea..
In short, a well-designed 404 page keeps the user engaged and helps build trust. It just makes for an all-around better site.
Another “user-friendly” 404 error page @
http://headscape.co.uk/404
I’m sold, how do I do it?
Here are some tried and true best practices for creating that perfect 404 error page for your site.
- Keep it familiar
- Your custom error page should look like a well-integrated, natural part of your website. One of the reasons why default 404 error pages are so hideous is that they look so alien to your visitors. Unless your site is nothing but black serif fonts on plain white background, the default just isn’t going to “blend in.” So keep it real, and make sure that your 404s (and other error pages) share the same design as the rest of your site.
- Explain the situation
- Everyone in their right mind understands that errors happen. There is no need to apologize for anything, but you do want to explain the situation. This doesn’t have to be anything too serious, just a brief sentence or two telling the user that “something happened” and that the “requested page was not found.” Try to match the tone and flow of your site. If your site is formal, best to stay that way. If your site is hilarious, don’t blow it on the 404. In general: aim for consistency.
- Provide some guidance
- A user sitting there staring at a 404 error page is obviously lost. Try to provide some guidance with a search bar, a site map, or perhaps a recipe for some strong, mixed drinks. There is a good chance that the visitor is looking for some of your popular content, so you may want to suggest a few popular site destinations. It is also helpful to include a contact form (or a link to one), so that the user may report the error or ask a question about that darling resource they couldn’t find.
- Display the requested URL
- As the user sits there scratching their head, it may be helpful to show them the URL that was received by the server. For example, the user may think that they entered http://yoursite.com/blondie/, but in reality they might have entered something like http://yoursite.com/blodie/ instead. Echoing back the originally requested URL makes it easier for the user to spot any mistakes. Later in the article, we’ll look at a nice, easy way to do this.
- Be mindful of file size
- As design guru Chris Coyier points out, keeping an eye on the overall size of your 404 page is a good idea. Your 404 will be delivered for every missing request, not just the ones triggered by your visitors. This includes everything from bizarre favicon requests and non-existent robots.txt files to missing scripts, stylesheets, images, and everything in-between. Needless to say, all of these 404s can add up quickly, so if bandwidth is an issue, be sure to keep an eye on total 404 size.
- Take advantage
- Just because life is sucking for the people who can’t find your content doesn’t mean that you can’t benefit. When visitors trigger 404 errors, collect some data about the event so you can fix the issue and prevent further occurrences. For each 404 error, there is a great deal of information that is available to you, such as the requested URL, referrer info, IP address, and much more. Once you have this data recorded somewhere, cleaning things up is much easier. Later in the article, we’ll look at some cool code snippets to help you implement some keen functionality for your 404s.
- Have some fun
- A confused visitor is a scared visitor. The 404 page is a great opportunity to lighten things up with a little humor. Nothing too condescending, but just enough to let ‘em know that you’re on their side, and that you hate being lost just as much as they do. A quick laugh can gloss over just about anything, and you never know — if your page is clever enough, it might be featured in one of those ridiculously popular 404 error-page galleries ;) So be creative, consistent, and have some fun.
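For the “Display the requested URL” tip, here is a minimal sketch of one way to echo the address back to the visitor. The function name requestedUrlForDisplay() is made up for this example, and the code assumes PHP 7 or later plus the usual HTTP_HOST and REQUEST_URI server variables. The one non-negotiable part is the escaping: the requested URL is user input, so it should never be printed raw.

```php
<?php
// Hypothetical helper (not from the original article): returns the requested
// URL, escaped so it is safe to print inside an HTML page.
function requestedUrlForDisplay(array $server): string
{
    // Not every key is guaranteed to exist, so fall back to empty strings.
    $host = $server['HTTP_HOST'] ?? '';
    $uri  = $server['REQUEST_URI'] ?? '';

    // Assumption: HTTPS is reported via the HTTPS server variable.
    $scheme = (!empty($server['HTTPS']) && $server['HTTPS'] !== 'off') ? 'https' : 'http';

    // The URL comes from the visitor, so escape it before display.
    return htmlspecialchars($scheme . '://' . $host . $uri, ENT_QUOTES, 'UTF-8');
}

// Example use inside a 404 template:
// echo '<p>We could not find <code>' . requestedUrlForDisplay($_SERVER) . '</code></p>';
```

Passing the server array in as a parameter, instead of reading $_SERVER inside the function, just makes the helper easier to try out on its own.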
Later in the article, we’ll check out some classic examples of effective and useful 404 pages. For now, let’s see how to set ‘em up..
How do I implement a custom 404 page?
That depends on how your site is set up. If you are running WordPress, simply create a theme file named 404.php and add whatever code and content you wish. Also in WordPress is the intra-page error, which is triggered when no content is found for a specific type of page view. Those familiar with the WordPress loop will recognize this as the portion of code located after the final else condition. This is not technically a 404 error (more like an intra-WordPress 404), but it may also be customized and optimized for your visitors.
This portion of loop code is responsible for the “intra-WordPress” 404 error message
For non-WordPress sites, the easiest way to override the default 404 page and deliver your own customized page is to tweak your .htaccess file with the following directive:
ErrorDocument 404 /error/404.php
This directive will replace the default 404 error page with the one specified (/error/404.php in our example). Edit the path according to the location of your custom file. Apache handles the rest. Thanks Apache.
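As for what might go inside that /error/404.php file, here is a bare-bones sketch, assuming PHP 7 or later. The notFoundBody() helper and its wording are invented for illustration. Apache should keep the 404 status for a local ErrorDocument target, so the http_response_code() call is only a safeguard for the case where someone requests the script directly.

```php
<?php
// Hypothetical helper for /error/404.php: builds a small, lightweight page body.
function notFoundBody(string $homeUrl = '/'): string
{
    // Escape the link target in case it ever comes from configuration.
    $home = htmlspecialchars($homeUrl, ENT_QUOTES, 'UTF-8');

    return "<h1>Page not found</h1>\n"
         . "<p>The requested page was not found. Try the <a href=\"{$home}\">home page</a>.</p>\n";
}

// Only emit output when served by a web server, not from the command line.
if (PHP_SAPI !== 'cli') {
    // Assumption-level safeguard: make sure the response really says 404.
    http_response_code(404);
    echo notFoundBody();
}
```

In a real page, this markup would sit inside your regular site template so that the 404 looks like the rest of the site.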
Check out my previous article for more information on customizing error messages with HTAccess. You can do much more than custom 404s!
Simply brilliant, let’s see some code snippets
As mentioned, the underlying functionality of your 404 pages is just as important as the user-friendly interface. You could have the swellest 404 on teh Web, but unless you are actively working to resolve the errors, their frequency will inevitably increase. There are many awesome ways to track errors and enhance the underlying functionality of your 404 page. Here are some of my favorites.
Smart 404 for WordPress
Michael Tyson shares this excellent method to make WordPress’ 404 handler a little bit smarter:
I changed my template’s 404 page to do a search for what the viewer was really after, and redirect them there. If it can’t find an exact match, it’ll perform a search with keywords extracted from the URL. If it finds a single result, it’ll redirect, otherwise it’ll put up a few results as suggestions on the 404 page.
Sounds nice, doesn’t it? And extremely helpful as well. Here is the code to place at the top of your active theme’s 404.php file, immediately before the get_header() call:
<?php global $wp_the_query;
$search = preg_replace(array("@[_-]@", "@\.html$@"),array(" ",""),urldecode(basename($_SERVER["REQUEST_URI"])));
$posts = $wp_the_query->query(array("name" => $search));
if (count($posts) == 1) {
    wp_redirect(get_permalink($posts[0]->ID), 301);
    exit();
}
$posts = $wp_the_query->query(array("s" => $search));
if ( count($posts) == 1 ) {
    wp_redirect(get_permalink($posts[0]->ID), 301);
    exit();
} ?>
That’s the juice, right there. No editing required. Then, once that is in place, you can provide the user with some helpful suggestions. Just place this code somewhere within the <body> of your 404.php page:
<?php if (count($posts) > 0) : ?>
<ul>
    <?php foreach ($posts as $post) : ?>
    <li><a href="<?php echo esc_url(get_permalink($post->ID)); ?>"><?php echo esc_html($post->post_title); ?></a></li>
    <?php endforeach; ?>
</ul>
<?php endif; ?>
With that code in place, the user will see a list of potentially relevant posts that may include something useful or of interest. There is quite a bit that can be done with this technique, so have some fun with it.
Automatic email alerts with all the trimmings
Delivering user-friendly 404 pages to your visitors is great, but don’t let that stop you from eliminating as many lost pages as possible. Cleaning up loose ends makes your site tighter, cleaner, and more usable. One way to keep an eye on your 404 errors is to simply crack open a log and dig in. In other situations, you may prefer to have the error information emailed to you in real-time.
Here is a sweet little script that will do just that — send you an informative email every time a user triggers a 404 error. The email will contain a ton of related information, including everything from IP address and server name to requested URI and user agent. This strategy isn’t recommended for high-volume sites, but for smaller blogs and niche sites, it may be just what the 404 doctor ordered.
Here is the script that makes it happen:
<?php
// 404 auto-mailer script from Perishable Press
function errorEmailAlerts() {
    header("HTTP/1.1 404 Not Found");
    header("Status: 404 Not Found");

    // configure next two lines with your info
    $site  = "Your Site Name";
    $email = "your-email@address.com";

    // gather some data (not every value is set on every request,
    // so default to an empty string; the ?? operator requires PHP 7+)
    $http_host    = $_SERVER['HTTP_HOST'] ?? '';
    $server_name  = $_SERVER['SERVER_NAME'] ?? '';
    $remote_ip    = $_SERVER['REMOTE_ADDR'] ?? '';
    $remote_host  = $_SERVER['REMOTE_HOST'] ?? '';
    $request_uri  = $_SERVER['REQUEST_URI'] ?? '';
    $cookie       = $_SERVER['HTTP_COOKIE'] ?? '';
    $http_ref     = $_SERVER['HTTP_REFERER'] ?? '';
    $query_string = $_SERVER['QUERY_STRING'] ?? '';
    $user_agent   = $_SERVER['HTTP_USER_AGENT'] ?? '';
    $error_date   = date("D M j Y g:i:s a T");

    // prepare teh email (extra mail headers are separated by CRLF)
    $subject  = "404 Alert";
    $headers  = "Content-Type: text/plain"."\r\n";
    $headers .= "From: ".$site." <".$email.">"."\r\n";
    $message  = "404 Error Report for ".$site."\n";
    $message .= "Date: ".$error_date."\n";
    $message .= "Requested URL: http://".$http_host.$request_uri."\n";
    $message .= "Query String: ".$query_string."\n";
    $message .= "Cookie: ".$cookie."\n";
    $message .= "Referrer: ".$http_ref."\n";
    $message .= "User Agent: ".$user_agent."\n";
    $message .= "IP Address: ".$remote_ip." - ".$remote_host."\n";
    $message .= "Whois: http://ws.arin.net/cgi-bin/whois.pl?queryinput=".$remote_ip;

    // send teh email
    mail($email, $subject, $message, $headers);
} ?>
Include this script in your active theme’s functions.php file and edit the first two variables with your specific information. Then, place the following function call at the very top of your theme’s 404.php file:
<?php errorEmailAlerts(); ?>
Once in place and properly configured, this function will ensure that the proper 404 header is sent to the client, collect as much useful information as possible, and then send you an email with all the trimmings. Larger, high-volume sites may want to consider using a more robust method of logging and analyzing error data, but smaller, low-traffic sites and blogs will certainly benefit from tracking their 404 errors in real time.
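For sites where an email per error would be too much, one possible alternative is to append each 404 to a log file and review it later. The following is only a sketch, assuming PHP 7 or later. The names format404LogLine() and log404(), the tab-separated layout, and the choice of fields are all made up for this example, so adjust them to taste.

```php
<?php
// Hypothetical alternative to emailing: write one tab-separated line per 404.
function format404LogLine(array $server, int $timestamp): string
{
    $fields = [
        date('c', $timestamp),             // ISO 8601 date
        $server['REMOTE_ADDR'] ?? '-',     // visitor IP address
        $server['REQUEST_URI'] ?? '-',     // what was requested
        $server['HTTP_REFERER'] ?? '-',    // where the visitor came from
        $server['HTTP_USER_AGENT'] ?? '-', // browser or bot
    ];

    // Replace tabs and line breaks so each event stays on a single line.
    $clean = array_map(function ($value) {
        return preg_replace('/[\t\r\n]+/', ' ', (string) $value);
    }, $fields);

    return implode("\t", $clean) . "\n";
}

// Appends the line to $logFile; returns false if the write fails.
function log404(array $server, string $logFile): bool
{
    $line = format404LogLine($server, time());

    // FILE_APPEND adds to the end of the file; LOCK_EX avoids interleaved writes.
    return file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX) !== false;
}

// Example use at the top of 404.php (the path is a placeholder):
// log404($_SERVER, '/path/to/404.log');
```

Keep the log file outside of the public web root, since it will contain visitor IP addresses and referrer data.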
Whitelist specific user-agents and URL-request patterns
As you begin to monitor your errors, you will inevitably discover three different types of 404:
- 404 errors caused by malicious behavior (exploit scanning, etc.)
- 404 errors caused by benign requests for non-existent resources
- legitimate 404 errors caused by missing resources or mistyped URLs
Of these three types of 404 errors, the first is by far the most common. The Web is full of unscrupulous bastards who insist on spamming, cracking and exploiting even the humblest of sites. Here at Perishable Press, I devote many resources to the fighting of this type of malicious behavior, including blacklists, scripts, and education.
For the second type of 404 error, we are referring mainly to persistent requests made by otherwise benign scripts that check for the presence of legitimate, proprietary extensions, files, and other things. A good example of this is seen in location mapping systems, which typically request the following strings when crawling your site:
http://domain.tld/path/resource/_vpi.xml
http://domain.tld/path/resource/_vti_bin/
These resources exist on servers that have been configured to accommodate visitors running programs such as Weblin. When they are discovered on a domain, they facilitate the discovery of site resources; when they are not found on a domain, the requests result in 404 errors. These errors can be quite numerous and may appear malicious.
In addition to specific file requests, there are also certain user agents that may exhibit unusual behavior when crawling your site. Good examples of this include bots that don’t understand fragment identifiers and search engines that try guessing for the presence of expected resources.
To prevent these sorts of benign requests from polluting your 404 monitoring process, you can either blacklist them or add them to a whitelist. To whitelist select agents, we can insert the following snippet into our previous errorEmailAlerts() function:
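function errorEmailAlerts() {
    // read these straight from $_SERVER, because the variables used
    // in the rest of the function have not been set at this point
    $user_agent  = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $request_uri = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';
    if (
        ($user_agent != 'ia_archiver') &&
        ($user_agent != 'Yahoo! Slurp') &&
        (strpos($request_uri, '/_vpi.xml') === false) &&
        (strpos($request_uri, '/_vti_bin') === false)
    ) {
        // function contents go here
    }
}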
Once in place, this conditional statement will check for any “whitelisted” user agents and/or request strings in the URL. You can easily whitelist additional items by emulating the existing code. I am thinking this is self-explanatory, but don’t hesitate to ask any specific questions in the comments section.
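If the whitelist starts to grow, one option is to move the checks into a small helper that loops over two arrays. This is a sketch only, assuming PHP 7 or later, and isWhitelisted404() is a made-up name. Note one deliberate difference from the snippet above: it matches user agents by substring rather than by exact comparison, on the assumption that real user-agent strings usually carry extra details such as version numbers and URLs.

```php
<?php
// Hypothetical helper: returns true when a 404 should be skipped by the alert script.
function isWhitelisted404(string $userAgent, string $requestUri): bool
{
    // Substrings to look for; these mirror the examples used in the article.
    $agents   = ['ia_archiver', 'Yahoo! Slurp'];
    $patterns = ['/_vpi.xml', '/_vti_bin'];

    // Case-insensitive substring match against the user agent.
    foreach ($agents as $agent) {
        if (stripos($userAgent, $agent) !== false) {
            return true;
        }
    }

    // Case-sensitive substring match against the requested URI.
    foreach ($patterns as $pattern) {
        if (strpos($requestUri, $pattern) !== false) {
            return true;
        }
    }

    return false;
}

// Example use at the top of errorEmailAlerts():
// if (isWhitelisted404($_SERVER['HTTP_USER_AGENT'] ?? '', $_SERVER['REQUEST_URI'] ?? '')) { return; }
```

Adding another agent or request pattern is then a matter of adding one more string to the relevant array.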
Fabulous, let’s see more examples of great 404 pages
Finally, let’s wrap things up with a look at some great examples of 404 pages. Here are some of my favorite 404 error pages from around the Web.
http://patterntap.com/404
http://www.expansionbroadcast.com/404
http://slonky.com/404
https://monzillamedia.com/404
http://www.productplanner.com/404
http://thehpage.com/404
http://wakinglimb.com/404/
http://24-7media.de/404
http://southcreative.com.au/404.shtml
http://trentwalton.com/404
That’s all folks!
18 responses to “Pimp Your 404: Presentation and Functionality”
Jeff: excellent article as usual. Couple of heads-ups:
1) The full-blown Apache/Wordpress 404 error page is typically only served when the primary URL is borked, not when individual assets of the page can’t be found. In the latter case a 404 header is returned, but it relates only to the request for that component.
2a) WordPress users have a couple of additional considerations. The .htaccess rules that WordPress generates for a permalink setup will cause any request for a file that can’t be found in the file system to be redirected to index.php. As a default action this is dumb. Unless your WordPress theme is explicitly capable of (or designed for) monitoring and handling requests for assets existent or otherwise, such requests are a waste of your server’s resources. Your Apache is perfectly capable of handling the failed asset request without waking PHP out of its slumber, or even breaking into a sweat (by kicking-in the WP engine).
2b) WordPress’ ‘is_404()’ function is quaint. Properly used, it’s darned useful. But although it will send the ‘HTTP 1.x 404 Not Found’ header, it doesn’t (at time of writing) send the ‘Status: 404 Not Found’ header, which an ancient gremlin in my head keeps reminding me is required practice. I build-in a check for ‘is_404()’ and add the extra status header. I also inject my own meta robots deflectors when 404 is true, since I don’t want that bad boy in anyone’s index.
So I suggest WP users with permalink setups add a line to the WP .htaccess rule set:
Add this line:
RewriteCond %{REQUEST_URI} !\.(ico|jpe?g|gif|png|js|css)$ [NC]
Immediately before these:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
That new line covers a few assets, others might add some. YMMV. Easy to test the effect, fire up your real-time performance monitor (Task Manager?) and compare server load with the line active, and with it commented out. QED.
On the other hand, I might be freakin’ wrong. Guess I’ll find out soon…
Thanks to share this, was looking for this :)
This post helps a lot, thanks! I’m not too sure about people being actually ‘scared’ (“Default 404 errors may result in a higher bounce rate, because people are scared”), it’s more likely they think your website is broken ;-)
This is something i had long realized as important. I so hate those canned messages. This is the 404 page i’ve been using for a while now:
http://www.azevedoecastro.eu/something.htm
It does its job breaking the ice.
MAC :)
Excellent excellent work! I am gonna start on mine as soon as possible. Thanks for the tips and the code!
Very creative idea. thanks a lot for posting a such tremendous posting
Wow. This is one of the most all-encompassing post on 404 error pages for WP-powered sites. I’ve seen plenty of featured 404 designs on other sites but this is one of the rare instances where the author actually dissects the problem and inner workings of 404, coupled with useful tips to create a more user-friendly 404 page. Thanks Jeff, yet again you’ve churned out a wonderful article.
I love the recommended posts for smart 404 pages. Majority of the 404 pages usually include a message, sometimes an image, and then a search field and that’s the end of the story. Rendering a list of posts will help users to re-orientate themselves after crashing into a 404 page :) that’s a very smart tip! Just one question though – since you’ve mentioned that WP will redirect failed requests to the 404 page, and that we’re calling entries from the database, will that overload the server? Not a very major issue though – since probably the index.php will overload the server first during a traffic spike, or the post.php / page.php.
Have a great weekend! Sorry for not dropping by lately, university has been sucking the life outta me :P hope you’re doing good!
@JeffM: Great info. Thanks for sharing and making the article even better. On your points:
1) Yes, that is correct, the actual 404 error pages are only served for missing pages, not individual assets such as CSS/JS files, images, and so on. For these, the server will still return a 404 response, which enables us to monitor missing files via error logs et al.
2a) Good call. This is an unfortunate side-effect of ignoring directories and files that actually exist in order for the permalink rules to be applied only for the generated URLs specified in the database. Otherwise, life on planet WordPress would be quite chaotic, confusing, and rather impossible. Thankfully, your recommended directive virtually eliminates this issue.
2b) Indeed, the 404 Status header is definitely something that needs to be integrated into any theme’s 404 handling process. Unfortunately, 99% of themes out there simply use the default WordPress 404 page (or some derivative thereof), which is easily verified by running a free online site checker such as the duplicate content checker at Virante.com (404 link removed 2014/02/14). Along with the requisite meta tags, adding the status header to your theme’s 404.php page is essential to keeping duplicate content out of the search engines.
Thanks again for taking the time to elaborate on these important points. I am looking forward to checking the results of adding your recommended htaccess directive. ;)
@kodegeek: My pleasure — glad you enjoyed it :)
@Leon: yes, I suppose that was a bit over the top — although some people do scare quite easily.
@Miguel Azevedo e Castro: Nice one! Looks like you’re staying plenty busy!
@Bene, @yustana: You’re welcome! Thanks for the feedback.
Hi Teddy, thanks for dropping by! Always nice to see you online :)
I am glad you enjoyed the article, and will try to answer your question about overloading the server. Yes, it is a possibility, but as JeffM points out, there is an easy way to eliminate a great deal of 404s for individual page assets. By adding the htaccess directive to your permalink rules, you ensure that WordPress ignores any requests for CSS/JS files, images, and so on. This alone will save resources, but it is also important to optimize your theme’s 404.php page. Because it is served for every missing URL under your WordPress directory, it is a good idea to keep it as lightweight as possible (but not at the expense of usability!). At one point, the number of 404s was so great here at Perishable Press (mostly because of spammers, crackers, et al), that I removed the 404.php file completely and just let Apache deliver its ultra-lightweight default error page. Then, once I gained better control of the malicious requests, things calmed down and I reinstated the 404.php file.
I hope you are doing well! Best of luck to you at the University! :)
Jeff:
Thanks for that. Can we just make it crystal clear that the ‘Status’ we’re referring to is a server response header that’s in addition to the HTTP 1.x header:
if ( is_404() ) header('Status: 404 Not Found');
or similar.
Use FF’s ‘LiveHeaders’ or IE’s ‘ieHTTPHeaders’ addons to watch the request/response dialogues.
On Redirection:
My philosophy on 404 is this: it’s a dead end, so deal with it in as lightweight a fashion as possible. Pragmatic, huh?
I never automatically redirect, to ensure that when the 404 error page hits the browser, the borked URL is visible in the address bar. With a nice custom 404 like those in the gallery above, your visitor has landed in your safety net, and your job is done at a minimum cost in resources. If your custom page has done its job, all the visitor needs is a text link to a fairly static page like ‘Home’ and they’ll surely come to roost.
‘Smart’ responses sometimes go too far, and are a waste of human resources IMHO ;)
Greetings,
I really liked this article. I will be using it for reference.
There was one site that I do not remember the url of, but which had an amazing 404 system. Each time you went to the 404 page it displayed a different page. Each one had a comedy conversation between the server and the browser (or something like that). We kept on going back to it over and over just to read the 404 page!