Given my propensity to discuss matters involving error log data (e.g., monitoring malicious behavior, setting up error logs, and creating extensive blacklists), I am often asked about the best way to go about monitoring 404 and other types of server errors. While I consider myself to be a novice in this arena (there are far brighter people with much greater experience), I do spend a lot of time digging through log entries and analyzing data. So, when asked recently about my error monitoring practices, I decided to share my response here at Perishable Press, and hopefully get some good feedback concerning best practices for error monitoring. Here is my email response to the question:
I don’t know if there is a “best” way to monitor error logs. There are many different ways to do it, and you just have to pick the method(s) that work best for you and provide the information you need. For me, the process is a bit of an art form, with the end result being an improved understanding of the traffic patterns relating to my domains.
That said, the methods I use to track errors are threefold:
- First, I keep a close eye on PHP errors with a few lines in my HTAccess file (Linux/Apache)
- I also like to watch my Apache error log, which is available through cPanel on a shared host
- Most importantly (to me), I keep a tight watch on all 404 errors via a custom 404 script
If you would like more information on any of these methods, let me know and I will do my best to share them with you.
As for monitoring and analyzing, I do everything manually, line by line. I manage a large number of sites and spend around two hours per week checking things out, taking notes, and fixing or locking things down as needed. I began this process around four years ago with a single site. As I got better at analyzing the 404 error log (which includes URL, referrer, query string, host, IP, and several other metrics), the number of sites increased as well. Many people prefer to automate the process using Excel or some other software, but I take great pleasure in a more organic approach. The key is to remember that practice improves skill (as I am sure you already know) and will enable you to scan thousands of log entries in a very short time.
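For anyone who would rather automate a first pass before reading line by line, here is a minimal Python sketch that tallies 404 entries by IP and by URL. It is not the method described above, which is entirely manual. The tab-separated layout, the field order in `FIELDS`, and the function names are all assumptions made for illustration; adjust them to whatever your own 404 log actually records.

```python
from collections import Counter

# Hypothetical field order: the real layout of the 404 log is not given here.
FIELDS = ("ip", "host", "url", "query", "referrer")


def parse_entry(line):
    """Split one tab-separated log line into a dict, or return None if malformed."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def summarize_404s(lines, top=5):
    """Count 404 hits per IP and per URL and return the most frequent of each."""
    by_ip = Counter()
    by_url = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue  # skip lines that do not match the assumed layout
        by_ip[entry["ip"]] += 1
        by_url[entry["url"]] += 1
    return by_ip.most_common(top), by_url.most_common(top)
```

Passing an open file object (or any iterable of lines) to `summarize_404s` returns two short lists of `(value, count)` pairs, which can point you at the IPs and URLs worth inspecting by hand.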
[Editor’s Note: Michael had also inquired about a sequence of 404 errors that he had sent.]
As for the error log entries you sent, I have the following observations:
First of all, the pattern suggests an automatic/mass downloader or crawler of some sort (obviously). It doesn’t look like a typical crawler following links, as other types of resources were also requested (e.g., JS and RSS files). Whatever it is, it isn’t friendly and should be blocked. Before doing so, I would reverse-lookup the IP/Host information to see if you can gain any insight into what it is or what it’s doing. I see entries like this all the time, and would certainly feel justified blacklisting the related IP address.
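The reverse lookup mentioned above can be done from the command line or a web tool, but it can also be scripted. The following is a small Python sketch using the standard library's `socket.gethostbyaddr`; the `reverse_lookup` name and the injectable `resolver` parameter are my own illustrative choices, and the IP in the usage note is a documentation-range placeholder, not one from the entries discussed.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the reverse-DNS hostname for an IP address, or None if the lookup fails."""
    try:
        # gethostbyaddr returns (hostname, alias_list, ip_address_list)
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        return None
    return hostname
```

Calling `reverse_lookup("203.0.113.7")` returns a hostname string when a reverse record exists and `None` otherwise. Keep in mind that reverse records are set by whoever controls the IP block, so treat the result as a hint about the visitor rather than proof of identity.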
I hope this helps, Michael. Feel free to ask any additional questions; analyzing error logs is one of my favorite activities!