How to Verify the Four Major Search Engines

Keeping track of your access and error logs is a critical component of any serious security strategy. Many times, you will see a recorded entry that looks legitimate enough to be dismissed as genuine Google fare, only to discover upon closer investigation that it came from a fraudulent agent. There are many such cloaked or disguised agents crawling around these days, mimicking various search engines to stay beneath the radar. Thus, it is a good idea to implement a procedure for scanning and checking select agents for authenticity. In general, the verification process involves a “reverse/forward” DNS lookup, the results of which are then cross-checked against the search engine in question. Let’s have a quick look at how to do this.

First, visit and bookmark the following articles (and/or this article). These resources explain how to identify and verify the agents for each of the four major search engines: Google, Yahoo!, MSN/Live, and Ask.

After reading up on each of these recommended verification techniques, the moral of the story becomes crystal clear: the best way to verify any questionable IP address is to perform a “reverse/forward” DNS lookup. Indeed, this is an excellent technique for investigating any suspicious behavior happening at your domain. For many, reverse/forward DNS lookups are common practice and an important part of any serious security strategy. For those unfamiliar with the technique or otherwise interested in refreshing those critical skills, here is a quick tutorial covering the basics.

Reverse DNS Lookup

After gathering information for the agent in question, locate a decent reverse-DNS lookup service (such as this one) and look up the suspect’s IP address to determine the registered host name of the agent’s machine. For example, a reverse lookup on 74.6.28.79 should return data similar to the following:

Hostname: lj511156.crawl.yahoo.net
Addr: 74.6.28.79

Forward DNS Lookup

Finally, locate a suitable forward-DNS lookup service such as this one and enter the host name determined in the previous step. Verify the results by comparing the returned IP address with the original IP address entered for the reverse lookup, and check that the host name belongs to the search engine’s own domain (crawl.yahoo.net in the example above). The IP addresses should be identical. If they are not, the suspected agent is indeed bogus and should be dealt with swiftly and without mercy. In the next section, I provide an easy way to block subsequent access for any sneaky little bastards that you may happen to find.
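For those who would rather script the two lookups than paste addresses into web forms, here is a minimal sketch of the same reverse/forward check in PHP. It relies on PHP's built-in gethostbyaddr() and gethostbynamel() functions; the function name verify_crawler_ip() and its parameters ($allowedSuffixes, $reverse, $forward) are hypothetical names made up for this example. Only crawl.yahoo.net comes from the lookup data above, so the domain suffixes for other engines are an assumption you would need to confirm against each engine's own documentation. Note also that gethostbynamel() returns IPv4 addresses only, so this sketch does not cover IPv6 visitors.

```php
<?php
/**
 * Reverse/forward DNS check (sketch, not a definitive implementation).
 *
 * $allowedSuffixes: domains the host name is expected to end with (assumption).
 * $reverse/$forward: lookup functions, swappable so the logic can be tested
 *                    without live DNS (hypothetical parameters).
 */
function verify_crawler_ip($ip, array $allowedSuffixes, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel')
{
    // Reject anything that is not a syntactically valid IP address.
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return false;
    }

    // Step 1: reverse lookup. gethostbyaddr() returns the unmodified IP
    // when no host name is found, and false on malformed input.
    $host = call_user_func($reverse, $ip);
    if ($host === false || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Step 2: the host name must end with one of the expected domains.
    $domainOk = false;
    foreach ($allowedSuffixes as $suffix) {
        $suffix = '.' . ltrim(strtolower($suffix), '.');
        if (substr($host, -strlen($suffix)) === $suffix) {
            $domainOk = true;
            break;
        }
    }
    if (!$domainOk) {
        return false;
    }

    // Step 3: forward lookup. gethostbynamel() returns an array of IPv4
    // addresses, or false if the name cannot be resolved.
    $addresses = call_user_func($forward, $host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}

// Usage with the Yahoo! example above (requires working DNS):
// var_dump(verify_crawler_ip('74.6.28.79', array('crawl.yahoo.net')));
```

The function returns true only when all three steps agree: the address resolves to a host name, the host name sits under an expected domain, and the host name resolves back to the original address. A false result means the agent failed verification, but it can also mean a DNS lookup timed out, so it may be wise to retry before reaching for the ban hammer.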

Block via htaccess

Blocking site access via htaccess is one of my favorite pastimes. Once you have determined the IP address(es) that you would like to block, edit the following code to match your numbers and copy it to the root htaccess file of your site. A partial address such as 111.111.111 blocks every address that begins with those three octets. Add as many or as few addresses as you need to stop the spam worms from digging through your business.

# Apache 2.2 syntax; on Apache 2.4 these directives require mod_access_compat
<Limit GET POST PUT>
 Order Allow,Deny
 Allow from all
 Deny from 111.111.111
 Deny from 222.222.222
 Deny from 333.333.333
</Limit>

Block via PHP

On the other hand, you may prefer to employ a quick bit of PHP to block the IP addresses of illegitimate children of incestuous cave-dwelling australopithecines. Here is the code required to do so; simply edit the array of IP addresses to suit your needs and place it at the top of any PHP file to which you would like to deny access, before any output is sent. For WordPress users, a great choice for this would be your theme’s header.php file.

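<?php
// Placeholder prefixes or full addresses to deny; edit to suit your needs.
$deny = array("111.111.111", "222.222.222", "333.333.333");
foreach ($deny as $entry) {
   // Compare whole octets, so "111.111.111" matches 111.111.111.* only.
   if (strpos($_SERVER['REMOTE_ADDR'] . '.', rtrim($entry, '.') . '.') === 0) {
      header("Location: http://www.google.com/");
      exit();
   }
} ?>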

For more information on blocking IP addresses with PHP, check out our aptly named article, How to Block IP Addresses with PHP.