How to Verify the Four Major Search Engines

Keeping track of your access and error logs is a critical component of any serious security strategy. Often you will see a log entry that looks legitimate enough to be dismissed as genuine Google fare, only to discover on closer inspection that it came from a fraudulent agent. There are many such cloaked or disguised agents crawling around these days, mimicking various search engines to stay under the radar.

So it’s always a good idea to have a procedure for checking select agents for authenticity. In general, the verification process involves a “reverse/forward” DNS lookup, which is then cross-checked against the search engine in question. So if you want to verify the four major search engines (Google, Bing, Yahoo!, and Ask), or anything else for that matter, this quick tutorial will show you how it’s done.

Intro

The best way to verify any questionable IP address is to perform a “reverse/forward” DNS lookup. It is an excellent technique for investigating suspicious activity at your domain. For many, reverse/forward DNS lookups are common practice and an important part of any serious security strategy. For those unfamiliar with the technique, or interested in refreshing those critical skills, here are the steps.

Step 1: Reverse DNS Lookup

To do a reverse DNS lookup, you need the IP address of whatever it is that you want to investigate. For example, if I find some rogue bot terrorizing my site’s access logs, I copy its IP address to my clipboard for further analysis. So one way or another, you will need an actual IP address in order to do a reverse lookup of the associated DNS information.
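
If copying addresses by hand gets tedious, the IPs of every request that claims to be a given bot can be pulled from the log with a few lines of PHP. This is only a rough sketch: it assumes the server writes the Apache “combined” log format, and the function names are made up for illustration.

```php
<?php
// Sketch only: assumes the Apache "combined" log format. Adjust the
// pattern if your server writes a different format.

/**
 * Pull the client IP and user agent out of one combined-format log line.
 * Returns null when the line does not match the expected layout.
 */
function pp_parse_log_line(string $line): ?array
{
    // ip ident user [time] "request" status bytes "referer" "user agent"
    $pattern = '/^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null;
    }
    return array('ip' => $m[1], 'agent' => $m[2]);
}

/**
 * Collect the unique IPs of every request whose user agent claims to be
 * the given bot (case-insensitive substring match on the agent string).
 */
function pp_claimed_bot_ips(array $lines, string $botName): array
{
    $ips = array();
    foreach ($lines as $line) {
        $entry = pp_parse_log_line($line);
        if ($entry !== null && stripos($entry['agent'], $botName) !== false) {
            $ips[$entry['ip']] = true; // array keys de-duplicate
        }
    }
    return array_keys($ips);
}
```

Feed it the lines of your access log (for example via file() with FILE_IGNORE_NEW_LINES) and a name such as 'googlebot'. The result is only a list of candidates; each address still needs the reverse lookup described in this step.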

Once equipped with an IP address, locate a decent reverse DNS lookup tool. There, enter the suspect’s IP address (e.g., 66.249.66.1) in the “IP Address” box, and then click “Find Host Name”. That will return the registered hostname along with some other information, which should look something like this:

IP Address : 66.249.66.1
Location   : United States (95% accuracy)
Host Name  : crawl-66-249-66-1.googlebot.com

That’s the money right there. Now we want to verify that the hostname isn’t spoofed, meaning that the hostname resolves back to the same IP address.

Step 2: Forward DNS Lookup

To verify that the hostname is not spoofed, revisit the DNS lookup page. There, enter the hostname (e.g., crawl-66-249-66-1.googlebot.com) in the “Host Name” box, and then click “Find IP Address”. That will return the registered IP address along with some other information, which should look something like this:

Host Name  : crawl-66-249-66-1.googlebot.com
IP Address : 66.249.66.1
Location   : United States (94% accuracy)

And that’s all there is to it. The actual verification part happens when you compare the results to make sure that everything matches up.

The IP address and hostname returned by the reverse lookup should be identical to the IP address and hostname returned by the forward lookup, and the hostname should belong to the search engine’s own domain (googlebot.com in the example above). If either check fails, the suspect bot/agent is bogus and should be dealt with swiftly and without mercy. In the next section, I provide two easy ways to block subsequent access for any sneaky little bastards that you may happen to find. And of course, if there is any doubt, try the reverse/forward lookup again using one or two different DNS lookup tools.
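
For anyone who would rather script the check than click through a lookup tool, here is a minimal PHP sketch of the same two lookups plus the comparison. It uses PHP’s built-in gethostbyaddr() and gethostbynamel(), which handle IPv4 only in the forward direction. The function name, the list of allowed domain suffixes, and the swappable resolver parameters are my own assumptions for illustration, not part of any official verification API.

```php
<?php
// Sketch only: the function name, parameters, and injectable resolvers
// are illustrative assumptions.

/**
 * Forward-confirmed reverse DNS check for an IPv4 address.
 *
 * $suffixes lists the domains the hostname may end in, for example
 * array('googlebot.com') for the Googlebot hostname used in this article.
 * $reverse and $forward default to PHP's built-in DNS functions and can
 * be swapped out (for testing or for a different resolver).
 */
function pp_verify_crawler(
    string $ip,
    array $suffixes,
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the unchanged IP
    // when no hostname is found and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === '' || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // The hostname must belong to one of the expected domains.
    $inDomain = false;
    foreach ($suffixes as $suffix) {
        $suffix = '.' . strtolower(ltrim($suffix, '.'));
        if (substr($host, -strlen($suffix)) === $suffix) {
            $inDomain = true;
            break;
        }
    }
    if (!$inDomain) {
        return false;
    }

    // Step 2: forward lookup. gethostbynamel() returns a list of IPv4
    // addresses, or false when the name does not resolve.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

A call such as pp_verify_crawler('66.249.66.1', array('googlebot.com')) performs live DNS queries, so the answer depends on your resolver at the time. The suffix is compared with a leading dot so that a hostname like googlebot.com.example.net does not slip through.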

Optional: Blocking the IP Address

Detecting malicious activity and blocking site access via .htaccess is one of my favorite pastimes. So if you find some nasty bot that you want to block from accessing your site, you can make it happen via .htaccess (recommended) or PHP (solid technique but not as fast).

Block via .htaccess

Once you have determined the IP address(es) that you would like to block, edit the following code to match, and then copy to the root .htaccess file of your site. Add as many or as few addresses as needed to stop bad bots and spam worms from digging through your business.

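# Apache 2.2-style access control (on Apache 2.4 this needs mod_access_compat).
# A partial address such as 111.111.111 blocks every IP that begins with it.
<Limit GET POST PUT>
	Order allow,deny
	Allow from all
	Deny from 111.111.111
	Deny from 222.222.222
	Deny from 123.123.123
</Limit>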

Block via PHP

If .htaccess is not an option, you can employ a quick bit of PHP to block the IP addresses of any incestuous cave-dwelling australopithecines that may be asking for it. Here is the code required to make it happen:

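<?php
// in_array() needs an exact match, so list full IP addresses here (partial prefixes never match).
$deny = array("111.111.111.111", "222.222.222.222", "123.123.123.123");
if (in_array($_SERVER['REMOTE_ADDR'], $deny, true)) {
   // Send the blocked visitor elsewhere and stop processing.
   header("Location: http://www.google.com/");
   exit();
} ?>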

Simply edit the array of IP addresses to suit your needs and place the code at the top of any PHP file to which you would like to deny access. For WordPress users, a great choice for this would be your theme’s header.php file, or even better, write a quick function and hook it into an early action such as init.
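
As a rough idea of what the WordPress version might look like, here is a sketch that hooks into init and also accepts address prefixes, similar to the partial addresses in the .htaccess example. The function names and addresses are placeholders of my own, and the trailing-dot prefix convention is an assumption rather than anything WordPress defines.

```php
<?php
// Sketch only: pp_ip_is_denied() and pp_block_denied_ips() are
// hypothetical names, and the addresses are placeholders.

/**
 * True when $ip matches the deny list. Entries ending in "." are treated
 * as prefixes ("111.111.111." covers 111.111.111.0 to 111.111.111.255);
 * all other entries must match the full address exactly.
 */
function pp_ip_is_denied(string $ip, array $deny): bool
{
    foreach ($deny as $entry) {
        if (substr($entry, -1) === '.') {
            if (strncmp($ip, $entry, strlen($entry)) === 0) {
                return true;
            }
        } elseif ($ip === $entry) {
            return true;
        }
    }
    return false;
}

/** Runs on the WordPress init action and refuses denied visitors. */
function pp_block_denied_ips(): void
{
    $deny = array('111.111.111.', '222.222.222.222');
    $ip = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
    if (pp_ip_is_denied($ip, $deny)) {
        wp_die('Forbidden', 'Forbidden', array('response' => 403));
    }
}

// Register the hook only when WordPress is loaded, so the file can also
// be included outside WordPress without a fatal error.
if (function_exists('add_action')) {
    add_action('init', 'pp_block_denied_ips');
}
```

This could live in a theme’s functions.php or a small plugin. Keeping the trailing dot on prefixes matters: without it, "111.111.11" would also match 111.111.110.x and 111.111.111.x.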

For more information on blocking IP addresses with PHP, check out our aptly named article, How to Block IP Addresses with PHP.

Jeff Starr