Perishable Press

CLI Forward-Reverse Lookup

[ The circle is now complete. ]

In previous posts, I’ve explained how to verify the identity of search engines and other bots by doing a reverse lookup on the IP address to get the host name, and then a forward lookup on that host name to cross-check the IP address. This is often referred to as a forward-reverse lookup. There are plenty of free online tools for performing forward-reverse IP/host lookups, and online tools are great, but it’s also possible to do forward/reverse lookups directly via the command line (CLI). Depending on your workflow, lookups via the command line can be much faster.

Forward-Reverse Lookup via Command Line

For this, you can use whatever command-line (CLI) tool you normally use. I’m on a Mac, so I mostly use Terminal.

Step 1: Reverse Lookup

Whatever tool you’re using, open it and enter the following command:

host 64.207.179.70

..which returns the correct domain name for my server:

70.179.207.64.in-addr.arpa domain name pointer monzillamedia.com.

Here we are using the host command to perform a reverse DNS lookup for the IP address of my own server, 64.207.179.70. You can of course use any valid IP address for this step.
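As an aside, the reversed name in that output is mechanical: for IPv4, the four octets are flipped and .in-addr.arpa is appended. Here is a minimal sketch of how you could build it yourself (the ptr_name helper is my own illustration, not part of the host command):

```shell
# Build the in-addr.arpa name for an IPv4 address by reversing its octets.
# The ptr_name helper name is illustrative only.
ptr_name() {
  echo "$1" | awk -F. '{print $4 "." $3 "." $2 "." $1 ".in-addr.arpa"}'
}

ptr_name 64.207.179.70   # prints: 70.179.207.64.in-addr.arpa
```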

Step 2: Forward Lookup

Next, we want to verify that the domain name matches the IP address used in step 1. To do this, we perform a forward DNS lookup for the returned domain name, again using the host command:

host monzillamedia.com

..which returns the correct IP address for my server:

monzillamedia.com has address 64.207.179.70

And so the circle is now complete: from IP address to domain name, and from domain name back to IP address. The identity is verified only if everything matches up. If either the IP address or the host name does not match, the identity is not confirmed and should be investigated further.
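If you run these checks often, the two steps are easy to script. Here is a rough sketch, assuming a POSIX shell with the host, awk, sed, and grep commands available; the parse_ptr and fwd_rev_check function names are my own:

```shell
#!/bin/sh
# Pull the host name out of a reverse-lookup result line like the one shown
# in Step 1, dropping the trailing dot.
parse_ptr() {
  awk '/domain name pointer/ {print $NF}' | sed 's/\.$//'
}

# Full forward-reverse check: reverse-resolve the IP, then confirm the
# returned host name resolves back to the same IP address.
fwd_rev_check() {
  ip="$1"
  name=$(host "$ip" | parse_ptr)
  [ -n "$name" ] || { echo "no PTR record for $ip"; return 1; }
  if host "$name" | awk '{print $NF}' | grep -qxF "$ip"; then
    echo "verified: $ip <-> $name"
  else
    echo "MISMATCH: $name does not resolve back to $ip"
    return 1
  fi
}

# Offline demonstration of the parsing step, using the Step 1 output:
echo '70.179.207.64.in-addr.arpa domain name pointer monzillamedia.com.' | parse_ptr
# prints: monzillamedia.com
```

Calling fwd_rev_check 66.249.66.1 performs both lookups and reports whether the circle closes.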

More Examples

Here are a couple more examples to consider.

Example 1

Say we want to verify an IP address, 66.249.66.1, that is reportedly Google. We first run host on the IP:

host 66.249.66.1

That command should return this line:

1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

So now we can verify by running host on the returned domain name:

host crawl-66-249-66-1.googlebot.com

That command should return this line:

crawl-66-249-66-1.googlebot.com has address 66.249.66.1

So yeah, everything matches up. The IP address is verified as Google.

Example 2

Here is another example verifying another of Google’s many IP addresses:

host 66.249.90.77

..returns:

77.90.249.66.in-addr.arpa domain name pointer rate-limited-proxy-66-249-90-77.google.com.

And then:

host rate-limited-proxy-66-249-90-77.google.com

..returns:

rate-limited-proxy-66-249-90-77.google.com has address 66.249.90.77

Yahoo! Another confirmation of Google identity ;)

About the Author: Jeff Starr = Creative thinker. Passionate about free and open Web.
4 responses
  1. How to do it on a Windows machine?

    • Jeff Starr

      Great question! The main command we are using for the lookups is host, so I would guess it’s just a matter of finding the Windows equivalent for it.

  2. Jim S Smith October 9, 2018 @ 5:59 pm

    Very good!

    On WINDOWS:

    nslookup is your command of choice (unless you are using the very old “98” and older versions)

    Can be found here: https://www.computerhope.com/nslookup.htm

    OAN (“On another note”):

    I use the PHP equivalents in my programming:

    //	This array holds the data for known/recognized search-bots and their URL patterns.
    $known_bots=array(
    			'Ahrefs'	=>	array('URL_part'=>'.a.ahrefs.com',		'pattern'=>'#hydrogen\d{1,3}\.a\.ahrefs\.com#i'),
    			'archive'	=>	array('URL_part'=>'.archive.org',		'pattern'=>'#crawl\d{1,3}\.us\.archive\.org#i'),
    			'baidu'		=>	array('URL_part'=>'.baidu.com',			'pattern'=>'#baiduspider(-\d{1,3}){4}\.crawl\.baidu\.com#i'),
    			'bingbot'	=>	array('URL_part'=>'.search.msn.com',	'pattern'=>'#msnbot(-\d{1,3}){4}\.search\.msn\.com#i'),
    			'Exabot'	=>	array('URL_part'=>'.exabot.com',		'pattern'=>'#crawl\d{1,3}\.exabot\.com#i'),
    			'gigabot'	=>	array('URL_part'=>'.gigabot.com',		'pattern'=>''),
    			'Googlebot'	=>	array('URL_part'=>'.googlebot.com',		'pattern'=>'#crawl(-\d{1,3}){4}\.googlebot\.com#i'),
    			'msnbot'	=>	array('URL_part'=>'.search.msn.com',	'pattern'=>'#msnbot(-\d{1,3}){4}\.search\.msn\.com#i'),
    			'slurp'		=>	array('URL_part'=>'.yse.yahoo.net',		'pattern'=>'#b\d{6}\.yse\.yahoo\.net#i'),
    			'Yahoo!'	=>	array('URL_part'=>'.yse.yahoo.net',		'pattern'=>'#b\d{6}\.yse\.yahoo\.net#i'),
    			'Yandex'	=>	array('URL_part'=>'.yandex.com',		'pattern'=>'#spider(-\d{1,3}){4}\.yandex\.com#i'),
    			);
    
    //	Does visitor claim to be a search-bot?
    function claim_to_be_se_bot($this_UA='') {
    	global $known_bots;
    
    	$bot_expected=false;
    
    	if (empty($this_UA)) $this_UA=htmlentities($_SERVER['HTTP_USER_AGENT'],ENT_QUOTES,'UTF-8');
    
    //	NOTE: Must use "array_keys" here, because of nested associative arrays!
    	foreach (array_keys($known_bots) as $this_bot) {
    		if (stristr($this_UA,$this_bot)!==false) {
    			$bot_expected=$this_bot;
    			break;
    			}
    		}
    
    //	Now, which BOT does visitor claim to be, again?
    	return $bot_expected;
    	}
    
    //	THE Search-bot "Interrogator" function, returns - using "tri-state" logic.
    function true_bot_or_not() {
    	global $known_bots;
    
    	$bot_expected = claim_to_be_se_bot();
    
    //	NOPE! It's not claiming to be a recognized search-bot here. Okay.
    	if (!$bot_expected) return 0;
    
    //	Get this "bot's" expected domain name.
    //	$bots_domain = $known_bots[$bot_expected]['URL_part'];
    
    //	Get cleaned-up IP Address from the reported remote address.
    	$r_addr = get_real_IP();
    
    //	Get suspected "bot's" domain name from its reported IP Address.
    	$find_da_bot = gethostbyaddr($r_addr);
    
    //	Now, do a forward lookup on that host name, to see if it resolves back to the same remote address.
    	$da_bot_rHost = gethostbyname($find_da_bot);
    
    //	NOPE - TRUE search-bots have an IP Address that resolves to a proper domain name! BAD.
    	if (trim($find_da_bot) == $r_addr) return -1;
    
    //	NOPE - If the Bot's expected IP Address and actual remote address do not match! BAD.
    	if ($da_bot_rHost!=$r_addr) return -1;
    
    //	Says "false" if it is NOT a legitimate search-bot from its expected domain name. BAD.
    	if (strstr($find_da_bot,$known_bots[$bot_expected]['URL_part']) === false) return -1;
    
    //	More detailed pattern-matching: no match means NOT a legitimate search-bot. BAD.
    	if (preg_match($known_bots[$bot_expected]['pattern'],$find_da_bot) !== 1) return -1;
    
    //	Must be a genuine searchbot! It seems to have passed all my tests. YEAY!
    	return 1;
    
    /*	NOTE: Returned values:
    
    	0	= Not claiming to be a SE-bot of any kind, OKAY.
    	1	= YES! It's a true search-bot,
    	-1	= It's a FAKE "search-bot", BAD BOY!
    
    */
    	}

    I hereby release this bit-o-code as GPL 3.0 Licensed.

    If one were to database the passing IP-Addresses in a quick-lookup table, then one could accept these vetted IP-Addresses as genuine. – That way, only once per IP would be needed (at least as a thought, anyway).
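    That quick-lookup table could be as simple as a flat file of vetted addresses. A minimal sketch of the idea (the file path and function names are my own illustration, not part of the code above):

    ```shell
    # Flat-file cache of already-vetted IPs, so the full DNS round-trip is
    # only needed the first time an address is seen. Path is illustrative.
    CACHE="${CACHE:-/tmp/vetted_ips.txt}"

    cache_has() { grep -qxF "$1" "$CACHE" 2>/dev/null; }
    cache_add() { cache_has "$1" || echo "$1" >> "$CACHE"; }

    cache_add 66.249.66.1
    cache_has 66.249.66.1 && echo "already vetted"
    ```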

    HERE is a sample list of example UA’s and the URL-patterns for some legitimate SE-Bots:

    compatible; Baiduspider/2.0;	baiduspider-180-76-15-159.crawl.baidu.com	("baidu",".baidu.com")
    	preg_match('#baiduspider(-[d]{1,3}){4}.crawl.baidu.com#i',$find_da_bot);
    
    compatible; AhrefsBot/5.2;		hydrogen037.a.ahrefs.com					("Ahrefs",".a.ahrefs.com")
    	preg_match('#hydrogen[d]{1,3}.a.ahrefs.com#i',$find_da_bot);
    
    compatible; Googlebot/2.1;		crawl-66-249-70-30.googlebot.com			("Googlebot",".googlebot.com")
    	preg_match('#crawl(-[d]{1,3}){4}.googlebot.com#i',$find_da_bot);
    
    compatible; YandexBot/3.0;		spider-199-21-99-216.yandex.com				("Yandex",".yandex.com")
    	preg_match('#spider(-[d]{1,3}){4}.yandex.com#i',$find_da_bot);
    
    compatible; Yahoo! Slurp;		b115384.yse.yahoo.net						("Yahoo!","Slurp",".yse.yahoo.net")
    	preg_match('#b[d]{6}.yse.yahoo.net#i',$find_da_bot);
    
    compatible; bingbot/2.0;		msnbot-40-77-167-38.search.msn.com			("bingbot","msnbot","msn",".search.msn.com",".bing.com")
    	preg_match('#msnbot(-[d]{1,3}){4}.search.msn.com#i',$find_da_bot);
    
    compatible; archive.org_bot		crawl837.us.archive.org						("archive",".archive.org",".archive.is")
    	preg_match('#crawl[d]{1,3}.us.archive.org#i',$find_da_bot);
    
    compatible; Exabot/3.0;			crawl28.exabot.com							("Exabot",".exabot.com")
    	preg_match('#crawl[d]{1,3}.exabot.com#i',$find_da_bot);

    Hope this whets some appetites. ;-)

    – Jim S

  3. Jim S Smith October 9, 2018 @ 6:03 pm

    I also intended to introduce the “dig” command from the Linux CLI.

    Like: dig -x

    This neat little command helped me to find a Reverse-DNS lookup error, and contact my VPS-hosting to correct a bad PTR record on their end.

    So, doing a reverse-DNS can greatly help in troubleshooting DNS-related problems too.

    – Jim S
