Yahoo! Once Again Caught Disobeying Robots.txt Rules

Post #594 categorized as Websites, last updated on Aug 18, 2008
Tagged with blacklist, ip, robots, search, security, server, spider, yahoo

Hmmm.. Let’s see here. Google can do it. MSN/Live can do it. Even Ask can do it. So why oh why can’t Yahoo’s grubby Slurp crawler manage to adhere to robots.txt crawl directives? Just when I thought Yahoo! finally figured it out, I discover more Slurp tracks in my Blackhole trap for bad spiders:

IP:   74.6.17.163
Host: 163.17.6.74.in-addr.arpa llf520189.crawl.yahoo.net

[2008-07-30 (Wed) 16:51:59] "GET /blackhole/ HTTP/1.0"
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

OrgName:    Inktomi Corporation
OrgID:      INKT
Address:    701 First Ave
City:       Sunnyvale
StateProv:  CA
PostalCode: 94089
Country:    US

NetRange:   74.6.0.0 - 74.6.255.255
CIDR:       74.6.0.0/16

NetName:    INKTOMI-BLK-6
NetHandle:  NET-74-6-0-0-1
Parent:     NET-74-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM

RegDate:    2006-02-13
Updated:    2007-03-09

RAbuseHandle: NETWO857-ARIN
RAbuseName:   Network Abuse
RAbusePhone:  +1-408-349-3300
RAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName:   Network Abuse
OrgAbusePhone:  +1-408-349-3300
OrgAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgTechHandle: NA258-ARIN
OrgTechName:   Netblock Admin
OrgTechPhone:  +1-408-349-3300
OrgTechEmail:  netblockadmin@yahoo-inc.com

# ARIN WHOIS database, last updated 2008-07-29 19:10

I enjoy the control provided by the robots.txt protocol and look forward to the day when all of the major search engines get it right.

Subscribe to Perishable Press


2 Responses

TopLeave a comment

[ Gravatar Icon ]

#1Donace

This just shows that Google’s nearest competitor is just oh so far behind in the game.

[ Gravatar Icon ]

#2Jeff Starr

So true. I mean, how difficult is it for a computer to obey the operational rules it’s been given? I honestly can’t think of any excuse for Slurp’s recent deviancy into explicitly forbidden territory. Given that computers do not make “mistakes,” such illicit crawl behavior seems intentional. In either case, it’s not looking good for Yahoo.

Share your thoughts..

TopRead official comment policy

The rules are simple. Comment intelligently. Stay on-topic. Don’t spam! Suspected spam will be deleted. Use your real name or nickname, not a site name or business name. Using a site name or business name is a good way to get your link or comment removed. Certain comments are moderated; if your comment does not appear after several days, or if you wish to comment privately, contact me. Also, by posting a comment, you grant this site a perpetual license to reproduce your comment, name, and website URL. Lastly, you may use basic HTML markup, but please do not use <pre> tags. Instead, wrap your code with <code> tags. Use a new set of <code> tags for each code term or phrase, as well as for each individual line of code (i.e., multiple lines of code require multiple code tags). Please see the complete comment policy for more information.