Yahoo! Once Again Caught Disobeying Robots.txt Rules
Post #594 categorized as Websites, last updated on Aug 18, 2008
Tagged with blacklist, ip, robots, search, security, server, spider, yahoo
Hmmm.. Let’s see here. Google can do it. MSN/Live can do it. Even Ask can do it. So why oh why can’t Yahoo’s grubby Slurp crawler manage to adhere to robots.txt crawl directives? Just when I thought Yahoo! finally figured it out, I discover more Slurp tracks in my Blackhole trap for bad spiders:
IP: 74.6.17.163
Host: 163.17.6.74.in-addr.arpa llf520189.crawl.yahoo.net
[2008-07-30 (Wed) 16:51:59] "GET /blackhole/ HTTP/1.0"
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
OrgName: Inktomi Corporation
OrgID: INKT
Address: 701 First Ave
City: Sunnyvale
StateProv: CA
PostalCode: 94089
Country: US
NetRange: 74.6.0.0 - 74.6.255.255
CIDR: 74.6.0.0/16
NetName: INKTOMI-BLK-6
NetHandle: NET-74-6-0-0-1
Parent: NET-74-0-0-0-0
NetType: Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
RegDate: 2006-02-13
Updated: 2007-03-09
RAbuseHandle: NETWO857-ARIN
RAbuseName: Network Abuse
RAbusePhone: +1-408-349-3300
RAbuseEmail: network-abuse@cc.yahoo-inc.com
OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName: Network Abuse
OrgAbusePhone: +1-408-349-3300
OrgAbuseEmail: network-abuse@cc.yahoo-inc.com
OrgTechHandle: NA258-ARIN
OrgTechName: Netblock Admin
OrgTechPhone: +1-408-349-3300
OrgTechEmail: netblockadmin@yahoo-inc.com
# ARIN WHOIS database, last updated 2008-07-29 19:10
I enjoy the control provided by the robots.txt protocol and look forward to the day when all of the major search engines get it right.
#1 — Donace
This just shows that Google’s nearest competitor is just oh so far behind in the game.