Yahoo! Once Again Caught Disobeying Robots.txt Rules
Hmmm... Let’s see here. Google can do it. MSN/Live can do it. Even Ask can do it. So why oh why can’t Yahoo’s grubby Slurp crawler manage to adhere to robots.txt crawl directives? Just when I thought Yahoo! had finally figured it out, I discovered more Slurp tracks in my Blackhole trap for bad spiders:
IP: 22.214.171.124
Host: 126.96.36.199.in-addr.arpa llf520189.crawl.yahoo.net
[2008-07-30 (Wed) 16:51:59] "GET /blackhole/ HTTP/1.0"
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

OrgName: Inktomi Corporation
OrgID: INKT
Address: 701 First Ave
City: Sunnyvale
StateProv: CA
PostalCode: 94089
Country: US
NetRange: 74.6.0.0 - 74.6.255.255
CIDR: 74.6.0.0/16
NetName: INKTOMI-BLK-6
NetHandle: NET-74-6-0-0-1
Parent: NET-74-0-0-0-0
NetType: Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
RegDate: 2006-02-13
Updated: 2007-03-09
RAbuseHandle: NETWO857-ARIN
RAbuseName: Network Abuse
RAbusePhone: +1-408-349-3300
RAbuseEmail: firstname.lastname@example.org
OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName: Network Abuse
OrgAbusePhone: +1-408-349-3300
OrgAbuseEmail: email@example.com
OrgTechHandle: NA258-ARIN
OrgTechName: Netblock Admin
OrgTechPhone: +1-408-349-3300
OrgTechEmail: firstname.lastname@example.org
# ARIN WHOIS database, last updated 2008-07-29 19:10
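The log entry claims to come from a host under crawl.yahoo.net, but a user-agent string is trivially spoofed. A widely used way to confirm that such a hit is genuine is a reverse DNS lookup on the IP, a check of the resulting hostname, and then a forward lookup that must map back to the same IP. Below is a minimal Python sketch of that check, not part of the Blackhole script itself; the function name `is_verified_crawler` is hypothetical, and the `.crawl.yahoo.net` suffix is an assumption taken from the hostname in the log above.

```python
import socket
from typing import Callable, List, Tuple

# Assumed trusted suffix, taken from the logged host (llf520189.crawl.yahoo.net).
TRUSTED_SUFFIXES: Tuple[str, ...] = (".crawl.yahoo.net",)

# Both socket.gethostbyaddr and socket.gethostbyname_ex return
# (hostname, aliaslist, ipaddrlist).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def is_verified_crawler(
    ip: str,
    suffixes: Tuple[str, ...] = TRUSTED_SUFFIXES,
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Reverse-resolve ip, check the hostname suffix, then forward-confirm it."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False  # no PTR record, so the hit cannot be verified
    if not hostname.lower().rstrip(".").endswith(suffixes):
        return False  # PTR record points outside the trusted domain
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return False
    # Whoever controls an IP block controls its PTR records, so the claimed
    # hostname must also resolve back to the original address.
    return ip in addresses
```

Called as `is_verified_crawler(ip)`, the function performs live DNS queries; the `reverse` and `forward` parameters exist only so the logic can be exercised with fake resolvers.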
I enjoy the control provided by the robots.txt protocol and look forward to the day when all of the major search engines get it right.
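Obeying robots.txt is not a hard problem for a crawler: before requesting a URL, it checks the URL against the parsed rules. Below is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` contents are hypothetical, since the post does not show the site's actual rules; they simply assume the trap directory is disallowed for every user agent, which is how a Blackhole-style trap separates compliant bots from bad ones.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: assumes /blackhole/ is disallowed for all user agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blackhole/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to request url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    slurp = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
    for path in ("/", "/blackhole/"):
        # prints "/ True", then "/blackhole/ False"
        print(path, may_fetch(rules, slurp, "http://example.com" + path))
```

Under these assumed rules, a crawler that runs this check never requests /blackhole/, so any logged hit on that path, like the one above, is a crawler ignoring the directive.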
About the Author
Jeff Starr = Fullstack Developer. Book Author. Teacher. Human Being.
This just shows that Google’s nearest competitor is oh so far behind in the game.
So true. I mean, how difficult is it for a computer to obey the operational rules it’s been given? I honestly can’t think of any excuse for Slurp’s recent deviancy into explicitly forbidden territory. Given that computers do not make “mistakes,” such illicit crawl behavior seems intentional. Either way, it’s not looking good for Yahoo.