New Bookstore! Save 20% on books with discount code: LAUNCH
Web Dev + WordPress + Security

Yahoo! in my Blackhole

Okay, I realize that the title sounds a bit odd, but nowhere near as odd as my recent discovery of Slurp ignoring explicit robots.txt rules and digging around in my highly specialized bot trap, which I have lovingly dubbed “the blackhole”. What is up with that, Yahoo!? — does your Slurp spider obey robots.txt directives or not? I have never seen Google crawling around that side of town, neither has MSN nor even Ask ventured into the forbidden realms. Has anyone else experienced such unexpected behavior from one the four major search engines? Hmmm.. let’s dig a little further..

Here is the carefully formulated, highly specific, properly placed robots.txt rule that explicitly and strictly forbids all agents from accessing my blackhole spider trap for bad bots:

User-agent: *
Disallow: */blackhole/*

Nothing unusual here. This is standard stuff, right? But wait, what about that crazy wildcard character? Does Yahoo! acknowledge such a creature? Sure they do (404 link removed). So what’s up, then? I surely don’t know. This is such unexpected behavior from such a popular, highly visible search engine. Thus, let’s make sure that we are actually dealing with ol’ Slurp, and not some nasty impostor. To do this, we’ll follow Yahoo’s own advice and perform a forward-reverse IP lookup for verification. Here are the results:

Reverse lookup for IP: 74.6.26.167

lj511353.crawl.yahoo.net
OrgName:    Inktomi Corporation 
OrgID:      INKT
Address:    701 First Ave
City:       Sunnyvale
StateProv:  CA
PostalCode: 94089
Country:    US

NetRange:   74.6.0.0 - 74.6.255.255 
CIDR:       74.6.0.0/16 
NetName:    INKTOMI-BLK-6
NetHandle:  NET-74-6-0-0-1
Parent:     NET-74-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
Comment:    
RegDate:    2006-02-13
Updated:    2007-03-09

RAbuseHandle: NETWO857-ARIN
RAbuseName:   Network Abuse 
RAbusePhone:  +1-408-349-3300
RAbuseEmail:  network-abuse@cc.yahoo-inc.com
.
.
.

Forward lookup for hostname: lj511353.crawl.yahoo.net

74.6.26.167

Yup, apparently, all checks out — it was Yahoo’s machine alright. Tsk, tsk — naughty Slurp! Even worse, this is not the first time Slurp has been caught sniffing around where it should not be sniffing. Needless to say, I will definitely be keeping a close eye on Yahoo! from now on.

Finally, just for the record, here is the log entry for the blackhole event to which this article refers:

IP Address:  74.6.26.167
Date/Time:   2007-11-21 (Wed) 20:52:37
URI Request: GET /blackhole/ HTTP/1.0
User Agent:  Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

OrgName:    Inktomi Corporation
OrgID:      INKT
Address:    701 First Ave
City:       Sunnyvale
StateProv:  CA
PostalCode: 94089
Country:    US

NetRange:   74.6.0.0 - 74.6.255.255
CIDR:       74.6.0.0/16
NetName:    INKTOMI-BLK-6
NetHandle:  NET-74-6-0-0-1
Parent:     NET-74-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
Comment:   
RegDate:    2006-02-13
Updated:    2007-03-09

RAbuseHandle: NETWO857-ARIN
RAbuseName:   Network Abuse
RAbusePhone:  +1-408-349-3300
RAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName:   Network Abuse
OrgAbusePhone:  +1-408-349-3300
OrgAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgTechHandle: NA258-ARIN
OrgTechName:   Netblock Admin
OrgTechPhone:  +1-408-349-3300
OrgTechEmail:  netblockadmin@yahoo-inc.com

# ARIN WHOIS database, last updated 2007-11-20 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.

Know what’s up? Drop a comment and share the wealth..

Jeff Starr
About the Author
Jeff Starr = Fullstack Developer. Book Author. Teacher. Human Being.
Blackhole Pro: Trap bad bots in a virtual black hole.
Welcome
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »
WP Themes In Depth: Build and sell awesome WordPress themes.
Thoughts
Book updates complete! DigWP, .htaccess, Tao-WP, and WP Themes books all updated and current with all the latest.
Stop giving so much juice to social media. Get a site and OWN your content.
I would give my left testicle for macOS Finder to remember column widths.
The chemical name for titin (the largest known protein) has 189,819 characters and takes several hours to pronounce.
Working on book updates, should be available for download sometime next week.
iCloud is like the Terminator. It will never stop trying to get your data. An endless fight on each Apple device to keep iCloud disabled and empty.
Take a screenshot with Firefox (no extension required). Open Developer Tools Settings and enable the “Take a screenshot” button. Then click the button :)
Newsletter
Get news, updates, deals & tips via email.
Email kept private. Easy unsubscribe anytime.