Yahoo! Slurp too Stupid to be a Robot

by Jeff Starr on Sunday, March 15, 2009 21 Responses

I really hate bad robots. When a web crawler, spider, bot — or whatever you want to call it — behaves in a way that is contrary to expected and/or accepted protocols, we say that the bot is acting suspiciously, behaving badly, or just acting stupid in general. Unfortunately, there are thousands — if not hundreds of thousands — of nefarious bots violating our websites every minute of the day.

For the most part, effective methods are available for protecting our sites against the endless hordes of irrelevant and mischievous bots. Such bots are easily blocked with virtually zero side effects because their presence is simply irrelevant.

But what about bad bots that aren’t exactly irrelevant, such as Yahoo’s mindless Slurp crawler? By failing to obey the robots.txt protocol as promised, Yahoo’s Slurp clearly falls into the “bad-bot” category. Unlike typical “nonsense” bots, Slurp is not exactly irrelevant (yet), so simply blocking it is not a reasonable solution.

And Yahoo must know this. Why else would they allow their Slurp software to flagrantly disobey robots.txt directives? Yahoo certainly benefits from proclaiming standards compliance, borrowing credibility by claiming adherence to the same guidelines as industry leaders such as Google and Microsoft. I have never seen (nor heard of) a single instance of either googlebot (Google’s web crawler) or msnbot (Microsoft’s web crawler) appearing in locations forbidden by robots.txt directives.

So what’s up, Yahoo? There are only two possibilities here: Slurp is disobeying either erroneously or intentionally. Neither case looks good for Slurp’s master, Yahoo. There is either an error in the Slurp software that is causing Slurp to roam around like a drunken sailor, or else the software is correct and Slurp is behaving exactly as it has been directed. If the problem is an error, you would think that Yahoo would have been able to get a handle on it after a few days, months, or even years. If the problem is that complex or unsolvable, then Slurp should be retired immediately. Nobody benefits from a stupid web crawler: not you, not me, and certainly not Yahoo.

On the other hand, if there is no problem with Slurp’s ability to obey its own programming, then the programming must be instructing Slurp to disobey robots.txt directives. This of course is an even worse case scenario than if Slurp were simply malfunctioning. Yahoo would then be guilty of lying to users, webmasters, and shareholders by claiming to obey the rules while secretly programming Slurp to disobey them. Hopefully, this is not the case and this whole mess is easily explained by the simple fact that Yahoo’s Slurp is too stupid to be a robot.

What do you think? Why does Slurp continue to disobey the clearly stated and agreed-upon robots.txt directives? Is it because Slurp is broken or because it has been told to do so?

Log entries showing Yahoo’s Slurp crawler accessing forbidden directories

At the bottom of the source code of my current theme, I include the following markup:

<!-- Warning: please do NOT follow the next link ("Welcome to the Blackhole") or you may be banned from this site -->
<div style="display:none;">
	<a href="http://perishablepress.com/blackhole/" title="Welcome to the Blackhole" rel="nofollow">Attention: Do NOT follow this link!</a>
</div>

Then, in my site’s robots.txt file, I include the following directives:

User-agent: *
Disallow: */blackhole/*

Taken together, the message is clear: stay OUT of my blackhole, especially if you are a robot. As simple and clear as it gets, right? Google certainly gets it, and so does good ol’ MSN. Yet somehow, Yahoo’s stupid Slurp crawler can’t seem to figure it out. Consider the following entries taken from the access log of my blackhole directory:

Yahoo Slurp disobeys robots.txt directives on November 19th, 2008

72.30.81.166
[2008-11-19 (Wed) 12:06:09] "GET /blackhole/ HTTP/1.0"
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

OrgName:    Inktomi Corporation
OrgID:      INKT
Address:    701 First Ave
City:       Sunnyvale
StateProv:  CA
PostalCode: 94089
Country:    US

NetRange:   72.30.0.0 - 72.30.255.255
CIDR:       72.30.0.0/16
NetName:    INKTOMI-BLK-5
NetHandle:  NET-72-30-0-0-1
Parent:     NET-72-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
Comment:
RegDate:    2005-01-28
Updated:    2005-10-19

RAbuseHandle: NETWO857-ARIN
RAbuseName:   Network Abuse
RAbusePhone:  +1-408-349-3300
RAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName:   Network Abuse
OrgAbusePhone:  +1-408-349-3300
OrgAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgTechHandle: NA258-ARIN
OrgTechName:   Netblock Admin
OrgTechPhone:  +1-408-349-3300
OrgTechEmail:  rauschen@yahoo-inc.com

# ARIN WHOIS database, last updated 2008-11-18 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.

Yahoo Slurp disobeys robots.txt directives on December 12th, 2008

74.6.17.165
[2008-12-12 (Fri) 01:08:23] "GET /blackhole/ HTTP/1.0"
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

OrgName:    Inktomi Corporation
OrgID:      INKT
Address:    701 First Ave
City:       Sunnyvale
StateProv:  CA
PostalCode: 94089
Country:    US

NetRange:   74.6.0.0 - 74.6.255.255
CIDR:       74.6.0.0/16
NetName:    INKTOMI-BLK-6
NetHandle:  NET-74-6-0-0-1
Parent:     NET-74-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
Comment:
RegDate:    2006-02-13
Updated:    2007-03-09

RAbuseHandle: NETWO857-ARIN
RAbuseName:   Network Abuse
RAbusePhone:  +1-408-349-3300
RAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName:   Network Abuse
OrgAbusePhone:  +1-408-349-3300
OrgAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgTechHandle: NA258-ARIN
OrgTechName:   Netblock Admin
OrgTechPhone:  +1-408-349-3300
OrgTechEmail:  abechtel@inktomi.com

# ARIN WHOIS database, last updated 2008-12-11 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.

Yahoo Slurp disobeys robots.txt directives on January 5th, 2009

72.30.142.217
[2009-01-05 (Mon) 08:20:15] "GET /blackhole/ HTTP/1.0"
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

OrgName:    Inktomi Corporation
OrgID:      INKT
Address:    701 First Ave
City:       Sunnyvale
StateProv:  CA
PostalCode: 94089
Country:    US

NetRange:   72.30.0.0 - 72.30.255.255
CIDR:       72.30.0.0/16
NetName:    INKTOMI-BLK-5
NetHandle:  NET-72-30-0-0-1
Parent:     NET-72-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
Comment:
RegDate:    2005-01-28
Updated:    2005-10-19

RAbuseHandle: NETWO857-ARIN
RAbuseName:   Network Abuse
RAbusePhone:  +1-408-349-3300
RAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName:   Network Abuse
OrgAbusePhone:  +1-408-349-3300
OrgAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgTechHandle: NA258-ARIN
OrgTechName:   Netblock Admin
OrgTechPhone:  +1-408-349-3300
OrgTechEmail:  rauschen@yahoo-inc.com

# ARIN WHOIS database, last updated 2009-01-04 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.

Yahoo Slurp disobeys robots.txt directives on February 25th, 2009

72.30.79.95
[2009-02-25 (Wed) 01:38:02] "GET /blackhole/ HTTP/1.0"
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)

OrgName:    Inktomi Corporation
OrgID:      INKT
Address:    701 First Ave
City:       Sunnyvale
StateProv:  CA
PostalCode: 94089
Country:    US

NetRange:   72.30.0.0 - 72.30.255.255
CIDR:       72.30.0.0/16
NetName:    INKTOMI-BLK-5
NetHandle:  NET-72-30-0-0-1
Parent:     NET-72-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
Comment:
RegDate:    2005-01-28
Updated:    2005-10-19

RAbuseHandle: NETWO857-ARIN
RAbuseName:   Network Abuse
RAbusePhone:  +1-408-349-3300
RAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName:   Network Abuse
OrgAbusePhone:  +1-408-349-3300
OrgAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgTechHandle: NA258-ARIN
OrgTechName:   Netblock Admin
OrgTechPhone:  +1-408-349-3300
OrgTechEmail:  rauschen@yahoo-inc.com

# ARIN WHOIS database, last updated 2009-02-24 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.

Yahoo Slurp disobeys robots.txt directives on March 3rd, 2009

72.30.65.54
[2009-03-03 (Tue) 18:26:43] "GET /blackhole/ HTTP/1.0"
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)

OrgName:    Inktomi Corporation
OrgID:      INKT
Address:    701 First Ave
City:       Sunnyvale
StateProv:  CA
PostalCode: 94089
Country:    US

NetRange:   72.30.0.0 - 72.30.255.255
CIDR:       72.30.0.0/16
NetName:    INKTOMI-BLK-5
NetHandle:  NET-72-30-0-0-1
Parent:     NET-72-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
Comment:
RegDate:    2005-01-28
Updated:    2005-10-19

RAbuseHandle: NETWO857-ARIN
RAbuseName:   Network Abuse
RAbusePhone:  +1-408-349-3300
RAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName:   Network Abuse
OrgAbusePhone:  +1-408-349-3300
OrgAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgTechHandle: NA258-ARIN
OrgTechName:   Netblock Admin
OrgTechPhone:  +1-408-349-3300
OrgTechEmail:  rauschen@yahoo-inc.com

# ARIN WHOIS database, last updated 2009-03-02 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.

Yahoo Slurp disobeys robots.txt directives on March 3rd, 2009 (again)

72.30.65.54
[2009-03-03 (Tue) 18:26:43] "GET /blackhole/ HTTP/1.0"
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)

OrgName:    Inktomi Corporation
OrgID:      INKT
Address:    701 First Ave
City:       Sunnyvale
StateProv:  CA
PostalCode: 94089
Country:    US

NetRange:   72.30.0.0 - 72.30.255.255
CIDR:       72.30.0.0/16
NetName:    INKTOMI-BLK-5
NetHandle:  NET-72-30-0-0-1
Parent:     NET-72-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
Comment:
RegDate:    2005-01-28
Updated:    2005-10-19

RAbuseHandle: NETWO857-ARIN
RAbuseName:   Network Abuse
RAbusePhone:  +1-408-349-3300
RAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName:   Network Abuse
OrgAbusePhone:  +1-408-349-3300
OrgAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgTechHandle: NA258-ARIN
OrgTechName:   Netblock Admin
OrgTechPhone:  +1-408-349-3300
OrgTechEmail:  rauschen@yahoo-inc.com

# ARIN WHOIS database, last updated 2009-03-02 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.

Yahoo Slurp disobeys robots.txt directives on March 10th, 2009

72.30.65.54
[2009-03-10 (Tue) 05:34:38] "GET /blackhole/ HTTP/1.0"
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)

OrgName:    Inktomi Corporation
OrgID:      INKT
Address:    701 First Ave
City:       Sunnyvale
StateProv:  CA
PostalCode: 94089
Country:    US

NetRange:   72.30.0.0 - 72.30.255.255
CIDR:       72.30.0.0/16
NetName:    INKTOMI-BLK-5
NetHandle:  NET-72-30-0-0-1
Parent:     NET-72-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
Comment:
RegDate:    2005-01-28
Updated:    2005-10-19

RAbuseHandle: NETWO857-ARIN
RAbuseName:   Network Abuse
RAbusePhone:  +1-408-349-3300
RAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName:   Network Abuse
OrgAbusePhone:  +1-408-349-3300
OrgAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgTechHandle: NA258-ARIN
OrgTechName:   Netblock Admin
OrgTechPhone:  +1-408-349-3300
OrgTechEmail:  rauschen@yahoo-inc.com

# ARIN WHOIS database, last updated 2009-03-09 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.

Yahoo Slurp disobeys robots.txt directives on March 15th, 2009

72.30.142.167
[2009-03-15 (Sun) 06:41:23] "GET /blackhole/ HTTP/1.0"
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)

OrgName:    Inktomi Corporation
OrgID:      INKT
Address:    701 First Ave
City:       Sunnyvale
StateProv:  CA
PostalCode: 94089
Country:    US

NetRange:   72.30.0.0 - 72.30.255.255
CIDR:       72.30.0.0/16
NetName:    INKTOMI-BLK-5
NetHandle:  NET-72-30-0-0-1
Parent:     NET-72-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
Comment:
RegDate:    2005-01-28
Updated:    2005-10-19

RAbuseHandle: NETWO857-ARIN
RAbuseName:   Network Abuse
RAbusePhone:  +1-408-349-3300
RAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName:   Network Abuse
OrgAbusePhone:  +1-408-349-3300
OrgAbuseEmail:  network-abuse@cc.yahoo-inc.com

OrgTechHandle: NA258-ARIN
OrgTechName:   Netblock Admin
OrgTechPhone:  +1-408-349-3300
OrgTechEmail:  abechtel@inktomi.com

# ARIN WHOIS database, last updated 2009-03-14 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.

As you can see, despite the clear robots.txt directives, and despite the fact that every respectable web crawler manages to obey them, my infamous blackhole directory is a regular destination for the disobedient Yahoo Slurp crawler. This behavior is not only disrespectful to the entire online community, but it makes Yahoo look either incompetent, dishonest, or both.
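
For anyone who wants to tally visits like these automatically, here is a minimal Python sketch that parses one entry in the three-line layout shown above (IP address, timestamped request, user agent). The function name `parse_entry` and the returned field names are assumptions made for illustration; this is not the script that produced the blackhole log.

```python
import re
from datetime import datetime

# Matches lines such as: [2008-11-19 (Wed) 12:06:09] "GET /blackhole/ HTTP/1.0"
REQUEST_RE = re.compile(
    r'^\[(?P<date>\d{4}-\d{2}-\d{2}) \(\w{3}\) (?P<time>\d{2}:\d{2}:\d{2})\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)"$'
)

def parse_entry(ip_line, request_line, agent_line):
    """Turn the three lines of one log entry into a dictionary."""
    match = REQUEST_RE.match(request_line.strip())
    if match is None:
        raise ValueError(f"unrecognized request line: {request_line!r}")
    when = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M:%S"
    )
    return {
        "ip": ip_line.strip(),
        "when": when,
        "method": match["method"],
        "path": match["path"],
        "agent": agent_line.strip(),
    }
```

Filtering the parsed entries on `"Slurp" in entry["agent"]` and `entry["path"].startswith("/blackhole/")` would then give a quick count of forbidden visits per crawler.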

About the author

Jeff Starr is a web developer, graphic designer and content producer with over 10 years of experience and a passion for quality and detail. Jeff is co-author of the book Digging into WordPress and strives to help people be the best they can be on the Web.


21 Responses

Brad#1

So is there no way to stop them from doing this? Is the robots.txt file the only defense one has, albeit an on-your-honor defense?

BTW, excellent website and excellent articles.

fuzion#2

*/blackhole/* doesn’t look standard.

Jeff Starr#3

@Brad: That’s the whole dilemma, right there. Generally, dealing with disobedient bots and user agents is as simple as adding them to the blacklist. As one of the major search engines, however, it may not be in your site’s best interests to deny Yahoo access.

@fuzion: Well, let’s break it down and see. Alphanumeric characters are obviously supported, as are forward slashes. The asterisks are wildcard characters, each matching any sequence of valid characters. Wildcards may not be part of the original standard, but Google and MSN support them, as does Yahoo (or so they claim).
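
To make the wildcard behavior concrete, here is a minimal Python sketch of how a crawler that supports the `*` extension might test a URL path against a Disallow pattern. It assumes that `*` matches any sequence of characters (including none) and that a rule applies from the start of the path. The function name `rule_matches` is hypothetical, other extensions such as `$` are ignored, and this is not a description of how Slurp itself is implemented.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern applies to the URL path."""
    # Escape each literal piece, then join the pieces with '.*' so that
    # every '*' in the pattern matches any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.match anchors at the start of the path, giving prefix semantics.
    return re.match(regex, path) is not None

for pattern in ("/blackhole", "/blackhole/", "*/blackhole", "*/blackhole/*"):
    print(pattern, rule_matches(pattern, "/blackhole/"))
```

Under these assumptions, all four patterns cover `/blackhole/`, so a crawler with wildcard support has no excuse for missing any of them.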

fuzion#4

I’ve used the following for as long as my domain’s been up and haven’t had any issues:

User-agent: *
Disallow: /banme/

/banme/index.php being a standard “add ip to ../.htaccess deny list” script

/banme/index.php is added to the header in a similar way

I use it on 5 domains (3 are writing to a shared root .htaccess) and get new bans almost daily.
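
A trap script of the kind described above can be sketched in a few lines. The version below is Python rather than PHP, the function name `ban_ip` is hypothetical, and it assumes the Apache 2.2-style `Deny from` syntax; it is an illustration of the idea, not the actual /banme/ script.

```python
import ipaddress
from pathlib import Path

def ban_ip(ip: str, htaccess: Path) -> bool:
    """Append a 'Deny from' rule for ip; return False if already banned."""
    # Validate first so nothing but a real IP address is ever written.
    ip = str(ipaddress.ip_address(ip))  # raises ValueError otherwise
    rule = f"Deny from {ip}"
    text = htaccess.read_text() if htaccess.exists() else ""
    if rule in text.splitlines():
        return False
    # Make sure the new rule starts on its own line.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with htaccess.open("a") as handle:
        handle.write(f"{prefix}{rule}\n")
    return True
```

Validating the address before writing matters here, because the value ends up in a server configuration file.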

I saw on wiki that Disallow: * just isn’t recommended (for reasons not well explained). Perhaps it has something to do with drafts that allow for regex where the original standard does not.
http://www.conman.org/people/spc/robots2.html#format.directives.disallow

Jeff Starr#5

Yeh, I have actually tried using a variety of different directives (at the same time) for my blackhole directory, including:

Disallow: /blackhole
Disallow: /blackhole/
Disallow: */blackhole
Disallow: */blackhole/*

Unfortunately, Slurp still couldn’t manage to understand and/or obey them, and continually found itself visiting explicitly forbidden territory, as discussed here. Nevertheless, it should be emphasized that Yahoo! claims to support wildcards. But maybe that’s the issue here — perhaps Slurp doesn’t support wildcards after all..

Patrick#6

I noticed that your /blackhole/ directory displays:

Please note that the following Whois data will be reviewed carefully. If it is determined that you suck, you will be banned from this site forever.

followed by the WHOIS information. I was wondering how you obtained this information.

Jeff Starr#7

@Patrick: The WHOIS data is obtained by a script that queries the WHOIS database and echoes the formatted results to the browser. I am thinking about sharing the technique with my readers, and probably will as soon as my book is finished.
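
Until then, here is a rough Python sketch of the general idea, assuming a plain WHOIS query over TCP port 43 to ARIN's server (`whois.arin.net`). The function names `whois_lookup` and `parse_whois` are hypothetical, and this is not the script running in the blackhole directory.

```python
import socket

def whois_lookup(ip: str, server: str = "whois.arin.net",
                 timeout: float = 10.0) -> str:
    """Send a plain WHOIS query over TCP port 43 and return the raw reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(ip.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_whois(text: str) -> dict:
    """Collect 'Key: value' lines into a dict of lists, skipping comments."""
    fields = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields
```

Values are kept as lists because some keys, such as `NameServer`, appear more than once in a single record.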

Phil#8

Maybe “Slurp is not exactly irrelevant (yet)” is wrong.

My heaviest traffic site is a regional news blog doing 5 daily media releases and occasional in-depth, so Gbot is all over it like a rash (and, yep, slurp like a drunken sailor :0).

Yahoo visitors are 1/10 of Live’s, which are 1/100 of Google’s.

Jeff Starr#9

Good point, Phil. My stats are similar, with Yahoo traffic representing only a fraction of a percent of total visitors. Unless they pull a magic rabbit out of their hat, Yahoo may be joining the ranks of Ask et al.

Monkey#10

YES! Slurp has been crawling my site, but actually accessing files and folders not allowed by my robots.txt. I’d just as soon block it; it does no good for the site by completely disregarding the robots.txt file.

I was waiting for an article on this to be posted!

-Monkey

P.S. First post to your site, but I must say I absolutely love it. The best articles I’ve read in a long while.

Jeff Starr#11

@Monkey: I am one step away from blacklisting them myself. Will I miss the three daily visitors they send? Maybe. But protection against the malicious Slurp crawler would be well worth the sacrifice. I’m glad you found the article useful!

Also, thanks for the compliment on the site — it’s a labor of love, so I am glad you enjoy the content. :)

Monkey#12

Haha just maybe ;)

And something tells me Slurp might actually be a malicious script that uses the name but was not originally created by Yahoo!. This blatant disobeying of the robots.txt is just too weird, and Yahoo, at least I would hope, would not be as intrusive as this on purpose. Perhaps a programmer just spaced on the universal rules of a robots.txt file? O.o

-Monkey

P.S. Yes, It’s my new favorite site! Especially since I’ve been hacked about 3 times in the past 2 days, customized 3G blacklist and all -.-” I’ll keep pounding away in a desperate attempt to stop them.

Jeff Starr#13

Yes, that’s the whole point right there: this is not the kind of behavior that people would expect from one of the “big three” search engines. But the sad truth of the matter is that Yahoo Slurp is definitely and verifiably demonstrating malicious behavior. The verification comes from doing a forward/reverse IP/Host lookup. I wrote an article describing this technique not too long ago.
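
For readers who want to try the check themselves, here is a minimal Python sketch of a forward/reverse lookup. The function name `is_verified_crawler` is hypothetical, and the hostname suffix you pass in (for example `.crawl.yahoo.net`) is an assumption that should be confirmed against the search engine's own documentation. The two resolver parameters default to the standard `socket` functions and exist only so the logic can be exercised without a network.

```python
import socket

def is_verified_crawler(ip, allowed_suffixes,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a crawler's IP address."""
    try:
        host = reverse(ip)[0]          # reverse lookup: IP -> hostname
    except OSError:
        return False
    # The hostname must belong to a domain the crawler's owner controls.
    if not host.lower().endswith(tuple(allowed_suffixes)):
        return False
    try:
        addresses = forward(host)[2]   # forward lookup: hostname -> IPs
    except OSError:
        return False
    # Only trust the hostname if it resolves back to the original IP.
    return ip in addresses
```

Calling `is_verified_crawler("72.30.81.166", [".crawl.yahoo.net"])` performs live DNS lookups, so the result depends on the DNS records at the time of the call. The user-agent string alone proves nothing, since anyone can send it.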

Also, bummer about your site getting hacked. Did you know that there is a “4G” version of the blacklist available? In any case, keep fighting the good fight against those nefarious cracker scumbags.

Emanuel#14

holy mother! i was in the blackhole, please don’t ban me from this site.

btw: hopefully after switching yahoo! to the Bing algorithm, this incompetence/impudence will come to an end.

Jeff Starr#15

@Emanuel: Lol! I think you’re safe — let me know if you get locked out :)

Sara#16

Hi all, glad (not really) to see that many have my same problem.
The fact is that I don’t understand much; I just know that my bandwidth is all going to Slurp, yet I told him to stay out!!

He comes in every day and there goes 4.38GB.

Well, just wanted to say thanks for your info.

Jeff Starr#17

@Sara: That seems excessive even for Slurp. If Yahoo is really chewing up over four gigs of bandwidth every visit, you may want to weigh the pros and cons of blacklisting them. How much traffic are they sending to your site? Hardly anything here.

Sara#18

Oh, you are right. My hosting service explained to me that 4.35GB is per month,
but it is true that I told him to stay out and he comes in anyway.
I told them they are not respecting the robots.txt.
I’m still waiting for an answer.
Well, for the traffic they are sending I’m not sure, but I do have many pages on Yahoo search now.

Thank you for your reply.

Jeff Starr#19

Oh per month, you mean for the entire bandwidth usage, not just for Slurp, correct? If so, that seems more reasonable, but I would still keep an eye on how much bandwidth and resources Slurp is using — I have heard scary reports that would surprise most people.

It’s good that you reported the issue to them, although I can tell you that they know about it. I have posted many times about Yahoo’s flagrant lying about obeying the robots.txt directives. Also, remember that indexed pages are worthless in an engine that nobody is using.

Sara#20

No, per month only for Slurp.

(It is a lot more than Google, and I get visits from Google.)

This is what they say:

Hello Sara,

Thank you for writing to Yahoo! Search.

I understand that you’d like to prevent Yahoo! Slurp from crawling your
website. I’d be happy to look into this for you.

The robots.txt file you submitted needs a space after the colon as
follows:

User-agent: Slurp
Disallow: /

although this will not block Slurp immediately, it will take a few days.
Please correct the robots.txt file and then within a few days the
crawling should stop. We will continue checking the robots.txt file for
changes so we may still access your server, but we shouldn’t crawl any
file other than the robots.txt after that point.

We hope this information has helped. Please let us know if our answer
did not resolve your issue and if you need further assistance.

Thank you again for contacting Yahoo! Search.

The strange thing is that I always copy and paste, so the space should have been there.
So let’s wait and see.
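
The two-line rule from Yahoo's reply can be sanity-checked locally while waiting. The sketch below uses `urllib.robotparser` from Python's standard library; the sample path `/any/page.html` is made up for illustration. It only shows how a compliant parser reads the rule and says nothing about what Slurp actually does.

```python
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
# Feed the rule in directly instead of fetching it from a live site.
rules.parse([
    "User-agent: Slurp",
    "Disallow: /",
])

# A compliant crawler identifying itself as Slurp is shut out entirely...
print(rules.can_fetch("Slurp", "/any/page.html"))
# ...while crawlers that the rule does not name are unaffected.
print(rules.can_fetch("Googlebot", "/any/page.html"))
```

Note that the short token `Slurp` is passed rather than the full user-agent string, since that is the name the rule is written against.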

Jeff Starr#21

..but we shouldn’t crawl any file other than the robots.txt after that point.

Notice the word “shouldn’t” there? That tells me that these guys know what Slurp is doing but refuse to do anything about it. In other words, saying that Slurp “will not” would’ve been more reassuring (even if it is a complete lie), but they can’t say that, so they don’t.

Then accusing you of an error in your syntax.. that’s exactly how they responded to one of my posts here in the past. Honestly, I have tried every possible way of writing the robots.txt and scrutinized everything, and have waited for months for Slurp to “get it” and begin to do what it’s told. But no dice — it doesn’t matter how perfect your robots.txt directives are, because Slurp does whatever Yahoo wants anyway.

Thanks for sharing the feedback from Yahoo. Let us know if the new rules are effective at keeping out that nasty Slurp bot.
