52 posts related to: Wutsearch: Search Engine Launchpad

Unexplained Crawl Behavior Involving Tagged Query Strings

I need your help! I am losing my mind trying to solve another baffling mystery. For the past three or four months, I have been recording many 404 errors generated by msnbot, Yahoo-Slurp, and other spider crawls. These errors result from invalid requests for URLs containing query strings such as the following:

https://example.com/press/page/2/?tag=spam
https://example.com/press/page/3/?tag=code
https://example.com/press/page/2/?tag=email
https://example.com/press/page/2/?tag=xhtml
https://example.com/press/page/4/?tag=notes
https://example.com/press/page/2/?tag=flash
https://example.com/press/page/2/?tag=links
https://example.com/press/page/3/?tag=theme
https://example.com/press/page/2/?tag=press

Note: For these example URLs, I replaced my domain, perishablepress.com, with the generic example.com. Turns out that listing the plain-text […] Continue reading »
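The URLs in this excerpt share one shape: a paginated path (`/page/<number>/`) combined with a `tag` query parameter. As a rough sketch only, and not something taken from the post, here is one way to pick that shape out of a list of requested URLs in Python. The function name `is_tagged_page_request` and the sample list are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def is_tagged_page_request(url: str) -> bool:
    """Return True for paginated URLs that also carry a ?tag= query string."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Look for ".../page/<number>" at the end of the path.
    paginated = (
        len(segments) >= 2
        and segments[-2] == "page"
        and segments[-1].isdigit()
    )
    # parse_qs returns a dict of parameter names; we only care that "tag" is present.
    return paginated and "tag" in parse_qs(parts.query)


# Hypothetical sample of requested URLs (the first two follow the pattern above).
urls = [
    "https://example.com/press/page/2/?tag=spam",
    "https://example.com/press/page/3/?tag=code",
    "https://example.com/press/page/2/",
    "https://example.com/press/?tag=email",
]
suspicious = [u for u in urls if is_tagged_page_request(u)]
```

Run against URLs pulled from an error log, a filter like this would show how often the pattern occurs and which tags are involved.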

Taking Advantage of the X-Robots Tag

Controlling the spidering, indexing and caching of your (X)HTML-based web pages is possible with meta robots directives such as these:

<meta name="googlebot" content="index,archive,follow,noodp"/>
<meta name="robots" content="all,index,follow"/>
<meta name="msnbot" content="all,index,follow"/>

I use these directives here at Perishable Press and they continue to serve me well for controlling how the “big bots” crawl and represent my (X)HTML-based content in search results. For other, non-(X)HTML types of content, however, using meta robots directives to control indexing and caching is not an option. An […] Continue reading »
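For files that cannot hold a meta tag, the `X-Robots-Tag` HTTP response header can carry the same kind of directives. The sketch below assumes a server-side Python handler that chooses headers by file extension. The `ROBOTS_POLICY` table and the `robots_headers` function are hypothetical illustrations, not the configuration used in the post.

```python
import os

# Hypothetical policy: robots directives to send for selected non-HTML file types.
ROBOTS_POLICY = {
    ".pdf": "noindex, noarchive",
    ".doc": "noindex, noarchive",
    ".zip": "noindex",
}


def robots_headers(path: str) -> dict:
    """Return extra response headers for the requested file path."""
    ext = os.path.splitext(path)[1].lower()
    directive = ROBOTS_POLICY.get(ext)
    # Only add the header when the policy names this file type.
    return {"X-Robots-Tag": directive} if directive else {}
```

For example, `robots_headers("/files/report.PDF")` yields an `X-Robots-Tag` header, while an ordinary HTML page gets no extra header and stays under the control of its meta tags.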

Perishable News: Site Upgrades, Upcoming Interview, and PageRank Update

[ Photo: Perishable ]

Ever since writing that last review article, I have been feeling the need to cut loose, relax, and blog about something a little more “down-to-earth,” like recent things that have been happening around here. If you are new to Perishable Press, rest assured that I try to keep these “site/personal news” update posts down to a minimum. Whenever possible, I save up a bunch of interesting off-topic things that I want to talk about, and then cram them all together […] Continue reading »

Optimizing Google Analytics Performance

[ Image: Global Map Icon ]

It has occurred to me lately that I no longer use Google Analytics for Perishable Press. Instead, I find myself keeping an eye on things using Mint almost exclusively. So, the question now is: do I continue serving the GA JavaScript to keep the profile active just in case I ever need the additional stats? I mean, Mint already does a great job at recording all of the information I could ever need, so I no longer see the use for […] Continue reading »

Yahoo! Slurp in My Blackhole (Yet Again)

Yup, ol’ Slurp is at it again, flagrantly disobeying specific robots.txt rules forbidding access to my bad-bot trap, lovingly dubbed the “blackhole.” As many readers know, this is not the first time Yahoo has been caught behaving badly. This time, Yahoo was caught trespassing five different times via three different IPs over the course of four different days. Here is the data recorded in my site’s blackhole log (I know, that sounds terrible): Continue reading »

Yahoo! in my Blackhole

Okay, I realize that the title sounds a bit odd, but nowhere near as odd as my recent discovery of Slurp ignoring explicit robots.txt rules and digging around in my highly specialized bot trap, which I have lovingly dubbed “the blackhole”. What is up with that, Yahoo!? Does your Slurp spider obey robots.txt directives or not? I have never seen Google crawling around that side of town, and neither MSN nor even Ask has ventured into the forbidden realms. Has […] Continue reading »

Prevent JavaScript Elements from Breaking Page Layout when Following Yahoo Performance Tip #6: Place Scripts at the Bottom

[ Screenshot: broken footer positioning in IE 7 ]

By now, everyone is familiar with the Yahoo Developer Network’s 14 best practices for speeding up your website. Certainly, many (if not all) of these performance optimization tips are ideal for high-traffic sites such as Yahoo or Google, but not all of them are recommended for smaller sites such as Perishable Press. Nonetheless, throughout the current site renovation project, I have attempted to implement as many of these practices as possible. At the time of this writing, I somehow have managed […] Continue reading »

How to Verify the Four Major Search Engines

Keeping track of your access and error logs is a critical component of any serious security strategy. Many times, you will see a recorded entry that looks legitimate, such that it may easily be dismissed as genuine Google fare, only to discover upon closer investigation a fraudulent agent. There are many such cloaked or disguised agents crawling around these days, mimicking various search engines to hide beneath the radar. So it’s always a good idea to implement a procedure for […] Continue reading »
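One widely used check for a suspicious "search engine" entry is a reverse DNS lookup on the logged IP followed by a forward lookup on the returned hostname. This may or may not be the procedure the full post describes. The sketch below assumes Python's standard `socket` module. The suffix list is illustrative and incomplete, and `verify_crawler` is a hypothetical name. The resolver arguments exist only so the logic can be exercised without a network.

```python
import socket

# Illustrative, incomplete list of hostname suffixes (an assumption, verify before use).
TRUSTED_SUFFIXES = (".googlebot.com", ".search.msn.com", ".crawl.yahoo.net")


def verify_crawler(ip, reverse=None, forward=None, suffixes=TRUSTED_SUFFIXES):
    """Return True if ip reverse-resolves to a trusted host that resolves back to ip."""
    # Default resolvers use real DNS; callers may inject fakes for testing.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
        if not host.lower().rstrip(".").endswith(suffixes):
            return False  # hostname is not under a trusted domain
        return ip in forward(host)  # forward lookup must confirm the same IP
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False
```

A user-agent string alone proves nothing, since any client can send one. The two-way lookup is what ties the request to a domain the search engine controls.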

Stop WordPress from Leaking PageRank to Admin Pages

During the most recent Perishable Press redesign, I noticed that several of my WordPress admin pages had been assigned significant levels of PageRank. Not good. After some investigation, I realized that my ancient robots.txt rules were insufficient in preventing Google from indexing various WordPress admin pages. Specifically, the following pages have been indexed and subsequently assigned PageRank: Continue reading »
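Before relying on a set of robots.txt rules, it can help to test which paths they actually cover. Here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and paths shown are hypothetical examples, not the ones from the post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)


def is_blocked(path, agent="Googlebot"):
    """Return True if the parsed rules disallow the given path for this agent."""
    return not parser.can_fetch(agent, "https://example.com" + path)
```

Keep in mind that a Disallow rule only asks compliant crawlers not to fetch a URL. A blocked URL can still appear in an index if other pages link to it, so a check like this covers only the crawling side of the problem.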

Seven Ways to Beef Up Your Best Pages for the Next Google PR Update

[ Image: Grotesquely muscular older man ]

Time is running out! Soon, it will be time for the next Google PageRank (PR) update. While it is difficult to predict how your site will perform overall, it seems likely that your highest ranking pages will continue to rank well. The idea behind this article is to improve your site’s overall PageRank by totally beefing up your most popular pages. Of course, every page on your site is important. Ideally, you would want to apply these techniques to every […] Continue reading »

Suspicious Behavior from Yahoo! Slurp Crawler

[ Image: Black and white illustration of the upper half of a man's suspicious, paranoid face ]

Most of the time, when I catch scumbags attempting to spam, scrape, leech, or otherwise hack my site, I stitch up a new voodoo doll and let the cursing begin. No, seriously, I just blacklist the idiots. I don’t need their traffic, and so I don’t even blink while slamming the doors in their faces. Of course, this policy presents a bit of a dilemma when the culprit is one of the four major search engines. Slamming the door on […] Continue reading »

Allow Google Reader Access to Hotlink-Protected Images

[ Image: Google Reader Icon ]

In our previous article, we explain the process of allowing Feedburner to access your hotlink-protected images. The article details the entire process, which covers the basics of hotlink protection and involves adding several lines of code to your htaccess file. In this article, we skip the detailed explanations and present only the main points. The discussion is very similar for both Feedburner and Google Reader, and may be extrapolated to serve virtually any purpose. If you are using htaccess to […] Continue reading »

SEO 101: Best Practices

[ Image: Abstracted Documents ]

After studying Peter Kent’s excellent book, Search Engine Optimization for Dummies, several key methods emerged for optimizing websites for the search engines. Although the book is written for people who are new to the world of search engine optimization (SEO), many of the principles presented throughout the book remain important, fundamental practices even for the most advanced SEO-wizards. This article divulges these very useful SEO practices and organizes them into manageable chunks. Continue reading »

Search Engine Registration Notes

In his excellent book, Search Engine Optimization for Dummies, Peter Kent explains that many search engines actually get their search results from one (or more) of the larger search engines, such as Google or The Open Directory Project. Therefore, the author concludes that it may not be necessary to spend endless hours registering with thousands of the smaller search sites. Rather, the author provides a brief list of absolutely essential search sites with which it is highly recommended to register. […] Continue reading »

SEO 101: Establishing and Evolving an Effective Link Strategy

Optimizing your website for the search engines involves many important aspects including keyword development, search engine registration, and SEO logging. This Perishable Press tutorial scopes yet another critical weapon in the SEO wars: establishing and evolving an effective link campaign. We will begin our article by focusing on incoming and outgoing link strategies, proceed with a few tips for internal links, and then conclude with some ideas for getting links. Continue reading »

Automatic Language Translation Methods

[ Google Logo 2006 ]

As you may have noticed, Perishable Press recently added automatic language translation to each of our articles. The free, automatic translations are available as a series of image links (via corresponding country flag icons) next to each article’s individual post view. We have found that providing this free service is important as many of our visitors come from countries other than the United States, and therefore may be unable to read our articles as presented in the English language. Continue reading »

Welcome
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »