There are several ways to instruct Google to stay away from various pages in your site: Robots.txt directives Nofollow attributes on links Meta noindex/nofollow directives X-Robots noindex/nofollow directives ..and so on. These directives all function in different ways, but they all serve the same basic purpose: control how Google crawls the various pages on your site. For example, you can use meta noindex to instruct Google not to index your sitemap, RSS feed, or any other page [...] • Read more »
Perishable Press
WordPress, Web Design, Code & Tutorials
- Viewing page 1 of 2
- ← View older posts
- Visit the Archives
SEO Experiment: Let Google Sort it Out
One way to prevent Google from crawling certain pages is to use <meta /> elements in the <head></head> section of your web documents. For example, if I want to prevent Google from indexing and archiving a certain page, I would add the following code to the head of my document: <meta name=”googlebot” content=”noindex,noarchive” /> I’m no SEO guru, but it is my general understanding that it is possible to manipulate the flow of page rank throughout a [...] • Read more »
Yahoo! Slurp too Stupid to be a Robot
I really hate bad robots. When a web crawler, spider, bot — or whatever you want to call it — behaves in a way that is contrary to expected and/or accepted protocols, we say that the bot is acting suspiciously, behaving badly, or just acting stupid in general. Unfortunately, there are thousands — if not hundreds of thousands — of nefarious bots violating our websites every minute of the day. For the most part, there are effective [...] • Read more »
Custom OpenSearch Functionality for Your Website
I recently added OpenSearch functionality to Perishable Press. Now, OpenSearch-enabled browsers such as Firefox and IE 7 alert users with the option to customize their browser’s built-in search feature with an exclusive OpenSearch-powered search option for Perishable Press. The autodiscovery feature of supportive browsers detects the custom search protocol and enables users to easily add it to their collection of readily available site-specific search options. Now, users may search the entire Perishable Press domain with the click [...] • Read more »
Yahoo! Lies about Obeying Robots.txt Directives
There are two possibilities here: Yahoo!’s Slurp crawler is broken or Yahoo! lies about obeying Robots directives. Either case isn’t good. Slurp just can’t seem to keep its nose out of my private business. And, as I’ve discussed before, this happens all the time. Here are the two most recent offenses, as recorded in the log file for my blackhole spider trap: • Read more »
Yahoo! Once Again Caught Disobeying Robots.txt Rules
Hmmm.. Let’s see here. Google can do it. MSN/Live can do it. Even Ask can do it. So why oh why can’t Yahoo’s grubby Slurp crawler manage to adhere to robots.txt crawl directives? Just when I thought Yahoo! finally figured it out, I discover more Slurp tracks in my Blackhole trap for bad spiders: • Read more »
Choosing the Best Title Separators
While writing my previous article on creating the perfect WordPress title tags, I deliberately avoided discussing the use of separators in titles. I feel that the topic is worthy of its own article, enabling a more thorough exploration of the details. Title separators are the symbols, punctuation, and other characters used to distinguish between various parts of the page title. For example, a title may include the blog name, post title and blog description, with each element [...] • Read more »
Yahoo Incongruities.
When frustration builds, and finally reaches its the boiling point, it’s nice to be able to express yourself to someone. Although I really don’t enjoy ranting about things, but when it comes to certain aspects of Yahoo!, I just can’t he’p myse’f. So, thanks to recent attempt at using My Yahoo!, it’s time to get some of this off my chest, clear the decks, and give Yahoo! (yet another) chance to clean up its act. Here are [...] • Read more »
Taking Advantage of the X-Robots Tag
Controlling the spidering, indexing and caching of your (X)HTML-based web pages is possible with meta robots directives such as these: <meta name=”googlebot” content=”index,archive,follow,noodp”/> <meta name=”robots” content=”all,index,follow”/> <meta name=”msnbot” content=”all,index,follow”/> I use these directives here at Perishable Press and they continue to serve me well for controlling how the “big bots” 1 crawl and represent my (X)HTML-based content in search results. For other, non-(X)HTML types of content, however, using meta robots directives to control indexing and caching is not [...] • Read more »
Yahoo! Slurp in My Blackhole (Yet Again)
Yup, ‘ol Slurp is at it again, flagrantly disobeying specific robots.txt rules forbidding access to my bad-bot trap, lovingly dubbed the “blackhole.” As many readers know, this is not the first time Yahoo has been caught behaving badly. This time, Yahoo was caught trespassing five different times via three different IPs over the course of four different days. Here is the data recorded in my site’s blackhole log (I know, that sounds terrible): • Read more »
Focus on the Details: Optimizing Images for Humans and Machines
In this article, I discuss how to get the most out of your site’s images by optimizing them for both people and search engines.. For many sites, images play an important role in the communication process. If used correctly, images have the power to make your articles come alive with clarity and vibrancy. Some visitors may merely notice the image and continue reading, while others will want to know more about your images and dig deeper. While [...] • Read more »
Yahoo! in my Blackhole
Okay, I realize that the title sounds a bit odd, but nowhere near as odd as my recent discovery of Slurp ignoring explicit robots.txt rules and digging around in my highly specialized bot trap, which I have lovingly dubbed “the blackhole”. What is up with that, Yahoo!? — does your Slurp spider obey robots.txt directives or not? I have never seen Google crawling around that side of town, neither has MSN nor even Ask ventured into the [...] • Read more »
Seven Ways to Beef Up Your Best Pages for the Next Google PR Update
Time is running out! Soon, it will be time for the next Google PageRank (PR) update. While it is difficult to predict how your site will perform overall, it seems likely that your highest ranking pages will continue to rank well. The idea behind this article is to improve your site’s overall pagerank by totally beefing up your most popular pages. Of course, every page on your site is important. Ideally, you would want to employ these [...] • Read more »
Suspicious Behavior from Yahoo! Slurp Crawler
Most of the time, when I catch scumbags attempting to spam, scrape, leech, or otherwise hack my site, I stitch up a new voodoo doll and let the cursing begin. No, seriously, I just blacklist the idiots. I don’t need their traffic, and so I don’t even blink while slamming the doors in their faces. Of course, this policy presents a bit of a dilemma when the culprit is one of the four major search engines. Slamming [...] • Read more »
SEO 101: Best Practices
After studying Peter Kent’s excellent book, Search Engine Optimization for Dummies, several key methods emerged for optimizing websites for the search engines. Although the book is written for people who are new to the world of search engine optimization (SEO), many of the principles presented throughout the book remain important, fundamental practices even for the most advanced SEO-wizards. This article divulges these very useful SEO practices and organizes them into manageable chunks 1. Text Essentials The golden [...] • Read more »
Smooth Operators: Sharpen your Google Search Skills
Coming soon to the World Wide Web: Everything. The perpetually evolving sum of human knowledge available online. Anywhere. Anytime. So, what are you looking for? Information concerning something, somewhere, about somebody.. You know it’s there somewhere. Sure, you could waste time by digging through that immense labyrinth of browser bookmarks, maybe eventually finding that one link that may or may not lead you to the page that you remember.. No thanks. The Web is far too rich [...] • Read more »