New book on WordPress Theme Development: WordPress Themes In Depth
search
Tag Archive

Tell Google to Not Index Certain Parts of Your Page

There are several ways to instruct Google to stay away from various pages in your site: Robots.txt directives Nofollow attributes on links Meta noindex/nofollow directives X-Robots noindex/nofollow directives ..and so on. These directives all function in different ways, but they all serve the same basic purpose: control how Google crawls the various pages on your site. For example, you can use meta noindex to instruct Google not to index your sitemap, RSS feed, or any other page you wish. This level of control over which pages are crawled and indexed is helpful, but what if you need to control how […] Read more »

SEO Experiment: Let Google Sort it Out

One way to prevent Google from crawling certain pages is to use <meta /> elements in the <head></head> section of your web documents. For example, if I want to prevent Google from indexing and archiving a certain page, I would add the following code to the head of my document: <meta name=”googlebot” content=”noindex,noarchive” /> I’m no SEO guru, but it is my general understanding that it is possible to manipulate the flow of page rank throughout a site through strategic implementation of <meta /> directives. Read more »

Yahoo! Slurp too Stupid to be a Robot

I really hate bad robots. When a web crawler, spider, bot — or whatever you want to call it — behaves in a way that is contrary to expected and/or accepted protocols, we say that the bot is acting suspiciously, behaving badly, or just acting stupid in general. Unfortunately, there are thousands — if not hundreds of thousands — of nefarious bots violating our websites every minute of the day. For the most part, there are effective methods available enabling us to protect our sites against the endless hordes of irrelevant and mischievous bots. Such evil is easily blocked with […] Read more »

Custom OpenSearch Functionality for Your Website

I recently added OpenSearch functionality to Perishable Press. Now, OpenSearch-enabled browsers such as Firefox and IE 7 alert users with the option to customize their browser’s built-in search feature with an exclusive OpenSearch-powered search option for Perishable Press. The autodiscovery feature of supportive browsers detects the custom search protocol and enables users to easily add it to their collection of readily available site-specific search options. Now, users may search the entire Perishable Press domain with the click of a button. And you can do it too! Adding customized OpenSearch-powered search functionality to your own site is a great way to […] Read more »

Yahoo! Lies about Obeying Robots.txt Directives

There are two possibilities here: Yahoo!’s Slurp crawler is broken or Yahoo! lies about obeying Robots directives. Either case isn’t good. Slurp just can’t seem to keep its nose out of my private business. And, as I’ve discussed before, this happens all the time. Here are the two most recent offenses, as recorded in the log file for my blackhole spider trap: Read more »

Yahoo! Once Again Caught Disobeying Robots.txt Rules

Hmmm.. Let’s see here. Google can do it. MSN/Live can do it. Even Ask can do it. So why oh why can’t Yahoo’s grubby Slurp crawler manage to adhere to robots.txt crawl directives? Just when I thought Yahoo! finally figured it out, I discover more Slurp tracks in my Blackhole trap for bad spiders: Read more »

Choosing the Best Title Separators

While writing my previous article on creating the perfect WordPress title tags, I deliberately avoided discussing the use of separators in titles. I feel that the topic is worthy of its own article, enabling a more thorough exploration of the details. Title separators are the symbols, punctuation, and other characters used to distinguish between various parts of the page title. For example, a title may include the blog name, post title and blog description, with each element separated by a hyphen. Any Google search will reveal that some of the most commonly used title separators include the hyphen, the dash, […] Read more »

Yahoo Incongruities.

When frustration builds, and finally reaches its the boiling point, it’s nice to be able to express yourself to someone. Although I really don’t enjoy ranting about things, but when it comes to certain aspects of Yahoo!, I just can’t he’p myse’f. So, thanks to recent attempt at using My Yahoo!, it’s time to get some of this off my chest, clear the decks, and give Yahoo! (yet another) chance to clean up its act. Here are a few complaints I have against various aspects of the Yahoo! enterprise.. First and foremost, Yahoo! sends virtually zero traffic. I know this […] Read more »

Taking Advantage of the X-Robots Tag

Controlling the spidering, indexing and caching of your (X)HTML-based web pages is possible with meta robots directives such as these: <meta name=”googlebot” content=”index,archive,follow,noodp”/> <meta name=”robots” content=”all,index,follow”/> <meta name=”msnbot” content=”all,index,follow”/> I use these directives here at Perishable Press and they continue to serve me well for controlling how the “big bots” 1 crawl and represent my (X)HTML-based content in search results. For other, non-(X)HTML types of content, however, using meta robots directives to control indexing and caching is not an option. An excellent example of this involves directing Google to index and cache PDF documents. The last time I checked, meta tags […] Read more »

Yahoo! Slurp in My Blackhole (Yet Again)

Yup, ‘ol Slurp is at it again, flagrantly disobeying specific robots.txt rules forbidding access to my bad-bot trap, lovingly dubbed the “blackhole.” As many readers know, this is not the first time Yahoo has been caught behaving badly. This time, Yahoo was caught trespassing five different times via three different IPs over the course of four different days. Here is the data recorded in my site’s blackhole log (I know, that sounds terrible): Read more »

Focus on the Details: Optimizing Images for Humans and Machines

In this article, I discuss how to get the most out of your site’s images by optimizing them for both people and search engines.. For many sites, images play an important role in the communication process. If used correctly, images have the power to make your articles come alive with clarity and vibrancy. Some visitors may merely notice the image and continue reading, while others will want to know more about your images and dig deeper. While checking out your images, inquisitive guests will explore any clues available to them: alt tags, title tags, and captions, for example. Likewise, when […] Read more »

Yahoo! in my Blackhole

Okay, I realize that the title sounds a bit odd, but nowhere near as odd as my recent discovery of Slurp ignoring explicit robots.txt rules and digging around in my highly specialized bot trap, which I have lovingly dubbed “the blackhole”. What is up with that, Yahoo!? — does your Slurp spider obey robots.txt directives or not? I have never seen Google crawling around that side of town, neither has MSN nor even Ask ventured into the forbidden realms. Has anyone else experienced such unexpected behavior from one the four major search engines? Hmmm.. let’s dig a little further.. Here […] Read more »

Seven Ways to Beef Up Your Best Pages for the Next Google PR Update

Time is running out! Soon, it will be time for the next Google PageRank (PR) update. While it is difficult to predict how your site will perform overall, it seems likely that your highest ranking pages will continue to rank well. The idea behind this article is to improve your site’s overall pagerank by totally beefing up your most popular pages. Of course, every page on your site is important. Ideally, you would want to employ these techniques to every article on your site. But time is short, and Google is coming soon! The next PageRank update is slated for […] Read more »

Suspicious Behavior from Yahoo! Slurp Crawler

Most of the time, when I catch scumbags attempting to spam, scrape, leech, or otherwise hack my site, I stitch up a new voodoo doll and let the cursing begin. No, seriously, I just blacklist the idiots. I don’t need their traffic, and so I don’t even blink while slamming the doors in their faces. Of course, this policy presents a bit of a dilemma when the culprit is one of the four major search engines. Slamming the door on Yahoo! would be unwise, but if their Slurp crawler continues behaving suspiciously, I may have no choice. Check out the […] Read more »

SEO 101: Best Practices

After studying Peter Kent’s excellent book, Search Engine Optimization for Dummies, several key methods emerged for optimizing websites for the search engines. Although the book is written for people who are new to the world of search engine optimization (SEO), many of the principles presented throughout the book remain important, fundamental practices even for the most advanced SEO-wizards. This article divulges these very useful SEO practices and organizes them into manageable chunks 1. Text Essentials The golden rule for developing a popular website is to create a useful site and share it with as many people as possible. When designing […] Read more »

Smooth Operators: Sharpen your Google Search Skills

Coming soon to the World Wide Web: Everything. The perpetually evolving sum of human knowledge available online. Anywhere. Anytime. So, what are you looking for? Information concerning something, somewhere, about somebody.. You know it’s there somewhere. Sure, you could waste time by digging through that immense labyrinth of browser bookmarks, maybe eventually finding that one link that may or may not lead you to the page that you remember.. No thanks. The Web is far too rich in information to limit it with a few bookmarks. Ah yes, tags — that’s it! Social bookmarking to the rescue. Okay now, let’s […] Read more »

Latest Tweets Sale on printed copies of .htaccess made easy (with free shipping): htaccessbook.com/printed-htacc…