I recently launched a simple one-page website called Wutsearch. Wutsearch is a search-engine launchpad that I use for my homepage. Usually I work with several browsers open, each set to open multiple tabs on startup. For a long time, all those tabs were set to the Google homepage. Then DuckDuckGo. Now, it’s Wutsearch 🚀 Continue reading »
This post is about how I cleaned up an incorrect URL in the Google search results. My business site is basically a one-page portfolio site, located at the URL https://monzillamedia.com/. But in the Google search results, the URL was showing as https://monzilla.biz/, which did not exist. So all potential customers were getting an error page. Fortunately I was able to re-acquire the monzilla.biz domain and redirect all traffic to monzillamedia.com. Continue reading »
These days, in this crazy world, it makes sense to keep local archives of any critical online data. That way, when the Internet is not working (for whatever reason), you still have access to your important info and data. For those who are listening and interested in being prepared, here is the quickest, easiest way that I have found to archive complete offline copies of websites to your Mac (or iOS). It makes use of a free […] Continue reading »
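The excerpt elides the specific free tool the post describes, but as a general illustration of the idea, a site can be mirrored for offline viewing from the Mac terminal with wget (a hypothetical stand-in here, not necessarily the tool the post recommends):

    # Mirror a site for offline browsing; example.com is a placeholder
    wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com/

The --convert-links flag rewrites internal links so the archived copy remains browsable with no Internet connection.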
A while ago, I was confused by repetitive 404 “Not Found” errors in my server logs. The 404 requests look like someone is typing out various words, a few letters at a time. This post shows what these weird 404s look like from the server’s perspective, and then goes on to explain why they happen and why there is no practical way of preventing them. Continue reading »
Here is a working list of all user agents for the major search engines. I use this information frequently in my plugins, such as Blackhole for Bad Bots and BBQ Pro, so I figured it would be useful to post it online for the benefit of others. Having the user agents for these popular bots all in one place helps to streamline my development process. Each search engine entry includes references and a regex pattern to match all known […] Continue reading »
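As a rough sketch of what such a regex pattern might look like in practice (this is an invented illustration, not the actual pattern from the post or plugins):

    <?php
    // Hypothetical example: case-insensitive match against a few
    // well-known search-engine user agents.
    $pattern = '/googlebot|bingbot|slurp|duckduckbot|baiduspider|yandex/i';
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if (preg_match($pattern, $ua)) {
        // Request appears to come from a major search-engine bot.
    }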
There are several ways to instruct Google to stay away from various pages on your site: robots.txt directives, nofollow attributes on links, meta noindex/nofollow directives, X-Robots noindex/nofollow directives, and so on. These directives all function in different ways, but they all serve the same basic purpose: control how Google crawls the various pages on your site. For example, you can use meta noindex to instruct Google not to index your sitemap, RSS feed, or any other page you wish. This […] Continue reading »
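For instance, a robots.txt rule is the simplest of these directives (the path below is hypothetical):

    User-agent: *
    Disallow: /private/

Note that robots.txt controls crawling while the meta and X-Robots noindex directives control indexing; a page blocked via robots.txt can still show up in search results if other sites link to it.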
One way to prevent Google from crawling certain pages is to use <meta /> elements in the <head></head> section of your web documents. For example, if I want to prevent Google from indexing and archiving a certain page, I would add the following code to the head of my document: Continue reading »
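The excerpt cuts off before the snippet itself, but based on the description (prevent indexing and archiving), the code would be something along these lines:

    <meta name="googlebot" content="noindex,noarchive" />

Here noindex tells Google not to include the page in its index, and noarchive tells it not to keep a cached copy.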
I really hate bad robots. When a web crawler, spider, bot — or whatever you want to call it — behaves in a way that is contrary to expected and/or accepted protocols, we say that the bot is acting suspiciously, behaving badly, or just acting stupid in general. Unfortunately, there are thousands — if not hundreds of thousands — of nefarious bots violating our sites every minute of the day. For the most part, there are effective methods available enabling […] Continue reading »
I recently added OpenSearch functionality to Perishable Press. Now, OpenSearch-enabled browsers such as Firefox and IE 7 give users the option to add an exclusive OpenSearch-powered search option for Perishable Press to their browser’s built-in search feature. The autodiscovery feature of supporting browsers detects the custom search protocol and enables users to easily add it to their collection of readily available site-specific search options. Users may then search the entire Perishable Press domain with the click of a button. […] Continue reading »
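Autodiscovery works via a link element in the document head that points to an OpenSearch description file (the file name and URL below are illustrative):

    <link rel="search" type="application/opensearchdescription+xml" title="Perishable Press" href="/opensearch.xml" />

The referenced XML file then declares the search template, e.g. using a WordPress-style search query:

    <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
       <ShortName>Perishable Press</ShortName>
       <Description>Search Perishable Press</Description>
       <Url type="text/html" template="https://perishablepress.com/?s={searchTerms}"/>
    </OpenSearchDescription>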
There are two possibilities here: either Yahoo!’s Slurp crawler is broken or Yahoo! lies about obeying robots directives. Neither case is good. Slurp just can’t seem to keep its nose out of my private business. And, as I’ve discussed before, this happens all the time. Here are the two most recent offenses, as recorded in the log file for my blackhole spider trap: Continue reading »
Hmmm… Let’s see here. Google can do it. MSN/Live can do it. Even Ask can do it. So why oh why can’t Yahoo!’s grubby Slurp crawler manage to adhere to robots.txt crawl directives? Just when I thought Yahoo! finally figured it out, I discover more Slurp tracks in my Blackhole trap for bad spiders: Continue reading »
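For context, the kind of robots.txt directive that Slurp is expected to honor looks like this (the trap’s actual path is not shown in the excerpt, so the one below is a placeholder):

    User-agent: Slurp
    Disallow: /blackhole/

A compliant crawler reads these rules before requesting anything else on the site, and simply never touches the disallowed path.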
While writing my previous article on creating the perfect WordPress title tags, I deliberately avoided discussing the use of separators in titles. I feel that the topic is worthy of its own article, enabling a more thorough exploration of the details. Title separators are the symbols, punctuation, and other characters used to distinguish between various parts of the page title. For example, a title may include the blog name, post title and blog description, with each element separated by a […] Continue reading »
When frustration builds and finally reaches the boiling point, it’s nice to be able to express yourself to someone. I really don’t enjoy ranting about things, but when it comes to certain aspects of Yahoo!, I just can’t he’p myse’f. So, thanks to a recent attempt at using My Yahoo!, it’s time to get some of this off my chest, clear the decks, and give Yahoo! (yet another) chance to clean up its act. Here are a few complaints […] Continue reading »
Controlling the spidering, indexing and caching of your (X)HTML-based web pages is possible with meta robots directives such as these:

    <meta name="googlebot" content="index,archive,follow,noodp"/>
    <meta name="robots" content="all,index,follow"/>
    <meta name="msnbot" content="all,index,follow"/>

I use these directives here at Perishable Press and they continue to serve me well for controlling how the “big bots” crawl and represent my (X)HTML-based content in search results. For other, non-(X)HTML types of content, however, using meta robots directives to control indexing and caching is not an option. An […] Continue reading »
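One standard way to handle non-(X)HTML content is the X-Robots-Tag HTTP header. A minimal sketch using Apache’s mod_headers in .htaccess (the file extensions here are chosen for illustration):

    # Requires mod_headers; sends noindex/noarchive for matching files
    <FilesMatch "\.(pdf|jpe?g|png|gif)$">
       Header set X-Robots-Tag "noindex, noarchive"
    </FilesMatch>

Because the directive travels as an HTTP header rather than markup, it works for PDFs, images, and any other resource that has no head section to put a meta element in.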
Yup, ol’ Slurp is at it again, flagrantly disobeying specific robots.txt rules forbidding access to my bad-bot trap, lovingly dubbed the “blackhole.” As many readers know, this is not the first time Yahoo! has been caught behaving badly. This time, Yahoo! was caught trespassing five different times via three different IPs over the course of four different days. Here is the data recorded in my site’s blackhole log (I know, that sounds terrible): Continue reading »
In this article, I discuss how to get the most out of your site’s images by optimizing them for both people and search engines. For many sites, images play an important role in the communication process. If used correctly, images have the power to make your articles come alive with clarity and vibrancy. Some visitors may merely notice an image and continue reading, while others will want to know more about your images and dig deeper. While checking out your […] Continue reading »
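A minimal sketch of the kind of markup that serves both audiences (the file name and alt text are invented for illustration):

    <img src="/images/perch-fishing-dawn.jpg" alt="Angler fishing for perch at dawn" width="600" height="400" />

A descriptive file name and accurate alt text give search engines context about the image, while the alt text also serves visitors using screen readers or browsing with images disabled.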