With the explosion of social media, networking, and bookmarking services, there are a zillion ways to add “Share This Post” functionality to your WordPress-powered sites. In addition to the myriad services and plugins, we can also add these links directly, using nothing more than a little markup and a few choice PHP snippets. Such individual links provide full control over the selection, layout, and styling of each link without requiring the installation of yet another WordPress plugin. This article shares […] Continue reading »
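The excerpt cuts off before the snippets themselves, but a minimal sketch of the idea, assuming use inside the WordPress Loop and an illustrative share endpoint (not the article's own code), might look like this:

<?php
// Sketch of a hand-rolled "Share This Post" link. Assumes the WordPress Loop,
// so get_permalink() and get_the_title() are available; the share endpoint
// and markup are illustrative only.
$share_url   = urlencode( get_permalink() );
$share_title = urlencode( get_the_title() );
?>
<a href="https://twitter.com/intent/tweet?url=<?php echo $share_url; ?>&amp;text=<?php echo $share_title; ?>" rel="nofollow">Share this post on Twitter</a>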
There are two possibilities here: Yahoo!’s Slurp crawler is broken, or Yahoo! lies about obeying Robots directives. Neither case is good. Slurp just can’t seem to keep its nose out of my private business. And, as I’ve discussed before, this happens all the time. Here are the two most recent offenses, as recorded in the log file for my blackhole spider trap: Continue reading »
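For readers unfamiliar with the trap, the blackhole works on a simple premise: robots.txt forbids a trap URL, a hidden link points at it, and anything that requests it anyway gets logged and blocked. A minimal sketch of that setup, where the /blackhole/ path and markup are illustrative placeholders rather than the actual implementation:

# robots.txt — well-behaved crawlers should never request this path
User-agent: *
Disallow: /blackhole/

<!-- hidden trap link placed somewhere in the page markup -->
<a href="/blackhole/" rel="nofollow" style="display:none;">Do not follow this link</a>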
Hmmm.. Let’s see here. Google can do it. MSN/Live can do it. Even Ask can do it. So why oh why can’t Yahoo’s grubby Slurp crawler manage to adhere to robots.txt crawl directives? Just when I thought Yahoo! finally figured it out, I discover more Slurp tracks in my Blackhole trap for bad spiders: Continue reading »
I need your help! I am losing my mind trying to solve another baffling mystery. For the past three or four months, I have been recording many 404 errors generated by msnbot, Yahoo-Slurp, and other spider crawls. These errors result from invalid requests for URLs containing query strings such as the following:

https://example.com/press/page/2/?tag=spam
https://example.com/press/page/3/?tag=code
https://example.com/press/page/2/?tag=email
https://example.com/press/page/2/?tag=xhtml
https://example.com/press/page/4/?tag=notes
https://example.com/press/page/2/?tag=flash
https://example.com/press/page/2/?tag=links
https://example.com/press/page/3/?tag=theme
https://example.com/press/page/2/?tag=press

Note: For these example URLs, I replaced my domain, perishablepress.com, with the generic example.com. Turns out that listing the plain-text […] Continue reading »
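The excerpt ends before the explanation, but purely as an illustration of how such stray query-string requests might be handled, a hedged htaccess sketch could redirect them to the corresponding tag archives. The rule below is an assumption for illustration, not the article's actual fix, and it presumes a /press/tag/ archive structure that the URLs above do not confirm:

# Illustrative only: send /press/page/N/?tag=foo requests to /press/tag/foo/
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{QUERY_STRING} ^tag=([^&]+)$ [NC]
RewriteRule ^press/page/[0-9]+/$ /press/tag/%1/? [R=301,L]
</IfModule>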
Controlling the spidering, indexing and caching of your (X)HTML-based web pages is possible with meta robots directives such as these:

<meta name="googlebot" content="index,archive,follow,noodp"/>
<meta name="robots" content="all,index,follow"/>
<meta name="msnbot" content="all,index,follow"/>

I use these directives here at Perishable Press and they continue to serve me well for controlling how the “big bots”1 crawl and represent my (X)HTML-based content in search results. For other, non-(X)HTML types of content, however, using meta robots directives to control indexing and caching is not an option. An […] Continue reading »
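The excerpt cuts off before the alternative is named, but one widely used option for non-(X)HTML content is the X-Robots-Tag HTTP header set via htaccess. A minimal sketch, assuming Apache with mod_headers and using PDFs as a stand-in example (not taken from the article itself):

# Illustrative only: apply robots directives to PDF files via HTTP header
<IfModule mod_headers.c>
<FilesMatch "\.pdf$">
Header set X-Robots-Tag "noindex, noarchive"
</FilesMatch>
</IfModule>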
Recently, reader Luke Knowles asked how to customize the sort order of his posts in WordPress. Looking into a solution to this question proved quite enlightening. Within moments I was able to discern 4 methods for modifying post order, and then several days later I discovered 2 additional custom sorting techniques. After updating the reply to Luke’s comment, it seemed like some good information that other WordPressers may find useful. So, here are six ways to customize the sort order […] Continue reading »
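As a taste of what such customization looks like, here is a minimal sketch of one common approach: adjusting the main query before it runs. The hook and parameters are standard WordPress; sorting by title is simply an illustrative choice, and this is not necessarily one of the six methods covered in the article.

<?php
// Sketch: sort posts by title on the home page by modifying the main query.
function pp_custom_sort_order( $query ) {
	if ( $query->is_main_query() && $query->is_home() ) {
		$query->set( 'orderby', 'title' );
		$query->set( 'order', 'ASC' );
	}
}
add_action( 'pre_get_posts', 'pp_custom_sort_order' );
?>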
In this article, my goal is to help you optimize WordPress by replacing a few common plugins with equally effective code equivalents. As we all know, WordPress can be a very resource-hungry piece of software, especially when running a million extraneous plugins. Often, many common plugins are designed to perform relatively simple tasks, such as redirecting a feed, displaying a random image, or querying the database. For those of us comfortable with editing PHP and htaccess code, there is […] Continue reading »
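As an illustration of the general idea, redirecting a feed is a one-rule htaccess job rather than a plugin. A minimal sketch, assuming Apache mod_rewrite and a FeedBurner-style destination; the destination URL and user-agent exception are placeholders, not snippets from the article:

# Illustrative only: redirect the site feed to an external feed service,
# while letting the service itself still fetch the original feed.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !FeedBurner [NC]
RewriteRule ^feed/?$ https://feeds.example.com/your-feed [R=302,L]
</IfModule>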
Yup, ol’ Slurp is at it again, flagrantly disobeying specific robots.txt rules forbidding access to my bad-bot trap, lovingly dubbed the “blackhole.” As many readers know, this is not the first time Yahoo has been caught behaving badly. This time, Yahoo was caught trespassing five different times via three different IPs over the course of four different days. Here is the data recorded in my site’s blackhole log (I know, that sounds terrible): Continue reading »
Okay, I realize that the title sounds a bit odd, but nowhere near as odd as my recent discovery of Slurp ignoring explicit robots.txt rules and digging around in my highly specialized bot trap, which I have lovingly dubbed “the blackhole”. What is up with that, Yahoo!? — does your Slurp spider obey robots.txt directives or not? I have never seen Google crawling around that side of town, nor has MSN or even Ask ventured into the forbidden realms. Has […] Continue reading »
WordPress users employing permalinks via htaccess to optimize their dynamic URLs transform complicated-looking links such as:

http://example.com/blog/index.php?page=33

..into search-engine friendly links such as:

http://example.com/blog/post-title/

Every rewritten URL relies on a common set of htaccess rules to transform the links. The htaccess rules for all WordPress permalinks look like this for root WP installations:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

..and like this for […] Continue reading »
Recently, while deliberating an optimal method for eliminating nofollow link attributes from Perishable Press, I collected, installed, tested and reviewed every WordPress no-nofollow/dofollow plugin that I could find. In this article, I present a concise, current, and comprehensive reference for WordPress no-nofollow and dofollow plugins. Every attempt has been made to provide accurate, useful, and complete information for each of the plugins represented below. Further, as this subject is a newfound interest of mine, it is my intention to keep […] Continue reading »
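For context on what these plugins actually do under the hood, here is a minimal sketch of the core behavior most of them share: stripping the rel="nofollow" attribute from comment author links. The filter hook is standard WordPress; the implementation is illustrative and not taken from any particular plugin reviewed.

<?php
// Sketch: remove rel="nofollow" from comment author links.
function pp_dofollow_comment_links( $link ) {
	return preg_replace( '/\s*rel=([\'"])[^\'"]*\bnofollow\b[^\'"]*\1/i', '', $link );
}
add_filter( 'get_comment_author_link', 'pp_dofollow_comment_links' );
?>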
During the most recent Perishable Press redesign, I noticed that several of my WordPress admin pages had been assigned significant levels of PageRank. Not good. After some investigation, I realized that my ancient robots.txt rules were insufficient to prevent Google from indexing various WordPress admin pages. Specifically, the following pages had been indexed and subsequently assigned PageRank: Continue reading »
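The specific pages are listed beyond the excerpt, but for general context, the robots.txt rules typically used to keep crawlers away from the WordPress back end look something like the following. These are the default WordPress paths, not necessarily the exact rules discussed in the article:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-login.php

Worth noting: a Disallow rule only stops compliant crawlers from fetching a URL; a URL that is linked from elsewhere can still end up indexed, which is part of why robots.txt alone can prove insufficient.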
Most of the time, when I catch scumbags attempting to spam, scrape, leech, or otherwise hack my site, I stitch up a new voodoo doll and let the cursing begin. No, seriously, I just blacklist the idiots. I don’t need their traffic, and so I don’t even blink while slamming the doors in their faces. Of course, this policy presents a bit of a dilemma when the culprit is one of the four major search engines. Slamming the door on […] Continue reading »
Web developers trying to control comment spam, bandwidth theft, and content scraping must choose between two fundamentally different approaches: selectively deny known offenders (the “blacklist” method) or selectively allow desirable agents (the “opt-in”, or “whitelist”, method). Judging from various online forums and discussion boards, the blacklist method is currently the more popular of the two. It requires the webmaster to create and maintain a working list of undesirable agents, usually blocking their access via htaccess or PHP. The downside of blacklisting is that it requires considerable […] Continue reading »
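To make the two approaches concrete, here is a minimal htaccess sketch of each; the user-agent strings below are placeholders for illustration, not recommendations from the article:

# Blacklist: forbid requests from specific undesirable agents
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (EvilScraper|NastyLeech) [NC]
RewriteRule .* - [F,L]
</IfModule>

# Whitelist: forbid anything whose user agent does not match an approved token
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !(Mozilla|Googlebot|Slurp|msnbot) [NC]
RewriteRule .* - [F,L]
</IfModule>

In practice the two blocks would not be used together; they simply illustrate the inverse logic of the two methods.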
In our never-ending battle against spammers, leeches, scrapers, and other online undesirables, we have implemented several powerful security measures to improve the operational integrity of our perpetual virtual existence. Here is a rundown of the new behind-the-scenes security features of Perishable Press. Continue reading »
If you have yet to encounter the content-scraping site, bitacle.org, consider yourself lucky. The scum-sucking worm-holes at bitacle.org are well known for literally, blatantly, and piggishly stealing blog content and using it for financial gain through advertising. While I am not here to discuss the legal, philosophical, or technical ramifications of bitacle’s illegal behavior, I am here to provide a few critical tools that will help stop bitacle from stealing your content. Continue reading »
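The excerpt stops before the tools themselves, but one of the standard weapons against this sort of thing is an htaccess block keyed on the scraper’s user agent or referrer. A minimal sketch of that general approach; the matching strings are illustrative placeholders rather than the article’s actual rules, and a scraper that harvests content via feeds may call for additional, feed-level countermeasures:

# Illustrative only: deny requests associated with the bitacle scraper
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_REFERER} bitacle\. [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bitacle [NC]
RewriteRule .* - [F,L]
</IfModule>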