Humans.txt Template

Years ago, I thought the whole humans.txt thing was just silly, and even explained how to block humans.txt requests. But the concept has actually grown on me, to the point where I now include a customized humans.txt file with most of my projects. It just seems like useful information to make available for those who are looking for it. You know, all about the site, author, team, and such. And I’ve seen plenty of requests for the humans file in my logs, so it’s definitely worth the effort to provide one, especially now that more people […] Read more »
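For reference, here is a minimal sketch of the humanstxt.org format; the names and values below are hypothetical placeholders, not the template from the article:

    /* TEAM */
        Author: Jane Doe
        Site: https://example.com/
        Location: City, Country

    /* SITE */
        Last update: 2013/01/01
        Standards: HTML5, CSS3
        Software: WordPress

The file is plain text served from the site root (e.g., example.com/humans.txt), so there is nothing to parse or validate; it exists purely for curious humans.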

What Chrome Predictive URLs Look Like on the Server

A while ago, I was confused by repetitive 404 “Not Found” errors in my server logs. The 404 requests looked like someone typing out various words, a few letters at a time. This post shows what these weird 404s look like from the server’s perspective, and then explains why they happen and why there is no practical way to prevent them. Read more »

Example of a Spoofed Search Engine Bot

While solving the recent search engine spoofing mystery, I came across two excellent examples of spoofed search engine bots. This article uses the examples to explain how to identify any questionable bots hitting your site. Read more »
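One quick check is a reverse DNS lookup on the requesting IP, followed by a forward lookup to confirm the round trip; the sketch below uses the example IP from Google’s own bot-verification docs:

    $ host 66.249.66.1
    1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.
    $ host crawl-66-249-66-1.googlebot.com
    crawl-66-249-66-1.googlebot.com has address 66.249.66.1

A spoofed bot typically fails this round trip: the hostname is unrelated to googlebot.com (or there is no reverse record at all), even though the user agent claims to be Googlebot.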

List of All User Agents for Top Search Engines

Here is a working list of all user agents for the top search engines. I use this information frequently for my plugins such as Blackhole for Bad Bots and BBQ Pro, so I figured it would be useful to post the information online for the benefit of others. Having the user agents for these popular bots all in one place helps to streamline my development process. The entry for each search engine includes references and a regex pattern to match all of its known user agents. Read more »
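As an illustrative sketch (not the article’s actual patterns), here is how such a regex might be put to work in .htaccess, using a few well-known Google crawler UA substrings and Apache’s mod_setenvif:

    # hypothetical example: flag requests from common Google crawlers
    SetEnvIfNoCase User-Agent "(googlebot|mediapartners-google|adsbot-google)" is_search_bot

The matched environment variable can then drive logging, rate limiting, or access rules elsewhere in the config.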

Add Google+ Share Button to Any Site

Word on the street is that the new Google+ Share button is the best way yet to benefit from Google’s myriad social-media services and all-important search engine. And Google makes it SO easy to add the new Share button to your website. This article explains what it is, where it fits in with all the other social-Google stuff, and of course how to add the g+ Share button to any site. Read more »
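Google+ has since shut down, so the snippet is history now, but the official markup looked roughly like this: a placeholder element plus Google’s asynchronous script loader:

    <div class="g-plus" data-action="share"></div>
    <script type="text/javascript">
      (function() {
        var po = document.createElement('script');
        po.type = 'text/javascript';
        po.async = true;
        po.src = 'https://apis.google.com/js/plusone.js';
        var s = document.getElementsByTagName('script')[0];
        s.parentNode.insertBefore(po, s);
      })();
    </script>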

Multiple Sitemaps

Yes, you can have multiple sitemaps for your site. Create the sitemaps you need, and then specify them in your robots.txt file. For example, here are the robots.txt directives for the two sitemaps used here at Perishable Press: Read more »
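The directives themselves are one line per sitemap, each with an absolute URL; the filenames here are hypothetical stand-ins, not necessarily the actual ones used on this site:

    Sitemap: https://perishablepress.com/sitemap.xml
    Sitemap: https://perishablepress.com/sitemap-articles.xml

The Sitemap directive is independent of any User-agent block, so it can sit anywhere in robots.txt.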

Hacked by Google?

The setup: I recently launched a new plugin that included a Demo page. To keep things flexible, I set up the Demo as a page on my experimental “Labs” WordPress installation, which is entirely nofollow, noindex and noarchive, meaning that Google can’t legitimately see what’s there. The story: So I launch my plugin and the traffic starts rolling in and some of it goes to the Demo page, as planned. Everything was going fine for a number of hours – people were checking out the Demo, submitting sample posts into the ether, and all was well. There were no tricks, […] Read more »

Clean Up Malicious Links with HTAccess

I recently spent some time analyzing Perishable Press pages as they appear in the search results for Google, Bing, et al. Google Webmaster Tools provides a wealth of information about crawl errors, as well as the URLs of any pages that link to missing content. Combined with your site’s access/error logs, you have everything needed to track down 404 errors and clean up your listings in the search engine results. Read more »
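As a rough sketch of the cleanup technique (the paths are hypothetical), .htaccess can permanently redirect mangled inbound links to the intended page, or answer never-existed URLs with “410 Gone” so search engines drop them from their listings:

    <IfModule mod_alias.c>
        # redirect a broken inbound link to the intended page
        RedirectMatch 301 ^/some-mangled-url/?$ https://example.com/correct-page/
        # tell crawlers this URL is gone for good
        Redirect gone /nonexistent-spam-link/
    </IfModule>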

Better Robots.txt Rules for WordPress

Cleaning up my files during the recent redesign, I realized that several years had somehow passed since the last time I even looked at the site’s robots.txt file. I guess that’s a good thing, but with all of the changes to site structure and content, it was time again for a delightful romp through robots.txt. Read more »
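Robots rules are a moving target, so treat this as a baseline sketch rather than the rules the article arrives at; a commonly recommended minimal robots.txt for WordPress looks something like:

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

    Sitemap: https://example.com/sitemap.xml

Less is generally more here: blocking CSS, JavaScript, or uploads tends to hurt rather than help modern crawlers.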

Yoast WP SEO vs All in One SEO

Update (2013/08/26): This article was written in 2011 and both plugins have changed quite a bit. Yoast SEO pretty much covers the entire spectrum, while All in One SEO covers more than it has in the past. So either plugin could be great depending on your site’s specific SEO needs, but if you’re just looking for a solid way to implement the SEO basics, check out DigWP.com for a lightweight DIY Alternative to WordPress SEO Plugins. While setting things up here at the new site (new WP install), I’m trying to keep the custom functions and plugins down to a […] Read more »

Simple IP-Detection Bad for SEO

In general, Perishable Press enjoys generous ranking in Google’s search-engine results. The site’s many pages bring in lots of traffic for some great keywords, and a direct search for “Perishable Press” returns the first spot, with eight featured site links even. And recently, after switching servers, traffic increased even further. Things were going well, and it seemed like the perfect opportunity to finally renovate and redesign the site. So I dive in.. And then approximately 24-48 hours after beginning work on the new design, BAM – suddenly Google cuts my traffic by 75% and removes most of my pages from […] Read more »

Fixing WordPress Infinite Duplicate Content Issue

Jeff Morris recently demonstrated a potential issue with the way WordPress handles multipaged posts and comments. The issue involves WordPress’ inability to discern between multipaged posts and comments that actually exist and those that do not. By redirecting requests for nonexistent numbered pages to the original post, WordPress creates an infinite amount of duplicate content for your site. In this article, we explain the issue, discuss the implications, and provide an easy, working solution. Understanding the “infinite duplicate content” issue Using the <!--nextpage--> tag, WordPress makes it easy to split your post content into multiple pages, and also makes it […] Read more »
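The article provides its own fix; as a hypothetical sketch of the general approach, a small WordPress plugin (or theme function) can 404 requests for subpages that do not exist instead of letting them resolve:

    <?php
    // Hypothetical sketch, not the article's exact solution:
    // return 404 for out-of-range page numbers on multipaged posts.
    function pp_multipage_404() {
        global $wp_query, $post;
        if ( ! is_singular() || empty( $post ) ) return;
        $page = (int) get_query_var( 'page' );
        if ( $page < 2 ) return;
        // number of pages = nextpage tags + 1
        $num_pages = substr_count( $post->post_content, '<!--nextpage-->' ) + 1;
        if ( $page > $num_pages ) {
            $wp_query->set_404();
            status_header( 404 );
            nocache_headers();
        }
    }
    add_action( 'template_redirect', 'pp_multipage_404' );

The same idea extends to comment pagination by checking the cpage query variable against the real number of comment pages.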

Tell Google to Not Index Certain Parts of Your Page

There are several ways to instruct Google to stay away from various pages on your site: robots.txt directives, nofollow attributes on links, meta noindex/nofollow directives, X-Robots noindex/nofollow directives, and so on. These directives all function in different ways, but they serve the same basic purpose: control how Google crawls and indexes the various pages on your site. For example, you can use meta noindex to instruct Google not to index your sitemap, RSS feed, or any other page you wish. This level of control over which pages are crawled and indexed is helpful, but what if you need to control how […] Read more »
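To make those page-level options concrete, here is a hedged sketch: the meta tag belongs in the head of any page you want kept out of the index, while the equivalent X-Robots-Tag HTTP header (set here via .htaccess, assuming Apache with mod_headers) covers non-HTML resources such as XML sitemaps and feeds:

    <meta name="robots" content="noindex" />

    <IfModule mod_headers.c>
        <FilesMatch "\.xml$">
            Header set X-Robots-Tag "noindex"
        </FilesMatch>
    </IfModule>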

Dynamic Link Insertion via Unobtrusive External JavaScript

In my recent guest post at The Nexus, I discuss Google’s new nofollow policy (404 link removed 2013/02/08) and suggest several ways to deal with it. In that article, I explain how Google allegedly has changed the way it deals with nofollow links. Instead of transferring leftover nofollow juice to remaining dofollow links as it always had, Google now pours all that wonderful nofollow juice right down the drain. This shift in policy comes as a terrible surprise to many webmasters and SEO gurus, especially those who have invested vast amounts of time, effort, and money engaging in supposedly lucrative […] Read more »

Dealing with Google’s New Nofollow Policy

Note: This article was originally posted at a domain that has unfortunately turned to the dark side, so the post is no longer available at its original location; it has been reposted here for reference purposes. Anyone plugged into the Web these days has heard about how Google has supposedly changed the way it deals with nofollow attributes. According to a number of speculative reports, Google will no longer apply unused nofollow PageRank to other links on the page. So, let’s say that you have some sites that have been PageRank “sculpted” by way of strategically applied nofollow tags. For […] Read more »

SEO Experiment: Let Google Sort it Out

One way to prevent Google from crawling certain pages is to use <meta /> elements in the <head></head> section of your web documents. For example, if I want to prevent Google from indexing and archiving a certain page, I would add the following code to the head of my document: <meta name="googlebot" content="noindex,noarchive" /> I’m no SEO guru, but it is my general understanding that it is possible to manipulate the flow of PageRank throughout a site through strategic implementation of <meta /> directives. Read more »
