Jeff Morris recently demonstrated a potential issue with the way WordPress handles multipaged posts and comments. The issue involves WordPress’ inability to discern between multipaged posts and comments that actually exist and those that do not. By redirecting requests for nonexistent numbered pages to the original post, WordPress creates an infinite amount of duplicate content for your site. In this article, we explain the issue, discuss the implications, and provide an easy, working solution.
Understanding the “infinite duplicate content” issue
Using the <!--nextpage--> tag, WordPress makes it easy to split your post content into multiple pages, and also makes it easy to paginate the display of your comment threads. For both paged posts and paged comments, WordPress appends the page number to the permalink. So for example, if we have a post split into 3 pages, WordPress will generate the following set of completely valid permalinks (based on name-only permalink structure):
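As a rough illustration of that pattern (not WordPress's own code), the following Python sketch builds the permalinks for a three-page post and flags page numbers that do not exist. The domain `example.com`, the slug `sample-post`, and both function names are hypothetical, and the sketch assumes that page 1 lives at the post's own permalink.

```python
def paged_permalinks(base_url: str, slug: str, page_count: int) -> list[str]:
    """Return the permalinks for a post split into `page_count` pages.

    Assumes a name-only permalink structure: page 1 is the post's own
    permalink, and each later page appends its page number.
    """
    post_url = f"{base_url.rstrip('/')}/{slug}/"
    return [post_url] + [f"{post_url}{n}/" for n in range(2, page_count + 1)]


def is_real_page(requested_page: int, page_count: int) -> bool:
    """True only when the requested page number exists for the post."""
    return 1 <= requested_page <= page_count


links = paged_permalinks("https://example.com", "sample-post", 3)
for link in links:
    print(link)

# Page 4 of a 3-page post does not exist, yet a request for it
# still ends up showing the original post's content.
print(is_real_page(4, 3))  # prints: False
```

Any page number for which `is_real_page` returns `False` is a candidate for the duplicate content described above.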
In my recent guest post at The Nexus, I discuss Google’s new nofollow policy and suggest several ways to deal with it. In that article, I explain how Google allegedly has changed the way it deals with nofollow links. Instead of transferring leftover nofollow juice to remaining dofollow links as it always has, Google now pours all that wonderful nofollow juice right down the drain. This shift in policy comes as a terrible surprise to many webmasters and SEO gurus, especially those who have invested vast amounts of time, effort and money engaging in supposedly lucrative PR-sculpting pursuits.
Of course, this new policy leaves many of us wondering how to deal with it. If nofollow link equity simply vanishes into the ether (and it remains a big “if” until Google clarifies its position), the repercussions may be significant. For example, webmasters who now rely on nofollow to salvage link juice otherwise leaked through lengthy comment threads will need to devise another strategy or suffer an inevitable loss of valuable PageRank. There are many good strategies available, including everything from long-term reorganization of site structure to short-term fixes involving much-despised tricks such as iframes and JavaScript links. Personally, I wouldn’t touch iframes with a ten-foot pole, but in the case of an emergency, I certainly would take a look at using external JavaScript to get the job done.
One way to prevent Google from crawling certain pages is to use <meta> elements in the <head> section of your web documents. For example, if I want to prevent Google from indexing and archiving a certain page, I would add the following code to the head of my document:
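For illustration only, here is a small Python sketch that builds such an element. The function name `robots_meta` is hypothetical, and the `noindex` and `noarchive` values are simply the standard directives that match the indexing and archiving case described above, not necessarily the exact code from the original article.

```python
from html import escape


def robots_meta(agent: str, directives: list[str]) -> str:
    """Build a <meta> robots element for the <head> of a page."""
    content = ",".join(directives)
    # escape() keeps the attribute values safe to embed in markup.
    return f'<meta name="{escape(agent)}" content="{escape(content)}">'


tag = robots_meta("googlebot", ["noindex", "noarchive"])
print(tag)
# prints: <meta name="googlebot" content="noindex,noarchive">
```

Using `robots` instead of `googlebot` as the name addresses all compliant crawlers rather than Google alone.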
I’m no SEO guru, but it is my general understanding that it is possible to manipulate the flow of page rank throughout a site through strategic implementation of <meta> directives.
In addition to your choice collection of “Share This” links, you may also want to provide visitors with a link that enables them to quickly and easily send the URL permalink of any post to their friends via email. This is a great way to increase your readership and further your influence. Just copy & paste the following code into the desired location in your page template:
<a href="mailto:?subject=Fresh%20Linkage%20@%20Perishable%20Press&body=Check%20out%20<?php the_permalink(); ?>%20from%20Perishable%20Press" title="Send a link to this post via email" rel="nofollow">Share this post via email</a>
Within the code, you will need to edit both instances of the string “Perishable%20Press” to reflect your own site name. Note that the “%20” is the encoded equivalent of a blank space, and is required to ensure validation of parameterized query strings. As is, the code will generate an email that is populated with the following information:
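To see where the “%20” sequences come from, here is a minimal Python sketch that percent-encodes the subject and body of a `mailto:` link. The function name `mailto_link` and the sample URL are assumptions made for this example. In a WordPress template, the permalink would come from `the_permalink()` instead.

```python
from urllib.parse import quote


def mailto_link(subject: str, body: str) -> str:
    """Return a mailto: URL with percent-encoded subject and body."""
    # quote() turns each space into %20; safe="" also encodes "/" and ":"
    # so that a URL placed in the body cannot break the query string.
    return f"mailto:?subject={quote(subject, safe='')}&body={quote(body, safe='')}"


link = mailto_link("Fresh Linkage", "Check out https://example.com/sample-post/")
print(link)
```

Note that this sketch encodes more characters than the hand-written template does (for example, the colon and slashes in the permalink), which is the stricter of the two approaches.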
Ever wanted to provide automatic language translations of your web pages without installing another plugin? Here is a valid, SEO-friendly technique that takes advantage of Google’s free translation service. All you need is a PHP-enabled server and you’re good to go. Just copy and paste the following code into the desired location in your page template and enjoy the results. Once in place, this code will produce translation links for eight common languages for every page on your site. Grab, gulp and go:
With the explosion of social media, networking, and bookmarking services, there are a zillion ways to add “Share This Post” functionality to your WordPress-powered sites. In addition to the myriad services and plugins, we can also add these links directly, using nothing more than a little markup and a few choice PHP snippets. Such individual links provide full control over the selection, layout, and styling of each link without requiring the installation of yet another WordPress plugin.
This article shares SEO-friendly code snippets for ten of the most popular social media sites using completely valid XHTML-Strict markup. All of the following code snippets feature:
Aaron Wall on SEO, the future of the Web, Google dominance, and life as a professional taste tester
As someone who keeps a close eye on the mystical world of Search Engine Optimization, one of my favorite sources of information is SEO-guru Aaron Wall. Aaron is the author of the immensely popular SEOBook.com, where he shares his knowledge, ideas, and opinions on a wide range of SEO-related topics. I have always admired the direct, informative way in which Aaron presents his content, which itself is always insightful and intriguing. Having read many of Aaron’s thoughts on SEO and marketing, I wanted to “zoom out” and ask Aaron a few questions about the possible future of SEO and life on the Web in general. Recently, Aaron was generous enough to respond to some of these rather eclectic questions, including some interesting “behind-the-scenes” questions revealing how Aaron works on the Web.
I use these directives here at Perishable Press and they continue to serve me well for controlling how the “big bots” crawl and represent my (X)HTML-based content in search results.
For other, non-(X)HTML types of content, however, using meta robots directives to control indexing and caching is not an option. An excellent example of this involves directing how Google indexes and caches PDF documents. The last time I checked, meta tags can’t be added to PDFs, Word documents, Excel documents, text files, and other non-(X)HTML-based content. The solution, of course, is to take advantage of the relatively new HTTP header, X-Robots-Tag.
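As a server-neutral sketch of the idea, the following Python function decides whether a response should carry an `X-Robots-Tag` header based on the requested file's extension. The function name, the extension list, and the default directives are all assumptions chosen for illustration. On a real site the header is typically set in the web server's configuration rather than in application code.

```python
from pathlib import PurePosixPath

# File types that cannot carry a <meta> robots element (illustrative list).
NON_HTML_EXTENSIONS = {".pdf", ".doc", ".xls", ".txt"}


def robots_headers(url_path: str, directives: str = "noindex, noarchive") -> dict[str, str]:
    """Return the extra response headers for the given URL path."""
    suffix = PurePosixPath(url_path).suffix.lower()
    if suffix in NON_HTML_EXTENSIONS:
        return {"X-Robots-Tag": directives}
    # (X)HTML pages can keep using meta robots elements instead.
    return {}


print(robots_headers("/files/report.pdf"))
# prints: {'X-Robots-Tag': 'noindex, noarchive'}
print(robots_headers("/about/"))
# prints: {}
```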
When I first began Perishable Press two years ago, in August of 2005, WordPress quickly became my blogging platform of choice. Everything about WordPress was great, so I had no trouble overlooking a few seemingly insignificant quirks, such as the nofollow attributes that are automatically applied to all comment links. In fact, at first, I really had no idea what they were or how they affected my site.
Eventually, as I began delving deeper into the Blogosphere, I realized that those harmless-looking nofollow tags were considered by many to be detrimental to the livelihood of the blogging community and its way of life. The arguments against nofollow and the reasoning behind the “no nofollow” movement resonated well with my sense of social equity on the Internet.
The more I looked into the nofollow issue, the more opposed I became to the idea of default WordPress installations generating nofollow links by default. In fact, after arming myself with as much information as possible, I made haste to jump on the anti-nofollow bandwagon and publicly regurgitated the arguments against the implementation of nofollow links.
During the most recent Perishable Press redesign, I noticed that several of my WordPress admin pages had been assigned significant levels of PageRank. Not good. After some investigation, I realized that my ancient robots.txt rules were insufficient in preventing Google from indexing various WordPress admin pages. Specifically, the following pages have been indexed and subsequently assigned PageRank:
WP Admin Login Page
WP Lost Password Page
WP Registration Page
WP Admin Dashboard
Needless to say, it is important to stop WordPress from leaking PageRank to admin pages. Instead of wasting our hard-earned link-equity on non-ranking pages, let’s redirect it to more important pages and posts. In order to accomplish this, we will attack the problem on three different fronts: admin links, robots.txt rules, and meta tags. Let’s take a look at each of these methods.
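For the robots.txt front, one way to sanity-check a rule set before deploying it is Python's standard `urllib.robotparser` module. The rules below are an assumed example covering the usual WordPress admin and login paths, not the exact rules from this article, and `example.com` is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules; adjust the paths to match your own install.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path: str, agent: str = "Googlebot") -> bool:
    """Report whether the rules allow `agent` to crawl `path`."""
    return parser.can_fetch(agent, f"https://example.com{path}")


for path in ("/wp-admin/", "/wp-login.php?action=lostpassword", "/sample-post/"):
    print(path, is_crawlable(path))
```

Because the rules match by path prefix, the single `/wp-login.php` line also covers the lost-password screen, which is served from the same file with a query string.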
Time is running out! Soon, it will be time for the next Google PageRank (PR) update. While it is difficult to predict how your site will perform overall, it seems likely that your highest ranking pages will continue to rank well. The idea behind this article is to improve your site’s overall PageRank by totally beefing up your most popular pages.
Of course, every page on your site is important. Ideally, you would want to employ these techniques to every article on your site. But time is short, and Google is coming soon! The next PageRank update is slated for any day now, probably before I manage to post this article. ;) Thus, our strategy is to focus on pages that already have some Google juice flowing to them. Your most popular articles. Your best-ranked pages. Your top ten posts.
After I studied Peter Kent’s excellent book, Search Engine Optimization for Dummies, several key methods emerged for optimizing websites for the search engines. Although the book is written for people who are new to the world of search engine optimization (SEO), many of the principles presented throughout the book remain important, fundamental practices even for the most advanced SEO-wizards. This article divulges these very useful SEO practices and organizes them into manageable chunks.
Text Essentials
The golden rule for developing a popular website is to create a useful site and share it with as many people as possible. When designing a site for search engine popularity, use clear, readable text. Replace non-standard text characters with standard equivalents. By all means, check the spelling, grammar, and syntax of your text manually, or at the very least, using an automatic spell-checker. If you are targeting the giant Google search engines, your design mantra should be, “black text on white background” — that is, keep it simple, straightforward, and focus on quality content. And finally, never use image-based text in place of searchable, text-based content.
In his excellent book, Search Engine Optimization for Dummies, Peter Kent explains that many search engines actually get their search results from one (or more) of the larger search engines, such as Google or The Open Directory Project. Therefore, the author concludes that it may not be necessary to spend endless hours registering with thousands of the smaller search sites. Rather, the author provides a brief list of absolutely essential search sites with which it is highly recommended to register. Further, by registering with the following sites, your site will be listed in a significant majority of all search engines.
Optimizing your website for the search engines involves many important aspects including keyword development, search engine registration, and SEO logging. This Press post scopes yet another critical weapon in the SEO wars: establishing and evolving an effective link campaign.
Within your SEO log, you should devote an entire section to the logging of all link-related activity associated with optimizing your site for the internet. For example, you may wish to subcategorize your link campaign according to whether the links are elsewhere, pointing to your site (referring/incoming links), or present within your site, pointing to other sites (external/outgoing links). Users may also benefit from tracking activity for internal links, which point to other locations within the same domain.
We will begin our article by focusing on incoming and outgoing link strategies, proceed with a few tips for internal links, and then conclude with some ideas for getting links.
If you have yet to encounter the content-scraping site, bitacle.org, consider yourself lucky. The scum-sucking worm-holes at bitacle.org are well-known for literally, blatantly, and piggishly stealing blog content and using it for financial gains through advertising. While I am not here to discuss the legal, philosophical, or technical ramifications of illegal bitacle behavior, I am here to provide a few critical tools that will help stop bitacle from stealing your content.
The htaccess Finger
Perhaps the most straightforward and effective method for keeping the bitacle thieves away from your site is to block them with htaccess. The following rules will literally block bitacle’s IP address and return a 403 Forbidden message (for more information on htaccess files, see our article, Stupid htaccess Tricks, referenced below). Add this to your site’s root htaccess file:
Next up, another effective anti-bitacle method that instructs the bitacle bots to stay away from your site. This method uses a robots.txt file in your site’s root directory and literally denies bitacle agents crawl-access to all site contents. Simply add the following lines to your site’s root robots.txt file (for more information on robots.txt, see our article, Robots Notes Plus, referenced below):
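As a hedged illustration of how such a per-agent rule behaves, the Python sketch below parses a sample robots.txt and asks whether two crawlers may fetch a page. The agent token `Bitacle`, the full agent string `Bitacle bot/1.1`, and the `example.com` URL are assumptions for this example. Check your access logs for the user-agent strings the scraper actually sends.

```python
from urllib.robotparser import RobotFileParser

# Deny everything to the (assumed) scraper token, allow everyone else.
ROBOTS_TXT = """\
User-agent: Bitacle
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://example.com/any-post/"
scraper_allowed = parser.can_fetch("Bitacle bot/1.1", page)
google_allowed = parser.can_fetch("Googlebot", page)

print(scraper_allowed)  # prints: False
print(google_allowed)   # prints: True
```

Keep in mind that robots.txt is only a request, so it works best alongside the htaccess block rather than in place of it.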
For more help on the anti-plagiarism front, check out Redalt’s Antileech Plugin and MaxPower’s Digital Fingerprint Plugin. These fine WordPress plugins come highly recommended and are definitely worth checking out.
Other Essential Tools
Beyond the essential preventative methods discussed above, there are many other resources and tools now available for dealing with site scrapers, content thieves, and other worthless garbage. A worthwhile website is Copyscape, which provides an excellent tool that enables users to search the web for stolen content. If you find that your content has indeed been plagiarized, read up on how to respond properly and effectively. Finally, try searching for various search terms, such as "plagiarism tools", "content scraping", "copyright protection", "syndication theft", etc. Good Luck!
Welcome to Perishable Press! This article covers a plethora of search-engine optimization resources. For more excellent SEO information, check out the Optimization category archive. If you like what you see, I encourage you to subscribe to Perishable Press for a periodic dose of online enlightenment ;)
Search engine optimization (SEO) is the business of every serious webmaster. The process of optimizing a website for the search engines involves much more than properly constructed document headers and anchor tags. Websites are like trees: their roots are the growing collection of content presented through the branching universe of the World Wide Web. Or something. The point is that optimizing a website requires nurturing the site itself while also ensuring proper exposure to the requisite elements of the internet.
The process of optimizing your first website may seem daunting. There are many aspects to consider and many websites with which to deal. Search engine registration, keyword development, and an evolving link campaign are all required for any home-grown, roll-your-own website optimization. Further, for each site you intend to optimize, there is a plethora of related data — site links, usernames, passwords, email addresses, etc. — that needs to be collected, organized, and updated. Therefore, it is essential to properly record and consistently maintain a carefully crafted SEO log.
Keywords play a vital role in search engine optimization (SEO), and — if used properly — have the potential to increase the flow of traffic to your site. It is beneficial to maintain an active list of keywords for each of your websites. Each list should be a continually evolving set of important, relevant keywords. The idea here is to develop a consistent practice of actively seeking better keywords, thereby producing your very own customized keyword library.
Perishable Press vehemently opposes the great corporate/commercial campaign to implement the rel="nofollow" anchor. The proposal suggests that use of nofollow will reduce spam and improve search engine results.
This couldn’t be further from the truth, regardless of what the commercial giant$ may tell you.
Examine these helpful references and see for yourself: