Stop WordPress from Leaking PageRank to Admin Pages
During the most recent Perishable Press redesign, I noticed that several of my WordPress admin pages had been assigned significant levels of PageRank. Not good. After some investigation, I realized that my ancient robots.txt rules were insufficient to prevent Google from indexing various WordPress admin pages. Specifically, the following pages had been indexed and subsequently assigned PageRank:
- WP Admin Login Page
- WP Lost Password Page
- WP Registration Page
- WP Admin Dashboard
Needless to say, it is important to stop WordPress from leaking PageRank to admin pages. Instead of wasting our hard-earned link equity on non-ranking pages, let’s redirect it to more important pages and posts. To accomplish this, we will attack the problem on three different fronts: admin links, robots.txt rules, and meta tags. Let’s take a quick look at each of these three methods.
1. Eliminate sitewide links to your admin pages
Unless you are actively encouraging people to register with your blog, linking to the login/registration/password page is pointless. Better yet, bookmark your login page and eliminate all links to your admin pages entirely. If you must link to your admin pages, consolidate them into one location by using is_home() or something similar. The goal here is to eliminate sitewide links to your admin pages. Not only will it help stop the PR leakage, it will simplify your site as well. To add login and register links to your home page only, customize and insert the following code into your sidebar or other target location:
<?php if (is_home()) { ?>
<h3>Administration</h3>
<ul><?php wp_register(); ?>
<li><?php wp_loginout(); ?></li>
</ul>
<?php } ?>
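If you do keep the links, you can also stop them from passing any equity at all by adding rel="nofollow" to them. The following is only a rough sketch of that idea, assuming a reasonably current WordPress (wp_login_url() and esc_url() arrived around versions 2.7 and 2.8); on older installs you could simply hard-code the wp-login.php URL. Logged-in users are skipped here on the assumption that they already know where the Dashboard lives:
<?php if (is_home() && !is_user_logged_in()) { // blog home page, logged-out visitors only ?>
<h3>Administration</h3>
<ul>
	<li><a href="<?php echo esc_url(wp_login_url()); ?>" rel="nofollow">Log in</a></li>
	<?php if (get_option('users_can_register')) { // only advertise registration when it is actually enabled ?>
	<li><a href="<?php echo esc_url(site_url('wp-login.php?action=register')); ?>" rel="nofollow">Register</a></li>
	<?php } ?>
</ul>
<?php } ?>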
2. Disallow search engines from your admin pages via robots.txt rules
Disallowing search engines from crawling areas that do not need indexing is a great way to conserve link equity and redistribute it to more critical parts of your site. Although Google seems to obey robots.txt rules, I am still unconvinced that the other search engines follow suit. Nonetheless, we are focusing on Google here, and if the other major engines play along, then more power to us. Just keep in mind that robots.txt rules are essentially advisory: well-behaved crawlers honor them, but nothing forces compliance. Adding rules may be a good thing to do, but it is far from a complete solution. To formally disallow Google and all other (obedient) search engines from accessing any behind-the-scenes admin pages, add these rules to your site’s robots.txt file:
User-agent: *
Disallow: */wp-admin/*
Disallow: */wp-login.php
Disallow: */wp-register.php
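As an aside, if you would rather not maintain the physical file by hand, WordPress can emit these directives itself via its virtual robots.txt. The snippet below is only a sketch of that approach (the function name is made up for the example); it assumes your install exposes the robots_txt filter and that no physical robots.txt exists on the server, since WordPress only serves the virtual one when the real file is absent:
<?php
// Append the admin-related Disallow rules to WordPress' virtual robots.txt.
// Illustrative sketch: assumes no physical robots.txt file exists, because
// WordPress only generates the virtual one when the real file is absent.
function pp_admin_robots_rules($output, $public) {
	if ('0' != $public) { // leave "blog not public" sites alone; WP already blocks those
		$output .= "Disallow: */wp-admin/*\n";
		$output .= "Disallow: */wp-login.php\n";
		$output .= "Disallow: */wp-register.php\n";
	}
	return $output;
}
add_filter('robots_txt', 'pp_admin_robots_rules', 10, 2);
?>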
3. Add noindex, nofollow meta tags to your admin pages
Perhaps the best way to keep your admin pages out of the search results is to explicitly mark them with noindex, nofollow meta robots tags. In my experience, while only a few search engines obey rules specified via robots.txt, all of the four major search engines (Ask, Google, Live, and Yahoo!) seem to obey rules specified via meta tags. Further, while robots.txt rules simply ask crawlers to stay away, meta rules let you differentiate between indexing, archiving, caching, following links, and much more. For our admin pages, we want to forbid all search engine activity: no indexing, archiving, or link-following allowed. Implementing such meta tags throughout your admin area involves three different WordPress files and requires six insertions of the following code:
<meta name="googlebot" content="noindex,noarchive,nofollow" />
<meta name="msnbot" content="noindex,nofollow" />
<meta name="robots" content="noindex,nofollow" />
The previous set of meta tags explicitly tells Google, MSN, and all other search engines not to index the page or follow its links. We are going to copy & paste these tags into each of the head elements located in the following files:
- /wp-admin/admin-header.php (1x)
- /wp-login.php (2x)
- /wp-register.php (3x)
In each of the previous files, locate each instance of “<head>”. Copy & paste the three meta elements (provided above) somewhere within each of the head elements; for example, place the tags immediately after the <title> element. Upon completion, you will have added a total of six (6) sets of meta tags, according to the numbers specified in the list of files above. After uploading your files, check their source code in a browser to verify the tags have been added correctly.
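If editing core files makes you nervous (your edits will be overwritten on the next WordPress upgrade), the same tags can be printed from a small plugin or your theme’s functions.php instead. This is only a sketch of the idea, with a made-up function name; it assumes the login_head and admin_head actions fire on your login/registration and admin pages, which is the case on a typical install:
<?php
// Print the noindex/nofollow meta tags on the login, registration, and admin
// pages without touching core files. Sketch only; assumes the login_head and
// admin_head actions fire on those pages (true for typical WordPress installs).
function pp_admin_noindex_meta() {
	echo '<meta name="googlebot" content="noindex,noarchive,nofollow" />' . "\n";
	echo '<meta name="msnbot" content="noindex,nofollow" />' . "\n";
	echo '<meta name="robots" content="noindex,nofollow" />' . "\n";
}
add_action('login_head', 'pp_admin_noindex_meta'); // wp-login.php; newer wp-register.php files simply redirect here
add_action('admin_head', 'pp_admin_noindex_meta'); // everything under /wp-admin/
?>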
If using all three of these methods seems like overkill, the third method alone should be sufficient. I currently employ the last two techniques, and plan on removing sitewide admin links during the next redesign. Of course, if you really want to lock down your admin pages, nothing works better than htaccess, but we’ll save that for another article. ;)
15 responses to “Stop WordPress from Leaking PageRank to Admin Pages”
I heard that page rank doesn’t leak… I should read more about it though, because it wouldn’t leak only to your admin pages. It would be leaking everywhere, I think, and then cause people to be very stingy with their links. Maybe you can answer this question for me: I have a few sites in WordPress that get search engine hits for terms that are in the sidebar, like as part of a plugin, and that have no real value because people just get misled by them. Is there a way to have Google index just the pertinent data, like the posts and comments? I have experimented with several ways before but I am not satisfied. Also, if you have a related posts plugin, how do you stop that from being indexed? Google is sending people to the wrong pages on my site! Cheers!
~ John
Oooh, I just love the heavier comments!
As far as I know, when people talk about “leaking” pagerank, they are referring to pagerank that is wasted on pages that provide no benefits from an SEO perspective. WordPress admin pages are a great example, because they are meant to be “private” and thus should not be indexed by the search engines. Any links that transfer equity to admin pages are effectively throwing it away.
As for preventing search engines from indexing certain portions of a site, there are several ways that I can think of just off the top of my head (i.e., without taking the time to search around, etc.). If you were serious about hiding, say, the sidebar and footer, you could use JavaScript to write the content directly to the browser. Search engines do not process JavaScript, and thus only target areas would get indexed. A similar method would employ Flash content to serve the non-indexed regions of the site. As with JavaScript, search engines do not read Flash content, which is itself highly functional. A third option would be to summon some crafty frame or iframe trickery to get the job done. From what I’ve read, frames and iframes are not considered to be part of your site by search engines. Thus, with some careful planning, using frames may prove useful.
Keep in mind, however, that with most of these methods, there are some serious usability/accessibility issues involved. For example, visitors without JavaScript, Flash, or frame-support would be left in the dark as to your sidebar and footer content. On the other hand, depending on your primary audience, a majority of visitors may be assumed to have such functionality. Of course, if you were absolutely determined to serve only select regions of content without all of the giant usability concerns, you could probably rig up some hardcore PHP and htaccess voodoo to selectively deploy various server-side includes based on whether or not the client happened to be a search engine bot. Then you get into the whole realm of forking, cloaking, and the “dark side” of SEO…
And finally, if I understand your question about the Related Posts plugin correctly, it seems that adding nofollow attributes to the link markup generated by the plugin’s PHP would definitely help. Nofollowing the related links would prevent Google from following them; however, the pages may remain in the index. If you are certain that you do not want the related pages indexed, you would need to deploy some well-targeted meta tags such as the following:
<meta name="googlebot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
These tags instruct the search engines to follow but not index the pages on which they are located. I use such tags on several types of pages here at Perishable Press.
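For what it’s worth, one way to target such tags at specific types of pages is to hook a conditional check into wp_head from your theme or a small plugin. The snippet below is purely illustrative; the function name and the particular conditionals are just examples, so swap in whichever match the views you want kept out of the index:
<?php
// Illustrative only: print "noindex,follow" meta tags on archive-style views.
// Swap the conditionals for whichever page types you want kept out of the index.
function pp_conditional_noindex() {
	if (is_category() || is_date() || is_paged()) {
		echo '<meta name="googlebot" content="noindex,noarchive,follow" />' . "\n";
		echo '<meta name="robots" content="noindex,follow" />' . "\n";
		echo '<meta name="msnbot" content="noindex,follow" />' . "\n";
	}
}
add_action('wp_head', 'pp_conditional_noindex');
?>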
Cheers,
Jeff
Thing is, I want them followed because they are useful in creating internal links to other pages on the site and boosting them a bit. I just don’t want google looking there or in the sidebar and mistaking that for the important content of the site. I want them to notice the links and spread the PR wealth to those pages though…
Hmm… I thought that Google’s algorithm would already know about this problem and simply ignore the admin pages. I mean, there are millions of WordPress users out there, so Google must have set their spiders/bots to recognize and simply ignore these pages.
Anyway, I’ll be adding the robots.txt file to my site.
Thank you for the code. I very much appreciate it.
Yes, Google is very good at not indexing admin pages, especially for WordPress; however, thousands of such pages have indeed made their way into the primary index. For example, try the following search:
inurl:wp-login Lost your password register back remember username
..which is a search for all of the terms present on a typical (default) WordPress login page, along with a filter to return only results that actually have “wp-login” in the URL, thereby limiting our search to only WordPress-powered sites. As you can see, Google is smart, but that doesn’t mean we shouldn’t keep an eye on things..
I’m aware of this but never thought it would be quite an issue.
Lisa, I thought the same way too. With so many WP users, the big G should have built this into their algorithm.
Anyway, by default my WP theme used the no. 1 tip (eliminating sitewide links), which I wasn’t aware of until I read this post.
Thanks for the tip. I’ll be adding the ‘noindex’ and ‘nofollow’ tags on the admin pages.
My pleasure — glad to be of service :)
Thank you for your article, but I am having trouble with your instructions:
/wp-login.php (2x)
Whereabouts do you put the 2nd lot of tags?
/wp-register.php (3x)
The file looks like this:
Whereabouts do I put the nofollow tags?
Thanks.
Hi Ed,
Open each file, and do a search/find for the phrase “<head>”. If you are running a typical installation of WordPress, you will find 2 instances of <head> in the wp-login.php file and 3 instances in the wp-register.php file. Once you have located all of the document <head>s, proceed to follow the steps in the article. I hope that helps!
Nice summary. I should add, however, that if your admin pages *are* already in the Google results, you need to get them out before you block Googlebot from visiting them. If you make all the links to these pages NOFOLLOW, Googlebot may never get to the pages to find that you’ve now NOINDEXed them. Best to set the NOINDEX tags (step 3 above) first, and let nature take its course, before removing all the links. You could also speed up the removal process by using Google Webmaster Tools.
Excellent points, Chris — thanks for sharing them. After using the methods described in the article, Google eventually cleared the index of my admin pages, however it did take quite some time (about six months). Certainly, following your advice — especially concerning the Google Webmaster Tools — would have greatly facilitated (i.e., sped up) the process.
As an update, the “Remove URLs” feature in Google Webmaster Tools took my wp-admin pages out of the Google results within 24 hours, which was cool. Now that that’s proved successful (and only now), I have made sure they don’t get crawled again using robots.txt (step 2 above).