Simple IP-Detection Bad for SEO
In general, Perishable Press enjoys generous ranking in Google’s search-engine results. The site’s many pages bring in lots of traffic for some great keywords, and a direct search for “Perishable Press” returns the first spot, with eight featured site links even. And recently, after switching servers, traffic increased even further. Things were going well, and it seemed like the perfect opportunity to finally renovate and redesign the site. So I dove in.
And then approximately 24-48 hours after beginning work on the new design, BAM – suddenly Google cut my traffic by 75% and removed most of my pages from the search results. For example, the home page was not among the search results for “perishablepress.com” – so obviously something bad had happened, and my long-standing, reputable website had been penalized by Google.
What happened
While designing the new site, I needed a way to detect the IP address of any request for the home page:
https://perishablepress.com/
During development, requests for that URL returned the root index.php file with the following PHP logic:
<?php // IP-based WP-loading
if ($_SERVER['REMOTE_ADDR'] == '123.456.789.0') {
    // my IP address: load the new WordPress installation
    define('WP_USE_THEMES', true);
    require_once("./perish/wp-blog-header.php");
} else {
    // all other IPs: load the existing WordPress installation
    define('WP_USE_THEMES', true);
    require_once("./press/wp-blog-header.php");
} ?>
The new design is happening via a second installation of WordPress in its own subdirectory, /perish/. The previous site also exists in its own subdirectory, /press/. So during development, I needed a way to load the new WordPress installation for my IP address, and the old WordPress installation for all other IPs. And that’s exactly what the above logic handles so elegantly.
Google don’t like it
After implementing the IP-detection, I continued site development and everything was working great, until about 24-48 hours later when I noticed that my pages were being excluded from the search results, seriously decreasing traffic from Google. Just prior to the traffic drop, only three significant changes were made to the site:
- Installed new subdirectory WordPress
- Set up IP-based loading of WordPress
- Removed a canonical redirect of /press/ to root
The removal of the canonical redirect resulted in one page of duplicate content (on both /press/ and the home page), but that wouldn’t be reason enough for such drastic measures from Google. After much scrambling to determine the issue, it became apparent that Google had detected the IP-detection script that I was using to conditionally load WordPress for the site’s home page.
I don’t have any solid evidence to support this, but my best guess is that Google somehow detected the script, disapproved, and penalized my site by dropping it from the search results. I would have contacted someone at Google to verify this, but apparently they are too big to be bothered with us humans.
Why Google hates them
Since discovering/reasoning all this, I’ve removed the IP-detection script and will continue with the redesign live & in real-time. While we wait and see whether or not that in fact resolves the issue, it is interesting to consider why Google penalizes something as simple as an IP-detection script. Here’s what Google Webmaster Central has to say about cloaking, sneaky Javascript redirects, and doorway pages:
Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.
Although it doesn’t mention IP addresses, the take-home message seems to imply that any form of cloaking – via user-agent, IP, referrer, etc. – is strictly forbidden. I get the logic behind this policy, but a quick message in the Webmaster Tools dashboard would have been so absolutely helpful and time-saving.
Here is an example of a simple message that would have saved significant time, energy, and resources:
We have detected you detecting us. Please stop or we will shut you down. – Love, Google
Something as simple and automated as that would alleviate much stress:
- You can’t just “contact” Google and ask them what’s up
- You’d know why your pages no longer appear in the search results
- You’d know that Google requires action
- You’d have a good idea of how to resolve the issue
- You’d know that Google has the “shoot first, you deal with it” mentality
And so even better than an “oh-by-the-way” message would be Google sending notification before killing your site. Why not give people a chance to resolve potential issues before just sending in the terminators to wipe them out?
Lesson learned, moving on
Moral of the story: If you need to serve different content to different users, use something stealthier than a simple PHP script to make it happen. If Google even gets a whiff of anything it doesn’t approve of, it will shut you down with absolutely zero notice.
40 responses to “Simple IP-Detection Bad for SEO”
Sorry to say, but I guess you’re wrong with this point. How should Google know your IP detection script? I guess you put just your own address in for development? They would never possibly know, then.
What might be possible (without looking into the case further) is that you somehow made the new design available publicly – by accidentally overwriting the sitemap.xml or something similar.
Really interesting case, though – would like to see more opinions on that matter.
I admit it’s all speculation, but since removing the IP-detection, my site is suddenly appearing in the search results, although without the sitelinks. Traffic is returning as well, albeit slowly, I think, because of the usual weekend slump.
While diagnosing the issue, I definitely checked the sitemap, htaccess, and a million other things, but only the cloaking made sense.
Good to see you’re back in the serps. Would be great to figure out what the issue really was – maybe I’m wrong and the IP thing did affect your rankings (though I can’t imagine that :))
Sorry Jeff, but Jan is right. There is absolutely no way Google could have known about your IP detection script (unless they have a sniffer on your local computer).
There must have been something else that Google reacted to. My guess is that Google somehow got to know about your second installation and noticed something was going on.
I have to agree with Jan.
If the redirection to a different site happened just for your IP, there’s no way Google would find out, unless they are employing Magic Unicorns or something like that.
Isn’t it because you used a subdomain (/press/) for the public site? It’s more likely that you experienced a temporary bump when Google changed their algorithm. Here’s one of the many stories about it: http://ser.bz/fC2SyL.
Thanks, I think that may have something to do with it. For all I know it was a combination of things, and that sounds like it affected similar sites.
Have to admit, I’m foxed by the idea of a bot’s client request being able to detect some of your PHP. Perhaps they send a request as Googlebot, then send another request with a fake user agent header, and compare the two?
Yes, and it would be trivial for them to also test IP cloaking using proxy IP address(es). But as others have made clear, there would be no way for Google to view the content from my IP address. Sites that are intentionally detecting Google IPs for cloaked content are probably at risk.
I would ask Matt Cutts.
Good idea ;)
Weird that you didn’t get a message in Webmaster Tools (https://www.google.com/webmasters/tools/). In all honesty, I have found that Matt Cutts will listen to your question if you ask him, seriously. If you follow his blog you will notice that he sometimes will even answer a Google Spam question in his blog comments. There are some forums (http://www.google.com/support/forum/p/Webmasters) as well that I believe have Googlers monitoring them, where this question might have been answered.
I feel all of us have complained about Google not answering us directly or having a place to call at one time or another, but they are doing a pretty good job.
I like the new design, what made you want to go away from the dark, dark theme?
Yeah, I’m not trying to snub Google or anything, it’s just that Google could help users by letting them know when algorithm changes, penalties, or anything may affect their sites. Webmaster tools would be the perfect place to receive valuable information like that.
I haven’t asked Matt Cutts, but some of the zillionz of forum threads on this topic provided some useful clues.
I’m with Jan on this one. The PHP is consumed on the server, so Google doesn’t see it. What they DO see is served content… ALL the served content. If they find duplicate content, that may be enough to trigger a set of punitive actions on their part. Remember, in spite of your very reasonable reasons for duplicating your content, even temporarily, it may match, in Google’s eyes, the behaviour of a reprobate site.
The moral of the story is walk softly and carry a big stick. Keep a close eye on your analytics, especially when making fundamental changes to your site.
BTW, any idea of what effect your changes have had on other search engines?
Absolutely well said. I’m thinking now that this whole thing is the cumulative result of 1) changing host 2) changing structure 3) recent downtime 4) algorithm changes.
I seemed to be getting the usual amount of traffic from the other search engines, Bing et al. Fortunately they’re not as hyper-sensitive as big G.
One more thing: if you want special treatment from your website, set a cookie. For instance, I have log messages strewn throughout my code. This is great for development, but once the site goes live the logfile can quickly grow massive. And forget about finding a specific action when the file is being hammered by hundreds (or even dozens) of requests. So I set a cookie on my machine, and that turns on logging; otherwise, it’s off. You could use the same technique to select the alternative directory.
Also, set robots.txt to ignore your development directory. (OK, so that was two more things…)
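The cookie idea above could be sketched roughly as follows. This is only an illustration, not code from the post: the cookie name, the secret value, and the select_wp_directory() helper are all hypothetical, and the two directory names are the ones used in the article.

```php
<?php
// Hypothetical cookie name and secret value; neither appears in the original post.
const DEV_COOKIE_NAME  = 'pp_dev_preview';
const DEV_COOKIE_VALUE = 'replace-with-a-long-random-string';

/**
 * Pick the WordPress directory to load for this request.
 * Returns the development install only when the secret cookie matches.
 */
function select_wp_directory(array $cookies): string
{
    $supplied = $cookies[DEV_COOKIE_NAME] ?? '';

    // hash_equals() compares the two strings in constant time.
    if (is_string($supplied) && hash_equals(DEV_COOKIE_VALUE, $supplied)) {
        return './perish/'; // new design, as named in the post
    }

    return './press/'; // existing public site
}

// In the root index.php this might stand in for the IP check:
// define('WP_USE_THEMES', true);
// require_once select_wp_directory($_COOKIE) . 'wp-blog-header.php';
```

Unlike an IP check, a cookie keeps working if your address changes, and visitors without the cookie (crawlers included) always get the public site.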
Jeff
I feel your pain. Did you ever consider using a port other than 80 for your dev site? A port that only you would know?
Just a thought.
Good luck, I’m sure it’ll all work out.
And a good one. I’ll keep it in mind for next time.
So far so good..
That comes across like a quote from Miller in ‘Green Zone’
Just kidding ;)
Feck. Needs to RTFM.
Like I said on the Twitters, moving servers/hosts tends to do things to Google results temporarily. This post is anecdotal evidence and it was likely just a coincidence that the timings seemed to work out.
Google detects cloaking by visiting your site with its usual bot user agent and then visiting it with a non-bot user agent. If the content differs then it’s obvious you’re serving different things to bots in order to beef up your Google rankings.
PHP scripts are server-side. The only way for Google to know about your IP script is if they got your PHP served to them without being run. For that to happen either your server would have had to completely fuck things up or Google would have had to hack into your server.
Not very likely in either case. :P
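The two-request comparison described in this comment can be sketched like so. It is a guess at the general idea, not a description of what Google actually runs: responses_differ() and both stand-in "sites" are hypothetical, and the fetcher is passed in as a callable so the sketch needs no network access.

```php
<?php
/**
 * Compare what a URL returns to a crawler user agent and to a browser user agent.
 * $fetch is a hypothetical callable (string $url, string $userAgent): string
 * that returns the response body.
 */
function responses_differ(callable $fetch, string $url): bool
{
    $botAgent     = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
    $browserAgent = 'Mozilla/5.0 (X11; Linux x86_64)';

    $asBot     = $fetch($url, $botAgent);
    $asBrowser = $fetch($url, $browserAgent);

    // Real pages often vary per request (timestamps, nonces), so requiring an
    // exact match is a simplification made for this sketch.
    return hash('sha256', $asBot) !== hash('sha256', $asBrowser);
}

// Stand-in server that cloaks by user agent, for demonstration only.
$cloakingSite = function (string $url, string $userAgent): string {
    return stripos($userAgent, 'googlebot') !== false
        ? '<h1>Keyword-stuffed page</h1>'
        : '<h1>Ordinary page</h1>';
};

// Stand-in for the post's setup: content depends on the visitor's IP,
// so an outside checker gets the same page under either user agent.
$ipGatedSite = function (string $url, string $userAgent): string {
    return '<h1>Old site</h1>';
};

var_dump(responses_differ($cloakingSite, 'https://example.com/')); // bool(true)
var_dump(responses_differ($ipGatedSite, 'https://example.com/'));  // bool(false)
```

The second result is the point several commenters make: a check like this, run from outside, sees no difference when only one private IP address is treated specially.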
Agreed. If the IP you used was your home IP and it was limited to that, and assuming the script didn’t leak something to google (which it doesn’t look like it is), there’s no way for Google to detect this.
I’ve done similar tricks with my home page and with extremely large sites with very high google page ranks and never had any problems.
Register for a google webmaster account and check it for any warnings/errors that they’ve found while crawling your site. You can also submit a site map and submit your pages for re-indexing via the webmaster account. It’s really handy.
Thanks, I did check with Webmaster Tools, and it looks like some recent network outages prevented Google from accessing a large portion of my site. That happened about three days prior to the traffic drop, so it looks like it may be related.
Brando, yes that is the most likely explanation for all of this. Lots of changes sort of coalescing into a penalty from Google. The timing of the traffic flow with the IP-detect script was pretty freaky: I installed it, traffic dropped; I removed it, traffic returned. But thinking about it further it’s obvious that Google would have no way of detecting this.
Dude, you need to take this new thing offline until it’s sorted.
No way dude. No. Way.
I’m doing this LIVE now – it’s ON.
Right on, I’m all about no turning back.
Dude!
It’s not about the size of the dog in the fight; it’s about the size of the fight in the dog.
Rock on, Jeff!