By now most have heard about the WP Sitemaps feature introduced in WordPress version 5.5. From what I’ve read most existing sites that needed a sitemap already had one via one of the many free sitemap plugins. But for new WordPress sites going forward, having all the sitemap code in the WordPress core now means that new sites have the option of rolling with the default WordPress sitemaps, or use a dedicated plugin to do the job. This post is […] Continue reading »
The ones I know of: ads.txt humans.txt robots.txt security.txt This site makes use of robots.txt and humans.txt. I don’t need ads.txt because 3rd-party ads aren’t currently running on the site, and security.txt seems not necessary as the site’s contact form is easy enough for anyone to find. Continue reading »
This post is about how I cleaned up an incorrect URL in the Google search results. My business site is basically a one-page portfolio site, located at the URL https://monzillamedia.com/. But in the Google search results, the URL was showing as https://monzilla.biz/, which did not exist. So all potential customers were getting an error page. Fortunately I was able to re-acquire the monzilla.biz domain and redirect all traffic to monzillamedia.com. Continue reading »
Years ago, I thought the whole humans.txt thing was just silly, and even explained how to block humans.txt requests. But the concept actually has grown on me to the point where I now include a customized humans.txt file for most of my projects. It just seems like some useful information to make available for those who are looking for it. You know, all about the site, author, team, and such. And I have seen plenty of requests for humans dot […] Continue reading »
While solving the recent search engine spoofing mystery, I came across two excellent examples of spoofed search engine bots. This article uses the examples to explain how to identify any questionable bots hitting your site. Continue reading »
Here is a working list of all user agents for the major, top search engines. I use this information frequently for my plugins such as Blackhole for Bad Bots and BBQ Pro, so I figured it would be useful to post the information online for the benefit of others. Having the user agents for these popular bots all in one place helps to streamline my development process. Each search engine includes references and a regex pattern to match all known […] Continue reading »
g+ Share button Word on the streets is that the new Google+ Share button is the best way yet to benefit from Google’s myriad social-media services and all-important search-engine. And Google makes it SO easy to add the new Share button to your website. This article explains what it is, where it fits in with all the other social-Google stuff, and of course how to add the g+ Share button to any site. Continue reading »
The setup: I recently launched a new plugin that included a Demo page. To keep things flexible, I set up the Demo as a page on my experimental “Labs” WordPress installation, which is entirely nofollow, noindex and noarchive, meaning that Google can’t legitimately see what’s there. Continue reading »
I recently spent some time analyzing Perishable Press pages as they appear in the search results for Google, Bing, et al. Google Webmaster Tools provides a wealth of information about crawl errors, as well as the URLs of any pages that link to missing content. Combined with your site’s access/error logs, you have everything needed to track down 404 errors and clean up your listings in the search engine results. Continue reading »
Cleaning up my files during the recent redesign, I realized that several years had somehow passed since the last time I even looked at the site’s robots.txt file. I guess that’s a good thing, but with all of the changes to site structure and content, it was time again for a delightful romp through robots.txt. This post summarizes my research and gives you a near-perfect robots file, so you can copy/paste completely “as-is”, or use a template to give you […] Continue reading »
In general, Perishable Press enjoys generous ranking in Google’s search-engine results. The site’s many pages bring in lots of traffic for some great keywords, and a direct search for “Perishable Press” returns the first spot, with eight featured site links even. And recently, after switching servers, traffic increased even further. Things were going well, and it seemed like the perfect opportunity to finally renovate and redesign the site. So I dive in.. And then approximately 24-48 hours after beginning work […] Continue reading »
There are several ways to instruct Google to stay away from various pages in your site: Robots.txt directives Nofollow attributes on links Meta noindex/nofollow directives X-Robots noindex/nofollow directives ..and so on. These directives all function in different ways, but they all serve the same basic purpose: control how Google crawls the various pages on your site. For example, you can use meta noindex to instruct Google not to index your sitemap, RSS feed, or any other page you wish. This […] Continue reading »
In my recent guest post at The Nexus, I discuss Google’s new nofollow policy and suggest several ways to deal with it. In that article, I explain how Google allegedly has changed the way it deals with nofollow links. Instead of transferring leftover nofollow juice to remaining dofollow links as they always have, Google now pours all that wonderful nofollow juice right down the drain. This shift in policy comes as a terrible surprise to many webmasters and SEO gurus, […] Continue reading »
Anyone plugged into the Web these days has heard about how Google has supposedly changed the way it deals with nofollow attributes. According to a number of speculative reports, Google will no longer apply unused nofollow PageRank to other links on the page. So, let’s say that you have some sites that have been PageRank “sculpted” by way of strategically applied nofollow tags. For example, you may have nofollowed all of your comment, footer, or sidebar links. Ever since Google […] Continue reading »
One way to prevent Google from crawling certain pages is to use <meta /> elements in the <head></head> section of your web documents. For example, if I want to prevent Google from indexing and archiving a certain page, I would add the following code to the head of my document: Continue reading »
Given my propensity to discuss matters involving error log data (e.g., monitoring malicious behavior, setting up error logs, and creating extensive blacklists), I am often asked about the best way to go about monitoring 404 and other types of server errors. While I consider myself to be a novice in this arena (there are far brighter people with much greater experience), I do spend a lot of time digging through log entries and analyzing data. So, when asked recently about […] Continue reading »