There are several ways to instruct Google to stay away from various pages on your site: robots.txt directives, nofollow attributes on links, meta noindex/nofollow directives, X-Robots noindex/nofollow directives, and so on. These directives all function in different ways, but they all serve the same basic purpose: control how Google crawls the various pages on your site. For example, you can use meta noindex to instruct Google not to index your sitemap, RSS feed, or any other page you wish. This […] Continue reading »
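As a rough illustration of the first of those methods, a robots.txt file asks compliant crawlers not to request certain paths. This is only a sketch; the paths below are placeholders, not directives from the article:

# Hypothetical robots.txt sketch: ask well-behaved crawlers to skip a few paths
User-agent: *
Disallow: /feed/
Disallow: /example-private-page/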
In my recent guest post at The Nexus, I discuss Google’s new nofollow policy and suggest several ways to deal with it. In that article, I explain how Google allegedly has changed the way it deals with nofollow links. Instead of transferring leftover nofollow juice to remaining dofollow links as they always have, Google now pours all that wonderful nofollow juice right down the drain. This shift in policy comes as a terrible surprise to many webmasters and SEO gurus, […] Continue reading »
Just like last year, this Spring I have been taking some time to do some general maintenance here at Perishable Press. This includes everything from fixing broken links and resolving errors to optimizing scripts and eliminating unnecessary plugins. I’ll admit, this type of work is often quite dull; however, I always enjoy the process of cleaning up my HTAccess files. In this post, I share some of the changes made to my HTAccess files and explain the reasoning behind each […] Continue reading »
One way to prevent Google from crawling certain pages is to use <meta /> elements in the <head></head> section of your web documents. For example, if I want to prevent Google from indexing and archiving a certain page, I would add the following code to the head of my document: Continue reading »
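The markup referred to here would be something along these lines; this is a minimal sketch, and the exact directive values depend on what you want to prevent:

<!-- Hypothetical example: ask Google not to index or archive this page -->
<head>
	<meta name="googlebot" content="noindex,noarchive" />
	...
</head>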
Given my propensity to discuss matters involving error log data (e.g., monitoring malicious behavior, setting up error logs, and creating extensive blacklists), I am often asked about the best way to go about monitoring 404 and other types of server errors. While I consider myself to be a novice in this arena (there are far brighter people with much greater experience), I do spend a lot of time digging through log entries and analyzing data. So, when asked recently about […] Continue reading »
You have seen user-agent blacklists, IP blacklists, 4G Blacklists, and everything in between. Now, in this article, for your sheer and utter amusement, I present a collection of over 8000 blacklisted referrers. Shortcut: skip the article and jump to Disclaimer and Download » Referrer Spam Sucks For the uninitiated, in teh language of teh Web, a referrer is the online resource from whence a visitor happened to arrive at your site. For example, if Johnny the Wonder Parrot was visiting the […] Continue reading »
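For readers unfamiliar with how a referrer blacklist is typically applied, here is a minimal Apache/HTAccess sketch using mod_rewrite; the domains shown are placeholders, not entries from the actual list:

# Hypothetical sketch: deny requests arriving via known spam referrers
RewriteEngine On
RewriteCond %{HTTP_REFERER} spam-domain\.example [NC,OR]
RewriteCond %{HTTP_REFERER} another-bad-referrer\.example [NC]
RewriteRule .* - [F,L]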
As discussed in my recent article, Eight Ways to Blacklist with Apache’s mod_rewrite, one method of stopping spammers, scrapers, email harvesters, and malicious bots is to blacklist their associated user agents. Apache enables us to target bad user agents by testing the user-agent string against a predefined blacklist of unwanted visitors. Any bot identifying itself as one of the blacklisted agents is immediately and quietly denied access. While this certainly isn’t the most effective method of securing your site against […] Continue reading »
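The general shape of a user-agent blacklist looks something like the following sketch; the agent names are illustrative stand-ins rather than the article's actual blacklist:

# Hypothetical sketch: block requests whose user-agent string matches a blacklisted agent
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (EmailSiphon|BadBot|EvilScraper) [NC]
RewriteRule .* - [F,L]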
At last! After many months of collecting data, crafting directives, and testing results, I am thrilled to announce the release of the 4G Blacklist! The 4G Blacklist is a next-generation protective firewall that secures your site against a wide range of automated attacks and other malicious activity. Continue reading »
I really hate bad robots. When a web crawler, spider, bot — or whatever you want to call it — behaves in a way that is contrary to expected and/or accepted protocols, we say that the bot is acting suspiciously, behaving badly, or just acting stupid in general. Unfortunately, there are thousands — if not hundreds of thousands — of nefarious bots violating our sites every minute of the day. For the most part, there are effective methods available enabling […] Continue reading »
Last year, after much research and discussion, I built a concise, lightweight security strategy for Apache-powered websites. Prior to the development of this strategy, I relied on several extensive blacklists to protect my sites against malicious user agents and IP addresses. Unfortunately, these mega-lists eventually became unmanageable and ineffective. As increasing numbers of attacks hit my server, I began developing new techniques for defending against external threats. This work soon culminated in the release of a “next-generation” blacklist that works […] Continue reading »
I have written previously on the fine art of preloading images without JavaScript using only CSS. These caching techniques have evolved in terms of effectiveness and accuracy, but may be improved further to allow for greater cross-browser functionality. In this post, I share a “CSS-only” preloading method that works better under a broader set of conditions. Previous image-preloading techniques target all browsers, devices, and media types. Unfortunately, certain browsers do not load images that are hidden directly (via the <img […] Continue reading »
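One common variation of the idea, offered here only as a sketch and not necessarily the article's final method, preloads images as backgrounds positioned off-screen rather than hiding them with display:none (which some browsers optimize away); the image paths are placeholders:

/* Hypothetical CSS-only preload: load images as off-screen backgrounds */
#preload-01 { background: url(/images/image-01.png) no-repeat -9999px -9999px; }
#preload-02 { background: url(/images/image-02.png) no-repeat -9999px -9999px; }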
Working a great deal with blacklists, I am frequently trying to isolate and identify problematic code. For example, a blacklist implementation may suddenly prevent a certain type of page from loading. In order to resolve the issue, the blacklist is immediately removed and tested for the offending directive(s). This situation is common to other coding languages as well, especially when dealing with CSS. Identifying problem code is more of an art form than a science, but fortunately, there are a […] Continue reading »
In my previous article on WordPress title tags, How to Generate Perfect WordPress Title Tags without a Plugin, we explore everything needed to create perfect titles for your WordPress-powered site. After discussing the functionality and implementation of various code examples, the article concludes with a “perfect” title-tag script that covers all the bases. Or so I thought… Some time after the article had been posted, Mat8iou chimed in with a couple of ways to improve the script by cleaning up […] Continue reading »
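For context, a plugin-free title tag generally follows this pattern; this is a generic sketch of the idea, not the article's actual script or Mat8iou's improvements:

<title><?php
// Generic sketch of a plugin-free WordPress title tag (illustrative only)
if ( is_home() ) {
	bloginfo( 'name' );
	echo ' - ';
	bloginfo( 'description' );
} elseif ( is_single() || is_page() ) {
	single_post_title();
	echo ' - ';
	bloginfo( 'name' );
} else {
	wp_title( '-', true, 'right' ); // prints "Page Title - "
	bloginfo( 'name' );
}
?></title>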
Ever wanted to provide automatic language translations of your web pages without installing another plugin? Here is a valid, SEO-friendly technique that takes advantage of Google’s free translation service. All you need is a PHP-enabled server and you’re good to go. Just copy and paste the following code into the desired location in your page template and enjoy the results. Once in place, this code will produce translation links for eight common languages for every page on your site. Grab, […] Continue reading »
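The approach amounts to printing links that hand the current page to Google's translation service. Here is a rough sketch of that idea; the URL format, language selection, and variable names are assumptions, not the article's exact code:

<?php
// Hypothetical sketch: print Google Translate links for the current page
$url = urlencode( 'http://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'] );
$languages = array(
	'de' => 'German',     'es' => 'Spanish',  'fr' => 'French',  'it' => 'Italian',
	'pt' => 'Portuguese', 'ja' => 'Japanese', 'ko' => 'Korean',  'zh-CN' => 'Chinese',
);
foreach ( $languages as $code => $name ) {
	echo '<a href="https://translate.google.com/translate?sl=en&amp;tl='
		. $code . '&amp;u=' . $url . '">' . $name . '</a> ';
}
?>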
I recently added OpenSearch functionality to Perishable Press. Now, OpenSearch-enabled browsers such as Firefox and IE 7 alert users with the option to customize their browser’s built-in search feature with an exclusive OpenSearch-powered search option for Perishable Press. The autodiscovery feature of supportive browsers detects the custom search protocol and enables users to easily add it to their collection of readily available site-specific search options. Now, users may search the entire Perishable Press domain with the click of a button. […] Continue reading »
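For readers curious how the autodiscovery part works, here is a rough sketch assuming a WordPress-style search URL and a description file named opensearch.xml (both assumptions, not details from the article):

<!-- Hypothetical autodiscovery link placed in the page head -->
<link rel="search" type="application/opensearchdescription+xml" title="Perishable Press" href="/opensearch.xml" />

<!-- Hypothetical opensearch.xml: minimal OpenSearch description -->
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
	<ShortName>Perishable Press</ShortName>
	<Description>Search the Perishable Press domain</Description>
	<Url type="text/html" template="https://perishablepress.com/?s={searchTerms}"/>
</OpenSearchDescription>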