Articles tagged with “msn”
Here is a list of all articles tagged as “msn”.
- SEO Experiment: Let Google Sort it Out
- One way to prevent Google from crawling certain pages is to use meta elements in the head section of your web documents. For example, if I want to prevent Google from indexing and archiving a certain page, I would add a meta element to the head of my document. I’m no SEO guru, but it is my general understanding that it is possible to manipulate the flow of page rank throughout a site through strategic implementation of directives. After thinking about it, I recently decided to remove the strategic directives from my pages here at Perishable Press. This ...
- CSS Implementations of the Rich and Famous
- A great way to improve your CSS skills is to check out the stylesheets used by other websites. Digging behind the scenes and exploring some applied CSS provides new ideas and insights about everything from specificity and formatting to hacks and shortcuts. Learning CSS by reading about ideal cases and theoretical applications is certainly important, but actually seeing how the language is applied in “real-world” scenarios provides first-hand knowledge and insight. While there are millions of standards-based, CSS-designed ...
- Unexplained Crawl Behavior Involving Tagged Query Strings
- I need your help! I am losing my mind trying to solve another baffling mystery. For the past three or four months, I have been recording many 404 errors generated from msnbot, Yahoo-Slurp, and other spider crawls. These errors result from invalid requests for URLs containing query strings such as the following:
  http://perishablepress.com/press/page/2/?tag=spam
  http://perishablepress.com/press/page/3/?tag=code
  http://perishablepress.com/press/page/2/?tag=email
  http://perishablepress.com/press/page/2/?tag=xhtml
  http://perishablepress.com/press/page/4/?tag=notes
  http://perishablepress.com/press/page/2/?tag=flash
  http://perishablepress.com/press/page/2/?tag=links
  http://perishablepress.com/press/page/3/?tag=theme
  http://perishablepress.com/press/page/2/?tag=press
  ...plus hundreds and hundreds more
  1. The URL pattern is always the same: a different page number followed by a query string containing one of the tags used here at ...
- How to Verify the Four Major Search Engines
- Keeping track of your access and error logs is a critical component of any serious security strategy. Often, a recorded entry looks legitimate enough to be dismissed as genuine Google fare, yet closer investigation reveals a fraudulent agent. There are many such cloaked or disguised agents crawling around these days, mimicking various search engines to hide beneath the radar. Thus, it is a good idea to implement a procedure for scanning and checking select agents for authenticity. In general, the verification process involves a “forward/reverse” DNS lookup, which is then cross-verified with ...
- Get Back
- The Internet Archive Wayback Machine is a trip into the online past, offering glimpses of ancient website relics. Reaching back through the virtual dark ages of 1996, the Wayback Machine chronicles over 55 billion pages. Although many of the pages appear incomplete due to missing images, the Wayback Machine provides an invaluable resource, enabling users to experience and learn from the arcane internet of yesterday. Check out these archaic online offerings: netscape.com, circa December 31, 1996; microsoft.com, circa October 1996; and the first Google website, circa late 1998.
- Robots Notes Plus
- About the Robots Exclusion Standard: The robots exclusion standard or robots.txt protocol is a convention to prevent cooperating web spiders and other web robots from accessing all or part of a website. The parts that should not be accessed are specified in a file called robots.txt in the top-level directory of the website. Notes on the robots.txt Rules: Rules of specificity apply, not inheritance. Always include a blank line between rules. Note also that not all robots obey the robots rules -- even Google has been reported to ignore certain robots rules. Also, comments are allowed (and recommended) within any robots.txt file when written on a ...
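
As a rough illustration of the robots.txt notes above, here is a minimal sketch using Python's standard `urllib.robotparser` module to show how one cooperating parser reads a small rule set. The rules, the `example.com` URLs, and the `build_parser` helper are hypothetical and chosen only for illustration; they are not taken from the Perishable Press robots.txt file, and real crawlers may interpret rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only. Comments sit on their own
# lines, and a blank line separates the two rule groups.
ROBOTS_TXT = """\
# Keep every cooperating robot out of /private/
User-agent: *
Disallow: /private/

# msnbot gets its own group, which only blocks /images/
User-agent: msnbot
Disallow: /images/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A robot with no group of its own falls back to the "*" rules.
print(parser.can_fetch("Googlebot", "http://example.com/private/page.html"))
print(parser.can_fetch("Googlebot", "http://example.com/press/"))

# msnbot matches its own, more specific group. It does not inherit the
# "*" rules, so only /images/ is off limits for it in this parser.
print(parser.can_fetch("msnbot", "http://example.com/images/logo.png"))
print(parser.can_fetch("msnbot", "http://example.com/private/page.html"))
```

In this parser, the last check comes back as allowed, which is the "specificity, not inheritance" point in practice: to keep msnbot out of `/private/` as well, that rule would need to be repeated inside the msnbot group.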