32 posts

Redirect Stupid Bots to Existing Resources

In case you hadn’t noticed, I’m on another one of my posting sprees. Going through the past year’s worth of half-written drafts and collected code snippets, and sharing anything that might be useful or interesting. Here is a bit of .htaccess that brings together several redirection techniques into a single plug-&-play code snippet. Continue reading »

Target User Agents and Reduce Spam via robots.txt

Your website’s robots.txt file probably contains some rules that tell compliant search engines and other bots which pages they can visit, and which are not allowed, etc. In most of the robots.txt files that I’ve looked at, all of the Allow and Disallow rules are applied to all user agents. This is done with the wildcard operator, which is written as an asterisk *, like this: User-agent: * This site’s robots.txt file provides a typical example. All of the allow/disallow […] Continue reading »
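To illustrate how a group aimed at one user agent interacts with the wildcard group, here is a minimal Python sketch using the standard library's urllib.robotparser, which evaluates rules roughly the way a compliant crawler would. The robots.txt content, the /wp-admin/ path, and the bot name ExampleSpamBot are hypothetical, not taken from this site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a wildcard group for every bot, plus a group that
# targets one (made-up) user agent by name.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: ExampleSpamBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(SAMPLE_ROBOTS_TXT)
    for agent in ("Googlebot", "ExampleSpamBot"):
        # A group naming the agent is used instead of the wildcard group.
        print(agent, rp.can_fetch(agent, "/blog/"), rp.can_fetch(agent, "/wp-admin/"))
    # Googlebot True False
    # ExampleSpamBot False False
```

Googlebot matches no named group, so it falls back to the User-agent: * rules and may crawl /blog/ but not /wp-admin/. ExampleSpamBot matches its own group and is disallowed everywhere. Note that robotparser matches agent names by substring; individual crawlers may apply their own matching rules.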

WordPress Plugin: Disable WP Robots

WordPress 5.7 features a new Robots API that provides filter-based control over the robots meta tag. So if your site is running WordPress 5.7 or better, you will notice a new <meta /> tag included in the <head></head> section of your web pages. By default, the meta tag added by WordPress has a value of max-image-preview:large, which is fine IF it is the only robots meta tag on the page. If your site already has its own meta robots tag, […] Continue reading »

Code Snippets to Customize WordPress Sitemaps (Complete Guide)

By now most have heard about the WP Sitemaps feature introduced in WordPress version 5.5. From what I’ve read, most existing sites that needed a sitemap already had one via one of the many free sitemap plugins. But for new WordPress sites going forward, having all the sitemap code in the WordPress core now means that new sites have the option of rolling with the default WordPress sitemaps, or using a dedicated plugin to do the job. This post is […] Continue reading »

About the Auto-Generated WP Sitemaps and How to Disable Using Simple Code or a Free Plugin

WordPress 5.5 and beyond features built-in sitemaps that are enabled by default. For new users and sites this may be a good thing. Now users don’t have to bother with thinking about how to implement a sitemap. Like with Privacy control, WordPress just does it for you automagically. BUT for the millions of sites that already have a sitemap thanks to any of the excellent and free sitemap plugins — that’s like maybe 5–10 million websites — well congratulations you […] Continue reading »

All the little .txt files you can put in the root directory of your website

The ones I know of: ads.txt, humans.txt, robots.txt, and security.txt. This site makes use of robots.txt and humans.txt. I don’t need ads.txt because 3rd-party ads aren’t currently running on the site, and security.txt seems unnecessary because the site’s contact form is easy enough for anyone to find. Continue reading »

Blackhole for Bad Bots – Quick Start

Welcome to the Quick Start Guide for the standalone PHP version of Blackhole for Bad Bots. This post basically is a condensed summary of the original Blackhole tutorial. So if you are new to the concept of blocking bad bots, check out the original tutorial. Otherwise, for those that are familiar, the following guide should simplify things and help you get started with Blackhole as quickly as possible. Continue reading »

Worst IPs: 2016 Edition

A little late this year, but following tradition, here is my list of the absolute worst IP addresses from 2016. All in nice numerical order for easy crunching. These IPs are associated with all sorts of malicious activity, including exploit scanning, email harvesting, brute-force login attacks, referrer spam, and everything in between. Really obnoxious stuff that degrades your site’s performance and potentially threatens security. Continue reading »
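Numerical order matters when crunching IP lists: compared as plain strings, 192.0.2.10 sorts before 192.0.2.9. Here is a minimal Python sketch (one possible way to process such a list, not the method used to build it) that sorts IPv4 addresses numerically with the standard ipaddress module. The sample addresses come from the reserved documentation ranges and are placeholders only.

```python
import ipaddress
from typing import Iterable, List


def sort_ips(addresses: Iterable[str]) -> List[str]:
    """Sort IPv4 address strings numerically instead of alphabetically.

    IPv4Address objects compare by their integer value, so 192.0.2.9
    sorts before 192.0.2.10. Invalid strings raise ValueError.
    """
    parsed = [ipaddress.IPv4Address(a.strip()) for a in addresses]
    return [str(ip) for ip in sorted(parsed)]


if __name__ == "__main__":
    sample = ["198.51.100.7", "192.0.2.10", "192.0.2.9", "203.0.113.1"]
    print(sorted(sample))    # plain string sort
    # ['192.0.2.10', '192.0.2.9', '198.51.100.7', '203.0.113.1']
    print(sort_ips(sample))  # numeric sort
    # ['192.0.2.9', '192.0.2.10', '198.51.100.7', '203.0.113.1']
```

This sketch handles IPv4 only, since Python refuses to order IPv4 and IPv6 address objects against each other.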

How to Block Baidu Bot

A user of my 6G Firewall recently asked how to block the “baidu” bot from accessing their site. This post explains why Baidu is not blocked in 6G and provides a quick .htaccess technique to deny it (or anything claiming to be it) access to your site. Continue reading »
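The post itself uses .htaccess for this. As a language-neutral illustration of the same rule (deny any request whose User-Agent contains “baidu”, in any letter case), here is a minimal sketch written as hypothetical Python WSGI middleware. The names block_baidu and hello_app are illustrative only, not part of 6G or any existing package.

```python
import re
from typing import Callable, Dict, Iterable

# Matches "baidu" anywhere in the User-Agent, ignoring case, so it also
# catches clients that merely claim to be Baidu's crawler.
BAIDU_PATTERN = re.compile(r"baidu", re.IGNORECASE)


def block_baidu(app: Callable) -> Callable:
    """Hypothetical WSGI middleware: 403 for Baidu-like user agents."""
    def middleware(environ: Dict, start_response: Callable) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BAIDU_PATTERN.search(user_agent):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Everyone else passes through to the wrapped application.
        return app(environ, start_response)
    return middleware


def hello_app(environ: Dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in application used only for this example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


app = block_baidu(hello_app)
```

To try it locally, the standard library's wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever() will serve it. Matching on the User-Agent string is what catches anything claiming to be Baidu, but it also means the rule depends entirely on a header that clients are free to change.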

WordPress Plugin: Blackhole for Bad Bots

Update: Pro version now available! Finally translated my Blackhole Spider Trap into a FREE WordPress plugin. It’s fun, fast, flexible, and works silently behind the scenes to protect your WordPress-powered site from malicious bots. Here are some of the features: Continue reading »

Integrating Google No Captcha reCaptcha In WordPress Forms

In this tutorial, you will learn how to integrate Google’s new reCaptcha model in the WordPress Login, Comment, Registration and Lost Password Forms. Continue reading »

Humans.txt

One thing I love about Twitter is the instant feedback. For the past few weeks I’ve been seeing lots of repeated 404 requests for https://perishablepress.com/humans.txt. At first I thought it was some script kiddie getting creative, you know, as a play on the robots.txt file, which is also located in the root of many websites. So it seemed interesting enough to tweet about: Continue reading »

Multiple Sitemaps

Yes, you can have multiple sitemaps for your site. Create the sitemaps you need, and then specify them in your robots.txt file. For example, here are the robots.txt directives for the two sitemaps used here at Perishable Press: Continue reading »
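One quick way to confirm that a robots.txt file exposes more than one sitemap is Python's urllib.robotparser, whose site_maps() method (Python 3.8 and later) returns every Sitemap URL it finds. The two example.com sitemap URLs below are placeholders, not the directives used here at Perishable Press, and list_sitemaps is a hypothetical helper name.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt declaring two sitemaps (example.com URLs are
# hypothetical). Sitemap lines apply to the whole file, not to one group.
SITEMAP_ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap-posts.xml
Sitemap: https://example.com/sitemap-pages.xml
"""


def list_sitemaps(robots_txt: str) -> List[str]:
    """Return every Sitemap URL declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines.
    return parser.site_maps() or []


if __name__ == "__main__":
    for url in list_sitemaps(SITEMAP_ROBOTS_TXT):
        print(url)
```

Because Sitemap directives are independent of the User-agent groups, they can sit anywhere in the file, and each one is listed on its own line.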

Better Robots.txt Rules for WordPress

Cleaning up my files during the recent redesign, I realized that several years had somehow passed since the last time I even looked at the site’s robots.txt file. I guess that’s a good thing, but with all of the changes to site structure and content, it was time again for a delightful romp through robots.txt. This post summarizes my research and gives you a near-perfect robots file, so you can copy/paste completely “as-is”, or use a template to give you […] Continue reading »

Protect Your Site with a Blackhole for Bad Bots

One of my favorite security measures here at Perishable Press is the site’s virtual Blackhole trap for bad bots. The concept is simple: include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the honeypot trap, which then performs a WHOIS lookup and records the event in the blackhole data file. Once added to the blacklist data file, bad bots are immediately denied […] Continue reading »
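The trap logic described in this excerpt can be sketched in a few lines of Python. This is a hypothetical, in-memory illustration, not the actual Blackhole code: the trap path /blackhole/ is an assumption, the WHOIS lookup and the data file are omitted, and a set of IP addresses stands in for the blacklist.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical trap location. It must also be disallowed in robots.txt
# ("Disallow: /blackhole/") and linked invisibly from your pages, so that
# only bots ignoring robots.txt ever request it.
TRAP_PATH = "/blackhole/"


@dataclass
class Blackhole:
    """In-memory sketch of a bad-bot trap.

    The real script also runs a WHOIS lookup and writes each hit to a data
    file; both steps are omitted here.
    """
    banned: Set[str] = field(default_factory=set)

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status this sketch would send for one request."""
        if ip in self.banned:
            return 403  # previously trapped: deny everything
        if path.startswith(TRAP_PATH):
            self.banned.add(ip)  # followed the forbidden link: blacklist it
            return 403  # this sketch also denies the trap request itself
        return 200
```

For example, after Blackhole().handle("203.0.113.5", "/blackhole/") the same instance returns 403 for every later request from 203.0.113.5, while other addresses still get 200. Compliant crawlers never trigger it because they skip the disallowed path.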

Stop 404s for Mobile Versions of Your Site

If you’ve been keeping an eye on your 404 errors recently, you will have noticed an increase in requests for nonexistent mobile files and directories, especially over the past year or so. The scripts and bots requesting these files from your server seem to be looking for a mobile version of your site. Unfortunately, they are wasting bandwidth and resources in the process. It has become common to see the following 404 errors constantly repeated in your log files: http://domain.tld/apple-touch-icon.png […] Continue reading »
