Book Sale! Save 20% on WordPress books with discount code: SAVE20
Web Dev + WordPress + Security
Tag: robots
30 posts

WordPress Plugin: Disable WP Robots

WordPress 5.7 features a new Robots API that provides filter-based control over the robots meta tag. So if your site is running WordPress 5.7 or better, you will notice a new <meta /> tag included in the <head></head> section of your web pages. By default, the meta tag added by WordPress has a value of max-image-preview:large, which is fine IF it is the only robots meta tag on the page. If your site already has its own meta robots tag, […] Continue reading »

Code Snippets to Customize WordPress Sitemaps (Complete Guide)

By now most have heard about the WP Sitemaps feature introduced in WordPress version 5.5. From what I’ve read most existing sites that needed a sitemap already had one via one of the many free sitemap plugins. But for new WordPress sites going forward, having all the sitemap code in the WordPress core now means that new sites have the option of rolling with the default WordPress sitemaps, or use a dedicated plugin to do the job. This post is […] Continue reading »

About the Auto-Generated WP Sitemaps and How to Disable Using Simple Code or a Free Plugin

WordPress 5.5 and beyond features built-in sitemaps that are enabled by default. For new users and sites this may be a good thing. Now users don’t have to bother with thinking about how to implement a sitemap. Like with Privacy control, WordPress just does it for you automagically. BUT for the millions of sites that already have a sitemap thanks to any of the excellent and free sitemap plugins — that’s like maybe 5–10 million websites — well congratulations you […] Continue reading »

All the little .txt files you can put in the root directory of your website

The ones I know of: ads.txt humans.txt robots.txt security.txt This site makes use of robots.txt and humans.txt. I don’t need ads.txt because 3rd-party ads aren’t currently running on the site, and security.txt seems not necessary as the site’s contact form is easy enough for anyone to find. Continue reading »

Blackhole for Bad Bots – Quick Start

Welcome to the Quick Start Guide for the standalone PHP version of Blackhole for Bad Bots. This post basically is a condensed summary of the original Blackhole tutorial. So if you are new to the concept of blocking bad bots, check out the original tutorial. Otherwise, for those that are familiar, the following guide should simplify things and help you get started with Blackhole as quickly as possible. Continue reading »

Worst IPs: 2016 Edition

A little late this year, but following tradition here is my list of the absolute worst IP addresses from 2016. All in nice numerical order for easy crunching. These IPs are associated with all sorts of malicious activity, including exploit scanning, email harvesting, brute-force login attacks, referrer spam, and everything in between. Really obnoxious stuff that degrades your site’s performance and potentially threatens security. Continue reading »

How to Block Baidu Bot

A user of my 6G Firewall recently asked how to block the “baidu” bot from accessing their site. This post explains why Baidu is not blocked in 6G and provides a quick .htaccess technique to deny it (or anything claiming to be it) access to your site. Continue reading »

WordPress Plugin: Blackhole for Bad Bots

Image Courtesy NASA/JPL-Caltech. Update: Pro version now available! Check out Blackhole Pro » Finally translated my Blackhole Spider Trap into a FREE WordPress plugin. It’s fun, fast, flexible, and works silently behind the scenes to protect your WordPress-powered site from malicious bots. Here are some of the features: Continue reading »

Integrating Google No Captcha reCaptcha In WordPress Forms

In this tutorial you will learn how to integrate Google’s new reCatcha model in WordPress Login, Comment, Registration and Lost Password Forms. Continue reading »

Humans.txt

One thing I love about Twitter is the instant feedback. For the past few weeks I’ve been seeing lots of 404 requests like this: https://perishablepress.com/humans.txt https://perishablepress.com/humans.txt https://perishablepress.com/humans.txt At first I thought it was some skript kiddie getting creative, you know as a play on the robots.txt file, which is also located in the root of many websites. So it seemed interesting enough to tweet about: Continue reading »

Multiple Sitemaps

Yes you can have multiple sitemaps for your site. Create the sitemaps you need, and then specify them in your robots.txt file. For example, here are the robots.txt directives for the two sitemaps used here at Perishable Press: Continue reading »

Better Robots.txt Rules for WordPress

Cleaning up my files during the recent redesign, I realized that several years had somehow passed since the last time I even looked at the site’s robots.txt file. I guess that’s a good thing, but with all of the changes to site structure and content, it was time again for a delightful romp through robots.txt. This post summarizes my research and gives you a near-perfect robots file, so you can copy/paste completely “as-is”, or use a template to give you […] Continue reading »

Protect Your Site with a Blackhole for Bad Bots

One of my favorite security measures here at Perishable Press is the site’s virtual Blackhole trap for bad bots. The concept is simple: include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the honeypot trap, which then performs a WHOIS Lookup and records the event in the blackhole data file. Once added to the blacklist data file, bad bots immediately are denied […] Continue reading »

Stop 404s for Mobile Versions of Your Site

If you’ve been keeping an eye on your 404 errors recently, you will have noticed an increase in requests for nonexistent mobile files and directories, especially over the past year or so. The scripts and bots requesting these files from your server seem to be looking for a mobile version of your site. Unfortunately, they are wasting bandwidth and resources in the process. It has become common to see the following 404 errors constantly repeated in your log files: http://domain.tld/apple-touch-icon.png […] Continue reading »

Tell Google NOT to Index Certain Parts of Your Web Pages

There are several ways to instruct Google to stay away from various pages in your site: Robots.txt directives Nofollow attributes on links Meta noindex/nofollow directives X-Robots noindex/nofollow directives ..and so on. These directives all function in different ways, but they all serve the same basic purpose: control how Google crawls the various pages on your site. For example, you can use meta noindex to instruct Google not to index your sitemap, RSS feed, or any other page you wish. This […] Continue reading »

Yahoo! Slurp too Stupid to be a Robot

I really hate bad robots. When a web crawler, spider, bot — or whatever you want to call it — behaves in a way that is contrary to expected and/or accepted protocols, we say that the bot is acting suspiciously, behaving badly, or just acting stupid in general. Unfortunately, there are thousands — if not hundreds of thousands — of nefarious bots violating our sites every minute of the day. For the most part, there are effective methods available enabling […] Continue reading »

Welcome
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »
Banhammer: Protect your WordPress site against threats.
Thoughts
Our neighbor just lets their little rat dog bark incessantly 24/7. Endless barking for the whole neighborhood.
Loving Waterfox, my new favorite browser for general surfing and playing on teh Web.
Avoiding Amazon until they stop forcing 2-factor authentication. Frustrating waste of time. Make it optional imbeciles.
Today my trusty scanner died. Not going to replace it. And when my printer finally dies, I'm not going to replace that either.
Spent about a week or so away from screens and media as much as possible. Helps to regain perspective.
Celebrating 8 years providing premium WordPress plugins at Plugin Planet!
Power is *not* relying on a 3rd-party service to handle your email.
Newsletter
Get news, updates, deals & tips via email.
Email kept private. Easy unsubscribe anytime.