Plugin Sale! Save 15% on pro plugins with discount code: FALL2020
Web Dev + WordPress + Security
Tag: robots
29 posts

Code Snippets to Customize WordPress Sitemaps (Complete Guide)

By now most have heard about the WP Sitemaps feature introduced in WordPress version 5.5. From what I’ve read most existing sites that needed a sitemap already had one via one of the many free sitemap plugins. But for new WordPress sites going forward, having all the sitemap code in the WordPress core now means that new sites have the option of rolling with the default WordPress sitemaps, or use a dedicated plugin to do the job. This post is […] Continue reading »

About the Auto-Generated WP Sitemaps and How to Disable Using Simple Code or a Free Plugin

WordPress 5.5 and beyond features built-in sitemaps that are enabled by default. For new users and sites this may be a good thing. Now users don’t have to bother with thinking about how to implement a sitemap. Like with Privacy control, WordPress just does it for you automagically. BUT for the millions of sites that already have a sitemap thanks to any of the excellent and free sitemap plugins — that’s like maybe 5–10 million websites — well congratulations you […] Continue reading »

All the little .txt files you can put in the root directory of your website

The ones I know of: ads.txt humans.txt robots.txt security.txt This site makes use of robots.txt and humans.txt. I don’t need ads.txt because 3rd-party ads aren’t currently running on the site, and security.txt seems not necessary as the site’s contact form is easy enough for anyone to find. Continue reading »

Blackhole for Bad Bots – Quick Start

Welcome to the Quick Start Guide for the standalone PHP version of Blackhole for Bad Bots. This post basically is a condensed summary of the original Blackhole tutorial. So if you are new to the concept of blocking bad bots, check out the original tutorial. Otherwise, for those that are familiar, the following guide should simplify things and help you get started with Blackhole as quickly as possible. Continue reading »

Worst IPs: 2016 Edition

A little late this year, but following tradition here is my list of the absolute worst IP addresses from 2016. All in nice numerical order for easy crunching. These IPs are associated with all sorts of malicious activity, including exploit scanning, email harvesting, brute-force login attacks, referrer spam, and everything in between. Really obnoxious stuff that degrades your site’s performance and potentially threatens security. Continue reading »

How to Block Baidu Bot

A user of my 6G Firewall recently asked how to block the “baidu” bot from accessing their site. This post explains why Baidu is not blocked in 6G and provides a quick .htaccess technique to deny it (or anything claiming to be it) access to your site. Continue reading »

WordPress Plugin: Blackhole for Bad Bots

Image Courtesy NASA/JPL-Caltech. Update: Pro version now available! Check out Blackhole Pro » Finally translated my Blackhole Spider Trap into a FREE WordPress plugin. It’s fun, fast, flexible, and works silently behind the scenes to protect your WordPress-powered site from malicious bots. Here are some of the features: Continue reading »

Integrating Google No Captcha reCaptcha In WordPress Forms

In this tutorial you will learn how to integrate Google’s new reCatcha model in WordPress Login, Comment, Registration and Lost Password Forms. Continue reading »

Humans.txt

One thing I love about Twitter is the instant feedback. For the past few weeks I’ve been seeing lots of 404 requests like this: https://perishablepress.com/humans.txt https://perishablepress.com/humans.txt https://perishablepress.com/humans.txt At first I thought it was some skript kiddie getting creative, you know as a play on the robots.txt file, which is also located in the root of many websites. So it seemed interesting enough to tweet about: Continue reading »

Multiple Sitemaps

Yes you can have multiple sitemaps for your site. Create the sitemaps you need, and then specify them in your robots.txt file. For example, here are the robots.txt directives for the two sitemaps used here at Perishable Press: Continue reading »

Better Robots.txt Rules for WordPress

Cleaning up my files during the recent redesign, I realized that several years had somehow passed since the last time I even looked at the site’s robots.txt file. I guess that’s a good thing, but with all of the changes to site structure and content, it was time again for a delightful romp through robots.txt. This post summarizes my research and gives you a near-perfect robots file, so you can copy/paste completely “as-is”, or use a template to give you […] Continue reading »

Protect Your Site with a Blackhole for Bad Bots

One of my favorite security measures here at Perishable Press is the site’s virtual Blackhole trap for bad bots. The concept is simple: include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the honeypot trap, which then performs a WHOIS Lookup and records the event in the blackhole data file. Once added to the blacklist data file, bad bots immediately are denied […] Continue reading »

Stop 404s for Mobile Versions of Your Site

If you’ve been keeping an eye on your 404 errors recently, you will have noticed an increase in requests for nonexistent mobile files and directories, especially over the past year or so. The scripts and bots requesting these files from your server seem to be looking for a mobile version of your site. Unfortunately, they are wasting bandwidth and resources in the process. It has become common to see the following 404 errors constantly repeated in your log files: http://domain.tld/apple-touch-icon.png […] Continue reading »

Tell Google NOT to Index Certain Parts of Your Web Pages

There are several ways to instruct Google to stay away from various pages in your site: Robots.txt directives Nofollow attributes on links Meta noindex/nofollow directives X-Robots noindex/nofollow directives ..and so on. These directives all function in different ways, but they all serve the same basic purpose: control how Google crawls the various pages on your site. For example, you can use meta noindex to instruct Google not to index your sitemap, RSS feed, or any other page you wish. This […] Continue reading »

Yahoo! Slurp too Stupid to be a Robot

I really hate bad robots. When a web crawler, spider, bot — or whatever you want to call it — behaves in a way that is contrary to expected and/or accepted protocols, we say that the bot is acting suspiciously, behaving badly, or just acting stupid in general. Unfortunately, there are thousands — if not hundreds of thousands — of nefarious bots violating our sites every minute of the day. For the most part, there are effective methods available enabling […] Continue reading »

Yahoo! Lies about Obeying Robots.txt Directives

There are two possibilities here: Yahoo!’s Slurp crawler is broken or Yahoo! lies about obeying Robots directives. Either case isn’t good. Slurp just can’t seem to keep its nose out of my private business. And, as I’ve discussed before, this happens all the time. Here are the two most recent offenses, as recorded in the log file for my blackhole spider trap: Continue reading »

Welcome
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »
WP Themes In Depth: Deep dive into WP theme development.
Thoughts
Watching. Waiting. Praying.
Got all of my free WordPress plugins updated for imminent WP 5.6 in early December. Pro plugin updates currently in the works.
7G Firewall now integrated into BBQ Firewall (free version). Pro version soon ;)
macOS Big Sur update complete. So far no crazy issues. Except TextEdit, which is completely screwed up and unusable. Replaced with free BBEdit.
Got so sick of macOS’ annoying “red dot” that I had to remove System Prefs from the dock. Come on Apple you can do better.
Beginning development of an Nginx version of 7G Firewall.
Happy Birthday to Perishable Press, celebrating 15 years online! :)
Newsletter
Get news, updates, deals & tips via email.
Email kept private. Easy unsubscribe anytime.