Book Sale! Code WP2025 takes 20% OFF our Pro Plugins & Books »
Web Dev + WordPress + Security
109 posts related to: How to Block Baidu Bot

Blacklist Candidate Number 2008-02-10

[ Photo: Bob Barker points a finger ]

Welcome to the Perishable Press “Blacklist Candidate” series. In this post, we continue our new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags.. Like many bloggers, I like to spend a little quality time each week examining my site’s error logs. The data contained in Apache, 404, and even PHP error logs is always enlightening. In addition to suspicious behavior, spam nonsense, and cracker mischief, this site frequently endures automated and even manual attacks targeting various […] Continue reading »

Blacklist Candidate Number 2008-01-02

[ Photo: Bob Barker Pointing ]

Welcome to the Perishable Press “Blacklist Candidate” series! In this first post, we begin a new tradition of exposing, humiliating and banishing spammers, crackers and other worthless scumbags.. Every Wednesday, I take a little time to investigate my 404 error logs. In addition to spam, crack attacks, and other deliberate mischief, the 404 logs for Perishable Press contain errors due to missing resources, mistyped URLs, and the occasional bizarre or even suspicious behavior of the search-engine robots. Whenever possible, I […] Continue reading »

Yahoo! Slurp in My Blackhole (Yet Again)

Yup, ‘ol Slurp is at it again, flagrantly disobeying specific robots.txt rules forbidding access to my bad-bot trap, lovingly dubbed the “blackhole.” As many readers know, this is not the first time Yahoo has been caught behaving badly. This time, Yahoo was caught trespassing five different times via three different IPs over the course of four different days. Here is the data recorded in my site’s blackhole log (I know, that sounds terrible): Continue reading »

Yahoo! in my Blackhole

Okay, I realize that the title sounds a bit odd, but nowhere near as odd as my recent discovery of Slurp ignoring explicit robots.txt rules and digging around in my highly specialized bot trap, which I have lovingly dubbed “the blackhole”. What is up with that, Yahoo!? — does your Slurp spider obey robots.txt directives or not? I have never seen Google crawling around that side of town, neither has MSN nor even Ask ventured into the forbidden realms. Has […] Continue reading »

Ultimate .htaccess Blacklist 2: Compressed Version

[ Image: Lunar Eclipse ]

In our original htaccess blacklist article, we provide an extensive list of bad user agents. This so-called “Ultimate htaccess Blacklist” works great at blocking many different online villains: spammers, scammers, scrapers, scrappers, rippers, leechers — you name it. Yet, despite its usefulness, there is always room for improvement. Continue reading »

How to Verify the Four Major Search Engines

Keeping track of your access and error logs is a critical component of any serious security strategy. Many times, you will see a recorded entry that looks legitimate, such that it may easily be dismissed as genuine Google fare, only to discover upon closer investigation a fraudulent agent. There are many such cloaked or disguised agents crawling around these days, mimicking various search engines to hide beneath the radar. So it’s always a good idea to implement a procedure for […] Continue reading »

WordPress Spam Battle: 3 Seconds that will Save You Hours of Time

In the hellish battle against spam, many WordPress users have adopted a highly effective trinity of anti-spam plugins: Akismet Bad Behavior Spam Karma This effective triage of free WordPress plugins has served many a WP-blogger well, eliminating virtually 99% of all automated comment-related spam. When spam first became a problem for me, I installed this triple-threat arsenal of anti-spam plugins and immediately enjoyed the results. Although Spam Karma seemed a little invasive and resource-intensive, too much protection seemed far better […] Continue reading »

Eliminate 404 Errors for PHP Functions

Recently, I discussed the suspicious behavior recently observed by the Yahoo! Slurp crawler. As revealed by the site’s closely watched 404-error logs, Yahoo! had been requesting a series of nonexistent resources. Although a majority of the 404 errors were exclusive to the Slurp crawler, there were several instances of requests that were also coming from Google, Live, and even Ask. Initially, these distinct errors were misdiagnosed as existing URLs appended with various JavaScript functions. Here are a few typical examples […] Continue reading »

Suspicious Behavior from Yahoo! Slurp Crawler

[ Image: Black and white illustration of the upper half of a man's suspicious, paranoid face ]

Most of the time, when I catch scumbags attempting to spam, scrape, leech, or otherwise hack my site, I stitch up a new voodoo doll and let the cursing begin. No, seriously, I just blacklist the idiots. I don’t need their traffic, and so I don’t even blink while slamming the doors in their faces. Of course, this policy presents a bit of a dilemma when the culprit is one of the four major search engines. Slamming the door on […] Continue reading »

Ultimate htaccess Blacklist

[ Image: Solar Eclipse ]

For those of us running Apache, htaccess rewrite rules provide an excellent way to block spammers, scrapers, and other scumbags easily and effectively. While there are many htaccess tricks involving blocking domains, preventing access, and redirecting traffic, Apache’s mod_rewrite module enables us to target bad agents by testing the user-agent string against a predefined blacklist of unwanted visitors. Any matches are immediately and quietly denied access. Continue reading »

Invite Only: Traffic Control via Whitelist

Web developers trying to control comment-spam, bandwidth-theft, and content-scraping must choose between two fundamentally different approaches: selectively deny target offenders (the “blacklist” method) or selectively allow desirable agents (the “opt-in”, or “whitelist” method). Currently popular according to various online forums and discussion boards is the blacklist method. The blacklist method requires the webmaster to create and maintain a working list of undesirable agents, usually blocking their access via htaccess or php. The downside of blacklisting is that it requires considerable […] Continue reading »

Disobedient Robots and Company

In our never-ending battle against spammers, leeches, scrapers, and other online undesirables, we have implemented several powerful security measures to improve the operational integrity of our perpetual virtual existence. Here is a rundown of the new behind-the-scenes security features of Perishable Press. Continue reading »

Stop Bitacle from Stealing Content

[ Stop bitacle.org ]

If you have yet to encounter the content-scraping site, bitacle.org, consider yourself lucky. The scum-sucking worm-holes at bitacle.org are well-known for literally, blatantly, and piggishly stealing blog content and using it for financial gains through advertising. While I am not here to discuss the legal, philosophical, or technical ramifications of illegal bitacle behavior, I am here to provide a few critical tools that will help stop bitacle from stealing your content. Continue reading »

Welcome
Perishable Press is operated by Jeff Starr, a professional web developer and book author with two decades of experience. Here you will find posts about web development, WordPress, security, and more »
WP Themes In Depth: Build and sell awesome WordPress themes.
Thoughts
Replacing my elaborate 27in iMac desk setup with a 15in Macbook Air.
Launching my new plugin, Head Meta Pro 🚀 Complete meta tags for WordPress.
Migrating sites to a new server, so far so good. Please report any bugs, thank you.
Arc browser looked good but lost me at “account required”. No browsers do that.
Finishing up the pro version of Head Meta Data plugin, launch planned this month.
Finally finished my ultimate block list to stop AI bots :) Blocks over 400+ AI bots!
After 10 years working late at night, my schedule has changed. I am now a “morning person”, starting my day at 6am or earlier.
Newsletter
Get news, updates, deals & tips via email.
Email kept private. Easy unsubscribe anytime.