How to Block Baidu Bot

A user of my 6G Firewall recently asked how to block the “baidu” bot from accessing their site. This post explains why Baidu is not blocked in 6G and provides a quick .htaccess technique to deny it (or anything claiming to be it) access to your site.

A cry for help

Recently one of my users sent an urgent message:

My aim to avoid mainly “baidu” eating all my bandwidth!! Also I learned that many other bots also using the name ‘baidu’. I found more than 50 different IPs & 10,000++ entries, either referer or requests, named ‘baidu’ within 7 days! … Can you please suggest somethings on this or if you already posted any blog for above issue, please tell me the links.

After providing this person a quick, easy-to-implement solution, I thought it would be useful to share the technique here at Perishable Press. Read on to learn more about the Baidu search bot and how to block it from accessing your site.

What is Baidu and why it’s not blocked by 6G et al

Here is the big blurb on Baidu:

Baidu, Inc., incorporated on January 18, 2000, is a Chinese web services company headquartered at the Baidu Campus in Beijing’s Haidian District. Baidu offers many services, including a Chinese search engine for websites, audio files and images.

Basically Baidu is not blocked in 6G Firewall, BBQ Pro, or Blackhole for Bad Bots because it’s a search engine used by the majority of the Chinese population. It’s like the Chinese version of Google, if you will. This is also why I include Baidu in my list of user agents for the top search engines: in normal circumstances it may not be prudent to ban Baidu by default, especially if you’re going for as much traffic as you can get.

Why block Baidu?

On the other hand, there are numerous reasons why someone would want to block Baidu, for example:

  • Too many bad requests reporting “baidu” as the user agent (whether legit or not)
  • Fine-tuning traffic to a particular geographic region
  • Personal strategy, political reasons, etc.

And of course the reason is irrelevant. If you own your website you are free to block or allow whomever you wish. Further, keep in mind that “blocking baidu” is not exclusively targeting the actual search engine bot. Rather, it’s blocking any bot that reports itself as being “baidu”. In my experience, there are legions of bad bots that transmit counterfeit identities. I don’t know about you, but my personal security strategy leans towards blocking any bot that is pretending to be someone else. Dishonest little bots.

How to block Baidu

Fortunately, blocking “baidu” is dead simple using a slice of .htaccess. Perhaps the easiest, most effective way of doing the job is to add the following directives to your site’s root .htaccess file:

# block baidu bot: 403 for any user agent containing "baidu" (case-insensitive)
<IfModule mod_rewrite.c>
	RewriteEngine On
	RewriteCond %{HTTP_USER_AGENT} baidu [NC]
	RewriteRule .* - [F,L]
</IfModule>

Once implemented, that snippet provides extra-strong protection against anything claiming to be “baidu” (that is, it will block any bot that includes the term “baidu” anywhere in the reported user agent). It is very effective, so only add it to your site if you know what you are doing and are positive that you want to say goodbye to Baidu.
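
The message quoted earlier mentions “baidu” showing up in referrers as well as in requests, while the technique above only checks the user agent. Here is a minimal sketch of how the same approach might be extended to cover both. It is an illustration under assumptions, not part of 6G: it assumes mod_rewrite is available and that you really do want to refuse any request whose referrer contains “baidu”. Keep the trade-off in mind: a referrer match also turns away human visitors who click through from Baidu’s search results.

```apache
# block "baidu" by user agent OR referrer (illustrative variant, use with care)
<IfModule mod_rewrite.c>
	RewriteEngine On
	# condition 1: user agent contains "baidu"; [OR] joins it to the next condition
	RewriteCond %{HTTP_USER_AGENT} baidu [NC,OR]
	# condition 2: referrer contains "baidu" (this also matches real visitors arriving from Baidu)
	RewriteCond %{HTTP_REFERER} baidu [NC]
	# either condition matched: respond with 403 Forbidden
	RewriteRule .* - [F,L]
</IfModule>
```

Without the `[OR]` flag the two conditions would be combined with AND, so only requests matching both would be blocked. If you only care about bots, leave out the referrer condition and the `OR` flag and stick with the user-agent check.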