How to Block Baidu Bot

A user of my 6G Firewall recently asked how to block the “baidu” bot from accessing their site. This post explains why Baidu is not blocked in 6G and provides a quick .htaccess technique to deny it (or anything claiming to be it) access to your site.

A cry for help

Recently one of my users sent an urgent message:

My aim to avoid mainly “baidu” eating all my bandwidth!! Also I learned that many other bots also using the name ‘baidu’. I found more than 50 different IPs & 10,000++ entries, either referer or requests, named ‘baidu’ within 7 days! … Can you please suggest somethings on this or if you already posted any blog for above issue, please tell me the links.

After providing this person a quick, easy-to-implement solution, I thought it would be useful to share the technique here at Perishable Press. Read on to learn more about the Baidu search bot and how to block it from accessing your site.

What is Baidu and why it’s not blocked by 6G et al

Here is the big blurb on Baidu:

Baidu, Inc., incorporated on January 18, 2000, is a Chinese web services company headquartered at the Baidu Campus in Beijing’s Haidian District. Baidu offers many services, including a Chinese search engine for websites, audio files and images.

Basically, Baidu is not blocked in 6G Firewall, BBQ Pro, or Blackhole for Bad Bots because it’s a search engine used by the majority of the Chinese population. It’s like the Chinese version of Google, if you will. This is also why I include Baidu in my list of user agents for the top search engines. Under normal circumstances it may not be prudent to ban Baidu by default, especially if you’re going for as much traffic as you can get.

Why block Baidu?

On the other hand, there are numerous reasons why someone would want to block Baidu, for example:

  • Too many bad requests reporting “baidu” as user agent (whether legit or not)
  • Fine-tuning traffic to a particular geographic region
  • Personal strategy, political reasons, etc.

And of course the reason is irrelevant. If you own your website you are free to block or allow whomever you wish. Further, keep in mind that “blocking baidu” is not exclusively targeting the actual search engine bot. Rather, it’s blocking any bot that reports itself as being “baidu”. In my experience, there are legions of bad bots that transmit counterfeit identities. I don’t know about you, but my personal security strategy leans towards blocking any bot that is pretending to be someone else. Dishonest little bots.

How to block Baidu

Fortunately, blocking “baidu” is dead simple using a slice of .htaccess. Perhaps the easiest, most effective way of doing the job is to add the following directives to your site’s root .htaccess file:

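# block baidu bot
<IfModule mod_rewrite.c>
	# make sure the rewrite engine is on (harmless if already enabled)
	RewriteEngine On
	# match "baidu" anywhere in the user agent, case-insensitive
	RewriteCond %{HTTP_USER_AGENT} baidu [NC]
	# answer every matching request with 403 Forbidden
	RewriteRule .* - [F,L]
</IfModule>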

Once implemented, that snippet provides extra-strong protection against anything claiming to be “baidu”: it returns a 403 Forbidden response to any request that includes the term “baidu” anywhere in the reported user agent, regardless of letter case. That covers the real Baidu search bot as well as the impostors. It is very effective, so only add it to your site if you know what you are doing and are positive that you want to say goodbye to Baidu.
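
If mod_rewrite is not available, the same user-agent match can probably be done with mod_setenvif and the Apache 2.4 authorization directives instead. The following is only a minimal sketch, not something taken from 6G. It assumes Apache 2.4 or later, that mod_setenvif and mod_authz_core are enabled, and that your host allows these directives in .htaccess. The variable name block_baidu is made up for illustration.

```apache
# block baidu bot without mod_rewrite (sketch, assumes Apache 2.4+)
<IfModule mod_setenvif.c>
	# flag any request whose user agent contains "baidu", case-insensitive
	# (block_baidu is a hypothetical variable name)
	SetEnvIfNoCase User-Agent "baidu" block_baidu
	<IfModule mod_authz_core.c>
		<RequireAll>
			# allow everyone ...
			Require all granted
			# ... except requests flagged above, which get 403 Forbidden
			Require not env block_baidu
		</RequireAll>
	</IfModule>
</IfModule>
```

Use one method or the other, not both. Like the mod_rewrite version, this matches only the user agent, so requests that mention “baidu” only in the referrer are not affected.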

About the Author
Jeff Starr = Designer. Developer. Producer. Writer. Editor. Etc.

5 responses to “How to Block Baidu Bot”

  1. Baidu says that they follow robots.txt directives.

    http://help.baidu.com/question?prod_en=master&class=498&id=1000973

    However as they only index simplified Chinese language pages there’s usually no point letting them crawl the site at all.

    • Good point, thanks John.

    • John, why would you think that Baidu only indexes Chinese language pages? I don’t believe that is true at all. On the page you referred to, a bit further down there are instructions to see what Baidu has indexed of any given site. I just added perishablepress to it and there are many pages that in fact have been indexed, total amount of results is actually 3,410 pages!

      link to see what is indexed: http://www.baidu.com/s?wd=site%3Aperishablepress.com

      • That is one reason I personally do not block Baidu. I need any traffic I can get. Very interesting indeed.

      • That’s news to me, in the past they were crawling my site at a crazy speed and sending zero traffic. The research I did at the time told me that they only indexed simplified Chinese.

        Looks like things have changed, or the information I read at the time was wrong.
