How to Block Bad Bots

Suffering from spammers, content scrapers, bandwidth leeches, and other bad bots? Got some loser stalking your chat forum? Site getting scanned by endless malicious requests? In this tutorial, you’ll learn how to block bad bots and users with minimal effort. Keeping the trash away from your site is gonna free up valuable server resources, conserve bandwidth, and improve the overall security and quality of your site.

Block bad bots with a plugin

If you are using WordPress or some other CMS, the easiest way to block bad bots and other bad requests is to use a plugin. For WordPress, I develop a set of plugins that will block the “bad guys” and help keep your site safe and secure.

Blackhole for Bad Bots

Blackhole for Bad Bots does one thing and does it well: traps bad bots in a virtual blackhole. It’s a lot of fun, and very effective at stopping bad bots from visiting your site. The idea is simple: add a hidden link to your site, forbid access to that link via robots.txt, and then automatically block any bots that disobey the rules and follow the link.

Both free and pro versions work the same way, but Blackhole Pro gives you some sweet features not included in the free version, like a deluxe bot log and the ability to add bots manually. Check out the Blackhole Pro Tutorials at Plugin Planet to learn more.

BBQ: Block Bad Queries

To block bad HTTP requests using a WordPress plugin, I recommend either BBQ: Block Bad Queries (free) or BBQ Pro (premium). These firewall plugins are designed to protect your site from a wide range of malicious activity, including bad bots, users, requests, referrers, IPs, and more.

BBQ (free version)

The great thing about the free version of BBQ is that it’s entirely plug-&-play, set-it-and-forget-it. Install, activate, and done. Instant firewall. BBQ automatically blocks a wide range of bad requests. And it’s super lightweight and super fast. To customize the free version of BBQ, you can use the whitelist and blacklist plugins.

BBQ Pro

The Pro version of BBQ can do much more. The plugin settings give you full control over every detail. You can block any IPs, user agents, referrers, requests, and query strings. Plus you can view hit counts for every blocked item, so you can dial in the perfect firewall. Here are some tutorials for some of the cool things you can do with BBQ Pro.

Of course, there are a million other scripts and plugins out there, so feel free to explore and use whichever best suits your needs. Obviously, I’m going to recommend my own plugins, because I can attest to their quality and effectiveness.

Block bad bots with .htaccess

While blocking bots with plugins is super-easy, doing so requires a lot more resources (e.g., PHP, database, assets) than using .htaccess. With .htaccess, blocking functionality happens directly at the server level, without requiring PHP, database, assets, and so forth. So you’re saving a lot of server resources while maximizing site performance.

Before diving into some .htaccess techniques for blocking bad bots, it’s important to understand that .htaccess is very powerful and also very strict in terms of syntax. So you definitely should make a backup of your current .htaccess file before making any changes. It’s also a good idea in general to keep good working backups of your files and database.

Also note that, for any of these techniques, the .htaccess code can be added to the bottom of the file, after any existing rules. Alternately, if you have access and know what you are doing, you can add any of these techniques to Apache’s configuration file. And if your site doesn’t have an .htaccess file in the site root directory, you can create one.

Identifying bad bots

The first step in blocking bad bots and other bad requests is to identify them. The best way to do this is by keeping an eye on your site’s log files. Analyzing log files is one of those things that requires practice, and is more of an art than a science.
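
For log analysis to be useful, the access log needs to record the details that the blocking techniques in this tutorial rely on: IP address, request line, referrer, and user agent. Here is a minimal sketch, assuming you have access to Apache’s main configuration file (these two directives are not allowed in .htaccess) and treating the log path as a placeholder:

```apache
# Server config only (not .htaccess). The log path is a placeholder.
# "combined" records IP, request line, status, referrer, and user agent.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog "logs/access_log" combined
```

Many hosts already log in this format by default, so check an existing log line before changing anything.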

You may also want to check around for any good log-parsing or log-analysis software. I know there are some options out there, but can’t really vouch for any of them (I prefer looking at the raw data). Whatever method you use, once you identify some bad requests, there are numerous ways to block future occurrences. For example, you can:

  • Block via Request URI
  • Block via User Agent
  • Block via Referrer
  • Block via IP Address

Before continuing with examples of these methods, make sure that you investigate the request to determine whether or not it should be blocked. The best way to do this is with a few quick searches, but there also are help forums and databases of known bad bots that you can use to get more information.

Block bad bots via Request URI

Let’s say that we are seeing lots of requests that look like this:

http://example.com/evil/request/?xscan=123
http://example.com/another/evil/request/?xscan=123456
http://example.com/and/another/evil/request/?xscan=123456789
.
.
.

These requests all have different user agents, IP addresses, and referrers. So the only way to block similar future requests is to target the request string directly. That is, we can use .htaccess to block all requests that match the same pattern. The trick to this blocking technique is to find the best pattern. Ideally, you want to find the most distinctive element that the unwanted requests have in common. In the above example, we find these common patterns:

  • /evil/
  • /request/
  • xscan
  • 123

When deciding on a pattern to block, it is important to choose one that is not used by any existing resources on your site. So for this example, we would choose to block all requests that include the string, /evil/. We could also choose to block xscan, but there is a greater chance that a short string like that is used for legit requests, for example by a plugin. And a generic string like 123 is almost certain to match legit requests, so it makes a poor target.

So to block /evil/, we can use mod_alias by adding the following code to our site’s root .htaccess file (add to the bottom of the file):

# Block via Request URI
<IfModule mod_alias.c>
	RedirectMatch 403 /evil/
</IfModule>

Later, if we want to also block all requests that include the string, /spam/, we can modify the code like so:

# Block via Request URI
<IfModule mod_alias.c>
	RedirectMatch 403 /(evil|spam)/
</IfModule>

Note that this technique only works when the target pattern is included in the main part of the request URI. To also block these patterns if included in the query-string portion of the request (i.e., everything after the question mark ?), we can use mod_rewrite instead:

# Block via Query String
<IfModule mod_rewrite.c>
	RewriteEngine On
	RewriteCond %{QUERY_STRING} (evil|spam) [NC]
	RewriteRule (.*) - [F,L]
</IfModule>

The regular expression (regex) syntax works the same with mod_rewrite as it does with mod_alias. Note that the query-string pattern above has no surrounding slashes, so it matches evil or spam anywhere in the query string. Once this code is in place, all requests that include either of the banned strings will be denied access. Remember to test everything for proper functionality before going live with this technique.
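
If the same strings should be blocked wherever they appear, both checks can live in a single mod_rewrite block. This is only a sketch that combines the two techniques above using the same example strings, so test it against your own URLs before relying on it:

```apache
# Block via Request URI or Query String
<IfModule mod_rewrite.c>
	RewriteEngine On
	# Match /evil/ or /spam/ in the path portion of the request
	RewriteCond %{REQUEST_URI} /(evil|spam)/ [NC,OR]
	# Match evil or spam anywhere in the query string
	RewriteCond %{QUERY_STRING} (evil|spam) [NC]
	RewriteRule (.*) - [F,L]
</IfModule>
```

Unlike the RedirectMatch version, the [NC] flag makes the path check case-insensitive, so /EVIL/ is blocked too.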

Block bad bots via User Agent

This example demonstrates how to block bad bots via their user agent. Let’s say that we notice a bunch of nasty spam requests all reporting one of the following user agents:

EvilBot
ScumSucker
FakeAgent

To block all requests from any of these user agents (bots), add the following to .htaccess:

# Block via User Agent
<IfModule mod_rewrite.c>
	RewriteEngine On
	RewriteCond %{HTTP_USER_AGENT} (EvilBot|ScumSucker|FakeAgent) [NC]
	RewriteRule (.*) - [F,L]
</IfModule>

Save, upload, and done. You can (and should) test that everything is working properly before going live. To test that the specified user agents are actually blocked, you can use a free online tool such as Bots vs Browsers. You can add more user agents to the list like so:

RewriteCond %{HTTP_USER_AGENT} (EvilBot|ScumSucker|FakeAgent|AnotherSpammer|AndAnotherEtc) [NC]

Here we have added AnotherSpammer and AndAnotherEtc to the list. Note: this line would replace the one included in the above technique. Easy peasy.
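
As an alternative to mod_rewrite, the same user agents can be flagged with mod_setenvif and then denied with the Require directive. This is a different technique from the one above, sketched on the assumption that you are running Apache 2.4 or later and that your host allows these directives in .htaccess. The variable name bad_bot is arbitrary:

```apache
# Block via User Agent (mod_setenvif + Require, Apache 2.4+)
<IfModule mod_setenvif.c>
	# Set the bad_bot variable when the user agent matches (case-insensitive)
	BrowserMatchNoCase (EvilBot|ScumSucker|FakeAgent) bad_bot
</IfModule>
<RequireAll>
	Require all granted
	# Deny any request for which bad_bot was set above
	Require not env bad_bot
</RequireAll>
```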

Block bad bots via Referrer

Spam referrers, content thieves, and bandwidth leeches are a huge waste of time and resources. If you find that your site is targeted, you can easily block requests from specific referrers. For example, let’s say that we want to block the following referrer URLs:

http://spam-referrer.org/
http://content-thief.biz/
http://bandwidth-leech.net/

Similar to previous techniques, we can block these referrers via Apache’s mod_rewrite. To do so, add the following code to your site’s root .htaccess file:

# Block via Referrer
<IfModule mod_rewrite.c>
	RewriteEngine On
	# https? matches http and https; ([^/]+\.)? allows any subdomain
	RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?spam-referrer\.org [NC,OR]
	RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?content-thief\.biz [NC,OR]
	RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?bandwidth-leech\.net [NC]
	RewriteRule (.*) - [F,L]
</IfModule>

This code does the following:

  1. Turns on the rewrite engine (if not already on)
  2. Checks the referrer for any of the specified URLs
  3. If the referrer is a match, it is blocked via 403 “Forbidden” response

You can easily add more referrers by adding a similar RewriteCond. The important thing to remember is that the last RewriteCond must not include an OR flag. Note also that, for any of the mod_rewrite techniques, you can customize the response for blocked requests. For more information, check out the section “Dealing with Blacklisted Visitors” in my tutorial, Eight Ways to Blacklist with Apache’s mod_rewrite.
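
As one example of customizing the response, here is a minimal sketch that answers blocked referrers with 410 “Gone” instead of 403 “Forbidden”, using mod_rewrite’s G flag. The optional ErrorDocument line is a separate assumption about what you might want: it sets a short plain-text body for every 403 response on the site, not just for blocked bots:

```apache
# Block via Referrer, returning 410 Gone instead of 403 Forbidden
<IfModule mod_rewrite.c>
	RewriteEngine On
	RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?spam-referrer\.org [NC,OR]
	RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?content-thief\.biz [NC]
	# G = respond with 410 Gone and stop processing
	RewriteRule (.*) - [G]
</IfModule>

# Optional: short plain-text body for all 403 responses
ErrorDocument 403 "Forbidden"
```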

Block bad bots via IP Address

Blocking via IP address is useful when dealing with specific users. While you can use IPs to block entire countries, ranges, etc., doing so costs more resources than it’s worth. There is just too much variation in terms of proxies, caches, forwarding, and spoofing for IP-blocking to be very effective. Before blocking anyone or anything via IP, I recommend taking a quick read through the information provided in my article, Worst IPs: 2016 Edition.

That said, let’s say that we do want to block a user based on their associated IP address. For example, if we have a public forum and some d-bag is just driving everyone nuts with attitude, spam, nonsense, whatever, we can check the logs to get their IP address, say, 123.456.789.000. With that info, we can add the following directives to .htaccess:

# Block via IP Address
<IfModule mod_rewrite.c>
	RewriteEngine On
	RewriteCond %{REMOTE_ADDR} ^123\.456\.789\.000$
	RewriteRule (.*) - [F,L]
</IfModule>

That’s all there is to it. Note that we are escaping the dots with a backslash \. This tells Apache to treat each dot as a literal character instead of as a wildcard that matches any character, which is the default for an unescaped dot. The pattern is also anchored at both ends with ^ and $, so that a shorter address such as 1.2.3.4 can’t accidentally match 1.2.3.40. Together, these ensure that we’re only blocking the specified IP address, so there will be no false positives.

To block more than one IP, do this:

# Block via IP Address
<IfModule mod_rewrite.c>
	RewriteEngine On
	RewriteCond %{REMOTE_ADDR} ^123\.456\.789\.000$ [OR]
	RewriteCond %{REMOTE_ADDR} ^111\.222\.333\.000$ [OR]
	RewriteCond %{REMOTE_ADDR} ^444\.555\.777\.000$
	RewriteRule (.*) - [F,L]
</IfModule>

As in previous mod_rewrite techniques, the last RewriteCond should NOT include the [OR] flag. Other than that, everything is straightforward.

To block a range of IPs, we can simply omit the last octet (or whichever octets are required for the range):

# Block via IP Address
<IfModule mod_rewrite.c>
	RewriteEngine On
	RewriteCond %{REMOTE_ADDR} ^123\.           [OR]
	RewriteCond %{REMOTE_ADDR} ^111\.222\.      [OR]
	RewriteCond %{REMOTE_ADDR} ^444\.555\.777\.
	RewriteRule (.*) - [F,L]
</IfModule>

Here we are blocking:

  • All IPs that begin with 123.
  • All IPs that begin with 111.222.
  • All IPs that begin with 444.555.777.
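
Prefix matching only works for ranges that line up with whole octets. For other ranges, Apache 2.4 and later let RewriteCond evaluate an expression instead of a regex, which allows CIDR notation. A minimal sketch, assuming Apache 2.4+ and using 192.0.2.0/24 (a range reserved for documentation) as a placeholder:

```apache
# Block an IP range in CIDR notation (Apache 2.4+)
<IfModule mod_rewrite.c>
	RewriteEngine On
	# -R tests the client's IP address against the given address or subnet
	RewriteCond expr "-R '192.0.2.0/24'"
	RewriteRule (.*) - [F,L]
</IfModule>
```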

Alternate method for blocking IPs

Here is an alternate method for blocking IPs (Apache 2.4 and later):

# Block via IP Address
<RequireAll>
	Require all granted
	Require not ip 123.456.789.000
	Require not ip 111.222.333.000
	Require not ip 444.555.777.000
</RequireAll>

Or, for older versions of Apache (2.2 and earlier; in 2.4 these directives are only available via mod_access_compat):

# Block via IP Address
<Files ~ ".*\..*">
	# Allow,Deny lets the Deny lines override "Allow from all"
	Order Allow,Deny
	Allow from all
	Deny from 123.456.789.000
	Deny from 111.222.333.000
	Deny from 444.555.777.000
</Files>

It’s important to note here that we are NOT escaping the dots in the IPs, because these directives expect plain IP addresses rather than regular expressions. To block a range of IPs using this method, simply omit the final octet from the IP, as in previous examples. You can also use these techniques for blocking based on Host instead of IP. For more information, check out the official Apache documentation.
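
To illustrate range blocking with the Apache 2.4 syntax, here is a minimal sketch showing both a partial address and CIDR notation. The addresses are placeholders taken from ranges reserved for documentation, so swap in the ranges you actually want to block:

```apache
# Block IP ranges (Apache 2.4+); addresses are placeholders
<RequireAll>
	Require all granted
	# Partial address: every IP that begins with 192.0.2
	Require not ip 192.0.2
	# CIDR notation: 198.51.100.0 through 198.51.100.255
	Require not ip 198.51.100.0/24
</RequireAll>
```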

Going further

Hopefully this article has helped you to better protect your site against bad bots and other malicious activity. To go further with securing your site, check out some of my other tutorials:

As always, questions and comments welcome :)