How to Block Bad Bots
Suffering from spammers, content scrapers, bandwidth leeches, and other bad bots? Got some loser stalking your chat forum? Site getting scanned by endless malicious requests? In this tutorial, you’ll learn how to block bad bots and users with minimal effort. Keeping the trash away from your site will free up valuable server resources, conserve bandwidth, and improve the overall security and quality of your site.
Block bad bots with a plugin
If you are using WordPress or some other CMS, the easiest way to block bad bots and other bad requests is to use a plugin. For WordPress, I develop a set of plugins that will block the “bad guys” and help keep your site safe and secure.
Blackhole for Bad Bots
Blackhole for Bad Bots does one thing and does it well: traps bad bots in a virtual blackhole. It’s a lot of fun, and very effective at stopping bad bots from visiting your site. The idea is simple: add a hidden link to your site, forbid access to that link via robots.txt, and then automatically block any bots that disobey the rules and follow the link.
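As a rough sketch of the idea (the /blackhole/ path and link markup below are hypothetical examples, not the plugin’s actual implementation), the trap consists of a robots.txt rule plus a hidden link:

```
# robots.txt — forbid all rule-abiding bots from the trap URL
# (the /blackhole/ path is a hypothetical example)
User-agent: *
Disallow: /blackhole/
```

The corresponding hidden link, e.g. `<a href="/blackhole/" rel="nofollow" style="display:none;">Do not follow</a>`, is invisible to human visitors, so any client that requests that URL has ignored robots.txt and can be blocked automatically.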
Both free and pro versions work the same way, but Blackhole Pro gives you some sweet features not included in the free version, such as a detailed bot log and the ability to add bots manually. Check out the Blackhole Pro Tutorials at Plugin Planet to learn more.
BBQ: Block Bad Queries
To block bad HTTP requests using a WordPress plugin, I recommend either BBQ: Block Bad Queries (free) or BBQ Pro (premium). These firewall plugins are designed to protect your site from a wide range of malicious activity, including bad bots, users, requests, referrers, IPs, and more.
BBQ (free version)
The great thing about the free version of BBQ is that it’s entirely plug-&-play, set-it-and-forget-it. Install, activate, and done. Instant firewall. BBQ automatically blocks a wide range of bad requests. And it’s super lightweight and super fast. To customize the free version of BBQ, you can use the whitelist and blacklist plugins.
BBQ Pro
The Pro version of BBQ can do much more. The plugin settings give you full control over every detail. You can block any IPs, user agents, referrers, requests, and query strings. Plus you can view hit counts for every blocked item, so you can dial in the perfect firewall. Here are some tutorials for some of the cool things you can do with BBQ Pro.
Of course, there are a million other scripts and plugins out there, so feel free to explore and use whichever best suits your needs. Obviously, I’m going to recommend my own plugins, because I can attest to their quality and effectiveness.
Block bad bots with .htaccess
While blocking bots with plugins is super-easy, doing so requires a lot more resources (e.g., PHP, database, assets) than using .htaccess. With .htaccess, blocking functionality happens directly at the server level, without requiring PHP, database, assets, and so forth. So you’re saving a lot of server resources while maximizing site performance.
Before diving into some .htaccess techniques for blocking bad bots, it’s important to understand that .htaccess is very powerful and also very strict in terms of syntax. So you definitely should make a backup of your current .htaccess file before making any changes. It’s also a good idea in general to keep good working backups of your files and database.
Also note that, for any of these techniques, the .htaccess code can be added to the bottom of the file, after any existing rules. Alternatively, if you have access and know what you are doing, you can add any of these techniques to Apache’s configuration file. Also, if your site doesn’t have an .htaccess file in the site root directory, you can create one.
Identifying bad bots
The first step in blocking bad bots and other bad requests is to identify them. The best way to do this is by keeping an eye on your site’s log files. Analyzing log files is one of those things that requires practice, and is more of an art than a science.
- Analyzing Weird 404 Search Engine Requests
- What Chrome Predictive URLs Look Like on the Server
- Example of a Spoofed Search Engine Bot
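As a starting point, a quick command-line sketch can surface the most frequent user agents in a combined-format access log. The sample log below is fabricated purely for illustration; in practice you would point the command at your server’s actual log file (the path varies by host):

```shell
# Create a tiny fabricated access log for demonstration purposes
cat > /tmp/sample-access.log <<'EOF'
1.2.3.4 - - [01/Jan/2024:00:00:01 +0000] "GET /evil/request/?xscan=123 HTTP/1.1" 404 162 "-" "EvilBot"
1.2.3.4 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
5.6.7.8 - - [01/Jan/2024:00:00:03 +0000] "GET /another/evil/request/?xscan=123456 HTTP/1.1" 404 162 "-" "EvilBot"
EOF

# In combined log format, the user agent is the sixth quote-delimited field;
# count occurrences and sort by frequency to spot repeat offenders
awk -F'"' '{print $6}' /tmp/sample-access.log | sort | uniq -c | sort -rn
```

The same one-liner works on a real log; anything hammering your site with an unfamiliar user agent will float to the top of the list.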
You may also want to check around for any good log-parsing or log-analysis software. I know there are some options out there, but can’t really vouch for any of them (I prefer looking at the raw data). Whatever method you use, once you identify some bad requests, there are numerous ways to block future occurrences. For example, you can:
- Block via Request URI
- Block via User Agent
- Block via Referrer
- Block via IP Address
Before continuing with examples of these methods, make sure that you investigate the request to determine whether or not it should be blocked. The best way to do this is with a few quick searches, but there are also help forums and databases of known bad bots that you can use to get more information.
Block bad bots via Request URI
Let’s say that we are seeing lots of requests that look like this:
http://example.com/evil/request/?xscan=123
http://example.com/another/evil/request/?xscan=123456
http://example.com/and/another/evil/request/?xscan=123456789
...
These requests all have different user agents, IP addresses, and referrers. So the only way to block similar future requests is to target the request string directly. That is, we can use .htaccess to block all requests that match the same pattern. The trick to this blocking technique is to find the best pattern. Ideally, you want to find the most unique common factor for the type of requests that you want to block. In the above example, we find these common patterns:
/evil/
/request/
xscan
123
When deciding on a pattern to block, it is important to choose one that is not used by any existing resources on your site. So for this example, we would choose to block all requests that include the string /evil/. We could also choose to block xscan, but there is a greater chance that that particular string is used by legitimate requests, for example by some plugin.
So to block /evil/, we can use mod_alias by adding the following code to our site’s root .htaccess file (add to the bottom of the file):
# Block via Request URI
<IfModule mod_alias.c>
RedirectMatch 403 /evil/
</IfModule>
Later, if we want to also block all requests that include the string /spam/, we can modify the code like so:
# Block via Request URI
<IfModule mod_alias.c>
RedirectMatch 403 /(evil|spam)/
</IfModule>
Note that this technique only works when the target pattern appears in the main part of the request URI. To also block these patterns when they appear in the query-string portion of the request (i.e., everything after the question mark ?), we can use mod_rewrite instead:
# Block via Query String
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{QUERY_STRING} (evil|spam) [NC]
RewriteRule (.*) - [F,L]
</IfModule>
And the regular expression (regex) works the same with mod_rewrite as it does with mod_alias. Once this code is in place, all requests that include either of the banned strings will be denied access. Remember to test everything for proper functionality before going live with this technique. For more mod_rewrite techniques, check out Eight Ways to Redirect with Apache’s mod_rewrite.
Block bad bots via User Agent
This example demonstrates how to block bad bots via their user agent. Let’s say that we notice a bunch of nasty spam requests all reporting one of the following user agents:
EvilBot
ScumSucker
FakeAgent
To block all requests from any of these user agents (bots), add the following to .htaccess:
# Block via User Agent
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (EvilBot|ScumSucker|FakeAgent) [NC]
RewriteRule (.*) - [F,L]
</IfModule>
Save, upload, and done. You can (and should) test that everything is working properly before going live. To test that the specified user agents are actually blocked, you can use a free online tool such as Bots vs Browsers. You can add more user agents to the list like so:
RewriteCond %{HTTP_USER_AGENT} (EvilBot|ScumSucker|FakeAgent|AnotherSpammer|AndAnotherEtc) [NC]
Here we have added AnotherSpammer and AndAnotherEtc to the list. Note: this line would replace the one included in the above technique. Easy peasy.
Block bad bots via Referrer
Spam referrers, content thieves, and bandwidth leeches are a huge waste of time and resources. If you find that your site is targeted, you can easily block requests from specific referrers. For example, let’s say that we want to block the following referrer URLs:
http://spam-referrer.org/
http://content-thief.biz/
http://bandwidth-leech.net/
Similar to previous techniques, we can block these referrers via Apache’s mod_rewrite. To do so, add the following code to your site’s root .htaccess file:
# Block via Referrer
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://(.*)spam-referrer\.org [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.*)content-thief\.biz [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.*)bandwidth-leech\.net [NC]
RewriteRule (.*) - [F,L]
</IfModule>
This code does the following:
- Enables mod_rewrite (if not already enabled)
- Checks the referrer for any of the specified URLs
- If the referrer is a match, the request is blocked via a 403 “Forbidden” response
You can easily add more referrers by adding a similar RewriteCond. The important thing to remember is that the last RewriteCond must not include an OR flag. Note also that, for any of the mod_rewrite techniques, you can customize the response for blocked requests. For more information, check out the section “Dealing with Blacklisted Visitors” in my tutorial, Eight Ways to Blacklist with Apache’s mod_rewrite.
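As one simple example of customizing the blocked response (the page path here is a hypothetical placeholder), you can serve a friendlier page for all 403 responses via Apache’s ErrorDocument directive:

```
# Serve a custom page for 403 "Forbidden" responses
# (/custom-403.html is a hypothetical example page)
ErrorDocument 403 /custom-403.html
```

This applies to every 403 your server returns, not just the ones generated by these blocking rules, so keep the page lightweight and generic.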
Block bad bots via IP Address
Blocking via IP address is useful when dealing with specific users. While you can use IPs to block entire countries, ranges, etc., doing so costs more resources than it’s worth. There is just too much variation in terms of proxies, caches, forwarding, and spoofing for IP-blocking to be very effective. Before blocking anyone or anything via IP, I recommend taking a quick read through the information provided in my article, Worst IPs: 2016 Edition.
That said, let’s say that we do want to block a user based on their associated IP address. For example, if we have a public forum and some d-bag is just driving everyone nuts with attitude, spam, nonsense, whatever, we can check the logs to get their IP address, say, 123.456.789.000
. With that info, we can add the following directives to .htaccess:
# Block via IP Address
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^123\.456\.789\.000
RewriteRule (.*) - [F,L]
</IfModule>
That’s all there is to it. Note that we are escaping the dots with a backslash \
. This tells Apache to treat the dots as literal instead of as a wildcard, which is the default for an unescaped dot. Escaping the dots ensures that we’re only blocking the specified IP address, so there will be no false positives.
To block more than one IP, do this:
# Block via IP Address
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^123\.456\.789\.000 [OR]
RewriteCond %{REMOTE_ADDR} ^111\.222\.333\.000 [OR]
RewriteCond %{REMOTE_ADDR} ^444\.555\.777\.000
RewriteRule (.*) - [F,L]
</IfModule>
As in previous mod_rewrite techniques, the last RewriteCond should NOT include the [OR] flag. Other than that, everything is straightforward.
To block a range of IPs, we can simply omit the last octet (or whichever octets are required for the range):
# Block via IP Address
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^123\. [OR]
RewriteCond %{REMOTE_ADDR} ^111\.222\. [OR]
RewriteCond %{REMOTE_ADDR} ^444\.555\.777\.
RewriteRule (.*) - [F,L]
</IfModule>
Here we are blocking:
- All IPs that begin with 123.
- All IPs that begin with 111.222.
- All IPs that begin with 444.555.777.
Alternate method for blocking IPs
Here is an alternate method for blocking IPs (Apache 2.4 and better):
# Block via IP Address
<RequireAll>
Require all granted
Require not ip 123.456.789.000
Require not ip 111.222.333.000
Require not ip 444.555.777.000
</RequireAll>
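On Apache 2.4 and better, the Require directive also understands CIDR notation, which is a cleaner way to express a range than omitting octets. A minimal sketch (the address below is from the reserved documentation range, purely as a placeholder):

```
# Block an entire range with CIDR notation (Apache 2.4+)
<RequireAll>
	Require all granted
	Require not ip 192.0.2.0/24
</RequireAll>
```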
Or, for older versions of Apache (2.2 and earlier):
# Block via IP Address
<Files ~ ".*\..*">
Order Deny,Allow
Allow from all
Deny from 123.456.789.000
Deny from 111.222.333.000
Deny from 444.555.777.000
</Files>
It’s important to note here that we are NOT escaping the dots in the IPs. To block a range of IPs using this method, simply omit the final octet from the IP, as in previous examples. You can also use these techniques for blocking based on Host instead of IP. For more information, check out the official Apache documentation.
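As a sketch of Host-based blocking on Apache 2.4+, reusing one of the example domains from above (note that host matching requires a reverse DNS lookup for each request, which adds overhead):

```
# Block by resolved hostname instead of IP (Apache 2.4+)
<RequireAll>
	Require all granted
	Require not host spam-referrer.org
</RequireAll>
```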
Going further
Hopefully this article has helped you to better protect your site against bad bots and other malicious activity. To go further with securing your site, check out some of my other tutorials:
- WP Security Video Course
- PHP Spider Trap
- .htaccess Hotlink Protection
- .htaccess made easy
- 6G Firewall
As always, questions and comments welcome :)