Block Spam by Denying Access to No-Referrer Requests

by Jeff Starr on Monday, November 20, 2006 35 Responses

What we have here is an excellent method for preventing a great deal of blog spam. With a few strategic lines placed in your .htaccess file, you can prevent spambots from dropping spam bombs by denying access to all comment submissions that do not originate from your domain.

How does it work? Well, when a legitimate user (i.e., not a robot, etc.) decides to leave a comment on your blog, they have (hopefully) read the article for which they wish to leave a comment, and have subsequently loaded your blog’s comment template (e.g., comments.php), which is most likely located within the same domain as the article, blog, etc. (i.e., your domain).

So, after filling out the comment form via comments.php, the user clicks the "submit" button, which then initiates the PHP file/script that actually processes the comment for the world to see. For WordPress users, the comment processing file is wp-comments-post.php.

Therefore, the HTTP referrer for all legitimate (user-initiated) comments will be your domain (or the domain in which the comments.php file is located). Automated spam robots typically target the comment-processing script directly, bypassing your comments.php form altogether. Such activity results in HTTP referrers that are not from your domain.

Thus, by blocking all requests for the comments-processing script (wp-comments-post.php) that are not sent directly from your domain (comments.php), you immediately eliminate a large portion of blog spam.

Sound good? Here is the script to add to your site’s .htaccess file:

# block comment spam by denying access to no-referrer requests
RewriteEngine On
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} /wp-comments-post\.php
RewriteCond %{HTTP_REFERER} !perishablepress\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^(.*)$ http://the-site-where-you-want-to-send-spammers.com/ [R=301,L]

Please note that you need to edit the following lines according to your specific setup:

/wp-comments-post\.php
This is the default comment-processing script for WordPress users. If you are not running WordPress, you will need to determine the corresponding file and enter its name here.
!perishablepress\.com
Change this value to that of your own domain, escaping each dot with a backslash.
http://the-site-where-you-want-to-send-spammers.com/
Because spambots typically ignore redirects, this may not be accomplishing too much. But go ahead and enter the URL of your least-favorite website anyway. Note that the redirect target is a plain URL, not a regular expression, so it should not be wrapped in ^ and $. Another option here is to simply bounce the spambot back to where it came from by replacing the last line with this: RewriteRule ^(.*)$ http://%{REMOTE_ADDR}/ [R=301,L]
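
Since spambots typically ignore redirects, another possibility is to refuse the request outright rather than send it elsewhere. The following is only a sketch of that variant, not a tested drop-in: example.com is a placeholder for your own domain, and the [F] flag (which returns 403 Forbidden) stands in for the redirect.

```apache
# Variant sketch: answer unwanted comment POSTs with 403 Forbidden
# (example.com is a placeholder; replace it with your own domain)
<IfModule mod_rewrite.c>
    RewriteEngine On
    # only look at form submissions...
    RewriteCond %{REQUEST_METHOD} POST
    # ...aimed at the WordPress comment-processing script
    RewriteCond %{REQUEST_URI} /wp-comments-post\.php
    # referrer is not your domain, OR the user agent is empty
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^$
    # "-" means no substitution; [F] sends 403 and stops processing
    RewriteRule .* - [F]
</IfModule>
```

A 403 costs the server almost nothing and does not depend on the bot choosing to follow a redirect.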

And that is all there is to it! Bye bye spambots!


35 Responses

Michael#1

WordPress Trackback Spam!!!
I have installed plugins that prevent comment spam, but they don't block trackback spam. I've been spammed via trackback by many MFA websites, most probably from the same network, but they are not linking to me on their websites. May I know how they do it, and how I can stop it without disabling trackbacks?
Thanks, and I'm using WordPress.

Perishable#2

Hmmm… good question. I will look into it..

Lee#3

Shouldn’t the last line be changed to:

RewriteRule ^(.*)$ http://the-site-where-you-want-to-send-spammers.com/ [R=301,L]

I am using it as you have it and am getting 404 errors like this:

http://shamar.org/%sitegoto.com/$

Perishable#4

Lee,
If that works for you, great. Often, there are multiple ways of writing htaccess expressions. For example, here is the last line of the same htaccess code currently presented on the WordPress Codex:

RewriteRule (.*) ^http://%{REMOTE_ADDR}/$ [R=301,L]

Further, here is the corresponding line we are currently using at Perishable Press:

RewriteRule ^(.*)$ ^http://www.google.com/$ [R=301,L]

..which has been working fine for quite a while.

Also, an absence of errors doesn’t necessarily translate into proper functionality. You should throw down with some tuf log action:

RewriteEngine On
RewriteLog /absolute/path/to/your/wwwroot/public_html/rewrite_log.txt
RewriteLogLevel 2

..to ensure that your syntax actually produces the desired results (i.e., blocking spambots, etc.). Either way, thanks for the information concerning your specific issue — it may prove beneficial to others experiencing the same type of error.

Cheers!

danielle#5

oh nothing just wanted to feel special!!!!!!!!!!!

Perishable#6

Your specialness is obvious, danielle ;)

Jenny#7

I’ve thought of using this method before but I was too lazy to form up a proper code. Thank you Perishable…of course not forgetting Shoemoney :)

Perishable#8

My pleasure, Jenny — thank you for the feedback :)

Rick Beckman#9

I’m using this code too, but looking up the IPs of spammers caught by Akismet and cross-referencing those same IPs with my Apache logs, I’m seeing that the spammers are actually loading the posts and submitting via the actual form.

And by doing so, they’ve circumvented the protection you share above, as well as the one I implemented (renaming /wp-comments-post.php to something custom, editing my theme’s /comments.php file appropriately).

Spam sucks.

Oh, just curious as to why users with empty user-agents are blocked from commenting in the above rewrite?

a name#10

I put in the above code in my .htaccess and got a 500. After a few tries and changes, I decided to add this into my wp-comments-post.php. Is there any reason I shouldn’t have this (other than having to add it every time I upgrade WP)?

if (empty($_SERVER['HTTP_REFERER']) || strpos($_SERVER['HTTP_REFERER'], 'yourdomain.com') === false) exit;

Thanks.

Perishable#11

@Rick: Yes indeed, spam sucks — it’s like a perpetual cyberspace battle: spammers attack, bloggers defend themselves, spammers defeat the defenses and attack some more.. ad nauseam. As to the secret purpose of blocking empty user agents, I will never tell!

@a name: Beyond the pain of perpetual updates, I see no reason why such code would cause any issues — in fact, it seems like an excellent alternative to the htaccess method. Thanks for sharing :)

Jim#12

Hi

Thanks for your list, it’s been on my favourites for years. I’m trying to use the above script to kill spam on our contact forms, however, not being the htaccess guru you are, I’m having trouble redoing the urls to the form handlers in subdirectories….any tips?

(beausoleilm) Mathieu Beausoleil#13

What about proxies? I know that some proxy servers strip the referrer header. Do you know whether this solution will block those visitors? Would it be better to store the referrer address in the session, or to use another approach, such as an empty text input (display: none), and verify that the input is still empty before using the data?

Perishable#14

Yes, that would be one way to do it. If you are allowing visitors to comment via proxy, you may want to test the method before implementation. It is really a double-edged sword, dealing with no-referrer requests: it is nearly impossible to avoid false positives and false negatives. Frankly, I have been contemplating removing the method described in this article. If so, it will be done as a test and I will report the results in a future post.
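
For visitors whose proxy strips the referrer, one possible compromise is to let empty referrers through and block only submissions that carry a referrer from some other site. This is a rough sketch under that assumption, with example.com as a placeholder domain. It trades false positives for false negatives, since bots that send no referrer at all would also pass.

```apache
# Sketch: block comment POSTs only when a foreign referrer is present
# (example.com is a placeholder; replace it with your own domain)
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{REQUEST_METHOD} POST
    RewriteCond %{REQUEST_URI} /wp-comments-post\.php
    # an empty Referer header (e.g. stripped by a proxy) is allowed through
    RewriteCond %{HTTP_REFERER} !^$
    # a non-empty referrer must point at your own domain
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com [NC]
    RewriteRule .* - [F]
</IfModule>
```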

Web Designer#15

I was using a manual posting technique, but I am not able to post a comment on any site.

Maybe my URL “gigaturn” has been put on a block list by wp-comments-post.

Any solution for this problem would be appreciated.

Thanks in advance!
jitu78@gmail.com

Jeff Starr#16

Hi Web Designer, I have never heard of the wp-comments-post.php file having any inherent blacklisting capabilities, but I have not investigated the file in newer versions of WordPress, so it may be the case. Another thing you could check is whether or not your “gigaturn” URL has been flagged as spam within the Akismet database. If you go to the site or Google for something like “remove site from Akismet” (or similar), you should find all the information needed to investigate and possibly remedy the issue. Good luck!

Web Designer#17

Thanks Jeff,
But you can try it yourself.
Just try it with gigaturn.com
and you will not be able to post.

Still looking for the right solution.

Web Designer#18

Tried again with site URL and got this,

http://perishablepress.com/press/wp-comments-post.php

I am not able to post a comment with this URL (gigaturn.com); however, I can post with other URLs.

Jeff Starr#19

Perhaps I am confused as to what you are trying to do. Are you trying to post comments at gigaturn.com? Or are you trying to post comments on other sites using gigaturn.com as the commentator link? Or something else..? I guess I need more information as to what’s going on and where..

balisugar#20

Hi, sorry to bother you, I need help.

I think I have a few pages with strange url, that i can see from my wassUp stats. That xxx is a porn site. And Google crawls it all the time. I never link to them in the first place. Please help. How to remove and block it because it’s not only one page.
eg :
/page/92/?ref=www.xxx.com-www.xxx.com-www.xxx.com-www.xxx.com

I’m very sad, I don’t know much about this :cry:

Jeff Starr#21

@balisugar: don’t cry! You should be able to deny requests for that specific query string by adding the following directives to your root htaccess file:

# BLOCK XXX.COM QUERY STRINGS
<IfModule mod_rewrite.c>
   RewriteEngine On
   RewriteCond %{QUERY_STRING} xxx\.com [NC]
   RewriteRule .* - [F,L]
</IfModule>

Once in place, this code should block any query-string requests containing the character string “xxx.com”.

For more information on this technique, check out my article, Improving Site Security by Preventing Malicious Query-String Exploits.

balisugar#22

Thank you for your help. I will link to you so I don’t forget your site.

Jeff Starr#23

@balisugar: My pleasure — happy to help! :)

balisugar#24

Hi, Mr Jeff. How are you? :smile:

After what happened to me, I’m still sometimes worried that someone is redirecting bad content to my site. Is that possible and if so, how can I stop them? And which is the better way to block bad bots - .htaccess or robots.txt?

I feel more “comfortable” modifying robots.txt rather than .htaccess.
Thank you for all your help.

Jeff Starr#25

@balisugar: htaccess and robots.txt perform two different functions. htaccess is responsible for per-directory configuration of various Apache server directives (such as rewriting, redirecting, and many others), while robots.txt directives merely instruct robots (such as Googlebot and Slurp) on how to go about crawling your site (which URLs to ignore, sitemap location, et al). Unfortunately, robots.txt is purely advisory: well-behaved crawlers honor it, but bad bots are free to ignore it, whereas no client can opt out of the rules the server enforces via htaccess files.

So, to answer your question, if you need to block a specific URL, referrer, or request, it is best to handle it via htaccess — robots.txt is the wrong tool for the job.
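
As a rough illustration of the htaccess side, a bad bot can be refused by its user-agent string. The bot names below (BadBot, EvilScraper) are made up for the example; substitute the strings that actually appear in your own logs.

```apache
# Sketch: deny requests from named bad bots
# (BadBot and EvilScraper are hypothetical user-agent strings)
<IfModule mod_rewrite.c>
    RewriteEngine On
    # case-insensitive match anywhere in the User-Agent header
    RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper) [NC]
    # no substitution; respond with 403 Forbidden
    RewriteRule .* - [F]
</IfModule>
```

Unlike a robots.txt entry, this rule is applied by the server whether or not the bot cooperates, although a bot can still evade it by faking its user agent.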

Json#26

I am new to .htaccess and have to ask…
Q1: Can I use this for any page that is posting data?
Q2: If Q1 is YES, my page is one folder deep, ie:comments/comment-page.php
Do I do this:
RewriteCond %{REQUEST_URI} .comments/page.php\.php*
Or this:
RewriteCond %{REQUEST_URI} .comments\page.php\.php*
Or this:
RewriteCond %{REQUEST_URI} .http://www/example.com/comments/page.php\.php*
Or this:
RewriteCond %{REQUEST_URI} ./var/htdocs/web/comments/page.php\.php*

Any help would be great.
Cheers

Json#27

@Web Designer: I got the same thing twice. I can only say IE7 is a serious problem, IE8 needs to be refreshed every time, MS is terrible… these aren’t even web browsers.

Cheers

Jeff Starr#28

@Json: Yes you can use the REQUEST_URI variable to match just about anything in the request string, including individual files. The targeted string is a regular expression and will match any instance of itself within the requested URI. Something like this should work great:

RewriteCond %{REQUEST_URI} page-you-want-blocked\.php

To verify that it works, try loading the page directly in a browser. For more information on blocking with the REQUEST_URI variable, check out the fifth method in this article.
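
For a form handler one folder deep, such as the comments/comment-page.php path mentioned above, a fuller sketch might look like the following. It assumes the rules live in the root .htaccess file and uses example.com as a placeholder domain. REQUEST_URI holds the URL path beginning with a slash, so neither the http:// prefix nor the server's filesystem path belongs in the pattern.

```apache
# Sketch: protect a form handler in a subdirectory
# (example.com is a placeholder; the path comes from the question above)
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{REQUEST_METHOD} POST
    # URL path only: forward slashes, dots escaped, anchored at both ends
    RewriteCond %{REQUEST_URI} ^/comments/comment-page\.php$
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com [NC]
    RewriteRule .* - [F]
</IfModule>
```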

Peekay#29

My problem is similar to the one highlighted by Rick Beckman. Our spammer is submitting a GET request for the contact form on my site (with no referer) followed a few seconds later by a POST request. The POST request correctly identifies my website as the referer.

The only solution I can think of is to also block GET requests for the form if the referrer is blank. However, I’m a bit new to this, so I wondered whether that was a bad idea?

I figure it would block anyone who has bookmarked the contact form page directly, but I could redirect them back to the homepage where they can follow the page link as intended.

TopGearStreaming#30

You’re not the only one with stuff like this :)

Jeff Starr#31

@Peekay: that would certainly work, unless you think there are quite a few folks bookmarking your site’s contact page.. if so, I think a better option would be to implement some sort of simple captcha system to stop the automated junk.
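
The redirect-to-homepage idea could be sketched roughly as follows. The contact/ path is a hypothetical location for the form, and the rules are assumed to sit in the root .htaccess file, where the RewriteRule pattern is matched without a leading slash.

```apache
# Sketch: send referrer-less GET requests for the contact form to the homepage
# (contact/ is a hypothetical path; adjust it to your own form's URL)
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{REQUEST_METHOD} GET
    # empty Referer header: typed-in URL, bookmark, or bot
    RewriteCond %{HTTP_REFERER} ^$
    # temporary redirect, so browsers do not cache it permanently
    RewriteRule ^contact/?$ / [R=302,L]
</IfModule>
```

A temporary (302) redirect is used here so that a legitimate visitor who is bounced once can still reach the form later by following the link from the homepage.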

The Traveler Guy#32

Great tips, I am not a big programmer, but this kind of thing I think I understand and will implement it as soon as our new posting system will be functioning.

In the past we got a huge swarm of spam and had a hard time dealing with it!

Trackbacks / Pingbacks
  1. Silent Lucidity » links for 2007-02-13
  2. Put Together Quickly » Optimizing performance for WordPress
  3. Recent Links: January 26 to January 29 at Alex Jones