Block Spam by Denying Access to No-Referrer Requests
by Jeff Starr on Monday, November 20, 2006 – 35 Responses
What we have here is an excellent method for preventing a great deal of blog spam. With a few strategic lines placed in your htaccess file, you can prevent spambots from dropping spam bombs by denying access to comment submissions that do not originate from your domain.
How does it work? Well, when a legitimate user (i.e., not a robot, etc.) decides to leave a comment on your blog, they have (hopefully) read the article for which they wish to leave a comment, and have subsequently loaded your blog’s comment template (e.g., comments.php), which is most likely located within the same domain as the article, blog, etc. (i.e., your domain).
So, after filling out the comment form via comments.php, the user clicks the "submit" button, which sends the form data to the PHP script that actually processes the comment and publishes it for the world to see. For WordPress users, the comment-processing file is wp-comments-post.php.
Therefore, the HTTP referrer for legitimate (user-initiated) comments will normally be your domain (or the domain in which the comments.php file is located). Automated spam robots typically target the comment-processing script directly, bypassing your comments.php form altogether. Such requests arrive either with no HTTP referrer at all or with a referrer that is not from your domain.
Thus, by blocking all requests for the comment-processing script (wp-comments-post.php) that do not come from a page on your domain (i.e., a page displaying your comments.php form), you immediately eliminate a large portion of blog spam.
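The decision described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the htaccess method itself: MY_DOMAIN, COMMENT_SCRIPT, and the helper names are hypothetical placeholders. One deliberate difference from a simple pattern match: it compares the referrer's host name instead of searching for the domain as a substring, so a referrer such as example.com.spam.test is not mistaken for your own site. Like any referrer check, it can still be fooled by a bot that forges the header.

```python
from urllib.parse import urlsplit

# Hypothetical placeholders: substitute your own domain and comment script.
MY_DOMAIN = "example.com"
COMMENT_SCRIPT = "/wp-comments-post.php"


def referrer_is_local(referrer: str, domain: str = MY_DOMAIN) -> bool:
    """True if the referrer's host is `domain` or one of its subdomains."""
    if not referrer:
        return False  # no referrer at all, typical of direct bot submissions
    host = urlsplit(referrer).hostname or ""  # hostname is already lowercased
    return host == domain or host.endswith("." + domain)


def should_block(method: str, path: str, referrer: str) -> bool:
    """Block POSTs to the comment script that did not come from our own pages."""
    return (
        method.upper() == "POST"
        and path.endswith(COMMENT_SCRIPT)
        and not referrer_is_local(referrer)
    )


if __name__ == "__main__":
    # A visitor who filled in the form on one of our article pages: allowed.
    print(should_block("POST", "/wp-comments-post.php", "http://example.com/some-article/"))  # False
    # A bot posting straight to the script with no referrer: blocked.
    print(should_block("POST", "/wp-comments-post.php", ""))  # True
    # A bot whose referrer merely contains our domain name: blocked.
    print(should_block("POST", "/wp-comments-post.php", "http://example.com.spam.test/"))  # True
```

GET requests and requests for other pages are never blocked, matching the idea that only comment submissions need a local referrer.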
Sound good? Here is the script to add to your site’s .htaccess file:
# block comment spam by denying access to no-referrer requests
RewriteEngine On
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} wp-comments-post\.php
RewriteCond %{HTTP_REFERER} !perishablepress\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^(.*)$ http://the-site-where-you-want-to-send-spammers.com/ [R=301,L]
Please note that you need to edit the following lines according to your specific setup:
wp-comments-post\.php: This is the default comment-processing script for WordPress users. If you are not running WordPress, you will need to determine the corresponding file and enter its name here (escape any literal dots with a backslash).
!perishablepress\.com: Change this value to that of your own domain. The leading ! negates the match, so the condition is true when the referrer is empty or does not contain your domain. Because of the [OR] flag, a POST to the comment script is redirected if either this condition or the following empty-user-agent condition is true.
http://the-site-where-you-want-to-send-spammers.com/: Because spambots typically ignore redirects, this may not accomplish much, but go ahead and enter the URL of your least-favorite website anyway. Enter it as a plain URL beginning with http://; wrapping it in regex anchors such as ^ and $ can make Apache treat it as a path on your own site, which produces 404 errors. Another option here is to simply bounce the spambot back to where it came from by replacing the last line with this:
RewriteRule ^(.*)$ http://%{REMOTE_ADDR}/ [R=301,L]
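To see how Apache combines these conditions, here is a small Python model of the way mod_rewrite walks a list of RewriteCond lines: conditions are ANDed unless a line carries [OR], which joins it to the next one. It is a sketch based on my reading of mod_rewrite's behavior, not Apache's code, and the Cond class, DOMAIN_RE, and comment_post helper are hypothetical. Applied to the rule above, a POST to wp-comments-post.php fires the rule when its referrer lacks the domain or its user agent is empty.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

Request = Dict[str, str]


@dataclass
class Cond:
    test: Callable[[Request], bool]  # stands in for one RewriteCond pattern match
    ornext: bool = False             # True when the line carries the [OR] flag


def conditions_pass(conds: List[Cond], req: Request) -> bool:
    """Walk a condition list the way mod_rewrite does (a model, not Apache's code)."""
    i = 0
    while i < len(conds):
        ok = conds[i].test(req)
        if conds[i].ornext:
            if not ok:
                i += 1  # failed, but the next condition in the OR group may pass
                continue
            while i < len(conds) and conds[i].ornext:
                i += 1  # passed: skip the remaining members of the OR group
        elif not ok:
            return False  # a plain (ANDed) condition failed: rule does not fire
        i += 1
    return True


# Hypothetical stand-in for the referrer pattern; use your own domain.
DOMAIN_RE = re.compile(r"perishablepress\.com")

# The four conditions of the comment-spam rule, in order.
article_rule = [
    Cond(lambda r: r["REQUEST_METHOD"] == "POST"),
    Cond(lambda r: re.search(r"wp-comments-post\.php", r["REQUEST_URI"]) is not None),
    Cond(lambda r: DOMAIN_RE.search(r.get("HTTP_REFERER", "")) is None, ornext=True),
    Cond(lambda r: r.get("HTTP_USER_AGENT", "") == ""),
]

# A lone condition carrying [OR], as in a rule whose final condition has the flag.
trailing_or = [Cond(lambda r: "xxx.com" in r.get("QUERY_STRING", ""), ornext=True)]


def comment_post(referrer: str, user_agent: str = "Mozilla/5.0") -> Request:
    """Build a fake POST to the comment script (hypothetical helper)."""
    return {
        "REQUEST_METHOD": "POST",
        "REQUEST_URI": "/wp-comments-post.php",
        "HTTP_REFERER": referrer,
        "HTTP_USER_AGENT": user_agent,
    }


if __name__ == "__main__":
    print(conditions_pass(article_rule, comment_post("http://perishablepress.com/a-post/")))  # False
    print(conditions_pass(article_rule, comment_post("")))  # True
    print(conditions_pass(article_rule, comment_post("http://perishablepress.com/a-post/", "")))  # True
    print(conditions_pass(trailing_or, {"QUERY_STRING": "p=1"}))  # True
```

The last example shows a side effect of the same logic: in this model, an [OR] on the final condition has no partner to fall back on, so the rule fires even when that condition fails. That is why [OR] belongs only between conditions, never on the last one.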
And that is all there is to it! Bye bye spambots!
Focused on clean code and quality content, Perishable Press is the online home of Jeff Starr, author, artist, designer, developer, and all-around swell guy. 





35 Responses
Michael – #1
WordPress Trackback Spam!!!
I have installed plugins that prevent comment spam, but they don't block trackback spam. I've been spammed via trackbacks by many MFA websites that are most probably from the same network, but they are not actually linking to me on their websites. May I know how they do it and how I can stop it without disabling trackbacks?
Thanks, and I'm using WordPress.
Perishable – #2
Hmmm… good question. I will look into it..
Lee – #3
Shouldn’t the last line be changed to:
RewriteRule ^(.*)$ http://the-site-where-you-want-to-send-spammers.com/ [R=301,L]
I am using it as you have it and am getting 404 errors like this:
http://shamar.org/%sitegoto.com/$
Perishable – #4
Lee,
If that works for you, great. Often, there are multiple ways of writing htaccess expressions. For example, here is the last line of the same htaccess code currently presented on the WordPress Codex:
RewriteRule (.*) ^http://%{REMOTE_ADDR}/$ [R=301,L]
Further, here is the corresponding line we are currently using at Perishable Press:
RewriteRule ^(.*)$ ^http://www.google.com/$ [R=301,L]
..which has been working fine for quite a while.
Also, an absence of errors doesn’t necessarily translate into proper functionality. You should throw down with some tuf log action:
RewriteEngine On
RewriteLog /absolute/path/to/your/wwwroot/public_html/rewrite_log.txt
RewriteLogLevel 2
..to ensure that your syntax actually produces the desired results (i.e., blocking spambots, etc.). Keep in mind that RewriteLog and RewriteLogLevel are only accepted in the main server or virtual-host configuration, not in htaccess files. Either way, thanks for the information concerning your specific issue; it may prove beneficial to others experiencing the same type of error.
Cheers!
danielle – #5
oh nothing just wanted to feel special!!!!!!!!!!!
Perishable – #6
Your specialness is obvious, danielle ;)
Jenny – #7
I’ve thought of using this method before but I was too lazy to form up a proper code. Thank you Perishable…of course not forgetting Shoemoney :)
Perishable – #8
My pleasure, Jenny — thank you for the feedback :)
Rick Beckman – #9
I’m using this code too, but looking up the IPs of spammers caught by Akismet and cross-referencing those same IPs with my Apache logs, I’m seeing that the spammers are actually loading the posts and submitting via the actual form.
And by doing so, they’ve circumvented the protection you share above, as well as the one I implemented (renaming /wp-comments-post.php to something custom, editing my theme’s /comments.php file appropriately). Spam sucks.
Oh, just curious as to why users with empty user-agents are blocked from commenting in the above rewrite?
a name – #10
I put the above code in my .htaccess and got a 500 error. After a few tries and changes, I decided to add this to my wp-comments-post.php instead. Is there any reason I shouldn’t use it (other than having to add it again every time I upgrade WP)?
if (!isset($_SERVER['HTTP_REFERER']) || strpos($_SERVER['HTTP_REFERER'], 'yourdomain.com') === false) exit;
Thanks.
Perishable – #11
@Rick: Yes indeed, spam sucks — it’s like a perpetual cyberspace battle: spammers attack, bloggers defend themselves, spammers defeat the defenses and attack some more.. ad nauseam. As to the secret purpose of blocking empty user agents, I will never tell!
@a name: Beyond the pain of perpetual updates, I see no reason why such code would cause any issues — in fact, it seems like an excellent alternative to the htaccess method. Thanks for sharing :)
Jim – #12
Hi
Thanks for your list; it’s been in my favourites for years. I’m trying to use the above script to kill spam on our contact forms; however, not being the htaccess guru you are, I’m having trouble adapting the URLs to the form handlers in subdirectories. Any tips?
(beausoleilm) Mathieu Beausoleil – #13
What about proxies? I know that some proxy servers strip the referrer header. Do you know whether this solution will block such visitors? Is it better to store the referrer address in the session, or to use another approach, such as an empty text input hidden with display: none, and verify that the input is still empty before using the data?
Perishable – #14
Yes, that would be one way to do it. If you are allowing visitors to comment via proxy, you may want to test the method before implementation. It is really a double-edged sword, dealing with no-referrer requests: it is nearly impossible to avoid false positives and false negatives. Frankly, I have been contemplating removing the method described in this article. If so, it will be done as a test and I will report the results in a future post.
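The hidden-field ("honeypot") idea from the previous comment can be sketched as a simple server-side check. This is a minimal illustration under stated assumptions: the field name website_url and the looks_like_bot helper are hypothetical, and the form is assumed to render the field hidden with display: none so that human visitors leave it empty while naive bots fill in every input.

```python
from typing import Mapping

# Hypothetical name for the hidden field. The form would include something like
# <input type="text" name="website_url" value="" style="display:none">,
# which human visitors never see and therefore leave empty.
HONEYPOT_FIELD = "website_url"


def looks_like_bot(form: Mapping[str, str]) -> bool:
    """Flag a submission whose hidden honeypot field was filled in.

    A bot that leaves the field out entirely is not caught by this check.
    """
    return form.get(HONEYPOT_FIELD, "").strip() != ""


if __name__ == "__main__":
    human = {"author": "Jim", "comment": "Nice post", HONEYPOT_FIELD: ""}
    bot = {"author": "pills", "comment": "buy now", HONEYPOT_FIELD: "http://spam.test/"}
    print(looks_like_bot(human))  # False
    print(looks_like_bot(bot))    # True
```

Unlike the referrer test, this check does not depend on what a proxy forwards, so it can complement the htaccess rule rather than replace it.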
Web Designer – #15
I was using a manual posting technique, but I am not able to post a comment on any site.
Maybe my URL “gigaturn” has been put on a block list by wp-comments-post.
Any solution for this problem would be appreciated.
Thanks in advance!
jitu78@gmail.com
Jeff Starr – #16
Hi Web Designer, I have never heard of the wp-comments-post.php file having any inherent blacklisting capabilities, but I have not investigated the file in newer versions of WordPress, so it may be the case. Another thing you could check is whether or not your “gigaturn” URL has been flagged as spam within the Akismet database. If you go to the site or Google for something like “remove site from Akismet” (or similar), you should find all the information needed to investigate and possibly remedy the issue. Good luck!
Web Designer – #17
Thanks Jeff,
But you can try it yourself.
Just try it with gigaturn.com.
You will not be able to post.
I'm still looking for the right solution.
Web Designer – #18
I tried again with the site URL and got this:
http://perishablepress.com/press/wp-comments-post.php
I am not able to post a comment with this URL (gigaturn.com), but I can post with other URLs.
Jeff Starr – #19
Perhaps I am confused as to what you are trying to do. Are you trying to post comments at gigaturn.com? Or are you trying to post comments on other sites using gigaturn.com as the commentator link? Or something else..? I guess I need more information as to what’s going on and where..
balisugar – #20
Hi, sorry to bother you, but I need help.
I think I have a few pages with strange URLs, which I can see in my wassUp stats. That xxx is a porn site, and Google crawls these URLs all the time. I never linked to them in the first place. Please help: how do I remove and block them? It’s not only one page.
e.g.:
/page/92/?ref=www.xxx.com-www.xxx.com-www.xxx.com-www.xxx.com
I’m very sad; I don’t know much about this :cry:
Jeff Starr – #21
@balisugar: don’t cry! You should be able to deny requests for that specific query string by adding the following directives to your root htaccess file:
# BLOCK XXX.COM QUERY STRINGS
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{QUERY_STRING} xxx\.com [NC]
RewriteRule .* - [F,L]
</IfModule>
Once in place, this code should block (with a 403 Forbidden response) any request whose query string contains the character string “xxx.com”.
For more information on this technique, check out my article, Improving Site Security by Preventing Malicious Query-String Exploits.
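To check which URLs a case-insensitive query-string pattern like xxx\.com would catch, the test can be mirrored in Python. This is a minimal sketch: status_for is a hypothetical helper, and, like %{QUERY_STRING}, it looks only at the raw text after the "?", so the path itself is never checked and a percent-encoded variant of the domain would slip through.

```python
import re
from urllib.parse import urlsplit

# Mirrors a case-insensitive condition such as: RewriteCond %{QUERY_STRING} xxx\.com [NC]
BAD_QUERY = re.compile(r"xxx\.com", re.IGNORECASE)


def status_for(url: str) -> int:
    """Return 403 if the raw query string matches the pattern, else 200 (hypothetical helper)."""
    query = urlsplit(url).query  # text after "?", still percent-encoded, like %{QUERY_STRING}
    return 403 if BAD_QUERY.search(query) else 200


if __name__ == "__main__":
    print(status_for("/page/92/?ref=www.xxx.com-www.xxx.com"))  # 403
    print(status_for("/page/92/?ref=WWW.XXX.COM"))              # 403
    print(status_for("/page/92/"))                              # 200
    print(status_for("/xxx.com/?p=1"))                          # 200: the path is not checked
```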
balisugar – #22
Thank you for your help. I will link to you so I don’t forget your site.
Jeff Starr – #23
@balisugar: My pleasure — happy to help! :)
balisugar – #24
Hi, Mr Jeff. How are you? :smile:
After what happened to me, I’m still sometimes worried that someone is redirecting bad content to my site. Is that possible, and if so, how can I stop them? And which is the better way to block bad bots: .htaccess or robots.txt?
I feel more “comfortable” modifying robots.txt than .htaccess.
Thank you for all your help.
Jeff Starr – #25
@balisugar: htaccess and robots.txt perform two different functions. htaccess is responsible for per-directory configuration of various Apache server directives (such as rewriting, redirecting, and many others), while robots.txt directives merely instruct robots (such as Googlebot and Slurp) on how to go about crawling your site (which URLs to ignore, sitemap location, et al). Unfortunately, many robots, especially malicious ones, simply ignore the directives specified in robots.txt files, whereas they have no choice but to follow the rules enforced via htaccess files.
So, to answer your question, if you need to block a specific URL, referrer, or request, it is best to handle it via htaccess — robots.txt is the wrong tool for the job.
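The advisory nature of robots.txt is easy to see from the robot's side: a well-behaved crawler parses the file and asks itself whether it may fetch a URL, and nothing on the server enforces that answer. This sketch uses Python's standard urllib.robotparser; the robots.txt content, the PoliteBot user agent, and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every robot to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler checks before fetching; an impolite bot can simply skip
# this step, because the server never consults robots.txt when answering.
print(parser.can_fetch("PoliteBot", "http://example.com/private/page.html"))  # False
print(parser.can_fetch("PoliteBot", "http://example.com/blog/"))              # True
```

An htaccess rule, by contrast, is applied by Apache to every matching request, whether or not the client cooperates.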
Json – #26
I am new to .htaccess and have to ask…
Q1: Can I use this for any page that is posting data?
Q2: If Q1 is YES, my page is one folder deep, i.e., comments/comment-page.php
Do I do this:
RewriteCond %{REQUEST_URI} .comments/page.php\.php*
Or this:
RewriteCond %{REQUEST_URI} .comments\page.php\.php*
Or this:
RewriteCond %{REQUEST_URI} .http://www/example.com/comments/page.php\.php*
Or this:
RewriteCond %{REQUEST_URI} ./var/htdocs/web/comments/page.php\.php*
Any help would be great.
Cheers
Json – #27
@Web Designer: I got the same thing twice. I can only say IE7 is a serious problem, IE8 needs to be refreshed every time, MS is terrible… these aren’t even web browsers.
Cheers
Jeff Starr – #28
@Json: Yes, you can use the REQUEST_URI variable to match just about anything in the request string, including individual files. The targeted string is a regular expression and will match any instance of itself within the requested URI. Something like this should work great:
RewriteCond %{REQUEST_URI} page-you-want-blocked\.php
To verify that it works, try loading the page directly in a browser. For more information on blocking with the REQUEST_URI variable, check out the fifth method in this article.
Peekay – #29
My problem is similar to the one highlighted by Rick Beckman. Our spammer is submitting a GET request for the contact form on my site (with no referer) followed a few seconds later by a POST request. The POST request correctly identifies my website as the referer.
The only solution I can think of is to also block GET requests for the form if the referrer is blank; however, I’m a bit new to this, so I wondered whether that was a bad idea.
I figure it would block anyone who has bookmarked the contact form page directly, but I could redirect them back to the homepage where they can follow the page link as intended.
TopGearStreaming – #30
You’re not the only one with stuff like this :)
Jeff Starr – #31
@Peekay: that would certainly work, unless you think there are quite a few folks bookmarking your site’s contact page.. if so, I think a better option would be to implement some sort of simple captcha system to stop the automated junk.
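A "simple captcha system" of the kind suggested above could be as small as an arithmetic question whose answer is checked against a signed token. This is a minimal sketch, not a recommendation of any particular plugin: SECRET_KEY, _sign, new_challenge, and check_answer are hypothetical names, and it uses only Python's standard hmac, hashlib, and secrets modules. The token (a random nonce plus an HMAC of the nonce and the correct answer) travels in a hidden form field, so the server needs no session storage.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Hypothetical server-side secret; a real site would load a fixed value from its
# configuration so that challenges issued before a restart still verify.
SECRET_KEY = secrets.token_bytes(32)


def _sign(nonce: str, answer: str) -> str:
    """HMAC-SHA256 over the nonce and the answer, as a hex string."""
    msg = f"{nonce}:{answer}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def new_challenge() -> Tuple[str, str]:
    """Return (question, token); the token goes into a hidden form field."""
    a, b = secrets.randbelow(10), secrets.randbelow(10)
    nonce = secrets.token_hex(8)
    return f"What is {a} + {b}?", f"{nonce}:{_sign(nonce, str(a + b))}"


def check_answer(user_answer: str, token: str) -> bool:
    """Recompute the signature for the submitted answer; compare in constant time."""
    nonce, _, signature = token.partition(":")
    expected = _sign(nonce, user_answer.strip())
    return hmac.compare_digest(expected.encode(), signature.encode())


if __name__ == "__main__":
    question, token = new_challenge()
    print(question)
    a, b = (int(w) for w in question.rstrip("?").split() if w.isdigit())
    print(check_answer(str(a + b), token))      # True
    print(check_answer(str(a + b + 1), token))  # False
```

The sketch does not expire tokens or remember which ones were used, so a bot could replay a solved token or retry the handful of possible answers; a real deployment would track used nonces and reject stale ones.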
The Traveler Guy – #32
Great tips! I am not a big programmer, but I think I understand this kind of thing, and I will implement it as soon as our new posting system is up and running.
In the past we got a huge swarm of spam and had a hard time dealing with it!
Trackbacks / Pingbacks