Block Spam by Denying Access to No-Referrer Requests

Post #243 categorized as Function, Websites, WordPress, last updated on Apr 6, 2008
Tagged with apache, htaccess, mod_rewrite, security, spam, WordPress

Credit for this trick goes to shoemoney.com. What we have here is an excellent method for preventing a great deal of blog spam. With a few strategic lines placed in your htaccess file, you can stop spambots from dropping spam bombs by denying access to any comment submission that does not originate from your domain.

How does it work? When a legitimate visitor (i.e., a human, not a robot) decides to leave a comment on your blog, they have (hopefully) read the article in question. In doing so they have also loaded your blog’s comment template (e.g., comments.php), which is almost certainly served from the same domain as the article (i.e., your domain).

After filling out the comment form, the user clicks the "submit" button. This sends the form data to the PHP script that actually processes the comment and publishes it for the world to see. For WordPress users, the comment-processing script is wp-comments-post.php.

Therefore, the HTTP referrer for legitimate (user-initiated) comments will normally be a page on your domain (the domain from which the comment form was loaded). Automated spam robots, by contrast, typically target the comment-processing script directly and bypass your comment form altogether. Such requests arrive either with no referrer at all or with a referrer from some other domain.

Thus, by blocking all POST requests for the comment-processing script (wp-comments-post.php) that are not referred from your own domain, you immediately eliminate a large portion of blog spam.

Sound good? Here are the directives to add to your site’s .htaccess file:

# block comment spam by denying access to no-referrer requests
RewriteEngine On
# match POST requests for the comment-processing script...
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} wp-comments-post\.php
# ...whose referrer is missing or not your domain, OR whose user agent is empty
RewriteCond %{HTTP_REFERER} !perishablepress\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^$
# redirect matching requests to another site
RewriteRule .* http://the-site-where-you-want-to-send-spammers.com/ [R=301,L]

Please note that you need to edit the following lines according to your specific setup:

wp-comments-post\.php
This is the default comment-processing script for WordPress users. If you are not running WordPress, determine the corresponding file and enter its name here, escaping the dot with a backslash.
!perishablepress\.com
Change this value to your own domain, again with the dot escaped. An empty referrer does not contain your domain, so this negated condition also catches requests that send no referrer at all.
http://the-site-where-you-want-to-send-spammers.com/
Spambots typically ignore redirects, so this may not accomplish much, but go ahead and enter the URL of your least-favorite website anyway. Enter a plain URL here. The substitution is not a regular expression, so ^ and $ would not act as anchors and would end up in the redirect target. Another option is to simply bounce the spambot back to where it came from by replacing the last line with this: RewriteRule .* http://%{REMOTE_ADDR}/ [R=301,L]
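Spambots typically ignore redirects anyway. A variant worth considering is to refuse matching requests outright with a 403 Forbidden response. This is not part of the original technique and is shown here only as a sketch.

The variant also anchors the referrer pattern to the start of the URL. A referrer that merely mentions your domain (for example, inside a query string) is then no longer accepted. example.com is a placeholder for your own domain. The pattern assumes the site is served from example.com or www.example.com over HTTP or HTTPS.

```apache
# Variant: deny matching comment submissions with 403 instead of redirecting
RewriteEngine On
# POST requests for the WordPress comment-processing script...
RewriteCond %{REQUEST_METHOD} =POST
RewriteCond %{REQUEST_URI} /wp-comments-post\.php$
# ...whose referrer does not start with your site's URL (empty referrers included),
# OR whose user agent is empty
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^$
# "-" leaves the URL unchanged; F sends 403 Forbidden and stops rewriting
RewriteRule .* - [F]
```

Any client can still forge a matching Referer header. This narrows the gap rather than closing it.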

And that is all there is to it! Bye bye spambots!


31 Responses

#1 Michael

WordPress Trackback Spam!!!
I have installed plugins that prevent comment spam, but they don’t block trackback spam. I’ve been spammed via trackbacks by many MFA websites, most probably from the same network, but they are not linking to me on their websites. How do they do it, and how do I stop it without disabling trackbacks?
Thanks, and I'm using WordPress.

#2 Perishable

Hmmm… good question. I will look into it..

#3 Lee

Shouldn’t the last line be changed to:

RewriteRule ^(.*)$ http://the-site-where-you-want-to-send-spammers.com/ [R=301,L]

I am using it as you have it and am getting 404 errors like this:

http://shamar.org/%sitegoto.com/$
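Lee’s diagnosis fits how mod_rewrite treats a RewriteRule substitution. The substitution is not a regular expression, so ^ and $ are not anchors there. A ^, and a $ that is not followed by a digit (as it would be in a $1 backreference), are copied into the target literally. A target that does not begin with a scheme such as http:// is then handled as a path on your own server, which is consistent with the 404 above.

Written without them, the redirect to a fixed URL and the bounce-back variant would look like this (the target URL is the article’s placeholder):

```apache
# Redirect matching requests to a fixed external URL (no ^ or $ in the target)
RewriteRule .* http://the-site-where-you-want-to-send-spammers.com/ [R=301,L]

# Or, instead, bounce them back to the client's own IP address
RewriteRule .* http://%{REMOTE_ADDR}/ [R=301,L]
```

Use one or the other as the last line of the rule set, not both. With the L flag, only the first one would ever fire.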

#4 Perishable

Lee,
If that works for you, great. Often, there are multiple ways of writing htaccess expressions. For example, here is the last line of the same htaccess code currently presented on the WordPress Codex:

RewriteRule (.*) ^http://%{REMOTE_ADDR}/$ [R=301,L]

Further, here is the corresponding line we are currently using at Perishable Press:

RewriteRule ^(.*)$ ^http://www.google.com/$ [R=301,L]

..which has been working fine for quite a while.

Also, an absence of errors doesn’t necessarily translate into proper functionality. You should throw down with some tuf log action:

RewriteEngine On
RewriteLog /absolute/path/to/your/wwwroot/public_html/rewrite_log.txt
RewriteLogLevel 2

..to ensure that your syntax actually produces the desired results (i.e., blocking spambots, etc.). Either way, thanks for the information concerning your specific issue — it may prove beneficial to others experiencing the same type of error.
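One caveat for anyone trying this today: RewriteLog and RewriteLogLevel are valid only in the main server or virtual-host configuration, not in .htaccess. Both were also removed in Apache 2.4. On 2.4 and later, rewrite tracing goes through the per-module LogLevel setting instead, and the output is written to the regular error log. A minimal sketch for the server or virtual-host configuration:

```apache
# Apache 2.4+: keep the general log level at "warn" but trace mod_rewrite.
# Trace levels run from trace1 (least verbose) to trace8 (most verbose).
LogLevel warn rewrite:trace3
```

The rewrite entries are tagged with the module name (e.g., [rewrite:trace3]), so searching the error log for "[rewrite:" isolates them. Trace logging is verbose, so lower the level again once the rules behave as intended.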

Cheers!

#5 danielle

oh nothing just wanted to feel special!!!!!!!!!!!

#6 Perishable

Your specialness is obvious, danielle ;)

#7 Jenny

I’ve thought of using this method before, but I was too lazy to write the proper code. Thank you, Perishable… and of course not forgetting Shoemoney :)

#8 Perishable

My pleasure, Jenny — thank you for the feedback :)

#9 Rick Beckman

I’m using this code too, but looking up the IPs of spammers caught by Akismet and cross-referencing those same IPs with my Apache logs, I’m seeing that the spammers are actually loading the posts and submitting via the actual form.

And by doing so, they’ve circumvented the protection you share above, as well as the one I implemented (renaming /wp-comments-post.php to something custom, editing my theme’s /comments.php file appropriately).

Spam sucks.

Oh, just curious as to why users with empty user-agents are blocked from commenting in the above rewrite?

#10 a name

I put the above code in my .htaccess and got a 500 error. After a few tries and changes, I decided to add this to my wp-comments-post.php instead. Is there any reason I shouldn’t have this (other than having to add it every time I upgrade WP)?

if (!isset($_SERVER['HTTP_REFERER']) || strpos($_SERVER['HTTP_REFERER'], 'yourdomain.com') === false) exit;

Thanks.

#11 Perishable

@Rick: Yes indeed, spam sucks — it’s like a perpetual cyberspace battle: spammers attack, bloggers defend themselves, spammers defeat the defenses and attack some more.. ad nauseam. As to the secret purpose of blocking empty user agents, I will never tell!

@a name: Beyond the pain of perpetual updates, I see no reason why such code would cause any issues — in fact, it seems like an excellent alternative to the htaccess method. Thanks for sharing :)
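As for the 500 error mentioned in the question above, one common cause is that mod_rewrite is not loaded. In that case Apache rejects RewriteEngine as an unknown directive and the whole request fails. Wrapping the rules in an <IfModule mod_rewrite.c> container makes Apache skip them instead.

This is a sketch, not a guaranteed fix. A 500 can also mean the server’s AllowOverride setting does not permit rewrite directives in .htaccess, and the wrapper does not change that. example.com is a placeholder for your domain.

```apache
# Skip these rules entirely (instead of failing with a 500) if mod_rewrite is absent
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{REQUEST_METHOD} POST
  RewriteCond %{REQUEST_URI} wp-comments-post\.php
  # missing or foreign referrer, OR empty user agent
  RewriteCond %{HTTP_REFERER} !example\.com [NC,OR]
  RewriteCond %{HTTP_USER_AGENT} ^$
  RewriteRule .* http://the-site-where-you-want-to-send-spammers.com/ [R=301,L]
</IfModule>
```

If the 500 disappears but the rules have no effect, mod_rewrite was most likely missing and needs to be enabled by your host.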

#12 Jim

Hi

Thanks for your list; it’s been in my favourites for years. I’m trying to use the above script to kill spam on our contact forms, but not being the htaccess guru you are, I’m having trouble adapting the URLs for form handlers in subdirectories. Any tips?

#13 (beausoleilm) Mathieu Beausoleil

What about proxies? I know that some proxy servers strip the referrer header. Do you know whether this solution will block those visitors? Would it be better to store the referrer address in the session, or to use another approach, such as an empty text input hidden with display: none, and verify that the input is still empty before using the data?

#14 Perishable

Yes, that would be one way to do it. If you are allowing visitors to comment via proxy, you may want to test the method before implementation. It is really a double-edged sword, dealing with no-referrer requests: it is nearly impossible to avoid false positives and false negatives. Frankly, I have been contemplating removing the method described in this article. If so, it will be done as a test and I will report the results in a future post.

#15 Web Designer

I was using a manual posting technique, but I am not able to post a comment on any site.

Maybe my URL “gigaturn” has been put on a block list by wp-comments-post.

Any solution for this problem would be appreciated.

Thanks in advance!
jitu78@gmail.com

#16 Jeff Starr

Hi Web Designer, I have never heard of the wp-comments-post.php file having any inherent blacklisting capabilities, but I have not investigated the file in newer versions of WordPress, so it may be the case. Another thing you could check is whether or not your “gigaturn” URL has been flagged as spam within the Akismet database. If you go to the site or Google for something like “remove site from Akismet” (or similar), you should find all the information needed to investigate and possibly remedy the issue. Good luck!

#17 Web Designer

Thanks, Jeff, but you can try it yourself: just try it with gigaturn.com and you will not be able to post.

Still looking for the right solution.

#18 Web Designer

I tried again with the site URL and got this:

http://perishablepress.com/press/wp-comments-post.php

I’m not able to post a comment with this URL (gigaturn.com); however, I can post with other URLs.

#19 Jeff Starr

Perhaps I am confused as to what you are trying to do. Are you trying to post comments at gigaturn.com? Or are you trying to post comments on other sites using gigaturn.com as the commentator link? Or something else..? I guess I need more information as to what’s going on and where..

#20 balisugar

Hi, sorry to bother you, but I need help.

I think I have a few pages with strange URLs, which I can see in my wassUp stats. That xxx is a porn site, and Google crawls it all the time. I never linked to them in the first place. Please help: how do I remove and block this? It’s not only one page.
e.g.:
/page/92/?ref=www.xxx.com-www.xxx.com-www.xxx.com-www.xxx.com

I’m very sad; I don’t know much about this :cry:

#21 Jeff Starr

@balisugar: don’t cry! You should be able to deny requests for that specific query string by adding the following directives to your root htaccess file:

# BLOCK XXX.COM QUERY STRINGS
<IfModule mod_rewrite.c>
   RewriteEngine On
   RewriteCond %{QUERY_STRING} xxx\.com [NC]
   RewriteRule .* - [F,L]
</IfModule>

Once in place, this code should block any query-string requests containing the character string “xxx.com”.

For more information on this technique, check out my article, Improving Site Security by Preventing Malicious Query-String Exploits.

#22 balisugar

Thank you for your help. I will link to you so I don’t forget your site.

#23 Jeff Starr

@balisugar: My pleasure — happy to help! :)

#24 balisugar

Hi, Mr Jeff. How are you? :smile:

After what happened to me, I’m still sometimes worried that someone is redirecting bad content to my site. Is that possible, and if so, how can I stop it? And which is the better way to block bad bots: .htaccess or robots.txt?

I feel more “comfortable” modifying robots.txt than .htaccess.
Thank you for all your help.

#25 Jeff Starr

@balisugar: htaccess and robots.txt perform two different functions. htaccess is responsible for per-directory configuration of various Apache server directives (such as rewriting, redirecting, and many others). robots.txt directives merely instruct robots (such as Googlebot and Slurp) on how to go about crawling your site (which URLs to ignore, sitemap location, et al). Unfortunately, robots.txt is purely advisory: well-behaved crawlers follow it, but bad bots are free to ignore it. Rules in htaccess files, by contrast, are enforced by the server itself, so bots have no choice but to follow them.

So, to answer your question, if you need to block a specific URL, referrer, or request, it is best to handle it via htaccess — robots.txt is the wrong tool for the job.
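Here is a minimal sketch of handling this in htaccess rather than robots.txt. It refuses requests referred from one unwanted site and requests sent by one unwanted user agent. Both bad-site\.example and BadBot are hypothetical placeholders; substitute the referrer and user-agent strings that actually appear in your logs.

```apache
RewriteEngine On
# Refuse requests referred from an unwanted site (hypothetical name)...
RewriteCond %{HTTP_REFERER} bad-site\.example [NC,OR]
# ...or sent by an unwanted user agent (hypothetical name)
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
# "-" leaves the URL unchanged; F responds with 403 Forbidden
RewriteRule .* - [F]
```

A bot may choose to ignore a Disallow line in robots.txt. These rules, by contrast, are applied by the server to every matching request, although a bot can still evade them by changing its user-agent or referrer string.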

#26 Json

I am new to .htaccess and have to ask…
Q1: Can I use this for any page that is posting data?
Q2: If Q1 is YES, my page is one folder deep, i.e., comments/comment-page.php
Do I do this:
RewriteCond %{REQUEST_URI} .comments/page.php\.php*
Or this:
RewriteCond %{REQUEST_URI} .comments\page.php\.php*
Or this:
RewriteCond %{REQUEST_URI} .http://www/example.com/comments/page.php\.php*
Or this:
RewriteCond %{REQUEST_URI} ./var/htdocs/web/comments/page.php\.php*

Any help would be great.
Cheers

#27 Json

@Web Designer: I got the same thing twice. I can only say IE7 is a serious problem, IE8 needs to be refreshed every time, MS is terrible… these aren’t even web browsers.

Cheers

#28 Jeff Starr

@Json: Yes you can use the REQUEST_URI variable to match just about anything in the request string, including individual files. The targeted string is a regular expression and will match any instance of itself within the requested URI. Something like this should work great:

RewriteCond %{REQUEST_URI} page-you-want-blocked\.php

To verify that it works, try loading the page directly in a browser. For more information on blocking with the REQUEST_URI variable, check out the fifth method in this article.
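Here is the comments/comment-page.php example from Q2 as a complete rule set. It is a sketch resting on three assumptions:

- the file lives at /comments/comment-page.php relative to the document root;
- the .htaccess file sits in the document root;
- example.com stands in for your own domain.

REQUEST_URI holds the URL path as requested, starting with a slash. It is not a full URL or a filesystem path, and forward slashes in the pattern need no escaping.

```apache
RewriteEngine On
# POST requests for /comments/comment-page.php (hypothetical path)...
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} ^/comments/comment-page\.php$
# ...that were not referred from your own site (empty referrers included)
RewriteCond %{HTTP_REFERER} !example\.com [NC]
# "-" leaves the URL unchanged; F responds with 403 Forbidden
RewriteRule .* - [F]
```

Because of the POST condition, simply loading the page in a browser will not trigger this rule. A form posted to it from another site, or with the referrer suppressed, should receive a 403.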
