[Techtalk] Re: Blog spam

Daniel Richter drichter at essi.fr
Wed Oct 13 12:10:11 EST 2004


 > Recently, my blog has begun getting inundated with
 > comment spam.
<snip>
 > I know I _don't_ want to set up one of those "enter the
 > characters in the below image" checkpoints, as I know that
 > it is not compatible with everyone (e.g. the handicapped).

Good for you for not using them!

The W3C discusses this problem here:
    http://www.w3.org/TR/turingtest/

The W3C acknowledges that it doesn't have a perfect solution to this 
problem. However, one of the suggestions is a simple question with a 
free-form answer.

The W3C's discussion notes that "answers may need to be handled 
flexibly, if they require free-form text. A system would have to 
maintain a vast number of questions, or shift them around 
programmatically, in order to keep spiders from capturing them all."

(The W3C also notes that the technique might cause problems for mentally 
disabled people, but I think we can make the questions simple enough to 
avoid that problem.)

So here's my suggestion: on the form where the user posts a response, 
ask him a simple question. (Maintain a list of four or five questions 
that are chosen at random, and change them every month or so.) When he 
answers a question correctly, give him a "trusted user cookie" that's 
good for a month or so, so that frequent users don't have to keep 
answering questions.
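
To make the cookie part concrete, here is a rough Python sketch. It
is only one way to do it: the names SECRET_KEY, COOKIE_LIFETIME,
make_trusted_cookie and is_trusted are all made up for illustration,
and I'm assuming the cookie carries an expiry time signed with a
secret that only the server knows.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"         # hypothetical server-side secret
COOKIE_LIFETIME = 30 * 24 * 3600  # "a month or so", in seconds


def _sign(expires):
    """Return a hex signature over the expiry timestamp."""
    message = str(expires).encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_trusted_cookie(now=None):
    """Build the cookie value handed out after a correct answer."""
    now = int(time.time()) if now is None else int(now)
    expires = now + COOKIE_LIFETIME
    return "%d:%s" % (expires, _sign(expires))


def is_trusted(cookie, now=None):
    """Check that the cookie is unexpired and carries our signature."""
    now = int(time.time()) if now is None else int(now)
    try:
        expires_text, signature = cookie.split(":", 1)
        expires = int(expires_text)
    except (AttributeError, ValueError):
        return False  # missing or malformed cookie
    expected = _sign(expires)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # expiry was tampered with or cookie was forged
    return now < expires
```

The signature only stops someone from forging or extending a cookie;
it does not stop a person who answered once from reusing that cookie
until it expires.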

Be generous when determining whether an answer is right: if the user's 
response contains the right answer, it's right. (Place a reasonable 
limit on the length of the answer; otherwise a spammer could stuff many 
guesses into a single response and brute-force the check.) 
For example, if the question is "who is president of the United 
States?", the answers "George Bush", "George W. Bush" and just plain 
"Bush" are acceptable. Case insensitive, of course.
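
The matching rule above might look something like this in Python.
It's a sketch, not a finished implementation: is_correct and the
40-character MAX_ANSWER_LENGTH are names and values I picked for the
example.

```python
MAX_ANSWER_LENGTH = 40  # assumed limit; pick whatever suits your questions


def is_correct(response, accepted, max_length=MAX_ANSWER_LENGTH):
    """Return True if the response contains any of the accepted answers.

    The comparison is case-insensitive, and over-long responses are
    rejected so that one response cannot carry a long list of guesses.
    """
    text = response.strip().lower()
    if not text or len(text) > max_length:
        return False
    return any(answer.lower() in text for answer in accepted)


print(is_correct("George W. Bush", ["bush"]))  # True
print(is_correct("Kerry", ["bush"]))           # False
print(is_correct("bush " * 20, ["bush"]))      # False (too long)
```

Because this is a plain substring test, "ambush" would also pass for
"bush". That is the price of being generous, and I think it's
acceptable here.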

Some criteria for good questions:
1) Avoid mathematical questions, such as "what is five plus seven?" 
They're tempting because they can be easily generated by a computer, but 
they are also easy to solve with a computer.
2) Avoid multiple choice questions or questions that include the answer 
in the question. They can be defeated by brute force.
3) Remember your international audience. Questions like "who was the 
first president of the United States?" may not be easy for someone in 
Nigeria. Even worse: "who is president?" But of what country?
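
To show how little effort point 1 takes to defeat, here is a small
Python sketch that answers questions of the "what is five plus
seven?" kind. The word list, the three operations and the function
name solve are my own choices for the example; a real spam bot would
presumably handle more phrasings.

```python
import operator
import re

# Number words zero..twelve mapped to their values.
NUMBERS = {word: value for value, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve"
    .split())}
OPERATIONS = {"plus": operator.add,
              "minus": operator.sub,
              "times": operator.mul}
PATTERN = re.compile(r"what is (\w+) (plus|minus|times) (\w+)")


def solve(question):
    """Return the numeric answer, or None if the question doesn't fit."""
    match = PATTERN.search(question.lower())
    if match is None:
        return None
    left, op, right = match.groups()
    if left not in NUMBERS or right not in NUMBERS:
        return None
    return OPERATIONS[op](NUMBERS[left], NUMBERS[right])


print(solve("What is five plus seven?"))        # 12
print(solve("What is the capital of France?"))  # None
```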

The last criterion is particularly tricky, but I have some examples of 
questions that I think would be acceptable:
   "What is the opposite of 'fast'?"
   "What is the capital of France?"
   "How many feet does a dog have?"
   "What is the name of the third planet from the sun?"
      (Avoid asking the names of other planets: a Chinese person
      might not know the English name of the planet Mercury.)
   "How many days in a week?"
   "Mozilla, Internet Explorer and Netscape are used to view _____."
      (Accept any answer containing "web" or even "net".)

Finally, just to make sure no one gets stuck, you might give the user a 
choice of answering any one of three questions.
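
Here is a rough Python sketch of the "any one of three" idea, again
under my own assumptions: the QUESTIONS table, pick_questions and
any_correct are invented names, and the form is assumed to hand back
a mapping from each question shown to whatever the user typed.

```python
import random

# Hypothetical pool: question text -> accepted (lowercase) substrings.
QUESTIONS = {
    "What is the opposite of 'fast'?": ["slow"],
    "What is the capital of France?": ["paris"],
    "How many feet does a dog have?": ["4", "four"],
    "What is the name of the third planet from the sun?": ["earth"],
    "How many days in a week?": ["7", "seven"],
}


def pick_questions(count=3, rng=random):
    """Choose `count` distinct questions to show on the form."""
    return rng.sample(sorted(QUESTIONS), count)


def any_correct(responses, max_length=40):
    """Return True if at least one shown question was answered correctly.

    `responses` maps question text to the user's response, which may
    be blank for the questions the user chose to skip.
    """
    for question, response in responses.items():
        text = response.strip().lower()
        if not text or len(text) > max_length:
            continue
        accepted = QUESTIONS.get(question, [])
        if any(answer in text for answer in accepted):
            return True
    return False


shown = pick_questions()
print(len(shown))  # 3
print(any_correct({"What is the capital of France?": "Paris, I think",
                   "How many feet does a dog have?": ""}))  # True
```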

-- 
    They've signed me up for every advertising campaign
    and mailing list there is. These people are out of
    their minds. They're harassing me.
        - spam tycoon Alan Ralsky, who was signed up for tons of
          (paper) junk mail after publicly proclaiming that
          he had no regrets about his spam empire.

