Spam trends


Comment spam isn't getting any better, but it's certainly getting more frequent. Akismet is now catching more than a thousand LL spam comments (or what it identifies as spam comments) every day.

Some very small but non-zero percentage of this is not in fact spam. So I used to scan everything in Akismet's grease trap, in order to rescue the real stuff. In the past, I've salvaged worthwhile contributions from John Cowan, Language Hat, and others. However, the volume is now so great that I usually don't have time to do this. If your genuine contribution is trapped and flushed, I apologize in advance; let me know by email if you think this has happened.



9 Comments

  1. Chris said,

    December 30, 2010 @ 10:23 am

    Any way to crowd source this task?

  2. Ellen K. said,

    December 30, 2010 @ 10:41 am

    Thanks for the heads up. The spam filter is much appreciated. Without it, we'd be back to no comments at all, I'm sure.

  3. Eric S said,

    December 30, 2010 @ 1:22 pm

    Slashdot's comment moderation system code is open source:

    http://www.slashcode.com/

    Incorporating this into LL would likely take far more time than the authors have, but you never know.

  4. Arjan said,

    December 30, 2010 @ 4:47 pm

    I find Mollom works wonders.

    The average efficiency is 99.95%. This means that only 5 in 10,000 spam messages were not caught. Mollom has caught 351,724,549 spam messages since it started. Today we caught 213,870 spam messages. On average, 90% of all messages are spam.

    The only drawback is that the free version gives you up to 100 legitimate posts a day.

    [(myl) I don't think that Akismet is much worse — I've been seeing just one or two legitimate comments caught per week, out of 5,000 or more comments identified as spam. The miss rate is higher — maybe as high as 5%.]

  5. Dougal said,

    December 30, 2010 @ 7:19 pm

    What we do on our site is query Akismet before the page says "okay, your comment's posted." If it says the comment is non-spam, all is well; if not, we hit the poster with a CAPTCHA. If they solve it, the comment goes into the moderation list, though it also shows up on the site.

    Spam comments never hit our database, and Akismet false-positives get completely avoided, as far as I can tell. This is on a much smaller site than Language Log, though, and is also susceptible to Akismet downtime. Plus, as far as I know, this can't be easily hacked into WordPress.

    [(myl) For some irrational reason, I prefer spending my hacking time on other things, and do as little on this site as possible other than writing and discussing posts. The current situation with spam comments is quite tolerable, from my perspective. Non-spam comments, now :-)…. ]
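
The flow Dougal describes in comment 5 could be sketched roughly as follows. This is only an illustration under stated assumptions, not his site's code: `CommentStore`, `handle_comment`, `looks_like_spam`, and `captcha_solved` are hypothetical names, and the two callables stand in for the Akismet query and the CAPTCHA challenge rather than calling any real API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CommentStore:
    """In-memory stand-in for the site's database (hypothetical)."""
    published: List[str] = field(default_factory=list)
    moderation: List[str] = field(default_factory=list)


def handle_comment(
    text: str,
    store: CommentStore,
    looks_like_spam: Callable[[str], bool],  # stand-in for the Akismet query
    captcha_solved: Callable[[], bool],      # stand-in for the CAPTCHA challenge
) -> str:
    """Decide what happens to a comment before confirming it to the poster."""
    if not looks_like_spam(text):
        # The filter says non-spam: publish directly.
        store.published.append(text)
        return "published"
    if captcha_solved():
        # Flagged, but the poster solved the CAPTCHA: show the comment
        # on the site and also queue it for a moderator to review.
        store.published.append(text)
        store.moderation.append(text)
        return "published_pending_review"
    # Flagged and no CAPTCHA solution: nothing is stored at all.
    return "rejected"
```

For example, `handle_comment("hello", CommentStore(), lambda t: False, lambda: True)` takes the first branch. The sketch does not handle the case where the spam check itself is unavailable, which is the downtime weakness mentioned in the comment.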

  6. Twitter Trackbacks for Language Log » Spam trends [upenn.edu] on Topsy.com said,

    December 30, 2010 @ 9:44 pm

    […] Language Log » Spam trends languagelog.ldc.upenn.edu/nll/?p=2874 – view page – cached December 30, 2010 @ 9:16 am · Filed by Mark Liberman under Announcements […]

  7. Adrian Bailey said,

    December 31, 2010 @ 6:31 am

    "If your genuine contribution is trapped and flushed, I apologize in advance"

    I think that sentence would be improved if the last two words were omitted.

  8. SteveT said,

    January 7, 2011 @ 10:39 pm

    I ran a blog with Akismet running. I was able to add a number of keywords to a blacklist, which immediately deleted any matching posting rather than putting it in the spam queue. This helped immensely with moderation.

  9. Mark said,

    February 19, 2011 @ 8:04 am

    In answer to Chris' question, Akismet IS crowdsourcing, since it leverages information sent back from thousands of blogs about what webmasters consider spam and what not. I am not sure of all the parameters involved, probably more than just the content of the message, since these are now usually 'spun' to prevent such identification.
