A million (spam) comments


At some point early in the morning of September 1, 2011, we logged our millionth spam comment:

Unfortunately, I didn't get a screen shot until this morning, so the counter is up to 1,008,782, and I similarly failed to put procedures in place to determine which spam site actually placed the millionth feeble attempt.

The complete number of spam-comment attempts is in fact somewhat larger, since this total only counts the attempts that are detected automatically by Akismet, and some spam comments are deleted by hand after fooling the automatic detector. On the other hand, Akismet's total includes some genuine comments that were trapped as well. I don't have time to check everything in the spam trap, since several thousand attempts a day are now logged — I usually find out about a false positive when someone sends me email to complain about it.

1,008,782 spam comments in 1220 days (since April 30, 2008, when I installed our new WordPress software) is 827 spam comments per day, on average. However, spam comment attempts have increased along with our readership, which is now at an average of 27,176 visits and 42,922 page views per day.
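For anyone who wants to check that arithmetic, here is a minimal Python sketch. The totals and the install date come from the paragraph above. The snapshot date of September 2, 2011 is an assumption, chosen because it is the date that yields the stated 1220 days, and the names `SPAM_TOTAL`, `INSTALL_DATE`, `SNAPSHOT_DATE`, and `average_per_day` are purely illustrative.

```python
from datetime import date

# Figures quoted in the post.
SPAM_TOTAL = 1_008_782
INSTALL_DATE = date(2008, 4, 30)

# Assumption: the screenshot was taken on September 2, 2011.
# This is the date that makes the elapsed time come out to 1220 days.
SNAPSHOT_DATE = date(2011, 9, 2)


def average_per_day(total: int, start: date, end: date) -> float:
    """Return the mean number of events per day between two dates."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end date must be after start date")
    return total / days


days_elapsed = (SNAPSHOT_DATE - INSTALL_DATE).days
long_run_rate = average_per_day(SPAM_TOTAL, INSTALL_DATE, SNAPSHOT_DATE)

print(f"{days_elapsed} days, about {round(long_run_rate)} spam comments per day")
```

Under that assumed snapshot date, the sketch reproduces both the 1220-day span and the rounded figure of 827 per day.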

Update — approximately 24 hours later, we're up to 1,013,210:

…for an increment of 4,428. Some days I think we get as many as 8-9 thousand, some days as few as 1,000 or so.

Update #2 — and on the morning of September 12, 11 days later, we're at 1,054,064, for an average of (1054064-1008782)/11 = 4,117 per day.
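The two updates can be checked the same way. This is only a sketch using the counter readings quoted in this post; the constant names and the `daily_rate` helper are made up for illustration, and the 827-per-day long-run average is taken from the earlier paragraph.

```python
# Counter readings quoted in the post.
BASELINE = 1_008_782           # reading at the time of the screenshot
AFTER_ONE_DAY = 1_013_210      # reading roughly 24 hours later
AFTER_ELEVEN_DAYS = 1_054_064  # reading on the morning of September 12

# Long-run average since April 30, 2008, as stated in the post.
LONG_RUN_AVERAGE = 827


def daily_rate(start_count: int, end_count: int, days: float) -> float:
    """Return the average daily increase between two counter readings."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (end_count - start_count) / days


one_day_increment = AFTER_ONE_DAY - BASELINE
recent_rate = daily_rate(BASELINE, AFTER_ELEVEN_DAYS, 11)
ratio_to_long_run = recent_rate / LONG_RUN_AVERAGE

print(f"one-day increment: {one_day_increment}")
print(f"eleven-day average: {round(recent_rate)} per day")
print(f"recent rate vs. long-run average: {ratio_to_long_run:.1f}x")
```

The recent rate works out to roughly five times the long-run average, which is consistent with spam attempts having grown along with readership.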



10 Comments

  1. Skullturf said,

    September 2, 2011 @ 7:14 pm

    I tell you'u about new da(ating site where you can meet manny men@ or wommen to fulfill you neeed

    No, just kidding.

    [(myl) Our most recent mini-flood was a few hundred comments, all reading "Nice writing skills you have, great article and will be sure to check out your other ones.", all pretending to come from so-and-so@stanford.edu (for a few hundred different values of so-and-so), all coming from a small set of IP addresses known to be associated with "Nobis Technology Group" using servers hosted by Ubiquity in various cities (basically the same as those cited here). Most of them link to a page at a certain online "herbal store" hawking a secret Amazonian "extract from bark roots" that cures "asthma, coronary disease, and arthritis and also liver ailments", as well as "different varieties of cancer specifically cancer of colon, ovary, breast, prostrate, lung, pancreas and lymphoma". But there are other pages offering "Instant Virgin Vaginal Tightening Spray", "Bazooka Penis Enlargement Pills", "Columbian Gold synthetic weed", and so on.

    The general pattern that I've observed — not that I've paid very much attention, frankly — is for a source like this to flood us for a day or two with hundreds of spam comment attempts. Then they give up or move on.]

  2. Andy Averill said,

    September 2, 2011 @ 7:58 pm

    I'm a bit horrified to think what would happen if the purchaser of the Bazooka Penis Enlargement Pills hooked up with one of the Instant Virgin Vaginal Tightening Spray ladies… oh go ahead and delete this…

    [This is coarse and silly, so naturally we tried to delete it immediately. Unfortunately Andy had taken the precaution of submitting 65,536 copies of it, and after deleting the first 65,535 of them we got tired. —Language Log Plaza administration]

  3. Aaron Binns said,

    September 2, 2011 @ 8:18 pm

    Impressive spam totals. Back in 1999 (or so) I turned off spam filtering by my ISP and handled it in my email client. Since then, I've accumulated 171,531 spam emails.

  4. Leonardo Boiko said,

    September 2, 2011 @ 8:40 pm

    Did you offer a prize to the millionth winner?

  5. Jarek Weckwerth said,

    September 3, 2011 @ 3:59 am

    For comparison, how many genuine comments have you had?

    [Since you ask, Jarek, yours is comment number 136,226 of those that we have had since the new server was started on 8 April 2008. (This is making the simplifying but clearly false assumption that everything that got posted should be counted as "genuine".) Notice that the number of posts since 8 April 2008 is merely 3,410 (there were more than 5,000 before that, still archived here; a few hundred of Mark's and Geoff's from that era were gathered into a book that you can obtain from Amazon here). So the comments that actually survive spam screening and our occasional attempts to hand-delete junk come in at a rate of well over a hundred a day, and they outnumber our posts by a factor of 40. When we say we are overwhelmed by the reaction of our readership, this is not a metaphor. It is a quantitatively supportable claim. (We can get Mark to do a scatter plot for you…) —Language Log Plaza administration]

  6. Terry Collmann said,

    September 3, 2011 @ 8:23 am

    For comparison, I have a WordPress-powered bog that has had spam coming in at an average of 28 "slices" a day since January 2008. However, my average number of visitors a day over that period is only around 200: that means Language Log gets 135 times more visitors than me, but only 30 times more spam.

  7. Spell Me Jeff said,

    September 3, 2011 @ 9:04 am

    Small wonder Geoff Pullum no longer accepts comments.

  8. Cecilieaux Bois de Murier said,

    September 3, 2011 @ 9:46 am

    Reminds me of Tom Paxton's song about America reaching a million lawyers.

  9. Acilius said,

    September 3, 2011 @ 8:45 pm

    I've noticed something odd in my spam comment folder recently. These comments consist of bitterly phrased assertions that my fancy videos and eye-catching graphics are just camouflage for weak arguments. They are obviously spam, not only because they repeat each other verbatim and link to commercial sites of the usual kinds, but also because they usually appear on links pages with no videos, no graphics, and no arguments. I wonder if they are the product of some program that has culled the hostile remarks from comments that a wide variety of bloggers have approved.

  10. blahedo said,

    September 4, 2011 @ 4:19 pm

    > "a WordPress-powered bog"

    I have to say that I'm tickled by this concept. Is this the true update of the Sears-catalogue-powered bog?
