Numerous upon the written content material


Another fragment of aleatoric sub-poetry, from the 5,035,601 spam comments that Akismet has caught since we installed it:

I image this might be numerous upon the written content material? nevertheless I nonetheless believe that it may be suitable for just about any type of topic material, because it could frequently be pleasant to resolve a warm and delightful face or possibly listen a voice whilst initial landing.

No doubt Stan Carey's spam-comment collection will contain the variants needed to infer the pattern out of which this was spun. (See "'Some superb entropy' in the language of spam", Sentence First 5/6/2013.)

Since we logged our two millionth spam comment on 1/31/2012, 437 days ago, we've averaged

(5035601-2000000)/437 = 6946 spam comments per day

over the past year and a bit. This is pretty close to the 6579 spamments per day that we averaged during the previous half-year, suggesting that Language Log's spammentarial ecosystem is in or near a state of equilibrium.
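The arithmetic above is easy to check with a few lines of Python. This is only a sketch: the variable names are illustrative, and the post date of April 12, 2013 is an assumption inferred from the timestamps on the earliest comments, not something stated in the post itself.

```python
from datetime import date

# Figures quoted in the post
total_spam = 5_035_601               # spam comments caught so far
milestone = 2_000_000                # the two millionth spam comment
milestone_date = date(2012, 1, 31)   # when that milestone was logged
previous_rate = 6579                 # spamments per day, previous half-year

# Assumption: post date inferred from the earliest comment timestamps
post_date = date(2013, 4, 12)

# Days elapsed since the milestone (2012 was a leap year)
days_elapsed = (post_date - milestone_date).days

# Average spam comments per day since the milestone
daily_rate = (total_spam - milestone) / days_elapsed

# Relative change against the previous half-year's average
relative_change = (daily_rate - previous_rate) / previous_rate

print(days_elapsed)        # 437
print(round(daily_rate))   # 6946
print(f"{relative_change:.1%}")
```

The last line puts a number on "pretty close": the new daily average is within a few percent of the earlier one.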

Most spam comments are much less interesting:

Thanks for giving these substantial post.

thanks for posting, please keep doing it.

If you value custom and also have a few lounging around the house the best of each sides with consign. The best thing regarding consignment  created in your sales and get additional custom, and that means you obtain a particular percentage of the sale from the these people handle all of the aspects of selling your purse for you personally. The advantage in order to utilizing a consignment shop.

But none of them are very convincing. I continue to be puzzled about the generally poor quality of this stuff, as discussed in "The case of the missing spamularity", 11/23/2010:

There may well be a "classic … evolutionary predator-prey arms race" going on in the world of spam and spam filters — I have this on good authority, though I don't know much about the details — but whatever the resulting evolutionary trajectory is, it's not creating any "parasitic viral payloads" that do a credible job of "pretending to be meat".

Oh wait. The thing is, if Stross were right, how could we tell? I've never actually met John Cowan in the flesh…

[Seriously, I suspect that the current economics of spam rewards propagation rate much more strongly than payload quality; and that the aspects of payload quality that are optimized are relatively uncorrelated with "pretending to be meat". Otherwise, we'd certainly see much more higher-quality spam.]



12 Comments

  1. Faldone said,

    April 12, 2013 @ 12:35 pm

    I would speechless of the ambiguity in the referral introduction.

  2. Bob Moore said,

    April 12, 2013 @ 12:44 pm

    I find the first comment you quote interesting, in that it seems almost grammatical, yet semantically incoherent. Sort of like "Colorless green ideas sleep furiously." If it were human-generated non-native English, I would expect the reverse. Could the comment spammers be using something like probabilistic context-free grammars to try to elude spam detectors? Or perhaps it is just machine translation output (which might be much the same thing).

  3. Jim said,

    April 12, 2013 @ 1:57 pm

    It's probably a Markov chain generator of some sort.

  4. MikeA said,

    April 12, 2013 @ 5:31 pm

    There has been speculation that the sheer ineptitude of most spam is intended to preselect for customers who are likely to swallow the hook later, when the "unfortunate circumstance force us to request a prepayment of delivery fees", thus economizing on "customer service". Of course, I can see that applying more to email spam with an actual offer included, than to blog comment spam which seems mostly intended to drive click-traffic to affiliate sites, or maybe to hit the reader with a drive-by malware-load.

    Of course it is possible that a modern-day emulation of the Manchester Poetry generator has become self-aware and is trying to communicate.

  5. Ross Presser said,

    April 13, 2013 @ 4:12 am

    Are there any steps we, the spam-reading public, could take to improve the quality of comment spam?

  6. maidhc said,

    April 13, 2013 @ 4:24 am

    Ross Presser: Are there any steps we, the spam-reading public, could take to improve the quality of comment spam?

    You know the Russian fox-breeding program that every generation selected those animals that were friendliest to people?

    Construct a rating system and convince other people to click on a spam message only when it exceeds a certain rating threshold. If you can get enough people to participate, by natural selection you should be able to drive spam-generating algorithms toward your esthetic goal.

  7. JW Mason said,

    April 13, 2013 @ 2:13 pm

    My understanding is that the target of comment spam is search engines, not human readers. The goal is to maximize the number of legitimate, high-traffic sites that contain a link to the spammer's site with an appropriate keyword, whatever the spammer is selling. [*] There is no expectation that anyone reading the comment will click through the link; its only purpose is to increase the odds that someone doing a search for the keyword will end up at the spammer's site. The only purpose of having text other than the keyword is to get past automated anti-spam software. So there is no selective pressure toward more human-like comments.

    Incidentally, the best spam comment I've gotten was: "This is my job. I am so sorry."

    [*] I tried posting this comment with a concrete example, and of course that comment was filtered out as spam. Which points up the weakness of comment spam — since its only value is the link from the keyword to the spammer's site, the keyword has to be present, which means it's easy to filter for.

  8. JW Mason said,

    April 13, 2013 @ 2:16 pm

    Incidentally, that means the answer to Ross Presser's question is No. The public for comment spam is search-engine spiders, not blog readers.

  9. Ran Ari-Gur said,

    April 13, 2013 @ 3:18 pm

    > Seriously, I suspect that the current economics of spam rewards propagation rate much more strongly than payload quality […]

    Or, continuing with the biological-evolution analogy, we could say that spam is r-selected rather than K-selected.

  10. “Some superb entropy” in the language of spam | Sentence first said,

    April 14, 2013 @ 12:16 pm

    […] "fragment of aleatoric sub-poetry" from Mark Liberman at Language Log, who nonetheless is puzzled about the generally poor quality of […]

  11. Nick Lamb said,

    April 15, 2013 @ 4:25 am

    "Squint and you might see genius" type emergent AI literature is the apparent subject of "A History of Bitic Literature" for which we have only an introduction by Stanisław Lem (writing as Juan Rambellais et al). Lem dates this work to 2009 (and it's a second edition) so we're evidently falling behind on this almost as much as our flying cars.

  12. Ray Dillinger said,

    April 18, 2013 @ 4:10 pm

    If I received the above note, I would have to go check whether an acquaintance of mine who is usually quite brilliant had forgotten again to take his meds.

    Sigh.

    Don't listen to the Hallucinations. They give Bad Advice.
