Language Log

The case of the missing spamularity

December 23, 2010 @ 2:57 pm · Filed by Mark Liberman under Computational linguistics

A recent diary post by Charlie Stross ("It's made out of meat", 12/22/2010) poses a striking paradox. Or rather, he makes a prediction about a process whose trajectory, as so far observable, seems paradoxical to me.

Stross starts with some amusing stuff about the history of the internet and the human urge to communicate, and then takes up the Singularity:

About 20 years ago, [Vernor] Vinge asked, "what if there exist new technologies where the curve never flattens, but looks exponential?" The obvious example — to him — was Artificial Intelligence. […]

Vernor came up with two postulates. Firstly, if we can design a true artificial intelligence, something that's cognitively our equal, then we can make it run faster by throwing more computing resources at it. […]

He also noted something else: individually, on average, we humans are not terribly smart. […]

If such higher types of intelligence can exist, and if a human-equivalent intelligence can build an AI that runs one of them — which is an open question — then it's going to appear very rapidly after the first weakly superhuman AI. And we're not going to be able to second guess it because it'll be as much smarter than us as we are than a frog.

After putting aside some foundational philosophy, Stross asks:

Let us for a moment suppose that the classical formulation of the singularity is plausible, and that furthermore classical computational artificial intelligence is possible. Where is it likely to emerge?

And he gives an interesting answer: Spam.

We are currently in the early days of an arms race, between the spammers and the authors of spam filters. The spammers are writing software to generate personalized, individualized wrappers for their advertising payloads that masquerade as legitimate communications. The spam cops are writing filters that automate the process of distinguishing a genuinely interesting human communication from the random effusions of a 'bot. And with each iteration, the spam gets more subtly targeted, and the spam filters get better at distinguishing human beings from software, in a bizarre parody of the imitation game popularized by Alan Turing […]

We have one faction that is attempting to write software that can generate messages that can pass a Turing test, and another faction that is attempting to write software that can administer an ad-hoc Turing test. Each faction has a strong incentive to beat the other. This is the classic pattern of an evolutionary predator/prey arms race: and so I deduce that if symbol-handling, linguistic artificial intelligence is possible at all, we are on course for a very odd destination indeed — the Spamularity, in which those curious lumps of communicating meat give rise to a meta-sphere of discourse dominated by parasitic viral payloads pretending to be meat …

For a plausible picture of this kind of future, see (for example) Stross's immensely enjoyable novel Accelerando. But I have a problem with the premises of his argument, and it's got nothing to do with the interesting philosophical issues of whether "symbol-handling, linguistic artificial intelligence is possible at all". My problem is historical.

As far as I can see, the spam that piles up in my email and weblog-comment filters is NOT getting better, at least not in the sense of being harder to distinguish from legitimate communications. On the contrary, the spam I see these days is not even trying to emulate (for example) the cleverness displayed (five years ago!) by SCIGen. Nor is it showing evidence of any of the other fairly obvious things that I would do if I were trying to create "personalized, individualized wrappers for … advertising payloads that masquerade as legitimate communications". Instead, we see linguistic garbage like this, crafted by hopeless underpaid drudges in third-world internet cafes, not by clever AI algorithms lifting themselves up by their digital bootstraps.

Sure, there's plenty of machine-generated stuff, but it's not getting any better. Here's a small sample of a recent flood of Language Log comments that were clearly the output of a program:

hi everyone, my kinfolk is frank and i virtuous poorness to say that this is an fantabulous journal billet and i truly institute it adjuvant, would it be o.k. if i submitted posts to this diary around topics i launch riveting?

hi everyone, my kinsfolk is stamp and i vindicatory requisite to say that this is an excellent diary writer and i rattling found it implemental, would it be okay if i submitted posts to this diary nigh topics i saved gripping?

hi everyone, my folk is frankfurter and i upright require to say that this is an excellent blog flier and i rattling plant it steadying, would it be o.k. if i submitted posts to this blog around topics i plant gripping?

But if the program that wrote that had been submitted as homework in an undergraduate AI class 20 years ago, I would have given it a C — out of pity.
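To see how little machinery output like this needs, here is a minimal sketch. It assumes (this is a guess, not anything known about the actual program) that the spammer fills a fixed template from a thesaurus with no regard to word sense. The template, the slot names, and the functions `spin` and `generate` are invented for illustration; the substitute words are taken from the samples above.

```python
import random

# Hypothetical template: the real program's wording and slots are unknown.
TEMPLATE = (
    "hi everyone, my {name} is frank and i {just} {want} to say that "
    "this is an {excellent} {blog} {post} and i {really} {found} it {helpful}"
)

# Hypothetical thesaurus: each slot lists words seen in the sample comments,
# chosen without checking that the sense fits the sentence.
SYNONYMS = {
    "name": ["name", "kinfolk", "kinsfolk", "folk"],
    "just": ["just", "virtuous", "vindicatory", "upright"],
    "want": ["want", "poorness", "requisite", "require"],
    "excellent": ["excellent", "fantabulous"],
    "blog": ["blog", "journal", "diary"],
    "post": ["post", "billet", "writer", "flier"],
    "really": ["really", "truly", "rattling"],
    "found": ["found", "institute", "plant"],
    "helpful": ["helpful", "adjuvant", "implemental", "steadying"],
}


def spin(template, synonyms, rng):
    """Fill every slot with a randomly chosen entry from its word list."""
    choices = {slot: rng.choice(words) for slot, words in synonyms.items()}
    return template.format(**choices)


def generate(n, seed=0):
    """Return n spun variants; a fixed seed makes the run repeatable."""
    rng = random.Random(seed)
    return [spin(TEMPLATE, SYNONYMS, rng) for _ in range(n)]


if __name__ == "__main__":
    for comment in generate(3):
        print(comment)
```

Each call to `generate` produces variants of one sentence that differ only in the substituted words, which is the pattern visible in the three comments quoted above. Nothing in the sketch looks at the blog post it is commenting on.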

There may well be a "classic … evolutionary predator-prey arms race" going on in the world of spam and spam filters — I have this on good authority, though I don't know much about the details — but whatever the resulting evolutionary trajectory is, it's not creating any "parasitic viral payloads" that do a credible job of "pretending to be meat".

Oh wait. The thing is, if Stross were right, how could we tell? I've never actually met John Cowan in the flesh…

[Seriously, I suspect that the current economics of spam rewards propagation rate much more strongly than payload quality; and that the aspects of payload quality that are optimized are relatively uncorrelated with "pretending to be meat". Otherwise, we'd certainly see much more higher-quality spam.]


35 Comments

Ben C said,

December 23, 2010 @ 3:18 pm

Interesting, because I was recently deluged with spammy Twitter follows that were MUCH more difficult to distinguish from the usual sort. They didn't use underwear models for pictures, for one. I guess they caught on that most people aren't friends with people that would wear so little in front of a camera. Second, the bios they used were lines from what appear to have been books, and were a lot better than the garbage that has been put there—they seemed like something particularly strange people might actually might put in their bios (one was "It looks like it's time for another cryptic prophecy.")

The tweets themselves were the giveaway. They were unremarkable reposts of headlines, but were punctuated by things people might actually say about the news ("I cant get on my skype" and the like).
Ben C said,

December 23, 2010 @ 3:19 pm

Yeah, I had some grammar errors up there. Sorry. Can't think right now.
Jethro said,

December 23, 2010 @ 3:23 pm

Or maybe you're one of them!
Twitter Trackbacks for Language Log » The case of the missing spamularity [upenn.edu] on Topsy.com said,

December 23, 2010 @ 3:45 pm

[…] Language Log » The case of the missing spamularity languagelog.ldc.upenn.edu/nll/?p=2860 – view page – cached December 23, 2010 @ 2:57 pm · Filed by Mark Liberman under Computational […]
William Ockham said,

December 23, 2010 @ 3:47 pm

I think the mistake Stross makes is the assumption that "the classic pattern of an evolutionary predator/prey arms race" results in smarter prey (or predators, for that matter). Based on my very unscientific view of the world around me, I would think that the result is at least as likely to be higher fecundity on the part of the prey. Fish, for example, have many eggs at once to ensure that some of them survive. Your offspring don't have to be that smart if you have a million of them. At least a few will survive. That seems to be the pattern with spam.
Thomas Westgard said,

December 23, 2010 @ 3:53 pm

When it's sufficiently interesting and accurately targeted, we will cease to call it "spam" and start calling it an "information service."
Jonathon said,

December 23, 2010 @ 4:18 pm

obligatory XKCD link

But as already noted, most spammers seem to be going for quantity more than quality. At this point in the arms race, it seems to be cheaper to bombard sites with spam than to craft really convincing spam comments. And even with good spam filters like Akismet, a few will occasionally slip through.
the other Mark P said,

December 23, 2010 @ 4:19 pm

Arms races, real ones that is, are only rarely interesting. They aren't a good metaphor for constant progress.

Sure the rocket-to-deliver-nuclear-payload race was interesting. Lots of splashy headlines with loads of technical progress.

But sometimes the predators win the battle easily. We don't see much cavalry these days.

Sometimes the prey can't be beaten. The landmine has so far outstripped its "predators" that there is a movement to ban them (very reminiscent of medieval attempts to ban crossbows).

In general arms exhibit the usual features of technology. A breakthrough spurs amazing developments, but then further advances come slower and slower. There is no constant upwards development.

Machine-guns, tanks, planes etc were all hugely developed around WWI. But we still have recognisably the same items as the keys to our arsenals even now.
swami said,

December 23, 2010 @ 4:20 pm

I think most anti-SPAM filters are designed to catch keywords (such as prescription drug names), disguised URLs, false headers, geographic origin, and sheer volume of identical e-mails. Being written in proper English wouldn't fool a filter looking for these technical hacks. The only way a Turing test-passing AI would improve SPAM success rates is if it properly responded to a reply e-mail from the target asking very complicated questions about the product advertised (assuming the initial e-mail is even read).

And if we do develop a truly sentient AI that can write individual convincing content for each SPAM e-mail sent, would it really want to spend its days advertising junk to strangers? It'd probably want to spend some time reading Wikipedia like the rest of us.

Comment SPAM is a whole other issue. If you've made a program that reads a blog post and formulates a pretty convincing response based on the content of the post, then we're going to have to come up with some very difficult CAPTCHA tests in response.
John Cowan said,

December 23, 2010 @ 6:50 pm

hi everyone, my kinship-system is eskimo and i phonemic functional to say that this is an fantabulous LLog post and i truly eliminated it grammatical, would it be o.k. if i submitted comments to this blog anent topics i raised c-commanding?

(Does that help any? I'm even more immobile at the moment than usual, but if you come to New York and want dinner, drop me a line. And yes, just to add some substance, I agree that spammers are r-strategists.)
Matt McIrvin said,

December 23, 2010 @ 7:11 pm

Back in the heyday of Usenet spam and the early days of Web spam, people often wondered how effective these ads actually were. My theory at the time was that they didn't have to be, because most spam was really a con played by the providers of spam services on the dimwits who were actually paying to place the ads. If you could convince those people that you were getting lots of eyeballs, nothing else mattered.

But I have no idea whether this was actually the case, or is now.
the other Mark P said,

December 23, 2010 @ 8:22 pm

I believe most spam now is only using the advertised product in order to sucker people into clicking their link. They have no intention of selling anything necessarily, but are phishing for credit card details etc.
HP said,

December 23, 2010 @ 9:11 pm

Has anyone ever tried a reverse-Turing test? Could a human being emulate machine-generated text that would fool an AI? Could a human emulating machine-generated text fool another human? Or would there be some clue in the text that a sentient human had written it?
HP said,

December 23, 2010 @ 9:15 pm

Phishing for credit card details, etc.? Have intention of selling anything necessarily? I believe most tried a reverse-Turing test. Emulating fool would fool an AI into clicking their link.
Charles Gaulke said,

December 23, 2010 @ 10:27 pm

I imagine that spam will work out for AI much like chess did – it was promising on its face, until it became clear there was a much more straightforward solution to the problem (chess is just a search algorithm, getting around spam filters is just a matter of generating patternless gibberish, neither is what we'd call intelligence).

The spam FILTER, now… That's a harder problem. Even human beings, clearly, can't always recognize spam, so getting a computer to catch both clever imitations of legitimate e-mail and random gibberish might actually require something we'd call intelligence.

In any event, I've always felt Turing intended his famous Test as a dig at human intelligence rather than a measure of machines'.
Garrett Wollman said,

December 23, 2010 @ 11:05 pm

Note that spam filters come in a very broad quality range, and operate on several different principles (ideally all at once). They can also be tuned differently — I for one have my filters tuned to be very aggressive, such that I rarely receive spam in my inbox, but do have to check for the occasional false positive in my spambox. I can do this because I run my own mail server; most people don't have that luxury and must put up with mail servers that are optimized for economics (equilibrium of cost-of-spam and cost-of-complaints-about-spam-filtering) rather than for absolute minimum spam-delivery rate. The way my colleagues at work run our common mail system must perforce have a higher false-negative rate simply because not everyone classifies the same message identically. (E.g., the broadcast email from Mahogany Row about the search committee for the new assistant dean for search committees might interest a quarter of the faculty and administrators but none of the undergrads.)
Rod Johnson said,

December 23, 2010 @ 11:54 pm

Like John, I thought of this as an r- vs K-strategy question. r-selection is supposed to win in unstable environments, and K-selection is supposed to win in stable environments. I think it's fair to say that the internets is an unstable environment, and our current spam is an r-strategy species, like, say, rats, and about as easy to eradicate. So to breed intelligent spam, we need a more stable environment for it to compete in. That doesn't seem likely in the near future.
Barrett D said,

December 24, 2010 @ 3:29 am

The relationship is more like virus/antibodies, yes?

Spam will always be a minor annoyance. If it became more than that, people would stop using the internet. (The virus killing the host!)
maidhc said,

December 24, 2010 @ 6:16 am

I read a lot of sites online. Consequently I am often unable to credit how a certain thought came into my head. So …

Very few people who speak "proper" English are taken in by spams. But there is a huge population of people online with marginal English skills. Such people may find broken English more credible because it matches more to their own command of English.

One thing I've seen on blogs is an app that grabs sentences from past posts and combines them into a new post. Of course the sentences are unrelated. But I haven't yet seen a payoff to it. One would expect something like this:

Based on my very unscientific view of the world around me, I would think that the result is at least as likely to be higher fecundity on the part of the prey. My theory at the time was that they didn't have to be, because most spam was really a con played by the providers of spam services on the dimwits who were actually paying to place the ads. In any event, I've always felt Turing intended his famous Test as a dig at human intelligence rather than a measure of machines'. Have you tried the new penis size enhancer? http://www.URaSucker.com

[(myl) I've seen comment spam that simply repeats the text of an earlier posted comment, with an added link or two. That's about the cleverest spam technique that I've seen recently, and it's much less common than the dumber ones.]
Barry said,

December 24, 2010 @ 6:59 am

I think part of the reason that the spam arms race doesn't mesh with the theoretically rapidly-approaching technological singularity is that advancements in spam/anti-spam are largely developed ad-hoc through direct human ingenuity. That is to say, there is no AI system coming up with new and better ways to spam, but rather, humans are devising these new and incrementally better ways and simply implementing them. Likewise, there is no AI system coming up with new and better ways to block spam, and while AI concepts come into play in the form of trainable classifiers, the incremental improvements in the classifiers themselves are the result of humans devising and implementing them.

If the singularity were to come about through the spam arms race, you'd really need an AI system that was able to generate new spam techniques by itself. From a practical standpoint, this is such a difficult problem that it's likely to be cracked by actual AI researchers before it's cracked by spammers.
Dan Lufkin said,

December 24, 2010 @ 10:35 am

There's an actual AI arms race in progress in the field of automated stock-exchange trading. Hundreds of programs are tracking the market and making millisecond decisions on transactions based on patterns, real and imaginary, in the flow of trading. There was an unexpected instability a few months ago that had Apple trading at the $100,000 level for a few minutes, if memory serves. Various circuit-breakers have been put in place, but no one really understands the details of the situation.

I'm just writing this off the cuff; I'm sure that there's much more info out there.
Calaqscedoa said,

December 24, 2010 @ 10:47 am

I never thought that spam was really intended to really sell what they are occasionaly advertizing, not since several years at any rate. I suspected it a while of being purely a terrorist device, aimed at confusing its targets and instilling in them the idea that they were out of the race ( whatever that may be ), a little bit in the fashion which can be observed since several years on commercial and world TV with people adressing aggressively or derogatively the viewer through the camera. This, with TV remains true – I often feel the need to put on a cap or a hat, in the fashion of a helmet, in order to distanciate myself from the abusers; they are simply taking for granted the reason for which you are watching the program and they are attributing themselves whatever which dominant role there, propagable in the outside – whereas regarding spam, after having processed to some basic linguistic and sociologic analysis I have become indeed able to considering it, like Thomas is suggesting it, as an information device. Still pictures do remain strongly psychologically polluting, in fact the worse so on simply information sites that seem to be inheriting them or their transgressions after a while, and otherwise, one single unexpected variation of the procedure will be able to introduce disorder inside the watch.
Nathan Myers said,

December 24, 2010 @ 3:49 pm

It should be noted that "It's made out of meat" is a play on the title of the phenomenally successful story meme, "They're made out of meat", by Terry Bisson. Terry Bisson has written plenty of other brilliant work, some of which you can read online, and much more that ought to be, if it isn't, on the shelves of your neighborhood library.
Dierk said,

December 25, 2010 @ 5:37 am

One of Stross' assumptions is that 'AI is possible', but he uses this not as a given, only as a necessary base for his thoughts. He then develops a [in my view: counterfactual] world in which AI exists and how, where, and why it will most probably first emerge. The question if AI is possible at all doesn't play into his thought experiment, hence, any empirical notion of what we actually see at the moment is a bit moot.

Let's assume Stross's argument were about what we currently experience, that is, AI is possible and it will emerge first in the world of spam and anti-spam. The historical argument then forwarded against this – 'I don't see it, everything is as bad as it was when I was young' – does not lead to any trouble for Stross since – again: as far as I understand him – he does not claim we already reached the point where actual AI emerges.

It's a bit like the 'heap' question: When is it? There is a continuum for which most of the time we don't count a phenomenon as being but at one point we see a "sudden" emergence of exactly the phenomenon. A lot depends on which side of the perceived line we are standing and looking – 'when is it becoming a heap' vs. 'when is it still a heap'. Curiously we rarely notice that the line is a moving target.
400guy said,

December 25, 2010 @ 2:28 pm

HP asked, "Has anyone ever tried a reverse-Turing test?"

There is at least one well-documented example. For a delightfully amusing description, see "A Coffeehouse Conversation on the Turing Test", chapter 22 in "Metamagical Themas: Questing for the Essence of Mind and Pattern" by Douglas R. Hofstadter, ISBN 0-553-34279-7.
baylink said,

December 26, 2010 @ 8:17 am

@maidhc: looks like Dissociated Press to me (q.g.)
Dan Lufkin said,

December 26, 2010 @ 1:55 pm

I think maybe the kitchen is closed now, but I recall where I was reading about AI in the stock market. It was in the January 2011 issue of Wired, page 90. It's part of a long feature on "Artificial Intelligence is here. In fact, it's all around us. But it's nothing like we expected." Sorry it doesn't seem to be available on-line unless you subscribe.
Spam Poetry – strangely charming « Word Machines said,

December 28, 2010 @ 8:26 pm

[…] excellent Mark Liberman over at the equally excellent Language Log has a post up about the missing Spamularity. As part of the post, Mr Liberman puts presents three spam messages which are intended to […]
dustbury.com » Quantity, not quality said,

December 28, 2010 @ 8:51 pm

[…] Spam, says Mark Liberman, is not, despite previous predictions, getting any smarter: As far as I can see, the spam that piles up in my email and weblog-comment filters is NOT getting better, at least not in the sense of being harder to distinguish from legitimate communications. On the contrary, the spam I see these days is not even trying to emulate (for example) the cleverness displayed (five years ago!) by SCIGen. Nor is it showing evidence of any of the other fairly obvious things that I would do if I were trying to create "personalized, individualized wrappers for advertising payloads that masquerade as legitimate communications". Instead, we see linguistic garbage. […]
plain material « company of three, black peppermint tea said,

December 30, 2010 @ 7:18 am

[…] language log: the case of missing spamularity Seriously, I suspect that the current economics of spam rewards propagation rate much more strongly than payload quality; and that the aspects of payload quality that are optimized are relatively uncorrelated with "pretending to be meat". Otherwise, we'd certainly see much more higher-quality spam. […]
David Conrad said,

December 30, 2010 @ 3:02 pm

"Has anyone ever tried a reverse-Turing test?"

Turing Test extra credit: Convince the examiner that he's a computer.

"You know, you make some very good points."

http://xkcd.com/329/
D. B. Propert said,

February 19, 2011 @ 11:34 am

Unfortunately I was able to give the Verizon help desk chat site a Turing test recently while trying to get them to fix my DSL access. They failed the test (and failed to diagnose the problem*), though I suspect all the answers were written by actual humans. Unfortunately my attempt to save the content of the chat session was interrupted and failed.

* Apparently the help desk scripts did not allow the customer to provide thoughtful input.
Rubrick said,

April 27, 2011 @ 4:23 pm

Mark, it seems clear to me that the comment-spam trio you included is actually the work of an AI into which has been uploaded the combined consciousnesses of e.e. cummings and Gertrude Stein.

Or possibly they just had a kid.
Holly said,

April 27, 2011 @ 5:15 pm

It could become even more smarter than us than we are than a frog if we were smarter. But dumb as we are, we're not giving the smart variants enough of an advantage.
Barron’s Red Flags – Augmented Attention Bots? « hyperpomodoro.com said,

June 16, 2011 @ 10:33 am

[…] to rely on AI to direct their attention the forces of evil fight back and before you know it the spamularity is here. Or, in this case, the accountants just work out what the red flag phrases are and learn to […]
