"The victims are large and costly machines"
From Shay Cohen, via Lyle Ungar, some output from a PCFG ("probabilistic context-free grammar") trained on the Wall Street Journal part of the Penn Treebank:
A manager is a better value of well-polished desks .
I have been able to force to be more receptive to therapy , and to keep the committee informed , usually in advance , of covert actions : ; the victims are large and costly machines .
The purchase of all women is in September .
Their museum had been dumping their securities for comment .
It can remember one million truly inspiring teachers from Rainbow Technologies .
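For readers who haven't met a PCFG, here is a minimal sketch of how sampling from one works: start at the sentence symbol and repeatedly expand each nonterminal by picking one of its rules according to the rule probabilities. The toy grammar, its vocabulary, and its weights are invented for illustration; they are not the grammar that produced the sentences above, which would have been estimated from treebank counts.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to (right-hand side, probability) pairs.
# Rules and weights are made up for illustration, not estimated from any treebank.
TOY_PCFG = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("Det", "N"), 0.7), (("Det", "Adj", "N"), 0.3)],
    "VP": [(("V", "NP"), 0.6), (("V",), 0.4)],
    "Det": [(("the",), 0.6), (("a",), 0.4)],
    "Adj": [(("costly",), 0.5), (("large",), 0.5)],
    "N": [(("manager",), 0.4), (("machine",), 0.3), (("museum",), 0.3)],
    "V": [(("remembers",), 0.5), (("dumps",), 0.5)],
}

def sample(symbol, grammar, rng):
    """Expand `symbol` top-down, choosing each rule with its probability."""
    if symbol not in grammar:
        return [symbol]  # anything without rules is treated as a terminal word
    expansions = [rhs for rhs, _ in grammar[symbol]]
    weights = [p for _, p in grammar[symbol]]
    rhs = rng.choices(expansions, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(sample(child, grammar, rng))
    return words

rng = random.Random(0)
for _ in range(3):
    print(" ".join(sample("S", TOY_PCFG, rng)))
```

Each rule is chosen without regard to the words already generated elsewhere in the tree, which is one reason the output can be grammatical and still semantically odd.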
This reminds me of a rash of relatively high-quality spam comments that came our way a few weeks ago. A small sample:
People point out "Ultimate Battling, " I say "Professional Snuggling! "
I am grumpier over a 3 calendar year previous ballerina which has a wedgie.
TYPING IN EVERY CAPS HELPS MAKE SOMEONE LOOK MAD, CORRECT?
Just what will it mean each time a woman calls an individual "scrumptious"… very good? bad?
I'm going to live-tweet this specific fart. Err. Succulent. Acidic. And…. completed.
Henning Makholm said,
March 6, 2012 @ 2:13 pm
The spammers are probably using regular old word-based Markov chain generators. The PCFG output seems to have slightly better sentence-level coherence, but there's no need for a spammer to employ such advanced methods as long as spamfilters don't attempt to do whole-sentence analysis anyway.
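To make the comparison concrete, here is a minimal sketch of a word-based Markov chain generator of the kind described. The function names and the tiny corpus are assumptions for illustration; nothing here is known about what any actual spammer runs. With `order=2` the same code behaves as a trigram generator.

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain, start, length, rng):
    """Walk the chain from `start` (a tuple of words), emitting at most `length` words."""
    out = list(start)
    state = tuple(start)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        out.append(rng.choice(followers))
        state = tuple(out[-len(start):])
    return " ".join(out)

# Hypothetical miniature corpus; a real generator would be trained on far more text.
corpus = "the victims are large and costly machines and the machines are large"
chain = build_chain(corpus, order=1)
print(generate(chain, ("the",), 8, random.Random(0)))
```

Because each word depends only on the previous one or two, nothing ties the end of a sentence to its beginning, which fits the observation that a grammar-based generator holds together better at the sentence level.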
Theo Vosse said,
March 6, 2012 @ 2:28 pm
I've built my fair share of generators. This is output from one that was hand-crafted on titles of new age songs:
No footprints on the blue sky
Clouds and wings
Rhythm of the eternal forest
Moving movements
and my personal favorite: Dawn of the bamboo day
And these are just three random examples of output from a trigram generator. Can you guess which corpus was used as input?
Neighborhood Size Effects of Grammatical Gender: a PDP Approach
Effects of Chinese Words in French
Finiteness and the Computation of Agreement
And now that I'm on it, this is from a Haiku generator:
Filosofieën
Een uitgerekend grijs zoekt
Gradaties stilstaan
In English it's probably not a haiku:
Philosophies
A computed gray searches
Levels of stand still
Sili said,
March 6, 2012 @ 3:15 pm
That looks like a genuine "Ask LanguageLog" question.
Jens Ayton said,
March 6, 2012 @ 4:42 pm
I’ve been enjoying a Markov bot called @RandomTEDTalks on Twitter. Examples include “The Hunt For A Future That Never Happened”, “The Hunt For A Future That Never Happened”, “Hooked By An Octopus”, and “Charles Leadbeater Weaves A Tight Argument That Isn't Just Tedious, It's Irrelevant To Real Mathematics And The Clues To Past Civilizations”. Uncanny!
(Actually, one of those was a real TED talk.)
Jens Ayton said,
March 6, 2012 @ 4:43 pm
And one was a real copy & paste error!
David Walker said,
March 6, 2012 @ 6:48 pm
Theo, I like that one:
Philosophies
A computed gray searches
Levels of stand still
It sounds haiku-ish to me, even if it doesn't have the exact number of syllables. I forget the definition.
George Amis said,
March 6, 2012 @ 7:30 pm
@David Walker
Haiku have three lines, / seventeen syllables, five / seven five, like this.
Rod Johnson said,
March 6, 2012 @ 9:13 pm
Many, many songs have been written from titles generated here. Examples:
feelin' back the antihero
evil (pacem)
for my vomit
heaven for the cracks
don't bow down
Sparky said,
March 6, 2012 @ 11:53 pm
The fart one was a human.
Or else a winner of the Turing test.
Andy Averill said,
March 7, 2012 @ 12:25 am
In other news, colorless green ideas actually do sleep furiously.
Alex Boulton said,
March 7, 2012 @ 8:34 am
Reminiscent of the Postmodernism Generator (cf. the Sokal hoax): create your own instantly publishable paper, as meaningful as many… http://www.elsewhere.org/pomo/
Mr Punch said,
March 7, 2012 @ 12:05 pm
You can't expect much in the way of coherence if you train on the WSJ editorial pages.
MBM said,
March 7, 2012 @ 1:06 pm
Humorous text generators and lousy machine translation, those are the achievements of computational linguistics.
Toma said,
March 7, 2012 @ 1:16 pm
In the 1980s, I had a BASIC program that generated poetry. It randomly chose words and plugged them into a pattern like "adjective noun present tense verb adverb" and so on. They usually turned out pretty funny in a meaningless sort of way. Sounds like these spammers have only a slightly more advanced version of this.
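A rough modern equivalent of that kind of program, sketched in Python rather than BASIC: pick a random word for each slot in a fixed part-of-speech pattern. The word lists and the names `WORDS`, `PATTERN`, and `make_line` are hypothetical; the original program's vocabulary and patterns are not given.

```python
import random

# Hypothetical word lists; the original program's vocabulary is unknown.
WORDS = {
    "adjective": ["eternal", "blue", "costly"],
    "noun": ["forest", "ballerina", "machine"],
    "verb": ["sleeps", "searches", "remembers"],
    "adverb": ["furiously", "quietly", "slowly"],
}

# The pattern quoted in the comment: adjective, noun, present-tense verb, adverb.
PATTERN = ["adjective", "noun", "verb", "adverb"]

def make_line(pattern, words, rng):
    """Fill each slot of the pattern with a randomly chosen word of that category."""
    return " ".join(rng.choice(words[slot]) for slot in pattern)

rng = random.Random(0)
for _ in range(3):
    print(make_line(PATTERN, WORDS, rng))
```

Unlike a Markov chain, this needs no training text at all: the grammar is the fixed pattern, and every line has the same shape.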
patricia said,
March 7, 2012 @ 2:43 pm
these remind me of the nonsense strings in those candidate "Bad Lip-Reading" videos, such as http://youtu.be/BhDhDRvHaGs
cxpli said,
March 7, 2012 @ 2:47 pm
I'd guess those spam comments are actually random tweets gathered from Twitter with certain words replaced by synonyms. They make too much sense to have been computer-generated, and with a bit of word substitution you can guess at the original, e.g.:
I am grumpier over a 3 calendar year previous ballerina which has a wedgie.
= I am grumpier than a 3 year old ballerina with a wedgie.
TYPING IN EVERY CAPS = TYPING IN ALL CAPS
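If that guess is right, the transformation could be as simple as the sketch below: a lookup table of word swaps applied in a single pass. The table is inferred from the guessed originals above and the function name `spin` is made up; whatever tool actually produced the spam presumably uses a much larger thesaurus.

```python
import re

# Swaps inferred from the guessed originals above; purely illustrative.
SUBSTITUTIONS = {
    "than": "over",
    "year": "calendar year",
    "old": "previous",
    "with": "which has",
    "all": "every",
}

def spin(text, table):
    """Replace whole words found in `table`, in one pass, keeping ALL-CAPS words in caps."""
    keys = sorted(table, key=len, reverse=True)
    # The lookarounds require a non-word character (or the string edge) on both sides,
    # so the "all" inside "ballerina" is left alone.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )

    def swap(match):
        word = match.group(0)
        replacement = table[word.lower()]
        return replacement.upper() if word.isupper() else replacement

    return pattern.sub(swap, text)

print(spin("I am grumpier than a 3 year old ballerina with a wedgie.", SUBSTITUTIONS))
print(spin("TYPING IN ALL CAPS", SUBSTITUTIONS))
```

Doing the swaps in one pass matters: replacing "year" with "calendar year" and then scanning again would expand it a second time.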
Dave M. said,
March 7, 2012 @ 3:29 pm
@cxpli and @Sparky:
Yup, at least some of the spam comments are from Twitter. For example, the fart one is derived from a tweet by Rainn Wilson:
http://twitter.com/#!/rainnwilson/status/31865527672512512
Obvious find and replace:
"this" — "this specific"
"Hmmm." — "Err."
"Moist" — "Succulent"
"done" — "completed"
David Eddyshaw said,
March 7, 2012 @ 6:23 pm
"A manager is a better value of well-polished desks."
Alas, this is only true of the elite. Few managers can truly be said to attain this level.