We've often had occasion to wonder how spammy blog comments are linguistically constructed. (See, most recently, Mark Liberman's post, "Numerous upon the written content material," in which he refers to spam comments as "aleatoric sub-poetry.") Now, on Quartz, David Yanofsky and Zachary M. Seward expose how spam comments are engineered:
Comment spam follows a formula, which was made plain the other day when a spambot accidentally posted its entire template on the blog of programmer Scott Hanselman. With his permission, we’ve reproduced some of the spam comment recipes here and added colorful formatting to make it readable. The spambot constructs new, vaguely unique comments by selecting from each set of options. We hope you find it wonderful | terrific | brilliant | amazing | great | excellent | fantastic | outstanding | superb.
A few examples:
been online more than hours today, yet I never found any interesting article like yours. pretty worth enough for me. , if all and bloggers made good content as you did, the will be useful than ever before. I commenting. written! your as I your subscription or service. Do any? me I subscribe. Thanks.
Wow, this is , my is analyzing things, I am going to her.
! Someone in my group shared this with us so I came to . I'm definitely the information. I'm and will be tweeting this to my followers! blog and .
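The selection step that the Quartz piece describes, picking one item from each set of options, can be sketched in a few lines of Python. This is only an illustration, not the spambot's actual code: the brace-and-pipe notation, the `spin` function, and the `TEMPLATE` string are assumptions made for the example, with the option words borrowed from the Quartz passage above.

```python
import random
import re

# Assumed notation: an option set is written {first|second|third}.
# The pattern matches one innermost set, i.e. one with no braces inside it.
OPTION_SET = re.compile(r"\{([^{}]*)\}")

# Hypothetical template built from the option words quoted above.
OPTIONS = ["wonderful", "terrific", "brilliant", "amazing", "great",
           "excellent", "fantastic", "outstanding", "superb"]
TEMPLATE = "We hope you find it {" + "|".join(OPTIONS) + "}."


def spin(template, rng=None):
    """Return one variant of template, choosing one option per set."""
    if rng is None:
        rng = random.Random()

    def pick(match):
        # Split the set on "|" and choose one alternative at random.
        return rng.choice(match.group(1).split("|"))

    # Replace innermost sets repeatedly so nested sets resolve inside-out.
    while True:
        template, count = OPTION_SET.subn(pick, template)
        if count == 0:
            return template


if __name__ == "__main__":
    for _ in range(3):
        print(spin(TEMPLATE))
```

Each call yields a sentence such as "We hope you find it terrific." with a different adjective from run to run. A template with several sets multiplies the possibilities, which is how a single recipe can produce many "vaguely unique" comments.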
For connoisseurs of such automated quasi-synonymy, let me also note this passage from a piece I wrote for Lapham's Quarterly last year called "Word for Word," a reconsideration of Roget's Thesaurus:
Fans of the television show Friends may recall the episode in which the dim-witted Joey Tribbiani discovers the built-in thesaurus in his word-processing program and tries to spruce up a letter of recommendation for his friends’ adoption agency. He thesaurusizes every word, so that the sentence “They are warm, nice people with big hearts” turns into “They are humid, prepossessing homo sapiens with full-sized aortic pumps.”
That bit of sitcom silliness has actually turned into a grim reality, now that online content farms use so-called spinning software to modify a source text by automatically swapping out words with ostensible synonyms. (The goal is to create new textual fodder that can be used on websites without search engines like Google suspecting that the content has been duplicated from elsewhere.) I recently came across a particularly ham-handed example on a news aggregator which lifted an article from the Star-Ledger about a looming fight between two congressional candidates. The original said that “the Democratic showdown…will be bloody and fairly evenly matched considering the county machinery behind each candidate.” In the “spun” version, the showdown “will be full of blood and sincerely uniformly suited deliberation the county equipment at the back any candidate.” Sadly, this sort of thesaurus-driven gobbledygook can be found in abundance online, as if Joey and his full-sized aortic pump had taken over the Internet.
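To see why word-by-word swapping goes wrong, here is a minimal sketch of a context-blind "spinner" in Python. It is not the software the aggregator used, which is unknown; the `SYNONYMS` table is hypothetical and hand-built from the single Star-Ledger example above, and `naive_spin` is an illustrative name.

```python
import re

# Hypothetical lookup table, reverse-engineered from the one example above.
# Real spinning software presumably draws on a far larger thesaurus.
SYNONYMS = {
    "bloody": "full of blood",
    "fairly": "sincerely",
    "evenly": "uniformly",
    "matched": "suited",
    "considering": "deliberation",
    "machinery": "equipment",
    "behind": "at the back",
    "each": "any",
}


def naive_spin(text, synonyms=SYNONYMS):
    """Swap every word found in the table, ignoring sense and grammar."""

    def swap(match):
        word = match.group(0)
        # Words missing from the table are left unchanged.
        return synonyms.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, text)


original = ("will be bloody and fairly evenly matched considering "
            "the county machinery behind each candidate")

if __name__ == "__main__":
    print(naive_spin(original))
```

Because each word is looked up in isolation, nothing stops "fairly" (meaning "moderately") from becoming "sincerely," or the preposition "considering" from becoming the noun "deliberation." The gobbledygook is a direct consequence of treating synonymy as a property of words rather than of words in context.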