Anatomy of a spambot

We've often had occasion to wonder how spammy blog comments are linguistically constructed. (See, most recently, Mark Liberman's post, "Numerous upon the written content material," in which he refers to spam comments as "aleatoric sub-poetry.") Now, on Quartz, David Yanofsky and Zachary M. Seward expose how spam comments are engineered:

Comment spam follows a formula, which was made plain the other day when a spambot accidentally posted its entire template on the blog of programmer Scott Hanselman. With his permission, we’ve reproduced some of the spam comment recipes here and added colorful formatting to make it readable. The spambot constructs new, vaguely unique comments by selecting from each set of options. We hope you find it wonderful | terrific | brilliant | amazing | great | excellent | fantastic | outstanding | superb.
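The "selecting from each set of options" mechanism described in the quoted passage can be sketched in a few lines of Python. This is only an illustration, not the spambot's actual code: the curly-brace `{a|b|c}` syntax, the `spin` function, and the `TEMPLATE` string are assumptions made for the example, with the option list taken from the last sentence of the quote.

```python
import random
import re

# Hypothetical template syntax: one {option|option|...} group per choice point.
# The regex matches an innermost group, i.e. one with no braces inside it.
_GROUP = re.compile(r"\{([^{}]*)\}")


def spin(template: str, rng: random.Random) -> str:
    """Replace every {a|b|c} group with one randomly chosen option."""
    while True:
        match = _GROUP.search(template)
        if match is None:
            # No groups left: the comment is fully expanded.
            return template
        choice = rng.choice(match.group(1).split("|"))
        template = template[:match.start()] + choice + template[match.end():]


# Option list copied from the quoted passage; the braces are our assumption.
TEMPLATE = (
    "We hope you find it {wonderful|terrific|brilliant|amazing|great|"
    "excellent|fantastic|outstanding|superb}."
)

if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(spin(TEMPLATE, rng))
```

With several such groups in one template, the number of distinct comments is the product of the group sizes, which is presumably why the results look "vaguely unique" while remaining recognizably formulaic.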


Tooth and Throat Singing

"Corrections: April 19, 2013", NYT 4/18/2013:

An article on Thursday about Caroline Shaw, who won the Pulitzer Prize for music this week, referred incorrectly to a vocal technique explored by a group she has sung with, Roomful of Teeth. It is Tuvan throat singing — a tradition of the Tuvan people of Siberia — not “tooth and throat” singing.


Phrasal type shifting

David Craig points out an interesting usage in today's Frazz: "They're for just because."

I discussed the process of turning phrases into modifiers in "Phrasally grateful", 10/18/2007:

If you run out of conventional adjectives and adverbs, the English language stands ready to help. Just package an evocative phrase or two with an appropriate prosodic inflection, and you're on your way […]

As the Frazz example illustrates, you can also use a similar process to make noun phrases, though I think it's much less common.


The Gray Lady gets coy again

Dave Itzkoff, "Putting Away His Toys", NYT 4/17/2013:

The lesson he learned about Mr. Bay, he said, was that “behind the intensity and, oftentimes, the complications of getting” things (Mr. Johnson used a different word) “done in an efficient way is a very insightful guy.”


Cupertino of the year (?)

Alex Baumans asks, "Could this be a Cupertino?" Liz Rafferty, "Oops! Zooey Deschanel Captioned as Boston Marathon Bombing Suspect", TV Guide 4/21/2013:

Who's that girl? It's … the Boston Marathon bomber?

During the intense lockdown and manhunt for the Boston Marathon bombing suspects Friday, a local Fox affiliate in Dallas, Texas misidentified one of the suspects as none other than New Girl star Zooey Deschanel. The closed-captioning error came as the station was attempting to name Dzhokhar Tsarnaev, the second suspect in the attack who was being hunted by police on Friday.

"He is 19-year-old Zooey Deschanel," the caption faux pas read.


Dungan: a Sinitic language written with the Cyrillic alphabet

The Dungan people are a group of Sinitic speakers whose Muslim ancestors fled to Central Asia (mainly to parts of what are now Kyrgyzstan and Kazakhstan) over a century ago when the Qing (Manchu) government suppressed their revolt (1862-1877), one of many Muslim uprisings in the course of Chinese history since Islam arrived in East Asia during the Middle Ages.

When they came to Central Asia, the Dungans were mostly illiterate peasants from northwest China who spoke a series of topolects from Shaanxi, Gansu, and other areas. From 1927 to 1928, they wrote their language with the Arabic alphabet, and from 1928 to 1932 they used the Latin alphabet. In 1952-53, the Soviet government created for the Dungans a writing system based on the Cyrillic alphabet, which they continue to use today.


Chechens, Czechs, whatever

"Statement of the Ambassador of the Czech Republic on the Boston terrorist attack", 4/19/2013:

As many I was deeply shocked by the tragedy that occurred in Boston earlier this month. It was a stark reminder of the fact that any of us could be a victim of senseless violence anywhere at any moment.

As more information on the origin of the alleged perpetrators is coming to light, I am concerned to note in the social media a most unfortunate misunderstanding in this respect. The Czech Republic and Chechnya are two very different entities – the Czech Republic is a Central European country; Chechnya is a part of the Russian Federation.

As the President of the Czech Republic Miloš Zeman noted in his message to President Obama, the Czech Republic is an active and reliable partner of the United States in the fight against terrorism. We are determined to stand side by side with our allies in this respect, there is no doubt about that.

Petr Gandalovič
Ambassador of the Czech Republic


He / she / it / none of the above

I missed this article in the Chinese edition of China Daily when it first appeared on June 20, 2012, but it raises an issue that is sufficiently important to warrant addressing now that William Steed has kindly called my attention to it:

"Qián Jīnfán: 84 suì hòu kuà xìngbié 'rénshēng de cànlàn qī cáigāng kāishǐ'" 钱今凡:84岁后跨性别 “人生的灿烂期才刚开始” ("Qian Jinfan: 'the most glorious period of a person's life only begins' after age 84 when one transcends gender")


Cupertinos in the spotlight

About seven years ago, in March 2006, I wrote a Language Log post about "the Cupertino effect," a term to describe spellchecker-aided "miscorrections" that might turn, say, Pakistan's Muttahida Quami Movement into the Muttonhead Quail Movement. It owes its name to European Union translators who had noticed the word cooperation getting replaced with Cupertino by a spellchecker that lacked the unhyphenated form of the word in its dictionary. Since then, I've had occasion to hold forth on the Cupertino effect in various venues (OUPblog, Der Spiegel, Radiolab, the New York Times, etc.). Now, Cupertinos are getting yet another flurry of publicity, thanks to a new book by the British tech writer Tom Chatfield called Netymology.


Importance of publishing data and code

J.W. writes:

In connection with some of your prior statements on the Log about the importance of publishing underlying data, you might be interested in Thomas Herndon, Michael Ash, and Robert Pollin, "Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff", PERI 4/15/2013 (explanation in lay language at "Shocking Paper Claims That Microsoft Excel Coding Error Is Behind The Reinhart-Rogoff Study On Debt", Business Insider 4/16/2013). In sum, a look at the data spreadsheet underlying a really influential 2010 economics paper reveals that its results were driven by selective data exclusions, idiosyncratic weighting, and an Excel coding error [!].


On the other hand, alone

My faith in the possibility of integrity and self-criticism in humankind got a real boost the other day when I read a post on Lingua Franca in which an editor (who is also a professor in an English department) stopped to think about whether she was in the right about a construction she had been proscribing for years in the journal papers she edited, and decided that she wasn't.

Is it legitimate to say "On the other hand, …" in a text where you have not first used "On the one hand, …"? Professor Anne Curzan thought the answer was no. And for years she told authors to change on the other hand to something like in contrast if they hadn't got a preceding instance of on the one hand somewhere nearby. But then one day she got to thinking: Am I right? Is it really an error to use on the other hand alone? So she did what people interested in grammar only rarely do: she started looking at the evidence, and decided that it refuted her rule.


Boostez votre carrière

R.S. writes:

Remember when using English words to create French counterparts was considered (I believe this is the technical term) a shonda?

Me neither. Still the case in Quebec, apparently, where the STOP signs say ARRET, but in the Hexagon apparently not so much.

In support of his case, he sends along this ad from Le Monde:


Keep it vague

The buses run by Lothian Buses in Edinburgh currently have a prominent sign near the entrance that says "REVISED Adult Fare".

Revised. I will leave it to you to guess whether the fare has been revised upward or downward.
