Language Log

Electric sheep

April 18, 2017 @ 4:16 am · Filed by Mark Liberman under Computational linguistics, Elephant semifics

A couple of recent LLOG posts ("What a tangled web they weave", "A long short-term memory of Gertrude Stein") have illustrated the strange and amusing results that Google's current machine translation system can produce when fed variable numbers of repetitions of meaningless letter sequences in non-Latin orthographic systems. [Update: And see posts in the elephant semifics category for many other examples.] Geoff Pullum has urged me to explain how and why this sort of thing happens:

I think Language Log readers deserve a more careful account, preferably from your pen, of how this sort of craziness can arise from deep neural-net machine translation systems. […]

Ordinary people imagine (wrongly) that Google Translate is approximating the process we call translation. They think that the errors it makes are comparable to a human translator getting the wrong word (or the wrong sense) from a dictionary, or mistaking one syntactic construction for another, or missing an idiom, and thus making a well-intentioned but erroneous translation. The phenomena you have discussed reveal that something wildly, disastrously different is going on.

Something nonlinear: 18 consecutive repetitions of a two-character Thai sequence produce "This is how it is supposed to be", and so do 19, 20, 21, 22, 23, and 24, and then 25 repetitions produces something different, and 26 something different again, and so on. What will come out in response to a given input seems informally to be unpredictable (and I'll bet it is recursively unsolvable, too; it's highly reminiscent of Emil Post's famous tag system where 0..X is replaced by X00 and 1..X is replaced by X1101, iteratively).

Type "La plume de ma tante est sur la table" into Google Translate and ask for an English translation, and you get something that might incline you, if asked whether you would agree to ride in a self-driving car programmed by the same people, to say yes. But look at the weird shit that comes from inputting Asian language repeated syllable sequences and you not only wouldn't get in the car, you wouldn't want to be in a parking lot where it was driving around on a test run. It's the difference between what might look like a technology nearly ready for prime time and the chaotic behavior of an engineering abortion that should strike fear into the hearts of any rational human.

Language Log needs at least a sketch of a proper serious account of what's going on here.
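For readers who have not met it, the tag system mentioned in the letter is easy to state and surprisingly hard to predict. The sketch below is only an illustration of that comparison, not a model of anything Google Translate does. It assumes the standard form of Post's system: delete the first three symbols, then append 00 if the deleted word began with 0 and 1101 if it began with 1. The function names and the halting convention (stop when fewer than three symbols remain) are choices made here for illustration.

```python
def tag_step(word: str) -> str:
    """Apply one step of Post's tag system to a string of '0' and '1'."""
    # The suffix depends only on the first symbol of the current word.
    suffix = "00" if word[0] == "0" else "1101"
    # Delete the first three symbols, then append the chosen suffix.
    return word[3:] + suffix


def run_tag_system(word: str, max_steps: int = 1000):
    """Iterate until the word is too short, repeats, or max_steps is reached.

    Returns (history, outcome) where outcome is "halt", "cycle", or "unknown".
    """
    history = [word]
    seen = {word}
    for _ in range(max_steps):
        if len(word) < 3:  # assumed halting convention for this sketch
            return history, "halt"
        word = tag_step(word)
        if word in seen:  # the word has occurred before, so the run loops
            return history, "cycle"
        history.append(word)
        seen.add(word)
    return history, "unknown"


if __name__ == "__main__":
    history, outcome = run_tag_system("0001")
    print(outcome, history)
```

Whether a given starting word eventually halts or cycles has to be found by running it, which is the flavor of unpredictability the letter is pointing at; the `max_steps` cap is there because no general shortcut is known.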

A sketch is all that I have time for today, but here goes…

Read the rest of this entry »


Jesus is good, beef noodles are good, and so is money

April 17, 2017 @ 11:40 pm · Filed by Victor Mair under Language and food, Signs

From a Twitter account:

Read the rest of this entry »


English orthography is fake news

April 17, 2017 @ 6:22 pm · Filed by Mark Liberman under Linguistics in the comics

Today's Non Sequitur:


Russia is a surface but other countries are spaces?

April 17, 2017 @ 6:57 am · Filed by Mark Liberman under Morphology, Variation

In Finnish, that is. Garrett Wollman ("Some linguistic observations from my trip to Finland", Occasionally Coherent 4/14/2017) notes that Finnish morphology differentiates between "surface" and "interior" relationships of position and motion:

	toward	at	away
surface	allative -lle “onto”	adessive -lla/-llä “on” or “at”	ablative -lta/-ltä “off” or “away”
interior	illative -Vn/-hVn (for stems ending in V) “into” or “toward”	inessive -ssa/-ssä “in” or “inside of”	elative -sta/-stä “out of” or “from”

Against this background, he describes his recent experience at the World Figure Skating Championships in Helsinki.

Read the rest of this entry »


I (don't) doubt that the letter is fake

April 16, 2017 @ 3:49 pm · Filed by Victor Mair under Misnegation, Semantics

Somebody just sent me a note that begins, "I don’t doubt that the letter is fake…".

Read the rest of this entry »


A long short-term memory of Gertrude Stein

April 16, 2017 @ 3:07 pm · Filed by Mark Liberman under Computational linguistics, Elephant semifics, Language and culture

As just observed ("What a tangled web they weave"), successive repetitions of short sequences of Japanese, Korean, Thai (and perhaps other types of) characters cause Google's Neural Machine Translation system to generate surprisingly varied and poetic English equivalents.

Thus if we repeat 1 through 25 times the two-character Thai sequence ไๅ

|ไ| 0x0E44 "THAI CHARACTER SARA AI MAIMALAI"
|ๅ| 0x0E45 "THAI CHARACTER LAKKHANGYAO"

the system, "a deep LSTM network with 8 encoder and 8 decoder layers using attention, residual connections, and trans-temporal chthonic affinity", establishes a pretty solid spiritual connection with Gertrude Stein:

Read the rest of this entry »
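The test inputs described above are simple to reproduce. The following is a minimal sketch that builds the 1-through-25 repetitions of the two-character sequence and looks up the code points and Unicode names quoted in the post. The function names are hypothetical, and the sketch deliberately does not call any translation service; pasting the strings into the translator is left to the reader.

```python
import unicodedata

# The two-character Thai sequence from the post: U+0E44 followed by U+0E45.
THAI_PAIR = "\u0e44\u0e45"


def describe(text: str):
    """Return (code point, Unicode name) for each character in text."""
    return [(f"0x{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def repeated_inputs(unit: str, max_reps: int = 25):
    """Return the unit repeated 1, 2, ..., max_reps times."""
    return [unit * n for n in range(1, max_reps + 1)]


if __name__ == "__main__":
    for code, name in describe(THAI_PAIR):
        print(code, name)
    for n, text in enumerate(repeated_inputs(THAI_PAIR), start=1):
        print(n, text)
```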


How not to learn Chinese

April 16, 2017 @ 11:10 am · Filed by Victor Mair under Language teaching and learning, Writing systems

In "Sinological suffering" (3/31/17), "Aphantasia — absence of the mind's eye" (3/24/17), and other recent posts, we examined the difficulty, for some the near impossibility, of mastering how to write hundreds and thousands of Chinese characters. Yet, if one wishes to become literate in Chinese, one simply must do it. Until the 21st century, there was basically only one way: rote copying of the characters to engrave them in the neuromuscular pathways of the learner.

Read the rest of this entry »


What a tangled web they weave

April 15, 2017 @ 11:04 pm · Filed by Mark Liberman under Computational linguistics, Elephant semifics, Humor

…when neural nets are recursive:

Read the rest of this entry »


Not not

April 15, 2017 @ 2:40 pm · Filed by Victor Mair under Grammar, Language and philosophy, Misnegation

This is NOT a post about misnegation, a frequent topic at Language Log. This is a reflection on the sublimity of nonnegation, which is not quite the same as transcendental affirmation. It is a linguistic and philosophical inquiry into the absence of nothingness.

First comes the linguistics; at the end comes the philosophy.

In Mandarin, we have expressions such as the following, where the bù 不 doesn't seem to make any sense in terms of its usual signification — "not":

suānbuliūliūde 酸不溜溜的 ("sourish; quite sour")

Read the rest of this entry »


Mixed metaphor of the month

April 14, 2017 @ 8:24 am · Filed by Geoffrey K. Pullum under Jargon, Metaphors

A friend of mine who works in the Federal government recently received an email posing this rhetorical question:

How do agencies mitigate risks and achieve FedRAMP compliance in multi-tenant environments to successfully pave their way to the cloud?

He naturally wondered whether there can ever be a paved road leading to a cloud. And I naturally wondered how anyone could get paid for writing jargon-laden garbage as bad as this. We can but wonder.

(I actually live in a multi-tenant environment. It's great; all the other tenants are lovely people. But I'm not sure whether I am FedRAMP-compliant. I hope I am.)


Thoroughly earthy

April 13, 2017 @ 8:59 pm · Filed by Victor Mair under Humor, Idioms, Proverbs

Because I like the Chinese term tǔ 土 ("earth; soil; dirt; ground; earthy; rustic; colloquial") so much, I was going to add the substance of the remarks below as a comment to the "Fun bun pun" (4/9/17) post, in which we devoted a lot of attention to one of my favorite expressions, "tǔbāozi 土包子" ("earthy steamed stuffed bun", i.e., "country bumpkin, hick, rube, clodhopper, backwoodsman, boor, dolt, yokel"). But the ramifications grew to such large proportions that they merited their own post.

Read the rest of this entry »


Mongolian transliterations of Donald Trump's name

April 13, 2017 @ 8:53 am · Filed by Victor Mair under Transcription

We've looked fairly intensively at transcriptions of our new President's name in Chinese and, en passant, in Japanese, Korean, and other languages:

"Trump translated" (8/31/16) — about halfway down in the o.p.

"Transcription of "Barack Obama", "Hillary Clinton", and "Donald Trump" in the Sinosphere" (10/2/16)

"Chinese transcriptions of Donald Trump's surname" (11/23/16)

For those who are interested in how the POTUS's name and surname are rendered in Mongolian scripts, both Cyrillic and traditional Mongolian writing, we now have Bathrobe's post at Spicks & Specks:

"'Donald Trump' in Mongolian" (4/13/17)

Read the rest of this entry »


You couldn't fail to miss Melania

April 12, 2017 @ 4:28 pm · Filed by Geoffrey K. Pullum under Misnegation

Mr John Kelly, an attorney for Melania Trump, reading out a statement concerning why she has just scored nearly $3 million in a London libel suit (reported here in The Guardian; I reproduce the use of "right-handed" found in the article, despite its oddness):

The article was illustrated with an old photograph of the claimant standing naked with her front against a wall but her face turned towards the camera. The photograph was prominently displayed and occupied almost the entire right-handed side of page 15. Readers of the newspaper could not fail to miss the article.

But of course Mr Kelly means they could not fail to see it. Or could not possibly miss it. Semantic overnegation again, this time in a prepared statement by an attorney dealing crucially with details of language and meaning. Amazing. But richly documented in scores of previous posts here on Language Log.


Archive for April, 2017
