Archive for Computational linguistics

More Deep Translation arcana

At Riddled, sometime LLOG commenter Smut Clyde has posted an impressive series of Google Translate experiments. You can read them at the links below; I've added locally-stored images, based on previous experience with bit rot as well as recent advice from James Angleton.

"Mayor Snorkum will lay the cake" [Snorkum1]
"Reveal to me the unknown tongue": [UnknownTongue1, UnknownTongue2, UnknownTongue3, UnknownTongue4, UnknownTongue5]
"Go home, Google Translate. You are drunk.": [Lovecraft1, Lovecraft2, Lovecraft3]

Read the rest of this entry »

Comments (13)

Your gigantic crocodile!

One more piece of Google Translate poetry, contributed by Mackenzie Morris:


Read the rest of this entry »

Comments (24)

"I have gone into my own way"

In a series of recent posts we've explored the fun side of recursive weighted sums and point nonlinearities as a translation algorithm: "What a tangled web they weave", "A long short-term memory of Gertrude Stein", "Electric sheep", "The sphere of the sphere is the sphere of the sphere". But the featured translations have all involved inputs of characters in kana, hangul, Thai script, and other non-Latin alphabets, and it's natural to wonder whether this is an essential part of the game.

No — here are various repetitions of "è ", "îî ", and "îè "  translated from Greek:

è è è è è è è è è è Things to Do
è è è è è è è è è è è Date of Issue No.
è è è è è è è è è è è è May 2009
îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî I have forsaken myself for it to be with you
îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî îî I have resuscitated myself for my own sake I have forgiven myself for myself
îè îè îè îè îè îè îè îè îè You're going to be yours
îè îè îè îè îè îè îè îè îè îè îè îè îè You'll be out of your way
îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè You're on your way out of the sun
îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè You're on your way back to your day
îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè You are on your way back to the day you are in your country
îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè You have been signed in. You have signed in. You have signed in.
îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè You are on your way to the last day of your stay. You have reached the last day of your stay.
îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè You have finished your call and have signed in.
îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè îè You have been signed in. You have made a call. You are on your way. You are on your way. You have signed in.

Read the rest of this entry »

Comments (20)

PR push for "Voice Stress Analysis" products?

A Craigslist ad posted 20 days ago — "Seeking a Blog Writer for Voice Stress Analysis Technology":

We are looking for someone to ghostwrite blog posts and articles for a large company that specializes in computer-aided voice stress analysis technology or CVSA. We want you to primarily discuss the scientific research backing it up and the psychophysiological processes involved in implementing the technology. Basically, we want you to describe how it works, why it works, and why it is an effective technology, with everything backed up by scientific research and facts. […]

We are seeking a motivated, passionate, enthusiastic ghostwriter to craft blog articles ranging loosely from 750-900 words, that are valuable and informative to our target audience. Our audience for this client is law enforcement agencies, military, intelligence, immigration, and any other section of our government or private law practices that will be using investigative interviewing methods to screen subjects.

Read the rest of this entry »

Comments (6)

The sphere of the sphere is the sphere of the sphere

In a comment on "Electric Sheep", Tim wrote:

Just want to share a little Google Translate poetry resulting from drumming my fingers on the keyboard while set to Thai:

There are six sparks in the sky, each with six spheres. The sphere of the sphere is the sphere of the sphere.

Read the rest of this entry »

Comments (13)

Electric sheep

A couple of recent LLOG posts ("What a tangled web they weave", "A long short-term memory of Gertrude Stein") have illustrated the strange and amusing results that Google's current machine translation system can produce when fed variable numbers of repetitions of meaningless letter sequences in non-Latin orthographic systems. [Update: And see posts in the elephant semifics category for many other examples.] Geoff Pullum has urged me to explain how and why this sort of thing happens:

I think Language Log readers deserve a more careful account, preferably from your pen, of how this sort of craziness can arise from deep neural-net machine translation systems. […]

Ordinary people imagine (wrongly) that Google Translate is approximating the process we call translation. They think that the errors it makes are comparable to a human translator getting the wrong word (or the wrong sense) from a dictionary, or mistaking one syntactic construction for another, or missing an idiom, and thus making a well-intentioned but erroneous translation. The phenomena you have discussed reveal that something wildly, disastrously different is going on.  

Something nonlinear: 18 consecutive repetitions of a two-character Thai sequence produce "This is how it is supposed to be", and so do 19, 20, 21, 22, 23, and 24, and then 25 repetitions produces something different, and 26 something different again, and so on. What will come out in response to a given input seems informally to be unpredictable (and I'll bet it is recursively unsolvable, too; it's highly reminiscent of Emil Post's famous tag system where 0..X is replaced by X00 and 1..X is replaced by X1101, iteratively).

Type "La plume de ma tante est sur la table" into Google Translate and ask for an English translation, and you get something that might incline you, if asked whether you would agree to ride in a self-driving car programmed by the same people, to say yes. But look at the weird shit that comes from inputting Asian language repeated syllable sequences and you not only wouldn't get in the car, you wouldn't want to be in a parking lot where it was driving around on a test run. It's the difference between what might look like a technology nearly ready for prime time and the chaotic behavior of an engineering abortion that should strike fear into the hearts of any rational human.  

Language Log needs at least a sketch of a proper serious account of what's going on here.

A sketch is all that I have time for today, but here goes…

Read the rest of this entry »
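
For readers who haven't met it, the Post tag system that Geoff Pullum mentions is easy to simulate. Below is a minimal Python sketch under the usual reading of his notation: if the word starts with 0, delete its first three symbols and append 00; if it starts with 1, delete the first three and append 1101; stop when fewer than three symbols remain. The function names and the max_steps cutoff are illustrative choices, not anything taken from the note.

```python
def post_tag_step(word):
    """Apply one step of Post's tag system (0 -> 00, 1 -> 1101, delete 3).

    Returns the next word, or None if the word is too short to continue.
    """
    if len(word) < 3:
        return None
    suffix = "00" if word[0] == "0" else "1101"
    # Drop the first three symbols, then append the suffix chosen by the first one.
    return word[3:] + suffix


def run_tag_system(word, max_steps=1000):
    """Iterate until the system halts, revisits a word, or max_steps runs out.

    max_steps is an arbitrary cutoff (an assumption of this sketch): the
    interesting property of the system is that no obvious shortcut tells
    you which outcome to expect.
    """
    history = [word]
    seen = {word}
    for _ in range(max_steps):
        nxt = post_tag_step(history[-1])
        if nxt is None:
            return history, "halted"
        history.append(nxt)
        if nxt in seen:
            return history, "cycle"
        seen.add(nxt)
    return history, "undecided"


if __name__ == "__main__":
    for start in ["000", "10010"]:
        words, status = run_tag_system(start, max_steps=20)
        print(start, status, len(words) - 1, "steps")
```

Trying a handful of starting words by hand gives a feel for the comparison: some die out at once, others grow and shrink with no evident pattern.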

Comments (38)

A long short-term memory of Gertrude Stein

As just observed ("What a tangled web they weave"), successive repetitions of short sequences of Japanese, Korean, Thai (and perhaps other types of) characters cause Google's Neural Machine Translation system to generate surprisingly varied and poetic English equivalents.

Thus if we repeat 1 through 25 times the two-character Thai sequence ไๅ

|ไ| 0x0E44 "THAI CHARACTER SARA AI MAIMALAI"
|ๅ| 0x0E45 "THAI CHARACTER LAKKHANGYAO"

the system, "a deep LSTM network with 8 encoder and 8 decoder layers using attention, residual connections, and trans-temporal chthonic affinity", establishes a pretty solid spiritual connection with Gertrude Stein:

Read the rest of this entry »
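
The inputs for this kind of experiment are easy to build. Here is a minimal Python sketch that looks up the two code points given above and generates the 1 through 25 repetitions. Whether the repetitions were separated by spaces is not stated in the excerpt, so the `sep` parameter (empty by default) is an assumption, and the function names are illustrative only.

```python
import unicodedata

# The two-character Thai sequence from the post: U+0E44 followed by U+0E45.
THAI_SEQ = "\u0e44\u0e45"


def describe(seq):
    """Return (hex code point, Unicode character name) for each character."""
    return [(f"0x{ord(ch):04X}", unicodedata.name(ch)) for ch in seq]


def repetitions(seq, max_n=25, sep=""):
    """Build the inputs: seq repeated 1, 2, ..., max_n times.

    sep is an assumption of this sketch; pass " " for space-separated copies.
    """
    return [sep.join([seq] * n) for n in range(1, max_n + 1)]


if __name__ == "__main__":
    for code, name in describe(THAI_SEQ):
        print(code, name)
    for n, text in enumerate(repetitions(THAI_SEQ), start=1):
        print(n, text)
```

Each generated line can then be pasted into the translation box by hand; the sketch deliberately makes no calls to any translation service.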

Comments (14)

What a tangled web they weave

Comments (31)

Country list translation oddity

This is weird, and even slightly creepy — paste a list of countries like

Costa Rica, Argentina, Belgium, Bulgaria, Canada, Chile, Colombia, Dominican Republic, Ecuador, El Salvador, Ethiopia, France, Germany, England, Guatemala, Honduras, Italy, Israel, Mexico, New Zealand, Nicaragua, Peru, Puerto Rico, Scotland, Switzerland, Spain, Sweden, Uruguay, Venezuela, USA

into Google Translate English-to-Spanish, and a parallel-universe list emerges:

Read the rest of this entry »

Comments (22)

Advances in birdsong modeling

Eve Armstrong and Henry Abarbanel, "Model of the songbird nucleus HVC as a network of central pattern generators", Journal of Neurophysiology, 2016:

We propose a functional architecture of the adult songbird nucleus HVC in which the core element is a "functional syllable unit" (FSU). In this model, HVC is organized into FSUs, each of which provides the basis for the production of one syllable in vocalization. Within each FSU, the inhibitory neuron population takes one of two operational states: (A) simultaneous firing wherein all inhibitory neurons fire simultaneously, and (B) competitive firing of the inhibitory neurons. Switching between these basic modes of activity is accomplished via changes in the synaptic strengths among the inhibitory neurons. The inhibitory neurons connect to excitatory projection neurons such that during state (A) the activity of projection neurons is suppressed, while during state (B) patterns of sequential firing of projection neurons can occur. The latter state is stabilized by feedback from the projection to the inhibitory neurons. Song composition for specific species is distinguished by the manner in which different FSUs are functionally connected to each other.

Ours is a computational model built with biophysically based neurons. We illustrate that many observations of HVC activity are explained by the dynamics of the proposed population of FSUs, and we identify aspects of the model that are currently testable experimentally. In addition, and standing apart from the core features of an FSU, we propose that the transition between modes may be governed by the biophysical mechanism of neuromodulation.

Read the rest of this entry »

Comments off

"Bare-handed speech synthesis"

This is neat: "Pink Trombone", by Neil Thapen.

By the same author — doodal:

Comments (4)

Court fight over Oxford commas and asyndetic lists

Language Log often weighs in when courts try to nail down the meaning of a statute. Laws are written in natural language—though one might long, by formalization, to end the thousand natural ambiguities that text is heir to—and thus judges are forced to play linguist.

Happily, this week's "case in the news" is one where the lawyers managed to identify several relevant considerations and bring them to the judges for weighing.

Most news outlets reported the case as being about the Oxford comma (or serial comma)—the optional comma just before the end of a list. Here, for example, is the New York Times:

Read the rest of this entry »

Comments (20)

What's hot at ICASSP

This week I'm at IEEE ICASSP 2017 in New Orleans: that's the "Institute of Electrical and Electronics Engineers International Conference on Acoustics, Speech and Signal Processing", pronounced /aɪ 'trɪ.pl i 'aɪ.kæsp/. I've had joint papers at all the ICASSP conferences since 2010, though I'm not sure that I've attended all of them.

This year the conference distributed its proceedings on a nifty little guitar-shaped USB key, which I promptly copied to my laptop for easier access. I seem to have deleted my local copies of most of the previous proceedings, but ICASSP 2014 escaped the reaper, so I decided to while away the time during one of the many parallel sessions here by running all the .pdfs (1703 in 2014, 1316 this year) through pdftotext, removing the REFERENCE sections, tokenizing the result, removing (some of the) unwordlike strings, and creating overall lexical histograms for comparison. The result is about 5 million words for 2014 and about 3.9 million words this year.
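
The steps after pdftotext can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the script actually used: it takes text that has already been extracted, assumes the references start at the last line consisting only of an optionally numbered "REFERENCES" heading, and uses a crude stand-in for "unwordlike" (tokens shorter than two letters or with no vowel). All function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: a references section starts at a line like "REFERENCES" or "5. REFERENCES".
REF_HEADING = re.compile(r"^\s*(?:\d+\.?\s*)?REFERENCES\s*$",
                         re.MULTILINE | re.IGNORECASE)
# Lowercase alphabetic tokens, allowing internal hyphens and apostrophes.
TOKEN = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def strip_references(text):
    """Drop everything from the last REFERENCES heading onward, if there is one."""
    matches = list(REF_HEADING.finditer(text))
    return text[:matches[-1].start()] if matches else text


def looks_wordlike(token):
    """Crude filter (an assumption): at least two characters and at least one vowel."""
    return len(token) >= 2 and any(ch in "aeiouy" for ch in token)


def tokenize(text):
    """Lowercase the text and keep only the word-like tokens."""
    return [tok for tok in TOKEN.findall(text.lower()) if looks_wordlike(tok)]


def lexical_histogram(documents):
    """Pool word counts over an iterable of already-extracted document texts."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(strip_references(doc)))
    return counts
```

Calling `lexical_histogram` once per year's set of extracted texts gives the two word-count tables to be compared.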

And to compare the lists, I used the usual "weighted log-odds-ratio, informative Dirichlet prior" method, as described for example in "The most Trumpish (and Bushish) words", 9/5/2015.
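
For concreteness, here is a minimal Python sketch of that comparison as it is commonly formulated: each word's log-odds in the two corpora are smoothed with Dirichlet pseudo-counts, and the difference is divided by an estimate of its standard deviation. The choice of prior is an assumption of this sketch (the pooled counts of the two corpora, scaled by a hypothetical `prior_scale` parameter); a larger background corpus is another common choice, and the excerpt does not say which was used here.

```python
import math
from collections import Counter


def weighted_log_odds(counts_a, counts_b, prior_scale=1.0):
    """Return {word: z-score}; positive scores lean toward corpus A.

    counts_a and counts_b map words to counts. The prior is an assumption
    of this sketch: pooled counts from both corpora times prior_scale.
    """
    vocab = set(counts_a) | set(counts_b)
    if len(vocab) < 2:
        raise ValueError("need at least two distinct words to compare")
    prior = {w: prior_scale * (counts_a.get(w, 0) + counts_b.get(w, 0))
             for w in vocab}
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = sum(prior.values())

    scores = {}
    for w in vocab:
        y_a, y_b, a_w = counts_a.get(w, 0), counts_b.get(w, 0), prior[w]
        # Smoothed log-odds of the word within each corpus.
        logit_a = math.log((y_a + a_w) / (n_a + a0 - y_a - a_w))
        logit_b = math.log((y_b + a_w) / (n_b + a0 - y_b - a_w))
        # Approximate variance of the difference between the two log-odds.
        variance = 1.0 / (y_a + a_w) + 1.0 / (y_b + a_w)
        scores[w] = (logit_a - logit_b) / math.sqrt(variance)
    return scores


if __name__ == "__main__":
    a = Counter({"neural": 50, "speech": 50, "hmm": 5})
    b = Counter({"neural": 5, "speech": 50, "hmm": 50})
    for word, z in sorted(weighted_log_odds(a, b).items(), key=lambda kv: -kv[1]):
        print(f"{word:8s} {z:+.2f}")
```

Sorting by score puts the words most characteristic of one year at the top and those most characteristic of the other at the bottom, with words used equally in both near zero.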

Read the rest of this entry »

Comments (2)