Archive for Computational linguistics

"Protester dressed as Boris Johnson scales Big Ben"

Sometimes it's hard for us humans to see the intended meaning of an ambiguous phrase, like "Hospitals named after sandwiches kill five". But in other cases, the intended structure comes easily to us, and we have a hard time seeing the alternative, as in the case of "Extinction rebellion protester dressed as Boris Johnson scales Big Ben".

These two examples have essentially the same structure. There's a word that might be construed as a preposition linking a verb to a nominal argument ("named after sandwiches", "dressed as Boris Johnson"), or alternatively as a complementizer introducing a subordinate clause ("after sandwiches kill five", "as Boris Johnson scales Big Ben"). In the first example, the complementizer reading is the one the author intended, while in the second, it's the prepositional one. But in both cases, most of us go for the prepositional reading, presumably because "named after X" and "dressed as Y" are common constructions.
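For readers who like to see the two bracketings side by side, here's a minimal sketch using a toy grammar of my own (the grammar and the simplified tokens are illustrations, not anything from the post); NLTK's chart parser returns one tree for the prepositional reading of "as" and one for the clausal reading:

```python
import nltk

# Toy grammar in which "as" can be either a preposition (P) inside a
# participial modifier or a complementizer (C) introducing a clause.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> N | N PartP
PartP -> V PP
PP -> P NP
VP -> V NP | V SubClause
SubClause -> C S
N -> 'protester' | 'BorisJohnson' | 'BigBen'
V -> 'dressed' | 'scales'
P -> 'as'
C -> 'as'
""")

sentence = "protester dressed as BorisJohnson scales BigBen".split()
parser = nltk.ChartParser(grammar)
for tree in parser.parse(sentence):
    print(tree)   # two parses: "dressed [as Boris Johnson]" vs. "dressed, [as Boris Johnson scales Big Ben]"
```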

Read the rest of this entry »

Comments (18)

Danger: Demo!

John Seabrook, "The Next Word: Where will predictive text take us?", The New Yorker 10/14/2019:

At the end of every section in this article, you can read the text that an artificial intelligence predicted would come next.

I glanced down at my left thumb, still resting on the Tab key. What have I done? Had my computer become my co-writer? That’s one small step forward for artificial intelligence, but was it also one step backward for my own?

The skin prickled on the back of my neck, an involuntary reaction to what roboticists call the “uncanny valley”—the space between flesh and blood and a too-human machine.

Read the rest of this entry »

Comments (11)

TO THE CONTRARYGE OF THE AND THENESS

Yiming Wang et al., "Espresso: A fast end-to-end neural speech recognition toolkit", ASRU 2019:

We present ESPRESSO, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit FAIRSEQ. ESPRESSO supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented. ESPRESSO achieves state-of-the-art ASR performance on the WSJ, LibriSpeech, and Switchboard data sets among other end-to-end systems without data augmentation, and is 4–11× faster for decoding than similar systems (e.g., ESPnet).
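The "language model fusion" in that abstract refers to combining the end-to-end model's next-token scores with those of an external language model during beam search. Here's a minimal sketch of plain shallow fusion under assumed names and weights — not Espresso's actual code, and without the word-level look-ahead machinery the toolkit implements:

```python
import torch

def fused_log_probs(asr_logits, lm_logits, lm_weight=0.5):
    """Shallow fusion: interpolate ASR and external-LM next-token scores.

    asr_logits, lm_logits: (beam, vocab) unnormalized scores for the next token
    from the end-to-end model and the external language model.
    lm_weight: LM interpolation weight (0.5 is an illustrative value).
    """
    asr_logp = torch.log_softmax(asr_logits, dim=-1)
    lm_logp = torch.log_softmax(lm_logits, dim=-1)
    return asr_logp + lm_weight * lm_logp

# In beam search, each partial hypothesis is extended with the top-k tokens
# under these fused scores. A word-based LM additionally needs look-ahead
# machinery to spread word scores over subword prefixes, omitted here.
```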

Read the rest of this entry »

Comments (13)

Speed vs. efficiency in speech production and reception

An interesting new paper on speech and information rates as determined by neurocognitive capacity appeared a week ago:

Christophe Coupé, Yoon Oh, Dan Dediu, and François Pellegrino, "Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche", Science Advances 5.9 (2019): eaaw2594. doi: 10.1126/sciadv.aaw2594.

Here's the abstract:

Language is universal, but it has few indisputably universal characteristics, with cross-linguistic variation being the norm. For example, languages differ greatly in the number of syllables they allow, resulting in large variation in the Shannon information per syllable. Nevertheless, all natural languages allow their speakers to efficiently encode and transmit information. We show here, using quantitative methods on a large cross-linguistic corpus of 17 languages, that the coupling between language-level (information per syllable) and speaker-level (speech rate) properties results in languages encoding similar information rates (~39 bits/s) despite wide differences in each property individually: Languages are more similar in information rates than in Shannon information or speech rate. These findings highlight the intimate feedback loops between languages’ structural properties and their speakers’ neurocognition and biology under communicative pressures. Thus, language is the product of a multiscale communicative niche construction process at the intersection of biology, environment, and culture.
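The quantity at the center of the paper is just the product of the two properties: information rate (bits/s) = information density (bits/syllable) × speech rate (syllables/s). A toy illustration with invented numbers — not the paper's data — shows how a fast, low-density language and a slow, high-density one can land on the same rate:

```python
# Illustrative arithmetic only -- the labels and numbers are invented,
# not taken from Coupé et al.:
#   information rate (bits/s) = bits per syllable * syllables per second
languages = {
    "fast, low-density":  {"bits_per_syllable": 5.0, "syllables_per_sec": 7.8},
    "slow, high-density": {"bits_per_syllable": 7.8, "syllables_per_sec": 5.0},
}
for name, v in languages.items():
    rate = v["bits_per_syllable"] * v["syllables_per_sec"]
    print(f"{name}: {rate:.1f} bits/s")   # both land at 39.0 bits/s
```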

Read the rest of this entry »

Comments (20)

Where the magic happens

From today's SMBC, an idea about AI that's obvious in retrospect but seems to be new:

Read the rest of this entry »

Comments (24)

The Voder — and "emotion"

There was an interesting story yesterday on NPR's All Things Considered, "How We Hear Our Own Voice Shapes How We See Ourselves And How Others See Us". Shankar Vedantam starts with the case of a woman whose voice was altered because her larynx was accidentally damaged during an operation, leading to a change in her personality. And then it segues into an 80-year-old crowd pleaser, the Voder:

All the way back in 1939, Homer Dudley unveiled an organ-like machine he called the "Voder". It worked using special keys and a foot pedal, and it fascinated people at the World's Fair in New York.

Helen, will you have the Voder say 'She saw me'?

She … saw … me

"That sounded awfully flat. How about a little expression? Say the sentence in answer to these questions.

Q: Who saw you?
A: SHE saw me.
Q: Whom did she see?
A: She saw ME.
Q: Well did she see you or hear you?
A: She SAW me.

Read the rest of this entry »

Comments (3)

"Douchey uses of AI"

The book for this year's Penn Reading Project is Cathy O'Neil's Weapons of Math Destruction. From the PRP's description:

We live in the age of the algorithm. Increasingly, the decisions that affect our lives—where we go to school, whether we get a car loan, how much we pay for health insurance—are being made not by humans but by mathematical models. In theory, this should lead to greater fairness: everyone is judged according to the same rules, and bias is eliminated.

But as Cathy O’Neil reveals in this urgent and necessary book, the opposite is true.

I've been seeing lots of resonances of this concern elsewhere in popular culture, for example this recent SMBC, which focuses on the deflection of responsibility:

Read the rest of this entry »

Comments (17)

Emotion detection

Taylor Telford, "‘Emotion detection’ AI is a $20 billion industry. New research says it can’t do what it claims", WaPo 7/31/2019:

In just a handful of years, the business of emotion detection — using artificial intelligence to identify how people are feeling — has moved beyond the stuff of science fiction to a $20 billion industry. Companies such as IBM and Microsoft tout software that can analyze facial expressions and match them to certain emotions, a would-be superpower that companies could use to tell how customers respond to a new product or how a job candidate is feeling during an interview. But a far-reaching review of emotion research finds that the science underlying these technologies is deeply flawed.

The problem? You can’t reliably judge how someone feels from what their face is doing.

A group of scientists brought together by the Association for Psychological Science spent two years exploring this idea. After reviewing more than 1,000 studies, the five researchers concluded that the relationship between facial expression and emotion is nebulous, convoluted and far from universal.

Read the rest of this entry »

Comments (7)

ERNIE's here — is OSCAR next?

In "Contextualized Muppet Embeddings" (2/13/2019) I noted the advent of ELMo ("Embeddings from Language Models") and BERT ("Bidirectional Encoder Representations from Transformers"), and predicted ERNiE, GRoVEr, KERMiT, …

I'm happy to say that the first of these predictions has come true:

"Baidu’s ERNIE 2.0 Beats BERT and XLNet on NLP Benchmarks", Synced 7/30/2019
"Baidu unveils ERNIE 2.0 natural language framework in Chinese and English", VentureBeat 7/30/2019

Actually I'm late reporting this, since ERNIE 1.0 came out in March:

"Baidu’s ERNIE Tops Google’s BERT in Chinese NLP Tasks", Synced 3/25/2019

But I'm ashamed to say that the Open System for Classifying Ambiguous Reference (OSCAR) is still just an idea, though I did recruit a collaborator who agreed in principle to work with me on it.


Comments (7)

Google needs to learn to read :-)…

John Lawler writes:

I recently had reason to ask the following question of Google:

https://www.google.com/search?q=how+many+words+does+the+average+person+say+in+a+day

and the result turned up an old LL post, which is great, except the selection algorithm picked the wrong number as the answer, and even quoted the post you were complaining about as if it were true.

This should probably be brought to someone's attention, but it seems, what with the vast amounts of irony, hyperbole, bullshit, lying, and fact-checking on the net, this is not an isolated problem.

Read the rest of this entry »

Comments (14)

Blizzard Challenge: Appeal for volunteer judges

From Zhizheng Wu:

We are pleased to announce that the Blizzard Challenge 2019 listening test is now live. The paid listening tests have been running at the University of Edinburgh for two weeks and will finish by 19th July. We need your help as a listener, and to help us recruit more listeners.

Speech experts (you decide if you are one! Native speakers only please!)

http://3.16.124.227/register_expert.html

Everyone else:

http://3.16.124.227/register_volunteer.html

The test takes around 45 minutes. You can do it over several sessions, if you prefer.

Please distribute the above URL as widely as possible, such as on your institutional or national mailing lists, or to your students.

Update: Sorry about the lack of guidance on the fact that the synthesis is all in Chinese!  I'm traveling, with somewhat erratic internet, and took a few minutes off from packing to post the appeal without checking it out — apologies again.


Comments (6)

Cumulative syllable-scale power spectra

Babies start making speech-like vocalizations long before they start to produce recognizable words — various stages of these sounds are variously described as cries, grunts, coos, goos, yells, growls, squeals, and "reduplicated" or "variegated" babbling. Developmental progress is marked by variable mixtures of variable versions of these noises, and their analysis may provide early evidence of later problems. But acoustic-phonetic analysis of infant vocalizations is hindered by the fact that many sounds (and sound-sequences) straddle category boundaries. And even for clear instances of "canonical babbling", annotators often disagree on syllable counts, making rate estimation difficult.

In "Towards automated babble metrics" (5/26/2019), I toyed with the idea that an antique work on instrumental phonetics — Potter, Koop and Green's 1947 book Visible Speech — might have suggested a partial solution:

By recording speech in such a way that its energy envelope only is reproduced, it is possible to learn something about the effects of recurrences such as occur in the recital of rimes or poetry. In one form of portrayal, the rectified speech envelope wave is speeded up one hundred times and translated to sound pattern form as if it were an audible note.
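A present-day version of that trick is easy to sketch: full-wave rectify the waveform, low-pass filter to get the amplitude envelope, and examine the envelope's power spectrum in the syllable-rate range. The file name, the 30 Hz cutoff, and the 2–10 Hz band below are assumptions for illustration, not values from the post:

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch
from scipy.io import wavfile

# Rough modern analogue of the "rectified speech envelope" idea:
# rectify, low-pass to get the amplitude envelope, then look at the
# envelope's power spectrum around syllable rate (~2-10 Hz).
fs, x = wavfile.read("babble.wav")   # illustrative file name
x = x.astype(float)
if x.ndim > 1:                       # mix to mono if needed
    x = x.mean(axis=1)

b, a = butter(4, 30 / (fs / 2), btype="low")   # 30 Hz envelope cutoff (assumed)
envelope = filtfilt(b, a, np.abs(x))

freqs, power = welch(envelope, fs=fs, nperseg=int(fs * 2))   # ~0.5 Hz resolution
band = (freqs >= 2) & (freqs <= 10)
peak_hz = freqs[band][np.argmax(power[band])]
print(f"dominant envelope modulation near {peak_hz:.1f} Hz")
```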

Read the rest of this entry »

Comments (4)

Obituary: Petr Sgall (1926-2019)

Professor Emeritus Petr Sgall, professor of Indo-European, Czech studies, and general linguistics at Charles University in Prague, passed away on May 28, 2019 in Prague, the day after his 93rd birthday.

Over a lifetime of distinguished work in theoretical, mathematical and computational linguistics, he did more than any other single person to keep the Prague School linguistic tradition alive and dynamically flourishing. He was the founder of mathematical and computational linguistics in the Czech Republic, and the principal developer of the Praguian theory of Functional Generative Description as a framework for the formal description of language, which has been applied primarily to Czech, but also to English and in typological studies of a range of languages.

Read the rest of this entry »

Comments (5)