Language Log

Who's the sponsor?

December 10, 2018 @ 7:00 pm· Filed by Mark Liberman under Ambiguity, Computational linguistics

A few weeks ago I attended the last afternoon of Scale By The Bay 2018 ("So much for Big Data", 11/18/2018), and as a result, this arrived today by email:

We had a blast at Scale by the Bay. We hope you did, too. As a sponsor, the organizer has shared your email with us. If you would like to receive messages from Xxxxxxxxx, please opt-in to our mailing list.

Read the rest of this entry »

Permalink Comments (6)

The literary Turing Test

November 8, 2018 @ 2:30 pm· Filed by Mark Liberman under Computational linguistics, Psychology of language

Yesterday's SMBC:

Read the rest of this entry »

Permalink Comments (12)

"Human parity" in machine translation

November 6, 2018 @ 7:51 am· Filed by Mark Liberman under Computational linguistics

In May of 2015, I gave a talk at the Centre Cournot in Paris on the topic "Why Human Language Technology (almost) works", starting with a list of notable successes, including how well Google and Bing on-line translation did on the Centre Cournot's web site. But my theme required a few failures as well, and I found a spectacular set of examples when I tried a chapter-opening from a roman policier that I was reading (Yasmina Khadra, Le Dingue au Bistouri):

Il y a quatre choses que je déteste. Un: qu'on boive dans mon verre. Deux: qu'on se mouche dans un restaurant. Trois: qu'on me pose un lapin.

Google Translate: There are four things I hate. A: we drink in my glass. Two: we will fly in a restaurant. Three: I get asked a rabbit.

Bing Translate: There are four things that I hate. One: that one drink in my glass. Two: what we fly in a restaurant. Three: only asked me a rabbit.

Should be: There are four things I hate. One: that somebody drinks from my glass. Two: that somebody blows their nose in a restaurant. Three: that somebody stands me up.

Read the rest of this entry »

Permalink Comments (39)

Autoresponses

November 4, 2018 @ 1:39 pm· Filed by Mark Liberman under Computational linguistics

SMBC on the future of helpful gmail, a few days ago:

Read the rest of this entry »

Permalink Comments (4)

LRNLP 2018

August 18, 2018 @ 5:45 am· Filed by Mark Liberman under Computational linguistics, Phonetics and phonology

On Monday, I'm pursuing the quixotic enterprise of talking to an NLP workshop about phonetics.

LRNLP ("Language Resources for NLP") 2018 is a workshop associated with COLING 2018 in Santa Fe NM. My abstract:

Semi-automatic analysis of digital speech collections is transforming the science of phonetics, and offers interesting opportunities to researchers in other fields. Convenient search and analysis of large published bodies of recordings, transcripts, metadata, and annotations – as much as three or four orders of magnitude larger than a few decades ago – has created a trend towards “corpus phonetics,” whose benefits include greatly increased researcher productivity, better coverage of variation in speech patterns, and essential support for reproducibility.

The results of this work include insight into theoretical questions at all levels of linguistic analysis, as well as applications in fields as diverse as psychology, sociology, medicine, and poetics, as well as within phonetics itself. Crucially, analytic inputs include annotation or categorization of speech recordings along many dimensions, from words and phrase structures to discourse structures, speaker attitudes, speaker demographics, and speech styles. Among the many near-term opportunities in this area we can single out the possibility of improving parsing algorithms by incorporating features from speech as well as text.

Due to semester-initial commitments at Penn, I won't be able to stay for COLING, but I'm looking forward to an interesting day of presentations at the workshop.

Permalink Comments (2)

"Yeah day go, baby"

June 14, 2018 @ 7:15 am· Filed by Mark Liberman under Computational linguistics

Yesterday, while I was sitting in an interesting session at Speech Prosody 2018, I got a phone call that I didn't answer. The caller left a message that Google Voice transcribed this way:

Lowell is an installer sensor Grace call me. I'll pick it up. That was a break was thinking. Because you had to go to work this morning around, you know, my exact maybe go back to take the brake light. As you said you didn't feel quite right still cyber, even though I was still wearing the back. I might have something. Bye. What thank God. This f****** f*** m*********** train my f****** bank account. What I see your ex. What's your phone number? Yeah day go, baby. Does it have that switch that maybe that's what size over at light source? I'm open. Another f*****. I know what that's like I recognize. Yeah, I was.

Read the rest of this entry »

Permalink Comments (15)

AI Cyrano

June 2, 2018 @ 7:21 am· Filed by Mark Liberman under Computational linguistics, Linguistics in the comics

A recent SMBC:

Read the rest of this entry »

Permalink Comments (2)

World disfluencies

May 16, 2018 @ 6:42 am· Filed by Mark Liberman under Computational linguistics, Psychology of language, Speech technology

Disfluency has been in the news recently, for two reasons: the deployment of filled pauses in an automated conversation by Google Duplex, and a cross-linguistic study of "slowing down" in speech production before nouns vs. verbs.

Lance Ulanoff, "Did Google Duplex just pass the Turing Test?", Medium 5/8/2018:

I think it was the first “Um.” That was the moment when I realized I was hearing something extraordinary: A computer carrying out a completely natural and very human-sounding conversation with a real person. And it wasn’t just a random talk. […]

Duplex made the call and, when someone at the salon picked up, the voice AI started the conversation with: “Hi, I’m calling to book a woman’s hair cut appointment for a client, um, I’m looking for something on May third?”

Frank Seifart et al., "Nouns slow down speech: evidence from structurally and culturally diverse languages", PNAS 2018:

When we speak, we unconsciously pronounce some words more slowly than others and sometimes pause. Such slowdown effects provide key evidence for human cognitive processes, reflecting increased planning load in speech production. Here, we study naturalistic speech from linguistically and culturally diverse populations from around the world. We show a robust tendency for slower speech before nouns as compared with verbs. Even though verbs may be more complex than nouns, nouns thus appear to require more planning, probably due to the new information they usually represent. This finding points to strong universals in how humans process language and manage referential information when communicating linguistically.

Read the rest of this entry »

Permalink Comments (12)

All problems are not solved

April 30, 2018 @ 7:58 am· Filed by Mark Liberman under Computational linguistics

There's an impression among some people that "deep learning" has brought computer algorithms to the point where there's nothing left to do but to work out the details of further applications. This reminds me of what has been described as Ludwig Wittgenstein's belief in the early 1920s that the development of formal logic and the "picture theory" of meaning in his Tractatus Logico-Philosophicus reduced the elucidation (or dissolution) of all philosophical questions to a sort of clerical procedure.

Several recent articles, in different ways, call into question this modern view that Deep Learning (i.e. complex networks of linear algebra with interspersed point nonlinearities, whose millions or billions of parameters are automatically learned from digital examples) is a philosopher's stone whose application solves all algorithmic problems. Two among many others: Gary Marcus, "Deep Learning: A Critical Appraisal", arXiv.org 1/2/2018; Michael Jordan, "Artificial Intelligence — The Revolution Hasn’t Happened Yet", Medium 4/19/2018.

And two upcoming talks describe some of the remaining problems in speech and language technology.

Read the rest of this entry »

Permalink Comments (9)

DIHARD again

April 14, 2018 @ 10:46 am· Filed by Mark Liberman under Computational linguistics

The First DIHARD Speech Diarization Challenge has results!

"Diarization" is a bit of technical jargon for "figuring out who spoke when". You can read more (than you probably want to know) about the DIHARD challenge from the earlier LLOG post ("DIHARD" 2/13/2018) the DIHARD overview page, the DIHARD data description page, our ICASSP 2018 paper, etc.

This morning's post presents some evidence from the DIHARD results showing, unsurprisingly, that current algorithms have a systematically higher error rate with shorter speech segments than with longer ones. Here's an illustrative figure:

For an explanation, read on.

Read the rest of this entry »

Permalink Comments (5)

Oxford-NINJAL Corpus of Old Japanese

April 11, 2018 @ 7:33 pm· Filed by Victor Mair under Announcements, Computational linguistics

From Bjarke Frellesvig (University of Oxford), Stephen Wright Horn (NINJAL), and Toshinobu Ogiso (NINJAL):

[VHM: NINJAL = National Institute for Japanese Language and Linguistics]

We are very pleased to announce the first public release of the
Oxford-NINJAL Corpus of Old Japanese (ONCOJ). We will be grateful if you
would circulate and share this information as appropriate.

The corpus is avallable through this website: http://oncoj.ninjal.ac.jp/

Read the rest of this entry »

Permalink Comments (4)

Alexa laughs

March 8, 2018 @ 11:11 am· Filed by Mark Liberman under Computational linguistics, Elephant semifics

Now that speech technology is good enough that voice interaction with devices is becoming widespread and routine, success has created a new problem: How should a device tell when to attend to ambient sounds and try to interpret them as questions or commands?

One solution is to require a mouse click or a finger press to start things off — but this can degrade the whole "ever-attentive servant" experience. So increasingly such systems rely on a key phrase like "Hey Siri" or "OK Google" or "Alexa". But this solution brings up other problems, since users don't like the idea of their entire life's soundtrack streaming to Apple or Google or Amazon. And anyhow, streaming everything to the Mother Ship might strain battery life and network bandwidth for some devices. The answer: Create simple, low-power device-local programs that do nothing but monitor ambient audio for the relevant magic phrase.

Problem: these programs aren't yet very good. Result: lots of false positives. Mostly the false positives are relatively benign — see e.g. "Annals of helpful surveillance", 5/9/2017. But recently, many people have been creeped out by Alexa laughing at them, apparently for no reason: