Archive for Computational linguistics

"Yeah day go, baby"

Yesterday, while I was sitting in an interesting session at Speech Prosody 2018, I got a phone call that I didn't answer. The caller left a message that Google Voice transcribed this way:

Lowell is an installer sensor Grace call me. I'll pick it up. That was a break was thinking. Because you had to go to work this morning around, you know, my exact maybe go back to take the brake light. As you said you didn't feel quite right still cyber, even though I was still wearing the back. I might have something. Bye. What thank God. This f****** f*** m*********** train my f****** bank account. What I see your ex. What's your phone number? Yeah day go, baby. Does it have that switch that maybe that's what size over at light source? I'm open. Another f*****. I know what that's like I recognize. Yeah, I was.

Read the rest of this entry »

Comments (15)

AI Cyrano

Comments (2)

World disfluencies

Disfluency has been in the news recently, for two reasons: the deployment of filled pauses in an automated conversation by Google Duplex, and a cross-linguistic study of "slowing down" in speech production before nouns vs. verbs.

Lance Ulanoff, "Did Google Duplex just pass the Turing Test?", Medium 5/8/2018:

I think it was the first “Um.” That was the moment when I realized I was hearing something extraordinary: A computer carrying out a completely natural and very human-sounding conversation with a real person. And it wasn’t just a random talk. […]

Duplex made the call and, when someone at the salon picked up, the voice AI started the conversation with: “Hi, I’m calling to book a woman’s hair cut appointment for a client, um, I’m looking for something on May third?”

Frank Seifart et al., "Nouns slow down speech: evidence from structurally and culturally diverse languages", PNAS 2018:

When we speak, we unconsciously pronounce some words more slowly than others and sometimes pause. Such slowdown effects provide key evidence for human cognitive processes, reflecting increased planning load in speech production. Here, we study naturalistic speech from linguistically and culturally diverse populations from around the world. We show a robust tendency for slower speech before nouns as compared with verbs. Even though verbs may be more complex than nouns, nouns thus appear to require more planning, probably due to the new information they usually represent. This finding points to strong universals in how humans process language and manage referential information when communicating linguistically.

Read the rest of this entry »

Comments (12)

All problems are not solved

There's an impression among some people that "deep learning" has brought computer algorithms to the point where there's nothing left to do but to work out the details of further applications. This reminds me of what has been described as Ludwig Wittgenstein's belief in the early 1920s that the development of formal logic and the "picture theory" of meaning in his Tractatus Logico-Philosophicus reduced the elucidation (or dissolution) of all philosophical questions to a sort of clerical procedure.

Several recent articles, in different ways, call into question this modern view that Deep Learning (i.e. complex networks of linear algebra with interspersed point nonlinearities, whose millions or billions of parameters are automatically learned from digital examples) is a philosopher's stone whose application solves all algorithmic problems. Two among many others: Gary Marcus, "Deep Learning: A Critical Appraisal", arXiv.org 1/2/2018; Michael Jordan, "Artificial Intelligence — The Revolution Hasn’t Happened Yet", Medium 4/19/2018.

And two upcoming talks describe some of the remaining problems in speech and language technology.
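
As a concrete gloss on the parenthetical description above ("networks of linear algebra with interspersed point nonlinearities"), here is a minimal pure-Python sketch of the forward pass of such a network. The layer sizes, the choice of ReLU as the nonlinearity, and all function names are illustrative assumptions, and the parameters here are random rather than learned from examples.

```python
import random

def affine(weights, bias, x):
    """Linear-algebra step: W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(v):
    """Point nonlinearity: applied to each element independently."""
    return [max(0.0, a) for a in v]

def forward(layers, x):
    """Alternate affine maps and point nonlinearities; the last layer stays linear."""
    for i, (weights, bias) in enumerate(layers):
        x = affine(weights, bias, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x

def random_layer(n_in, n_out, rng):
    """Hypothetical initializer: random weights, zero biases (nothing is trained here)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def count_parameters(layers):
    """Total number of weights and biases across all layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

rng = random.Random(0)
layers = [random_layer(4, 8, rng), random_layer(8, 3, rng)]
output = forward(layers, [0.5, -1.0, 2.0, 0.0])
n_params = count_parameters(layers)  # 4*8 + 8 + 8*3 + 3 = 67
```

This toy has 67 parameters; the systems under discussion have the same basic shape with millions or billions of them, set by an optimization procedure rather than drawn at random.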

Read the rest of this entry »

Comments (9)

DIHARD again

The First DIHARD Speech Diarization Challenge has results!

"Diarization" is a bit of technical jargon for "figuring out who spoke when". You can read more (than you probably want to know) about the DIHARD challenge in the earlier LLOG post ("DIHARD", 2/13/2018), the DIHARD overview page, the DIHARD data description page, our ICASSP 2018 paper, etc.

This morning's post presents some evidence from the DIHARD results showing, unsurprisingly, that current algorithms have a systematically higher error rate with shorter speech segments than with longer ones. Here's an illustrative figure:

For an explanation, read on.

Read the rest of this entry »

Comments (5)

Oxford-NINJAL Corpus of Old Japanese

From Bjarke Frellesvig (University of Oxford), Stephen Wright Horn (NINJAL), and Toshinobu Ogiso (NINJAL):

[VHM: NINJAL = National Institute for Japanese Language and Linguistics]

We are very pleased to announce the first public release of the
Oxford-NINJAL Corpus of Old Japanese (ONCOJ). We will be grateful if you
would circulate and share this information as appropriate.

The corpus is available through this website: http://oncoj.ninjal.ac.jp/

Read the rest of this entry »

Comments (4)

Alexa laughs

Now that speech technology is good enough that voice interaction with devices is becoming widespread and routine, success has created a new problem: How should a device tell when to attend to ambient sounds and try to interpret them as questions or commands?

One solution is to require a mouse click or a finger press to start things off — but this can degrade the whole "ever-attentive servant" experience. So increasingly such systems rely on a key phrase like "Hey Siri" or "OK Google" or "Alexa". But this solution brings up other problems, since users don't like the idea of their entire life's soundtrack streaming to Apple or Google or Amazon. And anyhow, streaming everything to the Mother Ship might strain battery life and network bandwidth for some devices. The answer: Create simple, low-power device-local programs that do nothing but monitor ambient audio for the relevant magic phrase.
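
The gating logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-frame scores stand in for the output of some hypothetical on-device detector, and the `threshold` and `listen_frames` values are made up, not taken from any real product.

```python
def gate_audio(frame_scores, threshold=0.8, listen_frames=3):
    """Return the indices of audio frames that would be sent upstream.

    frame_scores: wake-phrase scores in [0, 1], one per frame, from a
    hypothetical device-local detector. Nothing is streamed until a score
    reaches the threshold; the next `listen_frames` frames are then passed on.
    """
    streamed = []
    remaining = 0
    for i, score in enumerate(frame_scores):
        if remaining > 0:
            # The device is "awake": forward this frame for full recognition.
            streamed.append(i)
            remaining -= 1
        elif score >= threshold:
            # Wake phrase detected (rightly or wrongly): start listening.
            remaining = listen_frames
    return streamed

scores = [0.1, 0.2, 0.9, 0.3, 0.1, 0.2, 0.1, 0.85, 0.4]
sent = gate_audio(scores)  # frames 3, 4, 5 and 8 are streamed
```

Any frame whose score crosses the threshold by accident opens the gate just as a real wake phrase would, which is where false positives come from.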

Problem: these programs aren't yet very good. Result: lots of false positives. Mostly the false positives are relatively benign — see e.g. "Annals of helpful surveillance", 5/9/2017. But recently, many people have been creeped out by Alexa laughing at them, apparently for no reason:

Read the rest of this entry »

Comments (21)

ASR error joke of the week

I suspect that this is just as unfair as the old ASR elevator in Scotland skit was, but I don't have time to try it out.

 

Comments (7)

Hearing interactions

Listen to this 3-second audio clip, and think about what you hear:

Read the rest of this entry »

Comments (28)

Flip Donkey Doodleplunk?

Barton Beebe & Jeanne Fromer, "Are We Running Out of Trademarks? An Empirical Study of Trademark Depletion and Congestion", Harvard Law Review, February 2018:

Abstract: American trademark law has long operated on the assumption that there exists an inexhaustible supply of unclaimed trademarks that are at least as competitively effective as those already claimed.  This core empirical assumption underpins nearly every aspect of trademark law and policy.  This Article presents empirical evidence showing that this conventional wisdom is wrong. The supply of competitively effective trademarks is, in fact, exhaustible and has already reached severe levels of what we term trademark depletion and trademark congestion. We systematically study all 6.7 million trademark applications filed at the U.S. Patent and Trademark Office (PTO) from 1985 through 2016 together with the 300,000 trademarks already registered at the PTO as of 1985.  We analyze these data in light of the most frequently used words and syllables in American English, the most frequently occurring surnames in the United States, and an original dataset consisting of phonetic representations of each applied-for or registered word mark included in the PTO’s Trademark Case Files Dataset. We further incorporate data consisting of all 128 million domain names registered in the .com top-level domain and an original dataset of all 2.1 million trademark office actions issued by the PTO from 2003 through 2016. These data show that rates of word-mark depletion and congestion are increasing and have reached chronic levels, particularly in certain important economic sectors.  The data further show that new trademark applicants are increasingly being forced to resort to second-best, less competitively effective marks.  Yet registration refusal rates continue to rise.  The result is that the ecology of the trademark system is breaking down, with mounting barriers to entry, increasing consumer search costs, and an eroding public domain. 
In light of our empirical findings, we propose a mix of reforms to trademark law that will help to preserve the proper functioning of the trademark system and further its core purposes of promoting competition and enhancing consumer welfare.

Read the rest of this entry »

Comments (21)

Alexa disguises her name?

"Alexa Loses Her Voice" won USA Today's Super Bowl Ad Meter:

I believe that this was also the first Super Bowl ad to raise a technical question about speech technology.

Read the rest of this entry »

Comments (9)

Adversarial attacks on modern speech-to-text

Generating adversarial STT examples.

In a recent post on this blog, Mark Liberman raised the lively topic of so-called "adversarial" attacks on modern machine learning systems. These attacks can do amusing and somewhat frightening things, such as forcing an object recognition algorithm to identify all images as toasters with remarkably high confidence. Having seen these attacks applied to image recognition, he hypothesized that they could also be applied to modern speech recognition (STT, or speech-to-text) based on e.g. deep learning. His hypothesis has indeed been recently confirmed.
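
To give a feel for the general idea, here is a toy Python sketch of a gradient-sign perturbation against a linear classifier. It is not a speech recognizer and not the method used in the work referred to above; the weights, the input, and the `epsilon` budget are all invented for illustration.

```python
def score(weights, bias, x):
    """Linear classifier: a positive score means class A, a negative one class B."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def sign(v):
    """Return 1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def adversarial_example(weights, x, epsilon, push_positive=True):
    """Move each input coordinate by at most epsilon in the direction that
    changes the score fastest. For a linear score, the gradient with respect
    to x is simply the weight vector."""
    direction = 1 if push_positive else -1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]

weights = [0.5, -1.0, 0.25, 2.0]   # hypothetical trained weights
bias = -0.1
x = [0.2, 0.4, -0.4, 0.1]          # hypothetical input features

original = score(weights, bias, x)              # negative: class B
x_adv = adversarial_example(weights, x, 0.1)    # each coordinate moves by 0.1
perturbed = score(weights, bias, x_adv)         # positive: class A
```

The score shifts by `epsilon` times the sum of the absolute weights, so a small change to every coordinate is enough to flip the decision. With high-dimensional inputs such as images or audio there are many coordinates to nudge, which is one reason small perturbations can have large effects.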

Read the rest of this entry »

Comments (7)

Ross Macdonald: lexical diversity over the lifespan

This post is an initial progress report on some joint work with Mark Liberman. It's part of a larger effort to replicate and extend Xuan Le, Ian Lancashire, Graeme Hirst, & Regina Jokel, "Longitudinal detection of dementia through lexical and syntactic changes in writing: a case study of three British novelists", Literary and Linguistic Computing 2011. Their abstract:

We present a large-scale longitudinal study of lexical and syntactic changes in language in Alzheimer's disease using complete, fully parsed texts and a large number of measures, using as our subjects the British novelists Iris Murdoch (who died with Alzheimer's), Agatha Christie (who was suspected of it), and P.D. James (who has aged healthily). […] Our results support the hypothesis that signs of dementia can be found in diachronic analyses of patients’ writings, and in addition lead to new understanding of the work of the individual authors whom we studied. In particular, we show that it is probable that Agatha Christie indeed suffered from the onset of Alzheimer's while writing her last novels, and that Iris Murdoch exhibited a ‘trough’ of relatively impoverished vocabulary and syntax in her writing in her late 40s and 50s that presaged her later dementia.
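
For readers unfamiliar with lexical diversity measures, here is a minimal Python sketch of the simplest one, the type-token ratio, along with a fixed-window variant that reduces its dependence on text length. These are standard textbook measures offered as an assumption about the kind of quantity involved; they are not necessarily the measures used by Le et al. or in this post, and the tokenizer is deliberately crude.

```python
import re

def tokenize(text):
    """Crude tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def windowed_ttr(tokens, window):
    """Mean type-token ratio over consecutive non-overlapping windows.

    Raw TTR falls as texts get longer, so comparing novels of different
    lengths calls for a fixed window. A trailing partial window is dropped.
    """
    chunks = [tokens[i:i + window]
              for i in range(0, len(tokens) - window + 1, window)]
    if not chunks:
        return type_token_ratio(tokens)
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)

tokens = tokenize("The cat saw the dog, and the dog saw the cat.")
ttr = type_token_ratio(tokens)        # 5 types over 11 tokens
ttr_windowed = windowed_ttr(tokens, 5)
```

Applied to each of an author's books in order of composition, a measure of this kind yields one number per book that can be plotted against the author's age.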

Read the rest of this entry »

Comments (11)