Language Log

Archive for Speech technology

Yanny vs. Laurel: an analysis by Benjamin Munson

May 16, 2018 @ 10:20 am· Filed by Ben Zimmer under Language on the internets, Phonetics and phonology, Psycholinguistics, Speech technology

A peculiar audio clip has turned into a viral sensation, the acoustic equivalent of "the dress" — which, you'll recall, was either white and gold or blue and black, depending on your point of view. This time around, the dividing line is between "Yanny" and "Laurel."

What do you hear?! Yanny or Laurel pic.twitter.com/jvHhCbMc8I

— Cloe Feldman (@CloeCouture) May 15, 2018

The Yanny vs. Laurel perceptual puzzle has been fiercely debated (see coverage in the New York Times, the Atlantic, Vox, and CNET, for starters). Various linguists have chimed in on social media (notably, Suzy J. Styles and Rory Turnbull on Twitter). On Facebook, the University of Minnesota's Benjamin Munson shared a cogent analysis that he provided to an inquiring reporter, and he has graciously agreed to have an expanded version of his explainer published here as a guest post.

World disfluencies

May 16, 2018 @ 6:42 am· Filed by Mark Liberman under Computational linguistics, Psychology of language, Speech technology

Disfluency has been in the news recently, for two reasons: the deployment of filled pauses in an automated conversation by Google Duplex, and a cross-linguistic study of "slowing down" in speech production before nouns vs. verbs.

Lance Ulanoff, "Did Google Duplex just pass the Turing Test?", Medium 5/8/2018:

I think it was the first “Um.” That was the moment when I realized I was hearing something extraordinary: A computer carrying out a completely natural and very human-sounding conversation with a real person. And it wasn’t just a random talk. […]

Duplex made the call and, when someone at the salon picked up, the voice AI started the conversation with: “Hi, I’m calling to book a woman’s hair cut appointment for a client, um, I’m looking for something on May third?”

Frank Seifart et al., "Nouns slow down speech: evidence from structurally and culturally diverse languages", PNAS 2018:

When we speak, we unconsciously pronounce some words more slowly than others and sometimes pause. Such slowdown effects provide key evidence for human cognitive processes, reflecting increased planning load in speech production. Here, we study naturalistic speech from linguistically and culturally diverse populations from around the world. We show a robust tendency for slower speech before nouns as compared with verbs. Even though verbs may be more complex than nouns, nouns thus appear to require more planning, probably due to the new information they usually represent. This finding points to strong universals in how humans process language and manage referential information when communicating linguistically.

An experiment with echoing Echos

February 22, 2018 @ 4:20 pm· Filed by Ben Zimmer under Language and art, Language and technology, Speech technology

Henry Cooke (aka "prehensile" on GitHub) has hatched a fascinating techno-artistic experiment. He set up two Amazon Echos to talk back and forth, each repeating a text to the other, with every iteration introducing new errors. His initial inspiration was "I Am Sitting in a Room," a 1969 work of acoustic art by Alvin Lucier, in which a text is recorded and re-recorded until all that is left is the hum of resonant frequencies in the room. (You can watch a 2014 performance with Lucier here.) Rather than replicate Lucier's text, Cooke created new ones for the two Echos to vocalize, with an added wrinkle: iterations of the texts follow the Oulipo S+7 constraint, in which each noun is replaced by the seventh noun following it in a dictionary. You can see the first ten iterations (using Amazon Polly to synthesize different voices) in this video.
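The S+7 substitution itself is simple enough to sketch. This is a toy illustration, not Cooke's code: the ten-noun "dictionary" is hypothetical, tokens are assumed to be lowercase already, and the list wraps around to the start when a shift runs off the end. Real S+7 work uses a full dictionary and has to decide what counts as a noun; the sketch sidesteps that by taking the noun list as given.

```python
def s_plus_n(words, nouns, n=7):
    """Replace each word that appears in `nouns` with the noun n entries
    later in alphabetical order, wrapping around at the end of the list.
    Words not in the noun list pass through unchanged."""
    lexicon = sorted(set(nouns))  # the toy "dictionary", alphabetized
    position = {noun: i for i, noun in enumerate(lexicon)}
    result = []
    for word in words:
        i = position.get(word)
        if i is None:
            result.append(word)  # not a known noun: keep it as-is
        else:
            result.append(lexicon[(i + n) % len(lexicon)])
    return result


# Hypothetical ten-noun dictionary, for illustration only.
NOUNS = ["apple", "bird", "cat", "dog", "egg", "fish", "goat", "hat", "ink", "jar"]

print(s_plus_n("the cat saw a dog".split(), NOUNS))
```

Here "cat" becomes "jar" and "dog" wraps around to "apple". Feeding the output back into `s_plus_n` would model an iterated version, under the assumption that each pass shifts every noun by another seven places.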

Skin motion

January 5, 2018 @ 1:43 pm· Filed by Mark Liberman under Speech technology

Nia Wesley, "Girl's tattoo of late grandma's voicemail can be played with iPhone":

A Chicago singer honored her late grandmother in a unique way.

Sakyrah Morris held on to a voicemail her grandmother sent her just a month before she passed away. She got a tattoo of the voicemail's exact sound waves.

Through technology with a company called Skin Motion, Morris can hold her iPhone camera over the tattoo and hear her grandmother's voice at any moment.

News program presenter meets robot avatar

December 31, 2017 @ 9:56 am· Filed by Geoffrey K. Pullum under Computational linguistics, Language and computers, Linguistics in the comics, Linguistics in the news, Speech technology

Yesterday, BBC Radio 4's program "Today", the cultural counterpart of NPR's "Morning Edition", invited into the studio a robot from the University of Sheffield, the Mishalbot, which had been trained to conduct interviews by exposure to the on-air speech of co-presenter Mishal Husain. They let it talk for three minutes with the real Mishal (video clip here, at least for UK readers; it may not be available in the US). Once again I was appalled at the credulity of journalists when confronted with AI. Despite all the evidence that the robot was just parroting Mishalesque phrases, Ms Husain continued with the absurd charade, pretending politely that her robotic alter ego was really conversing. Afterward there was half-serious on-air discussion of the possibility that some day the jobs of the Today program presenters and interviewers might be taken over by robots.

The main thing differentiating the Sheffield robot from Joseph Weizenbaum's ELIZA program of 1966 (apart from a babyish plastic face and movable fingers and eyes, which didn't work well on radio) was that the Mishalbot is voice-driven (with ELIZA you had to type on a terminal). So the main technological development has been in speech recognition engineering. On interaction, the Mishalbot seemed to me to be at sub-ELIZA level. "What do you mean? Can you give an example?" it said repeatedly, at various inappropriate points.
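The comparison is easier to see with ELIZA's basic mechanism in view: match a keyword pattern in the input, slot any captured text into a canned template, and fall back to a stock question when nothing matches. The rules below are hypothetical and far cruder than Weizenbaum's script (no pronoun swapping such as "my" to "your", no keyword ranking), and the fallback string simply reuses the Mishalbot line quoted above.

```python
import re

# Hypothetical keyword rules in the spirit of ELIZA: (pattern, response template).
RULES = [
    (re.compile(r"\bi am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
    (re.compile(r"\bbecause\b", re.IGNORECASE), "Is that the real reason?"),
]

# Stock reply used whenever no keyword rule fires.
FALLBACK = "What do you mean? Can you give an example?"


def respond(utterance: str) -> str:
    """Return the template of the first matching rule, filled with the
    captured text, or the fallback question if no rule matches."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return FALLBACK


print(respond("I am tired of robots"))
print(respond("The weather in Sheffield is lovely"))
```

The first call returns "Why do you say you are tired of robots?"; the second matches no rule and gets the fallback, which is roughly what a generic question asked "at various inappropriate points" looks like.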

Voice recognition for inputting

October 22, 2017 @ 7:25 pm· Filed by Victor Mair under Language and computers, Speech technology

When I'm with my sister Heidi, whether it be in Seattle or northeast Ohio or anywhere else in the world, she's often talking to Siri. She asks Siri to look up information about trees, about food, about traditional medicines, about yoga, about genealogy, and anything else she wants to investigate. Above all, when we're driving around, she asks Siri for directions to wherever we're going.

To me, someone who doesn't even own a cell phone, this is all quite miraculous. A few days ago, at the conclusion of my "Language, Script, and Society in China" class, however, a new (for me) dimension of voice recognition was demonstrated by one of the students.

Awesome / sugoi すごい!

October 12, 2017 @ 2:48 pm· Filed by Victor Mair under Information technology, Language and computers, Speech technology, Translation

From Diane Moderski:

Siri and flatulence

March 27, 2017 @ 5:16 am· Filed by Geoffrey K. Pullum under Humor, Information technology, Intelligibility, Language and food, Language and technology, Phonetics and phonology, Silliness, Speech technology, WTF

An acquaintance of mine has a new iPhone, which he carries in a pocket that is (relevantly) below waist level. He has discovered something that dramatically illustrates the difference between (i) responding to speech and (ii) responding to speech as humans do, on the basis of knowing that it is speech.

Another fake AI failure?

November 21, 2016 @ 7:13 am· Filed by Mark Liberman under Humor, Speech technology

The "silly AI doing something stupidly funny" trope is a powerful one, partly because people like to see the mighty cast down, and partly because the "silly stupid AI" stereotype is often valid.

But as with stereotypes of human groups, the most viral examples are often fakes. Take the "Voice Recognition Elevator" skit from a few years ago, which showed an automatic speech recognition (ASR) system that was baffled by a Scottish accent, epitomizing the plight of Scots trapped in a dehumanized world that doesn't understand them. But back in the real world, I found that when I played the YouTube version of the skit to the 2010 version of Google Voice on my cell phone, it correctly transcribed the whole thing.

And I suspect that the recent viral "tuba-to-text conversion" meme is another artful fraud.

Advances in tuba-to-text conversion

November 19, 2016 @ 7:13 pm· Filed by Ben Zimmer under Humor, Language and music, Speech technology

My dad accidentally texted me with voice recognition…while playing the tuba

(h/t Chris Waigl)

[Update: Mark Liberman suggests this might be some artful fakery. See: "Another fake AI failure?"]

Voice recognition for English and Mandarin typing

August 24, 2016 @ 1:13 pm· Filed by Victor Mair under Language and computers, Speech technology

In NPR's All Tech Considered (8/24/16), Aarti Shahani has an article titled "Voice Recognition Software Finally Beats Humans At Typing, Study Finds".

Turns out voice recognition software has improved to the point where it is significantly faster and more accurate at producing text on a mobile device than we are at typing on its keyboard. That's according to a new study by Stanford University, the University of Washington and Baidu, the Chinese Internet giant. The study ran tests in English and Mandarin Chinese.

Baidu chief scientist Andrew Ng says this should not feel like defeat. "Humanity was never designed to communicate by using our fingers to poke at a tiny little keyboard on a mobile phone. Speech has always been a much more natural way for humans to communicate with each other," he says.
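A claim like "faster and more accurate" rests on two numbers: entry speed and error rate. The sketch below shows one common way to compute each; it is an assumption that the Stanford/UW/Baidu study used exactly these definitions. Speed follows the text-entry convention of counting a "word" as five characters (spaces included), subtracting one character because timing usually starts at the first keystroke. Accuracy uses word error rate: the word-level edit distance between the intended and the produced text, divided by the number of intended words.

```python
def words_per_minute(transcribed: str, seconds: float) -> float:
    """Entry speed, treating every 5 characters (spaces included) as one word."""
    return (len(transcribed) - 1) / seconds * 60 / 5


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences
    (insertions, deletions and substitutions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance normalized by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


print(words_per_minute("hello world", 6.0))
print(word_error_rate("the quick brown fox", "the quick brown box"))
```

Entering "hello world" in six seconds works out to 20 words per minute, and a single wrong word out of four gives a word error rate of 0.25. Comparing speech and keyboard entry then amounts to computing both numbers for each condition over the same target phrases.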

The swazzle: a simple device for voice modulation

October 11, 2015 @ 1:04 pm· Filed by Victor Mair under Intelligibility, Speech technology, Style and register

Until two days ago, I had never heard of this word — even though I knew about Punch and Judy shows.

From Wikipedia:

A swazzle is a device made of two strips of metal bound around a cotton tape reed. The device is used to produce the distinctive harsh, rasping voice of Punch and is held in the mouth by the Professor (performer) in a Punch and Judy show.

Swazzle can also be pronounced or spelled Schwazzle or swatchel.

I like the fact that the performer is called "Professor"!

Swype and Voice Recognition for mobile device inputting

January 22, 2014 @ 2:14 pm· Filed by Victor Mair under Information technology, Language and computers, Language and technology, Speech technology, Writing systems

In late 2012, while visiting my son Tom in Dallas, I noticed that he was doing something very odd with his cell phone. Most people enter text into their cell phone by pressing their thumbs (or their fingertip) on the letters of a small keyboard, whether virtual or actual. But Tom was doing something altogether different: he was sliding his finger over the glass surface of his phone and somehow, by so doing, he was able to enter text. I was dumbfounded! What amazed me most of all was how casual he was about it. He'd be talking to me about something, then glance down at his cell phone, move his fingertip around on the glass, and — presto digito! — he'd have typed a message to someone and sent it off.
