Language Log

Speech-To-Text not quite perfect yet….

January 15, 2025 @ 12:55 pm· Filed by Mark Liberman under Artificial intelligence, Speech technology

Yesterday on YouTube, "Former White House chief strategist Steve Bannon sits down with Dasha Burns, POLITICO's White House bureau chief". At the end of the interview, there's a conventional exchange of thank-yous. From Dasha Burns:

All right Steve, I know you got a show to record,
thank you so much for- for beaming in here
and uh sorry for the technical difficulties everyone.
Steve thanks so much.

And Steve Bannon's response:

Dasha thank you,
and thank Politico for having me.

Unnatural audibles

March 8, 2024 @ 2:21 pm· Filed by Victor Mair under Artificial intelligence, Language and literature, Speech technology

I'm so far behind the times with gadgets and trinkets and services that I have never listened to a single audiobook, and I had never even heard of Audible until yesterday when Gene Hill told me that his wife, Marri, listens to tons of Audibles because she writes reviews, and as a result they give her lots of free stories to read. Of late, the publishers of Audibles are using narration by AI.

No way to overemphasize the importance of the quality of narration in an Audible. Marri most often prefers to have the author do the narration. Only the author knows how to express the precise emotional quality to a line. Or deliver the right touch of sarcasm.

The voices of GPS and Siri: not what you think they are

February 29, 2024 @ 8:13 pm· Filed by Victor Mair under Speech technology, Voice recognition

"Meet the Voice Behind Your GPS"

(2:40; 2/17/23)

More AI shenanigans

October 21, 2023 @ 5:55 am· Filed by Victor Mair under Artificial intelligence, Language and the law, Speech technology, Voice recognition

Since When Does Eric Adams Speak Spanish, Yiddish and Mandarin?

He doesn’t. But New York City is using artificial intelligence to send robocalls featuring the mayor’s voice in many languages.

By Emma G. Fitzsimmons and Jeffery C. Mays, NYT (Oct. 20, 2023)

The calls to New Yorkers have a familiar ring to them. They all sound like Mayor Eric Adams — only in Spanish. Or Yiddish. Or Mandarin.

Has the mayor been taking language lessons?

The answer is no, and the truth is slightly more expensive and, in the eyes of privacy experts, far more worrisome.

The mayor is using artificial intelligence to reach New Yorkers through robocalls in a number of languages. The calls encourage people to apply for jobs in city government or to attend community events like concerts.

“I walk around sometimes and people turn around and say, ‘I just know that voice. That voice is so comforting. I enjoy hearing your voice,’” the mayor said at a recent news conference. “Now they’re able to hear my voice in their language.”

New York City’s embrace of the technology came this week as Mr. Adams announced a 50-page “action plan” for artificial intelligence — an effort to “strike a critical balance in the global A.I. conversation,” he said, by embracing its benefits while protecting New Yorkers from its pitfalls.

Our Lady of the Highway: A linguistic mystery

September 8, 2022 @ 12:19 pm· Filed by Mark Liberman under Computational linguistics, Speech technology

Current text-to-speech systems are pretty good. Their output is almost always comprehensible, and often pretty natural-sounding. But there are still glitches.

This morning, Dick Margulis sent an example of one common problem: inconsistent (and often wrong) stressing of complex nominals:

We have a winding road that we drive with our Google Maps navigator on, to keep us from taking a wrong turn in the woods. We have noticed that "West Woods Road" is rendered with a few different stress patterns as we go from turn to turn, and we can't come up with a hypothesis explaining the variation. Attached is a recording. It's a few minutes long because that's how long the trip takes. The background hum is the car.

I've extracted and concatenated the 11 Google Maps instructions from the four minutes and five seconds of the attached recording:

Green needle vs. brainstorm

October 2, 2021 @ 7:25 am· Filed by Victor Mair under Language on the internets, Phonetics and phonology, Pronunciation, Psycholinguistics, Speech technology

Remember "Yanny vs. Laurel", the viral acoustic sensation (28.2M views) of mid-May, 2018? It was covered extensively on Language Log (see the items under "Selected readings" below). Now we have another supposedly ambiguous recording that has gone viral (5.3M views [posted 7/3/21]):

"Still advent received emails from her"

September 17, 2020 @ 5:18 pm· Filed by Victor Mair under Errors, Language and computers, Miswriting, Speech technology

That's part of a message from one of my students. I knew right away what he meant, but — as always — I'm curious about what causes such off-the-wall typos. It can't be because of a spellchecker gone awry. So I asked the student, "What type of input system do you use? I'm trying to think about how that was produced."

He replied, "I use the bog-standard* American English input that Apple has. I think I missed the 'h' and it grabbed it from there? Maybe an additional incorrect letter?"

[*This was the first time I encountered this expression, and I didn't know what it meant.]

I followed up:

just regular keyboard?

not on iPhone?

no shortcuts? swypes?

speech recognition input?

Typing by voice recognition

August 22, 2020 @ 9:40 pm· Filed by Victor Mair under Language and computers, Speech technology, Typography

E-mail message from my son, Thomas Krishna:

I'm using the voice recognizer to write you this message. When you do take your truck in for service at Toyota place, ask them if an exterior cleaning is included. Having visited you over the years I know that where you park a lot of tree debris falls onto your vehicles! This is no big deal, except for one thing, you don't want stuff to fall on top of your vents right in front of where the windshield is. I had this problem with my truck under the crepe myrtles at Lacey's house. For a while I tried using cardboard cutouts to cover them up but they did not last very well in the Sun and rain. I know that at your place things dropping off the trees is almost a continuous problem whereas for me it was only in the fall. So just thinking maybe you should try to find something that can cover those vents for when your truck is parked there.

"Unparalleled accuracy" == "Freud as a scrub woman"

April 27, 2019 @ 7:38 am· Filed by Mark Liberman under Computational linguistics, Elephant semifics, Speech technology

A couple of years ago, in connection with the JSALT2017 summer workshop, I tried several commercial speech-to-text APIs on some clinical recordings, with very poor results. Recently I thought I'd try again, to see how things have progressed. After all, there have been recent claims of "human parity" in various speech-to-text applications, and (for example) Google's Cloud Speech-to-Text tells us that it will "Apply the most advanced deep-learning neural network algorithms to audio for speech recognition with unparalleled accuracy", and that "Cloud Speech-to-Text accuracy improves over time as Google improves the internal speech recognition technology used by Google products."

So I picked one of the better-quality recordings of neuropsychological test sessions that we analyzed during that 2017 workshop, and tried a few segments. Executive summary: general human parity in automatic speech-to-text is still a ways off, at least for inputs like these.

Schadenfreudeful

April 20, 2019 @ 12:42 pm· Filed by Victor Mair under Pronunciation, Speech technology

A moment ago, I had occasion to use the word "schadenfreudeful" in a letter to someone. Wanting to see if anyone else had ever used this word, I did a Google search, and it yielded 149 ghits. I knew exactly how to say it, so didn't need any guidance in that regard, but I was intrigued by the fact that the first listing for the word was this:

"Speech synthesis"

February 16, 2019 @ 9:33 am· Filed by Mark Liberman under Speech technology

Ordinary language and technical terminology often diverge. We've covered the "passive voice" case at length. I don't think we've discussed the fact that for botanists, cucumbers and tomatoes are berries but strawberries and raspberries aren't — but there are many examples of such terminological divergence in fields outside of linguistics. However, the technical terminology is itself sometimes vague or ambiguous in ways that lead to confusion among outsiders, and today I want to explore one case of this kind: "speech synthesis".

Omarosa Manigault-Newman, sound engineer

September 3, 2018 @ 9:26 pm· Filed by Mark Liberman under Speech technology

Alayna Treene, "Scoop: How Omarosa secretly taped her victims", Axios 9/3/2018:

Omarosa taped nearly every conversation she had while working in the White House, including ones with "all of the Trumps," a source who watched her make many of the tapes tells Axios. Omarosa did this with a personal phone, almost always on record mode. […]

Before heading into meetings, she would often press "record" on her personal phone — which she carried in her pocket or in a small purse.

Here at Interspeech 2018, several of us discussed over breakfast the excellent sound quality of the clips we've heard, with everyone agreeing that we wish the interviews we often work with (clinical and even sociolinguistic) were anywhere near as good.

Yanny vs. Laurel, pt. 2

May 25, 2018 @ 7:07 am· Filed by Victor Mair under Language on the internets, Phonetics and phonology, Pronunciation, Psycholinguistics, Speech technology

Just when you thought you'd never have to worry about this vexing acoustic phenomenon again, "Yanny vs. Laurel: an analysis by Benjamin Munson" (5/16/18) and the comments thereto having carried out such a probing, exhaustive investigation, a 3:44 video (5/15/18) attempts to explain it in a way that has not yet been mentioned:
