Archive for Speech technology

The voices of GPS and Siri: not what you think they are

"Meet the Voice Behind Your GPS"

2:40   2/17/23

Read the rest of this entry »

Comments (1)

More AI shenanigans

Since When Does Eric Adams Speak Spanish, Yiddish and Mandarin?

He doesn’t. But New York City is using artificial intelligence to send robocalls featuring the mayor’s voice in many languages.

By Emma G. Fitzsimmons and Jeffery C. Mays, NYT (Oct. 20, 2023)


The calls to New Yorkers have a familiar ring to them. They all sound like Mayor Eric Adams — only in Spanish. Or Yiddish. Or Mandarin.

Has the mayor been taking language lessons?

The answer is no, and the truth is slightly more expensive and, in the eyes of privacy experts, far more worrisome.

The mayor is using artificial intelligence to reach New Yorkers through robocalls in a number of languages. The calls encourage people to apply for jobs in city government or to attend community events like concerts.

“I walk around sometimes and people turn around and say, ‘I just know that voice. That voice is so comforting. I enjoy hearing your voice,’” the mayor said at a recent news conference. “Now they’re able to hear my voice in their language.”

New York City’s embrace of the technology came this week as Mr. Adams announced a 50-page “action plan” for artificial intelligence — an effort to “strike a critical balance in the global A.I. conversation,” he said, by embracing its benefits while protecting New Yorkers from its pitfalls.

Read the rest of this entry »

Comments (6)

Our Lady of the Highway: A linguistic mystery

Current text-to-speech systems are pretty good. Their output is almost always comprehensible, and often pretty natural-sounding. But there are still glitches.

This morning, Dick Margulis sent an example of one common problem: inconsistent (and often wrong) stressing of complex nominals:

We have a winding road that we drive with our Google Maps navigator on, to keep us from taking a wrong turn in the woods. We have noticed that "West Woods Road" is rendered with a few different stress patterns as we go from turn to turn, and we can't come up with a hypothesis explaining the variation. Attached is a recording. It's a few minutes long because that's how long the trip takes. The background hum is the car.

I've extracted and concatenated the 11 Google Maps instructions from the four minutes and five seconds of the attached recording:

Read the rest of this entry »

Comments (30)

Green needle vs. brainstorm

Remember "Yanny vs. Laurel", the viral acoustic sensation (28.2M views) of mid-May, 2018?  It was covered extensively on Language Log (see the items under "Selected readings" below).  Now we have another supposedly ambiguous recording that has gone viral (5.3M views [posted 7/3/21]):

Read the rest of this entry »

Comments (29)

"Still advent received emails from her"

That's part of a message from one of my students.  I knew right away what he meant, but — as always — I'm curious about what causes such off-the-wall typos.  It can't be because of a spellchecker gone awry.  So I asked the student, "What type of input system do you use?  I'm trying to think about how that was produced."

He replied, "I use the bog-standard* American English input that Apple has. I think I missed the 'h' and it grabbed it from there? Maybe an additional incorrect letter?"

[*This was the first time I encountered this expression, and I didn't know what it meant.]

I followed up:

just regular keyboard?

not on iPhone?

no shortcuts?     swypes?

speech recognition input?

Read the rest of this entry »

Comments (35)

Typing by voice recognition

E-mail message from my son, Thomas Krishna:

I'm using the voice recognizer to write you this message. When you do take your truck in for service at Toyota place, ask them if an exterior cleaning is included. Having visited you over the years I know that where you park a lot of tree debris falls onto your vehicles! This is no big deal, except for one thing, you don't want stuff to fall on top of your vents right in front of where the windshield is. I had this problem with my truck under the crepe myrtles at Lacey's house. For a while I tried using cardboard cutouts to cover them up but they did not last very well in the Sun and rain. I know that at your place things dropping off the trees is almost a continuous problem whereas for me it was only in the fall. So just thinking maybe you should try to find something that can cover those vents for when your truck is parked there.

Read the rest of this entry »

Comments (18)

"Unparalleled accuracy" == "Freud as a scrub woman"

A couple of years ago, in connection with the JSALT2017 summer workshop, I tried several commercial speech-to-text APIs on some clinical recordings, with very poor results. Recently I thought I'd try again, to see how things have progressed. After all, there have been recent claims of "human parity" in various speech-to-text applications, and (for example) Google's Cloud Speech-to-Text tells us that it will "Apply the most advanced deep-learning neural network algorithms to audio for speech recognition with unparalleled accuracy", and that "Cloud Speech-to-Text accuracy improves over time as Google improves the internal speech recognition technology used by Google products."

So I picked one of the better-quality recordings of neuropsychological test sessions that we analyzed during that 2017 workshop, and tried a few segments. Executive summary: general human parity in automatic speech-to-text is still a ways off, at least for inputs like these.

Read the rest of this entry »

Comments (8)

Schadenfreudeful

A moment ago, I had occasion to use the word "schadenfreudeful" in a letter to someone. Wanting to see if anyone else had ever used this word, I did a Google search, and it yielded 149 ghits. I knew exactly how to say it, so didn't need any guidance in that regard, but I was intrigued by the fact that the first listing for the word was this:

Read the rest of this entry »

Comments (34)

"Speech synthesis"

Ordinary language and technical terminology often diverge. We've covered the "passive voice" case at length. I don't think we've discussed the fact that for botanists, cucumbers and tomatoes are berries but strawberries and raspberries aren't — but there are many examples of such terminological divergence in fields outside of linguistics. However, the technical terminology is itself sometimes vague or ambiguous in ways that lead to confusion among outsiders, and today I want to explore one case of this kind: "speech synthesis".

Read the rest of this entry »

Comments (22)

Omarosa Manigault-Newman, sound engineer

Alayna Treene, "Scoop: How Omarosa secretly taped her victims", Axios 9/3/2018:

Omarosa taped nearly every conversation she had while working in the White House, including ones with "all of the Trumps," a source who watched her make many of the tapes tells Axios. Omarosa did this with a personal phone, almost always on record mode. […]

Before heading into meetings, she would often press "record" on her personal phone — which she carried in her pocket or in a small purse.

Here at Interspeech 2018, several of us discussed over breakfast the excellent sound quality of the clips we've heard, with everyone agreeing that we wish the interviews we often work with (clinical and even sociolinguistic) were anywhere near as good.

Read the rest of this entry »

Comments (14)

Yanny vs. Laurel, pt. 2

Just when you thought you'd never have to worry about this vexing acoustic phenomenon again, "Yanny vs. Laurel: an analysis by Benjamin Munson" (5/16/18) and the comments thereto having carried out such a probing, exhaustive investigation, a 3:44 video (5/15/18) has surfaced that attempts to explain it in a way that has not yet been mentioned:

Read the rest of this entry »

Comments (27)

Yanny vs. Laurel: an analysis by Benjamin Munson

A peculiar audio clip has turned into a viral sensation, the acoustic equivalent of "the dress" — which, you'll recall, was either white and gold or blue and black, depending on your point of view. This time around, the dividing line is between "Yanny" and "Laurel."

The Yanny vs. Laurel perceptual puzzle has been fiercely debated (see coverage in the New York Times, the Atlantic, Vox, and CNET, for starters). Various linguists have chimed in on social media (notably, Suzy J. Styles and Rory Turnbull on Twitter). On Facebook, the University of Minnesota's Benjamin Munson shared a cogent analysis that he provided to an inquiring reporter, and he has graciously agreed to have an expanded version of his explainer published here as a guest post.

Read the rest of this entry »

Comments (117)

World disfluencies

Disfluency has been in the news recently, for two reasons: the deployment of filled pauses in an automated conversation by Google Duplex, and a cross-linguistic study of "slowing down" in speech production before nouns vs. verbs.

Lance Ulanoff, "Did Google Duplex just pass the Turing Test?", Medium 5/8/2018:

I think it was the first “Um.” That was the moment when I realized I was hearing something extraordinary: A computer carrying out a completely natural and very human-sounding conversation with a real person. And it wasn’t just a random talk. […]

Duplex made the call and, when someone at the salon picked up, the voice AI started the conversation with: “Hi, I’m calling to book a woman’s hair cut appointment for a client, um, I’m looking for something on May third?”

Frank Seifart et al., "Nouns slow down speech: evidence from structurally and culturally diverse languages", PNAS 2018:

When we speak, we unconsciously pronounce some words more slowly than others and sometimes pause. Such slowdown effects provide key evidence for human cognitive processes, reflecting increased planning load in speech production. Here, we study naturalistic speech from linguistically and culturally diverse populations from around the world. We show a robust tendency for slower speech before nouns as compared with verbs. Even though verbs may be more complex than nouns, nouns thus appear to require more planning, probably due to the new information they usually represent. This finding points to strong universals in how humans process language and manage referential information when communicating linguistically.

Read the rest of this entry »

Comments (12)