Remaining problems with TTS
(…and with the New York Department of Environmental Conservation…)
Like many other online text sites, the New York Times now offers synthetic text-to-speech readings for (most of) its stories. TTS quality has improved enormously since the 1980s, when I worked with Bill Dunn from Dow Jones Information Services on (the idea of) a pre-internet version of digital news delivery, including synthesized audio versions. (See "Thanks, Bill Dunn!", 8/6/2009, for a bit more of the story.)
And this morning, while doing some brainless form checking, I listened to the audio version of Victor Mather and Jesus Jiménez, "After 7 Years, P’Nut the Squirrel Is Taken Away and Then Put Down", NYT 11/1/2024, which starts this way:
P’Nut, a pet squirrel with a popular Instagram page, was seized by state government officials on Wednesday in Pine City, N.Y., and later euthanized to test for rabies.
The TTS quality is excellent overall, though there are some sub-optimal prosodic choices (about which more later). What caught my ear, though, was the pronunciation of "P'Nut" as /pə'nʌt/, which seemed like an unlikely choice for how to pronounce the name of a beloved pet squirrel.
And indeed, Mark Longo's Instagram page, a TMZ interview, and other sources all confirm that "P'Nut" should be pronounced the same way as the word "peanut".
Why did I think that? I'm not entirely sure — but /pə'nʌt/ just didn't seem plausible, and /'pi.nʌt/ did. For a bit more on the problems involved, see the (perhaps out-of-date?) discussion here of Richard Sproat's 2018 talk "Neural models of text normalization for speech applications".
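One old and simple way to handle a case like this is an exception lexicon that is consulted before the letter-to-sound stage. Here is a minimal sketch in Python. The lexicon entry, the function name `respell`, and the assumption that the respelled text is then handed to a synthesizer are all illustrative. This is not a description of what the NYT system (or any particular TTS product) actually does.

```python
import re

# Hypothetical exception lexicon: written form -> a respelling that a
# synthesizer is assumed to pronounce correctly. Entries are illustrative.
RESPELLINGS = {
    "p'nut": "peanut",
}

# A token is a run of letters, optionally with internal apostrophes
# (straight or curly), so "P'Nut" and "P’Nut" are each matched whole.
TOKEN = re.compile(r"[A-Za-z]+(?:['\u2019][A-Za-z]+)*")


def _key(token: str) -> str:
    """Fold case and curly apostrophes so spelling variants share one key."""
    return token.replace("\u2019", "'").lower()


def respell(text: str) -> str:
    """Replace tokens found in the lexicon; leave all other text untouched."""
    def substitute(match: re.Match) -> str:
        token = match.group(0)
        return RESPELLINGS.get(_key(token), token)

    return TOKEN.sub(substitute, text)


if __name__ == "__main__":
    print(respell("P\u2019Nut, a pet squirrel with a popular Instagram page"))
```

A hand-built table like this only covers names someone has already noticed, which is one reason the problem counts as partly solved at best. The sketch also throws away the original capitalization, which would matter if the same text were displayed as well as spoken.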
The NYT site doesn't offer any details about the source of its TTS, as far as I can tell. Speechify seems to be claiming it, but further down their page it turns out that they're just inviting us to paste NYT stories into their app:
Since the New York Times website doesn’t offer a TTS tool, you will need to get one on your own. And the process is quite simple. All you need to do is find an app you will enjoy using, and start listening to the New York Times. […]
If you are wondering which app is the best for listening to the New York Times and other news outlets, you should know that Speechify is your best friend.
Whatever system produced it, the NYT synthesized audio for the whole P'Nut article is here.
Update — this 4/2/2024 Axios article says that "The Times built the narrated voice technology in partnership with a generative artificial intelligence company, but executives declined to say which firm they're working with."
Nathasn said,
November 2, 2024 @ 10:32 am
Humans pronounce it as /'pi.nʌt/ because it reminds us of something–we get the reference. Even really advanced software doesn't search for meanings that way.
Mark Liberman said,
November 2, 2024 @ 10:37 am
@Nathasn: "Humans pronounce it as /'pi.nʌt/ because it reminds us of something–we get the reference. Even really advanced software doesn't search for meanings that way."
But Google gets it:
I'm not sure that "reminds" is the right description for the method involved, but dealing with non-standard spellings is an old and partly-solved problem. However, here's another recent example where the NYT TTS system does the wrong thing. In "What’s That in Your Mouth, Bro?", 10/31/2024 — a story about Zyn nicotine pouches — we get:
After Senator Chuck Schumer, the Democratic majority leader, asked the Food and Drug Administration in January to take a closer look at Zyn’s practices, warning that the pouches posed a danger to teenagers, the response from Republicans was swift.
Representative Marjorie Taylor Greene of Georgia called for a “Zynsurrection” on X, […]
That should have been /ˌzɪn.sɚ'rεk.ʃən/, not /ˌzɐʲ.ɪn.sɚ'rεk.ʃən/, right?
Noam said,
November 2, 2024 @ 10:54 am
@Nathasan “Even really advanced software doesn't search for meanings that way.” Arguably, that’s _exactly_ how LLMs work.
Philip Taylor said,
November 2, 2024 @ 11:04 am
Bastards. The New York State Department of Environmental Conservation, that is. And just for once I refused to bowdlerise the term I used to describe them — no milder form would adequately reflect the loathing I now feel for them. As to TTS, I am afraid that for this reader it pales into insignificance compared to the story involved.
Rodger C said,
November 2, 2024 @ 11:44 am
What Philip Taylor said. And to revert to language, when did the meaning of "euthanized" change from "put out of its suffering" to "killed (an animal) by bureaucrats for any reason whatever"? By the way, Philip, American animal agencies do this sort of thing all the time–I don't know about your country.
Mark Liberman said,
November 2, 2024 @ 11:46 am
An Instagram video of P'Nut: