What's (still) wrong with AI text-to-speech?
Text-To-Speech technology has improved enormously over the decades — but there's still some headroom, as Lane Greene has recently underlined for me, expressing dissatisfaction with the AI-read versions of digital articles at The Economist magazine:
I downloaded a handful of "AI Narrated" stories (as the Economist calls them), and then the human-read versions for the ones that made it into print. Before getting to Lane's complaint about repetitive prosody, I noticed a few (minor) old-fashioned errors, such as this parsing (or interpretation?) problem that makes it sound like a Supreme Court tariff is ruling rains (?) inside of Donald Trump:
Or this focus problem, where the human reader helpfully contrasts dollars with euros,
…which the AI narration failed to do:
As for the stereotyped pitch accents that Lane complained about, here's one of the first sentences in the AI version of the example story that Lane sent me (that link will send you to the slightly-revised print version):
As he observed, it sounds fine. The print version has modified the text somewhat, but you should be able to hear that the corresponding phrase deploys a more varied set of pitch accents:
We can zero in on the subject noun phrase to see as well as hear the difference, first in the AI version:
And now the human version:
You can listen to as much as you like of the two versions, and see whether you agree with Lane that "this high-then-falling curve that is fine in one sentence, but repeated 50 times in a row is awful":
[Audio clips: AI Reader; Human Reader]
It's easy to quantify Lane's falling-falling-falling perception by looking at syllable-scale dipole statistics, showing a two-dimensional density plot comparing time differences against pitch differences:
[Density plots: AI Reader; Human Reader]
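For readers who want to try something similar, here is a minimal sketch of one plausible reading of "syllable-scale dipole statistics". It is not necessarily the procedure used for the plots above. It assumes one pitch estimate per syllable, given as (time in seconds, f0 in Hz), and it pairs each syllable with every later syllable within a time window, recording the time difference and the pitch difference in semitones. The function names, the 1-second window, the bin sizes, and the `falling_fraction` summary are all illustrative choices of mine, and the example values are made up.

```python
import math
from collections import Counter


def semitones(f0_hz, ref_hz=100.0):
    """Convert a pitch in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def dipoles(syllables, max_gap=1.0):
    """Return (time difference, pitch difference) for syllable pairs.

    syllables: list of (time_sec, f0_hz) tuples, sorted by time.
    max_gap:   hypothetical window; pairs further apart in time are skipped.
    """
    pairs = []
    for i, (t1, f1) in enumerate(syllables):
        for t2, f2 in syllables[i + 1:]:
            dt = t2 - t1
            if dt > max_gap:
                break  # input is sorted, so later syllables are even further away
            pairs.append((dt, semitones(f2) - semitones(f1)))
    return pairs


def density(pairs, dt_bin=0.1, dp_bin=1.0):
    """Bin the dipoles into a 2-D histogram whose cells sum to 1."""
    counts = Counter(
        (math.floor(dt / dt_bin), math.floor(dp / dp_bin)) for dt, dp in pairs
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def falling_fraction(pairs):
    """Share of dipoles whose later syllable is lower in pitch."""
    return sum(1 for _, dp in pairs if dp < 0) / len(pairs)


if __name__ == "__main__":
    # Made-up syllable track: (time in seconds, f0 in Hz).
    track = [(0.0, 200.0), (0.25, 100.0), (0.5, 200.0)]
    pairs = dipoles(track)
    print(pairs)
    print(density(pairs))
    print(falling_fraction(pairs))
```

The dictionary returned by `density` can be handed to any plotting tool to draw the kind of two-dimensional picture shown above. A reader who repeats the same falling contour over and over should pile up mass in the negative pitch-difference cells at short time differences, which a single number like `falling_fraction` only roughly summarizes.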
There's a lot more to say, and many more articles to look at, but that's enough for today.
David Morris said,
February 28, 2026 @ 3:37 pm
It is also possible to interpret the first example as 'reigns'.
I recently blogged about an AI-voiceover of a summary of a Korean tv series, which mispronounced almost every Korean word (mostly names, but also familiar words like kim-chai).
Simon K said,
February 28, 2026 @ 4:00 pm
I heard an advert on a podcast last week – apologies, I can't remember what for – where the voiceover pronounced the name of the product they were selling in two different ways, with the stress on different syllables.
Andy said,
February 28, 2026 @ 4:24 pm
It's 'reins' as in 'reins in', using the metaphor of controlling a horse by pulling on its reins.