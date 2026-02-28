

Text-to-speech technology has improved enormously over the decades, but there's still some headroom, as Lane Greene has recently underlined for me, expressing dissatisfaction with the AI-read versions of digital articles at The Economist magazine:

When we first publish a piece online, it appears with an AI-read audio. The rhythm/prosody/pitch (I'm not exactly sure which – all three?) is the same in nearly every sentence and even clause, this high-then-falling curve that is fine in one sentence, but repeated 50 times in a row is awful.

But then on Thursday, those pieces that make it into the print edition get their own, human-read version. So voilà, you have a perfect before-and-after. What I was hoping is that you could visually analyse the nature of the AI voice and compare it to the human-read version.

I downloaded a handful of "AI Narrated" stories (as The Economist calls them), and then the human-read versions of the ones that made it into print. Before getting to Lane's complaint about repetitive prosody, I noticed a few (minor) old-fashioned errors, such as this parsing (or interpretation?) problem that makes it sound like a Supreme Court tariff is ruling rains (?) inside of Donald Trump:

[Audio clip: AI narration]

Or this focus problem, where the human reader helpfully contrasts dollars with euros,

[Audio clip: human reader]

…which the AI narration failed to do:

[Audio clip: AI narration]

As for the stereotyped pitch accents that Lane complained about, here's one of the first sentences in the AI version of the example story that Lane sent me (that link will send you to the slightly-revised print version):

[Audio clip: AI narration]

As he observed, it sounds fine. The print version has modified the text somewhat, but you should be able to hear that the corresponding phrase deploys a more varied set of pitch accents:

[Audio clip: human reader]

We can zero in on the subject noun phrase to see as well as hear the difference, first in the AI version:

[Audio clip: AI narration]

And now the human version:

[Audio clip: human reader]

You can listen to as much as you like of the two versions, and see whether you agree with Lane that "this high-then-falling curve that is fine in one sentence, but repeated 50 times in a row is awful":

AI Reader: [Audio clip]
Human Reader: [Audio clip]

It's easy to quantify Lane's falling-falling-falling perception by looking at syllable-scale dipole statistics, shown here as two-dimensional density plots of time differences against pitch differences:

[Figure: two density plots, one for the AI Reader and one for the Human Reader]
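
For readers who want to try something along these lines, here is a minimal Python sketch of one plausible reading of "dipole statistics". It is an illustration under stated assumptions, not necessarily the procedure behind the plots in this post. It assumes you already have a frame-by-frame f0 track (times in seconds, pitch in Hz, with unvoiced frames marked as NaN or 0) from whatever pitch tracker you prefer. It then collects every pair of voiced frames separated by no more than a syllable-scale lag and bins the pairs by time difference and by pitch difference in semitones. The function names, the 0.25-second default lag, and the bin counts are all hypothetical choices made for the example.

```python
import numpy as np

def dipole_histogram(times, f0_hz, max_lag=0.25, time_bins=25,
                     pitch_bins=48, pitch_range=12.0):
    """2-D histogram of (time difference, pitch difference) over frame pairs.

    Assumed inputs: `times` in seconds (increasing), `f0_hz` in Hz, with
    unvoiced frames given as NaN or 0. Pitch differences are in semitones.
    Returns (hist, t_edges, p_edges); hist sums to 1 if any pairs were found.
    """
    times = np.asarray(times, dtype=float)
    f0_hz = np.asarray(f0_hz, dtype=float)
    voiced = np.isfinite(f0_hz) & (f0_hz > 0)
    t = times[voiced]
    semitones = 12.0 * np.log2(f0_hz[voiced])

    t_edges = np.linspace(0.0, max_lag, time_bins + 1)
    p_edges = np.linspace(-pitch_range, pitch_range, pitch_bins + 1)
    hist = np.zeros((time_bins, pitch_bins))

    # Walk through frame lags 1, 2, 3, ... and stop as soon as no pair
    # at the current lag fits inside the time window. Because the times
    # are increasing, no larger lag can fit after that point.
    lag = 1
    while lag < len(t):
        dt = t[lag:] - t[:-lag]
        keep = dt <= max_lag
        if not keep.any():
            break
        dp = semitones[lag:] - semitones[:-lag]  # later minus earlier
        counts, _, _ = np.histogram2d(dt[keep], dp[keep],
                                      bins=[t_edges, p_edges])
        hist += counts
        lag += 1

    total = hist.sum()
    if total > 0:
        hist /= total
    return hist, t_edges, p_edges

def falling_fraction(hist, p_edges):
    """Share of the histogram mass whose pitch difference is negative."""
    centers = 0.5 * (p_edges[:-1] + p_edges[1:])
    return hist[:, centers < 0].sum()
```

On a track dominated by repeated high-then-falling contours, most of the mass should land below zero on the pitch-difference axis, so `falling_fraction` gives a crude one-number summary of the kind of bias Lane describes. The histogram itself can be displayed with any heat-map routine to get a picture comparable in spirit to a two-dimensional density plot.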

There's a lot more to say, and many more articles to look at, but that's enough for today.

