A new voice morphing application

Over the years, we've documented various applications of voice morphing technology besides the malicious creation of "deep fake" audio clips. Here's a new one: Amrit Dillon, "AI erases call centre staff’s Indian accents", The Times 3/2/2025:

A French company which operates the largest number of call centres in the world is using artificial intelligence to soften Indian accents in real time to make customer conversations easier and shorter.

Teleperformance said that it was sometimes difficult for customers calling call centres in India — and the Philippines — to understand workers’ accents, leading to frustration and longer than necessary calls.

“When you have an Indian agent on the line, sometimes it’s hard to hear, to understand,” Thomas Mackenbrock, the company’s deputy chief executive, told Bloomberg News. “The technology can neutralise the accent of the Indian speaker with zero latency. This creates more intimacy, increases customer satisfaction, and reduces the average handling time. It is a win-win for both parties.”

The software, called “accent translation”, has been developed by Sanas, a start-up based in Palo Alto, California.

The article quotes an objection:

Akhilesh Agarwal, 28, worked in a call centre in Bangalore for two years and disliked every minute of it. He is not amused by the AI accent tool.

“It’s not really neutralisation is it? That’s just sugaring the pill. It’s favouring an American or British accent above an Indian one. I wonder if they’d neutralise a Scottish or Irish accent? I doubt it,” he said.

A more extensive critique can be found in Payne et al., "Real-time accent-altering technology: The message is clear, and it is dehumanizing", PsyArXiv 2023. FWIW, my own opinion is that a system of this type is meant to fool its users rather than to degrade its employees, though it probably has that effect as well. And the moral issues involved go back to Henry Higgins in Shaw's Pygmalion, who aimed to accomplish the same sort of accent transformation by non-computational means.

But sociolinguistic prejudices aside, I'm not yet convinced that the Sanas system actually works. From the Times article:

Some demo videos on YouTube from two years ago, when the technology was first developed, show strong Indian and African accents becoming instantly clearer and more American-sounding.

The field learned more than 50 years ago that "evaluation by demo" can't be trusted, and if a creditable third-party evaluation of Sanas' technology has been done, I can't find any documentation of it. The company's website is here, and includes some percentage-decorated claims of system performance:

However, they don't seem to offer any explanation of where these numbers came from.

Still, a reliable system of this type will certainly become possible within a few years, whether or not it works well now. If you're involved in product development, may I suggest the name "Pygmalion"?

Some previous relevant posts:

"Negotiating with hallucinations", 8/9/2012
"Deep fake audio", 7/17/2021
"Spontaneous SCOTUS", 3/2/2024
"Brown Revisited", 5/15/2024
"Negotiating with hallucinations: Two controlled trials", 10/30/2024



6 Comments »

  1. Wally said,

    March 4, 2025 @ 11:11 am

    I, for one, would very much neutralize a Scottish accent, so I could have some clue as to what they are saying.

  2. Gregory Kusnick said,

    March 4, 2025 @ 11:14 am

    I'm guessing the accent translation is not bidirectional, i.e. it does not apply an Indian accent to the customer's speech for the benefit of the service agent. So not quite a win-win then.

    As an aside, "a win-win for both parties" seems redundant, unless "win-win" has been reduced to a kind of intensifier of "win", so that a win-win for just one party would be sensible.

  3. KeithB said,

    March 4, 2025 @ 11:34 am

    It might help for Air Traffic control:
    https://languagelog.ldc.upenn.edu/nll/?p=41599

  4. rosie said,

    March 4, 2025 @ 12:20 pm

    I, for one, would very much neutralize most US people's accents. They're about as hard to understand as Scots.

  5. Robot Therapist said,

    March 4, 2025 @ 1:30 pm

    I listened to the first sample on that page. The main benefit of the processing (not sure in what sense it's "AI") was that it removed the sound of other call centre operators in the background.

  6. Jarek Weckwerth said,

    March 4, 2025 @ 3:25 pm

    I'm not convinced about this dehumanization thing. Systems that do real-time interpreting between "different languages" are considered good, aren't they? The EU is heading in that exact direction in Parliament, for example. Interpreting between Danish and Swedish will be no different to this, and I doubt if the Danish and Swedish MEPs are likely to find it dehumanizing.
