A new voice morphing application


Over the years, we've documented various applications of voice morphing technology besides the malicious creation of "deep fake" audio clips. Here's a new one: Amrit Dhillon, "AI erases call centre staff’s Indian accents", The Times 3/2/2025:

A French company which operates the largest number of call centres in the world is using artificial intelligence to soften Indian accents in real time to make customer conversations easier and shorter.

Teleperformance said that it was sometimes difficult for customers calling call centres in India — and the Philippines — to understand workers’ accents, leading to frustration and longer than necessary calls.

“When you have an Indian agent on the line, sometimes it’s hard to hear, to understand,” Thomas Mackenbrock, the company’s deputy chief executive, told Bloomberg News. “The technology can neutralise the accent of the Indian speaker with zero latency. This creates more intimacy, increases customer satisfaction, and reduces the average handling time. It is a win-win for both parties.”

The software, called “accent translation”, has been developed by Sanas, a start-up based in Palo Alto, California.

The article quotes an objection:

Akhilesh Agarwal, 28, worked in a call centre in Bangalore for two years and disliked every minute of it. He is not amused by the AI accent tool.

“It’s not really neutralisation is it? That’s just sugaring the pill. It’s favouring an American or British accent above an Indian one. I wonder if they’d neutralise a Scottish or Irish accent? I doubt it,” he said.

A more extensive critique can be found in Payne et al., "Real-time accent-altering technology: The message is clear, and it is dehumanizing", PsyArXiv 2023. FWIW, my own opinion is that a system of this type is meant to fool its users rather than to degrade its employees, though it probably has that effect as well. And the moral issues involved go back to Henry Higgins in Shaw's Pygmalion, who aimed to accomplish the same sort of accent transformation by non-computational means.

But sociolinguistic prejudices aside, I'm not yet convinced that the Sanas system actually works. From the Times article:

Some demo videos on YouTube from two years ago, when the technology was first developed, show strong Indian and African accents becoming instantly clearer and more American-sounding.

The field learned more than 50 years ago that "evaluation by demo" can't be trusted, and if a creditable third-party evaluation of Sanas' technology has been done, I can't find any documentation of it. The company's website is here, and includes some percentage-decorated claims of system performance:

However, they don't seem to offer any explanation of where these numbers came from.

Still, a reliable system of this type will certainly become possible within a few years, whether or not it works well now. If you're involved in product development, may I suggest the name "Pygmalion"?

Some previous relevant posts:

"Negotiating with hallucinations", 8/9/2012
"Deep fake audio", 7/17/2021
"Spontaneous SCOTUS", 3/2/2024
"Brown Revisited", 5/15/2024
"Negotiating with hallucinations: Two controlled trials", 10/30/2024



19 Comments »

  1. Wally said,

    March 4, 2025 @ 11:11 am

    I, for one, would very much neutralize a Scottish accent, so I could have some clue as to what they are saying.

  2. Gregory Kusnick said,

    March 4, 2025 @ 11:14 am

    I'm guessing the accent translation is not bidirectional, i.e. it does not apply an Indian accent to the customer's speech for the benefit of the service agent. So not quite a win-win then.

    As an aside, "a win-win for both parties" seems redundant, unless "win-win" has been reduced to a kind of intensifier of "win", so that a win-win for just one party would be sensible.

  3. KeithB said,

    March 4, 2025 @ 11:34 am

    It might help for Air Traffic control:
    https://languagelog.ldc.upenn.edu/nll/?p=41599

  4. rosie said,

    March 4, 2025 @ 12:20 pm

    I, for one, would very much neutralize most US people's accents. They're about as hard to understand as Scots.

  5. Robot Therapist said,

    March 4, 2025 @ 1:30 pm

    I listened to the first sample on that page. The main benefit of the processing (not sure in what sense it's "AI") was that it removed the sound of other call centre operators in the background.

  6. Jarek Weckwerth said,

    March 4, 2025 @ 3:25 pm

    I'm not convinced about this dehumanization thing. Systems that do real-time interpreting between "different languages" are considered good, aren't they? The EU is heading in that exact direction in Parliament, for example. Interpreting between Danish and Swedish will be no different to this, and I doubt if the Danish and Swedish MEPs are likely to find it dehumanizing.

  7. Viseguy said,

    March 4, 2025 @ 6:14 pm

    @Jarek Weckwerth: The colonial origins of English-as-spoken-in-India make a difference, I think.

  8. Anonymous Historian said,

    March 5, 2025 @ 1:41 am

    Scam call centers would absolutely love this technology.

  9. Richard Hershberger said,

    March 5, 2025 @ 6:30 am

    A simpler way to improve the experience would be to invest in better quality audio, and suppress the background noise.

  10. Gunnar H said,

    March 5, 2025 @ 7:47 am

    @Jarek Weckwerth

    Interpreting between Danish and Swedish will be no different to this

    While Danish and Swedish are quite similar, they are not just "different accents." There are significant differences in vocabulary (and minor differences in grammar), so at most you would be converting Danish to "Danish with a Swedish pronunciation" or vice versa.

    It might help with comprehension (though a significant number of false friends might introduce even greater confusion), but practically every non-trivial sentence would be obviously foreign.

    Norwegian (bokmål) and Danish are much closer in vocabulary and grammar, so something like this could in theory work pretty convincingly for that pair, though it would often sound fairly stilted or slightly ungrammatical, and you would still hit the occasional mismatch in vocabulary.

  11. Jarek Weckwerth said,

    March 5, 2025 @ 3:34 pm

    @ Gunnar H

    I was being slightly facetious, but in fact to your point that Danish and Swedish aren't just different accents: neither is Indian English vs. any other type of English since they are different dialects, not accents. There are lexical, grammatical and pragmatic (!) differences, too, along the lines of Kindly state if you agree to do the same.

  12. Jarek Weckwerth said,

    March 5, 2025 @ 3:34 pm

    BTW, in case that is not a convincing example, apologies for my subpar command of Indian English.

  13. Gunnar H said,

    March 6, 2025 @ 4:10 am

    Whether or not "Kindly state if you agree to do the same" is a convincing example of a characteristic Indian English sentence, I don't think it supports your case.

    In British or American English it may sound awkward and overly formal, but the sentence is grammatical and the words are actual English words used with their proper meaning and correct inflection.

    And I believe that this holds true pretty generally for Indian English: There are no doubt a number of interesting features of various dialects, not to mention differing levels of proficiency among L2 (and L3+) speakers, but, accent aside, standard Indian English is close to British English, as is apparent in written form.

    In contrast, I believe that a direct "accent translation" from Danish to Swedish or vice versa, merely adjusting the pronunciation of each word, will usually not produce a correct, grammatical sentence in the target language.

  14. Philip Taylor said,

    March 6, 2025 @ 6:15 am

    And in its more terse form "Kindly state if you agree to same" is (IMHO) very typical of business English of an earlier era.

  15. Chris Button said,

    March 6, 2025 @ 11:08 am

    John Wells' magisterial three-volume "Accents of English" (1982) makes extensive reference to R.K. Bansal's "The Intelligibility of Indian English: Measurements of the Intelligibility of Connected Speech, and Sentence and Word Material, Presented to Listeners of Different Nationalities" (1969).

  16. Julian said,

    March 7, 2025 @ 6:55 am

    "Real-time accent-altering technology: The message is clear, and it is dehumanizing"
    Sorry, but no.
    It's a purely practical matter.
    When I call a call centre, I'm a customer. It's the operator's *job* to make me understand them. It's not my job to do a crash course in familiarising myself with their accent (which is an absurd idea in any case).
    When I say to the operator "could you talk more slowly please?", it might be because I'm hard of hearing; it might be because they're talking too fast for me (some people do); it might be because their foreign accent is strong enough to impede communication.**
    I say it because it's business and I *need to understand* them. It's not because I'm disrespecting their accent.
    If technology can improve the communication, that's good.
    ** It's interesting how quite slight differences of accent can throw you off. Recent example: giving info to the receptionist at the hospital before day surgery: "Name?" "Address?" "Next of kin?" Then something that sounded like "RELaja?" I had to get her to repeat this three times before realising the word was "religion", with non-standard word stress. On the phone the issues multiply.

  17. Philip Taylor said,

    March 7, 2025 @ 9:26 am

    I find myself fairly regularly having to ask someone on the telephone to speak more slowly — I do my best to avoid any suggestion that this might be because they are not a native speaker by adding "I think we have a rather bad line …". The most recent case was a lady from Amazon, and in the end I had no option but to ask to speak to her supervisor, but even that did not help — Amazon ended up refunding all of the amount I had paid, despite my repeated attempts to explain that only one of the two items in the package (the lower-cost item, in fact) had been damaged in transit.

  18. KevinM said,

    March 9, 2025 @ 1:27 pm

    @Richard Hershberger. Exactly. One of the main problems with call center intelligibility, especially during COVID, was that you were speaking via a hookup to a person working from home, often on a cell phone.
    @Philip Taylor. I share your experience. Communication difficulties aside, it's just cheaper for Amazon to cancel the transaction (or duplicate it). It's the retail equivalent of bombing an infection with antibiotics instead of diagnosing it. As a result, I have in my basement several redundant items that arrived with parts missing. Satisfyingly, I can use them … for parts.

  19. Jarek Weckwerth said,

    March 10, 2025 @ 5:26 pm

    @Gunnar H

    In contrast, I believe that a direct "accent translation" from Danish to Swedish or vice versa, merely adjusting the pronunciation of each word, …

    That is not at all what I had in mind; sorry for being unclear. I was thinking of full-on automatic translation from actual Danish to actual Swedish, where there are both differences that you could call "mere accent" differences and more substantial differences in lexicon and syntax. My point was that (1) the magnitude of these differences (of both types BTW) would be similar to that between Indian and e.g. American English, and (2) no-one would think that the process is dehumanizing to anyone. In other words, it's not that "accent reduction" tech is dehumanizing in its own right.
