Cambodian voice traffic


Here is a Rest of World article from November that I missed when it first came out. I am posting on it now because it speaks to the comments on several recent Language Log posts (e.g., here and here):

"Fifty percent of Facebook Messenger’s total voice traffic comes from Cambodia. Here’s why:

Keyboards weren't designed for Khmer. So Cambodians have just decided to ignore them", by Vittoria Elliott and Bopha Phorn (12 November 2021)

The first four paragraphs of this longish article:

In 2018, the team at Facebook had a puzzle on their hands. Cambodian users accounted for nearly 50% of all global traffic for Messenger’s voice function, but no one at the company knew why, according to documents released by whistleblower Frances Haugen.

One employee suggested running a survey, according to internal documents viewed by Rest of World. Did it have to do with low literacy levels? they wondered. In 2020, a Facebook study attempted to ask users in countries with high audio use, but was only able to find a single Cambodian respondent, the same documents showed. The mystery, it seemed, stayed unsolved.

The answer, surprisingly, has less to do with Facebook, and more to do with the complexity of the Khmer language, and the way users adapt for a technology that was never designed with them in mind.

In Cambodia, everyone from tuk-tuk drivers to Prime Minister Hun Sen prefers to send voice notes instead of messages. Facebook’s study revealed that it wasn’t just Cambodians who favor voice messages — though nowhere else was it more popular. In the study, which included 30 users from the Dominican Republic, Senegal, Benin, Ivory Coast, and that single Cambodian, 87% of respondents said that they used voice tools to send notes in a different language from the one set on their apps. This was true on WhatsApp — the most popular platform among the survey respondents — along with Messenger and Telegram.

The article goes into considerable detail about how hard it is to type in Cambodian.  However, despite the relative ease of voice messaging, it has drawbacks of its own:

But relying on voice tools also generates its own particular set of problems. Conversations become ephemeral. The same users also complained that they can’t scroll back to recall details of their exchange, and can only replay them by remembering the specific pattern of voice messages they left — one long and two short, for example. It’s impossible to use search functions for content in a chat history. Yet, at the same time, the inconvenience doesn’t appear to outweigh the benefits. Written messages now tend to be dominated by business or English-language exchanges.

While the Facebook employee imagined the behavior to be related to low literacy, Cambodia’s literacy rate is around 80%, according to the most recent World Bank data. 

“Many young people, if they [do] want to type, will write out Khmer words in Latin text,” said Sok Pongsametrey, a Phnom Penh-based software engineer and chief operating officer at POSCAR Digital, a company that builds digital tools for education. Other times, if a letter is too difficult to spell, they may resort to spelling the word incorrectly using a more easily available character, or abbreviations of a word accompanied by ellipses, knowing that a reader will understand the implied word.

There are ripple effects. Sok said these types of workarounds make it more difficult for engineers working on machine learning to train AI in the language. He also worries that these shortcuts will mean that young people will lose their familiarity with the Khmer script.

“I am very careful when I write in Khmer, because it’s an art,” he said. “But, young people, they think [using Latin text] is easy.”

All of the issues raised above for Cambodian writing are echoed in other "hard to type" languages.  It's not just a question of refractory keyboards and clumsy input systems — lord knows that countless different kinds have been designed for all the noteworthy languages of the world.  In many (most?) cases, it has to do with the nature of the scripts themselves:  ligatures, the need to compress / expand varying numbers of intricate components into equisized shapes, etc.  Several students in my "Language, Script, and Society in China" course wrote papers on these topics for Chinese, Japanese, and Korean writing, and I hope to publish them in the not too distant future.



  1. AntC said,

    February 11, 2022 @ 9:40 pm

    While the Facebook employee imagined the behavior to be related to low literacy, Cambodia’s literacy rate is around 80%, …

    Does that mean ability to read specifically? Not necessarily to write?

    Aren't there speech-to-text apps for Cambodian? (And such that if literacy reading rates are high, users would quickly spot snafus.)

    Messaging in speech-to-text would get round the difficulties of scrolling/searching through a conversation.

  2. cliff arroyo said,

    February 13, 2022 @ 3:26 am

    Okay, I'll bite…. why Cambodian especially?

    A lot of the difficulties of inputting Cambodian are also found in Thai, Lao and Burmese (as well as Southern Indian scripts like Telugu or Malayalam).

    Is it a question of degree? Something else? Smaller speaking population (so not as much dedicated software?)

    Does Cambodian lend itself more to ad hoc romanization?

  3. Alexander Browne said,

    February 13, 2022 @ 4:47 pm

    @ cliff arroyo

    My uninformed thoughts, in reverse order

    Does Cambodian lend itself more to ad hoc romanization?

    AFAIK Khmer does not have tones like many other SEA languages, which may make romanization simpler.

    Is it a question of degree? Something else? Smaller speaking population (so not as much dedicated software?)

    Cambodia lost much (most?) of its educated population to the Khmer Rouge, killed or exiled. Thailand had a much stronger economy through the period when computing was introduced. Not sure about Lao(s). Maybe Lao can piggy-back from Thai, since they are pretty similar IIRC?

    A lot of the difficulties of inputting Cambodian are also found in Thai, Lao and Burmese (as well as Southern Indian scripts like Telugu or Malayalam).

    For Indian languages (and I think Burmese, but lesser), there's a lot more English to fall back on at least for educated speakers.

  4. John Swindle said,

    February 16, 2022 @ 8:22 pm

    @AntC: Maybe Khmer speech-to-text hasn’t had enough source material to become as reliable as one might wish.

  5. Michael Watts said,

    February 21, 2022 @ 2:04 am

    I find it hard to believe that typing in Khmer might be more difficult than typing in Japanese. The explanation just doesn't make sense.

  6. Victor Mair said,

    February 21, 2022 @ 7:32 am

    To start with, Khmer script has ligatures.

    Plus this:


    Khmer is written from left to right. Words within the same sentence or phrase are generally run together with no spaces between them. Consonant clusters within a word are "stacked", with the second (and occasionally third) consonant being written in reduced form under the main consonant. Originally there were 35 consonant characters, but modern Khmer uses only 33. Each character represents a consonant sound together with an inherent vowel, either â or ô; in many cases, in the absence of another vowel mark, the inherent vowel is to be pronounced after the consonant.

    There are some independent vowel characters, but vowel sounds are more commonly represented as dependent vowels, additional marks accompanying a consonant character, and indicating what vowel sound is to be pronounced after that consonant (or consonant cluster). Most dependent vowels have two different pronunciations, depending in most cases on the inherent vowel of the consonant to which they are added. There are also a number of diacritics used to indicate further modifications in pronunciation. The script also includes its own numerals and punctuation marks.



    And other things.

    The Khmer people are not stupid. If, despite their historical attachment to it, the Khmer people think their script is hard to type, who are we to gainsay them?
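
The description of stacked consonants and dependent vowels quoted in the last comment can be made a little more concrete by looking at how a single Khmer word is encoded. What follows is only a rough sketch, not a description of any actual keyboard or input method. It assumes the code point ranges of the Unicode Khmer block, and the function names (`classify`, `describe`, `count_stacked_consonants`) are hypothetical ones chosen for illustration. The sample is the word for "Khmer" written in Khmer script.

```python
# Illustrative sketch only: label each code point in a Khmer word.
# The ranges are taken from the Unicode Khmer block (U+1780-U+17FF).

# The word "Khmer" in Khmer script, written with escapes so it survives any editor.
KHMER_WORD = "\u1781\u17d2\u1798\u17c2\u179a"

# COENG is an invisible sign: it tells the renderer to draw the NEXT
# consonant in reduced form underneath the previous one ("stacking").
COENG = 0x17D2


def classify(ch: str) -> str:
    """Return a coarse label for one character of Khmer text."""
    cp = ord(ch)
    if 0x1780 <= cp <= 0x17A2:
        return "consonant"
    if 0x17A3 <= cp <= 0x17B3:
        return "independent vowel"
    if 0x17B6 <= cp <= 0x17C5:
        return "dependent vowel"
    if cp == COENG:
        return "coeng"
    if 0x17E0 <= cp <= 0x17E9:
        return "digit"
    return "other"


def describe(word: str) -> list[tuple[str, str]]:
    """Pair each code point (as U+XXXX) with its coarse label."""
    return [(f"U+{ord(ch):04X}", classify(ch)) for ch in word]


def count_stacked_consonants(word: str) -> int:
    """Count consonants that are subscripted, i.e. preceded by COENG."""
    kinds = [classify(ch) for ch in word]
    return sum(
        1 for a, b in zip(kinds, kinds[1:]) if a == "coeng" and b == "consonant"
    )


if __name__ == "__main__":
    for code_point, kind in describe(KHMER_WORD):
        print(code_point, kind)
    print("stacked consonants:", count_stacked_consonants(KHMER_WORD))
```

Under these assumptions, the word is stored as five code points, one of which is the invisible stacking sign: consonant, coeng, consonant, dependent vowel, consonant. The reader sees the second consonant tucked under the first, but the writer still has to enter every element. This is at least consistent with the complaint that the script is laborious to type.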
