Language Log

What's (still) wrong with text-to-speech?

March 2, 2026 @ 5:00 am · Filed by Mark Liberman under Artificial intelligence, Computational linguistics

Text-to-speech technology has improved enormously over the decades, but there's still some headroom, as a friend has recently underlined for me. He observes that when The Economist magazine first publishes a piece online, it appears with AI-read audio, and then later with a human-read version:

The rhythm/prosody/pitch (I'm not exactly sure which – all three?) is the same in nearly every sentence and even clause. This high-then-falling pattern is fine in one sentence, but repeated 50 times in a row is awful.

Later, those pieces that make it into the print edition get their own human-read version. So voilà, you have a perfect before-and-after.

The Chinese Computer: Competition or Cooperation?

March 1, 2026 @ 1:35 am · Filed by Victor Mair under Artificial intelligence, Language and computers, Typing

Book review by David Moser, Beijing Capital Normal University

Thomas Mullaney’s The Chinese Computer is a fascinating account of the decades-long effort by linguists, computer scientists and engineers to incorporate Chinese characters into the digital age. Drawing on a vast body of historical and scientific sources, the book offers the reader a lively account of the formidable technical challenges involved in creating practical and intuitive input methods for one of the world’s most complex writing systems. The reader will come away with an increased awareness of the contributions that Chinese computing made to modern computer science.

Chinese scholars and sinologists working in the 1980s and 90s will recall the early generations of Chinese word processors—slow, unreliable, and crash-prone—when every incremental gain in speed or compatibility felt like a small miracle. Thanks to the ingenuity and innovation of computer input developers, today anyone on the planet can create Chinese texts using an impressive ecosystem of powerful and user-friendly tools.

Planes, patches, pilots, and propaganda

March 1, 2026 @ 1:26 am · Filed by Victor Mair under Language and the military, Signs, Translation

Air Force billboard in Shijiazhuang, Hebei Province, China:

Courtesy of The Great Translation Movement (TGTM).

Hanging a trans flag from El Capitan

March 1, 2026 @ 1:23 am · Filed by Victor Mair under Crash blossoms, Headlinese

François Lang says:

This WSJ headline garden-pathed me; I got the correct parse only on the third try!

Federal Worker Fired After Hanging Trans Flag at Yosemite Sues Government

Former Park Service employee claims free speech violations after organizing climbers for display at ‘El Cap’

By Allison Pohle, WSJ (2/23/26)

Crazy characters

March 1, 2026 @ 1:20 am · Filed by Victor Mair under Language and culture, Writing systems

Taken outside a hotel in Shenzhen:

Unifying Arabic topolects through AI

February 28, 2026 @ 3:29 pm · Filed by Victor Mair under Artificial intelligence, Computational linguistics, Dialects, Topolects

Meet Habibi – the Chinese AI uniting 20 Arabic dialects in a Middle East first
Lead author says there are many differences between Arabic dialects and Modern Standard Arabic, which is used in official circumstances
Zhao Ziwen, SCMP, 28 Feb 2026

The paper that presents this new model is called “Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis”. It was published last month on arXiv, an open-access preprint repository whose papers are not peer-reviewed. I will be interested to hear what Language Log readers think of its prospects.

Tariffs

February 28, 2026 @ 6:31 am · Filed by Mark Liberman under Etymology

With all the recent news about tariffs, I wondered where the word came from. So I consulted the OED:

< Italian tariffa ‘arithmetike or casting of accounts’ (Florio), ‘a book of rates for duties’ (Baretti), = Spanish tarifa, Portuguese tarifa, < Arabic taʿrīf notification, explanation, definition, article, < ʿarafa in 1st conj. to notify, make known. So French tarif.

Washington State Spanish

February 27, 2026 @ 8:20 pm · Filed by Mark Liberman under Bilingualism

"Callers to Washington state hotline press 2 for Spanish and get accented AI English instead", AP News 2/27/2026:

For months, callers to the Washington state Department of Licensing who have requested automated service in Spanish have instead heard an AI voice speaking English in a strong Spanish accent.

A recording:

Spacing in Korean

February 26, 2026 @ 4:44 pm · Filed by Victor Mair under Parsing, Pedagogy, Writing systems

On the role of a Scotsman, John Ross (1842–1915), in creating it. Although he was a Christian missionary who spent over half his life in China, he was apparently a gigachad.

The following video is densely packed with solid information and moves rapidly, so you have to pay close attention to follow it.

Rampant plagiarism in the Chinese literary world

February 25, 2026 @ 10:09 pm · Filed by Victor Mair under Artificial intelligence, Language and literature

"It cannot read the human heart" by Yan Ge (b. 1984), London Review of Books Blog (2/20/26)

Saving Sámi

February 24, 2026 @ 3:30 pm · Filed by Victor Mair under Endangered languages, Language extinction

"How toddlers in Finland are saving an endangered Sámi language"
by Erika Benke, BBC (5 days ago)

Special nurseries are helping the Sámi people in Finland to bring their almost-lost language back from the brink of extinction.

When I stayed in the Arctic Circle to finish writing The True History of Tea with Erling Hoh, I was amazed by the symbiotic relationship the Sámi there had with their vast herds of reindeer. And, yes, they do ride them, which someone was asking about here recently.

The full name of Bangkok

February 24, 2026 @ 8:10 am · Filed by Victor Mair under Names, Parsing, Spelling

TikTok video by Kat Talks Thai (@kattoksthai), replying to @Mamba:

Did you know that Bangkok has the longest city name in the world? I dare you to say it too! #bangkok #thailand #thai

"Written Cantonese must have word segmentation"

February 23, 2026 @ 7:09 am · Filed by Victor Mair under Grammar, Language reform, Parsing, Writing systems

That's the title of an essay that appeared in my e-mail today from an outfit called Cantonese Script Reform 粵字改革. Here's what they say:

Written Cantonese must have spaces, like Korean. The calligraphic issue must give way. For the space itself is a grammatical marker that marks the beginning and the end of a word. This tool of demarcation will allow poet and playwright to invent new words by putting words together within the confines delineated by the spaces between words. Written Cantonese needs all the tools imaginable for it to revitalise and resurrect its lost vocabulary. A Hebrew-esque recycling of ancient words for purposes anew is the way to go. But we can’t do that if we can’t tell if this is a new word because we can’t tell if these characters familiar so and so sequenced are merely a fanciful poetic playful arrangement or other mark of the invention of a new word, where a familiar noun is turned into a verb or verb is turned into an adjective or an adjective is now henceforth interpreted as a noun in this particular context.
