Language Log

Archive for Language and computers

Another sinograph for Unicode — the third-person gender-neutral pronoun

January 17, 2026 @ 10:12 am· Filed by Victor Mair under Colloquial, Diglossia and digraphia, Language and computers, Phonetics and phonology, Typography, Writing systems

No sooner had I posted about a block of 11,328 proposed Small Seal characters dating back roughly two millennia being incorporated in Unicode than a single spanking new sinograph surfaced and was urgently put forward for inclusion, and it is causing a bit of a ruckus. That is the third-person gender-neutral pronoun [X也], which is pronounced the same as all the other characters for supposedly gendered Sinitic third-person pronouns, viz., tā (see below for their graphic forms).

N.B.: The proposed neograph under discussion is provisionally being written as [X也], but bear in mind that, as I have pointed out countless times, all sinographs, by the exigencies / inherent nature of the script, whether they have 1 stroke or 64 / n strokes, must be squeezed inside the same size box as all other sinographs. In other words, [X也] perforce — once the typographers get it worked out — will eventually have to fit inside exactly the same space as 也 and biáng (you can get an authentic plate of these belt-like Shaanxi noodles at Xi'an Sizzling Woks, 40th & Chestnut in University City next to Penn [opens at 11:30 AM, closed on Tuesdays]). There are many Language Log posts about diverse aspects of this jabberwockyish character. Just look it up under "biang".

Word division and computer lockouts

December 31, 2025 @ 9:58 am· Filed by Victor Mair under Language and computers, Language and culture, Parsing

Random storefront in Taiwan:

The true (sort of) story about VPNs in China, part 2

August 15, 2025 @ 6:12 am· Filed by Victor Mair under Censorship, Language and computers

The reason for this part 2 about VPNs in China is that so many good responses to my inquiries about VPNdom in China came pouring in just after the first post went out. Also, there is such a wide variety of different viewpoints and experiences that you cannot expect there to be a single standard out there. One thing has become very clear to me, and that is that the Chinese government wants to keep people on their toes and have them resort to a lot of self-policing. This is common government policy in China, not just with regard to the internet, but to many aspects of social and political life.

For an up-to-date comprehensive primer about VPN usage in the PRC, I recommend that you read carefully this article: "Are VPN's Legal in China?" Not really. It's a tricky business. You can be heavily fined and get in real trouble if the internet police catch you with one, especially if you use it to read / write something the CCP disapproves of.

Does handwriting still matter?

January 28, 2025 @ 3:57 pm· Filed by Victor Mair under Language and computers, Language and psychology, Writing

It's a subject that won't go away.

When I was in high school, I concocted an embarrassingly sophomoric signature:

I wrote that iteration of my youthful signature on the front flyleaf of my beloved Webster's New Collegiate Dictionary (1960), which, from that year till today, has been one of my most precious possessions.

When I went away to college in 1961, I adopted a signature that was the exact antithesis of that early one, and I have kept it ever since:

It was / is mechanical and measured, with no flourishes whatsoever.

Most people I know have one of three basic types of signatures:

1. extravagant, fast, illegible — these are usually "important" people who have to sign their signature scores of times each week; doctors; lawyers; executives; entertainers….

2. beautiful, well-composed, flowing, legible — my sisters, most women

3. crotched, cramped, crooked, angular, unesthetic, slow — my brothers and me, engineers, scientists, who write with what I call "chicken scratches"

The history of characters in computers

January 3, 2025 @ 6:34 pm· Filed by Victor Mair under Language and computers, Writing systems

Sino-Platonic Papers is pleased to announce the publication of its three-hundred-and-sixtieth issue:

“Kanji and the Computer: A Brief History of Japanese Character Set Standards,” by James Breen.

https://www.sino-platonic.org/complete/spp360_kanji_computers_japanese_character_set.pdf

ABSTRACT
This paper describes the development of the character coding systems and standards that enable Japanese text to be recorded and used in computer systems. The Japanese coding systems, which were first developed in the late 1970s, pioneered the approaches to handling the large numbers of kanji characters and established a pathway that was adopted in other standards for Asian languages. The paper covers the development of the major Japanese standards and their evolution into the Unicode character standard, which is now the basis for all language coding.

God of Scrabble

December 13, 2024 @ 7:50 pm· Filed by Victor Mair under Language and computers, Multilingualism, Spelling

I recall Malaysia-based New Zealander Nigel Richards' multiple Scrabble championships in English and French from earlier years and thought that I had written about them, but apparently not on Language Log. Now he has won again, this time in Spanish, so it's about time that he became known to our readership, if they don't already know him.

"Scrabble star wins Spanish world title – despite not speaking Spanish: Nigel Richards has also been champion in English and – after memorising dictionary in nine weeks – French", Ashifa Kassam in Madrid, The Guardian (12/10/24)

Searle's "Chinese room" and the enigma of understanding

November 28, 2024 @ 7:45 am· Filed by Victor Mair under Artificial intelligence, Language and computers, Language and philosophy

In this comment to "'Neutrino Evidence Revisited (AI Debates)' | Is Mozart's K297b authentic?" (11/13/24), I questioned whether John Searle's "Chinese room" argument was intelligently designed and encouraged those who encounter it to reflect on what it did — and did not — demonstrate.

In the same comment, I also queried the meaning of "understand" and its synonyms ("comprehend", and so forth).

Both the "Chinese room" and "understanding" had been raised by skeptics of AI, so here I'm treating them together.

Taiwan Mandarin vs. Mainland Mandarin

November 8, 2024 @ 7:49 pm· Filed by Victor Mair under Borrowing, Language and computers

In recent weeks and months, we've been having many posts and comments about Taiwanese language. Today's post is quite different: it's all about the difference between Mandarin as spoken on the mainland and as spoken on Taiwan.

"Words of Influence: PRC terms and Taiwanese identity", by Karen Huang, Taiwan Insight (8 November 2024)

What is a ‘video clip’ in Mandarin Chinese? In Taiwan, a video clip is yingpian (影片), while in China, it is referred to as shipin (視頻). Similarly, tomatoes are called fanqie (番茄) in Taiwan, but xihongshi (西红柿) in China. These vocabulary differences between Taiwan Mandarin (Guoyu 國語) and PRC Mandarin (Putonghua 普通话) are expected. After all, it is natural for different dialects of a language to have some differences in their vocabulary—just like how ‘rubbish bin’ in British English is ‘garbage can’ in American English.

Font making for oracle bone inscription studies

September 18, 2024 @ 8:18 pm· Filed by Victor Mair under Language and computers, Typography, Writing systems

"Jingyuan Digital Platform: Font Making and Database Development for Shang Oracle Bones (Part 1)", Peichao Qin, The Digital Orientalist (9/17/24)

If you're wondering what "Jingyuan" means, it's a fancy, allusive way to say "Mirrored contexts [for thorough investigations]" ([gézhì] jìngyuán [格致]鏡原), just a means for the creator of the platform to give it a proprietary designation.

A goodly proportion of Language Log readers probably have some idea of what oracle bone inscriptions are, but just to refresh our memories and for the benefit of new and recent readers who are not familiar with the history of Sinographic scripts, I'm going to jump right into the third paragraph of Qin's article, which is like a basic primer of oracle bone inscription studies.

Pitfalls of machine translation

August 29, 2024 @ 3:59 pm· Filed by Victor Mair under Language and computers, Translation

[This is a guest post by Thomas Batchelor]

I was recently looking at a tourist bus around the Matsu Islands of Taiwan, and they have a timetable online with the route and locations for picking up passengers, as below.

[VHM: Don't trouble yourself by trying to read the fine print of the schedule itself. Just pay attention to the note about the pickup location at the bottom of the schedule, which is enlarged below the fold.]

Triple review of books on characters and computers

August 23, 2024 @ 1:48 pm· Filed by Victor Mair under Announcements, Books, Language and computers, Writing systems

Sino-Platonic Papers is pleased to announce the publication of its three-hundred-and-fifty-fourth issue: "Handling Chinese Characters on Computers: Three Recent Studies" (pdf), by J. Marshall Unger (August, 2024).

Abstract
Writing systems with large character sets pose significant technological challenges, and not all researchers focus on the same aspects of those challenges or of the various attempts that have been made to meet them. A comparative reading of three recent books—The Chinese Computer by Thomas Mullaney (2024), Kingdom of Characters by Jing Tsu (2022), and Codes of Modernity by Uluğ Kuzuoğlu (2023)—makes this abundantly clear. All deal with the ways in which influential users of Chinese characters have responded to the demands of modern technology, but differ from one another considerably in scope and their selection and treatment of relevant information long known to linguists and historians.

China VPN redux

July 17, 2024 @ 10:29 am· Filed by Victor Mair under Censorship, Language and computers

Chapter 1

A professor in China who is collaborating with a famous American professor of Chinese literature wanted to read one of my Language Log (LL) posts because he had heard that it's being widely discussed around the world. However, because of China's rigid censorship rules, he couldn't open the LL post.

The Chinese professor asked the American professor to help him gain access to my post.

The American professor asked me to help the Chinese professor.

I suggested that the Chinese professor use a VPN. Without a VPN, Chinese are not able to access LL, Wikipedia, Wiktionary, Google, X, etc., etc. In other words, without a VPN, Chinese are cut off from most of the information on the internet that is outside the Great Firewall, i.e., most of the cutting-edge, valuable information in the world.

The Catch-22 is that it is a crime to use a VPN in China.

Can you imagine having to live in a benighted place like the PRC?

Kanji brush writing on an iPad

July 7, 2024 @ 3:12 pm· Filed by Victor Mair under Language and computers, Writing

The article is in Japanese, but you should be able to get an idea of what's going on from the videos and stills.

iPad書道はいいぞ ("iPad calligraphy is great") pic.twitter.com/P4hregIAl1

— 書きちらし (@kakichirashi) June 29, 2024
