Archive for Language and computers

The history of characters in computers

Sino-Platonic Papers is pleased to announce the publication of its three-hundred-and-sixtieth issue:

“Kanji and the Computer: A Brief History of Japanese Character Set Standards,” by James Breen.

https://www.sino-platonic.org/complete/spp360_kanji_computers_japanese_character_set.pdf

ABSTRACT
This paper describes the development of the character coding systems and standards that enable Japanese text to be recorded and used in computer systems. The Japanese coding systems, which were first developed in the late 1970s, pioneered the approaches to handling the large numbers of kanji characters and established a pathway that was adopted in other standards for Asian languages. The paper covers the development of the major Japanese standards and their evolution into the Unicode character standard, which is now the basis for all language coding.
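The differences among these standards are easy to see in practice. Here is a minimal sketch. It assumes Python 3.8 or later and its built-in `shift_jis`, `euc_jp`, and `iso2022_jp` codecs, and it encodes the same two kanji, 漢字, under each Japanese scheme and under UTF-8. The helper name `encodings_of` is hypothetical.

```python
from __future__ import annotations

# Hypothetical helper: encode one string under several character-set
# standards using Python's built-in codecs.
CODECS = ("shift_jis", "euc_jp", "iso2022_jp", "utf-8")


def encodings_of(text: str) -> dict[str, bytes]:
    """Return the byte sequence of `text` under each codec in CODECS."""
    result = {}
    for codec in CODECS:
        data = text.encode(codec)
        # Sanity check: each encoding must round-trip losslessly.
        assert data.decode(codec) == text
        result[codec] = data
    return result


if __name__ == "__main__":
    for codec, data in encodings_of("漢字").items():
        print(f"{codec:<10} {data.hex(' ')}")
```

In all three Japanese schemes, 漢 is the JIS X 0208 code 0x3441. ISO-2022-JP carries it unchanged between escape sequences. EUC-JP sets the high bit of each byte (B4 C1), and Shift_JIS rearranges it (8A BF). UTF-8 instead encodes the Unicode code point U+6F22 (E6 BC A2).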

Comments (1)

God of Scrabble

I recall Malaysia-based New Zealander Nigel Richards' multiple Scrabble championships in English and French from earlier years and thought that I had written about them, but apparently not on Language Log. Now he has won again, this time in Spanish, so it's about time that he became known to our readership, if they don't already know him.

"Scrabble star wins Spanish world title – despite not speaking Spanish:  Nigel Richards has also been champion in English and – after memorising dictionary in nine weeks – French", Ashifa Kassam in Madrid, The Guardian (12/10/24)

Comments (15)

Searle's "Chinese room" and the enigma of understanding

In this comment to "'Neutrino Evidence Revisited (AI Debates)' | Is Mozart's K297b authentic?" (11/13/24), I questioned whether John Searle's "Chinese room" argument was intelligently designed and encouraged those who encounter it to reflect on what it did — and did not — demonstrate.

In the same comment, I also queried the meaning of "understand" and its synonyms ("comprehend", and so forth).

Both the "Chinese room" and "understanding" had been raised by skeptics of AI, so here I'm treating them together.

Comments (21)

Taiwan Mandarin vs. Mainland Mandarin

In recent weeks and months, we've had many posts and comments about the Taiwanese language. Today's post is quite different: it's all about the difference between Mandarin as spoken on the mainland and as spoken on Taiwan.

"Words of Influence: PRC terms and Taiwanese identity", by Karen Huang, Taiwan Insight (8 November 2024)

What is a ‘video clip’ in Mandarin Chinese? In Taiwan, a video clip is yingpian (影片), while in China, it is referred to as shipin (視頻). Similarly, tomatoes are called fanqie (番茄) in Taiwan, but xihongshi (西红柿) in China. These vocabulary differences between Taiwan Mandarin (Guoyu 國語) and PRC Mandarin (Putonghua 普通话) are expected. After all, it is natural for different dialects of a language to have some differences in their vocabulary—just like how ‘rubbish bin’ in British English is ‘garbage can’ in American English.

Comments (7)

Font making for oracle bone inscription studies

"Jingyuan Digital Platform: Font Making and Database Development for Shang Oracle Bones (Part 1)", Peichao Qin, The Digital Orientalist (9/17/24)

If you're wondering what "Jingyuan" means, it's a fancy, allusive way to say "Mirrored contexts [for thorough investigations]" ([gézhì] jìngyuán [格致]鏡原) (source), just a means for the creator of the platform to give it a proprietary designation.

A goodly proportion of Language Log readers probably have some idea of what oracle bone inscriptions are, but just to refresh our memories and for the benefit of new and recent readers who are not familiar with the history of Sinographic scripts, I'm going to jump right into the third paragraph of Qin's article, which is like a basic primer of oracle bone inscription studies.

Comments (7)

Pitfalls of machine translation

[This is a guest post by Thomas Batchelor]

I was recently looking into a tourist bus service around the Matsu Islands of Taiwan. The operator has a timetable online showing the route and the locations for picking up passengers, as below.

[VHM:  Don't trouble yourself by trying to read the fine print of the schedule itself.  Just pay attention to the note about the pickup location at the bottom of the schedule, which is enlarged below the fold.]

Comments (7)

Triple review of books on characters and computers

Sino-Platonic Papers is pleased to announce the publication of its three-hundred-and-fifty-fourth issue: "Handling Chinese Characters on Computers: Three Recent Studies" (pdf), by J. Marshall Unger (August 2024).

Abstract
Writing systems with large character sets pose significant technological challenges, and not all researchers focus on the same aspects of those challenges or of the various attempts that have been made to meet them. A comparative reading of three recent books—The Chinese Computer by Thomas Mullaney (2024), Kingdom of Characters by Jing Tsu (2022), and Codes of Modernity by Uluğ Kuzuoğlu (2023)—makes this abundantly clear. All deal with the ways in which influential users of Chinese characters have responded to the demands of modern technology, but differ from one another considerably in scope and their selection and treatment of relevant information long known to linguists and historians.

Comments (2)

China VPN redux

Chapter 1

A professor in China who is collaborating with a famous American professor of Chinese literature wanted to read one of my Language Log (LL) posts because he had heard that it was being widely discussed around the world. However, because of China's rigid censorship rules, he couldn't open the LL post.

The Chinese professor asked the American professor to help him gain access to my post.

The American professor asked me to help the Chinese professor.

I suggested that the Chinese professor use a VPN. Without a VPN, people in China cannot access LL, Wikipedia, Wiktionary, Google, X, etc., etc. In other words, without a VPN, they are cut off from most of the information on the internet that lies outside the Great Firewall, i.e., most of the cutting-edge, valuable information in the world.

The catch-22 is that it is a crime to use a VPN in China.

Can you imagine having to live in a benighted place like the PRC?

Comments (5)

Kanji brush writing on an iPad

The article is in Japanese, but you should be able to get an idea of what's going on from the videos and stills.

Comments (1)

Fissures in the Great Firewall caused by X

Things are becoming dicey for the CCP/PRC regime:

"A cartoon cat has been vexing China’s censors – now he says they are on his tail"

By Tessa Wong, Asia Digital Reporter, BBC (6/10/24)

Here's the dilemma faced by the Chinese communist authorities. It would be very easy for the censors to shut down all VPNs and impose draconian internet controls that would make it impossible for netizens to communicate with the outside internet. But then China would lose access to external information and communication, which the government desperately needs if it is going to continue acquiring advanced technology and science from abroad, not to mention operating its economic initiatives such as the BRI (Belt and Road Initiative).

Comments (5)

Microsoft Copilot goes looking for an obscure sinograph

and finds it!

Back in early February, I asked the Lexicography at Wenlin Institute discussion group if they could help me find a rare Chinese character in Unicode, and I sent along a picture of the glyph.  It won't show up in most browsers, but you can see the character here.  You can also see it in the first item of the list of "Selected Readings" below.  In the following post, when you see this symbol, , just imagine you're seeing this glyph.

On 2/27/24, Richard Warmington kindly responded as follows:

I asked Microsoft Copilot (a chatbot integrated into Microsoft's Edge browser), "Can you tell me anything about the Chinese character ?"

The answer began as follows:

Certainly! The Chinese character is an intriguing one. Let’s explore it:

1. Character Details:
Character:
Unicode Code Point: U+24B25
[…]
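You can check a claimed code point like U+24B25 independently with Python's standard library. The minimal sketch below confirms only that the code point is an assigned CJK Unified Ideograph in Extension B (U+20000 to U+2A6DF), and it shows the code point's UTF-8 and UTF-16 byte sequences. It cannot tell you whether this is the glyph in the picture. That still has to be compared visually, in a font that covers Extension B. The function name `describe` is a hypothetical helper.

```python
from __future__ import annotations

import unicodedata


def describe(cp: int) -> dict[str, object]:
    """Summarize a code point: its Unicode name and its encoded bytes."""
    ch = chr(cp)
    return {
        # CPython derives names for CJK unified ideographs algorithmically.
        "name": unicodedata.name(ch, "<unnamed>"),
        # Extension B spans U+20000..U+2A6DF.
        "in_ext_b": 0x20000 <= cp <= 0x2A6DF,
        "utf8": ch.encode("utf-8").hex(" ").upper(),
        # Group the big-endian UTF-16 bytes into 16-bit units (a surrogate pair).
        "utf16": ch.encode("utf-16-be").hex(" ", 2).upper(),
    }


if __name__ == "__main__":
    for key, value in describe(0x24B25).items():
        print(f"{key:<9} {value}")
```

On CPython, `describe(0x24B25)` reports the name `CJK UNIFIED IDEOGRAPH-24B25`, the UTF-8 bytes `F0 A4 AC A5`, and the UTF-16 surrogate pair `D852 DF25`. The four-byte and surrogate-pair forms are one reason such characters often fail to display in older software.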

Comments (17)

We need libraries and we need computers

Both for the flow of and access to information.

More than a week ago, the Seattle Public Library system, a large and wonderful institution that thousands rely on every day, went offline after ransomware hackers attacked it.

"Why did ransomware hackers target Seattle Public Library?", GeekWire, by Taylor Soper (May 29, 2024)

This is an excellent article that explains why the criminals went after a library, how they carried out their dirty work, and what the authorities are doing to restore services.

Comments (8)

Implementing Pāṇini's grammar

[Here's the conclusion to the hoped-for trifecta on things Indian; see the preface here. It comes in the form of a guest post by Arun Prasad.]

The cornerstone of traditional Sanskrit grammar is Pāṇini's Aṣṭādhyāyī, which, in around 4,000 short rules, defines a comprehensive system for generating valid Sanskrit expressions. It continues to prompt vigorous discussion to this day, some of which has been featured on Language Log before.
 
As a professional software engineer and amateur Sanskritist, I look at the text through a more pragmatic lens: if we could implement the Aṣṭādhyāyī in code and generate an exhaustive list of Sanskrit words, we could create incredibly valuable tools for Sanskrit students and scholars.
 
To that end, I have implemented just over 2,000 of the Aṣṭādhyāyī's rules in code, with an online demo here. These rules span all major sections of the text that pertain to morphology, including: derivation of verbs, nominals, secondary roots, primary nominal bases, and secondary nominal bases; compounding; accent; and sandhi.
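To give a rough idea of what "a rule in code" can look like, here is a toy sketch of vowel sandhi at a word boundary. It is not the author's implementation. It encodes only three rules, 6.1.101 akaḥ savarṇe dīrghaḥ, 6.1.87 ād guṇaḥ, and 6.1.77 iko yaṇ aci, restricted to the simple vowels a/ā, i/ī, u/ū in IAST. It assumes precomposed (NFC) Unicode input. The names `join` and `vowel_class` are hypothetical.

```python
from __future__ import annotations

import unicodedata

# Toy vowel sandhi covering three Aṣṭādhyāyī rules for a/ā, i/ī, u/ū only.
# Hypothetical sketch; not the rule engine described in the post.
SIMPLE_VOWELS = "aāiīuū"
LONG = {"a": "ā", "i": "ī", "u": "ū"}
SHORT = {"ā": "a", "ī": "i", "ū": "u"}


def vowel_class(v: str) -> str:
    """Map a vowel to its short form, so a and ā share one (savarṇa) class."""
    return SHORT.get(v, v)


def join(first: str, second: str) -> tuple[str, str | None]:
    """Join two IAST words; return the result and the rule number applied."""
    first = unicodedata.normalize("NFC", first)
    second = unicodedata.normalize("NFC", second)
    x, y = first[-1], second[0]
    if x not in SIMPLE_VOWELS or y not in SIMPLE_VOWELS:
        return first + second, None  # consonant sandhi is not handled here
    cx, cy = vowel_class(x), vowel_class(y)
    if cx == cy:
        # 6.1.101 akaḥ savarṇe dīrghaḥ: similar vowels merge into one long vowel.
        return first[:-1] + LONG[cx] + second[1:], "6.1.101"
    if cx == "a":
        # 6.1.87 ād guṇaḥ: a/ā + i/ī -> e, a/ā + u/ū -> o.
        return first[:-1] + {"i": "e", "u": "o"}[cy] + second[1:], "6.1.87"
    # 6.1.77 iko yaṇ aci: i/ī -> y, u/ū -> v before a dissimilar vowel.
    return first[:-1] + {"i": "y", "u": "v"}[cx] + second, "6.1.77"
```

For example, `join("deva", "indra")` gives `("devendra", "6.1.87")` and `join("su", "āgatam")` gives `("svāgatam", "6.1.77")`. A full implementation must also resolve conflicts among competing rules. This sketch sidesteps that problem by checking its three rules in a fixed order.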

Comments (1)