Archive for Language and computers

China VPN redux

Chapter 1

A professor in China who is collaborating with a famous American professor of Chinese literature wanted to read one of my Language Log (LL) posts because he had heard that it was being widely discussed around the world.  However, because of China's rigid censorship rules, he couldn't open the LL post.

The Chinese professor asked the American professor to help him gain access to my post.

The American professor asked me to help the Chinese professor.

I suggested that the Chinese professor use a VPN.  Without a VPN, people in China cannot access LL, Wikipedia, Wiktionary, Google, X, etc., etc.  In other words, without a VPN, they are cut off from most of the information on the internet that lies outside the Great Firewall, i.e., most of the cutting-edge, valuable information in the world.

The Catch-22 is that it is a crime to use a VPN in China.

Can you imagine having to live in a benighted place like the PRC?

Kanji brush writing on an iPad

The article is in Japanese, but you should be able to get an idea of what's going on from the videos and stills.

Fissures in the Great Firewall caused by X

Things are becoming dicey for the CCP/PRC regime:

"A cartoon cat has been vexing China’s censors – now he says they are on his tail"

By Tessa Wong, Asia Digital Reporter, BBC (6/10/24)

Here's the dilemma faced by the Chinese Communist authorities.  It would be very easy for the censors to shut down all VPNs and impose truly draconian internet controls that would make it impossible for netizens to communicate with the outside internet.  But then China would lose access to external information and communication, which the government desperately needs if it is going to continue acquiring advanced technology and science from abroad, not to mention operating its economic initiatives such as the BRI (Belt and Road Initiative).

Microsoft Copilot goes looking for an obscure sinograph

and finds it!

Back in early February, I asked the Lexicography at Wenlin Institute discussion group if they could help me find a rare Chinese character in Unicode, and I sent along a picture of the glyph.  It won't show up in most browsers, but you can see the character here.  You can also see it in the first item of the list of "Selected Readings" below.  In the following post, when you see this symbol, , just imagine you're seeing this glyph.

On 2/27/24, Richard Warmington kindly responded as follows:

I asked Microsoft Copilot (a chatbot integrated into Microsoft's Edge browser), "Can you tell me anything about the Chinese character ?"

The answer began as follows:

Certainly! The Chinese character is an intriguing one. Let’s explore it:

1. Character Details:
Character:
Unicode Code Point: U+24B25
[…]
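The post notes that the glyph "won't show up in most browsers." Assuming Copilot's code point is correct, a likely reason is that U+24B25 lies outside the Basic Multilingual Plane, in CJK Unified Ideographs Extension B, which many fonts simply do not cover; the encoding itself is not the obstacle. The Python sketch below is an illustration added here, not part of the original exchange: it takes the code point at face value (I have not checked which glyph it actually is) and reports where it sits and how it is encoded. The names `describe` and `EXTENSION_B` are my own.

```python
import unicodedata

# Code point as reported in the Copilot answer quoted above (taken at face value).
CODE_POINT = 0x24B25

# CJK Unified Ideographs Extension B spans U+20000..U+2A6DF inclusive.
EXTENSION_B = range(0x20000, 0x2A6DF + 1)


def describe(cp: int) -> dict:
    """Summarize which plane/block a code point lives in and how it is encoded."""
    ch = chr(cp)
    return {
        "notation": f"U+{cp:04X}",
        # Plane 0 is the Basic Multilingual Plane; plane 2 is the
        # Supplementary Ideographic Plane.
        "plane": cp >> 16,
        "in_extension_b": cp in EXTENSION_B,
        # Code points above U+FFFF take four bytes in UTF-8...
        "utf8": ch.encode("utf-8").hex(" "),
        # ...and a surrogate pair (two 16-bit code units) in UTF-16.
        "utf16_units": len(ch.encode("utf-16-le")) // 2,
        # Python derives names for CJK unified ideographs from the code point.
        "name": unicodedata.name(ch, "<unnamed>"),
    }


if __name__ == "__main__":
    # The character itself is not printed, since many consoles lack the font.
    for key, value in describe(CODE_POINT).items():
        print(f"{key:>15}: {value}")
```

Run as a script, it should report plane 2, membership in Extension B, the four UTF-8 bytes `f0 a4 ac a5`, and two UTF-16 code units. Whether a given browser displays the character then depends only on whether an installed font has a glyph for it.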

We need libraries and we need computers

Both for the flow of and access to information.

More than a week ago, the Seattle Public Library system, a large and wonderful institution that thousands rely on every day, went offline after ransomware hackers attacked it.

"Why did ransomware hackers target Seattle Public Library?", GeekWire, by Taylor Soper (May 29, 2024)

This is an excellent article that explains why the criminals went after a library, how they carried out their dirty work, and what the authorities are doing to restore services.

Implementing Pāṇini's grammar

[Here's the conclusion to the hoped-for trifecta on things Indian; see the preface here.  It comes in the form of a guest post by Arun Prasad.]

The cornerstone of traditional Sanskrit grammar is Pāṇini's Aṣṭādhyāyī, which in around 4,000 short rules defines a comprehensive system for generating valid Sanskrit expressions. It continues to prompt vigorous discussion to this day, some of which has featured in Language Log before.
 
As a professional software engineer and amateur Sanskritist, I approach the text more pragmatically: if we could implement the Aṣṭādhyāyī in code and generate an exhaustive list of Sanskrit words, we could create incredibly valuable tools for Sanskrit students and scholars.
 
To that end, I have implemented just over 2,000 of the Aṣṭādhyāyī's rules in code, with an online demo here. These rules span all major sections of the text that pertain to morphology, including: derivation of verbs, nominals, secondary roots, primary nominal bases, and secondary nominal bases; compounding; accent; and sandhi.
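To give a sense of what "implementing a rule in code" can mean at the very smallest scale, here is a toy Python sketch of two well-known vowel-sandhi rules applied at a word boundary: 6.1.101 *akaḥ savarṇe dīrghaḥ* (a vowel followed by a similar vowel merges into the long vowel) and 6.1.87 *ād guṇaḥ* (a/ā followed by i/ī or u/ū yields e or o). This is a simplified illustration of my own, not code from Prasad's implementation; the function and table names are hypothetical, and it ignores nearly everything that makes the Aṣṭādhyāyī hard: ṛ and ḷ, the other sandhi rules, exceptions, and the text's own conventions for deciding which rule wins when several could apply.

```python
import unicodedata
from typing import Dict, List, Optional, Tuple

Table = Dict[Tuple[str, str], str]

# Short and long forms of the three simple vowels handled here (IAST).
SHORT_TO_LONG = {"a": "ā", "i": "ī", "u": "ū"}


def savarna_dirgha_table() -> Table:
    """6.1.101 akaḥ savarṇe dīrghaḥ: vowel + similar vowel -> long vowel."""
    table: Table = {}
    for short, long_ in SHORT_TO_LONG.items():
        for left in (short, long_):
            for right in (short, long_):
                table[(left, right)] = long_
    return table


def guna_table() -> Table:
    """6.1.87 ād guṇaḥ: a/ā + i/ī -> e, a/ā + u/ū -> o."""
    table: Table = {}
    for left in ("a", "ā"):
        for right, result in (("i", "e"), ("ī", "e"), ("u", "o"), ("ū", "o")):
            table[(left, right)] = result
    return table


# Tried in list order. The two tables never share a vowel pair, so this
# toy needs no conflict resolution (the real grammar very much does).
RULES: List[Tuple[str, Table]] = [
    ("6.1.101", savarna_dirgha_table()),
    ("6.1.87", guna_table()),
]


def join(first: str, second: str) -> Tuple[str, Optional[str]]:
    """Join two IAST words; return the result and the rule applied, if any."""
    # NFC so that a long vowel like 'ā' is a single code point however it was typed.
    first = unicodedata.normalize("NFC", first)
    second = unicodedata.normalize("NFC", second)
    if first and second:
        pair = (first[-1], second[0])
        for rule_id, table in RULES:
            if pair in table:
                return first[:-1] + table[pair] + second[1:], rule_id
    # No rule matched: plain concatenation.
    return first + second, None


if __name__ == "__main__":
    pairs = [("deva", "ālaya"), ("kavi", "indra"), ("mahā", "indra"), ("sūrya", "udaya")]
    for a, b in pairs:
        word, rule = join(a, b)
        print(f"{a} + {b} -> {word}  ({rule})")
```

With these inputs it produces devālaya and kavīndra by 6.1.101, and mahendra and sūryodaya by 6.1.87. Returning the rule number alongside the result is a nod to how a derivation-oriented tool can show its work step by step.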

Hype over AI and Classical Chinese / Literary Sinitic

From the get-go, I'm dubious about any claims that current AI can fully and accurately translate Classical Chinese / Literary Sinitic (CC/LS) into Modern Standard Mandarin (MSM), much less into English or other languages, on a practical, functional basis.  Since the following article is from one of China's official propaganda "news" outlets (China Daily [CD]), the chances that we will get an accurate account of the true situation are next to nil anyway.

Language system translates ancient Chinese texts

By Li Wenfang in Guangzhou | China Daily | Updated: 2023-11-03 09:42

It starts out on a sour note:

If foreigners learning Chinese think the modern language is difficult to grasp, they should be glad they don't have to learn classical Chinese. Ancient texts are far more challenging, and not easy for even native Chinese speakers to decipher.

This is a cockamamie approach to the analysis of a written language in its ancient stages.  What is it about ancient classical Chinese texts that makes them so difficult?  How do they differ from modern Chinese texts?  What about their morphology, their grammar, their syntax, their phonology and prosody, their lexicon, their literary allusions…?

A fundamental, fatal flaw in the conceptualization of Sinitic on the part of conservative indigenous scholars is the belief that there are no essential linguistic differences between CC/LS and MSM, only stylistic ones.

Anyway, for what it's worth, the CD article continues:

Mirabile scriptu: fake kanji created by AI

Annals of Artificial Stupidity

Katie Deighton, "What Can’t the Internet Handle in 2022? Apostrophes", WSJ 9/29/2022:

Sybren Stüvel is an Amsterdam-based software developer with a fairly uncommon name and a surprisingly common predicament.

As he completes the tasks of daily life, computers refuse to accept his name as valid or mangle it entirely. A credit card provider rejected his moniker, a Vancouver hotel hit bumps locating his reservation—as he stood there exhausted from a nine-hour plane trip—and an airline wouldn’t let him check into a flight. “You can imagine my stress level,” he said.

While buying insurance, he said, “They asked me to confirm that my last name is indeed Stüvel.”
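The article does not say how the systems that rejected or mangled Stüvel's name were built, so the Python sketch below only illustrates, under stated assumptions, two common ways such software goes wrong: an ASCII-only validation pattern, and comparing names that Unicode allows to be stored two ways (ü as the single code point U+00FC, or as u followed by COMBINING DIAERESIS U+0308), or typed with a curly rather than a straight apostrophe. The function names and the apostrophe-folding policy are hypothetical choices of mine, one reasonable option among several.

```python
import re
import unicodedata

# A deliberately naive check of the kind that rejects real names (hypothetical).
ASCII_ONLY = re.compile(r"[A-Za-z]+")

# Apostrophe variants folded to a plain ' when matching (an assumed policy).
APOSTROPHES = {"\u2019": "'", "\u02bc": "'"}


def naive_is_valid(name: str) -> bool:
    return ASCII_ONLY.fullmatch(name) is not None


def is_valid(name: str) -> bool:
    """Accept letters and combining marks from any script, plus space, hyphen, apostrophes."""
    name = unicodedata.normalize("NFC", name)
    allowed_punct = {" ", "-", "'", *APOSTROPHES}
    return bool(name) and all(
        unicodedata.category(c)[0] in ("L", "M") or c in allowed_punct
        for c in name
    )


def match_key(name: str) -> str:
    """Key for comparing names entered on different systems."""
    # NFC makes precomposed and decomposed spellings identical.
    name = unicodedata.normalize("NFC", name)
    for variant, plain in APOSTROPHES.items():
        name = name.replace(variant, plain)
    # Case-insensitive comparison, assumed acceptable for lookups.
    return name.casefold()


if __name__ == "__main__":
    composed = "St\u00fcvel"     # ü as one code point
    decomposed = "Stu\u0308vel"  # u + combining diaeresis
    print(composed == decomposed)                        # False
    print(match_key(composed) == match_key(decomposed))  # True
    print(naive_is_valid(composed), is_valid(composed))  # False True
```

The two spellings look identical on screen yet fail a byte-for-byte comparison, which is one plausible way a reservation entered on one system becomes hard to find from another.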

Zero-COVID: null with a difference

In Chinese, it is called "qīng líng 清零" (lit., "clear zero").  Because the concept never made sense to me as a practical means for coping with the pandemic coronavirus called SARS-CoV-2, I wrote a post trying to understand what the Chinese authorities mean by it:  see "Dynamic zero" (5/19/22).  In that post, I discussed the problem from many different angles, including:

  1. "zero moment point" in robotics
  2. "zero-sum game" in mathematics
  3. "zero dynamics" in mathematics

If "Zero-COVID" genuinely interests / concerns you, I recommend that you spend some time on the "Dynamic zero" post.  Here I will cite only this brief passage from it:

…before it was rushed into use for the current "zero [Covid control]" policy, "qīng líng 清零" started out in literary texts as an adjective implying "lonely; lonesome; solitary; desolate".  More recently, it was employed in computing as a verb denoting "to reset; to clear the memory".  From there, it was adapted by Chinese epidemiologists in the sense of "to reduce to zero; to zero out".  That may be their goal, but it is not happening, despite their fiercest efforts at FTTIS ("Find, Test, Trace, Isolate and Support").

Not to mention mass prescription of mRNA and other medicines, plus masks.

Information Management and Library Science

Just out today, this is one of the longest book reviews I have ever written:

Jack W. Chen, Anatoly Detwyler, Xiao Liu, Christopher M. B. Nugent, and Bruce Rusk, eds., Literary Information in China:  A History (New York:  Columbia University Press, 2021).

Reviewed by Victor H. Mair

MCLC Resource Center Publication (Copyright September, 2022)

I am calling it to your attention because the book under review, which I will refer to here as LIIC, signals a sea change in:

1. Sinology
2. Information technology
3. Academic attitudes toward the study of language and literature

Biblical Hebrew Computing

Three years ago, we visited a proposal for "Classical Chinese computing" (12/19/19).  The post began thus:

Several colleagues called this article to my attention:

"Programming Language for the ancient Chinese"

Here's the introduction:

文言, or wenyan, is an esoteric programming language that closely follows the grammar and tone of classical Chinese literature. Moreover, the alphabet of wenyan contains only traditional Chinese characters and 「」 quotes, so it is guaranteed to be readable by ancient Chinese people. You too can try it out on the online editor, download a compiler, or view the source code.

The home page then goes through "Syntax", "Compilation", and "Get" (Source Code; Online Editor; Reference).

Infinitely malleable electronic brain — software and hardware

When I was a little boy, among the gifts from my parents that I treasured most were science kits that allowed me to construct my own instrumentation and use it for various experiments and observations, e.g., microscopes, radios and other electronic circuitry, chemistry sets, ingenious language games, and so on.  (This was in the late '40s and '50s in rural Stark County, northeast Ohio, mind you, when I was between the ages of about 5 and 15.)  But my favorite of all was a box full of materials for computer construction.  It consisted of a peg board, switches, wires, screws, small nuts and bolts, metal bands and clips, batteries, little light bulbs, etc.  Please remember that this was long before personal computers were invented.
