Archive for Language and computers

Robot philosopher-calligrapher

I was aware of this article more than four years ago when it first appeared, but didn't post on it then because I didn't think many people would be interested in it:

"Forget Marx and Mao. Chinese City Honors Once-Banned Confucian", Ian Johnson, NYT (10/18/17)


(Credit: Lam Yik Fei for The New York Times)

Now that we're on a Chinese calligraphy and philosophy roll and have a number of robot calligraphy posts under our belt (see "Selected readings" below), writing a post about a robotic philosopher-calligrapher is not so outlandish after all.

Read the rest of this entry »

Comments (2)

Is Korean diverging into two languages?, part 2

To make sense of the story that follows, one must understand that the Korean word "agassi 아가씨" used to refer to a young lady from the upper class, but now in North Korea means “slave of feudal society” and has a very negative connotation there.

"Hidden meaning of Korean term 'agassi' leads to murder", by Choi Jae-hee, The Korea Herald (5/3/22)

Because the linguistic psychology that lies behind the tragic crime recounted in this article is intricate and subtle, it is necessary to recount it at some length:

An error in a mobile translation application recently prompted a 35-year-old Chinese man in Jeongeup, North Jeolla Province, to murder a Korean resident.

Read the rest of this entry »

Comments (27)

Character amnesia yet again: game (almost) over

Last week, I witnessed a palpable, powerful, poignant demonstration of tíbǐwàngzì 提筆忘字 ("forgetting how to write sinographs; character amnesia").  This happened in a colloquium where, during the discussion period, someone mentioned the standard eight-volume Historical Atlas of China (1982-1988) edited by the renowned geographer Tan Qixiang (1911-1992).  A member of the gathering requested that the name be written on the whiteboard in sinographs.  A colleague — a tenured professor of medieval Chinese history — popped up and said they could write the name in characters.

Already a little bit wobbly on the semantophore / radical on the left side of the first character (the surname), with a little bit of kibitzing from colleagues, the volunteer managed to produce the requisite semantophore after several false starts and erasures.  After that great achievement (producing the semantophore amid much embarrassment), they turned to the phonophore on the right side but were getting nowhere fast, even with suggestions from colleagues who were looking on.

Finally, someone looked up the name on their phone and presto digito*, the correct writing emerged:  譚其驤 / 谭其骧 (the group — scholars all — collectively preferred the traditional form over the simplified one).

—–

[*VHM:  I remember hearing this expression when I was young, but it barely exists on the internet, and I can't find it in dictionaries either.]

Read the rest of this entry »

Comments (1)

Why is Facebook's Chinese translation still so terrible?

[This is a guest post by Jenny Chu]

Has Language Log been following up on the great sorrow that is Facebook's (Chinese) translation feature? The last reference I found was this one

It came up today when I was reading this somewhat viral post on Facebook

I switched on the auto-translate option to help me understand. The results were not just astonishingly bad, but had a surprisingly medical bent.

 
今天這個主權政府作承諾的時候大辭炎炎,七情上面,結果又是如何? → "Today, when the private government is working, the weather is colon inflammation, above the sentiment, what is the result?"

Read the rest of this entry »

Comments (11)

The paradox of hard and easy

If you're interested in one-way functions and Kolmogorov complexity, you'll probably want to read this mind-crunching article:

"Researchers Identify ‘Master Problem’ Underlying All Cryptography", by Erica Klarreich, Quanta Magazine (April 6, 2022)

The existence of secure cryptography depends on one of the oldest questions in computational complexity.

To ease our way, here are brief descriptions of the two key terms:

In computer science, a one-way function is a function that is easy to compute on every input, but hard to invert given the image of a random input. Here, "easy" and "hard" are to be understood in the sense of computational complexity theory, specifically the theory of polynomial time problems. Not being one-to-one is not considered sufficient for a function to be called one-way….

(source)
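
For readers who like to see the asymmetry in action, here is a minimal sketch in Python. It uses modular exponentiation, which is only a *candidate* one-way function: nobody has proved that any one-way function exists. The names `forward` and `invert_by_search` and the constants `P` and `G` are my own choices for illustration and come from neither the Quanta article nor the quoted definition. The prime is deliberately tiny, so the "hard" direction is still feasible here.

```python
from typing import Optional

# Toy parameters, assumed for illustration only (far too small for real use).
P = 2_147_483_647  # the Mersenne prime 2**31 - 1
G = 7              # base for the exponentiation


def forward(x: int) -> int:
    """Easy direction: compute G**x mod P with fast modular exponentiation."""
    return pow(G, x, P)


def invert_by_search(y: int, limit: int) -> Optional[int]:
    """Naive hard direction: try exponents 0, 1, 2, ... below `limit`.

    Returns the first exponent x with G**x mod P == y, or None if no such
    exponent is found within the search limit.
    """
    value = 1  # G**0
    for x in range(limit):
        if value == y:
            return x
        value = (value * G) % P
    return None
```

Going forward costs a handful of multiplications however large the exponent is, whereas the naive search backward takes time proportional to the exponent itself. With primes hundreds of digits long, no known classical method closes that gap efficiently.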

In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produces the object as output. It is a measure of the computational resources needed to specify the object, and is also known as algorithmic complexity, Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, or algorithmic entropy. It is named after Andrey Kolmogorov, who first published on the subject in 1963.

(source)
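
Kolmogorov complexity itself cannot be computed, but an ordinary compressor gives a rough upper bound on it, which is enough to convey the intuition. The sketch below is my own illustration under that assumption: `compressed_length` is a hypothetical helper name, and zlib stands in for the "shortest program", which it certainly is not.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length of the zlib-compressed data: a crude upper-bound proxy,
    not the true Kolmogorov complexity (which is uncomputable)."""
    return len(zlib.compress(data, 9))


# A highly regular string: "ab" repeated 5,000 times (10,000 bytes).
regular = b"ab" * 5000

# 10,000 pseudo-random bytes from a fixed seed, so the run is repeatable.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10_000))

print(len(regular), compressed_length(regular))
print(len(noisy), compressed_length(noisy))
```

The regular string shrinks to a small fraction of its length, because "repeat 'ab' 5,000 times" is a short description. The noisy one barely shrinks at all as far as zlib can tell, even though in this case a short program (the seed plus the generator) does produce it. That is a reminder of how loose the compressor's upper bound can be.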

Read the rest of this entry »

Comments (17)

The weirdness of typing errors

In this age of typing on computers and other digital devices, when we daily input thousands upon thousands of words, we are often amazed at the number and types of mistakes we make.  Many of them are simple and straightforward, as when our fingers stumble and hit the wrong keys by sheer accident.  People who type on phones often warn their correspondents that their messages are likely to contain such errors by appending a notice like this at the bottom:

Please forgive spelling / grammatical errors; typed on glass // sent from my phone.

Read the rest of this entry »

Comments (37)

Postdocs on ancient scripts: Chinese and Aegean

Since these are on subjects that are of interest to many of us, I'm calling them to your attention.

From Mattia Cartolano:

The INSCRIBE project is hiring!

Two post-doc positions are now available:

  1. Evolution of Graphic Codes: The Origins of the Chinese Script
  2. Undeciphered Aegean Scripts: New perspectives in Computational Linguistics

Deadline for applications: Sunday 27 March 2022
If you want to find out more, write to s.ferrara@unibo.it

Read the rest of this entry »

Comments off

A mishmash of languages, "dialects", and characters

We've just been through the problems of standard language versus the vernaculars in Arabic (see "Selected readings" below).  Now we're going to look at a photograph, a caption, a book review, and a letter to the editor that encompass these contentious issues in spades — but for Chinese.  Here's the photograph:

Read the rest of this entry »

Comments (5)

Language is not script and script is not language

Trying to clear up the confusion between the two is a battle we have been waging for decades, and nowhere is the problem more severe than in the study of Sinitic languages and the Sinographic script.  The crisis (not a "danger + opportunity"!) has come to the surface again this month with the appearance of a new book by Jing Tsu titled Kingdom of Characters: The Language Revolution That Made China Modern (Riverhead Books, 2022).

The publication of Tsu's book has generated a lot of excitement, publicity, and reviews.  Here I would like to call attention to the brief remarks of an anonymous correspondent (a famous, reclusive linguist) that are right on target:

Reimagining "antiquated" Chinese

Reproduced below is the text of a book review in Science that you may not have seen. It is classified as "Linguistics", though the reviewer is a historian at Cal State Poly, Pomona. Notice that Chinese is assumed to be "antiquated" and in need of being "reimagined"!  There is simply no sign of Science understanding the difference between a human language and a writing system. This is consistent with the way they have always treated linguistics; they have no idea what the subject really is.

Read the rest of this entry »

Comments (19)

AI cat and mouse robot censorship war

Now it's getting interesting:

"China’s internet police losing man-versus-machine duel on social media"

Stephen Chen, SCMP (11/14/21)

    Hordes of bot accounts using clever dodging tactics are causing burnout among human censors, police investigative paper finds
    Authorities may respond by raising a counter-army of automated accounts or even an AI-driven public opinion leader

Read the rest of this entry »

Comments (3)

rime-cantonese, a Cantonese lexicon for building keyboards and more

The following is a guest post by Mingfei Lau. A short intro about the author:

My name is Mingfei Lau, a member of The Linguistic Society of Hong Kong Jyutping Workgroup. I am a language engineer at Amazon and I work on different projects on Cantonese resource development in my spare time.


Today, Pinyin is undoubtedly the most popular way to type Mandarin. But what about Cantonese? This wasn’t easy until rime-cantonese, the normalized Cantonese Jyutping[1] lexicon appeared. Lo and behold, you can now type Cantonese in Jyutping just like typing Mandarin in Pinyin.
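
To give a feel for what a lexicon-driven input method does, here is a toy sketch in Python. It is not the actual rime-cantonese data format or the Rime engine's API. `TOY_LEXICON`, `strip_tones`, and `candidates` are hypothetical names, and the three entries are a hand-picked sample. The idea is simply that typed Jyutping, with or without tone numbers, is matched against the readings in a lexicon to produce candidate words.

```python
import re
from typing import Dict, List

# Hypothetical three-entry lexicon: Jyutping reading -> candidate words.
TOY_LEXICON: Dict[str, List[str]] = {
    "nei5 hou2": ["你好"],
    "hoeng1 gong2": ["香港"],
    "do1 ze6": ["多謝"],
}


def strip_tones(jyutping: str) -> str:
    """Remove the Jyutping tone digits 1-6 so toneless typing can match."""
    return re.sub(r"[1-6]", "", jyutping)


def candidates(typed: str) -> List[str]:
    """Return every word whose reading matches the typed syllables,
    ignoring tone numbers and letter case."""
    key = strip_tones(typed.strip().lower())
    results: List[str] = []
    for reading, words in TOY_LEXICON.items():
        if strip_tones(reading) == key:
            results.extend(words)
    return results


print(candidates("nei hou"))
print(candidates("hoeng1 gong2"))
```

A real input method also has to handle syllable segmentation, ranking by frequency, and partial input, all of which this sketch leaves out.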

Read the rest of this entry »

Comments (4)

Massive long-term data storage

News release in EurekAlert, Optica (10/28/21):

"High-speed laser writing method could pack 500 terabytes of data into CD-sized glass disc:  Advances make high-density, 5D optical storage practical for long-term data archiving"

Caption

Researchers developed a new fast and energy-efficient laser-writing method for producing nanostructures in silica glass. They used the method to record 6 GB data in a one-inch silica glass sample. The four squares pictured each measure just 8.8 X 8.8 mm. They also used the laser-writing method to write the university logo and mark on the glass.

Credit

Yuhao Lei and Peter G. Kazansky, University of Southampton

Source

Read the rest of this entry »

Comments (9)

Difficult languages and easy languages, part 3

There may well be a dogma out there stating that all languages are equally complex, but I don't believe it, especially not if it has to be "drummed" into our minds.  I have learned many languages.  Some of them are exceedingly hard (because of their complexity) and some of them are relatively easy (because they are comparatively simple).  I have often said that Mandarin is the easiest language I ever learned to speak, but the hardest to read and write in characters (though very easy in Romanization).  And remember these posts:

"Difficult languages and easy languages" (3/4/17)

"Difficult languages and easy languages, part 2" (5/28/19)

Read the rest of this entry »

Comments (33)