Archive for Information technology

Information Management and Library Science

Just out today, this is one of the longest book reviews I have ever written:

Jack W. Chen, Anatoly Detwyler, Xiao Liu, Christopher M. B. Nugent, and Bruce Rusk, eds., Literary Information in China:  A History (New York:  Columbia University Press, 2021).

Reviewed by Victor H. Mair

MCLC Resource Center Publication (Copyright September, 2022)

I am calling it to your attention because the book under review, which I will refer to here as LIIC, signals a sea change in:

1. Sinology
2. Information technology
3. Academic attitudes toward the study of language and literature


Language is not script and script is not language, part 2

[This is a guest post by Paul Shore.]

    The 2022 book Kingdom of Characters by Yale professor Jing Tsu is currently #51,777 in Amazon's sales ranking.  (The label "Best Seller" on the Amazon search-results listing for it incorporates the amusing mouseover qualification "in [the subject of] Unicode Encoding Standard".)  I haven't read the book yet:  the Arlington, Virginia library system's four copies have a wait list, and so I have a used copy coming to me in the mail.  What I have experienced, though, is a fifty-minute National Public Radio program from their podcast / broadcast series Throughline, entitled "The Characters That Built China", that's a partial summary of the material in the book, a summary that was made with major cooperation from Jing Tsu herself, with numerous recorded remarks by her alternating with remarks by the two hosts:  https://www.npr.org/podcasts/510333/throughline (scroll down to the May 26th episode).  Based on what's conveyed in this podcast / broadcast episode, I think many people on Language Log and elsewhere who care about fostering a proper understanding of human language among the general public might agree that that ranking of 51,777 is still several million too high.  But while the influence of the book's ill-informed, misleading statements about language was until a few days ago mostly confined to those individuals who'd taken the trouble to get hold of a copy of the book or had taken the trouble to listen to the Throughline episode as a podcast (it was presumably released as such on its official date of May 26th), with the recent broadcasting of the episode on NPR proper those nocive ideas have now been splashed out over the national airwaves.  And since NPR listeners typically have their ears "open like a greedy shark, to catch the tunings of a voice [supposedly] divine" (Keats), this program seems likely to inflict an unusually high amount of damage on public knowledge of linguistics. 


Google Translate is even better now, part 2

"Google Translate learns 24 new languages"
Isaac Caswell, Google blog (5/11/22)

==========

[Image: illustrated green globe with the word "hello" translated into different languages.]

For years, Google Translate has helped break down language barriers and connect communities all over the world. And we want to make this possible for even more people — especially those whose languages aren’t represented in most technology. So today we’ve added 24 languages to Translate, now supporting a total of 133 used around the globe.

Over 300 million people speak these newly added languages — like Mizo, used by around 800,000 people in the far northeast of India, and Lingala, used by over 45 million people across Central Africa. As part of this update, Indigenous languages of the Americas (Quechua, Guarani and Aymara) and an English dialect (Sierra Leonean Krio) have also been added to Translate for the first time.


The weirdness of typing errors

In this age of typing on computers and other digital devices, when we daily input thousands upon thousands of words, we are often amazed at the number and types of mistakes we make.  Many of them are simple and straightforward, as when our fingers stumblingly hit the wrong keys by sheer accident.  People who type on phones often warn their correspondents that their messages are prone to contain such errors by including a notice like this at the bottom:

Please forgive spelling / grammatical errors; typed on glass // sent from my phone.


Cambodian voice traffic

A Rest of World article from November that I missed when it first came out, but am posting on now because it speaks to the comments on several recent Language Log posts (e.g., here and here):

"Fifty percent of Facebook Messenger’s total voice traffic comes from Cambodia. Here’s why:

Keyboards weren't designed for Khmer. So Cambodians have just decided to ignore them", By Vittoria Elliott and Bopha Phorn (12 November 2021)

The first four paragraphs of this longish article:

In 2018, the team at Facebook had a puzzle on their hands. Cambodian users accounted for nearly 50% of all global traffic for Messenger’s voice function, but no one at the company knew why, according to documents released by whistleblower Frances Haugen.

One employee suggested running a survey, according to internal documents viewed by Rest of World. Did it have to do with low literacy levels? they wondered. In 2020, a Facebook study attempted to ask users in countries with high audio use, but was only able to find a single Cambodian respondent, the same documents showed. The mystery, it seemed, stayed unsolved.

The answer, surprisingly, has less to do with Facebook, and more to do with the complexity of the Khmer language, and the way users adapt for a technology that was never designed with them in mind.


Pen scanner

New product:

With the Scanmarker Air no more Retyping- Simply Scanning!

Scan any text in a document or book and it's instantly available on your PC/Mac in any program including Word, Google Docs, Evernote and more. You can also use it on your smartphone/tablet with our app.

  • Super Easy to use
  • Scanmarker Air is 30 times faster than manual retyping
  • Scans up to 3,000 characters a minute and will save hours of tedious work
  • Can read aloud any scanned text
  • Instant translation to over 70 languages- including reading the translation aloud!


Language is not script and script is not language

Trying to clear up the confusion between the two is a battle we have been waging for decades, and nowhere is the problem more severe than in the study of Sinitic languages and the Sinographic script.  The crisis (not a "danger + opportunity"!) has come to the surface again this month with the appearance of a new book by Jing Tsu titled Kingdom of Characters: The Language Revolution That Made China Modern (Riverhead Books, 2022).

The publication of Tsu's book has generated a lot of excitement, publicity, and reviews.  Here I would like to call attention to the brief remarks of an anonymous correspondent (a famous, reclusive linguist) that are right on target:

Reimagining "antiquated" Chinese

Reproduced below is the text of a book review in Science that you may not have seen. It is classified as "Linguistics", though the reviewer is a historian at Cal State Poly, Pomona. Notice that Chinese is assumed to be "antiquated" and in need of being "reimagined"!  There is simply no sign of Science understanding the difference between a human language and a writing system. This is consistent with the way they have always treated linguistics; they have no idea what the subject really is.


The pain of forgetting one's mother / father tongue

And the pleasure of regaining it with the help of IT.

"Forgetting My First Language:  When I speak Cantonese with my parents now, I rely on translation apps."

By Jenny Liao, New Yorker
September 3, 2021

This is a perennial problem among immigrants, especially those who move to their adoptive country before the age of about eleven and a half years.  There are so many poignant moments in this article that I wish I could quote the whole of it.  Instead, I will only highlight a few of the most salient passages.


Another chapter in the history of the Chinese typewriter

Brian Merriman ran into this article and device when researching electronic typewriters from the 1980s:


"Train hard, dream big"

[This is a guest post by Bernhard Riedel]

I stumbled across what was probably a mis-MT in the context of the Olympic Games.  (article in Korean)

"During a foot kick on the way to the gold medal, some hangul became visible. But…"

On the black belt of the athlete from Spain, one can see "기차 하드, 꿈 큰" which is wonderful gibberish. Netizens in Korea were puzzled but also quick to guess an erroneous machine translation.

기차(汽車): (railway) train (definitely *not* related to "to train")
하드: (en:hard, transliterated)
꿈: dream (noun built from the verb 꾸다(to dream) with the nominalizer ㅁ/음)
큰: big (from the verb 크다) in the form used when modifying a noun that follows


Uncommon words of anguish

From a manual for a thermal printer:

Dǎyìn kòngzhì bǎn nèizhì GB18030 Zhōngwén zìkù, chèdǐ miǎnchú shēngpì zì de kǔnǎo

打印控制板内置 GB18030 中文字库,彻底免除生僻字的苦恼

Printer control panel built-in GB18030 Chinese character, thoroughly remove the uncommon words of anguish

(courtesy of Amy de Buitléir)

A more accurate English translation would be:

Printer control panel with built-in GB18030 Chinese character font, thoroughly removing the anguish brought about by uncommon / obscure characters

"GB" stands for "guóbiāo 国标" ("national standard"), and is used for many technical terms in the PRC (another instance of encroaching digraphia, for which see here and here [with extensive bibliography]).
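
To make concrete what "built-in GB18030" buys in terms of uncommon characters, here is a minimal Python sketch using the standard library's gb18030 and gb2312 codecs.  The two sample characters and the helper name fits_gb2312 are my own choices for illustration, not anything taken from the printer manual, and the sketch says nothing about which glyphs a given printer's font actually contains; it only shows that the GB18030 encoding has a code for a rare character that the older GB2312 repertoire lacks.

```python
# Illustrative sample characters (assumptions, not from the printer manual)
common = "中"          # everyday character, present in the older GB2312 set
rare = "\U00020000"    # U+20000, a CJK Extension B character, used as an "obscure" example


def fits_gb2312(ch: str) -> bool:
    """Return True if the character can be encoded in the older GB2312 set."""
    try:
        ch.encode("gb2312")
        return True
    except UnicodeEncodeError:
        return False


for ch in (common, rare):
    encoded = ch.encode("gb18030")  # GB18030 can encode every Unicode code point
    print(f"U+{ord(ch):04X}: {len(encoded)} bytes in GB18030, "
          f"encodable in GB2312: {fits_gb2312(ch)}")
```

The common character takes two bytes in GB18030, while the rare one takes four and cannot be encoded in GB2312 at all, which is presumably the kind of "anguish" the manual has in mind.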


Data vs. information

[This is a guest post by Conal Boyce]

The following was drafted as an Appendix to a project whose working title is "The Emperor's New Information" (after Penrose, The Emperor's New Mind). It's still a work-in-progress, so feedback would be welcome. For example: Are the two examples persuasive? Do they need technical clarification or correction? Have others at LL noticed how certain authors "who should know better" use the term information where data is dictated by the context, or employ the two terms at random, as if they were synonyms?


Alphabetical storage, ordering, and retrieval

We just had a good discussion about a Sinitic language written with an alphabet:

"The look, feel, and sound of Dungan language" (10/15/20)

Under "Selected readings" below, there are listed additional earlier posts about writing Sinitic languages with Romanization.

One of the major advantages of the alphabet over a morphosyllabic / logographic ideopicto-phonetic writing system like the Sinographic script is that it is very easy to order and find / retrieve the entire lexicon with the former, whereas carrying out these tasks with the latter is toilsome at best and torturesome at worst.  See:

Victor H. Mair, "The Need for an Alphabetically Arranged General Usage Dictionary of Mandarin Chinese: A Review Article of Some Recent Dictionaries and Current Lexicographical Projects", Sino-Platonic Papers, 1 (February, 1986), pp. 1-31.
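
The ordering and retrieval contrast can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the five-word lexicon, the toneless Pinyin keys, and the lookup helper are all hypothetical, and real dictionaries must also deal with tones, homophones, and word division.

```python
from bisect import bisect_left

# Hypothetical sample lexicon: (toneless Pinyin, characters)
lexicon = [
    ("zidian", "字典"),
    ("hanzi", "汉字"),
    ("pinyin", "拼音"),
    ("cidian", "词典"),
    ("yuyan", "语言"),
]

# Alphabetical arrangement: an ordinary string sort on the Pinyin key
by_pinyin = sorted(lexicon)
keys = [pinyin for pinyin, _ in by_pinyin]


def lookup(pinyin: str):
    """Binary search for an entry by its Pinyin spelling; None if absent."""
    i = bisect_left(keys, pinyin)
    if i < len(keys) and keys[i] == pinyin:
        return by_pinyin[i][1]
    return None


# Sorting the characters themselves falls back on Unicode code point order,
# which within the main CJK block follows radicals and strokes, not sound.
by_codepoint = sorted(chars for _, chars in lexicon)

print(keys)
print(lookup("hanzi"))
print(by_codepoint)
```

With the alphabetic key, anyone who can spell the word can find it by the same binary search a reader performs when flipping through a dictionary.  With the characters alone, the default order is one that a user can navigate only after analyzing each character into radical and residual strokes.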
