Archive for Information technology

The weirdness of typing errors

In this age of typing on computers and other digital devices, when we daily input thousands upon thousands of words, we are often amazed at the number and types of mistakes we make.  Many of them are simple and straightforward, as when our fingers stumble onto the wrong keys by sheer accident.  People who type on phones often warn their correspondents that their messages are likely to contain such errors by appending a notice like this at the bottom:

Please forgive spelling / grammatical errors; typed on glass // sent from my phone.

Read the rest of this entry »

Cambodian voice traffic

Here is a Rest of World article from November that I missed when it first came out; I am posting on it now because it speaks to the comments on several recent Language Log posts (e.g., here and here):

"Fifty percent of Facebook Messenger’s total voice traffic comes from Cambodia. Here’s why:

Keyboards weren't designed for Khmer. So Cambodians have just decided to ignore them", by Vittoria Elliott and Bopha Phorn (12 November 2021)

The first four paragraphs of this longish article:

In 2018, the team at Facebook had a puzzle on their hands. Cambodian users accounted for nearly 50% of all global traffic for Messenger’s voice function, but no one at the company knew why, according to documents released by whistleblower Frances Haugen.

One employee suggested running a survey, according to internal documents viewed by Rest of World. Did it have to do with low literacy levels? they wondered. In 2020, a Facebook study attempted to ask users in countries with high audio use, but was only able to find a single Cambodian respondent, the same documents showed. The mystery, it seemed, stayed unsolved.

The answer, surprisingly, has less to do with Facebook, and more to do with the complexity of the Khmer language, and the way users adapt for a technology that was never designed with them in mind.

Read the rest of this entry »

Pen scanner

New product:

With the Scanmarker Air no more Retyping- Simply Scanning!

Scan any text in a document or book and it's instantly available on your PC/Mac in any program including Word, Google Docs, Evernote and more. You can also use it on your smartphone/tablet with our app.

  • Super Easy to use
  • Scanmarker Air is 30 times faster than manual retyping
  • Scans up to 3,000 characters a minute and will save hours of tedious work
  • Can read aloud any scanned text
  • Instant translation to over 70 languages- including reading the translation aloud!

Read the rest of this entry »

Language is not script and script is not language

Trying to clear up the confusion between the two is a battle we have been waging for decades, and nowhere is the problem more severe than in the study of Sinitic languages and the Sinographic script.  The crisis (not a "danger + opportunity"!) has come to the surface again this month with the appearance of a new book by Jing Tsu titled Kingdom of Characters: The Language Revolution That Made China Modern (Riverhead Books, 2022).

The publication of Tsu's book has generated a lot of excitement, publicity, and reviews.  Here I would like to call attention to the brief remarks of an anonymous correspondent (a famous, reclusive linguist) that are right on target:

Reimagining "antiquated" Chinese

Reproduced below is the text of a book review in Science that you may not have seen. It is classified as "Linguistics", though the reviewer is a historian at Cal State Poly, Pomona. Notice that Chinese is assumed to be "antiquated" and in need of being "reimagined"!  There is simply no sign of Science understanding the difference between a human language and a writing system. This is consistent with the way they have always treated linguistics; they have no idea what the subject really is.

Read the rest of this entry »

The pain of forgetting one's mother / father tongue

And the pleasure of regaining it with the help of IT.

"Forgetting My First Language:  When I speak Cantonese with my parents now, I rely on translation apps."

By Jenny Liao, New Yorker
September 3, 2021

This is a perennial problem among immigrants, especially those who move to their adoptive country before the age of about eleven and a half years.  There are so many poignant moments in this article that I wish I could quote the whole of it.  Instead, I will only highlight a few of the most salient passages.

Read the rest of this entry »

Another chapter in the history of the Chinese typewriter

Brian Merriman ran into this article and device when researching electronic typewriters from the 1980s:

Read the rest of this entry »

"Train hard, dream big"

[This is a guest post by Bernhard Riedel]

I stumbled across what was probably a mis-MT in the context of the Olympic Games.  (article in Korean)

"During a foot kick on the way to the gold medal, some hangul became visible. But…"

On the black belt of the athlete from Spain, one can see "기차 하드, 꿈 큰" which is wonderful gibberish. Netizens in Korea were puzzled but also quick to guess an erroneous machine translation.

기차(汽車): (railway) train (definitely *not* related to "to train")
하드: hard (transliteration of English "hard")
꿈: dream (noun built from the verb 꾸다 (to dream) with the nominalizer ㅁ/음)
큰: big (from the verb 크다) in the form used when modifying a noun that follows

Read the rest of this entry »

Uncommon words of anguish

From a manual for a thermal printer:

Dǎyìn kòngzhì bǎn nèizhì GB18030 Zhōngwén zìkù, chèdǐ miǎnchú shēngpì zì de kǔnǎo

打印控制板内置 GB18030 中文字库,彻底免除生僻字的苦恼

Printer control panel built-in GB18030 Chinese character, thoroughly remove the uncommon words of anguish

(courtesy of Amy de Buitléir)

A more accurate English translation would be:

Printer control panel with built-in GB18030 Chinese character font, thoroughly removing the anguish brought about by uncommon / obscure characters

"GB" stands for "guóbiāo 国标" ("national standard"), and is used for many technical terms in the PRC (another instance of encroaching digraphia, for which see here and here [with extensive bibliography]).
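
What "removing the anguish of uncommon characters" amounts to at the encoding level can be sketched in a few lines.  GB18030 is able to encode every Unicode code point, whereas the older GB2312 standard covers only 6,763 characters.  The snippet below is a minimal illustration using Python's built-in codecs; the two sample characters are my own choices, not taken from the printer manual, and a printer still needs glyphs in its font for whatever it can encode.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if `text` can be represented in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Sample characters chosen for illustration (an assumption, not from the manual).
common = "中"          # an everyday character, present in GB2312
rare = "\U00020BB7"    # 𠮷, a variant character outside the Basic Multilingual Plane

for codec in ("gb2312", "gb18030"):
    print(codec, "common:", encodable(common, codec), "rare:", encodable(rare, codec))

# GB18030 uses a four-byte sequence for such characters and round-trips them.
rare_bytes = rare.encode("gb18030")
print(len(rare_bytes), rare_bytes.decode("gb18030") == rare)
```

The older codec simply refuses the rare character, which is the software version of the "anguish" the manual promises to remove.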

Read the rest of this entry »

Data vs. information

[This is a guest post by Conal Boyce]

The following was drafted as an Appendix to a project whose working title is "The Emperor's New Information" (after Penrose, The Emperor's New Mind). It's still a work-in-progress, so feedback would be welcome. For example: Are the two examples persuasive? Do they need technical clarification or correction? Have others at LL noticed how certain authors "who should know better" use the term information where data is dictated by the context, or employ the two terms at random, as if they were synonyms?

Read the rest of this entry »

Alphabetical storage, ordering, and retrieval

We just had a good discussion about a Sinitic language written with an alphabet:

"The look, feel, and sound of Dungan language" (10/15/20)

Under "Selected readings" below, there are listed additional earlier posts about writing Sinitic languages with Romanization.

One of the major advantages of the alphabet over a morphosyllabic / logographic ideopicto-phonetic writing system like the Sinographic script is that it is very easy to order and find / retrieve the entire lexicon with the former, whereas carrying out these tasks with the latter is toilsome at best and torturesome at worst.  See:

Victor H. Mair, "The Need for an Alphabetically Arranged General Usage Dictionary of Mandarin Chinese: A Review Article of Some Recent Dictionaries and Current Lexicographical Projects", Sino-Platonic Papers, 1 (February, 1986), pp. 1-31.
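
The ordering and retrieval point made above can be sketched in code.  The example below is a toy illustration, not a real dictionary: the five-entry lexicon and the function name `lookup` are hypothetical, and the romanization is toneless Pinyin for simplicity.  Sorting by spelling gives an order any reader can predict and search by binary search, while sorting the characters themselves falls back on Unicode code points, an order that reflects neither sound nor meaning.

```python
from bisect import bisect_left

# Hypothetical mini-lexicon: (toneless Pinyin, characters). Illustrative only.
LEXICON = [
    ("zidian", "字典"),
    ("cidian", "词典"),
    ("pinyin", "拼音"),
    ("hanzi", "汉字"),
    ("yuyan", "语言"),
]

# Alphabetical storage: tuples sort on the romanized spelling first.
by_spelling = sorted(LEXICON)
keys = [spelling for spelling, _ in by_spelling]


def lookup(spelling: str):
    """Binary-search the alphabetized list; return the characters or None."""
    i = bisect_left(keys, spelling)
    if i < len(keys) and keys[i] == spelling:
        return by_spelling[i][1]
    return None


# Sorting the characters directly uses raw Unicode code-point order.
by_codepoint = sorted(chars for _, chars in LEXICON)

print(keys)
print(lookup("hanzi"))
print(by_codepoint)
```

Real character dictionaries use radical-and-stroke or similar finding schemes rather than raw code points, but each of those requires the user to analyze the character's shape before the search can even begin.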

Read the rest of this entry »

"One I first saw": more on homophonically induced typing errors

A little over a week ago, I described how I mistyped "stalk" for "stock".  That led to a vigorous discussion of precisely how people pronounce "stalk".  (As a matter of fact, in my own idiolect I do pronounce "stock" and "stalk" identically.)  See:

"Take stalk of: thoughts on philology and Sinology" (3/29/20)

I just now typed "One I first saw…" when I meant "When I first saw…".

Read the rest of this entry »

Are you in the book today?

[This is a guest post by Nathan Hopson, who sent along the two screen shots with which it begins.]

Another splendid example of why punctuation matters and why machine translation is dumb…

Read the rest of this entry »

Literary Sinitic / Classical Chinese dependency parsing

We are keenly aware that, while advances in machine translation of Vernacular Sinitic (VS) (Mandarin) are quite impressive and fundamentally serviceable, they cannot be applied directly to the translation of Literary Sinitic / Classical Chinese (LS/CC).  That would be like using an Italian translation program for Latin, a Hindi translation program for Sanskrit, or a Modern Greek translation program for Classical Greek.  It would probably be even less useful than in these parallel cases, because the whole structure and nature of LS/CC and VS are different from each other.

However, an LS/CC parsing program is now available that takes us a major step toward a functional system for the machine translation of the literary / classical written language (it is only a written / book language, not a spoken language).  It was developed by YASUOKA Koichi 安岡 孝一 of Kyoto University's Institute for Research in Humanities (Jinbun kagaku kenkyūjo 人文科学研究所) and is available here.

Read the rest of this entry »
