Language Log

Archive for Information technology

Google Translate is even better now, part 2

May 12, 2022 @ 5:30 pm · Filed by Victor Mair under Artificial intelligence, Information technology, Translation

"Google Translate learns 24 new languages"
Isaac Caswell, Google blog (5/11/22)

==========

Illustrated green globe with the word "hello" translated into different languages.

For years, Google Translate has helped break down language barriers and connect communities all over the world. And we want to make this possible for even more people — especially those whose languages aren’t represented in most technology. So today we’ve added 24 languages to Translate, now supporting a total of 133 used around the globe.

Over 300 million people speak these newly added languages — like Mizo, used by around 800,000 people in the far northeast of India, and Lingala, used by over 45 million people across Central Africa. As part of this update, Indigenous languages of the Americas (Quechua, Guarani and Aymara) and an English dialect (Sierra Leonean Krio) have also been added to Translate for the first time.

Read the rest of this entry »

Permalink Comments (24)

The weirdness of typing errors

March 14, 2022 @ 11:13 am · Filed by Victor Mair under Computational linguistics, Errors, Etymology, Information technology, Language and computers, Language and psychology, Miswriting, Phonetics and phonology, Psychology of language, Typing

In this age of typing on computers and other digital devices, when we daily input thousands upon thousands of words, we are often amazed at the number and types of mistakes we make. Many of them are simple and straightforward, as when our fingers stumblingly hit the wrong keys by sheer accident. People who type on phones often warn their correspondents that their messages are likely to contain such errors by including a notice like this at the bottom:

Please forgive spelling / grammatical errors; typed on glass // sent from my phone.

Read the rest of this entry »

Permalink Comments (37)

Cambodian voice traffic

February 11, 2022 @ 7:57 pm · Filed by Victor Mair under Information technology, Typing, Writing systems

A Rest of World article from November that I missed when it first came out, but am posting on now because it speaks to the comments on several recent Language Log posts (e.g., here and here):

"Fifty percent of Facebook Messenger’s total voice traffic comes from Cambodia. Here’s why:

Keyboards weren't designed for Khmer. So Cambodians have just decided to ignore them", By Vittoria Elliott and Bopha Phorn (12 November 2021)

The first four paragraphs of this longish article:

In 2018, the team at Facebook had a puzzle on their hands. Cambodian users accounted for nearly 50% of all global traffic for Messenger’s voice function, but no one at the company knew why, according to documents released by whistleblower Frances Haugen.

One employee suggested running a survey, according to internal documents viewed by Rest of World. Did it have to do with low literacy levels? they wondered. In 2020, a Facebook study attempted to ask users in countries with high audio use, but was only able to find a single Cambodian respondent, the same documents showed. The mystery, it seemed, stayed unsolved.

The answer, surprisingly, has less to do with Facebook, and more to do with the complexity of the Khmer language, and the way users adapt for a technology that was never designed with them in mind.

Read the rest of this entry »

Permalink Comments (6)

Pen scanner

February 9, 2022 @ 12:14 pm · Filed by Victor Mair under Information technology

New product:

With the Scanmarker Air no more Retyping- Simply Scanning!

Scan any text in a document or book and it's instantly available on your PC/Mac in any program including Word, Google Docs, Evernote and more. You can also use it on your smartphone/tablet with our app.

Super Easy to use
Scanmarker Air is 30 times faster than manual retyping
Scans up to 3,000 characters a minute and will save hours of tedious work
Can read aloud any scanned text
Instant translation to over 70 languages- including reading the translation aloud!

Read the rest of this entry »

Permalink Comments (25)

Language is not script and script is not language

January 23, 2022 @ 8:51 pm · Filed by Victor Mair under Diglossia and digraphia, Emojis and emoticons, Information technology, Language and computers, Language teaching and learning, Typing, Writing systems

Trying to clear up the confusion between the two is a battle we have been waging for decades, and nowhere is the problem more severe than in the study of Sinitic languages and the Sinographic script. The crisis (not a "danger + opportunity"!) has come to the surface again this month with the appearance of a new book by Jing Tsu titled Kingdom of Characters: The Language Revolution That Made China Modern (Riverhead Books, 2022).

The publication of Tsu's book has generated a lot of excitement, publicity, and reviews. Here I would like to call attention to the brief remarks of an anonymous correspondent (a famous, reclusive linguist) that are right on target:

Reimagining "antiquated" Chinese

Reproduced below is the text of a book review in Science that you may not have seen. It is classified as "Linguistics", though the reviewer is a historian at Cal State Poly, Pomona. Notice that Chinese is assumed to be "antiquated" and in need of being "reimagined"! There is simply no sign of Science understanding the difference between a human language and a writing system. This is consistent with the way they have always treated linguistics; they have no idea what the subject really is.

Read the rest of this entry »

Permalink Comments (19)

The pain of forgetting one's mother / father tongue

September 4, 2021 @ 7:45 am · Filed by Victor Mair under Information technology, Language loss

And the pleasure of regaining it with the help of IT.

"Forgetting My First Language: When I speak Cantonese with my parents now, I rely on translation apps."

By Jenny Liao, New Yorker
September 3, 2021

This is a perennial problem among immigrants, especially those who move to their adoptive country before the age of about eleven and a half years. There are so many poignant moments in this article that I wish I could quote the whole of it. Instead, I will only highlight a few of the most salient passages.

Read the rest of this entry »

Permalink Comments (8)

Another chapter in the history of the Chinese typewriter

August 14, 2021 @ 8:22 pm · Filed by Victor Mair under Information technology, Language and computers, Typography

Brian Merriman ran into this article and device when researching electronic typewriters from the 1980s:

Read the rest of this entry »

Permalink Comments (2)

"Train hard, dream big"

July 26, 2021 @ 5:10 am · Filed by Victor Mair under Alphabets, Elephant semifics, Errors, Information technology, Lost in translation

[This is a guest post by Bernhard Riedel]

I stumbled across what was probably a mis-MT in the context of the Olympic Games. (article in Korean)

"During a foot kick on the way to the gold medal, some hangul became visible. But…"

On the black belt of the athlete from Spain, one can see "기차 하드, 꿈 큰" which is wonderful gibberish. Netizens in Korea were puzzled but also quick to guess an erroneous machine translation.

기차(汽車): (railway) train (definitely *not* related to "to train")
하드: (en:hard, transliterated)
꿈: dream (noun built from the verb 꾸다(to dream) with the nominalizer ㅁ/음)
큰: big (from the verb 크다) in the form used when modifying a noun that follows

Read the rest of this entry »

Permalink Comments (1)

Uncommon words of anguish

July 18, 2021 @ 5:31 am · Filed by Victor Mair under Diglossia and digraphia, Information technology, Language and computers, Lost in translation, Writing systems

From a manual for a thermal printer:

Dǎyìn kòngzhì bǎn nèizhì GB18030 Zhōngwén zìkù, chèdǐ miǎnchú shēngpì zì de kǔnǎo

打印控制板内置 GB18030 中文字库，彻底免除生僻字的苦恼

Printer control panel built-in GB18030 Chinese character, thoroughly remove the uncommon words of anguish

(courtesy of Amy de Buitléir)

A more accurate English translation would be:

Printer control panel with built-in GB18030 Chinese character font, thoroughly removing the anguish brought about by uncommon / obscure characters

"GB" stands for "guóbiāo 国标" ("national standard"), and is used for many technical terms in the PRC (another instance of encroaching digraphia, for which see here and here [with extensive bibliography]).
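As a rough illustration of why a built-in GB18030 character set matters for uncommon characters, the Python sketch below compares GB18030 with the older GB2312 standard, which covers only 6,763 Chinese characters. It relies only on Python's standard codecs. The helper name describe and the two sample characters are chosen here for illustration and do not come from the printer manual.

```python
def describe(ch):
    """Return (byte length in GB18030, encodable in GB2312?, round-trips?)."""
    encoded = ch.encode("gb18030")
    try:
        ch.encode("gb2312")
        in_gb2312 = True
    except UnicodeEncodeError:
        # The older standard has no code for this character.
        in_gb2312 = False
    round_trips = encoded.decode("gb18030") == ch
    return len(encoded), in_gb2312, round_trips


common = "\u4e2d"      # 中, an everyday character
rare = "\U00020000"    # a CJK Extension B character, outside the older standards

for ch in (common, rare):
    length, in_gb2312, ok = describe(ch)
    print(f"U+{ord(ch):04X}: {length} bytes in GB18030, "
          f"in GB2312: {in_gb2312}, round-trips: {ok}")
```

The everyday character takes two bytes and exists in both standards, while the rare one needs GB18030's four-byte form and cannot be written in GB2312 at all.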

Read the rest of this entry »

Permalink Comments (14)

Data vs. information

February 7, 2021 @ 3:44 am · Filed by Victor Mair under Computational linguistics, Information technology, Language and biology, Language and science

[This is a guest post by Conal Boyce]

The following was drafted as an Appendix to a project whose working title is "The Emperor's New Information" (after Penrose, The Emperor's New Mind). It's still a work-in-progress, so feedback would be welcome. For example: Are the two examples persuasive? Do they need technical clarification or correction? Have others at LL noticed how certain authors "who should know better" use the term information where data is dictated by the context, or employ the two terms at random, as if they were synonyms?

Read the rest of this entry »

Permalink Comments (35)

Alphabetical storage, ordering, and retrieval

October 18, 2020 @ 6:57 am · Filed by Victor Mair under Alphabets, Dictionaries, Information technology, Language and computers, Lexicon and lexicography

We just had a good discussion about a Sinitic language written with an alphabet:

"The look, feel, and sound of Dungan language" (10/15/20)

Under "Selected readings" below, there are listed additional earlier posts about writing Sinitic languages with Romanization.

One of the major advantages of the alphabet over a morphosyllabic / logographic ideopicto-phonetic writing system like the Sinographic script is that it is very easy to order and find / retrieve the entire lexicon with the former, whereas carrying out these tasks with the latter is toilsome at best and torturesome at worst. See:

Victor H. Mair, "The Need for an Alphabetically Arranged General Usage Dictionary of Mandarin Chinese: A Review Article of Some Recent Dictionaries and Current Lexicographical Projects", Sino-Platonic Papers, 1 (February, 1986), pp. 1-31.
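The ordering and retrieval point can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the three-word lexicon, the toneless Pinyin keys, and the lookup helper are invented here, and sorting characters by Unicode code point is just what a computer does by default, not any of the traditional lookup methods (radical, stroke count, four-corner) that dictionaries actually use.

```python
import bisect

# Hypothetical mini-lexicon: toneless Pinyin spelling -> character.
lexicon = {"shu": "\u4e66", "mao": "\u732b", "gou": "\u72d7"}  # 书, 猫, 狗

# Alphabetical order needs no extra knowledge beyond the spelling itself.
alphabetical = sorted(lexicon)                 # ['gou', 'mao', 'shu']
chars_in_alphabetical_order = [lexicon[k] for k in alphabetical]

# Sorting the characters directly falls back on code point order,
# which has no connection to how the words are pronounced.
chars_by_code_point = sorted(lexicon.values())


def lookup(spelling):
    """Binary search in the alphabetical index; returns the character or None."""
    i = bisect.bisect_left(alphabetical, spelling)
    if i < len(alphabetical) and alphabetical[i] == spelling:
        return lexicon[alphabetical[i]]
    return None


print(alphabetical)
print(chars_in_alphabetical_order)
print(chars_by_code_point)
print(lookup("mao"))
```

With the spelling as the key, anyone who can pronounce a word can find it by binary search, whereas the code point order of the characters gives a reader no such handle.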

Read the rest of this entry »

Permalink Comments (29)

"One I first saw": more on homophonically induced typing errors

April 8, 2020 @ 1:52 pm · Filed by Victor Mair under Errors, Information technology, Language and computers, Miswriting, Phonetics and phonology, Typography

A little over a week ago, I described how I mistyped "stalk" for "stock". That led to a vigorous discussion of precisely how people pronounce "stalk". (As a matter of fact, in my own idiolect I do pronounce "stock" and "stalk" identically.) See:

"Take stalk of: thoughts on philology and Sinology" (3/29/20)

I just now typed "One I first saw…" when I meant "When I first saw…".

Read the rest of this entry »

Permalink Comments (46)