Archive for Language and computers

Emojis vs. emoticons

Here's an emoji:  😻

Here's an emoticon:  :‐)

As we will see below, the superficial resemblance of the two words is completely coincidental — even though they both have to do with the visual depiction of emotions and ideas in texts.
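One way to make the difference concrete at the character level is the sketch below. It assumes Python 3 and its standard `unicodedata` module; the helper name `describe` is made up for illustration, and the emoticon is written with a plain ASCII hyphen-minus, which may not match the exact hyphen character used in the example above. The emoji is a single Unicode code point, while the emoticon is a sequence of ordinary punctuation characters.

```python
import unicodedata

def describe(text: str) -> list:
    """Return (code point, Unicode name) pairs for each character in text.

    `describe` is a hypothetical helper name used only for this illustration.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

emoji = "\U0001F63B"   # the cat-face emoji, one code point
emoticon = ":-)"       # assumption: ASCII hyphen-minus, three code points

for label, text in (("emoji", emoji), ("emoticon", emoticon)):
    print(label, describe(text))
```

The emoji prints as one entry and the emoticon as three, which is the sense in which the first is a character and the second is a string of characters read as a picture.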

This post began as a comment to "Emoticons as writing" (7/7/19), but it soon became too long and too complex to fit in a comment, so it now receives separate treatment of its own.


Emoticons as writing

This morning I received this card from a friend:


Mandarin hospital robocalls

Article in The Washington Post (6/18/19):

"Robocalls are overwhelming hospitals and patients, threatening a new kind of health crisis"

" … Many of the messages seemed to be the same: Speaking in Mandarin, an unknown voice threatened deportation unless the person who picked up the phone provided their personal information…."


The CCP's Learning / Learning Xi (Thought) app

A couple of nights ago, I had dinner with one of my students from China and his parents, both of whom are members of the Chinese Communist Party (CCP).  The father is a doctor who works 10 hours a day, during which he sees a hundred patients.  Most of them are suffering from diabetes.  At the end of his long day, the father is required to log into the Party's Xuéxí / Xué Xi 学习 ("Learning / Learn Xi [Thought]") app, full name "Xuéxí / Xué Xi qiángguó 学习强国" ("Learning / Learn Xi [Thought] to strengthen the nation"), which was launched in early 2019.


Odevity or parity

[This is a guest post by Jeffrey Shallit]

A Chinese student here at Waterloo used the term "odevity" for what English-speaking computer scientists typically call "parity" — the property of an integer being odd or even.

I had never heard this term before, so I used Google Scholar to look at where it is being used.  It is used almost exclusively by Chinese engineers, mathematicians, and computer scientists.  The first usage I was able to find with Google Book Search was in 1972, obtained with this search.
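For readers who want the property spelled out, here is a minimal sketch in Python. The function names `parity` and `same_parity` are made up for illustration and do not come from the post.

```python
def parity(n: int) -> str:
    """Return "even" or "odd" for an integer n (negative integers included)."""
    # In Python, n % 2 is 0 or 1 even when n is negative.
    return "even" if n % 2 == 0 else "odd"

def same_parity(a: int, b: int) -> bool:
    """True when a and b are both even or both odd."""
    return (a - b) % 2 == 0

print(parity(10), parity(7), same_parity(3, 7))
```

Two integers have the same parity exactly when their difference is even, which is what `same_parity` checks.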


Digitizing specialized language dictionaries

[The following is a guest post by David Dettmann.  The "Schwarz Uyghur dictionary" to which he refers in the third paragraph is this:  Henry G. Schwarz, An Uyghur-English dictionary (Bellingham, Washington:  Center for East Asian Studies, Western Washington University, 1992).]


It is a bit of a nerdy obsession of mine to customize my computers to comfortably use languages that I've studied.

About 10 years ago, I got relatively proficient at using optical character recognition (OCR) software and scanner hardware. Any time I found an essential dictionary for the languages I studied, I converted it to a Unicode OCR scan in PDF format (i.e., converting images of pages to text). I later used that data to create dictionary content files that would work together with the Mac OS dictionary application. I followed this process with several dictionaries that I found essential while I studied Kazakh, Uzbek, and Uyghur.

This process was particularly useful for the Schwarz Uyghur dictionary. I could not get used to the alphabetical order that he favored (which was different from both typical Latin order and Uyghur Arabic script order). As a result, any lookup would take forever. That said, the formatting of each page was quite pleasant, and there were some nice illustrations of plants of traditional Uyghur medicine as well as handy keys at the bottom of each page to explain abbreviations.


Phonetic annotations as a welcome aid for learning how to read and write Sinographs

In several recent posts, we've been discussing the most efficient, least painful way to acquire facility with hanzi / kanji / hanja 漢字 ("Sinographs; Chinese characters").  Lord knows there are endless numbers of them and they are so intricately constructed that it is an arduous task to master the two thousand or so that are necessary for basic literacy.

It would be so much easier to learn the Sinographs if language pedagogues would provide phonetic annotations for each character.  Better yet, the phonetic annotations should be divided into words with spaces between them according to the official orthographic rules.


Korean inputting on cellphones

For the first time in my life, I closely observed someone inputting Korean on a cell phone.  (I was sitting behind the person doing it on the train ride to the city this afternoon.)  Of course, I don't know exactly how it works, but what I observed was very interesting.

First of all, the young woman's phone had a special feature I've never seen in any other type of inputting.  Namely, she could use a little, built-in, popup, electronic magnifying glass to hover over a particular syllable block that she had composed to inspect it carefully to see that she had formed it correctly.  She did this fairly often.

Next, she seemed to spend a lot of time typing and retyping individual syllable blocks to make sure she got them right.


Automated transcription-cum-translation

Marc Sarrel received the following message on his voicemail:


Chinese translation app with built-in censorship

What good is a translation app that automatically censors politically sensitive terms?  Well, a leading Chinese translation app is now doing exactly that.

"A Chinese translation app is censoring politically sensitive terms, report says", Zoey Chong, CNET (11/27/18)

iFlytek, a voice recognition technology provider in China, has begun censoring politically sensitive terms from its translation app, South China Morning Post reported citing a tweet by Jane Manchun Wong. Wong is a software engineer who tweets frequently about hidden features she uncovers by performing app reverse-engineering.

In the tweet, Wong shows that when she tried to translate certain phrases such as "Taiwan independence," "Tiananmen square" and "Tiananmen square massacre" from English to Chinese, the system failed to churn out results for sensitive terms or names. The same happened when she tried to translate "Taiwan independence" from Chinese to English — results showed up as an asterisk.


Idiosyncratic stroke order


I pressed the "correct" button three times and the ATM ate my card

That's what happened to Paul Midler when confronted with this display on an ATM in China:


Words in Vietnamese

In "Diacriticless Vietnamese on a sign in San Francisco" (9/30/18), we discussed the advisability of joining syllables into words or separating all syllables.  The ensuing string of comments revealed that there is a correlation between linking syllables and word spacing on the one hand and the necessity for diacritical marks on the other hand.

This prompted me to ask the following questions of several colleagues who are specialists on Vietnamese:

Roughly what percentage of Vietnamese lexemes (words) are monosyllabic? Disyllabic? Any trisyllabic or higher?

The average length of a word in Mandarin is almost exactly two syllables.

Can you think of examples in Vietnamese parsing where it would be clearer or more helpful to have the syllables of words joined together?
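The first question, about the share of words by syllable count, could be approached along the lines of the sketch below. It assumes a word list in which the syllables of each word are separated by spaces, as in standard Vietnamese orthography; the four-word sample and the function name `syllable_distribution` are illustrative only and say nothing about the real proportions in the Vietnamese lexicon.

```python
from collections import Counter

def syllable_distribution(words):
    """Map syllable count -> percentage of words with that many syllables.

    Assumes each entry is one word whose syllables are space-separated.
    """
    counts = Counter(len(word.split()) for word in words)
    total = sum(counts.values())
    return {n: 100 * c / total for n, c in sorted(counts.items())}

# Hypothetical toy sample, not a representative word list.
sample = ["nhà", "sinh viên", "Việt Nam", "câu lạc bộ"]
print(syllable_distribution(sample))
```

A real answer would need a proper lexicon with word boundaries already marked, which is exactly the information that syllable-by-syllable spacing leaves out.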
