Archive for Language and computers

Odevity or parity

[This is a guest post by Jeffrey Shallit]

A Chinese student here at Waterloo used the term "odevity" for what English-speaking computer scientists typically call "parity" — the property of an integer being odd or even.

I had never heard this term before, so I used Google Scholar to see where it is being used.  It is used almost exclusively by Chinese engineers, mathematicians, and computer scientists.  The earliest use I was able to find with Google Book Search dates from 1972.


Digitizing specialized language dictionaries

[The following is a guest post by David Dettmann.  The "Schwarz Uyghur dictionary" to which he refers in the third paragraph is this:  Henry G. Schwarz, An Uyghur-English dictionary (Bellingham, Washington:  Center for East Asian Studies, Western Washington University, 1992).]


It is a bit of a nerdy obsession of mine to customize my computers to comfortably use languages that I've studied.

About 10 years ago, I became relatively proficient with optical character recognition (OCR) software and scanner hardware. Any time I found an essential dictionary for one of the languages I studied, I converted it to a Unicode OCR scan in PDF format (i.e., I converted images of pages to text). I later used that data to create dictionary content files that would work with the Mac OS dictionary application. I went through this process with several dictionaries that I relied on while I studied Kazakh, Uzbek, and Uyghur.

This process was particularly useful to me with the Schwarz Uyghur dictionary. I could not get used to the alphabetical order that he favored (which was different from both the typical Latin order and the Uyghur Arabic script order). As a result, any lookup would take forever. That said, the formatting of each page was quite pleasant, and there were some nice illustrations of plants used in traditional Uyghur medicine, as well as handy keys at the bottom of each page to explain abbreviations.


Phonetic annotations as a welcome aid for learning how to read and write Sinographs

In several recent posts, we've been discussing the most efficient, least painful way to acquire facility with hanzi / kanji / hanja 漢字 ("Sinographs; Chinese characters").  Lord knows there are endless numbers of them and they are so intricately constructed that it is an arduous task to master the two thousand or so that are necessary for basic literacy.

It would be so much easier to learn the Sinographs if language pedagogues would provide phonetic annotations for each character.  Better yet, the phonetic annotations should be divided into words with spaces between them according to the official orthographic rules.


Korean inputting on cellphones

For the first time in my life, I closely observed someone inputting Korean on a cell phone.  (I was sitting behind the person doing it on the train ride to the city this afternoon.)  Of course, I don't know exactly how it works, but what I observed was very interesting.

First of all, the young woman's phone had a special feature I've never seen in any other type of inputting.  Namely, she could use a little, built-in, popup, electronic magnifying glass to hover over a particular syllable block that she had composed to inspect it carefully to see that she had formed it correctly.  She did this fairly often.

Next, she seemed to spend a lot of time typing and retyping individual syllable blocks to make sure she got them right.


Automated transcription-cum-translation

Marc Sarrel received the following message on his voicemail:


Chinese translation app with built-in censorship

What good is a translation app that automatically censors politically sensitive terms?  Well, a leading Chinese translation app is now doing exactly that.

"A Chinese translation app is censoring politically sensitive terms, report says", Zoey Chong, CNET (11/27/18)

iFlytek, a voice recognition technology provider in China, has begun censoring politically sensitive terms from its translation app, South China Morning Post reported citing a tweet by Jane Manchun Wong. Wong is a software engineer who tweets frequently about hidden features she uncovers by performing app reverse-engineering.

In the tweet, Wong shows that when she tried to translate certain phrases such as "Taiwan independence," "Tiananmen square" and "Tiananmen square massacre" from English to Chinese, the system failed to churn out results for sensitive terms or names. The same happened when she tried to translate "Taiwan independence" from Chinese to English — results showed up as an asterisk.


Idiosyncratic stroke order


I pressed the "correct" button three times and the ATM ate my card

That's what happened to Paul Midler when confronted with this display on an ATM in China:


Words in Vietnamese

In "Diacriticless Vietnamese on a sign in San Francisco" (9/30/18), we discussed the advisability of joining syllables into words or separating all syllables.  The ensuing string of comments revealed that there is a correlation between linking syllables and word spacing on the one hand and the necessity for diacritical marks on the other hand.

This prompted me to ask the following questions of several colleagues who are specialists on Vietnamese:

Roughly what percentage of Vietnamese lexemes (words) are monosyllabic? Disyllabic? Any trisyllabic or higher?

The average length of a word in Mandarin is almost exactly two syllables.

Can you think of examples in Vietnamese parsing where it would be clearer or more helpful to have the syllables of words joined together?


The growing impact of "biaoqing" ("expressions") on the internet in China

Gabriele de Seta has a serious, scholarly article on "Biaoqing: The circulation of emoticons, emoji, stickers, and custom images on Chinese digital media platforms" in First Monday, Volume 23, Number 9 – 3 September 2018.  Here's the abstract:

The Mandarin Chinese term biaoqing, or ‘expression’, categorizes genres of visual content ranging from emoticons and emoji to stickers and custom images. This article is grounded on ethnographic research and approaches biaoqing in terms of their circulation across Chinese digital media platforms. By formulating a comprehensive typology of biaoqing genres, I foreground the situated socio-technical specificities of their circulation: the creative play with typographical compositions, the affective repurposing of graphical emoticons, the platformed monetization of proprietary stickers, and the user-driven proliferation of custom images. Drawing on this typology, I argue for the need to recognize the circulation of biaoqing as an emergent and malleable category of semiotic resources profoundly shaped by two decades of development of the Internet in China.


Spectral Sinographs


Opening and closing necrophilia


Fub

The University of Pennsylvania is instituting a Two-Step Verification for PennKey WebLogins. Up till now, our PennKey for login consisted of a Username and Password. After much effort and practice, I finally mastered that. Now, however, for the sake of greater security, after using our PennKey to log in, we will in addition be asked to go through a second step that requires us to enter a randomly generated number that will be sent to us via cell phone.

That really freaked me out, since I don't have a cell phone.
