Archive for Language and computers

Words in Vietnamese

In "Diacriticless Vietnamese on a sign in San Francisco" (9/30/18), we discussed the advisability of joining syllables into words or separating all syllables.  The ensuing string of comments revealed that there is a correlation between linking syllables and word spacing on the one hand and the necessity for diacritical marks on the other hand.

This prompted me to ask the following questions of several colleagues who are specialists on Vietnamese:

Roughly what percentage of Vietnamese lexemes (words) are monosyllabic? Disyllabic? Any trisyllabic or higher?

The average length of a word in Mandarin is almost exactly two syllables.

Can you think of examples in Vietnamese parsing where it would be clearer or more helpful to have the syllables of words joined together?


Comments (34)

The growing impact of "biaoqing" ("expressions") on the internet in China

Gabriele de Seta has a serious, scholarly article on "Biaoqing: The circulation of emoticons, emoji, stickers, and custom images on Chinese digital media platforms" in First Monday, Volume 23, Number 9 – 3 September 2018.  Here's the abstract:

The Mandarin Chinese term biaoqing, or ‘expression’, categorizes genres of visual content ranging from emoticons and emoji to stickers and custom images. This article is grounded on ethnographic research and approaches biaoqing in terms of their circulation across Chinese digital media platforms. By formulating a comprehensive typology of biaoqing genres, I foreground the situated socio-technical specificities of their circulation: the creative play with typographical compositions, the affective repurposing of graphical emoticons, the platformed monetization of proprietary stickers, and the user-driven proliferation of custom images. Drawing on this typology, I argue for the need to recognize the circulation of biaoqing as an emergent and malleable category of semiotic resources profoundly shaped by two decades of development of the Internet in China.


Comments (1)

Spectral Sinographs

Comments (20)

Opening and closing necrophilia

Comments (13)

Fub

The University of Pennsylvania is instituting a Two-Step Verification for PennKey WebLogins. Up till now, our PennKey for login consisted of a Username and Password. After much effort and practice, I finally mastered that. Now, however, for the sake of greater security, after using our PennKey to log in, we will in addition be asked to go through a second step that requires us to enter a randomly generated number that will be sent to us via cell phone.

That really freaked me out, since I don't have a cell phone.


Comments (48)

Corpora and the Second Amendment: Responding to Weisberg on the meaning of "bear arms" [Updated, and updated again]

An introduction and guide to my series of posts "Corpora and the Second Amendment" is available here. The corpus data that is discussed can be downloaded here. That link will take you to a shared folder in Dropbox. Important: Use the "Download" button at the top right of the screen.

New URL for COFEA and COEME: https://lawcorpus.byu.edu.

The Originalism Blog has a guest post, by David Weisberg, taking issue with the conclusion in Dennis Baron's Washington Post op-ed that newly available evidence of historical usage shows that in District of Columbia v. Heller, Justice Scalia misinterpreted the phrase keep and bear arms. That's an issue that I wrote about yesterday ("The coming corpus-based reexamination of the Second Amendment") and that I'm going to be dealing with in a series of posts over the next several weeks.

One of Weisberg's arguments concerns a linguistic issue that I'm planning to address, and I think that Weisberg is mistaken. At the risk of getting out ahead of myself, I want to respond to Weisberg briefly now, with a more detailed explanation to come.


Comments (36)

Really weird sinographs

Scott Wilson has written an entertaining, and I dare say edifying, article on "W.T.F. Japan: Top 5 strangest kanji ever 【Weird Top Five】", SoraNews24 (10/6/16) — sorry I missed it when it first came out.  Wilson refers to the "Top 5 strangest kanji", but he actually treats nearly three times that many.  The reason he emphasizes "5" is so that he can stick with his theme of W.T.F., cf.:

Scott Wilson, "W.T.F. Japan: Top 5 most difficult kanji ever【Weird Top Five】", SoraNews24 (8/4/16)

Scott Wilson, "W.T.F. Japan: Top 5 kanji with the longest readings【Weird Top Five】", SoraNews24 (4/20/17)


Comments (18)

Kanji as commodity

On Friday, April 27, I participated in "Seeking a Future for East Asia’s Past:  A Workshop on Sinographic Sphere Studies" at Boston University.  Among the participants was Terry Kawashima who talked about the commodification and fetishization of kanji.  The following paragraphs are a revised version of a portion of her remarks:


Comments (4)

Colossal translation fail at the Boao Forum for Asia

China is currently hosting the Boao Forum for Asia in Hainan, the smallest and southernmost province of the PRC.  The BFA bills itself as the "Asian Davos", after the World Economic Forum held annually in Davos, Switzerland.  The BFA draws representatives from many countries, so naturally they have to provide translation services.  Unfortunately, the machine translation system they used this year failed miserably.  Here are screenshots of a couple of examples:


Comments (14)

The elegance of Google Translate

When I was in graduate school, some of my best friends were mathematicians.  I was always intrigued by their approach to problem solving.  They told me that merely solving problems was not satisfying to them.  Rather, their goal was to solve problems elegantly.

This morning, I was reminded of the modus operandi of mathematicians when I asked Google Translate (GT) to render a short passage of German into English.


Comments (39)

The letter * has bee* ba**ed in Chi*a

Since the announcement by the Chinese Communist Party (CCP) yesterday that the President of China would no longer be limited to two five-year terms in office, as had been the case since the days when Chairman Mao ruled, there has been much turmoil and trepidation among China watchers and Chinese citizens.  Essentially, it means that Xi Jinping has become dictator for life, which is not what people had been hoping for since Richard Nixon went to China 46 years and 5 days ago.  What everyone had expected was that China would "reform and open up" (gǎigé kāifàng 改革開放), which became an official policy as of December, 1978.  Instead, all indications from the first five years of Xi's regime and the newly announced policy changes regarding Xi Jinping thought and governance are that China has jumped right back to the 1950s in terms of policies and procedures.


Comments (34)

Shadowsocks

What prompted this post is a curiosity:  an important Chinese product, Shadowsocks, is known only by an English name, and its author, clowwindy, likewise goes only by an English name.

Shadowsocks is an open-source encrypted proxy project, widely used in mainland China to circumvent Internet censorship. It was created in 2012 by a Chinese programmer named "clowwindy", and multiple implementations of the protocol have been made available since. Typically, the client software will open a socks5 proxy on the machine it is run, which internet traffic can then be directed towards, similarly to an SSH tunnel. Unlike an SSH tunnel, shadowsocks can also proxy UDP traffic.

Source
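To make the "socks5 proxy" part of that description a little more concrete, here is a minimal sketch in Python of the first two messages a program sends to a SOCKS5 proxy, following the layout in RFC 1928. It illustrates only the local SOCKS5 side that the quoted passage mentions, not the encrypted traffic between a Shadowsocks client and its server. The function names and the example host are my own, chosen for illustration; they do not come from the Shadowsocks code.

```python
import struct

# Constants from RFC 1928 (the SOCKS5 specification).
SOCKS_VERSION = 0x05
NO_AUTH = 0x00        # "no authentication required" method
CMD_CONNECT = 0x01    # ask the proxy to open a TCP connection
ATYP_DOMAIN = 0x03    # destination is given as a domain name


def socks5_greeting() -> bytes:
    """Hypothetical helper: the greeting that offers one auth method (none)."""
    return bytes([SOCKS_VERSION, 1, NO_AUTH])


def socks5_connect_request(host: str, port: int) -> bytes:
    """Hypothetical helper: a CONNECT request for host:port by domain name."""
    name = host.encode("idna")
    if not 0 < len(name) <= 255:
        raise ValueError("host name must be 1 to 255 bytes long")
    if not 0 <= port <= 65535:
        raise ValueError("port must fit in 16 bits")
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAIN, len(name)])
    # The port is sent as two bytes in network (big-endian) order.
    return header + name + struct.pack("!H", port)


if __name__ == "__main__":
    print(socks5_greeting().hex())
    print(socks5_connect_request("example.com", 443).hex())
```

A program that wanted its traffic proxied would write these bytes to whatever local port the client software is listening on, read the proxy's replies, and then send its ordinary traffic over the same connection.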


Comments (9)

Don't blame Google Translate

Douglas Hofstadter has a critical article in the latest issue of The Atlantic (1/30/18):

"The Shallowness of Google Translate:  The program uses state-of-the-art AI techniques, but simple tests show that it's a long way from real understanding."

Hofstadter criticizes GT for not being as good as he is at translating from French, German, and Chinese into English.  I will let others respond to his critique of the French and German translations, but I will comment on his critique of the Chinese-to-English translation.


Comments (21)