Language Log

Outsiders and hard drives

July 4, 2015 @ 3:17 pm· Filed by Victor Mair under Language and computers, Language and society, Puns

It's a bit of a mystery how and why "outsiders" (wàidìrén 外地人) are referred to by Shanghainese as "hard disks / drives" (yìngpán 硬盘).

Intrigued, I asked around, and here are some of the replies I received.


Chinese Telegraph Code (CTC)

May 24, 2015 @ 8:36 pm· Filed by Victor Mair under Changing times, Information technology, Language and computers, Writing systems

Michael Rank has an interesting article on Scribd entitled "Chinese telegram, 1978" (5/22/2015).

It's about a 1978 telegram that he bought on eBay. Here's a photograph:


7,530,000 mainlanders petition Taiwan actress to change her name

May 14, 2015 @ 8:27 am· Filed by Victor Mair under Language and computers, Names, Writing systems

From David Moser:

7.5 mil. #China netizens don't recognize character in Taiwanese actress' name, signed petition to make her change it pic.twitter.com/3KodhcHG2i

— Chris Derps (@ChrisDerps) May 11, 2015


Paperless reading

April 12, 2015 @ 12:02 pm· Filed by Victor Mair under Dictionaries, Information technology, Language acquisition, Language and computers, Language and education, Language and technology, Language teaching and learning, Pedagogy

Just a little over a year ago, I made the following post:

"The future of Chinese language learning is now" (4/5/14)

The second half of that post consisted of an account of a lecture that David Moser (of Beijing Capital Normal University and Academic Director of Chinese Studies at CET Beijing) had delivered a few days earlier (on 4/1/14) at Penn: "Is Character Writing Still a Basic Skill? The New Digital Chinese Tools and their Implications for Chinese Learning".


Autocomplete strikes again

March 3, 2015 @ 8:43 am· Filed by Geoffrey K. Pullum under Awesomeness, Changing times, Errors, Language and computers, Language and sports, Language and technology, Orthography, Silliness, Spelling

I think I know how an unsuitable but immensely rich desert peninsula got chosen by FIFA (the international governing body for major soccer tournaments) to host the soccer World Cup in 2022.

First, a personal anecdote that triggered my hypothesis about the decision. I recently sent a text message from my smartphone and then carelessly slipped it into my pocket without making sure it had gone to sleep.


Cantonese input methods

January 20, 2015 @ 6:24 pm· Filed by Victor Mair under Language and computers, Topolects, Writing systems

Despite the efforts of the central government to clamp down on and diminish the role of Cantonese in education and in public life generally, the language has been experiencing a heady resurgence, especially in connection with the prolonged Umbrella Movement last fall.

"Cantonese resurgent" (12/11/12)

"Here’s why the name of Hong Kong’s 'Umbrella Movement' is so subversive" (10/23/14)

"Translating the Umbrella Revolution" (10/3/14)

"Cantonese protest slogans" (10/26/14), etc.


Zhou Youguang, 109 and going strong

January 13, 2015 @ 1:50 pm· Filed by Victor Mair under Announcements, Language and computers, Language reform, Transcription

A year ago, I wrote "Zhou Youguang, Father of Pinyin" (1/14/14) to celebrate Zhou xiansheng's 108th birthday and his many accomplishments in language reform and applied linguistics. Included in that post were a portrait of ZYG in his study and numerous links concerning the man and his works.


Stylometric analysis of the Sony Hacking

January 10, 2015 @ 10:59 pm· Filed by Victor Mair under Language and computers, Language and the media, Language and the movies, Linguistics in the news

The question of who was behind the hacking of Sony peaked a couple of weeks ago, but it is still a live issue. The United States government insists that it was the North Koreans who did it:

"Chief Says FBI Has No Doubt That North Korea Attacked Sony" (New York Times, January 8, 2015)

James B. Comey, director of the Federal Bureau of Investigation, said on Wednesday that no one should doubt that the North Korean government was behind the destructive attack on Sony’s computer network last fall.


Kazakh

December 12, 2014 @ 8:08 pm· Filed by Victor Mair under Language and computers, Translation

Google Translate just keeps getting bigger and bigger and better and better. As of today, it now includes Kazakh. And here's the first word that I typed in Google Translate + Kazakh:

Қазақ


Tim Cook, Bent Man

October 31, 2014 @ 12:23 pm· Filed by Victor Mair under Language and computers, Language and gender

Last week, China was gaga over Facebook chairman Mark Zuckerberg for gamely, if somewhat lamely, speaking Mandarin before an audience of Tsinghua University students:

"Zuckerberg's Mandarin" (10/23/14)

In the days following his sensational performance at Tsinghua, while not universally showered with adulation (and Facebook is still blocked in China), Zuckerberg was generally acclaimed for his gutsy, good-natured effort to speak to Chinese people in their own language.

In stark contrast, poor Tim Cook (Apple CEO) was mocked by the Chinese netizenry for his declaration in Bloomberg Businessweek: "So let me be clear: I’m proud to be gay…."

"Tim Cook Speaks Up" (10/30/14)

The resultant hullabaloo on the Chinese internet was instantaneous:

"Tim Cook Coming Out Has Turned China Into a Nation of 5th-Graders: Despite the Apple CEO's good intentions, Chinese netizens can't seem to stop mocking iPhones for being gay." (10/30/2014)


The paucity of two-letter words

September 3, 2014 @ 10:25 am· Filed by Geoffrey K. Pullum under Language and computers

The number of possible two-letter lower-case strings over the English alphabet (not including the apostrophe) is 26² = 676. This morning I ran a script to test which two-letter sequences show up as words in the standard 25,143-word list of words supplied with many Unix-derived systems (usually at /usr/share/dict/words). I found that the proportion of two-letter sequences that are two-letter words is roughly 9 percent (59/676 ≈ 0.09). That is, more than 90 percent of the logically possible two-letter combinations from aa to zz do not occur as spellings of common English words.

You might think a lot of the explanation lies in phonetics: vowelless combinations like pq or bn are unpronounceable. But I then did the same thing for two-letter standard Unix commands: bc (basic calculator), cp (copy files), ls (list files), mv (move or rename files), etc. These arbitrarily adopted program names do not have to be pronounceable, and usually aren't. And I found that the proportion of two-letter sequences that are Unix commands (more precisely, two-letter commands that have manual entries on Apple OS X version 10.6.8) is almost exactly the same (62/676 ≈ 0.09).

Why? Could it be that some kind of natural law discourages packing too many meanings into character strings (or phoneme sequences) of a given length, because it is likely to give rise to confusion or mnemonic problems? Does every language waste (as it were) at least 90 percent of the space available in the length-N sequences of letters or sounds that it uses, possibly for every N > 1?
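The post does not show the script itself, so what follows is only a minimal Python sketch of how such a count might be done, not the author's method. The function names (`two_letter_names`, `proportion`, `load_lines`, `names_in_dirs`) are hypothetical. Scanning the directories on `PATH` is an assumption that only approximates "commands that have manual entries", and the totals will vary with the word list and system used.

```python
import os
import re
from pathlib import Path

# Exactly two lower-case ASCII letters: no capitals, digits, or apostrophes.
TWO_LOWER = re.compile(r"[a-z]{2}")


def two_letter_names(names):
    """Return the sorted, de-duplicated two-letter lower-case strings in names."""
    stripped = (name.strip() for name in names)
    return sorted({name for name in stripped if TWO_LOWER.fullmatch(name)})


def proportion(count, alphabet_size=26):
    """Share of the alphabet_size**2 possible two-letter strings that occur."""
    return count / alphabet_size ** 2


def load_lines(path="/usr/share/dict/words"):
    """Read a word list, one entry per line; return [] if the file is absent."""
    p = Path(path)
    if not p.is_file():
        return []
    return p.read_text(encoding="utf-8", errors="replace").splitlines()


def names_in_dirs(dirs):
    """List file names found in the given directories.

    Assumption: this stands in for "two-letter commands"; it does not check
    for manual entries or for executability.
    """
    found = []
    for d in dirs:
        if os.path.isdir(d):
            found.extend(os.listdir(d))
    return found


if __name__ == "__main__":
    words = two_letter_names(load_lines())
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    commands = two_letter_names(names_in_dirs(path_dirs))
    print(f"two-letter words:    {len(words)} / 676 = {proportion(len(words)):.3f}")
    print(f"two-letter commands: {len(commands)} / 676 = {proportion(len(commands)):.3f}")
```

With the figures quoted in the post, `proportion(59)` is about 0.087 and `proportion(62)` is about 0.092, so both round to the 9 percent cited.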


Stray Chinese characters in English language documents

August 22, 2014 @ 9:33 pm· Filed by Victor Mair under Language and computers, Writing systems

Lawrence Evalyn wrote to me saying that he received the official communication below about a new student card that is being issued by his university. He was perplexed by all the Chinese characters that got inserted in the text. They seem to appear consistently in certain places and for certain letters. [N.B.: The communication has been anonymized for posting on Language Log.]


Is the Urdu script on the verge of dying?

June 29, 2014 @ 3:20 am· Filed by Victor Mair under Diglossia and digraphia, Language and computers, Language on the internets, Writing

Hindi-Urdu, also referred to as Hindustani, is the classic case of digraphia, so much so that there has been a long-standing controversy over whether Hindi and Urdu are one language or two. Their colloquial spoken forms are nearly identical, but when written down, the one in the Devanāgarī script, the other in the Nastaʿlīq script, they have a very different look and "feel".


Archive for Language and computers
