Archive for Information technology

More on Chinese telegraph codes

John McVey was rooting around in Language Log for recent posts about telegraphic codes, and stumbled upon this:

"Chinese Telegraph Code (CTC)" (5/24/15)

What we learned there is that the CTC consists of 10,000 numbers arbitrarily assigned to the same number of characters, one number per character.
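
As a rough illustration of what "one number per character" means in practice, here is a minimal Python sketch of a code book treated as a two-way lookup table. The names `CODE_BOOK`, `encode`, and `decode` are hypothetical, and the two sample entries are the values commonly cited for these characters; treat them as illustrative and check a real code book before relying on them.

```python
# Hypothetical two-entry excerpt; a full CTC code book has roughly 10,000 entries.
# The sample values are the commonly cited codes and are an assumption here.
CODE_BOOK = {
    "中": "0022",
    "文": "2429",
}

# The assignment is one number per character, so the table can be inverted.
DECODE_BOOK = {code: char for char, code in CODE_BOOK.items()}


def encode(text):
    """Turn each character into its four-digit code (KeyError if absent)."""
    return [CODE_BOOK[ch] for ch in text]


def decode(codes):
    """Turn a sequence of four-digit codes back into characters."""
    return "".join(DECODE_BOOK[code] for code in codes)


if __name__ == "__main__":
    sent = encode("中文")
    print(sent, decode(sent))
```

Because the numbers are arbitrary, nothing about a code hints at the character's sound or shape; both sender and receiver need the table.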

Read the rest of this entry »


The ultimate Chinese character input method

Never mind that it doesn't work, this is the supreme pipe dream for inputting Chinese characters on electronic communication and information processing devices.  Of the many thousands of Chinese character inputting systems (see also here and here) that have been devised, some work fairly well and some barely function at all, but this one has to take the cake for being the most ridiculous of all.  It is all the more preposterous that initially it was intended for smartwatches with their tiny glass surfaces.

The name of the system gives it away, that is, yībǐyīzì 一筆一字 ("one stroke one character").

Read the rest of this entry »



Here's another eye-opening article from Quartz:

"Stop texting right now and learn from the Chinese: there’s a better way to message" (7/02/15) by Josh Horwitz.

I missed the article when it came out back in July, and even now wouldn't have known about this new fad that is sweeping China if Kyle Wilcox hadn't called it to my attention.

What the article describes is the craze for sending short audio clips instead of text messages.

Read the rest of this entry »


Chinese Telegraph Code (CTC)

Michael Rank has an interesting article on Scribd entitled "Chinese telegram, 1978" (5/22/2015).

It's about a 1978 telegram that he bought on eBay.  Here's a photograph:

Read the rest of this entry »


Paperless reading

Just a little over a year ago, I made the following post:

"The future of Chinese language learning is now"  (4/5/14)

The second half of that post consisted of an account of a lecture that David Moser (of Beijing Capital Normal University and Academic Director of Chinese Studies at CET Beijing) had delivered a few days earlier (on 4/1/14) at Penn:  "Is Character Writing Still a Basic Skill?  The New Digital Chinese Tools and their Implications for Chinese Learning".

Read the rest of this entry »


Error-laden phishing attempts

Phishers trawling for email account names are generally smart enough to pull all sorts of programming tricks, forging headers and obtaining lists of spammable addresses and setting up arrangements to capture login names and passwords obediently typed in by the gullible; but then they give themselves away with errors of grammar and punctuation that are just too gross to be perpetrated by the authorized guys at the communications and technology services unit.

I received a phishing spam today that had no To-line at all (none of that "undisclosed recipients" stuff, and no mention of my email address in it anywhere). It looked sort of convincing in its announcement that webmail account holders would have to take certain steps to ensure the preservation of their address books after being "upgraded to a new enhanced Outlook interface". (My own university has, tragically, been induced to do an upgrade of this kind to its employee email services.) But the linguistic errors in the message begin with the 13th character in the From line (that second comma is wrong). I reproduce below the raw text of what I received, stripping out only the locally generated receipt and spam-checking headers (and by the way, this message—spam though it is—succeeded in getting a spam score of 0).

Read the rest of this entry »


Famous last words

Guest post by Karen Stollznow

In recent weeks we've been following the tragedy and mystery of Malaysia Airlines flight 370, which vanished on March 8 with 239 people on board. Less than an hour after taking off from Kuala Lumpur en route to Beijing, all communication was cut off. The plane diverted unexpectedly across the Indian Ocean and disappeared from civilian air traffic control screens. There has been much controversy surrounding the transcript of the last incoming transmission between the air traffic controller and the cockpit of the ill-fated flight.

We tend to have a morbid fascination with people's last words. We assign profound meaning and philosophical insights to the final words uttered by those who face their fate ahead of us. There are numerous books and websites that chronicle the linguistic legacies of famous people, from Douglas Fairbanks's ironic "I've never felt better" to Woodrow Wilson's courageous "I am ready" and the betrayal expressed in Julius Caesar's "Et tu, Brute?" One website maintains a database of last words from cockpit recordings, transcripts, and air traffic control tapes. These are disturbing announcements of impending doom, ranging from "Actually, these conditions don't look very good at all, do they?" through an assortment of cuss words to moving farewells like "Amy, I love you."

Read the rest of this entry »


The sparseness of linguistic data

Gary Marcus and Ernest Davis say in a New York Times piece on why we shouldn't buy all the hype about the Big Data revolution in science:

Big data is at its best when analyzing things that are extremely common, but often falls short when analyzing things that are less common. For instance, programs that use big data to deal with text, such as search engines and translation programs, often rely heavily on something called trigrams: sequences of three words in a row (like "in a row"). Reliable statistical information can be compiled about common trigrams, precisely because they appear frequently. But no existing body of data will ever be large enough to include all the trigrams that people might use, because of the continuing inventiveness of language.

To select an example more or less at random, a book review that the actor Rob Lowe recently wrote for this newspaper contained nine trigrams such as "dumbed-down escapist fare" that had never before appeared anywhere in all the petabytes of text indexed by Google. To witness the limitations that big data can have with novelty, Google-translate "dumbed-down escapist fare" into German and then back into English: out comes the incoherent "scaled-flight fare." That is a long way from what Mr. Lowe intended — and from big data's aspirations for translation.
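
To make the trigram idea in the quoted passage concrete, here is a small Python sketch that splits text into three-word sequences, counts them, and checks whether a given phrase has been seen. The toy `corpus` string and the helper names `trigrams` and `seen` are invented for illustration; real systems work from vastly larger collections, but the sparseness problem looks the same in miniature.

```python
from collections import Counter


def trigrams(text):
    """Return every run of three consecutive words as a tuple."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


# Toy corpus, made up for this example.
corpus = "three words in a row and three words in a line"
counts = Counter(trigrams(corpus))


def seen(phrase):
    """True if the three-word phrase occurs at least once in the corpus."""
    return counts[tuple(phrase.lower().split())] > 0


if __name__ == "__main__":
    print(seen("in a row"))                    # a trigram the corpus contains
    print(seen("dumbed-down escapist fare"))   # a novel trigram
```

Common trigrams accumulate counts quickly, while a novel one has a count of zero no matter how the rest of the table grows, which is the limitation the passage describes.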

Read the rest of this entry »


The future of Chinese language learning is now

When I began learning Mandarin nearly half a century ago, I knew exactly how I wanted to acquire proficiency in the language.  Nobody had to tell me how to do this; I knew it instinctively.  The main features of my desired regimen would be to:

1. pay little or no attention to memorizing characters (I would have been content with actively mastering 25 or so very high frequency characters and passively recognizing at most a hundred or so high frequency characters during the first year)

2. focus on pronunciation, vocabulary, grammar, particles, morphology, syntax, idioms, patterns, constructions, sentence structure, rhythm, prosody, and so forth — real language, not the script

3. read massive amounts of texts in Romanization and, if possible later on (after about half a year when I had the basics of the language nailed down), in character texts that would be phonetically annotated

Read the rest of this entry »


Swype and Voice Recognition for mobile device inputting

In late 2012, while visiting my son Tom in Dallas, I noticed that he was doing something very odd with his cell phone.  Most people enter text into their cell phone by pressing their thumbs (or their fingertip) on the letters of a small keyboard, whether virtual or actual.  But Tom was doing something altogether different:  he was sliding his finger over the glass surface of his phone and somehow, by so doing, he was able to enter text.  I was dumbfounded!  What amazed me most of all was how casual he was about it.  He'd be talking to me about something, then glance down at his cell phone, move his fingertip around on the glass, and — presto digito! — he'd have typed a message to someone and sent it off.

Read the rest of this entry »


Stupid FBI threat scam email

I recently heard of another friend-of-a-friend case in which people were taken in by one of the false email help-I'm-stranded scams, and actually sent money overseas in what they thought was a rescue for a relative who had been mugged in Spain. People really do respond to these scam emails, and they lose money, bigtime. Today I received the first Nigerian spam I have seen in which I am (purportedly) threatened by the FBI and Patriot Act government if I don't get in touch and hand over personal details that will permit the FBI to release my $3,500,000.

I wish there was more that people with basic common sense could do to spread the word about scamming detection to those who are somewhat lacking in it. The best I have been able to do is to write occasional Language Log posts pointing out the almost unbelievable degree of grammatical and orthographic incompetence in most scam emails. Sure, everyone makes the odd spelling mistake (childrens' for children's and the like), but it is simply astonishing that literate people do not notice the implausibility of customs officials or bank officers or police employees being as inarticulate as the typical scam email.

The one I just received is almost beyond belief (though see my afterthought at the end of this post). The worst thing I can think of to do to the senders is to publish the message here on Language Log, to warn the unwary, and perhaps permit those who are interested to track the culprit down. I reproduce the full content of the message source below, with nothing expurgated except for the x-ing out of my email address and local server names. I mark in red font the major errors in grammar and punctuation, plus a few nonlinguistic suspicious features.

Read the rest of this entry »


The language of phone numbers

The latest xkcd comic is making a point about syntax and semantics. I'll show you the syntax below, but as far as meaning is concerned, the point is that cell phone numbers have almost no semantics. The area code part (the first three digits) used to function as a locational marker when phones were in fixed locations in houses. But Americans not only tend to move every three years or so, they now also take their phone numbers with them; and since cell phone universality only really began to pick up in America five to ten years ago, the area code now tends to reflect a former abode. My cool son Calvin, for example, has a number which implies that he lives in Oakland, California; he doesn't, he does his video game programming in the Pacific Northwest.

And the rest of the number, the other seven digits? Space enough there for some real personal information, but it is not used. It functions merely as arbitrary material to distinguish one cell phone's location point in the information universe from all the others.
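
The split described above, three digits of area code followed by seven arbitrary digits, can be sketched as a small Python parser. This is only an illustration: the pattern `PHONE_RE`, the function `parse_phone`, and the accepted punctuation (parentheses, hyphens, dots, spaces in the usual 3-3-4 written grouping) are assumptions, and the sample number uses a fictional 555-01xx line.

```python
import re

# Assumed written forms: (510) 555-0123, 510-555-0123, 510.555.0123, 5105550123
PHONE_RE = re.compile(r"^\(?(\d{3})\)?[-. ]?(\d{3})[-. ]?(\d{4})$")


def parse_phone(number):
    """Split a ten-digit number into its area code and the other seven digits.

    Returns None when the text does not look like a ten-digit number.
    """
    match = PHONE_RE.match(number.strip())
    if match is None:
        return None
    area, first_three, last_four = match.groups()
    return {"area_code": area, "rest": first_three + last_four}


if __name__ == "__main__":
    print(parse_phone("(510) 555-0123"))
```

The parser recovers structure only. Nothing in the result says where the phone's owner actually lives, which is the point about semantics.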

Read the rest of this entry »


Noisily channeling Claude Shannon

There's a passage in James Gleick's "Auto Crrect Ths!", NYT 8/4/2012, that's properly spelled but in need of some content correction:

If you type “kofee” into a search box, Google would like to save a few milliseconds by guessing whether you’ve misspelled the caffeinated beverage or the former United Nations secretary-general. It uses a probabilistic algorithm with roots in work done at AT&T Bell Laboratories in the early 1990s. The probabilities are based on a “noisy channel” model, a fundamental concept of information theory. The model envisions a message source — an idealized user with clear intentions — passing through a noisy channel that introduces typos by omitting letters, reversing letters or inserting letters.

“We’re trying to find the most likely intended word, given the word that we see,” Mr. [Mark] Paskin says. “Coffee” is a fairly common word, so with the vast corpus of text the algorithm can assign it a far higher probability than “Kofi.” On the other hand, the data show that spelling “coffee” with a K is a relatively low-probability error. The algorithm combines these probabilities. It also learns from experience and gathers further clues from the context.

The same probabilistic model is powering advances in translation and speech recognition, comparable problems in artificial intelligence. In a way, to achieve anything like perfection in one of these areas would mean solving them all; it would require a complete model of human language. But perfection will surely be impossible. We’re individuals. We’re fickle; we make up words and acronyms on the fly, and sometimes we scarcely even know what we’re trying to say.
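
The step where "the algorithm combines these probabilities" can be sketched in a few lines of Python. In the usual noisy-channel formulation, each candidate word w is scored by P(w) times P(typed | w), and the highest score wins. Every number below is invented purely for illustration, and the names `PRIOR`, `CHANNEL`, and `best_correction` are hypothetical; this is not a description of any real search engine's implementation.

```python
# Invented prior probabilities: how common each intended word is.
PRIOR = {"coffee": 1e-4, "kofi": 1e-6}

# Invented channel probabilities: P(typed | intended).
# Typing "kofee" for "coffee" is assumed to be a rarer slip
# than typing "kofee" for "kofi".
CHANNEL = {
    ("kofee", "coffee"): 1e-3,
    ("kofee", "kofi"): 1e-2,
}


def best_correction(typed, candidates):
    """Score each candidate as prior * channel and return the best one."""
    scores = {
        word: PRIOR[word] * CHANNEL.get((typed, word), 0.0)
        for word in candidates
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    word, scores = best_correction("kofee", ["coffee", "kofi"])
    print(word, scores)
```

With these made-up figures, the much higher prior for "coffee" outweighs its lower channel probability, so "coffee" is chosen; different figures could tip the result the other way.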

Read the rest of this entry »
