Language Log

Archive for Information technology

Kindly do the needful

August 31, 2016 @ 1:53 am· Filed by Geoffrey K. Pullum under Errors, Grammar, Humor, Information technology, Language and business, Language and computers, Language and technology

A phishing spam I received today from "Europe Trade" (it claims to be in Wisconsin but its address domain is in Belarus) said this:

Good Day sir/madam,

I am forwarding the attached document to you as instructed for confirmation,

Please kindly do the needful and revert

Best regards
Sarah Griffith

There were two attachments, allegedly called "BL-document.pdf" and "Invoice.pdf"; they were identical. Their icons said they were PDF files of size 21KB (everyone trusts PDF), but viewing them in Outlook caused Word Online to open them, whereupon they claimed to be password-protected PDF files of a different size, 635KB. However, the link I was supposed to click to open them actually led to a misleadingly named HTML file, which doubtless would have sucked me down to hell or sent all my savings to Belarus or whatever. I don't know what you would have done (some folks are more gullible than others), but I decided I would not kindly do the needful, or even revert. Sorry, Sarah.


Clueless Microsoft language processing

August 30, 2016 @ 2:29 pm· Filed by Geoffrey K. Pullum under Computational linguistics, Dialects, Grammar, Information technology, Language and computers, Language and technology, Links, Logic, Semantics, Silliness, Spelling, Syntax

A rather poetic and imaginative abstract I received in my email this morning (it's about a talk on computational aids for composers) contains the following sentence:

We will metaphorically drop in on Wolfgang composing at home in the morning, at an orchestra rehearsal in the afternoon, and find him unwinding in the evening playing a spot of the new game Piano Hero which is (in my fictional narrative) all the rage in the Viennese coffee shops.

There's nothing wrong with the sentence. What makes me bring it to your notice is the extraordinary modification that my Microsoft mail system performed on it. I wonder if you can see the part of the message that it felt it should mess with, in a vain and unwanted effort at helping me do my job more efficiently?


Spamferences thrive; junk journals prosper

July 26, 2016 @ 3:28 pm· Filed by Geoffrey K. Pullum under Information technology, Language and computers, Language on the internets, The academic scene

I was recently moved (screaming and struggling, as four strong men held me down by my arms and legs) to a new web-based university email system designed and run by Microsoft: Office 365. Naturally, it's ill-designed slow-loading crap, burdened by misfeatures and pointless pop-ups that I do not want popping up, and it fails to allow various elementary operations that I often need (every upgrade is a downgrade). But that is not my topic today. I want to note one special sad consequence of moving to an entirely new system: all my previous email system's Bayesian machine learning about spam classification has been lost. The Office 365 system has had hardly any data to learn from as yet, so I am seeing some of the stuff that would have been coming to me all along if it had not been caught by machine learning and dumped in the spam bin. And what has truly amazed me is the daily flow of advertising for spamferences and junk journals.
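To make the point about lost training data concrete, here is a toy Bayesian spam filter in Python. It is a minimal textbook sketch (multinomial naive Bayes with add-one smoothing), not a description of how Office 365 or any other real mail system classifies spam. The class and method names are hypothetical, and whitespace tokenization is a simplifying assumption. A filter that has seen no messages can only return 0.5; it starts telling spam from legitimate mail only once it has examples of both.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Hypothetical toy filter: multinomial naive Bayes, add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Simplifying assumption: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, text: str, label: str) -> None:
        # label must be "spam" or "ham"; anything else raises KeyError.
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text: str) -> float:
        total = sum(self.message_counts.values())
        if total == 0:
            # A freshly installed filter has learned nothing, so it can only shrug.
            return 0.5
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior so a class with no examples yet still gets a score.
            prior = (self.message_counts[label] + 1) / (total + 2)
            score = math.log(prior)
            n_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Add-one smoothing; the extra +1 in the denominator covers unseen words.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (n_words + len(vocab) + 1))
            log_scores[label] = score
        # Convert the two log scores into a normalized probability of spam,
        # clamping the exponent to avoid overflow on long messages.
        diff = log_scores["ham"] - log_scores["spam"]
        diff = max(-700.0, min(700.0, diff))
        return 1.0 / (1.0 + math.exp(diff))
```

For example, a new filter rates "call for papers" at exactly 0.5. After one training call with a predatory-journal advertisement labeled "spam" and one with seminar notes labeled "ham", it rates "submit papers now" above 0.5 and "syntax seminar notes" below it. Real filters need far more than two messages, which is exactly why a migration that discards the old counts lets so much junk through.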


AI for youth: success and failure

March 25, 2016 @ 9:18 pm· Filed by Victor Mair under Information technology, Language and computers

Success: Xiaoice is a Microsoft chatbot program that has become popular in China. Her name is written in various ways:

"Xiaoice" 42,400 ghits (that's pronounced "xiǎo ice")
"小冰" 362,000 ghits (that's pronounced "xiǎo bīng")
"小ice" 11,200 ghits (that's pronounced "xiǎo ice")
"Little Bing" 16,000 ghits (she's obviously named after Microsoft's search engine*)
"Little Ice" for the chatbot doesn't work, because that's the name of Ice-T's son.

Not all of these ghits are to the Chinese chatbot program; some are for Facebook and Twitter monikers, etc., but most do refer to the Microsoft chatbot.


More on Chinese telegraph codes

February 5, 2016 @ 3:11 pm· Filed by Victor Mair under Information technology, Writing systems

John McVey was rooting around in Language Log for recent posts about telegraphic codes, and stumbled upon this:

"Chinese Telegraph Code (CTC)" (5/24/15)

What we learned there is that the CTC consists of 10,000 numbers arbitrarily assigned to the same number of characters, one number per character.
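Because every character gets exactly one four-digit number, encoding and decoding amount to table lookups. The Python sketch below illustrates the idea with a tiny sample table. The function and variable names are hypothetical. The four code values are the ones commonly cited for 中文信息, but they are included only for illustration and should be checked against a published CTC codebook before being relied on.

```python
# Illustrative sample only: the full codebook assigns four-digit codes
# (0000-9999) to its characters. Values below are as commonly cited;
# verify them against a published CTC codebook.
CTC_SAMPLE = {"中": "0022", "文": "2429", "信": "0207", "息": "1873"}

# Reverse table for decoding (hypothetical helper name).
DECODE = {code: char for char, code in CTC_SAMPLE.items()}


def encode(text: str) -> str:
    """Replace each character with its four-digit code, space-separated."""
    return " ".join(CTC_SAMPLE[ch] for ch in text)


def decode(message: str) -> str:
    """Split the digit stream into four-digit groups and look each one up."""
    digits = message.replace(" ", "")
    if len(digits) % 4:
        raise ValueError("CTC messages consist of whole four-digit groups")
    return "".join(DECODE[digits[i:i + 4]] for i in range(0, len(digits), 4))
```

Since every code has the same width, no separator is needed: decode("0022242902071873") recovers 中文信息 just as well as the space-separated form does.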


The ultimate Chinese character input method

December 27, 2015 @ 11:38 pm· Filed by Victor Mair under Information technology, Writing systems

Never mind that it doesn't work, this is the supreme pipe dream for inputting Chinese characters on electronic communication and information processing devices. Of the many thousands of Chinese character inputting systems (see also here and here) that have been devised, some work fairly well and some barely function at all, but this one has to take the cake for being the most ridiculous of all. It is all the more preposterous that initially it was intended for smartwatches with their tiny glass surfaces.

The name of the system gives it away, that is, yībǐyīzì 一筆一字 ("one stroke one character").


Push-to-talk

October 31, 2015 @ 12:41 am· Filed by Victor Mair under Information technology, Language and computers

Here's another eye-opening article from Quartz:

"Stop texting right now and learn from the Chinese: there’s a better way to message" (7/02/15) by Josh Horwitz.

I missed the article when it came out back in July, and even now wouldn't have known about this new fad that is sweeping China if Kyle Wilcox hadn't called it to my attention.

What the article describes is the craze for sending short audio clips instead of text messages.


Chinese Telegraph Code (CTC)

May 24, 2015 @ 8:36 pm· Filed by Victor Mair under Changing times, Information technology, Language and computers, Writing systems

Michael Rank has an interesting article on Scribd entitled "Chinese telegram, 1978" (5/22/2015).

It's about a 1978 telegram that he bought on eBay. Here's a photograph:


Paperless reading

April 12, 2015 @ 12:02 pm· Filed by Victor Mair under Dictionaries, Information technology, Language acquisition, Language and computers, Language and education, Language and technology, Language teaching and learning, Pedagogy

Just a little over a year ago, I made the following post:

"The future of Chinese language learning is now" (4/5/14)

The second half of that post consisted of an account of a lecture that David Moser (of Beijing Capital Normal University and Academic Director of Chinese Studies at CET Beijing) had delivered a few days earlier (on 4/1/14) at Penn: "Is Character Writing Still a Basic Skill? The New Digital Chinese Tools and their Implications for Chinese Learning".


Error-laden phishing attempts

January 26, 2015 @ 9:30 am· Filed by Geoffrey K. Pullum under Errors, Information technology, Language and technology, Punctuation, Writing

Phishers trawling for email account names are generally smart enough to pull all sorts of programming tricks, forging headers and obtaining lists of spammable addresses and setting up arrangements to capture login names and passwords obediently typed in by the gullible; but then they give themselves away with errors of grammar and punctuation that are just too gross to be perpetrated by the authorized guys at the communications and technology services unit.

I received a phishing spam today that had no To-line at all (none of that "undisclosed recipients" stuff, and no mention of my email address in it anywhere). It looked sort of convincing in its announcement that webmail account holders would have to take certain steps to ensure the preservation of their address books after being "upgraded to a new enhanced Outlook interface". (My own university has, tragically, been induced to do an upgrade of this kind to its employee email services.) But the linguistic errors in the message begin with the 13th character in the From line (that second comma is wrong). I reproduce below the raw text of what I received, stripping out only the locally generated receipt and spam-checking headers (and by the way, this message—spam though it is—succeeded in getting a spam score of 0).


Famous last words

April 7, 2014 @ 11:25 am· Filed by Geoffrey K. Pullum under Information technology, Language and the media, Literacy, Transcription

Guest post by Karen Stollznow

In recent weeks we've been following the tragedy and mystery of the Malaysia Airlines flight 370 that vanished on March 8 with 239 people on board. Less than an hour after the plane took off from Kuala Lumpur en route to Beijing, all communication was cut off. The plane diverted unexpectedly across the Indian Ocean and disappeared from civilian air traffic control screens. There has been much controversy surrounding the transcript of the last incoming transmission between the air traffic controller and the cockpit of the ill-fated flight.

We tend to have a morbid fascination with people's last words. We assign profound meaning and philosophical insights to the final words uttered by those who face their fate ahead of us. There are numerous books and websites that chronicle the linguistic legacies of famous people, from Douglas Fairbanks's ironic "I've never felt better" to Woodrow Wilson's courageous "I am ready" and the betrayal expressed in Julius Caesar's "Et tu, Brute?" Planecrashinfo.com maintains a database of last words from cockpit recordings, transcripts, and air traffic control tapes. These are disturbing announcements of impending doom, ranging from "Actually, these conditions don't look very good at all, do they?" through an assortment of cuss words to moving farewells like "Amy, I love you."


The sparseness of linguistic data

April 7, 2014 @ 4:42 am· Filed by Geoffrey K. Pullum under Changing times, Grammar, Information technology, Language and computers, Lost in translation, Research tools, Resources

Gary Marcus and Ernest Davis say in a New York Times piece on why we shouldn't buy all the hype about the Big Data revolution in science:

Big data is at its best when analyzing things that are extremely common, but often falls short when analyzing things that are less common. For instance, programs that use big data to deal with text, such as search engines and translation programs, often rely heavily on something called trigrams: sequences of three words in a row (like "in a row"). Reliable statistical information can be compiled about common trigrams, precisely because they appear frequently. But no existing body of data will ever be large enough to include all the trigrams that people might use, because of the continuing inventiveness of language.

To select an example more or less at random, a book review that the actor Rob Lowe recently wrote for this newspaper contained nine trigrams such as "dumbed-down escapist fare" that had never before appeared anywhere in all the petabytes of text indexed by Google. To witness the limitations that big data can have with novelty, Google-translate "dumbed-down escapist fare" into German and then back into English: out comes the incoherent "scaled-flight fare." That is a long way from what Mr. Lowe intended — and from big data's aspirations for translation.
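The trigram idea Marcus and Davis describe is easy to sketch. The Python below is a toy illustration, not how any search engine or translation system actually stores n-grams: it splits text on whitespace (a simplifying assumption), collects every run of three consecutive words, and reports which trigrams of a new sentence never occur in a corpus. The function names and the two-sentence corpus in the usage note are made up for illustration; no real corpus is that small, but the sparseness problem is the same in kind at petabyte scale.

```python
from collections import Counter


def trigrams(text: str) -> list[tuple[str, str, str]]:
    """Return every run of three consecutive words (whitespace tokenization)."""
    words = text.lower().split()
    return list(zip(words, words[1:], words[2:]))


def unseen_trigrams(corpus: list[str], sentence: str) -> list[tuple[str, str, str]]:
    """Trigrams in `sentence` that never occur anywhere in `corpus`."""
    counts = Counter(t for doc in corpus for t in trigrams(doc))
    # Counter returns 0 for missing keys, so unseen trigrams have count 0.
    return [t for t in trigrams(sentence) if counts[t] == 0]
```

With the invented corpus ["the film was standard escapist fare", "a dumbed-down version of the book"], the phrase "standard escapist fare" has no unseen trigrams, but "dumbed-down escapist fare" yields one, even though each of its words appears in the corpus. A statistical system has no counts at all for such a trigram and must fall back on guesswork.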


The future of Chinese language learning is now

April 5, 2014 @ 3:14 pm· Filed by Victor Mair under Dictionaries, Information technology, Language acquisition, Language and computers, Language and education, Language and technology, Language teaching and learning, Pedagogy

When I began learning Mandarin nearly half a century ago, I knew exactly how I wanted to acquire proficiency in the language. Nobody had to tell me how to do this; I knew it instinctively. The main features of my desired regimen would be to:

1. pay little or no attention to memorizing characters (I would have been content with actively mastering 25 or so very high-frequency characters and passively recognizing at most a hundred or so high-frequency characters during the first year)

2. focus on pronunciation, vocabulary, grammar, particles, morphology, syntax, idioms, patterns, constructions, sentence structure, rhythm, prosody, and so forth — real language, not the script

3. read massive amounts of texts in Romanization and, if possible later on (after about half a year when I had the basics of the language nailed down), in character texts that would be phonetically annotated

