Archive for Information technology

Stupid FBI threat scam email

I recently heard of another friend-of-a-friend case in which people were taken in by one of the false email help-I'm-stranded scams, and actually sent money overseas in what they thought was a rescue for a relative who had been mugged in Spain. People really do respond to these scam emails, and they lose money, bigtime. Today I received the first Nigerian spam I have seen in which I am (purportedly) threatened by the FBI and Patriot Act government if I don't get in touch and hand over personal details that will permit the FBI to release my $3,500,000.

I wish there were more that people with basic common sense could do to spread the word about scam detection to those who are somewhat lacking in it. The best I have been able to do is to write occasional Language Log posts pointing out the almost unbelievable degree of grammatical and orthographic incompetence in most scam emails. Sure, everyone makes the odd spelling mistake (childrens' for children's and the like), but it is simply astonishing that literate people do not notice how implausible it is that customs officials, bank officers, or police employees would write as inarticulately as the typical scam email does.

The one I just received is almost beyond belief (though see my afterthought at the end of this post). The worst thing I can think of to do to the senders is to publish the message here on Language Log, to warn the unwary, and perhaps permit those who are interested to track the culprit down. I reproduce the full content of the message source below, with nothing expurgated except for the x-ing out of my email address and local server names. I mark in red font the major errors in grammar and punctuation, plus a few nonlinguistic suspicious features.

The language of phone numbers

The latest xkcd comic is getting at a point about syntax and semantics. I'll show you the syntax below, but as far as meaning is concerned, the point is that cell phone numbers have almost no semantics. The area code (the first three digits) used to function as a locational marker when phones were fixed in houses. But Americans tend to move every three years or so, they now take their phone numbers with them when they go, and cell phones only began to become nearly universal in America five to ten years ago; so the area code now mostly reflects a former abode. My cool son Calvin, for example, has a number which implies that he lives in Oakland, California; he doesn't, he does his video game programming in the Pacific Northwest.

And the rest of the number, the other seven digits? There is space enough there for some real personal information, but it is not used that way. The seven digits function merely as arbitrary material to distinguish one cell phone's location point in the information universe from all the others.
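
As a rough illustration of that syntax, here is a minimal Python sketch of the simplified North American Numbering Plan shape NXX-NXX-XXXX, where N is a digit from 2 to 9 and X is any digit: a three-digit area code, a three-digit exchange, and a four-digit line number. The names `parse` and `PhoneNumber` are hypothetical, invented for this example, and the real numbering rules reserve further patterns (such as N11 codes like 911) that this check ignores.

```python
import re
from typing import NamedTuple, Optional

# Simplified NANP shape: area code NXX, exchange NXX, line number XXXX,
# with N = 2-9 and X = 0-9. Real rules reserve more patterns than this.
NANP_PATTERN = re.compile(r"^([2-9][0-9]{2})([2-9][0-9]{2})([0-9]{4})$")


class PhoneNumber(NamedTuple):
    area_code: str  # once a rough location marker, now often a former abode
    exchange: str   # central-office code
    line: str       # the subscriber's own four digits


def parse(number: str) -> Optional[PhoneNumber]:
    """Split a North American number into its parts, or return None."""
    digits = re.sub(r"[^0-9]", "", number)  # drop spaces, dashes, parentheses, '+'
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the leading country code 1
    match = NANP_PATTERN.match(digits)
    return PhoneNumber(*match.groups()) if match else None


if __name__ == "__main__":
    print(parse("(510) 555-0123"))
    print(parse("123-456-7890"))  # area code may not start with 1
```

Note that nothing in the parse tells you where the owner actually lives: the area code is just a string, which is exactly the point about missing semantics.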

Noisily channeling Claude Shannon

There's a passage in James Gleick's "Auto Crrect Ths!", NYT 8/4/2012, that's properly spelled but in need of some content correction:

If you type “kofee” into a search box, Google would like to save a few milliseconds by guessing whether you’ve misspelled the caffeinated beverage or the former United Nations secretary-general. It uses a probabilistic algorithm with roots in work done at AT&T Bell Laboratories in the early 1990s. The probabilities are based on a “noisy channel” model, a fundamental concept of information theory. The model envisions a message source — an idealized user with clear intentions — passing through a noisy channel that introduces typos by omitting letters, reversing letters or inserting letters.

“We’re trying to find the most likely intended word, given the word that we see,” Mr. [Mark] Paskin says. “Coffee” is a fairly common word, so with the vast corpus of text the algorithm can assign it a far higher probability than “Kofi.” On the other hand, the data show that spelling “coffee” with a K is a relatively low-probability error. The algorithm combines these probabilities. It also learns from experience and gathers further clues from the context.

The same probabilistic model is powering advances in translation and speech recognition, comparable problems in artificial intelligence. In a way, to achieve anything like perfection in one of these areas would mean solving them all; it would require a complete model of human language. But perfection will surely be impossible. We’re individuals. We’re fickle; we make up words and acronyms on the fly, and sometimes we scarcely even know what we’re trying to say.
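
The combination Paskin describes amounts to picking the intended word w that maximizes P(w) × P(typed | w): how common the word is, times how likely that particular misspelling of it is. The following Python sketch illustrates only that combination, under loudly stated assumptions: the word counts and the per-edit error probability are invented, and plain Levenshtein edit distance stands in for a real error model (which would weight specific errors, such as c typed as k, by how often they are actually observed). It is not Google's algorithm.

```python
import math

# Hypothetical word counts standing in for "the vast corpus of text" (invented numbers).
WORD_COUNTS = {"coffee": 50_000, "kofi": 2_000, "toffee": 3_000, "coffer": 1_000}
TOTAL = sum(WORD_COUNTS.values())

# Hypothetical probability of each single edit (insertion, deletion, substitution).
EDIT_PROB = 0.01


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def log_prior(word: str) -> float:
    """Source model: log P(word), from the toy corpus counts."""
    return math.log(WORD_COUNTS[word] / TOTAL)


def log_channel(typed: str, intended: str) -> float:
    """Crude channel model: each edit multiplies the probability by EDIT_PROB."""
    return edit_distance(typed, intended) * math.log(EDIT_PROB)


def correct(typed: str) -> str:
    """Return the candidate maximizing log P(w) + log P(typed | w)."""
    typed = typed.lower()
    return max(WORD_COUNTS, key=lambda w: log_prior(w) + log_channel(typed, w))


if __name__ == "__main__":
    print(correct("kofee"))
    print(correct("Kofi"))
```

With these made-up numbers, "kofee" is two edits away from both "coffee" and "kofi", so the channel scores tie and the much larger prior for "coffee" decides; an exactly typed "Kofi" keeps its zero-edit advantage and stays "kofi".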

Passport pickup by pinyin

Yesterday I went to the Beijing Public Security Bureau (Gōng'ān jú 公安局) to renew my visa. While waiting in the main hall for my number to be called, I had ample time to walk around and familiarize myself with the operations there. One thing in particular piqued my curiosity: four gray metal cabinets full of hundreds of passports (three cabinets for Chinese citizens, one for foreigners) waiting to be picked up.

I watched a clerk filing passports into the slots on the mechanized, revolving shelves inside the cabinets. Wondering how they could be readily retrieved when called for, I asked the supervisor how the passports were ordered on the shelves. Her reply left me both startled and pleased.

The economics of Chinese character usage

Under the above rubric, my friend Apollo Wu sent around a note (copied below) about the economic impact of using Chinese characters in running his business. Since Apollo was for many years (from 1973 to 1998) a top translator in the Chinese Translation Service at United Nations headquarters in New York, he knows whereof he speaks. Among the interesting tidbits I heard from Apollo over the decades was that, of the official languages of the United Nations (Arabic, Mandarin Chinese, English, French, Russian, and Castilian Spanish), Chinese was by far the least efficient and most expensive to process.

Password strength

We neglected to mention this while the relevant cartoon was the current one at xkcd, but a couple of days ago there was a nice analysis of why, through 20 years of effort, we've successfully trained everyone to use passwords that are hard for humans to remember but easy for computers to guess. Check it out. The observation seems correct: if you try one of the web interfaces that assess the strength of a password as you choose it, you'll find that a word with a few letters replaced by miscellaneous digits and symbols, like Ne8r@$k@, gets high marks, but grizzle snip grunt mackerel doesn't (and probably won't be accepted beyond the first 8 to 12 characters). Yet if you mutter "grizzle snip grunt mackerel" under your breath once, you'll find you remember it all day, even without using it. And length is your main security. The cartoon's example contrasts a 3-day brute-force cracking time (for about 28 bits of entropy) with a 550-year time (for about 44 bits).
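
The cartoon's figures follow from simple arithmetic: n bits of entropy means 2^n equally likely candidates, and trying them all at r guesses per second takes 2^n / r seconds. This Python sketch assumes an attacker making 1,000 guesses per second and a passphrase of four words drawn at random from a 2,048-word list (11 bits per word); both numbers are assumptions here, chosen because they reproduce the cartoon's results. The entropy estimate also only holds if the words really are picked at random, not made up by hand.

```python
import math

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY
GUESSES_PER_SECOND = 1_000  # assumed attack rate, not a measured figure


def passphrase_bits(num_words: int, list_size: int) -> float:
    """Entropy of num_words words chosen uniformly at random from a list."""
    return num_words * math.log2(list_size)


def worst_case_seconds(bits: float, rate: float = GUESSES_PER_SECOND) -> float:
    """Time to try every one of 2**bits candidates at `rate` guesses per second."""
    return 2 ** bits / rate


if __name__ == "__main__":
    print(f"28 bits: {worst_case_seconds(28) / SECONDS_PER_DAY:.1f} days")
    bits = passphrase_bits(4, 2048)
    print(f"{bits:.0f} bits: {worst_case_seconds(bits) / SECONDS_PER_YEAR:.0f} years")
```

Run as a script, this prints about 3.1 days for 28 bits and about 557 years for 44 bits, in line with the cartoon's rounded 3 days and 550 years. Each extra random word multiplies the attacker's work by 2,048, which is why length is the main security.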

[Comments are closed unless you have a password. If you have forgotten your password, click here.]

Edinburgh, Taiwan (Province of China)

I got a royalty check from Chicago today, and I stared in astonishment at the home address on the payment advice. It was roughly correct in the first four lines, but the last line, after "EDINBURGH EH3 6RY", where the country name "United Kingdom" should have come, said "TAIWAN, PROVINCE OF CHINA".

One apostrophe short of a good hoax

The LulzSec hackers who broke into the computer systems of The Sun by exploiting a weakness in a mailback page on an outdated Solaris server really can program; they would never expect a script to work with a misspelled variable name, or a closing single quote omitted. But spell English correctly? They couldn't even write a simple four-word headline without a tell-tale error:

They meant media mogul's body. A nice spoof front page ruined by a failure to recall that genitive singular nouns are spelled with ’s in English. The curse of the forgotten letter strikes again.

Cursive and Characters: Dying Arts

In "The Case for Cursive" (NYT, April 28, 2011), Katie Zezima writes:

For centuries, cursive handwriting has been an art. To a growing number of young people, it is a mystery.

The sinuous letters of the cursive alphabet, swirled on countless love letters, credit card slips and banners above elementary school chalk boards are going the way of the quill and inkwell. With computer keyboards and smartphones increasingly occupying young fingers, the gradual death of the fancier ABC’s is revealing some unforeseen challenges.
