Archive for Language and technology

Comprehend this!

Perhaps the most illiterate phishing spam yet: ignoring the incompetence of having Velez Restrepo as the sender, jg_van88 (at a Chinese address) as the reply-to, and Mr(.) John Galvan as the alleged sender, with the X-Accept-Language set to Spanish, this message has at least 20 linguistic errors in the text, which is roughly one for every four words.

From gvelez@une.net.co
Wed Dec 15 11:11:57 2010
Date: Wed, 15 Dec 2010 03:11:43 -0800
From: velez restrepo guillermo <gvelez@une.net.co>
Subject: Comprehend This Proposal
Bcc:
Reply-to: jg_van88@w.cn
X-Mailer: Sun Java(tm) System Messenger Express 7.3-11.01 64bit (built Sep 1 2009)
X-Accept-Language: es
Priority: normal

Good day,

I am Mr John Galvan a staff of a private offshore AIG Private bank united kingdom.

I have a great proposal that we interest and benefit you, this proposal of mine is worth of £15,500,000.00 Million Pounds.I intend to give Four thy Percent of the total funds as compensation for your assistance. I will notify you on the full transaction on receipt of your response if interested, and I shall send you the details.

Kind Regards,
Mr. John Galvan

Read the rest of this entry »

Comments (72)

Enforced francophony from Microsoft

Microsoft Word has really done it to me this time. I need some expert help, Language Log readers. I have a perfectly ordinary file (a simple letter template showing my home address), created in Word on an American Macintosh Powerbook using an American-purchased copy of Word, and when I open it as a copy on my UK-purchased MacBook Pro (though not when I open it as the original) almost everything works except that the file is deranged, and thinks it is supposed to be in French.

Editing the file provokes enforcement of French spacing conventions (colons and semicolons are preceded by an extra inserted space that I do not type); the double quotation symbols (‘‘like this’’) appear as those funny French marks that look a bit like pairs of less-thans and greater-thans (sort of <<like this>>); and, weirdest of all, the spelling and checking of "grammaire et style" turn into French. Word works through the file checking every significant English word and rejecting it for insufficient francophonicity (with no suggestions for respelling), underlining them all in red, though most French words are accepted. The grammar check not only assumes that French is being checked but also reports its results and queries in French. Saving the file preserves the pseudo-Frenchness.

Read the rest of this entry »

Comments (89)

The protective bloom of ignorance

I have often stressed the point to my students: it is not your ignorance that interferes with your education in this subject; it's the very opposite. It's the fact that you are a highly intelligent human being and you know many things deeply and thoroughly that can prevent your learning. Of the things I teach, it is in phonetics that this comes out most vividly: the reason you can't learn to hear and produce the difference between Hindi dental [t] and retroflex [ʈ], I tell them, is not that you are no good at this practical phonetics stuff, but that you have had twenty years of training in ignoring this contrast (so as to become an expert speaker of English or some other language), and you have done brilliantly at it. Well, there was an echo of the same line that popped up today in some news about the phishing industry. Dr Emily Finch, a University of Surrey criminologist, said:

The general public is more internet security-aware than it was five years ago. Malicious anti-virus scams are an indication that criminals are now tapping into this.

Rather than exploiting our ignorance – the basic premise of common scams such as phishing – they are actively using our knowledge and fear of online threats to their advantage.

Read the rest of this entry »

Comments off

Is "Character Amnesia" Here to Stay?

A little over a month ago, I wrote a blog about what I called "Character Amnesia." Today, half a dozen readers have called my attention to an Aug. 25th article by Judith Evans for Agence France-Presse entitled "Wired youth forget how to write in China and Japan" (and other titles) that refers to "character amnesia" and quotes from an interview with me on August 9.  The article is also being sent around on Facebook and other sharing services, so it is getting a lot of coverage.  I cannot guarantee that I coined the expression "character amnesia," but it does seem to be meeting a need.

Read the rest of this entry »

Comments (13)

وزارة-الأتصالات.مصر leads the non-Latin charge

The first Internet domain names using non-Latin characters are being rolled out, a plan put into motion after approval from the Internet Corporation for Assigned Names and Numbers (ICANN). Arabic-speaking nations are the first to reap the orthographic benefits, with new country codes available for Egypt (مصر), Saudi Arabia (السعودية), and the United Arab Emirates (امارات). The Egyptian Ministry of Communications and Information Technology, previously online at <http://www.mcit.gov.eg/>, is blazing the trail with its new URL:

<وزارة-الأتصالات.مصر>

Not everything is fully worked out with the new system, though. Browsers that aren't caught up to speed on the non-Latin domain names will see the addresses rendered as Latinized gobbledygook. The Egyptian Communication Ministry's Arabic-script URL, for instance, currently resolves to <http://xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/>. That's not very communicative.

[Update: See the very helpful comments below for an explanation of the Latinized encoding.]
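
For readers who want to see where the Latinized form comes from, here is a minimal sketch in Python. It assumes the gobbledygook is the standard ASCII-compatible encoding of internationalized domain names (Punycode with an "xn--" prefix on each non-Latin label), and it uses Python's built-in "idna" codec, which follows the older IDNA 2003 rules. The helper names to_ascii and to_unicode are made up for this illustration and are not part of any library.

```python
# Sketch: converting an Arabic-script domain name to its ASCII ("xn--") form
# and back, using Python's built-in "idna" codec (IDNA 2003 rules).
# The helper names below are hypothetical, chosen for this illustration.

def to_ascii(domain: str) -> str:
    """Encode each dot-separated label; non-ASCII labels gain an 'xn--' prefix."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain: str) -> str:
    """Reverse the conversion, recovering the original script."""
    return ascii_domain.encode("ascii").decode("idna")


EGYPT_TLD = "مصر"                    # the new country code for Egypt
MINISTRY = "وزارة-الأتصالات.مصر"      # the ministry's Arabic-script domain

if __name__ == "__main__":
    print(to_ascii(EGYPT_TLD))        # xn--wgbh1c
    encoded = to_ascii(MINISTRY)
    print(encoded)                    # the ASCII form a browser falls back to
    print(to_unicode(encoded) == MINISTRY)
```

One detail worth noticing: Punycode copies any plain ASCII characters in a label to the front and then adds a hyphen as a delimiter, so a label that itself contains a hyphen, as the ministry's does, ends up with four hyphens in a row after "xn".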

Comments (20)

Beowulf Burlington forever

Six of us — three philosophers, two linguists, and a mathematician — were having dinner at the Café Noir in Providence last Thursday night, and when three of us decided on the excellent boeuf bourguignon, someone at the table told a story of a colleague who tried to include the phrase boeuf bourguignon in a word-processed file and found that the spell-checker recommended correcting the spelling to Beowulf Burlington.

Read the rest of this entry »

Comments (15)

Don't send me passwords

Keith Allan has bravely outed himself as editor of the journal from which I recently received a thoroughly discourteous message sequence. I thank him for responding to the discussion, and for confirming that it was not about him pressing the buttons in the wrong order. The reason his fine journal (the Australian Journal of Linguistics) sent me a message sequence I found annoying and presumptuous is the design of the stupid ScholarOne Manuscript software. Let me explain a little more about the nature of my life (perhaps my experiences will find an echo in yours), the part that involves those arbitrary strings of letters and digits we are all supposed to carry around in our heads like mental sets of keys.

Read the rest of this entry »

Comments (38)

Stupid message sequencing discourtesy

Picture this: you receive two unexpected emails from me in quick succession. The first is a boilerplate pre-packaged message informing you that I have entered your address on my website as my temporary address for two or three days later this month, and I have let my employers know that people can call me or fax me at your house. I'm a complete stranger to you, except that you know my name from Language Log; I have obtained your email address from public sources, and pre-emptively set up arrangements that assume I'll be staying with you.

The second of the two emails is personally addressed, and says that I'll be in your area later this month to give a lecture, and since I'm on a tight budget, would it be all right if I came to stay for two nights?

I take it you'd be somewhere between insulted and shocked, despite the fact that it is sort of flattering that a famous Language Log writer has singled you out as a person he would like to stay with. Well, the equivalent not only happened to me today; it happens to me every couple of months.

Read the rest of this entry »

Comments (52)

The sliced raw fish shoes it wishes

The crash-blossom-y headline that Geoff Pullum just posted about, "Google's Computer Might Betters Translation Tool," has been changed in the online edition of The New York Times to something more sensible: "Google’s Computing Power Refines Translation Tool." The headline in the print edition, says LexisNexis, is "Google Can Now Say No to 'Raw Fish Shoes,' in 52 Languages." This is a typical example of the gap between oblique print headlines and their more straightforward online equivalents designed with search engines in mind. (See the April 2006 Times article, "This Boring Headline Is Written for Google.")

Read the rest of this entry »

Comments (36)

So many languages, so much technology…

Suppose you had 100 digital recorders and 800 small languages, all in a country the size of California, but in one of the remotest parts of the planet.  What would you do?  What would it take to identify and train a small army of language workers?  How could the recordings they collect be accessible to people who don't speak the language?  My answer to this question is linked below – but spend a moment thinking how you might do this before looking.  One inspiration for this work was Mark Liberman's talk The problems of scale in language documentation at the Texas Linguistics Society meeting in 2006, in a workshop on Computational Linguistics for Less-Studied Languages.  Another inspiration was observing the enthusiasm of the remaining speakers of the Usarufa language to maintain their language (see this earlier post).  About 9 months ago, I decided to ask Olympus if they would give me 100 of their latest model digital voice recorders.  They did, and the BOLD:PNG Project starts next week.  Please sign the guestbook on that site, or post a comment here, if you'd like to encourage the speakers of these languages who are getting involved in this new project.

Comments (13)

Sarcasm punctuation mark sure to succeed:-!

Via John Gruber at Daring Fireball, I've learned that a company called Sarcasm, Inc., is marketing a "Sarcasm punctuation mark" called SarcMark, which people are supposed to use to "emphasize a sarcastic phrase, sentence or message". John Gruber's pitch-perfect assessment:

What a great idea. I'm sure it'll be a huge hit.

Read the rest of this entry »

Comments (40)

Jingle bells, pedophile

Top story of the morning in the UK for the serious language scientist must surely be the report in The Sun concerning a children's toy mouse that is supposed to sing "Jingle bells, jingle bells" but instead sings "Pedophile, pedophile". Said one appalled mother who squeezed the mouse, "Luckily my children are too young to understand." The distributor, a company called Humatt, of Ferndown in Dorset, claims that the man in China who recorded the voice for the toy "could not pronounce certain sounds." And the singing that he recorded "was then speeded up to make it higher-pitched — distorting the result further." (A good MP3 of the result can be found here.) They have recalled the toy.

Shocked listeners to BBC Radio 4 this morning heard the presenters read this story out while collapsing with laughter. Language Log is not amused. If there was ever a more serious confluence of issues in speech technology, the Chinese language, freedom of speech, taboo language, and the protection of children, I don't know when.

Read the rest of this entry »

Comments (81)

Happy Web Day!

In my latest Word Routes column on the Visual Thesaurus, I consider the enormous linguistic impact of an internal memorandum published at the European Organization for Nuclear Research (CERN) on November 12, 1990. The memo, by Tim Berners-Lee and Robert Cailliau, was entitled "WorldWideWeb: Proposal for a HyperText Project," and needless to say, we've all been webified ever since. Read all about it here.

Comments (18)