Archive for Language and technology

The second life of a Language Log comment

More than four years ago, on Aug. 23, 2010, Doctor Science left the following comment on a post by Mark Liberman, "Cell phone cupertinos":

I'm pretty sure I saw something several years ago about a whole dialect (argot? jargon? slang?) that had developed among young people in Japan (or possibly some other Asian country), based on phone cupertinos. Basically, they used the first suggestion from the autocomplete function *instead* of the original target word, to create an argot that was reasonably opaque to outsiders.

Now that comment has been brought back from the dead, appearing in two different articles about autocorrect.

The state of the machine translation art

I don't know any Hebrew. So when I recently saw a comment in Hebrew on a Google Plus page of discussion about Gaza tunnel-building that I was looking at, I clicked (with some forebodings) on the "Translate" link to see what it meant. What I got was this:

Some grazing has hurt they Stands citizens Susan Hammer year

This does not even offer enough of an inkling to permit me to guess at what the writer of the original Hebrew might have been saying. It might as well have said "Grill tree ecumenical the fox Shove sample Quentin Garage plastic."

Bonfire beneficiaries

Subeditor Humphrey Evans points out to me that the grammar of phishing spam emails is getting worse and worse, rather than better. He recently saw one that contained this text:

The sum of (6.5M Euros only will be transfer into your account after the processing of all relevant legal documents with your name as the bonfire beneficiary, the transfer will be made by Draft or telegraphic Transfer (T/T), conformable in 3 working days as soon as you apply to the bank director.

That "bonfire beneficiary" bit is an eyebrow-raiser, isn't it? It seems to be an error for the Latin phrase bona fide "good faith".

PangramTweets

The Twitter API, beyond its great utility for corpus linguistics (see "On the front lines of Twitter linguistics," "The he's and she's of Twitter"), has made possible a lot of fun automated text-mining projects. One fertile area is algorithmic found poetry: there have been Twitter bots designed to find accidental haikus, and even more impressively, a bot named @Pentametron that finds rhyming tweets in iambic pentameter and fashions sonnets out of them.

And then there is found wordplay, which is its own kind of found poetry. I'm a big fan of @Anagramatron, which discovers paired tweets that form serendipitous anagrams of each other. (Example: "Last time I do anything" ⇔ "That's it. I'm dying alone.") Now, courtesy of Jesse Sheidlower, comes @PangramTweets, in which each tweet contains every letter of the alphabet at least once.
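
For readers curious how such bots might test a tweet, here is a minimal sketch of the two checks described above. It is not the actual code behind @Anagramatron or @PangramTweets; the function names are hypothetical, and it assumes that only the 26 unaccented English letters count, with case, spaces, and punctuation ignored.

```python
import string
from collections import Counter


def letters_only(text: str) -> str:
    """Lowercase the text and keep only the letters a-z (an assumption of this sketch)."""
    return "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)


def is_pangram(text: str) -> bool:
    """True if every letter of the alphabet appears at least once."""
    return set(letters_only(text)) == set(string.ascii_lowercase)


def are_anagrams(a: str, b: str) -> bool:
    """True if the two texts use exactly the same letters the same number of times.

    Texts whose letter sequences are identical are rejected, since a tweet
    paired with a copy of itself is not an interesting anagram.
    """
    la, lb = letters_only(a), letters_only(b)
    return la != lb and Counter(la) == Counter(lb)


if __name__ == "__main__":
    print(are_anagrams("Last time I do anything", "That's it. I'm dying alone."))
    print(is_pangram("The quick brown fox jumps over the lazy dog"))
```

Comparing letter counts rather than sorted strings keeps the anagram test easy to read; a real bot would also need to decide how to treat digits, emoji, and accented characters, which this sketch simply drops.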

The future of Chinese language learning is now

When I began learning Mandarin nearly half a century ago, I knew exactly how I wanted to acquire proficiency in the language.  Nobody had to tell me how to do this; I knew it instinctively.  The main features of my desired regimen would be to:

1. pay little or no attention to memorizing characters (I would have been content with actively mastering 25 or so very high frequency characters and passively recognizing at most a hundred or so high frequency characters during the first year)

2. focus on pronunciation, vocabulary, grammar, particles, morphology, syntax, idioms, patterns, constructions, sentence structure, rhythm, prosody, and so forth — real language, not the script

3. read massive amounts of texts in Romanization and, if possible later on (after about half a year when I had the basics of the language nailed down), in character texts that would be phonetically annotated

Emojify the Web: "the next phase of linguistic evolution"

Today's announcement from the Google Chrome team (yes, note the date):


Swype and Voice Recognition for mobile device inputting

In late 2012, while visiting my son Tom in Dallas, I noticed that he was doing something very odd with his cell phone.  Most people enter text into their cell phone by pressing their thumbs (or their fingertip) on the letters of a small keyboard, whether virtual or actual.  But Tom was doing something altogether different:  he was sliding his finger over the glass surface of his phone and somehow, by so doing, he was able to enter text.  I was dumbfounded!  What amazed me most of all was how casual he was about it.  He'd be talking to me about something, then glance down at his cell phone, move his fingertip around on the glass, and — presto digito! — he'd have typed a message to someone and sent it off.

A fair-use victory for Google in these United States

US Circuit Judge Denny Chin has ruled in favor of Google in its long-running copyright litigation with the Authors Guild over the scanning and digitization of books. Chin ruled that the Google Books project constitutes fair use because it is "highly transformative" and "provides significant public benefits." In explaining those public benefits, Chin cited the use of Google Books data for Ngram queries, and pointed to a research example that we've discussed several times on Language Log.

Stupid FBI threat scam email

I recently heard of another friend-of-a-friend case in which people were taken in by one of the false email help-I'm-stranded scams, and actually sent money overseas in what they thought was a rescue for a relative who had been mugged in Spain. People really do respond to these scam emails, and they lose money, bigtime. Today I received the first Nigerian spam I have seen in which I am (purportedly) threatened by the FBI and Patriot Act government if I don't get in touch and hand over personal details that will permit the FBI to release my $3,500,000.

I wish there was more that people with basic common sense could do to spread the word about scamming detection to those who are somewhat lacking in it. The best I have been able to do is to write occasional Language Log posts pointing out the almost unbelievable degree of grammatical and orthographic incompetence in most scam emails. Sure, everyone makes the odd spelling mistake (childrens' for children's and the like), but it is simply astonishing that literate people do not notice the implausibility of customs officials or bank officers or police employees being as inarticulate as the typical scam email.

The one I just received is almost beyond belief (though see my afterthought at the end of this post). The worst thing I can think of to do to the senders is to publish the message here on Language Log, to warn the unwary, and perhaps permit those who are interested to track the culprit down. I reproduce the full content of the message source below, with nothing expurgated except for the x-ing out of my email address and local server names. I mark in red font the major errors in grammar and punctuation, plus a few nonlinguistic suspicious features.

Garakei: Galapagos cell phone

Recently I've been hearing about a Japanese electronic device called a "garakei ガラケイ". Mystified by this katakana word, which I assumed to be at least partially the transcription of some foreign term, I set about trying to find out more about it.

It wasn't hard to discover (here and here) that the word basically means "Galapagos cell phone". What a strange name for a kind of cell phone!

More on Juola's stylometry

Worth reading if you were interested in the computational stylometric analysis by Patrick Juola that helped to unmask J. K. Rowling as the author of The Cuckoo's Calling: an article in The Chronicle of Higher Education about Juola's work.

Rowling and "Galbraith": an authorial analysis

The Sunday (UK) Times recently revealed that J.K. Rowling wrote the detective novel The Cuckoo's Calling under the pen name Robert Galbraith. The newspaper explained that, as part of their investigation, they sought the assistance of two scholars who have developed software to help with authorship attribution: Peter Millican of Oxford University and Patrick Juola of Duquesne University. Given the public interest in the Rowling revelation, I asked Patrick to write a guest post describing the authorial analysis that he conducted. (For more on the story, see my post on the Wall Street Journal's Speakeasy blog.)

Cupertinos in the spotlight

About seven years ago, in March 2006, I wrote a Language Log post about "the Cupertino effect," a term to describe spellchecker-aided "miscorrections" that might turn, say, Pakistan's Muttahida Quami Movement into the Muttonhead Quail Movement. It owes its name to European Union translators who had noticed the word cooperation getting replaced with Cupertino by a spellchecker that lacked the unhyphenated form of the word in its dictionary. Since then, I've had occasion to hold forth on the Cupertino effect in various venues (OUPblog, Der Spiegel, Radiolab, the New York Times, etc.). Now, Cupertinos are getting yet another flurry of publicity, thanks to a new book by the British tech writer Tom Chatfield called Netymology.
