Language Log

Archive for Language and technology

A record-setting pangrammatic window

October 3, 2014 @ 11:18 pm · Filed by Ben Zimmer under Language and technology, Language on the internets, Language play

A few months ago, I posted here (and on Slate's Lexicon Valley blog) about PangramTweets, a bot created by Jesse Sheidlower that combs Twitter for tweets that include all 26 letters of the alphabet. I mentioned that it would be interesting to see if PangramTweets turns up any particularly short "pangrammatic windows," i.e., pangrammatic strings in naturally occurring text. At the time, the shortest known example was 42 letters long, in a passage from Piers Anthony's Cube Route:

"We are all from Xanth," Cube said quickly. "Just visiting Phaze. We just want to find the dragon."

My post inspired Malcolm Rowe, a software engineer at Google, to set about finding short pangrammatic windows in an automated fashion, first on the Project Gutenberg corpus and then on the megacorpus of web pages indexed by Google. (Let's hear it for Google's 20 percent time!) On his blog, Malcolm now reports on his findings, including the discovery of a 36-letter pangrammatic window that appeared in a review of the movie Magnolia on PopMatters:

Further, fractal geometries are replicated on a human level in the production of certain “types” of subjectivity: for example, aging kid quiz show whiz Donnie Smith (William H. Macy) and up and coming kid quiz show whiz Stanley Spector (Jeremy Blackman) are connected (or, perhaps, being cloned) in ways they couldn’t possibly imagine.
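For readers curious how such a search might be automated, here is a minimal sketch in Python of a sliding-window scan. It is not Malcolm's actual code (his blog post describes his own approach). The function name shortest_pangrammatic_window is made up for illustration, and the sketch assumes that only the 26 unaccented ASCII letters count, with case ignored and spaces and punctuation skipped.

```python
import string
from collections import Counter


def shortest_pangrammatic_window(text):
    """Return (letter_count, window_text) for the shortest stretch of text
    that contains all 26 letters, or None if the text has no such stretch."""
    # Keep only ASCII letters, remembering each one's position in the text.
    letters = [(i, ch.lower()) for i, ch in enumerate(text)
               if ch in string.ascii_letters]
    counts = Counter()  # letters currently inside the window
    best = None         # (size, start index in text, end index in text)
    left = 0
    for right, (_, ch) in enumerate(letters):
        counts[ch] += 1
        # While the window is pangrammatic, record it and shrink from the left.
        while len(counts) == 26:
            size = right - left + 1
            if best is None or size < best[0]:
                best = (size, letters[left][0], letters[right][0])
            left_ch = letters[left][1]
            counts[left_ch] -= 1
            if counts[left_ch] == 0:
                del counts[left_ch]
            left += 1
    if best is None:
        return None
    size, start, end = best
    return size, text[start:end + 1]
```

Run on the Piers Anthony passage quoted above, this sketch reports a 42-letter window that begins at "from Xanth" and ends with the "W" of the second "We". Each letter enters and leaves the window at most once, so the scan takes time proportional to the length of the text.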

The second life of a Language Log comment

September 8, 2014 @ 3:00 pm · Filed by Ben Zimmer under Language and technology, Writing systems

More than four years ago, on Aug. 23, 2010, Doctor Science left the following comment on a post by Mark Liberman, "Cell phone cupertinos":

I'm pretty sure I saw something several years ago about a whole dialect (argot? jargon? slang?) that had developed among young people in Japan (or possibly some other Asian country), based on phone cupertinos. Basically, they used the first suggestion from the autocomplete function *instead* of the original target word, to create an argot that was reasonably opaque to outsiders.

Now that comment has been brought back from the dead, appearing in two different articles about autocorrect.

The state of the machine translation art

July 31, 2014 @ 5:17 am · Filed by Geoffrey K. Pullum under Ignorance of linguistics, Language and technology, Translation

I don't know any Hebrew. So when I recently saw a comment in Hebrew on a Google Plus page of discussion about Gaza tunnel-building that I was looking at, I clicked (with some forebodings) on the "Translate" link to see what it meant. What I got was this:

Some grazing has hurt they Stands citizens Susan Hammer year

This does not even offer enough of an inkling to permit me to guess at what the writer of the original Hebrew might have been saying. It might as well have said "Grill tree ecumenical the fox Shove sample Quentin Garage plastic."

Bonfire beneficiaries

May 22, 2014 @ 2:49 pm · Filed by Geoffrey K. Pullum under Language and technology, Language on the internets

Subeditor Humphrey Evans points out to me that the grammar of phishing spam emails is getting worse and worse, rather than better. He recently saw one that contained this text:

The sum of (6.5M Euros only will be transfer into your account after the processing of all relevant legal documents with your name as the bonfire beneficiary, the transfer will be made by Draft or telegraphic Transfer (T/T), conformable in 3 working days as soon as you apply to the bank director.

That "bonfire beneficiary" bit is an eyebrow-raiser, isn't it? It seems to be an error for the Latin phrase bona fide "good faith".

PangramTweets

May 19, 2014 @ 5:07 pm · Filed by Ben Zimmer under Language and technology, Language on the internets, Language play

The Twitter API, beyond its great utility for corpus linguistics (see "On the front lines of Twitter linguistics," "The he's and she's of Twitter"), has made possible a lot of fun automated text-mining projects. One fertile area is algorithmic found poetry: there have been Twitter bots designed to find accidental haikus, and even more impressively, a bot named @Pentametron that finds rhyming tweets in iambic pentameter and fashions sonnets out of them.

And then there is found wordplay, which is its own kind of found poetry. I'm a big fan of @Anagramatron, which discovers paired tweets that form serendipitous anagrams of each other. (Example: "Last time I do anything" ⇔ "That's it. I'm dying alone.") Now, courtesy of Jesse Sheidlower, comes @PangramTweets, in which each tweet contains every letter of the alphabet at least once.
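The two tests these bots rely on are easy to state in code. What follows is only an illustrative Python sketch, not the bots' actual implementation, which is not shown here. The helper names letters_of, is_pangram and are_anagrams are hypothetical, and the sketch assumes that only the 26 ASCII letters matter, ignoring case, spaces and punctuation.

```python
import string
from collections import Counter


def letters_of(text):
    """Lowercased ASCII letters of text, in order; everything else is dropped."""
    return [ch.lower() for ch in text if ch in string.ascii_letters]


def is_pangram(text):
    """True if every letter of the alphabet appears at least once."""
    return set(letters_of(text)) == set(string.ascii_lowercase)


def are_anagrams(a, b):
    """True if a and b use exactly the same letters the same number of times."""
    return Counter(letters_of(a)) == Counter(letters_of(b))
```

With these definitions, are_anagrams("Last time I do anything", "That's it. I'm dying alone.") comes out true, since both tweets reduce to the same 19 letters.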

The future of Chinese language learning is now

April 5, 2014 @ 3:14 pm · Filed by Victor Mair under Dictionaries, Information technology, Language acquisition, Language and computers, Language and education, Language and technology, Language teaching and learning, Pedagogy

When I began learning Mandarin nearly half a century ago, I knew exactly how I wanted to acquire proficiency in the language. Nobody had to tell me how to do this; I knew it instinctively. The main features of my desired regimen would be to:

1. pay little or no attention to memorizing characters (I would have been content with actively mastering 25 or so very high frequency characters and passively recognizing at most a hundred or so high frequency characters during the first year)

2. focus on pronunciation, vocabulary, grammar, particles, morphology, syntax, idioms, patterns, constructions, sentence structure, rhythm, prosody, and so forth — real language, not the script

3. read massive amounts of texts in Romanization and, if possible later on (after about half a year when I had the basics of the language nailed down), in character texts that would be phonetically annotated

Emojify the Web: "the next phase of linguistic evolution"

April 1, 2014 @ 12:17 pm · Filed by Ben Zimmer under Ideography, Language and technology, Pragmatics, Silliness, Writing systems

Today's announcement from the Google Chrome team (yes, note the date):

Swype and Voice Recognition for mobile device inputting

January 22, 2014 @ 2:14 pm · Filed by Victor Mair under Information technology, Language and computers, Language and technology, Speech technology, Writing systems

In late 2012, while visiting my son Tom in Dallas, I noticed that he was doing something very odd with his cell phone. Most people enter text into their cell phone by pressing their thumbs (or their fingertip) on the letters of a small keyboard, whether virtual or actual. But Tom was doing something altogether different: he was sliding his finger over the glass surface of his phone and somehow, by so doing, he was able to enter text. I was dumbfounded! What amazed me most of all was how casual he was about it. He'd be talking to me about something, then glance down at his cell phone, move his fingertip around on the glass, and — presto digito! — he'd have typed a message to someone and sent it off.

A fair-use victory for Google in these United States

November 14, 2013 @ 11:56 am · Filed by Ben Zimmer under Language and computers, Language and technology, Language and the law

US Circuit Judge Denny Chin has ruled in favor of Google in its long-running copyright litigation with the Authors Guild over the scanning and digitization of books. Chin ruled that the Google Books project constitutes fair use because it is "highly transformative" and "provides significant public benefits." In explaining those public benefits, Chin cited the use of Google Books data for Ngram queries, and pointed to a research example that we've discussed several times on Language Log.

Stupid FBI threat scam email

August 19, 2013 @ 4:30 am · Filed by Geoffrey K. Pullum under Errors, Information technology, Language and technology, Literacy, Logic, Morphology, Punctuation, Spelling, Syntax, Writing

I recently heard of another friend-of-a-friend case in which people were taken in by one of the false email help-I'm-stranded scams, and actually sent money overseas in what they thought was a rescue for a relative who had been mugged in Spain. People really do respond to these scam emails, and they lose money, bigtime. Today I received the first Nigerian spam I have seen in which I am (purportedly) threatened by the FBI and Patriot Act government if I don't get in touch and hand over personal details that will permit the FBI to release my $3,500,000.

I wish there was more that people with basic common sense could do to spread the word about scamming detection to those who are somewhat lacking in it. The best I have been able to do is to write occasional Language Log posts pointing out the almost unbelievable degree of grammatical and orthographic incompetence in most scam emails. Sure, everyone makes the odd spelling mistake (childrens' for children's and the like), but it is simply astonishing that literate people do not notice the implausibility of customs officials or bank officers or police employees being as inarticulate as the typical scam email.

The one I just received is almost beyond belief (though see my afterthought at the end of this post). The worst thing I can think of to do to the senders is to publish the message here on Language Log, to warn the unwary, and perhaps permit those who are interested to track the culprit down. I reproduce the full content of the message source below, with nothing expurgated except for the x-ing out of my email address and local server names. I mark in red font the major errors in grammar and punctuation, plus a few nonlinguistic suspicious features.

Garakei: Galapagos cell phone

August 9, 2013 @ 8:49 pm · Filed by Victor Mair under Borrowing, Language and technology, Words words words

Recently I've been hearing about a Japanese electronic device called a "garakei ガラケイ". Mystified by this katakana word, which I assumed to be at least partially the transcription of some foreign term, I set about trying to find out more about it.

It wasn't hard to discover (here and here) that the word basically means "Galapagos cell phone". What a strange name for a kind of cell phone!

More on Juola's stylometry

July 29, 2013 @ 6:22 am · Filed by Geoffrey K. Pullum under Computational linguistics, Language and technology, Style and register, Writing

Worth reading if you were interested in the computational stylometric analysis by Patrick Juola that helped to unmask J. K. Rowling as the author of The Cuckoo's Calling: an article in The Chronicle of Higher Education about Juola's work.

Rowling and "Galbraith": an authorial analysis

July 16, 2013 @ 7:35 am · Filed by Ben Zimmer under Computational linguistics, Language and technology, Linguistics in the news

The Sunday (UK) Times recently revealed that J.K. Rowling wrote the detective novel The Cuckoo's Calling under the pen name Robert Galbraith. The newspaper explained that, as part of their investigation, they sought the assistance of two scholars who have developed software to help with authorship attribution: Peter Millican of Oxford University and Patrick Juola of Duquesne University. Given the public interest in the Rowling revelation, I asked Patrick to write a guest post describing the authorial analysis that he conducted. (For more on the story, see my post on the Wall Street Journal's Speakeasy blog.)
