Archive for Language and technology

Paperless reading

Just a little over a year ago, I made the following post:

"The future of Chinese language learning is now"  (4/5/14)

The second half of that post consisted of an account of a lecture that David Moser (of Beijing Capital Normal University and Academic Director of Chinese Studies at CET Beijing) had delivered a few days earlier (on 4/1/14) at Penn:  "Is Character Writing Still a Basic Skill?  The New Digital Chinese Tools and their Implications for Chinese Learning".

Comments (1)

A succor born every minute

Great news (if you're a pompous idiot)! There is news from the UK's Daily Mail of an app that will ruin your SMS messages and make you sound like someone who went through a matter transporter with a thesaurus!

So in case you should want to completely wreck your chances of ever getting another date with anyone normal, the Mail's screenshots show that the app will replace "Hey!" in your texts by "Salutations!"; it will replace "help me with my homework" by "succor me with my homework"; "smart girl" will be changed to "luminous girl"; "meet at my place" will become "meet at my residence"; "sounds good" will come out as "sounds euphonic"; and "have a good time" will morph into "have a congenial time".

Comments off

Smartisan T1

Video for a new Chinese smartphone, submitted by Stephen Hart:


Comments (23)

Autocomplete strikes again

I think I know how an unsuitable but immensely rich desert peninsula got chosen by FIFA (the international governing body for major soccer tournaments) to host the soccer World Cup in 2022.

First, a personal anecdote that triggered my hypothesis about the decision. I recently sent a text message from my smartphone and then carelessly slipped the phone into my pocket without making sure it had gone to sleep.

Comments off

Error-laden phishing attempts

Phishers trawling for email account names are generally smart enough to pull all sorts of programming tricks, forging headers and obtaining lists of spammable addresses and setting up arrangements to capture login names and passwords obediently typed in by the gullible; but then they give themselves away with errors of grammar and punctuation that are just too gross to be perpetrated by the authorized guys at the communications and technology services unit.

I received a phishing spam today that had no To-line at all (none of that "undisclosed recipients" stuff, and no mention of my email address in it anywhere). It looked sort of convincing in its announcement that webmail account holders would have to take certain steps to ensure the preservation of their address books after being "upgraded to a new enhanced Outlook interface". (My own university has, tragically, been induced to do an upgrade of this kind to its employee email services.) But the linguistic errors in the message begin with the 13th character in the From line (that second comma is wrong). I reproduce below the raw text of what I received, stripping out only the locally generated receipt and spam-checking headers (and by the way, this message—spam though it is—succeeded in getting a spam score of 0).

Comments off

A record-setting pangrammatic window

A few months ago, I posted here (and on Slate's Lexicon Valley blog) about PangramTweets, a bot created by Jesse Sheidlower that combs Twitter for tweets that include all 26 letters of the alphabet. I mentioned that it would be interesting to see if PangramTweets turns up any particularly short "pangrammatic windows," i.e., stretches of naturally occurring text that contain every letter of the alphabet. At the time, the shortest known example was 42 letters long, in a passage from Piers Anthony's Cube Route:

"We are all from Xanth," Cube said quickly. "Just visiting Phaze. We just want to find the dragon."

My post inspired Malcolm Rowe, a software engineer at Google, to set about finding short pangrammatic windows in an automated fashion, first on the Project Gutenberg corpus and then on the megacorpus of web pages indexed by Google. (Let's hear it for Google's 20 percent time!) On his blog, Malcolm now reports on his findings, including the discovery of a 36-letter pangrammatic window that appeared in a review of the movie Magnolia on PopMatters:

Further, fractal geometries are replicated on a human level in the production of certain “types” of subjectivity: for example, aging kid quiz show whiz Donnie Smith (William H. Macy) and up and coming kid quiz show whiz Stanley Spector (Jeremy Blackman) are connected (or, perhaps, being cloned) in ways they couldn’t possibly imagine.
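For readers curious how such a search can be automated, here is a minimal Python sketch of one standard approach, a sliding window over the letters of a text. It is not Malcolm's code, which is not reproduced here. The function name `shortest_pangrammatic_window` is made up for illustration, and the sketch assumes that only the 26 unaccented letters a to z count, with case, spaces, and punctuation ignored.

```python
from collections import Counter


def shortest_pangrammatic_window(text):
    """Return (letter_count, window_text) for the shortest stretch of `text`
    that contains all 26 letters, or None if the text is not a pangram.

    Assumption: only ASCII letters are counted; case, spaces, and
    punctuation are ignored when measuring the window.
    """
    # Keep only the letters, remembering where each one sits in the original.
    letters = [(i, ch.lower()) for i, ch in enumerate(text)
               if ch.isascii() and ch.isalpha()]

    counts = Counter()   # how often each letter occurs in the current window
    distinct = 0         # how many different letters the window holds
    best = None          # (letter_count, start_index, end_index)
    left = 0

    for right, (_, ch) in enumerate(letters):
        counts[ch] += 1
        if counts[ch] == 1:
            distinct += 1

        # While the window is pangrammatic, record it and shrink from the left.
        while distinct == 26:
            length = right - left + 1
            if best is None or length < best[0]:
                best = (length, letters[left][0], letters[right][0])
            dropped = letters[left][1]
            counts[dropped] -= 1
            if counts[dropped] == 0:
                distinct -= 1
            left += 1

    if best is None:
        return None
    length, start, end = best
    return length, text[start:end + 1]
```

Run on the Cube Route passage quoted above, this sketch should report a 42-letter window running from "from Xanth" to the "W" of the second "We"; on the PopMatters sentence it should report 36 letters, from the "bjectivity" of "subjectivity" through "show". Each letter enters and leaves the window at most once, so the search is linear in the length of the text, which matters when the text is a whole corpus.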

Comments (14)

The second life of a Language Log comment

More than four years ago, on Aug. 23, 2010, Doctor Science left the following comment on a post by Mark Liberman, "Cell phone cupertinos":

I'm pretty sure I saw something several years ago about a whole dialect (argot? jargon? slang?) that had developed among young people in Japan (or possibly some other Asian country), based on phone cupertinos. Basically, they used the first suggestion from the autocomplete function *instead* of the original target word, to create an argot that was reasonably opaque to outsiders.

Now that comment has been brought back from the dead, appearing in two different articles about autocorrect.

Comments (27)

The state of the machine translation art

I don't know any Hebrew. So when I recently saw a comment in Hebrew on a Google Plus page of discussion about Gaza tunnel-building that I was looking at, I clicked (with some forebodings) on the "Translate" link to see what it meant. What I got was this:

Some grazing has hurt they Stands citizens Susan Hammer year

This does not even offer enough of an inkling to permit me to guess at what the writer of the original Hebrew might have been saying. It might as well have said "Grill tree ecumenical the fox Shove sample Quentin Garage plastic."

Comments off

Bonfire beneficiaries

Subeditor Humphrey Evans points out to me that the grammar of phishing spam emails is getting worse and worse, rather than better. He recently saw one that contained this text:

The sum of (6.5M Euros only will be transfer into your account after the processing of all relevant legal documents with your name as the bonfire beneficiary, the transfer will be made by Draft or telegraphic Transfer (T/T), conformable in 3 working days as soon as you apply to the bank director.

That "bonfire beneficiary" bit is an eyebrow-raiser, isn't it? It seems to be an error for the Latin phrase bona fide "in good faith".

Comments off

PangramTweets

The Twitter API, beyond its great utility for corpus linguistics (see "On the front lines of Twitter linguistics," "The he's and she's of Twitter"), has made possible a lot of fun automated text-mining projects. One fertile area is algorithmic found poetry: there have been Twitter bots designed to find accidental haikus, and even more impressively, a bot named @Pentametron that finds rhyming tweets in iambic pentameter and fashions sonnets out of them.

And then there is found wordplay, which is its own kind of found poetry. I'm a big fan of @Anagramatron, which discovers paired tweets that form serendipitous anagrams of each other. (Example: "Last time I do anything" ⇔ "That's it. I'm dying alone.") Now, courtesy of Jesse Sheidlower, comes @PangramTweets, in which each tweet contains every letter of the alphabet at least once.
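The two tests behind these bots are simple to state in code. What follows is a rough Python sketch, not the bots' actual source (which I have not seen): the helper names are invented for illustration, fetching tweets from the Twitter API is left out entirely, and I assume that only the letters a to z matter, so that case, spaces, and punctuation are ignored.

```python
import string
from collections import Counter


def letters_of(text):
    """Lower-cased ASCII letters of `text`, in order; everything else is dropped."""
    return [ch.lower() for ch in text if ch in string.ascii_letters]


def is_pangram(text):
    """True if `text` uses every letter of the alphabet at least once."""
    return set(letters_of(text)) == set(string.ascii_lowercase)


def are_anagrams(a, b):
    """True if `a` and `b` use exactly the same letters the same number of
    times, without simply being the same sequence of letters."""
    la, lb = letters_of(a), letters_of(b)
    return bool(la) and la != lb and Counter(la) == Counter(lb)
```

With these definitions, `are_anagrams("Last time I do anything", "That's it. I'm dying alone.")` comes out true: both tweets boil down to the same 19 letters.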

Comments (8)

The future of Chinese language learning is now

When I began learning Mandarin nearly half a century ago, I knew exactly how I wanted to acquire proficiency in the language.  Nobody had to tell me how to do this; I knew it instinctively.  The main features of my desired regimen would be to:

1. pay little or no attention to memorizing characters (I would have been content with actively mastering 25 or so very high frequency characters and passively recognizing at most a hundred or so high frequency characters during the first year)

2. focus on pronunciation, vocabulary, grammar, particles, morphology, syntax, idioms, patterns, constructions, sentence structure, rhythm, prosody, and so forth — real language, not the script

3. read massive amounts of texts in Romanization and, if possible later on (after about half a year when I had the basics of the language nailed down), in character texts that would be phonetically annotated

Comments (40)

Emojify the Web: "the next phase of linguistic evolution"

Today's announcement from the Google Chrome team (yes, note the date):


Comments (8)

Swype and Voice Recognition for mobile device inputting

In late 2012, while visiting my son Tom in Dallas, I noticed that he was doing something very odd with his cell phone.  Most people enter text into their cell phone by pressing their thumbs (or their fingertip) on the letters of a small keyboard, whether virtual or actual.  But Tom was doing something altogether different:  he was sliding his finger over the glass surface of his phone and somehow, by so doing, he was able to enter text.  I was dumbfounded!  What amazed me most of all was how casual he was about it.  He'd be talking to me about something, then glance down at his cell phone, move his fingertip around on the glass, and — presto digito! — he'd have typed a message to someone and sent it off.

Comments (42)