Archive for Language on the internets

A record-setting pangrammatic window

A few months ago, I posted here (and on Slate's Lexicon Valley blog) about PangramTweets, a bot created by Jesse Sheidlower that combs Twitter for tweets that include all 26 letters of the alphabet. I mentioned that it would be interesting to see if PangramTweets turns up any particularly short "pangrammatic windows," i.e., pangrammatic strings in naturally occurring text. At the time, the shortest known example was 42 letters long, in a passage from Piers Anthony's Cube Route:

"We are all from Xanth," Cube said quickly. "Just visiting Phaze. We just want to find the dragon."

My post inspired Malcolm Rowe, a software engineer at Google, to set about finding short pangrammatic windows in an automated fashion, first on the Project Gutenberg corpus and then on the megacorpus of web pages indexed by Google. (Let's hear it for Google's 20 percent time!) On his blog, Malcolm now reports on his findings, including the discovery of a 36-letter pangrammatic window that appeared in a review of the movie Magnolia on PopMatters:

Further, fractal geometries are replicated on a human level in the production of certain “types” of subjectivity: for example, aging kid quiz show whiz Donnie Smith (William H. Macy) and up and coming kid quiz show whiz Stanley Spector (Jeremy Blackman) are connected (or, perhaps, being cloned) in ways they couldn’t possibly imagine.
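Malcolm's search is a classic sliding-window problem: find the shortest stretch of text whose letters cover the whole alphabet, counting only letters. Below is a minimal Python sketch of that idea. It is not Malcolm's code. The function name is hypothetical, and so is the choice to skip spaces, punctuation, and non-ASCII characters when measuring length.

```python
import string
from typing import Optional

ALPHABET = frozenset(string.ascii_lowercase)


def shortest_pangrammatic_window(text: str) -> Optional[str]:
    """Return the shortest span of `text` containing all 26 letters.

    Length is measured in letters only (spaces, punctuation, and
    non-ASCII characters are skipped), and ties go to the earliest span.
    Returns None if `text` is not a pangram at all.
    """
    # (index in text, lowercase letter) for every ASCII letter.
    letters = [(i, ch.lower()) for i, ch in enumerate(text)
               if ch.isascii() and ch.isalpha()]

    counts = {}   # letter -> occurrences inside the current window
    best = None   # (letter count, left, right) as indices into `letters`
    left = 0
    for right, (_, ch) in enumerate(letters):
        counts[ch] = counts.get(ch, 0) + 1
        # While the window covers the alphabet, record it, then shrink it.
        while len(counts) == len(ALPHABET):
            size = right - left + 1
            if best is None or size < best[0]:
                best = (size, left, right)
            drop = letters[left][1]
            counts[drop] -= 1
            if counts[drop] == 0:
                del counts[drop]
            left += 1

    if best is None:
        return None
    _, lo, hi = best
    # Map letter indices back to character offsets in the original text.
    return text[letters[lo][0]:letters[hi][0] + 1]


if __name__ == "__main__":
    cube = ('"We are all from Xanth," Cube said quickly. '
            '"Just visiting Phaze. We just want to find the dragon."')
    window = shortest_pangrammatic_window(cube)
    print(window)                              # from Xanth," Cube said quickly. "Just visiting Phaze. W
    print(sum(ch.isalpha() for ch in window))  # 42
```

Run on the Magnolia review passage, the same function returns "bjectivity: for example, aging kid quiz show", which is 36 letters long.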

Emotional contagion

As usual, xkcd nails it:

Mouseover title: "I mean, it's not like we could just demand to see the code that's governing our lives. What right do we have to poke around in Facebook's private affairs like that?"

Is the Urdu script on the verge of dying?

Hindi-Urdu, also referred to as Hindustani, is the classic case of digraphia, so much so that there has been a long-standing controversy over whether Hindi and Urdu are one language or two. Their colloquial spoken forms are nearly identical, but in writing (Hindi in the Devanāgarī script, Urdu in the Nastaʿlīq script) they have a very different look and "feel".

Banned in Beijing

Everyone knows that the Chinese government goes to extraordinary lengths to police the internet (see: "Blocked on Weibo").

And most sentient beings are aware of the awesome fame of the Grass-Mud Horse, the notorious Franco-Croatian Squid, and the mysterious River Crab. You can find all of them in "Grass-Mud Horse Lexicon Classics".

Sometimes, the censors begin to look pretty ridiculous, as when they outlawed the word "jasmine" in 2011, particularly since it refers not just to the Jasmine Revolution, but also to a favorite flower, tea, and folk song.

mòlì 茉莉 ("jasmine")

mòlì chá 茉莉茶 ("jasmine tea") OR mòlìhuā chá 茉莉花茶 ("jasmine tea") OR xiāngpiàn 香片 ("scented [usually with jasmine] tea")

mòlìhuā 茉莉花 ("jasmine flower"; also the name of a popular folk song). Presidents Jiang Zemin and Hu Jintao were both excessively fond of this song, and there are videos of them singing it, so it becomes especially awkward to try to forbid citizens to use the word mòlì 茉莉 ("jasmine").

Bonfire beneficiaries

Subeditor Humphrey Evans points out to me that the grammar of phishing spam emails is getting worse and worse, rather than better. He recently saw one that contained this text:

The sum of (6.5M Euros only will be transfer into your account after the processing of all relevant legal documents with your name as the bonfire beneficiary, the transfer will be made by Draft or telegraphic Transfer (T/T), conformable in 3 working days as soon as you apply to the bank director.

That "bonfire beneficiary" bit is an eyebrow-raiser, isn't it? It seems to be an error for the Latin phrase bona fide ("in good faith").

PangramTweets

The Twitter API, beyond its great utility for corpus linguistics (see "On the front lines of Twitter linguistics," "The he's and she's of Twitter"), has made possible a lot of fun automated text-mining projects. One fertile area is algorithmic found poetry: there have been Twitter bots designed to find accidental haikus, and even more impressively, a bot named @Pentametron that finds rhyming tweets in iambic pentameter and fashions sonnets out of them.

And then there is found wordplay, which is its own kind of found poetry. I'm a big fan of @Anagramatron, which discovers paired tweets that form serendipitous anagrams of each other. (Example: "Last time I do anything" ⇔ "That's it. I'm dying alone.") Now, courtesy of Jesse Sheidlower, comes @PangramTweets, in which each tweet contains every letter of the alphabet at least once.
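Both bots rest on simple letter-counting tests. A tweet qualifies for @PangramTweets if its letters cover the whole alphabet, and two tweets form an @Anagramatron pair if they use exactly the same letters the same number of times, ignoring case, spaces, and punctuation. The Python sketch below implements those two tests. It is not either bot's actual code, and the function names are hypothetical. The rule rejecting pairs that are identical once punctuation is stripped is an assumption about what a sensible filter would do.

```python
import string
from collections import Counter

ALPHABET = frozenset(string.ascii_lowercase)


def letters_only(text: str) -> str:
    """Lowercase ASCII letters of `text`, with everything else dropped."""
    return "".join(ch.lower() for ch in text if ch.isascii() and ch.isalpha())


def is_pangram(text: str) -> bool:
    """True if every letter a-z occurs at least once (the PangramTweets test)."""
    return ALPHABET <= set(letters_only(text))


def is_anagram_pair(a: str, b: str) -> bool:
    """True if `a` and `b` use the same letters with the same counts.

    Assumption: pairs that are identical once case and punctuation are
    stripped are rejected, since rearranging nothing is no anagram.
    """
    la, lb = letters_only(a), letters_only(b)
    return la != lb and Counter(la) == Counter(lb)


if __name__ == "__main__":
    print(is_anagram_pair("Last time I do anything", "That's it. I'm dying alone."))  # True
    print(is_pangram('"We are all from Xanth," Cube said quickly. '
                     '"Just visiting Phaze. We just want to find the dragon."'))  # True
```

A working bot would also need to pull tweets from the Twitter API and, for anagrams, find matching tweets efficiently (for instance by indexing each tweet under its sorted letters, one plausible approach); neither step is shown here.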

Cantonese poetry recitation

A recent issue (1/7/14) of the South China Morning Post (SCMP) carried an article by a staff reporter entitled "Hong Kong student's poem recital goes viral in the mainland". The article features this amazing video of a Hong Kong high school student reciting a couple of Classical Chinese poems:


"People mountain, people sea" and "let's play"

Stephan Stiller says that my post on "Good good study; day day up" reminds him of "people mountain, people sea" (rénshānrénhǎi 人山人海), i.e., "crowded; packed; a sea of people".  This is another fairly complex Chinglishism that has entered the vocabulary of many English speakers who know no Chinese.  It was popularized by a Hong Kong music production company that took this expression as its name, and there was also a Hong Kong film that used this expression as its title.

Tyrant's bling

Arguably the hottest term on the Chinese internet these days is tǔháo 土豪 ("[local] tyrant / despot"), but transformed to mean "bling", and with a sharply satirical edge.  How did tǔháo 土豪 ("[local] tyrant / despot") morph into "bling"?  The story is told in "#BBCtrending: Tuhao and the rise of Chinese bling".

The English language's Twitter feed

I have a piece on Fresh Air today, behind the curve as usual, on the discussion that followed Oxford Dictionaries Online's inclusion of twerk, which Ben Zimmer covered in a post a couple of weeks ago ("Getting worked up over 'twerk'"). Actually I don't care much about twerk, whose coolness and credentials Ben defended definitively. But I think it's worth looking at the whole list of new words that appeared on the ODO blog post announcing the quarterly update, headed "Buzzworthy words added to Oxford Dictionaries Online – squee!":

apols, A/W (“autumn/winter”), babymoon, balayage (“a technique for highlighting hair”), bitcoin, blondie (small cake), buzzworthy, BYOD (“bring your own device”), cake pop, chandelier earring, child’s pose (yoga), click and collect, dad dancing, dappy, derp, digital detox, double denim, emoji, fauxhawk, FIL (“father-in-law”), flatform (shoe), FOMO (“Fear Of Missing Out”), food baby (“a protruding stomach caused by eating a large quantity of food”), geek chic, girl crush, grats, guac, hackerspace, Internet of things, jorts, LDR, me time, michelada (“drink made with beer, lime juice…”), MOOC, Nordic noir, omnishambles, pear cider [see comment below], phablet, pixie cut, prep (v. “prepare”), selfie, space tourism, squee, srsly, street food, TL;DR, trolly dash (UK supermarket promotion), twerk, unlike (v.), vom (“vomit”)

I’ve bolded the ones that seem to me to have a chance of being still current by the end of the decade, including a few that have been around for quite a while. Some of this is pure guesswork (if you have inside knowledge about bitcoin, let me know) and others may scrape by, but it's a fair bet that the vast majority are not going to survive your hamster.

Grass-Mud Horse Lexicon Classics

The China Digital Times (CDT) Grass-Mud Horse Lexicon is the premier place to go for Chinese netizen language designed to evade the censors and to poke fun at the political system.

Over the years, CDT has accumulated 273 entries in its Grass-Mud Horse Lexicon.  From these, the CDT editors have selected 71 essential items for inclusion in The Grass-Mud Horse Lexicon: Classic Netizen Language, which has just been published.

Here's the Kindle edition on Amazon.

Subversion at the spam factory?

So this is new, at least for me. The latest batch of a few thousand spam comments (adding to the pile of 5,095,703 caught so far) pretends to come from people using negatively evaluated pseudonyms in Spanish, like caca ("poop"), ladrones ("thieves"), or indecentes ("indecent ones"):

Anatomy of a spambot

We've often had occasion to wonder how spammy blog comments are linguistically constructed. (See, most recently, Mark Liberman's post, "Numerous upon the written content material," in which he refers to spam comments as "aleatoric sub-poetry.") Now, on Quartz, David Yanofsky and Zachary M. Seward expose how spam comments are engineered:

Comment spam follows a formula, which was made plain the other day when a spambot accidentally posted its entire template on the blog of programmer Scott Hanselman. With his permission, we’ve reproduced some of the spam comment recipes here and added colorful formatting to make it readable. The spambot constructs new, vaguely unique comments by selecting from each set of options. We hope you find it wonderful | terrific | brilliant | amazing | great | excellent | fantastic | outstanding | superb.
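The "selecting from each set of options" step is commonly called spintax. Here is a minimal Python sketch of an expander. It assumes the widespread {option|option|...} brace notation, though the excerpt above shows only the |-separated choices, and it lets groups nest. The function name spin is hypothetical and does not come from the spambot's template.

```python
import random
import re

# An innermost option group: "{...}" with no braces inside it.
INNERMOST_GROUP = re.compile(r"\{([^{}]*)\}")


def spin(template: str, rng: random.Random) -> str:
    """Expand a spintax template by picking one option from each {a|b|c} group.

    Innermost groups are resolved first, so nested templates such as
    "{Great|{Really|Very} nice} post" work too.
    """
    def choose(match):
        # Split the group's contents on "|" and keep one option at random.
        return rng.choice(match.group(1).split("|"))

    # Each pass resolves the current innermost groups; repeat until none remain.
    while INNERMOST_GROUP.search(template):
        template = INNERMOST_GROUP.sub(choose, template)
    return template


if __name__ == "__main__":
    rng = random.Random(2024)  # seeded so runs are reproducible
    template = ("We hope you find it {wonderful|terrific|brilliant|amazing|"
                "great|excellent|fantastic|outstanding|superb}.")
    for _ in range(3):
        print(spin(template, rng))
```

A template with k independent groups of n_1, ..., n_k options yields n_1 × ... × n_k distinct comments. That is how a single recipe produces thousands of "vaguely unique" comments.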

My country

Sima (long-term resident in China) from www.sinoglot.com writes:

I've been a regular Sina Weibo [VHM:  PRC clone of Twitter] user for some time and enjoy default news updates on my phone. Each update usually has two stories and, of late, almost invariably, one is about the outing of a corrupt official (cash, apartments, mistresses) and the second is about the latest 'play' over those rocks in the sea near Taiwan.

My latest update says:

我海监船再入钓岛拒绝日本抗议

[VHM: wǒ hǎi jiān chuán zài rù Diàodǎo jùjué Rìběn kàngyì
literal rendering of each syllable or word:  I / We sea surveillance ship(s) again enter Fishing Island reject Japan protest]

Whilst I'm used to expressions like 我国 [VHM:  wǒguó {"my / our country"}], which I wilfully employ when talking about 'my England', much to some people's disgust, and 我校 [VHM:  wǒxiào {"my / our school"}], which I actually write in articles and official documents relating to the school cricket team [VHM:  in China] (which I may have bored you about at some time), I'm not accustomed to such flexible employment of 我.

Do you know whether this use of 我校, 我国, etc. has a long history (i.e., pre-1949, or pre-1919)? Can 我 be freely applied? Is there a name for this phenomenon?

It reminds me a little of Western attitudes to sports teams; 'we won the world cup', when obviously said cup was won by eleven or so over-paid men who kick balls for a living, and not (usually) by the speaker himself.

Perhaps now more than ever, ain't nobody got time fo that

Philosophy and the Poetic Imagination
by E. Lepore & M. Stone, 2012

Perhaps now
More than
Ever
We spend our days
Immersed in
Language
