Archive for Language on the internets

Anatomy of a spambot

We've often had occasion to wonder how spammy blog comments are linguistically constructed. (See, most recently, Mark Liberman's post, "Numerous upon the written content material," in which he refers to spam comments as "aleatoric sub-poetry.") Now, on Quartz, David Yanofsky and Zachary M. Seward expose how spam comments are engineered:

Comment spam follows a formula, which was made plain the other day when a spambot accidentally posted its entire template on the blog of programmer Scott Hanselman. With his permission, we’ve reproduced some of the spam comment recipes here and added colorful formatting to make it readable. The spambot constructs new, vaguely unique comments by selecting from each set of options. We hope you find it wonderful | terrific | brilliant | amazing | great | excellent | fantastic | outstanding | superb.
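The excerpt shows only the bar-separated options, not the template's exact syntax. Template-spinning software of this kind often uses a notation called "spintax", where the alternatives sit inside braces: {wonderful|terrific|...}. Assuming that notation, the minimal Python sketch below shows how a spambot might expand one template into many "vaguely unique" comments. The function names are hypothetical, and nothing here reproduces the actual spambot's code.

```python
import math
import random
import re
from typing import Optional

# Assumed "spintax" notation: alternatives inside braces, separated by "|".
# This pattern only matches brace groups that contain no other braces, so
# nested groups are resolved from the innermost outward.
INNERMOST = re.compile(r"\{([^{}]*)\}")


def spin(template: str, rng: Optional[random.Random] = None) -> str:
    """Replace every {a|b|c} group with one randomly chosen option."""
    rng = rng or random.Random()
    while True:
        # Each match gets its own independent random choice.
        template, count = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split("|")), template
        )
        if count == 0:  # no brace groups left to expand
            return template


def count_variants(template: str) -> int:
    """Number of distinct outputs for a flat (non-nested) template."""
    return math.prod(len(group.split("|")) for group in INNERMOST.findall(template))


if __name__ == "__main__":
    template = ("We hope you find it {wonderful|terrific|brilliant|amazing|"
                "great|excellent|fantastic|outstanding|superb}.")
    print(spin(template, random.Random(42)))
    print(count_variants(template))  # prints 9
```

Because the number of variants multiplies across slots, a template with only a handful of option sets can produce thousands of distinct comments. That is enough to slip past filters that look for exact duplicates.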

My country

Sima (long-term resident in China) from www.sinoglot.com writes:

I've been a regular Sina Weibo [VHM:  PRC clone of Twitter] user for some time and enjoy default news updates on my phone. Each update usually has two stories and, of late, almost invariably, one is about the outing of a corrupt official (cash, apartments, mistresses) and the second is about the latest 'play' over those rocks in the sea near Taiwan.

My latest update says:

我海监船再入钓岛拒绝日本抗议

[VHM: wǒ hǎi jiān chuán zài rù Diàodǎo jùjué Rìběn kàngyì
literal rendering of each syllable or word:  I / We sea surveillance ship(s) again enter Fishing Island reject Japan protest]

Whilst I'm used to expressions like 我国 [VHM:  wǒguó {"my / our country"}], which I wilfully employ when talking about 'my England', much to some people's disgust, and 我校 [VHM:  wǒxiào {"my / our school"}], which I actually write in articles and official documents relating to the school cricket team [VHM:  in China] (which I may have bored you about at some time), I'm not accustomed to such flexible employment of 我.

Do you know whether this use of 我校, 我国, etc. has a long history (i.e., pre-1949, or pre-1919)? Can 我 be freely applied? Is there a name for this phenomenon?

It reminds me a little of Western attitudes to sports teams; 'we won the world cup', when obviously said cup was won by eleven or so over-paid men who kick balls for a living, and not (usually) by the speaker himself.

Perhaps now more than ever, ain't nobody got time fo that

Philosophy and the Poetic Imagination
by E. Lepore & M. Stone, 2012

Perhaps now
More than
Ever
We spend our days
Immersed in
Language

Where's Xi?

Supposing Mitt Romney cancelled all of his appearances and meetings and went missing for a week. Furthermore, neither the Republican National Committee nor the Secret Service would make any statements or answer any questions concerning his whereabouts. Naturally, we would all be alarmed and wondering what had happened to the Republican candidate for the presidency of the United States of America. But imagine, if you can, that it would be illegal to search for Romney's name on the internet. All searches for "Romney" and "Mitt Romney" would be decisively blocked by the United States Government, and one might well be arrested for complaining about this. Out of frustration, citizens would search for "Room Knee eh?", "Glove ROM leg joint", and the like.

On the front lines of Twitter linguistics

I have a piece in today's New York Times Sunday Review section, "Twitterology: A New Science?" In the limited space I had, I tried to give a taste of what research is currently out there using Twitter to build various types of linguistic corpora. Obviously, there's a lot more that could be said about these projects and other fascinating ones currently underway. Herewith a few notes.

Censoring "Occupy" in China

Last weekend I was on the NPR show "On the Media" to talk about how the word occupy has evolved since the beginning of the Occupy Wall Street movement in mid-September. I reiterated a point I had made in my Word Routes column the previous week, namely that the success of the movement has been helped along by the modular nature of the Occupy slogan, allowing any place name to fill the "Occupy ___" template. That template has shown up in protests around the world, from Frankfurt to Tokyo, with English Occupy generally left intact (perhaps for maximum media impact). In China, meanwhile, Occupy has a translation-equivalent that is being censored online.

The Mock Spanglish of @ElBloombito

If nothing else, Hurricane Irene leaves us with the legacy of a fine fake-Twitter account, @ElBloombito (aka "Miguel Bloombito"), which takes satirical aim at the Spanish-language announcements that New York City Mayor Mike Bloomberg appended to the end of his many hurricane-related press conferences. Bloomberg has been working on his Spanish public speaking for years (and has even received intensive tutoring sessions), but his very Bloombergian enunciation was too good a target to pass up for Rachel Figueroa-Levin, the creator of the @ElBloombito Twitter account.

Text Message Language Is Everywhere

Those who hate text message abbreviations will be dismayed to learn how far they have spread. Here is the sign at the gas station on the Gitksan reservation in Hazelton, British Columbia.
The gas station on the reservation in Hazelton, BC.

Dear [Epithet] spamference organizer [Name]

The most unsuccessful piece of pseudo-personal spam I received this week must surely be the falsely flattering invitation that began as follows:

Dear Professor [Name][Name1],

We would like to invite you as Invited Speaker on the area of Social Sciences, Law, Finances and Humanities in the Conferences

Vouliagmeni Beach, Athens, Greece, December 29-31, 2010

Organized by the European Society for Environmental Research and Sustainable Development / EUROPMENT, www.europment.org in collaboration with the WSEAS...

Dear Professor [Name][Name1]? Come on, spamsters! Can't you even do a standard mail merge? Isn't that the core of your goddamn lousy trade?

Would it be OK with you if I gave an invited talk entitled "[Title][Subtitle]"?
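A salutation like "Dear Professor [Name][Name1]" is what a mail merge produces when its placeholders are never filled in and the merge quietly leaves them in place. The minimal Python sketch below is purely illustrative. It assumes a hypothetical bracketed [Field] placeholder syntax, modeled on the spam itself, and contrasts a lenient merge, which leaks the raw placeholders, with a strict one that refuses to produce the letter.

```python
import re
from typing import Mapping

# Hypothetical placeholder syntax modeled on the spam: [Name], [Name1], [Title] ...
PLACEHOLDER = re.compile(r"\[(\w+)\]")


def merge(template: str, record: Mapping[str, str], strict: bool = True) -> str:
    """Fill [Field] placeholders from record.

    With strict=True, a missing field raises KeyError instead of leaking
    the raw placeholder into the finished letter.
    """
    def fill(match: re.Match) -> str:
        field = match.group(1)
        if field in record:
            return record[field]
        if strict:
            raise KeyError(f"no value for placeholder [{field}]")
        return match.group(0)  # lenient mode: leave the placeholder untouched

    return PLACEHOLDER.sub(fill, template)


if __name__ == "__main__":
    salutation = "Dear Professor [Name][Name1],"
    print(merge(salutation, {}, strict=False))  # prints: Dear Professor [Name][Name1],
    try:
        merge(salutation, {"Name": "Smith"})
    except KeyError as err:
        print("letter not sent:", err)
```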

Facebook Absolutely Must Die

The official name of Facebook in China, as it appears on the Chinese version of its Website, is simply "Facebook."  It is unofficially, but commonly, referred to as Liǎnshū 臉書 (lit., "face book").

Lately, however, Fēisǐbùkě 非死不可 has become a popular way of transcribing the name "Facebook."

وزارة-الأتصالات.مصر leads the non-Latin charge

The first Internet domain names using non-Latin characters are being rolled out, a plan put into motion after approval from the Internet Corporation for Assigned Names and Numbers (ICANN). Arabic-speaking nations are the first to reap the orthographic benefits, with new country codes available for Egypt (مصر), Saudi Arabia (السعودية), and the United Arab Emirates (امارات). The Egyptian Ministry of Communications and Information Technology, previously online at <http://www.mcit.gov.eg/>, is blazing the trail with its new URL:

<وزارة-الأتصالات.مصر>

Not everything is fully worked out with the new system, though. Browsers that aren't up to speed on the non-Latin domain names will see the addresses rendered as Latinized gobbledygook. The Egyptian Communication Ministry's Arabic-script URL, for instance, currently shows up as <http://xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/>. That's not very communicative.
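The "Latinized gobbledygook" is the ASCII-compatible form of each dot-separated label. The Unicode label is encoded with the Punycode algorithm (RFC 3492) and then prefixed with "xn--". The minimal Python sketch below uses the standard library's punycode codec. It assumes the labels need no further processing. Real browsers also apply IDNA mapping and validation rules, which this sketch skips, so treat it as an illustration of the encoding step only.

```python
def to_ascii_label(label: str) -> str:
    """Encode one domain label in its ASCII-compatible ("xn--") form."""
    if label.isascii():
        return label  # plain ASCII labels pass through unchanged
    return "xn--" + label.encode("punycode").decode("ascii")


def from_ascii_label(label: str) -> str:
    """Decode an "xn--" label back to Unicode; other labels are returned as-is."""
    if label.startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def to_ascii_domain(domain: str) -> str:
    # Each label between the dots is encoded separately.
    return ".".join(to_ascii_label(part) for part in domain.split("."))


def from_ascii_domain(domain: str) -> str:
    return ".".join(from_ascii_label(part) for part in domain.split("."))


if __name__ == "__main__":
    ascii_name = to_ascii_domain("وزارة-الأتصالات.مصر")
    print(ascii_name)
    print(from_ascii_domain(ascii_name))  # round-trips to the Arabic-script name
```

Because the hyphen in the Arabic label is a plain ASCII character, Punycode copies it to the front of the encoded label and follows it with a "-" delimiter. That is why a run of hyphens appears right after the "xn--" prefix.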

[Update: See the very helpful comments below for an explanation of the Latinized encoding.]

Translate at your own risk

Last month I posted a link to a Schott's Vocab Q&A with Claude Hagège on endangered languages. Some commenters immediately picked up on one of Hagège's statements about translation:

However, there exists an important activity which clearly shows that even though the ways languages grasp the world may vary widely from one language to another, they all build, in fact, the same contents, and equivalent conceptions of the world. This activity is translation. Any text in any language can be translated into a text in another language. These two texts express the same meaning. We can therefore conclude that despite the differences between the ways languages grasp the world, all languages are easily convertible into one another, because humans interpret the world along the same, or comparable, semantic lines.

Barbara Partee contributed this comment:

Emmon Bach has put it nicely: The best argument in favor of the universality of natural language expressive power is the possibility of translation. The best argument against universality is the impossibility of translation (i.e. that we often can't really translate exactly). [link added–EB]

Translation ain't easy, even for skilled humans, and (especially) for machines. Google Translate appears to be among the better tools out there, but as the comments section of what (I believe) was Language Log's first reference to Google's translation tool shows, you can have quite a bit of fun breaking it. Moreover, breaking it is easy and can happen completely inadvertently, a lesson that (from what I hear, anyway) is quite often learned too late by desperate students trying to take shortcuts while doing their homework for beginning language classes.

Spamalot

In my recent go rogue posting, I reported a comment on an earlier posting from Daniel Gustav Anderson on go rogue as a sexual euphemism, saying that at first I suspected the comment of being spam, but decided it was legit. Then Jake Townhead commented on my posting, questioning my use of the word spam and suggesting that Anderson's comment was merely "bespoke mischief". So now some words on spam.
