Language Log

Archive for Language on the internets

Anatomy of a spambot

April 23, 2013 @ 10:23 pm · Filed by Ben Zimmer under Computational linguistics, Language on the internets

We've often had occasion to wonder how spammy blog comments are linguistically constructed. (See, most recently, Mark Liberman's post, "Numerous upon the written content material," in which he refers to spam comments as "aleatoric sub-poetry.") Now, on Quartz, David Yanofsky and Zachary M. Seward expose how spam comments are engineered:

Comment spam follows a formula, which was made plain the other day when a spambot accidentally posted its entire template on the blog of programmer Scott Hanselman. With his permission, we’ve reproduced some of the spam comment recipes here and added colorful formatting to make it readable. The spambot constructs new, vaguely unique comments by selecting from each set of options. We hope you find it wonderful | terrific | brilliant | amazing | great | excellent | fantastic | outstanding | superb.
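The selection step described in the quoted passage can be sketched in a few lines of Python. This is only an illustration: it assumes a hypothetical template syntax in which each set of options sits inside curly braces, separated by `|`. The brace syntax and the names `expand` and `count_variants` are assumptions made for the example, not details taken from the leaked template.

```python
import random
import re

# Hypothetical template syntax: {option a|option b|option c}
OPTION_SET = re.compile(r"\{([^{}]*)\}")

def expand(template, rng=random):
    """Replace every {a|b|c} group with one randomly chosen option."""
    def pick(match):
        return rng.choice(match.group(1).split("|"))
    return OPTION_SET.sub(pick, template)

def count_variants(template):
    """Count the distinct comments a template can yield (assuming unique options)."""
    total = 1
    for group in OPTION_SET.findall(template):
        total *= len(group.split("|"))
    return total

template = ("We hope you find it {wonderful|terrific|brilliant|amazing|great"
            "|excellent|fantastic|outstanding|superb}.")
print(expand(template))          # one randomly assembled sentence
print(count_variants(template))  # number of possible sentences
```

With several option sets in one template the counts multiply, which is how a short recipe can yield many "vaguely unique" comments.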

My country

January 23, 2013 @ 9:27 pm · Filed by Victor Mair under Idioms, Language and the media, Language on the internets

Sima (long-term resident in China) from www.sinoglot.com writes:

I've been a regular Sina Weibo [VHM: PRC clone of Twitter] user for some time and enjoy default news updates on my phone. Each update usually has two stories and, of late, almost invariably, one is about the outing of a corrupt official (cash, apartments, mistresses) and the second is about the latest 'play' over those rocks in the sea near Taiwan.

My latest update says:

我海监船再入钓岛拒绝日本抗议

[VHM: wǒ hǎi jiān chuán zài rù Diàodǎo jùjué Rìběn kàngyì
literal rendering of each syllable or word: I / We sea surveillance ship(s) again enter Fishing Island reject Japan protest]

Whilst I'm used to expressions like 我国 [VHM: wǒguó {"my / our country"}], which I wilfully employ when talking about 'my England', much to some people's disgust, and 我校 [VHM: wǒxiào {"my / our school"}], which I actually write in articles and official documents relating to the school cricket team [VHM: in China] (which I may have bored you about at some time), I'm not accustomed to such flexible employment of 我.

Do you know whether this use of 我校, 我国, etc. has a long history (i.e., pre-1949, or pre-1919)? Can 我 be freely applied? Is there a name for this phenomenon?

It reminds me a little of Western attitudes to sports teams; 'we won the world cup', when obviously said cup was won by eleven or so over-paid men who kick balls for a living, and not (usually) by the speaker himself.

Perhaps now more than ever, ain't nobody got time fo that

December 5, 2012 @ 2:43 am · Filed by David Beaver under Language on the internets, Linguistics in the news, Philosophy of Language, Pragmatics

Philosophy and the Poetic Imagination
by E. Lepore & M. Stone, 2012

Perhaps now
More than
Ever
We spend our days
Immersed in
Language

Where's Xi?

September 10, 2012 @ 6:08 pm · Filed by Victor Mair under Language and politics, Language and the media, Language on the internets

Supposing Mitt Romney cancelled all of his appearances and meetings and went missing for a week. Furthermore, neither the Republican National Committee nor the Secret Service would make any statements or answer any questions concerning his whereabouts. Naturally, we would all be alarmed and wondering what had happened to the Republican candidate for the presidency of the United States of America. But imagine, if you can, that it would be illegal to search for Romney's name on the internet. All searches for "Romney" and "Mitt Romney" would be decisively blocked by the United States Government, and one might well be arrested for complaining about this. Out of frustration, citizens would search for "Room Knee eh?", "Glove ROM leg joint", and the like.

On the front lines of Twitter linguistics

October 30, 2011 @ 6:27 pm · Filed by Ben Zimmer under Computational linguistics, Language and technology, Language on the internets

I have a piece in today's New York Times Sunday Review section, "Twitterology: A New Science?" In the limited space I had, I tried to give a taste of what research is currently out there using Twitter to build various types of linguistic corpora. Obviously, there's a lot more that could be said about these projects and other fascinating ones currently underway. Herewith a few notes.

Censoring "Occupy" in China

October 24, 2011 @ 11:35 pm · Filed by Ben Zimmer under Language and politics, Language on the internets, Lost in translation

Last weekend I was on the NPR show "On the Media" to talk about how the word occupy has evolved since the beginning of the Occupy Wall Street movement in mid-September. I reiterated a point I had made in my Word Routes column the previous week, namely that the success of the movement has been helped along by the modular nature of the Occupy slogan, allowing any place name to fill the "Occupy ___" template. That template has shown up in protests around the world, from Frankfurt to Tokyo, with English Occupy generally left intact (perhaps for maximum media impact). In China, meanwhile, Occupy has a translation-equivalent that is being censored online.

The Mock Spanglish of @ElBloombito

August 29, 2011 @ 3:37 pm · Filed by Ben Zimmer under Humor, Language and politics, Language contact, Language on the internets

If nothing else, Hurricane Irene leaves us with the legacy of a fine fake-Twitter account, @ElBloombito (aka "Miguel Bloombito"), which takes satirical aim at the Spanish-language announcements that New York City Mayor Mike Bloomberg appended to the end of his many hurricane-related press conferences. Bloomberg has been working on his Spanish public speaking for years (and has even received intensive tutoring sessions), but his very Bloombergian enunciation was too good a target to pass up for Rachel Figueroa-Levin, the creator of the @ElBloombito Twitter account.

Text Message Language Is Everywhere

March 7, 2011 @ 5:40 pm · Filed by Bill Poser under Humor, Language on the internets

Those who hate text message abbreviations will be dismayed to learn of how far they have spread. Here is the sign at the gas station on the Gitksan reservation in Hazelton, British Columbia.
[Photo: The gas station on the reservation in Hazelton, BC.]

Dear [Epithet] spamference organizer [Name]

October 6, 2010 @ 12:52 pm · Filed by Geoffrey K. Pullum under Language on the internets, The academic scene

The most unsuccessful piece of pseudo-personal spam I received this week must surely be the falsely flattering invitation that began as follows:

Dear Professor [Name][Name1],
We would like to invite you as Invited Speaker on the area of Social Sciences, Law, Finances and Humanities in the Conferences Vouliagmeni Beach, Athens, Greece, December 29-31, 2010
Organized by the European Society for Environmental Research and Sustainable Development / EUROPMENT, www.europment.org in collaboration with the WSEAS...

Dear Professor [Name][Name1]? Come on, spamsters! Can't you even do a standard mail merge? Isn't that the core of your goddamn lousy trade?

Would it be OK with you if I gave an invited talk entitled "[Title][Subtitle]"?

Facebook Absolutely Must Die

May 22, 2010 @ 9:21 pm · Filed by Victor Mair under Humor, Language on the internets

The official name of Facebook in China, as it appears on the Chinese version of its Website, is simply "Facebook." It is unofficially, but commonly, referred to as Liǎnshū 臉書 (lit., "face book").

Lately, however, Fēisǐbùkě 非死不可 has become a popular way of transcribing the name "Facebook."

وزارة-الأتصالات.مصر leads the non-Latin charge

May 6, 2010 @ 1:06 pm · Filed by Ben Zimmer under Language and technology, Language on the internets, Writing systems

The first Internet domain names using non-Latin characters are being rolled out, a plan put into motion after approval from the Internet Corporation for Assigned Names and Numbers (ICANN). Arabic-speaking nations are the first to reap the orthographic benefits, with new country codes available for Egypt (مصر), Saudi Arabia (السعودية), and the United Arab Emirates (امارات). The Egyptian Ministry of Communications and Information Technology, previously online at <http://www.mcit.gov.eg/>, is blazing the trail with its new URL:

<وزارة-الأتصالات.مصر>

Not everything is fully worked out with the new system, though. Browsers that aren't caught up to speed on the non-Latin domain names will see the addresses rendered as Latinized gobbledygook. The Egyptian Communication Ministry's Arabic-script URL, for instance, currently resolves to <http://xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/>. That's not very communicative.

[Update: See the very helpful comments below for an explanation of the Latinized encoding.]
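The Latinized form is Punycode, the ASCII encoding used for internationalized domain names: each non-ASCII label is encoded and given the prefix `xn--`. A rough sketch using the `punycode` codec from Python's standard library is below. The helper names `to_ascii_label` and `to_ascii_domain` are made up for illustration, and the sketch skips the normalization steps that a full IDNA implementation performs before encoding.

```python
def to_ascii_label(label):
    """Encode one domain label; pure-ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    # Assumption: no normalization is applied before Punycode encoding.
    return "xn--" + label.encode("punycode").decode("ascii")

def to_ascii_domain(domain):
    """Encode each dot-separated label of a domain name."""
    return ".".join(to_ascii_label(part) for part in domain.split("."))

print(to_ascii_label("مصر"))
print(to_ascii_domain("وزارة-الأتصالات.مصر"))
```

Because the ministry's label itself contains a hyphen, its encoded form has four hyphens in a row after `xn`: two from the prefix, one copied over from the label, and one acting as the delimiter before the encoded Arabic letters.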

Translate at your own risk

January 7, 2010 @ 7:18 pm · Filed by Eric Baković under Language on the internets, Lost in translation

Last month I posted a link to a Schott's Vocab Q&A with Claude Hagège on endangered languages. Some commenters immediately picked up on one of Hagège's statements about translation:

However, there exists an important activity which clearly shows that even though the ways languages grasp the world may vary widely from one language to another, they all build, in fact, the same contents, and equivalent conceptions of the world. This activity is translation. Any text in any language can be translated into a text in another language. These two texts express the same meaning. We can therefore conclude that despite the differences between the ways languages grasp the world, all languages are easily convertible into one another, because humans interpret the world along the same, or comparable, semantic lines.

Barbara Partee contributed this comment:

Emmon Bach has put it nicely: The best argument in favor of the universality of natural language expressive power is the possibility of translation. The best argument against universality is the impossibility of translation (i.e. that we often can't really translate exactly). [link added–EB]

Translation ain't easy, even for skilled humans — and (especially) for machines. Google Translate appears to be among the better tools out there, but as the comments section of what (I believe) was Language Log's first reference to Google's translation tool shows, you can have quite a bit of fun breaking it. Moreover, breaking it is easy and can happen completely inadvertently, a lesson that (from what I hear, anyway) is quite often learned too late by desperate students trying to take shortcuts while doing their homework for beginning language classes.

Spamalot

November 17, 2009 @ 12:16 pm · Filed by Arnold Zwicky under Language on the internets, Words words words

In my recent go rogue posting, I reported a comment on an earlier posting from Daniel Gustav Anderson on go rogue as a sexual euphemism, saying that at first I suspected the comment of being spam, but decided it was legit. Then Jake Townhead commented on my posting, questioning my use of the word spam and suggesting that Anderson's comment was merely "bespoke mischief". So now some words on spam.
