Corpus-Wide Association Studies

I've spent the past couple of days at GURT 2012, and one of the interesting talks that I've heard was by Julian Brooke and Sali Tagliamonte, "Hunting the linguistic variable: using computational techniques for data exploration and analysis". Their abstract (all that's available of the work so far) explains that:

The selection of an appropriate linguistic variable is typically the first step of a variationist analysis whose ultimate goal is to identify and explain social patterns. In this work, we invert the usual approach, starting with the sociolinguistic metadata associated with a large scale socially stratified corpus, and then testing the utility of computational tools for finding good variables to study. In particular, we use the 'information gain' metric included in data mining software to automatically filter a huge set of potential variables, and then apply our own corpus reader software to facilitate further human inspection. Finally, we subject a small set of particularly interesting features to a more traditional variationist analysis.

This type of data-mining for interesting patterns is likely to become a trend in sociolinguistics, as it is in other areas of the social and behavioral sciences, and so it's worth giving some thought to potential problems as well as opportunities.
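To make the 'information gain' step in the abstract concrete, here is a minimal sketch of how such a filter might rank candidate variables against one piece of social metadata. The data, the feature names, and the function names are all hypothetical; the abstract does not say which software or settings Brooke and Tagliamonte used, so this illustrates the general metric and is not their procedure.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a feature's values."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: one row per speaker.
age_group = ["young", "young", "young", "old", "old", "old"]
candidates = {
    "uses_like": [True, True, True, False, False, False],  # tracks age exactly
    "uses_the": [True, False, True, True, False, True],    # unrelated to age
}

# Rank candidate variables by how much they tell us about age group.
ranked = sorted(
    candidates,
    key=lambda name: information_gain(candidates[name], age_group),
    reverse=True,
)
for name in ranked:
    print(name, round(information_gain(candidates[name], age_group), 3))
```

With a huge set of candidate variables, some will score well on a metric like this purely by chance, which is one of the potential problems worth keeping in mind.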



Queen of the World

Cindy, who works in my favorite barber shop next to the Penn campus, has the following symbols tattooed on her back:

I instantly recognized the first and last as two quite well-formed Chinese characters. After two or three seconds of puzzling, I realized that the third symbol is another Chinese character written upside down and backwards (how the tattoo artist achieved that is a bit of a mystery, especially since he or she got the first and fourth ones in their correct orientation). The second character was more refractory.



"Passive voice" in the comics

Panels two and three (of six) from David Malki's most recent Illustrated Jocularity, "The Wish of the Starhorse":



The QWERTY effect

Rebecca Rosen, "The QWERTY Effect: The Keyboards Are Changing Our Language!", The Atlantic:

It's long been thought that how a word sounds — its very phonemes — can be related in some ways to what that word means. But language is no longer solely oral. Much of our word production happens not in our throats and mouths but on our keyboards. Could that process shape a word's meaning as well?

That's the contention of an intriguing new paper by linguists Kyle Jasmin and Daniel Casasanto. They argue that because of the QWERTY keyboard's asymmetrical shape (more letters on the left than the right), words dominated by right-side letters "acquire more positive valences" — that is to say, they become more likable. Their argument is that because it's easier for your fingers to find the correct letters for typing right-side dominated words, the words subtly gain favor in your mind.

There's a lot of media uptake for this work: Rachel Zimmerman, "Typing and the meaning of words", Common Health; "QWERTY Keyboard Leads to Feelings about Words", Scientific American; Rob Waugh, "Why just typing 'LOL' makes you happy: People like words made of letters from the right-hand side of the QWERTY keyboard", Daily Mail; Alasdair Williams, "The 'QWERTY Effect' is changing what words mean to us", io9; "The right type of words", e! Science News; Dave Mosher, "The QWERTY Effect: How Typing May Shape the Meaning of Words", Wired News; Rebecca Rosen, "The QWERTY Effect: The Keyboards Are Changing Our Language", The Atlantic, etc.
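For readers wondering what "dominated by right-side letters" could mean in practice, here is a rough sketch that counts a word's right-side letters minus its left-side letters. The left/right split is an assumption (the conventional touch-typing division of the QWERTY letter rows), and the function name is made up for illustration; the quoted coverage does not spell out how Jasmin and Casasanto scored words, so this may differ from their measure.

```python
# Assumed split: the conventional touch-typing division of the letter keys.
LEFT_LETTERS = set("qwertasdfgzxcvb")   # 15 letters
RIGHT_LETTERS = set("yuiophjklnm")      # 11 letters

def right_side_advantage(word):
    """Count of right-side letters minus count of left-side letters.

    Characters that are not one of the 26 basic letters are ignored.
    """
    letters = word.lower()
    right = sum(ch in RIGHT_LETTERS for ch in letters)
    left = sum(ch in LEFT_LETTERS for ch in letters)
    return right - left

for word in ["lol", "was", "the"]:
    print(word, right_side_advantage(word))
```

A positive score means the word leans to the right-hand side of the keyboard, a negative score to the left. Note that the split itself builds in the asymmetry that the article mentions, since 15 of the 26 letters fall on the left.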



Burlesques, parodies, playful allusions

On my personal blog, here, an inventory of postings on these topics — at the moment, only postings on my blog.



Shedding and casting doubt and light

Philip Spaelti writes:

I am having one of those moments. Correcting a student's paper I came across:  "This behavior seems to shed doubt on treatments which always regard V2 as head."  "Shed light",  "cast doubt (on)", OK, but "shed doubt (on)" doesn't quite compute for me. Or have I just been in Japan too long?



No Arabic word for bluff?

Those familiar with our history of "No word for X" posts will appreciate Haider Ala Hamoudi's essay "The Dangers of Pop Linguistics: Arab Bluffs and Arab Compromise", posted at Islamic Law in Our Times, 3/6/2012. Some useful background is provided by Geoff Nunberg's Fresh Air commentary "Meetings of the minds", 5/29/2003, which discusses the original claim that Arabic has no word for compromise.  Prof. Hamoudi muses:

I wouldn't spend time on something so silly except in reading the Arabic papers today I saw a rather striking set of translations of Barack Obama's interview in the Atlantic monthly with I think Jeffrey Goldberg, the substance of which I had already read in English. But Obama says in it something to the effect of "as President, I don't bluff" and in Arabic media reports I read and heard, two verbs were used.  One was خدع which means to deceive, and one was مزح which means to joke around.  So "I'm not deceiving you" or "I'm not joking."

Yet of course as with "compromise" neither is perfect, and as I thought about it, I cannot think of an Arabic equivalent to "bluff" that works particularly well.  To bluster and threaten, that is, without much of an intent or an ability (only need one or the other) to carry forward on the threat.  So, we don't have compromise so we cannot compromise the theory goes, but then again we don't have "bluff" either, so do we not bluff?



When "taking out" means "putting in"

From Hyman R.:

I managed to confuse my (nearly twelve) son last night. We were talking politics, and I was explaining to him that there were Jewish Republicans who were going to be taking out ads in the Jewish Week newspaper to try to convince Jews to vote against Obama. He said that he didn't understand why they would do that, and I tried to explain, and we went round and round for a bit until I realized that he didn't know that "take out an ad" is the same thing as "put in an ad"! He thought that they were removing such ads, and so was justifiably confused.



"The victims are large and costly machines"

From Shay Cohen, via Lyle Ungar, some output from a PCFG ("probabilistic context-free grammar") trained on the Wall Street Journal part of the Penn Treebank:

A manager is a better value of well-polished desks .

I have been able to force to be more receptive to therapy , and to keep the committee informed , usually in advance , of covert actions : ; the victims are large and costly machines .

The purchase of all women is in September .

Their museum had been dumping their securities for comment .

It can remember one million truly inspiring teachers from Rainbow Technologies .
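Sentences like these come from sampling a grammar top-down: start at the root symbol, pick one expansion at random according to the rule probabilities, and repeat for each child until only words remain. Here is a minimal sketch of that process. The toy grammar, its probabilities, and the function name are invented for illustration; the grammar that produced the output above was estimated from the Treebank and is far larger.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (right-hand side, probability).
# Any symbol that is not a key in this table is treated as a word.
GRAMMAR = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("the", "N"), 0.7), (("the", "ADJ", "N"), 0.3)],
    "VP": [(("V", "NP"), 0.6), (("V",), 0.4)],
    "N": [(("manager",), 0.5), (("museum",), 0.5)],
    "ADJ": [(("costly",), 1.0)],
    "V": [(("remembers",), 0.5), (("sleeps",), 0.5)],
}

def sample(symbol, rng):
    """Expand a symbol into a list of words by sampling rules recursively."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit the word itself
    expansions = GRAMMAR[symbol]
    options = [rhs for rhs, _ in expansions]
    weights = [prob for _, prob in expansions]
    rhs = rng.choices(options, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(sample(child, rng))
    return words

rng = random.Random(0)  # fixed seed so that runs are repeatable
for _ in range(3):
    print(" ".join(sample("S", rng)))
```

Each rule is chosen without regard to anything outside its own nonterminal, which is why the output is locally grammatical but globally odd.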



Ask Language Log: "will have had gone"?

Lori Levin writes:

What is going on with "will have had gone"?   It gets 122,000,000 hits in Google.   I thought there could only be one auxiliary "have" per clause.    Did the English auxiliary verb system change while I wasn't looking?

Some of my students say "will have had gone" sounds completely normal to them, and some won't accept it at all.



Mr Justice Eady rules for free speech, for once

Mr Justice Eady does not often give me cause for joy in his rulings: he is notorious for upholding English libel judgments, where common sense and any reasonable notion of freedom of expression would (in my humble opinion) suggest that the plaintiff should be sent from the courtroom with an admonition not to be so silly ringing in his ears. But he has at least ruled against Payam Tamiz, a law student and unsuccessful Conservative Party candidate, who wanted Google to be held responsible for the content of comments on a blog that accused him of things (clearly defamatory things, it should be acknowledged).



Academic decisions

Murray Smith writes:

Friday afternoon in the car I heard a radio news report about the closing of an art gallery on Boston's tony Newbury Street.  The reporter had interviewed the gallery owner and learned that due to economic conditions gallery sales had been down forty percent the last two years.  Now the landlord was imposing a thirty percent rent increase, and the owner was throwing in the towel.  The reporter concluded, "The decision was academic; she had to do it."  The intonational profile showed that the second clause was a gloss on the first.  I was surprised, not having heard this use of "academic" before.  I have always understood "academic" in this sort of context to mean something like "without significant consequences".



Trent Reznor Prize contender

Via Rick Rubenstein, a nomination for the Trent Reznor Prize for Tricky Embedding: Josh Fruhlinger, The Comics Curmudgeon 3/3/2012:

Seriously, I assume that whoever hacked into servers of the market research company that’s asking newspaper readers about what they want to see in Apartment 3-G and replaced all the survey responses with “PREGO PORN” is one of my readers, and I just want you to know that you’re my hero.
