Let me count the words


I was delighted to see this article at the NYT profiling a friend and colleague of mine, Jamie Pennebaker. You might also like to check out this website where he and his students analyze language use in a little preznitential contest thing that appears to happen for about two years of every four in the country I call home. (In return, it calls me a resident alien.)

If you're a linguist, I'm guessing you'll either love Pennebaker's work or hate it. Why might you hate it? Because he's a social psychologist who looks at language in the most superficial way possible, eschewing all the tools of modern linguistic theory in favor of word counts. Not a tree in sight.

But why might some linguists love Pennebaker's work? I can think of two reasons. First, he keeps finding significant effects linking these superficial word counts to things people care about, like probability of depression, rates at which medical interventions are needed, how long relationships last, whether someone is lying, or which member of a pair of speakers has higher social status. Second, many of those significant effects involve relative frequencies of function words (e.g. a vs the, or I vs you). And surely all linguists, even those that would rather see the trees than the wood, love function words.

Funny thing about Jamie Pennebaker: he invented computational linguistics. Well, not really. He invented his own brand of computational linguistics about 15 years ago completely independently of the mainstream. His own brand involves using a program called LIWC to automatically assess frequencies of various words and word classes in a bunch of languages (11 at last count, I think).
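For readers who want a feel for what counting words and word classes amounts to, here is a minimal sketch in Python. It is not LIWC and does not use LIWC's dictionaries or output format: the three word classes, the names WORD_CLASSES, tokenize and class_frequencies, and the sample sentence are all made up for illustration. It simply reports what share of a text's tokens falls into each class.

```python
import re
from collections import Counter

# Hypothetical, tiny word classes for illustration only.
# These are NOT LIWC's categories or dictionaries.
WORD_CLASSES = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "second_person": {"you", "your", "yours", "yourself"},
    "articles": {"a", "an", "the"},
}


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def class_frequencies(text):
    """Return each word class's share of all tokens in the text."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        # Avoid dividing by zero on empty input.
        return {name: 0.0 for name in WORD_CLASSES}
    counts = Counter(tokens)
    return {
        name: sum(counts[word] for word in words) / total
        for name, words in WORD_CLASSES.items()
    }


sample = "I told you the truth, and you told me a story."
for name, share in class_frequencies(sample).items():
    print(f"{name}: {share:.3f}")
```

Comparing such shares across texts or speakers (say, I-words against you-words) is the kind of relative-frequency measure described above; a real study would need far larger word lists and proper statistics.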

Here's a little story Jamie told me about why he started doing computational linguistics. He was working on a bunch of studies on depression and on how people dealt with catastrophic upheavals in their lives. So he had graduate students working through thousands and thousands of diary entries written by severely unhappy individuals, so as to analyze their language. Well, as you can imagine, Jamie quickly realized that if he followed that line of research for much longer, he'd end up with no graduate students at all. So his only option was to create a program to read the diary entries automatically. And thus was computational linguistics (Pennebaker style) born.


Update: Links to earlier Language Log posts on Jamie Pennebaker's work

Language Log: Women and discourse markers

Language Log: Language in the social and behavioral sciences

Language Log: Language Log changes personality

Language Log: Male and Female College Students are Equally Talkative

Language Log: The first time?

Language Log: What men and women blog about

Language Log: They just don't care


  1. Jean-Sébastien Girard said,

    October 14, 2008 @ 2:07 pm

    Err, you accidentally left two duplicates in your list of previous posts.

  2. David Beaver said,

    October 14, 2008 @ 10:26 pm

    @ Jean-Sebastien Girard: Thanks! Now corrected.

  3. Angus Grieve-Smith said,

    October 22, 2008 @ 11:30 pm

    I would call that corpus linguistics, not computational linguistics. I'm not trying to be nitpicky: there's a big difference. Pennebaker isn't working on machine translation or speech synthesis, he's automatically analyzing texts to find patterns. There are some parts of computational linguistics that are close to what he does, but it really falls squarely under (most) definitions of corpus linguistics.
