On the front lines of Twitter linguistics

I have a piece in today's New York Times Sunday Review section, "Twitterology: A New Science?" In the limited space I had, I tried to give a taste of what research is currently out there using Twitter to build various types of linguistic corpora. Obviously, there's a lot more that could be said about these projects and other fascinating ones currently underway. Herewith a few notes.

  • Fellow Language Logger David Beaver and research assistants Joey Frazee and Christopher Brown at the University of Texas were extremely generous with their time and energy when I asked for some insta-analysis of tweets from Libya after the news broke of Qaddafi's death. Since then, they've been connecting up their new analysis with the work they did on tweets from Libya earlier in the year. But I'll let David talk more about this research, and how it fits into the larger project, "Modeling Discourse and Social Dynamics in Authoritarian Regimes." [Update: David follows up here.]
  • Twitter-based sentiment analysis first got some attention a couple of years ago when James Pennebaker, Roger Booth, Teal Pennebaker, and Chris Wilson created the website AnalyzeWords, which provides on-the-fly analysis of a person's Twitter feed by using the text analysis program Linguistic Inquiry and Word Count (LIWC). The work of Pennebaker and his colleagues with LIWC has been discussed on Language Log in the past (here, here, here, here, and here). LIWC is also explored in great detail in Pennebaker's book The Secret Life of Pronouns, which I reviewed for The New York Times Book Review. (The book opens with some discussion of Twitter and AnalyzeWords, but it goes on to consider a wide array of corpora analyzed with LIWC.)
  • My look at dialectal variation on Twitter was based on work done by the Carnegie Mellon researchers Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, and Eric P. Xing. You can check out their EMNLP 2010 paper, "A Latent Variable Model for Geographic Lexical Variation," and the slides from their LSA 2011 presentation, "Statistical Exploration of Geographical Lexical Variation in Social Media." Eisenstein is headed to a teaching job at Georgia Tech's School of Interactive Computing, but Twitter-based studies at CMU are sure to continue. A new addition to Carnegie Mellon's stable of Twitterologists is David Bamman, whom Language Log readers may know from the Lexicalist project he undertook to create map visualizations of American English variation on Twitter. (See here for a guest post by Bamman about Lexicalist.)
  • Eisenstein and Bamman are currently conducting research with Tyler Schnoebelen of Stanford University that looks at how gender plays a role in language variation on Twitter. But they're going well beyond simply analyzing which language forms are associated with women and which are associated with men. Using information on people's Twitter followers, they can also take into consideration the gender makeup of people's networks. Thus, a man with a predominantly female network may show different linguistic patterns compared to a man with a male or mixed network. Earlier today, at NWAV 40, Schnoebelen presented some of his research on one aspect of Twitter discourse, emoticons. The abstract of his paper includes this great line: "Emoticons with noses are historically older." It's true! Not only that, but emoticons with noses, like :-), show distinctly different patterns of distribution than the noseless kind, like :) . Noseless emoticons tend to be used by younger Twitter users and are associated with more informal discourse. Women use them more than men, too, but women use more of all types of emoticons. I'll be looking forward to the definitive study of emoticon nosedness. [Update: Slides from Schnoebelen's NWAV talk are here.]
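
For readers curious what a first pass at counting nosed versus noseless emoticons might look like, here is a minimal sketch. It is not Schnoebelen's method: the regular expression, the function name count_nosedness, and the sample tweets are all invented for illustration, and a real study would need a far richer emoticon inventory plus metadata about the users.

```python
import re
from collections import Counter

# Hypothetical pattern: eyes (: or ;), an optional nose (-), then a mouth.
# The lookbehind skips colons glued to a word, such as the one in "http://".
EMOTICON = re.compile(r"(?<!\w)([:;])(-?)([)(DPp])")

def count_nosedness(tweets):
    """Tally emoticons with and without a nose across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        for _eyes, nose, _mouth in EMOTICON.findall(text):
            counts["nosed" if nose else "noseless"] += 1
    return counts

# Made-up sample tweets, for illustration only.
sample = [
    "Great talk today :-)",
    "omg so good :) :)",
    "not sure about this ;-(",
    "no emoticons here",
]
counts = count_nosedness(sample)
print(counts["nosed"], counts["noseless"])
```

Comparing those two tallies across groups of users (by age, gender, or network makeup) would be the next step, and that is where the real research questions begin.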


  1. Brett Bobley said,

    October 30, 2011 @ 7:55 pm

    Great piece in the Times, Ben. Some really interesting work. And thanks for this additional information.

  2. grackle said,

    October 30, 2011 @ 8:58 pm

    Thanks for an interesting article. I guess I would have to count myself among those who cannot fathom the popularity of Twitter. I can understand linguistic study of it, but can in no way understand why people participate in it or what it means to them that they do.

  3. David Y. said,

    October 30, 2011 @ 11:27 pm

    Fascinating. I've been on the net since the mid-80s, and I've always used the noseless emoticons and can't figure out what to do with the nosed variety.

    Perhaps I should say, "I *believe* that I have…," given the frequency of error in such self-observation oft noted on this blog. But I'm pretty confident about this.

    I don't make frequent use of "omg" or "Bieber" in my posts, either.

  4. Chad Nilep said,

    October 31, 2011 @ 2:15 am

    "many of Mr. Chomsky’s fellow linguists are discovering that Twitter can help uncover truths about our social interactions"

    Surely it's Dr. Chomsky or Prof. Chomsky. I assume a NYT copy editor 'corrected' this from "Chomsky's".

    [(bgz) The New York Times style manual does not require Dr. or Prof. for PhD holders. (Note Randal E. Bryant is referred to as "Mr. Bryant.") Here's the stylebook entry for Dr. as per Philip Corbett:

    Dr. should be used in all references for physicians and dentists whose practice is their primary current occupation, or who work in a closely related field, like medical writing, research or pharmaceutical manufacturing: Dr. Alex E. Baranek; Dr. Baranek; the doctor. (Those who practice only incidentally, or not at all, should be called Mr., Ms., Miss or Mrs.)

    Anyone else with an earned doctorate, like a Ph.D. degree, may request the title, but only if it is germane to the holder’s primary current occupation (academic, for example, or laboratory research). For a Ph.D., the title should appear only in second and later references. The holder of a Ph.D. or equivalent degree may also choose not to use the title.

    Do not use the title for someone whose doctorate is honorary.

    Mr. Chomsky has evidently not requested the title Dr., as he's Mr. elsewhere as well.]

  5. Ken Brown said,

    October 31, 2011 @ 8:05 am

    @David Y, I started using email heavily in the mid-80s and the Internet in the very late 80s and I am definitely of the "nosed" generation.

    Though I have become productively bilingual in the last 8 years or so, as I have been using web forums that will recognise ":)" and so on and replace them with appropriate graphics, but not ":-)", so I vary what I write depending on where I am posting it.

    But for me as a reader :-) is a smiley face and the other just isn't.

  6. Rod Johnson said,

    October 31, 2011 @ 9:28 am

    I've also been on the net since the 80s and, although I started out in the :-) camp, have moved into the :) camp. The "nosed" version just seems like over-egging the pudding to me.

    My kids, however, when they're not rejecting smileys altogether, treat the discussion as irrelevant, being much more in the Japanese ^_^ camp.

  7. Mr Punch said,

    October 31, 2011 @ 10:56 am

    When I was a student in the '60s, I was told that [male, as almost all were] professors were to be addressed as "Mister" except in the medical school – where they were "Doctor" no matter what doctorate they held.

  8. David Y. said,

    October 31, 2011 @ 11:06 am

    There are regional variations in Dr./Prof., too. I interviewed at a wide variety of colleges in the northeast and Midwest mainly. In the northeast, at almost every institution I visited, students called their professors "Doctor." In the Midwest, it's almost invariably "Professor" in my experience.

  9. Tyler Schnoebelen said,

    October 31, 2011 @ 11:32 am

    Thanks for the mention, Ben. My NWAV presentation on Twitter noses, winks, and more can be found here: http://bit.ly/tyleremo (it's kind of big at 4 MB, though). Folks should feel free to cruise around other emotion-stuff here: http://bit.ly/tyleremotion.

  10. Rod Johnson said,

    November 1, 2011 @ 12:54 pm

    By the way, Ben, I realize this is popular journalism, but don't you think suggesting that this is "a new science" is a little much? It's a new way of gathering a certain type of data, sure, but that's all.

    [(bgz) It may not surprise you to know that I am not in charge of writing the headlines that appear over my writing in The Times.]

  11. Rod Johnson said,

    November 1, 2011 @ 1:01 pm

    Also by the way, re Chomsky: if there ever was a person whose rhetorical style Twitter would be uncongenial to, it's him. His arguments can require an almost superhuman attention span to understand. I remember as a grad student reading Lectures on Government and Binding and repeatedly thinking "oh, that makes sense," only to have Chomsky demolish that argument and replace it with a more complex one, again and again and again. Try fitting that into 140 characters.

  12. Troy Spier said,

    November 1, 2011 @ 3:40 pm

    It would also be worthwhile to check out Indigenous Tweets, which is a tool created by Kevin Scannell to identify and keep track of "tweets" in many lesser-known languages. The page can be found here: http://indigenoustweets.com/

  13. Jerry Friedman said,

    November 2, 2011 @ 11:53 am

    On the rare occasions when I use a smiley, it's nosed, because a noseless one would deny a very plain fact about me.

    @David Y.: My experience is of far fewer places, but I seem to recall hearing "Professor" all of the time in the Northeast and most but not all of the time in the Midwest. "Doctor" was usually for post-docs.

  14. Chad Nilep said,

    February 28, 2012 @ 4:06 am

    @bgz @Chad Nilep

    NYT appears to use "Mr. Chomsky" but "Dr. Liberman" (also Dr. Eckert, Dr. Crystal, and Dr. Fought). I'm just sayin'.

