Linguistic analysis in social science

It's a strange fact about social scientists that hardly any of them, in recent years, have paid any analytic attention to language, which is the main medium of human social interaction.  At schools of "communication", you'll generally find that neither the curriculum nor the faculty's research publications feature much if any analysis of speech and language. In other disciplines — sociology, social psychology, economics, history — you'll find even less of it. (The main systematic exception, Linguistic Anthropology, deserves a separate discussion — but the conclusion of such a discussion, I believe, would note a steep decline in empirical linguistic analysis. And of course I'm leaving out sociolinguistics, which is healthy enough but largely alienated from the rest of the social sciences.)

There are notable exceptions of several kinds, such as Erving Goffman, Manny Schegloff, or Jamie Pennebaker. But such work emphasizes the paradox, since it shows that we can't blame the effect on a lack of intellectual opportunity.

It's not only in the social sciences where linguistic anemia is evident, of course. Over the past generation, the amount of language-related teaching and research in "language departments" (including departments of English) has declined to an unprecedented level. It's common to find highly-ranked English departments where neither undergraduates nor graduate students are trained in any sort of linguistic analysis at all, except perhaps by accident (see this earlier post for a more specific discussion).

But climate change is coming, in my opinion. And in this case, the driving force is not carbon emissions, but digital technology.

To state the obvious: Traditional mass media are now nearly all digital; new media are documenting (and creating) social interactions at extraordinary scale and depth; more and more historical records are available in digital form.  The digital shadow-universe is a more and more complete proxy for the real one. And in the areas that matter to the social sciences, much of the content of this digital universe exists in the form of digital text and speech.

A future social scientist who wants to use this proxy universe to learn about the real one had therefore better know how to analyze the form and meaning of large digital archives of text and speech. And future social scientists who choose not to do this will work under a significant competitive disadvantage. (Numerical data, video recordings, and various kinds of relationship graphs are of course important too, but without analysis of speech and text, their value is lower.)

The required tools include a good deal of computer science and statistics, but you also need to know what to program and what to model.  As a result, the basic concepts and skills of speech and text analysis are an important part of the future social science tool kit.

There's an increasing amount of research along these lines, mostly by computer scientists and computational linguists, along with a few rogue social scientists like Jamie Pennebaker. We've blogged about quite a few examples over the years. But I suspect that most social scientists don't see most of this stuff, because it appears in conference proceedings and journals that they don't read.

All the same, change is sure to come. I predict that over the next 20 years or so, this work will go mainstream. (I know that 20 years in internet time is a millennium or two, but Academia is culturally conservative to a degree that would turn Pashtun village elders green with envy.)

One symptom (and cause) of corpus-based social science going mainstream is that individual pieces of research will increasingly break out into the old media (or go viral in new media). This happened a few days ago to Peter Sheridan Dodds and Christopher M. Danforth, whose paper "Measuring the Happiness of Large-Scale Written Expression: Songs, Blogs, and Presidents" (Journal of Happiness Studies, published online 7/17/2009) was covered in the New York Times (Benedict Carey, "Does a Nation's Mood Lurk in Its Songs and Blogs?", 8/3/2009).

Here's the paper's abstract:

The importance of quantifying the nature and intensity of emotional states at the level of populations is evident: we would like to know how, when, and why individuals feel as they do if we wish, for example, to better construct public policy, build more successful organizations, and, from a scientific perspective, more fully understand economic and social phenomena. Here, by incorporating direct human assessment of words, we quantify happiness levels on a continuous scale for a diverse set of large-scale texts: song titles and lyrics, weblogs, and State of the Union addresses. Our method is transparent, improvable, capable of rapidly processing Web-scale texts, and moves beyond approaches based on coarse categorization. Among a number of observations, we find that the happiness of song lyrics trends downward from the 1960s to the mid 1990s while remaining stable within genres, and that the happiness of blogs has steadily increased from 2005 to 2009, exhibiting a striking rise and fall with blogger age and distance from the Earth’s equator.

Here's the figure showing the secular trend in song-lyric happiness:

Here's the figure showing the recent trend in emotional valence estimated from aspects of blog posts:

And finally, the effects of age, latitude, and day of the week (phase of the moon is not pictured):

Like most work of this type, the linguistic analysis involved is pretty simple — but it's still more than you'll now find in the collected works of the faculty of the communications schools that I've looked at.

And you could raise various questions about their methods and their conclusions, as always in science (though the work seems basically sound to me). But the nice thing about this kind of research is that all of their data is published — their paper gives the URLs that they got it from. (In fact, they doubtless undertook this study in large part because the basic data is easily available.) And they could easily publish their code as well (though the algorithms seem simple and easy to replicate).
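To give a sense of how simple such an analysis can be, here is a minimal sketch in Python of the general idea: score a text by the frequency-weighted mean valence of the words in it that have human ratings. This is an illustration under stated assumptions, not the authors' code. The `TOY_VALENCE` table, its numbers, and the function names are invented for the example; the real ANEW norms would have to be obtained from their developers.

```python
import re
from collections import Counter

# Hypothetical valence scores on a 1-9 scale (1 = unhappy, 9 = happy).
# These numbers are invented for illustration; they are NOT the ANEW norms.
TOY_VALENCE = {
    "love": 8.5,
    "happy": 8.0,
    "rain": 5.0,
    "lonely": 2.0,
    "hate": 2.0,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def mean_valence(text, lexicon=TOY_VALENCE):
    """Frequency-weighted mean valence of the lexicon words found in text.

    Words that are not in the lexicon are ignored. Returns None when the
    text contains no rated words at all.
    """
    counts = Counter(w for w in tokenize(text) if w in lexicon)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(lexicon[w] * n for w, n in counts.items()) / total


if __name__ == "__main__":
    # Two made-up snippets standing in for song lyrics or blog posts.
    for snippet in ("I love you and I am happy", "Lonely, lonely rain"):
        print(snippet, "->", mean_valence(snippet))
```

Swapping in a different word list, or a different function of the word scores, only means changing the table or the last line of `mean_valence`, which is part of why work of this kind is easy to check and extend.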

So if you have an idea about how to qualify, modify or extend their findings, go to it!

[I'll note in passing that linguistics was left out of the publicity in this case: thus the NYT article quotes Prof. Pennebaker to the effect that “The new approach that these researchers are taking is part of movement that is really exciting, a cross-pollination of computer science, engineering and psychology. [...] And it’s going to change the social sciences; that to me is very clear.”  From Jamie's mouth to God's ear; but let's recognize that this type of work will not reach its full potential unless the researchers involved also understand something about how speech and language work.]



  john riemann soong said,

    August 9, 2009 @ 5:18 pm

    I remember in high school math teachers would be puzzled at how calculus and linear algebra could be applied to language; and for that matter biology and language teachers too. And engineering students would pick fights with me when I mentioned that signals and systems coursework wasn't just useful for electronics, but also for linguistics (and vice versa — linguistics is useful for communications-based engineering, e.g. voice compression.)


    cross-disciplinary vision is so lacking.

  Nassira Nicola said,

    August 9, 2009 @ 6:26 pm

    I'm more than a little curious how you arrive at your conclusion that certain fields pay "[no] analytic attention to language," or that if they do, it's in "steep decline."

    The cocktail-party definition of linguistics I've always operated under is something like "the formal study of the structure and function of language," and it seems to me that it's not so much that fewer people are doing real linguistics as that the definition of real linguistics is becoming progressively narrower. Sure, if you're only counting structure, then all the work on function does tend to fly under the radar a bit.

    [(myl) I chose the Dodds and Danforth paper, not only because it got play in the NYT, but also because the role of linguistic analysis in it is pretty minimal -- they just looked at simple functions of the "emotional valence" of words as estimated by ANEW ("Affective Norms for English Words"). For the purposes of current discussion, I'm happy to count any research that looks empirically at any aspect of what people say or write, or how they say or write it. But the thing is, if you go to the course list and the faculty profiles on the web site of a random program in "communication" or sociology or political science or whatever, I believe that you will not generally find many courses, or much (if any) research that qualifies under that very broad definition.

    I happen (for other reasons) to have done such a search recently with three "communications" programs, two sociology departments, and one political science department. So I can't claim to have made an exhaustive analysis -- but I feel reasonably confident in claiming that there is generally very little speech or language analysis taught in such programs, and very little faculty research of that kind as well.

    I'd be very happy to be shown to be wrong about a program that I haven't looked at -- do you have some in mind?

    (I should add that for the past few years, I've been co-PI on a multi-site grant looking at U.S. Supreme Court oral arguments, with political scientists and psychologists from three other universities. So I'm not prejudiced against social scientists -- I just wish there were more of them who took advantage of the opportunities now available.)]

  Nassira Nicola said,

    August 9, 2009 @ 7:51 pm

    What spring immediately to mind, without actually looking around much, are:

    - UCSD's Department of Communications (home of Carol Padden and Tom Humphries)

    - the Departments of Anthropology, Psychology, and Comparative Human Development here at U Chicago (featuring, respectively, Susan Gal, Robin Shoaps, and Michael Silverstein; Susan Goldin-Meadow, Marie Coppola, and the rest of the SG-M lab; and Cécile Vigouroux)

    - the Department of Anthropology at the University of Michigan (numerous fine data-oriented researchers of language, including Judith Irvine)

    Clicking around haphazardly in the Google results you link to yields the following:

    - Northwestern's School of Communication is the academic home of bilingualism researcher Viorica Marian and analyst of political discourse David Zarefsky.

    Revising the search terms to {"department of communications" university} (instead of "school of communications," which is frankly often a synonym for J-school), gives you things like:

    - the University of Maryland, which seems to have a number of people doing data-driven discourse-analysis-type work

    - the University of Iowa, with people like David Depew, Steve Duck, Kristine Fitch, etc. [I stopped clicking there].

    And so forth. Now, these may not all be linguists-writ-large, but it's pretty hard to argue that none of 'em are engaged in "any research that looks empirically at any aspect of what people say or write, or how they say or write it."

    And I'm still baffled by the dig at ling anth. (Though I'm with you on the lack of substantive linguistic analysis from D&D – who are, after all, from a department of mathematics and statistics.)

    [(myl) Boy, I sure failed to connect on this one. My aim was to promote the Dodds and Danforth paper as the sort of thing that we will (and ought to) see more of -- its linguistic analysis is pretty simple, true enough, but simple is better than complex if it does the job.

    As for your list of examples, let's remove the anthropologists (which is a different discussion, as I said) and most of the psychologists (the field of psycholinguistics is enormous, but it counts as a kind of linguistic analysis, and -- well, sociology and political science and history departments are not any more likely to (say) require their students to learn psycholinguistics than to learn computational linguistics or historical linguistics or any other kind of speech and language analysis).

    For what's left, the argument is not that "none of 'em" do any speech-and-language analysis, but that "hardly any of them" do "much if any" of it. And in many cases, the use of linguistic data is limited to the presentation of illustrative quotations, which is not really the sort of analysis that I had in mind. Taking a few of your examples in order:

    Viorica Marian is a psycholinguist whose home is the Department of Communication Sciences and Disorders. This is a clinical (not social science) department which (at NW) happens to be located in the School of Communication. I should have explicitly exempted the clinical programs (audiology, speech pathology, etc.) that sometimes have "communication" in their name -- they're not relevant to this discussion.

    David Zarefsky, with his work on the rhetorical analysis of "American Public Discourse", certainly does count as someone interested in the use of language. However, in reading a couple of his papers, the only language-based analysis that I see is the presentation of quotes from particular documents in order to illustrate general statements about public-policy arguments. (Though perhaps I haven't read extensively enough.) Let me try to prevent misunderstanding by stressing that I'm not trying to argue against the value of such work, any more than I'm trying to diss social scientists who study voting patterns or crime statistics.

    In the Maryland program, I'm not sure who to look at, but I don't see much of the sort of thing that I'm looking for.

    At Iowa, the POROI ("Project on the Rhetoric of Inquiry") program is an interesting case in point. On one hand, it's focused on "rhetoric", which traditionally deals with how language is deployed to persuade; but paging through the articles in the latest issue of the associated journal, I don't find much if any empirical analysis of speech or language. There are some quotes from particular documents, but again, that's not really what I had in mind.

    A better counter-example might be Donal Carbaugh, whose work was brought up in a comment on an earlier post. Let me observe again, I'm not claiming that no social scientists analyze linguistic behavior, just that not many of them do. ]

  Eleni said,

    August 9, 2009 @ 7:57 pm

    The words in ANEW are categorized on several dimensions of emotionality. The ANEW was developed by Margaret Bradley and Peter Lang, who are psychologists at the University of Florida. They also developed other "standardized" (i.e., subjective ratings collected across many studies) materials for studies of emotion and attention, which are available to other investigators. As emotion theorists and experimentalists, they developed these materials to provide a way of comparing results obtained within and across laboratories, and to further our understanding of emotion processes such as subjective experience, physiological responses, and behavior. I am intrigued by the idea of using the ANEW for a population-level analysis, since it was developed for smaller studies (i.e., the typical psychology sample is fewer than 100 people). Given the aims of this paper (to understand "happiness") it seems very relevant (to me) to analyze the emotion content of words/text rather than any other aspect.

    Full disclosure: I obtained a masters and doctorate in clinical psychology at University of Florida. My masters research was conducted with Drs. Bradley and Lang (I used the IAPS, which is the comparable picture set with normative ratings). I have no formal training in linguistics– just one of my side interests. Given my bias towards emotion research, I am curious how a linguistic approach would have been more appropriate in this study?

    [(myl) For purposes of the current discussion, I'm assuming that the Dodds and Danforth paper *is* an example of a "linguistic approach", in the sense that it looks at properties of the language that people use (in writing song lyrics, blog posts, and presidential addresses). And I tried to make two points: first, that few social scientists these days do work that is "linguistic" in this sense, or train their students to do such work; and second, that the increasing availability of linguistic data and analytic techniques (such as those used in this paper) means that the social sciences are likely to change in the direction of doing more work of this general kind.]

  elinar said,

    August 10, 2009 @ 3:58 am

    Some people would say that it is the other way round: linguistics doesn’t pay enough attention to the methodology and insights of other social sciences, because it focuses on narrow structuralist and mentalist aspects of language, often ignoring the normative and reflexive character of linguistic communication, and the wider social and cultural world in which communication takes place.

    Expecting social scientists to carry out mainstream linguistic analysis means expecting them to adopt the kind of philosophy of language and methodology that many of them don’t believe in.

    [(myl) Gee, either I didn't write as clearly as I thought I did, or people are reading their prejudices rather than my words.

    (1) This post makes an observation about the role in the social sciences of the analysis of speech and language. I claim that this role is currently small, and that in the future it will be larger, because of the opportunity afforded by large-scale archives of digital communication. It's also true, in my opinion, that the social aspects of language use now play a very small role in linguistic (and psycholinguistic) analysis, and will play a larger role in the future, partly for the same reason; but that's a different discussion. Bringing it up in this context is an instance of the rhetorical trope of "yo mama", which is no more logically relevant or conversationally helpful here than it ever is.

    (2) Let me repeat what I said several times in the body of the post and in the comments: I'm *not* trying to promote the application in the social sciences of "mainstream linguistic analysis", or any other specific flavor of linguistic analysis. The point has to do with any and every kind of analysis whatsoever, as long as it deals with the specific facts of what people say or write, or how they say or write it. ]

    I think the last thing we need is more statistical data throwing light on the elusive nature of happiness, etc.

    [(myl) Your opinion is duly noted; but with respect, why should the rest of us care about it? Can you offer an argument of some sort? Do you believe, for example, that quantitative social-science analysis of public happiness is attempting to answer a badly-posed question? Or that the answers to questions about happiness in the General Social Survey are (and always will be) a better source of information about the public mood than the distribution of words in texts could ever be? The rest of us might or might not agree with you, but at least these would be gestures in the direction of an argument. As it is, all that you've done is to make a face.]

  john riemann soong said,

    August 10, 2009 @ 4:57 am

    Linguistics is a science, with a rigour approaching that of the natural sciences. (Indeed, language processing and articulatory phonetics are arguably part of the natural sciences.)

    We wouldn't want to lower ourselves to the unfalsifiable nature of literary criticism now would we?

    [(myl) Hey, take it outside, OK? There's plenty of non-linguistic research in the social sciences that's as "scientific" as anyone could want; and plenty of linguistic research that's hard to falsify.]

    "often ignoring the normative and reflexive character of linguistic communication, and the wider social and cultural world in which communication takes place."

    What do you mean by the "normative" or "reflexive" character of linguistic communication?

    Why in a science must you "believe" or "disbelieve" in the ideology of mentalism, etc. whatever? This is science, not religion or politics. The school that provides the best answers is the school in vogue; Skinner provided very keen insights for his time, but then his school stopped providing any further useful answers (because of its reluctance to examine underlying mechanisms) and thus was superseded by the cognitive school.

    I don't know what you mean by the "normative" or "reflexive" aspects of communication, but there are definitely areas pertinent to linguistics that researchers haven't been able to do too much work on. I believe we still don't have any clear or promising explanation (yet) for the mechanism of regular sound change — we just observe it. Nor do we have any commonly-accepted theories for the mechanism underlying creolisation and the spontaneous generation of languages and corrective rules by children — which you have to admit is pretty amazing.

    These processes of course start requiring the use of sociology to examine, because they concern interactions between individuals. But they also concern /emergent/ interactions, which suddenly makes evolutionary biology and game theory potentially useful. Structuralism gives only incomplete answers — it is capable of analysing the patterns of a language, and even the structural relationships of a language to other languages, but it doesn't give very enlightening answers on say, the evolutionary forces that gave rise to those rules.

    Such forces may sometimes come down to a small group of children (unconsciously) deciding amongst themselves that to use rule X for linguistic need A is cooler than rule Y, creating a bottleneck that suppresses Y and allows X to spread to the rest of their peers. I don't know what exactly you mean by the normative aspects of communication, but if that includes a way for children to signal their aesthetic preference for a rule or a word, then that's something linguistics seems to have ignored but probably must eventually pay attention to.

    And of course you'd need quantitative methods to model such interactions, which is where the linear algebra, statistics and game theory would come in. You could try to do it entirely verbally …. but note that the field of evolutionary biology exploded after Maynard Smith and Price (Nature, 1973) — because they gave the scientific community a method by which to quantitatively evaluate the theoretical merits of competing explanations.

    [(myl) For the purposes of this discussion, it doesn't matter to me what aspects of speech and language scholars look at, or what modeling techniques they use. The point is a much more basic one, which I repeat again: These days, a very small proportion of social science research is based on an analysis of what people say or write, or how they say or write it; in the future, this proportion will be greater, because of the opportunities afforded by the fact that a larger and larger sample of both public and private discourse is available in the form of digital archives.

    As a result, coursework in the various social science disciplines, which now teaches little or nothing about the description and analysis of speech and language, will teach more of such concepts and skills. And I hasten to add that this prediction is completely neutral about what kinds of description and analysis will -- or should -- be taught. For example, the toolkit is quite likely to include things in the general class of ANEW ("Affective Norms for English Words"), or LSA ("Latent Semantic Analysis"), or future replacements for them. ]

  Nassira Nicola said,

    August 10, 2009 @ 10:18 am

    Why in a science must you "believe" or "disbelieve" in the ideology of mentalism, etc. whatever? This is science, not religion or politics. The school that provides the best answers is the school in vogue;

    What counts as the "best answer" and why is itself a matter of ideology. *grin* Even "pure science" is ideological turtles all the way down.

    [(myl) Jeez, I try to make a simple, factual prediction about a trend in social science research, and everybody starts arguing about religion and ideology.]

  Venu said,

    August 10, 2009 @ 11:04 am

    This year's ACM SIGKDD conference also featured a paper that seems to fit this genre:
    J. Leskovec, L. Backstrom, J. Kleinberg. "Meme-tracking and the Dynamics of the News Cycle."

    This research was also featured in the NYTimes recently:

  peter said,

    August 10, 2009 @ 11:14 am

    Mark said:

    "These days, a very small proportion of social science research is based on an analysis what people say or write . . . "

    Mark, your original post exempts Linguistic Anthropology, but I would have thought that almost all social anthropology research is based on an analysis of what people say – both to the anthropologist, and to each other. It is still a standard requirement for PhDs in social anthropology to undertake fieldwork in a host community, and to write a reflective report of that fieldwork, which must perforce include an assessment of communicative interactions experienced and not.

    Or do you perhaps not consider social anthropology to be a social science? I know there are people who object to the research methodology of immersive-fieldwork-plus-reflective-introspection as somehow being "unrigorous", but this technique is used in other disciplines besides social anthropology (eg, in marketing, design engineering, and AI).

    [(myl) Lots of social-science research uses ethnographic techniques, which certainly start with talking. And then there are quantitative studies based on subjective coding of subjects' responses in interview or focus-group transcripts; and more "objective" multiple-choice questionnaires where the wording of questions and answer-alternatives is known to make a big difference.

    In my opinion, it's relatively rare for research of this type to be based on "linguistic data" in the sense that the Dodds and Danforth paper is. But however we classify such research, and however common various sorts of linguistic analysis now are in various corners of the social sciences, all of these sorts of work are likely to be transformed by the forces under discussion. If you can code digital interview or focus-group transcripts automatically in a way that correlates as well with human coding as different human coders do with one another, some kinds of research questions become easier to ask and answer. If you can dispense with running and transcribing the interviews and focus groups altogether, and use data from FaceBook walls, LiveJournal entries, biomedical research papers, or Amazon reviews, the pace of your work increases again. Similarly if you can do your ethnography on a database of a few billion tweets...

    There are manifold methodological problems with research of this type -- but there are methodological problems with surveys, focus groups, and ethnographies. In general, social scientists have traditionally seen such problems as an opportunity (for methodological research) rather than a barrier.

    My general point is that the growth of archives of networked digital communication transforms the economics of social science research, resulting in a new ecological landscape that will modify the behavior of existing species of researchers, and perhaps create some new species as well.]

  Nassira Nicola said,

    August 10, 2009 @ 11:16 am

    @MYL – I see your point, though I'll note that Susan G-M might faint dead away (or give you the Eyebrow of Did-You-Really-Say-That) if you called her a psycholinguist to her face. *smile*

    So, would it be fair to re-summarize your argument as follows?

    1) Not as much linguistic analysis is happening in departments of social science as could be happening;
    2) "Linguistic analysis" is that which has as its *primary object* linguistic data: either types or tokens of linguistic forms; it does not use these forms as illustration or proof of other phenomena (even those which may be language-related, such as language ideology or identity);

    [(myl) Not exactly. I'm ready to count as "linguistic analysis" any work that looks at sentence length, or spoken pitch contours, or pause durations, or any other concrete aspect of speech-or-text-based communication (including of course standard stuff like lexical choice or morphology or syntax or vowel quality). And I take it that any social scientists who use such analyses will want to draw conclusions about some social (and typically non-linguistic) issue -- things like language ideology would certainly count among them, though I think there are much bigger opportunities for work in areas that have no obvious connection to language.]

    3) "Departments of social science" include mathematics and statistics, political science, and sociology; they exclude the departments which do typically deal with language, such as psychology and anthropology, as well as schools and departments of communication(s), unless the latter are basically media-studies departments.

    [(myl) I think that I'm using "social science" in more or less the standard way. I propose to discuss the case of linguistic anthropology separately, though I think that similar forces are at work there. Within psychology, "social psychology" counts as "social science"; most "psycholinguistics" doesn't, in my view and (I think) in the view of most psycholinguists, since it focuses entirely on what happens within individual brains and minds. I'm also excluding many sorts of clinical disciplines, such as clinical psychology, audiology, speech pathology, and so on -- again, I think this is standard usage. Mathematics and statistics and computer science and so on are *not* generally viewed as "social sciences", although people in those disciplines are of course free to work on social-science problems, and sometimes do.

    The "social sciences" relevant to this discussion include not only sociology and political science, but also legal studies, parts of economics and history, some of what goes on in business schools (e.g. Marketing), and most of the non-clinical stuff that goes on under the heading of "Communication".]

    In other words, although language is a social behavior, the use of language as data hasn't spread beyond its traditional fields into more of the places which also purport to be concerned with social behavior. Or, at least, hasn't spread as far and as deeply as it could (and should – because, hey, we're linguists, of course we believe everyone should recognize the importance of language).

    [(myl) I'd go beyond the statement that "language is a social behavior" -- for Homo sapiens, it's the primary medium of social behavior.

    But my point about speech-and-language-as-data is not so much that it's a Good Thing for social scientists to use it -- though that's true, of course -- but that the economics of using it in social science research is undergoing a massive change. Rather than spending hundreds of thousands of hours making transcripts and coding texts -- or millions of dollars hiring students to do it -- you can just load up some data that has been created for other reasons (often just as a by-product of people's normal activities), and run a program to count things in it. (Or to find illustrative anecdotes in it, if that's your preference.)

    Work of this kind can be good, bad, or indifferent, just like any other work; but the existence of this new digital shadow universe has transformed the ecological landscape of the social sciences, and I predict that the inhabitants will (slowly but inevitably) adapt.]

    If that's your argument, then you'll get no argument from me. :c)

  J. W. Brewer said,

    August 10, 2009 @ 4:30 pm

    I would be interested, at an appropriate time, in a separate post fleshing out Prof. Liberman's passing observation on the decline of linguistic analysis in linguistic anthropology, and perhaps (maybe it's an obvious story to insiders but not to outsiders?) how that subfield got separated, if in fact it did, from what non-specialists generally think of as sociolinguistics. Maybe the fact that the undergraduate class I took in sociolinguistics back in 85 or 86 (covering I think a pretty standard range of stuff, hypercorrection, Labov comparing the pronunciation of "fourth floor" in different NYC department stores, isoglosses, Trudgill on the difference between pidgins & creoles etc etc) was offered in the Anthro dep't rather than the Ling dep't was just a quirk of local university politics or some contingent historical event about how a particular professor got hired? Or has there been a nationwide separation over the intervening decades?

    More provocatively (though this again should probably be its own post and thread), is philosophy of language as standardly done these days in Anglophone university philosophy departments a social science? Whether or not it is, could it benefit from more rigorous linguistic analysis?

  Bob Kennedy said,

    August 10, 2009 @ 5:00 pm

    I have some random thoughts that bear on this … from what I know of the field, Comm scholars focus very much on meaning and interpretation; akin to speech perception, but looking for ways to detect whether viewers/interlocutors etc. get the idea (and behave accordingly) as opposed to whether they perceive manipulations in structure at syntactic, prosodic, or phonetic levels. I say this not so much on the basis of having read much Comm literature as of having overheard many theoretical discussions. In general they do not focus on linguistic structural analysis, because if they are implementing some sort of experimental intervention, the variables (whether independent or dependent) tend not to be linguistic units. And their data tends not to be corpus-based – it's more likely to be response measures for some stimulus manipulation.

    That said, some Comm research does require some level of linguistic analysis, but I can't say whether there is a trend for such analysis to be well-grounded or not. A hypothetical example, in Language Log land, could be a study measuring how much voters trust a politician based on his or her frequency of first-person pronoun usage. I do believe any Comm journal would require such a study to establish an empirical basis for coding and quantifying first person in addition to trustworthiness. But, if the linguistic variable were noun vs verb usage, or passive vs active structures, I would not be surprised if the empirical measure were less air-tight (because of what we know about how non-linguists construe these dimensions). I think my comment submission here would be more helpful if I could cite real Comm articles, but I just wanted to offer an example of what a Comm study would look like if it did incorporate linguistic analysis.
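    To make the "coding and quantifying first person" step of that hypothetical study concrete, here is a minimal Python sketch. The coding scheme (English first-person singular forms only, with contractions such as "I'm" counted under "I"), the per-1000-token normalization, and the function name `first_person_rate` are assumptions chosen for illustration, not a description of any published Comm study.

```python
import re

# Assumed coding scheme: English first-person singular forms only.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}

def tokenize(text):
    """Split text into lowercase word tokens, keeping contractions like "i'm" whole.

    Only the straight apostrophe (') is handled; curly apostrophes are not.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def first_person_rate(text, per=1000):
    """Return first-person singular pronouns per `per` tokens of text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    # Strip any contraction suffix so that "i'm" and "i've" count as "i".
    hits = sum(1 for t in tokens if t.split("'")[0] in FIRST_PERSON_SINGULAR)
    return per * hits / len(tokens)

sample = "I'm sure that my plan will help you and me"
print(first_person_rate(sample))
```

    Writing the pronoun list down explicitly is the point: whether "we" and "our" belong in the set, for instance, is exactly the kind of coding decision such a study would have to justify.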

    As Mark mentions, Pennebaker stands among Comm scholars for his use of corpus research and linguistic analysis. He gave a talk last fall at UCSB, to the Comm department … I was the lone linguist in the audience as it had not been well advertised generally on campus. Among the claims in his talk, if I recall correctly, were that pro-drop languages are more likely to be used by collectivist societies than by non-collectivist societies (I forget the details of the relationship), and that men tend to use more definite and indefinite articles than women. The latter point was quantitatively established, but the reason for it is debatable. Pennebaker suggested it follows from a higher relative usage of concrete vs abstract nouns among men compared to women.

    I'm not here to evaluate those claims, though I suspect most linguists would not immediately agree. Indeed, though the audience was quite receptive to these ideas, Pennebaker mentioned that linguists tend to object to them. I chatted with him later and pointed out that pro-drop systems still encode the person hierarchy, so it's hard to paint any given language as structurally more collectivist. The article point could be supported empirically, but requires searching the same corpora to look for higher frequencies of concrete nouns in male speech vs. female (which then requires tagging all nouns as one or the other, not a simple task). I should add he was quite open to the feedback.

    Anyway, the sense I got from the experience was that social scientists are not so skeptical of claims about relationships between linguistic dimensions and society-level effects, possibly because the linguistic dimensions are taken for granted without enough scrutiny. More positively, regardless of whether Pennebaker's specific ideas hold up, the idea of computational and corpus-based research methods being applied in other disciplines is hopefully something that will grow.

  elinar said,

    August 11, 2009 @ 3:48 am

    Sorry – if I misunderstood you, it was because, like Nicola, I was puzzled by your claim that hardly any social scientists pay analytic attention to language and speech. I know that Discourse Analysis is an integral part of social science (sociology, politics, education) programmes in most UK universities. (And I’m sure that the same is true of other countries as well.) So I assumed that you didn’t consider this kind of approach to be “proper” linguistic analysis, and, therefore, were talking about something much more mainstream(ish).

    [(myl) I don't believe that this is true here in the U.S. For example, the Penn sociology department mentions nothing of the sort in either its undergraduate or its graduate program. The same is true of political science. ]

    My guess is that most social scientists will carry on doing DA and other related research, taking full advantage of the internet and other new technologies. But whether the new “digital shadow universe” will transform social science research in the way you suggest is another matter.

    [(myl) The world in which "most social scientists" are doing discourse analysis is certainly not the one where I seem to be living.]

    My “no more statistical data please” remark was tongue-in-cheek – I should have put a smiley at the end. But I was also making a serious point about the dangers of scienticism. The question is: what are we going to do with all this data; how is it going to influence social policy, and who has the right to decide this? And it is not me (the nutter) against the rest of you (all the rational people): lots of people feel uneasy about scienticism, the medicalisation of life, etc.

    [(myl) I also predict that people will increasingly be doing ethnographies on digital communities, or even on the archives of digital communities, as well as on digital records of "real life" interactions. And if you think that "scientism" includes using computational techniques to look for interesting examples in such materials, or even to look for patterns, then I feel you're being a bit narrow-minded.]

  tyrone slothrop said,

    August 12, 2009 @ 9:37 am

    I am curious to see the evidence for the claim that linguistic anthropology has had a "steep decline" in "empirical linguistic analysis." As a linguistic anthropologist, who decided on a PhD in linguistic anthropology and not in linguistics, because the idle speculations of much of formal linguistics seemed utterly detached from human beings using languages, I find your claim, at best, odd. Of course, I was trained in linguistic anthropology in the mid to late 90s (taking courses in both linguistics and anthropology) and perhaps you are claiming that it is my work and the work of my peers that lacks "empirical linguistic analysis"? However, as I flip through the current issues of the Journal of Linguistic Anthropology or Anthropological Linguistics, I see repeated examples of "empirical linguistic analysis" (while I might quibble with the current theoretical direction of some linguistic anthropology, I do not doubt its grounding in empirical research). Likewise, as I sit at the AAA meetings at the various linguistic anthropology sessions, I hear talks based on empirical linguistic analysis. I also see linguistic anthropologists publishing articles in journals like International Journal of American Linguistics, which seem to be based on "empirical linguistic analysis." But then, perhaps, you have evidence that I am not aware of?

    A more interesting question is how many anthropology or other social science courses linguistics graduate students take. As an anthropology graduate student, I took linguistics classes. As a faculty member today, I encourage my students to take linguistics classes. Very few linguistics students here take courses in social theory (those who do are all sociolinguistics students). When I was a graduate student, there were no linguistics students in the social theory courses I took in anthropology, and the only students I knew who had taken social theory were sociolinguistics students. Much of linguistics still seems utterly divorced from social theory.

  Hal Schiffman said,

    August 12, 2009 @ 11:35 am

    I'd like to respond in support of what Mark has claimed. I am a linguist who joined the faculty at Penn in 1995; I study the languages of South Asia, and have a strong research interest in the field of "language policy", which is an interdisciplinary topic that is populated by linguists, political scientists, sociologists, demographers, geographers, and educationists.

    After arriving at Penn, I discovered a research center that sponsored weekly discussions on the topic of ethnic conflict, and began attending. After a while, I was invited to make a presentation on language policy, which seemed to me well-received. I was subsequently invited to present during one of the summer institutes sponsored by the center, but I got off on the wrong foot by declaring early on that language policy did not involve quantitative data, because there was little that could be quantified. Instead, I claimed, other kinds of data, some of it hard to analyze, needed to be considered, because language loyalty was often emotional and involved very little rational choice.

    When I made this statement, a number of students present put down their pens and pencils and stared at me in disbelief. They spent the rest of the morning with their arms crossed, staring blankly at me, or rolling their eyes. Somehow I finished my presentation, but I knew that I had bombed.

    After that, I was never asked to present at the summer institutes of this center, and it has since shut down because of funding problems, though a successor has been started up at a local suburban college.

    I continued attending the weekly meetings of this center until its demise, but one could count the number of presentations that even mentioned language on the fingers of one hand. This despite the fact that earlier approaches to the study of ethnicity (e.g. Geertz 1972) counted language as one of the basic "primordial" features of ethnicity. As an example of just how outrageous this became, one researcher who had spent a year consulting with the government of Kurdistan in Iraq made a presentation in which he failed to even mention that the Kurds spoke a different language, and that their grievances over language are well-known in Turkey, which suppresses it. When I tried to make a comment about the language issue, he ignored me and called on other people.

    Just before this center died, I made a presentation on the topic of why language is now systematically ignored in the study of ethnic conflict. One of the reasons I gave was that the quantitative approach finds linguistic data inconvenient, if not just useless. In other words, if you can't quantify it, it's not data, but junk. It also seemed to me that in order to really understand language policy, one must understand the language in question, and study it deeply, or one will just be baffled by what goes on. This, of course, is inconvenient, and would take years of a researcher's time.

    The strategy, then, is to ignore language completely, and grasp at any other social issue that might be studied when there is a situation of ethnic conflict that involves language. Thus, for Chechnya, concentrate on Islam; for Sri Lanka, focus on religion, and ignore the language issue. For South Africa, cast the issue of Apartheid as totally "racial" and ignore the fact that the Soweto Uprising was over the issue of requiring South African Blacks and Coloureds to receive their education in Afrikaans.

    At the end of my talk I did not get much feedback; nobody attacked me or vociferously denied my allegations. But did much change? When this center moved to the suburbs, a new governing board was set up, and it now publishes a journal. A look at the editorial board reveals that absolutely no-one on it has any concern for language issues. In other words, plus ça change, plus c'est la même chose.

  Mark Liberman said,

    August 12, 2009 @ 12:38 pm

    Tyrone Slothrop: I am curious to see the evidence for the claim that linguistic anthropology has had a "steep decline" in "empirical linguistic analysis."

    As a very small down payment on this discussion, consider the trajectory of one of the best-known linguistic anthropologists of the past decades, Michael Silverstein. His writings in the first 10 years of his career included many papers that drew general conclusions from a systematic analysis of detailed and specific linguistic description: "Hierarchy of features and ergativity", "Case marking and the nature of language", "Grammatical categories in Australian languages", "The culture of language in Chinookan narrative texts", "Penutian: an assessment". In the past decade, Michael has continued to be productive, and to produce interesting papers, but these are markedly less, well, grounded. Thus the abstract of "Axes of Evals", J. Ling. Anthro. 2008:

    Any discursive event of communication can invoke (index) one or more other events in the nontrivial sense that focal aspects of the ongoing entextualization presuppose that the indexing and indexed lie within some chronotope of "-eval"ness. Varied processes in distinct institutional sites in the macrosociological communicative economy shed light on the contingent varieties of such interdiscursivity. Token-sourced interdiscursivity implies a reconstruction of a specific, historically contingent communicative event as an entextualization/contextualization structure, complete in all its essentials as drawn upon. Type-sourced interdiscursivity implies normativities of form and function, such as rhetorical norms, genres, et cetera. Token-targeted and type-targeted interdiscursivities concern the characteristics of the indexing discursive event(s) as contingent happenings or normativities.

    No specific linguistic example comes up until the 8th page — and then it's a couple of passages from Shakespeare's Taming of the Shrew. (I'm not dissing this paper, just pointing out that it's much more like current literary theory than like traditional linguistic anthropology.)

    If I look over recent issues of the Journal of Linguistic Anthropology, this sort of thing is fairly typical of current work. In terms of what's declined, "precipitously" or otherwise, I skimmed a half a dozen issues without finding (for example) any articles containing an IPA transcription, or a morphologically-analyzed word, or a syntactically-analyzed sentence. I'm not trying to suggest that linguistic anthropology should be all about phonetics or morphology or syntax — but in the olden days (of Boas and Sapir or even the younger Michael Silverstein) you saw quite a bit of such stuff (what I mainly meant by "empirical linguistic analysis") as part of the package. Now, not so much — it's now much more about "token-source interdiscursivity" and the like.

  tyrone slothrop said,

    August 12, 2009 @ 2:52 pm

    Dear Mark,

    In looking at the last half dozen issues of JLA (16(2)-19(1)), I see that Andrew Cowell has a very nice article on the construction of Arapaho imperatives in JLA 17(1) (2007) (with morphological analysis). In JLA 17(2), Patrick Moore and Daniel Tlen discuss Athabaskan directionals, including a long transcript with much detail on morphology. Ellen Basso's article on Kalapalo affinal civility register (JLA 17(2)) has a number of morphologically analyzed Kalapalo forms. Many linguistic anthropologists trained in the United States use the Americanist system for documenting sounds, not the IPA. Most linguistic anthropologists that work with Indigenous groups tend to use a practical orthography (see JLA articles by Cowell 17(1), Meek 17(1), Kroskrity 19(1), Basso 17(2), Moore and Tlen 17(2), Daveluy and Ferguson 19(1)). In looking at the last half dozen issues of Anthropological Linguistics (48(4)-50(2); 49(3-4) was a double issue), I see strong articles by Alan Rumsey, Janis Nuckolls, Paul Kockelman, Sean O'Neill, Patrick Moore, Brian Stross, etc. all with empirical linguistic analysis, from morphology to prosody (Rumsey's piece on Ku Waru Tom Yaya being an exemplary piece of work).

    Again, I might quibble with the theoretical perspectives that have taken hold of much linguistic anthropology, but as far as I can tell, it is still deeply rooted in understanding language as something that people do, and that means empirical research, transcripts and analysis of people actually talking, writing, or signing.

    [(myl) Thanks -- I obviously didn't look carefully enough. But I continue to believe that a longitudinal comparison over the decades would show a decrease in the proportion of such work, and (compared to the olden days) a pretty large one.

    In any case, I'm predicting that the increased availability of digital archives of linguistic interaction will result in future increases in the proportion of linguistic-anthropology studies that are empirically grounded in concrete linguistic patterns -- though perhaps in somewhat new ways. ]

  J. W. Brewer said,

    August 12, 2009 @ 3:51 pm

    Interested readers who, like me, are outside the academy can get tables of contents and abstracts for recent issues of the JLA via anthrosource. My own take from the most recent issue (19:1) was that about half the articles were on subjects that might be of interest to the sort of people who read LL, but the particular jargon of the descriptions mostly suggested that reading these particular scholars would not necessarily be a productive way to learn about that interesting subject matter. Your mileage may vary. (I don't always object to professional jargon and Lord knows lots of linguistics scholarship is jargony: but this stuff was written in a particular style of jargon that I personally view as a signal of bogusity — or unproductive "theoretical perspectives".) It was a little hard to figure out from the abstracts how much "analysis" of the sort myl was talking about had gone on, but it wasn't completely clear that it hadn't.

    I don't know if the Ellen Basso Prof. "slothrop" references is related to Keith Basso, who was mentioned favorably by myl in a previous post. I took an Anthro-Dep't for Ling-major-credit seminar with the latter Basso as an undergrad, which I remember favorably but pretty hazily as to substance. I don't recall it particularly cohering with or informing my other linguistics classes though — certainly not to the extent the sociolinguistics class I took in the Anthro Dep't did. And this would have been 23 or 24 years ago, so perhaps before the decline and fall perceived by myl. I do recall that the most theoretical-perspectivy book we read in the class was Tedlock's then-recent The Spoken Word and the Work of Interpretation.

    As to Prof. Schiffman, I really don't understand how anyone can deal with ethnic conflict while ignoring language issues. But he probably should have just taken some random anecdotal stuff and put it into tabular form and claimed it was quantitative. How hard can political scientists be to bluff?

  J. W. Brewer said,

    August 12, 2009 @ 4:00 pm

    Attn future scholars doing descriptive linguistics via internet searches: "bogusity" in the prior comment was a typo for "bogosity," not a different word or potentially interesting variant spelling.

  tyrone slothrop said,

    August 12, 2009 @ 4:31 pm

    Dear J.W. Brewer,

    Keith Basso and Ellen Basso were married. But that was many years ago. Keith Basso has written about the semantics of handling verbs in Western Apache, as well as the poetics of Western Apache placenames. I, for one, have always thought of his work as a model of linguistic anthropology. For the abstracts of articles in Anthropological Linguistics, see the following link:

    Note that linguistic anthropologists like Alan Rumsey, Pamela Bunte, Paul Kroskrity, Paul Kockelman, Janis Nuckolls and others have published in both Anthropological Linguistics and Journal of Linguistic Anthropology.

  JD Boy said,

    August 17, 2009 @ 3:55 am

    Interesting post. However, I agree with previous commenters that your description of the status quo in the social sciences is not quite accurate—and as a result, your prediction of future dynamics is questionable as well.

    Linguistics, especially Saussurian linguistics, had a huge impact on social theory through the structural approach formulated by Claude Lévi-Strauss. Structuralism influenced major thinkers like Jacques Lacan and Louis Althusser, to name just two towering figures. Additionally, this broad trend informed major approaches in sociology, anthropology, and political science. Without the concept of structure, no structural functionalism. Also, the wider “linguistic turn” influenced social thought profoundly. Consider Jürgen Habermas abandoning a Marxian account of social struggle for a theory of “communicative action.” Post-structuralism, though critical of certain assumptions of structuralism, maintained that social reality is essentially textual and needs to be read with the tools of textual analysis. In addition to de Saussure, the Bakhtin Circle was a big influence, as was the tradition of hermeneutics (“objective hermeneutics” is the continental counterpart to Conversation Analysis).

    Of course it is wrong to say that “most social scientists” use discourse analysis. Many methods classes in undergraduate programs don't mention discourse analysis and certainly don't teach its practice. But the mainstream is not necessarily what one should go by. The mainstream is occupied with small empirical problems and doesn't do much in terms of shaping the overall paradigm. The larger problematics social scientists work on are dictated by many contingent factors—including historical/political events, funding opportunities, data availability, and new theoretical orientations—which is why I don't agree with your twenty-year prediction. While data availability may speak for your prognosis, I think many other things speak against it, including political developments and theoretical trends. Social theory is presently moving away from the post-structuralist preoccupation with language/discourse and is becoming more interested in what may be called the material (and pre-linguistic) substrate of sociality, such as affect or bodies. The reason is a political trend towards the governance of “life itself,” or what Michel Foucault in his later lectures called biopolitics. So even though social interaction increasingly seems to take place in computer-mediated environments, the critical questions (so many social theorists think) pertain to the bodies involved in the interaction. See EJST 12.1 for some representative contributions along these lines.

  kd said,

    August 27, 2009 @ 5:45 pm

    Excellent article, thanks for bringing it to my attention.

    There's an interesting point in the article which I think they under-explore though. They claim that it's only suitable for "large scale text" due to the variety of human expression, but they don't explore this idea very well. So I think that you could assess the emotional content of smaller scale texts, say at the individual level, by measuring the emotional valence of a text relative to itself. That is, the emotional content at the sentence or paragraph level relative to the emotional content of the whole text. Anyway I'll be testing this out, and I wrote a software library to help do this kind of text analysis yesterday: It is a shame that the ANEW's redistribution conditions are rather restrictive – they should make it unrestricted like the International Personality Item Pool (IPIP

