Language Log

Corpus linguistics in a legal opinion

July 20, 2011 @ 5:49 am · Filed by Mark Liberman under Language and the law

Gordon Smith, "A Landmark Opinion: Corpus Linguistics in the Courts", The Conglomerate 7/19/2011:

Last month I blogged about the "best student comment ever," the first law review article to rely on corpus linguistics as the basis for analysis. As I have worked with corpus linguistics (through the comment's author, Stephen Mouritsen) over the past few months, I have come to conclude that it will revolutionize the study of law, at least insofar as we are attempting to understand word usages.

Today, my former colleague and current Utah Supreme Court Justice Tom Lee used corpus linguistics in a lengthy concurring opinion (the relevant section starts at page 34). In this opinion, Justice Lee is interpreting the word "custody," and he brings corpus linguistics to the fight. […] Justice Lee's colleagues are not enamored with the approach, but you can read the opinions for yourself and see who gets the better of the argument.

This seems to be the first judicial opinion anywhere using corpus linguistics, but it will surely not be the last.

Is it true that there have been no previous law review articles using corpus linguistics? Amazing, if true.

Corpus-based surveys of usage have certainly figured in legal arguments before — I discussed Neal Goldfarb's use of corpus evidence in an amicus brief here, and Ben Zimmer cited the apparent effect of Neal's work in "The Corpus in the Court: 'Like Lexis on Steroids'", The Atlantic 3/4/2011. But it might be true that corpus evidence has never been directly cited in a judicial opinion. Again, a striking notion.

Whether or not these events are really without precedent, it's surely true that this is an idea whose future is much larger than its past.

Gordon Smith offers some reading suggestions:

If you are as intrigued by corpus linguistics as I am, you might be interested in this paper by Mark Davies, a BYU Professor of Corpus Linguistics who is a leader in this field, on how one might use the Corpus of Contemporary American English. I am told that a similar paper on the Corpus of Historical American English is forthcoming.

I've been curious for some time about why lawyers, who spend a good deal of their time arguing about the interpretation of words, phrases, and sentences, are not in general expected to learn anything about how to do this. It's as if medical schools had failed to notice that it would be useful for their graduates to know something about anatomy and physiology. So I see the opportunity for legal application of corpus linguistics as an instance of the opportunity for the legal application of linguistics more generally.

The legal profession has mostly managed to avoid linguistics for the past century — will corpus analysis be the thin edge of a wedge of change?


9 Comments

Moshe Koppel said,

July 20, 2011 @ 7:10 am

How did you resist naming this post "Habeas Corpus"? You are a better man than I.
Neal Goldfarb said,

July 20, 2011 @ 9:53 am

The legal profession has mostly managed to avoid linguistics for the past century — will corpus analysis be the thin edge of a wedge of change?

Possibly, but the more important question, I think, is whether lawyers are going to start drawing on linguistics in their arguments. The extent to which the analysis in court opinions is adapted from the briefs isn't generally appreciated, and the importance of the briefing is magnified here, because this isn't an area where most judges will feel comfortable on their own.

But lawyers (lawyers other than me, anyway) aren't going to start drawing on linguistics unless they think that doing so will help them win cases. Judge Lee's concurring opinion will be an important step in the process of bringing lawyers around to that conclusion.

Is it true that there have been no previous law review articles using corpus linguistics? Amazing, if true. […] [I]t might be true that corpus evidence has never been directly cited in a judicial opinion. Again, a striking notion.

At least with respect to the opinions available on Lexis, this is the first time that corpus linguistics has been mentioned in a legal opinion, much less relied on. And Stephen Mouritsen's article does indeed seem to be the first law review article to use corpus linguistics.

This isn't surprising. Not only has the legal profession been inattentive to linguistics, but before COCA there was no large corpus of American English that was publicly available. So hats off to Mark Davies.
Neal Goldfarb said,

July 20, 2011 @ 4:28 pm

D'oh!

I need to correct my statement that Stephen Mouritsen's piece is the first law-review article to rely on corpus linguistics. Charles Fillmore got there first.

In 1995, Fillmore co-authored (with Clark Cunningham) an article published in the Washington University Law Quarterly, entitled "Using Common Sense: A Linguistic Perspective on Judicial Interpretations of 'Use a Firearm'" (73 Wash. U. L.Q. 1159). The article dealt with the statute that was at issue in Smith v. United States (a case that was discussed on Language Log Classic here), and the article discussed the specific issue decided in Smith: whether a statute that imposed an enhanced penalty on anyone who "used a firearm" in connection with a drug crime applied in a case where the defendant traded a gun for drugs. In investigating how the word use is, um, used, Fillmore looked at data from the British National Corpus. (Unfortunately the article is not available on the web except through sources such as Lexis that you have to pay for.)

I'm ashamed of myself for forgetting about this article when I wrote my earlier post. Not only am I very familiar with the article, but I think that it's important and that it has been unjustly neglected in the literature. It's also an article that has inspired me in my efforts to use linguistics in doing law. So as long as I'm taking my hat off to Mark Davies, let me give Chuck his props, too.
Watch this space | LAWnLinguistics said,

July 20, 2011 @ 5:17 pm

[…] seem to be getting higher-than-usual traffic today, which I assume is due to the mention in Language Log today of my brief in FCC v. AT&T. So this seems like a good time to say that […]
Gordon Smith said,

July 20, 2011 @ 7:32 pm

Hi Mark,

Thanks for reposting. Neal notes the Fillmore and Cunningham piece, so I just posted an update on my blog to give them credit, but it may be relevant to your readers that Fillmore and Cunningham were using their corpora in a different way than Mouritsen.

As noted in my update, Mouritsen’s comment differs from the Fillmore and Cunningham article both in its method and its claim. Fillmore and Cunningham use corpus linguistics to examine the word "use" in an attempt to understand what it might mean to "use a firearm." They use the British National Corpus to examine the range of possible meanings of that statutory term in much the same way that a lexicographer might rely on a citation file to find usage examples.

Rather than explore the range of possible uses of a statutory term, Mouritsen relies exclusively on corpus-based data to attempt to demonstrate the “ordinary meaning” of a statutory term in a particular context. His article is the first to do this. Thanks to Neal for raising the issue, causing me to make a more precise statement about the contribution of the Mouritsen piece.
Chad Nilep said,

July 21, 2011 @ 2:52 am

A footnote in the concurring opinion (fn. 21) notes, "A similar approach to statutory meaning—based on common usage as indicated by an electronic database—was employed by the United States Supreme Court in Muscarello v. United States, 524 U.S. 125, 129 (1997), and in FCC v. AT&T, No. 09-1279 (U.S. Mar. 1, 2011)."

I assume that some difference in the method of analysis disqualifies either of these cases as a "legal opinion using corpus linguistics", right? Or is it a difference of data ("an electronic database" versus a corpus created for linguistic analysis)?

[(myl) FCC v. AT&T is the case that Neal Goldfarb's amicus brief figured in — discussed here. Neal discusses the Muscarello v. United States case here.

Neal and Gordon will have better-informed opinions about this, but it appears to me that the reliance on corpus evidence in the FCC v. AT&T opinion is implicit, and in Muscarello v. United States, the cited usage examples are a helpful but unsystematic selection from the Bible, Moby Dick, the New York Times, etc.

From a certain point of view, this is a difference of degree, not of kind. From another perspective, it's an important difference: selected examples are useful in telling us how an expression might be used, but don't provide a strong argument about central and typical uses. For that, we'd want to be able to say something like "the large and representative and publicly-checkable collection X includes N relevant instances of the phrase in question, and M of these have property P, so that we can conclude etc.".]
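The "N relevant instances, M of which have property P" argument can be made concrete with a small calculation. What follows is a minimal sketch, not anything from the post or the opinions: it assumes the relevant corpus hits have already been pulled and hand-coded for sense, and the function names and sense labels are hypothetical. It reports N, M and M/N, plus a Wilson score interval (a standard choice for a binomial proportion, swapped in here as an assumption) as a rough gauge of how firmly N instances support the conclusion.

```python
import math
from collections import Counter


def wilson_interval(m: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion m/n (z=1.96 gives roughly 95%)."""
    if n == 0:
        raise ValueError("need at least one relevant instance")
    p = m / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half


def sense_report(labels: list[str], sense: str) -> dict:
    """Summarize how often a (hypothetical) sense label occurs among hand-coded hits.

    labels: one sense label per relevant corpus instance, assigned by a human coder.
    sense:  the label whose share we want (the "property P" of the argument).
    """
    counts = Counter(labels)
    n = len(labels)      # N: relevant instances found in the corpus
    m = counts[sense]    # M: instances coded with the sense in question
    low, high = wilson_interval(m, n)
    return {"N": n, "M": m, "share": m / n, "ci95": (low, high)}
```

With hypothetical labels such as `["A", "A", "A", "B"]`, `sense_report(labels, "A")` gives N=4, M=3 and a share of 0.75, but an interval running from roughly 0.30 to 0.95; the width is the point, since a handful of hits settles very little, while a large, publicly checkable sample narrows the range.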
Rubrick said,

July 21, 2011 @ 3:49 am

Is the title of the post a play on words I'm not getting or merely a typo? I wouldn't have expected the latter to last this long.

[(myl) Typo. Always a good bet with me, alas.]
Neal Goldfarb said,

July 21, 2011 @ 7:54 am

…it appears to me that the reliance on corpus evidence in the FCC v. AT&T opinion is implicit, and in Muscarello v. United States, the cited usage examples are a helpful but unsystematic selection from the Bible, Moby Dick, the New York Times, etc.

I agree.
Stephen said,

July 21, 2011 @ 4:21 pm

On reflection, I am not sure that it is correct to say that the Cunningham/Fillmore article is the first law review article to rely on corpus linguistic methods. It is true that the article lists in its second appendix a number of sample uses of the phrase “use a firearm” from the British National Corpus. But Fillmore characterized the article’s use of these samples as follows:

“For ordinary purposes of linguistic analysis, the native speaker’s acceptability judgments and interpretative abilities are central, and so the examples would be equally relevant if they had been the product of Fillmore’s linguistic introspection. Nevertheless we provide the data Fillmore used in order to illustrate one technique of linguistic analysis: searching data bases of naturally occurring texts to support and augment the linguist’s capacity to imagine and generate data for analysis and examples for illustrating particular types of linguistic phenomena." See Clark D. Cunningham & Charles J. Fillmore, Using Common Sense: A Linguistic Perspective on Judicial Interpretations of “Use a Firearm,” 73 Wash. U. L. Q. 1159, 1207 (1995).

Undoubtedly using a corpus to generate examples of usage is one important function of a corpus, but note that Fillmore himself claims that the corpus was not necessary to his analysis, that is, the samples would have been “equally relevant if they had been the product of Fillmore’s linguistic introspection.” Interestingly, as near as I can tell from my review of the article, the corpus data is not employed to perform the one task that Fillmore has elsewhere suggested is one of the key contributions of corpus linguistics—to teach us facts about language that we couldn’t find out about in any other way.

In one broad sense, the Cunningham/Fillmore article uses a brand of corpus linguistics in that they rely on usage samples from a collection of texts, for example, the US penal code. But their use of what we would think of as a modern linguistic corpus (in their case the BNC) is quite limited. This is not a fatal criticism (or even a criticism, really) of the article or its contribution. Some day when “Law & Linguistics” is taught as a regular law school course, the Cunningham articles, together with Professor Solan’s “The Language of Judges” will be foundational texts. I don’t think the Cunningham/Fillmore article is properly classified as a corpus linguistics article, but that doesn’t mean it isn’t remarkable.

The approach I articulate in my article is altogether different, and entirely dependent upon the corpus method. Cunningham and Fillmore use the corpus to perform a task that could just as easily have been performed by linguistic intuition. My approach is to ask a question that introspection can’t answer: as between two different senses of a given term in a given context, which is statistically more frequent?

Judges have, for centuries, attempted to resolve cases based on the “ordinary meaning” of the words in the statute, but in determining what is “ordinary” they have routinely relied on their intuitions. This is why it is not uncommon to find majority and dissenting opinions both arguing that their favored interpretation of a word or phrase in a statute is the “ordinary meaning.” I agree, perhaps obviously, with Professor Smith that corpus linguistics will revolutionize the way that judges and legal scholars think about these problems.

Since Neal has posted here, I wanted to add that if corpus linguistics is the “wedge” that gets nuanced discussion of linguistic issues into judicial opinions and law review articles, we will all have Neal to thank for it. THE corpus of American jurisprudence is (for better or worse) the decisions of the United States Supreme Court. With the success and influence of his brief in FCC v. AT&T, Neal has given the corpus linguist a vital calling card. We can now say corpus linguistics has an important and persuasive contribution to make to the analysis of difficult legal questions and we have a Supreme Court decision to prove it. The law-and-linguistics movement owes a great debt to Neal for his contribution.
