"The United States" as a subject at the Supreme Court


In an earlier post, I observed that the phrase "the United States", regardless of whether it is treated as singular or plural, seems to have become more likely over time to occur in subject position ("The United States as a subject", 10/6/2009).  My (admittedly slim) evidence for this hypothesis came from some searches in newspaper archives, where gathering data is painfully slow: I had to search interactively via a web interface, and then check the grammatical status of each hit by wearing out my eyes on the article images that were returned.

Historians may find this complaint churlish, since they're used to an even more painful process. Traditionally, scholars have needed to travel to the locale of a physical archive and read through every dusty document in order to find the relevant pages.  (Well, in recent years the process might instead involve reading dusty microfiche cards in some slightly more convenient location.)  All I have to do is open a web browser, run a text search to find the relevant articles, and examine the page images that are returned!

But yes, I'm still complaining.

That's because it's easy to speed the process up by several more orders of magnitude. Say that there are a thousand hits a year for each of 100 years, and it takes me a minute to scan each article returned for the characteristics of interest to me (here the grammatical role of the phrase "the United States"). That's 100,000 minutes, or about 208 eight-hour days, or roughly a person-year of work.
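The back-of-the-envelope arithmetic can be spelled out in a few lines. This simply restates the illustrative figures from the paragraph above (1,000 hits a year, 100 years, one minute per hit, eight-hour days); none of them are measurements.

```python
# Cost of reading every hit by hand, using the illustrative figures above.
hits_per_year = 1_000
years = 100
minutes_per_hit = 1

total_minutes = hits_per_year * years * minutes_per_hit  # 100,000 minutes
total_hours = total_minutes / 60                         # about 1,667 hours
workdays = total_hours / 8                               # about 208 eight-hour days

print(f"{total_minutes:,} minutes = {total_hours:,.0f} hours = {workdays:.0f} workdays")
# prints: 100,000 minutes = 1,667 hours = 208 workdays
```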

With full access to the underlying texts, a trivial program can pull out the relevant sentences or paragraphs (which already saves a lot of time).  A slightly less trivial program can categorize most if not all of the instances automatically, with an error rate that's likely to be better than that of human annotators doing the same tiresome task. And then, if I want to ask the same question about other words and phrases (e.g. France, Great Britain, Spain), or a related question about the same phrase (how does the distribution of prepositional uses change — of X, to X, by X, etc.?), this requires only a small change to the program and a little computer time, not another year of tiresome labor.
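As a minimal sketch of the "trivial program" step, assuming the opinions are available as plain-text strings, the following Python pulls out the sentences that contain a given phrase. The regular-expression sentence splitter is a naive assumption of mine, not the sentence-division algorithm used for the counts in this post, and the function name `sentences_containing` is hypothetical. Asking the same question about France or Great Britain only means changing the `phrase` argument.

```python
import re

# Naive sentence splitter: break after ., ?, or ! when followed by whitespace
# and then a capital letter or an opening quote. Legal prose will fool it,
# e.g. "Mr. Justice Story" would be split after "Mr.".
_SENTENCE_END = re.compile(r'(?<=[.?!])\s+(?=["\u201cA-Z])')

def sentences_containing(text: str, phrase: str) -> list[str]:
    """Return the sentences of `text` containing `phrase` (case-insensitive)."""
    pattern = re.compile(r'\b' + re.escape(phrase) + r'\b', flags=re.IGNORECASE)
    return [s.strip() for s in _SENTENCE_END.split(text) if pattern.search(s)]

# Toy text: the 1800 example, a filler sentence, and a shortened 1850 example.
sample = ("The United States and the French republic are in a qualified state of hostility. "
          "Congress did not declare war. "
          "Has not the United States received the same amount?")

for phrase in ["the United States", "France"]:
    print(phrase, sentences_containing(sample, phrase))
```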

Unfortunately, I haven't yet managed to get my hands on the underlying text for any 19th-century newspaper archives.  But thanks to Jerry Goldman of oyez.org and Tim Stanley of justia.com, I recently got a nearly complete archive of U.S. Supreme Court opinions (and other related documents) in html form. After a bit of hacking, I wound up with 30,846 dated text files, from 1759 to 2005. (The documents in this collection from before 1789 are of course from other American courts. Those after 2005 are in a different format, which I haven't processed yet.)

The plan is to parse the collection, so that the correlations among grammatical and political histories can be conveniently explored.  Meanwhile, I decided to try a few small explorations where I classify the grammatical role of hits by hand, to evaluate both the plausibility of my grammatico-historical hypothesis and the quality of my text preparation.  Note that this is already much less tiresome than reading web-archive hits, since I need only look at the relevant bits, which I present to myself in a "keyword in context" array that is relatively easy on the eyes.
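A keyword-in-context display of this kind can be sketched in a few lines. This is a hypothetical helper (`kwic` is my name for it), not the code behind the displays used here; it assumes plain-text input, shows a fixed number of characters on either side of each hit, and right-aligns the left context so the hits line up in a column.

```python
import re

def kwic(text: str, phrase: str, width: int = 40) -> list[str]:
    """One line per hit of `phrase`: `width` characters of left context
    (right-aligned), the hit in brackets, then `width` characters of right context."""
    lines = []
    for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        # Replace newlines so each hit stays on a single display line.
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

text = "Standard answered that the United States, as insurer of the tanker, would pay."
for line in kwic(text, "the United States", width=10):
    print(line)
# prints: ered that [the United States], as insur
```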

So how does the United-States-as-subject hypothesis fare in the SCOTUS texts?

I started by checking out the 26 texts dated 1800, in which "the United States" occurs in 55 sentences (at least as my sentence-division algorithm judged things). One of these instances is in (conjoined) subject position, for a rate of 1.8 per 100 sentences:

4  37: The United States and the French republic are in a qualified state of hostility.

In the 39 texts dated 1810, "the United States" occurs in 144 sentences, of which 5 were in subject position, for a rate of 3.5 subjects per 100 sentences. For example:

10  53: The States of Virginia and Maryland having, in the year 1789, offered to the United States a cession of territory ten miles square for the permanent seat of government, the United States, by the Act of Congress of 16 July, 1790, vol. 1, p. 132, entitled "An act for establishing the temporary and permanent seat of the government of the United States," accepted the same and authorized the President to appoint certain commissioners for the purpose of carrying the act into effect.

In 1850, in 164 texts there were 1495 sentences containing "the United States".  I checked a random sample of 100 of these sentences, and found 7 instances of the phrase in subject position, for example:

49  451: Admitting that anything had occurred as you state, has not the United States received the same amount there from its land as it has elsewhere?

In 1900, in 232 texts there were 2154 sentences containing "the United States", and in a random sample of 100, I found 7 subjects, e.g.

179  494: And as the United States does not complain of the decree in favor of the latter Indians awarding to each 160 acres of land, the only question that remains to be considered arises on the appeal of the Wichita and Affiliated Bands — namely whether the court below erred in not decreeing those Indians to be entitled to the proceeds of the sale of such of the lands in question as may be left after making the allotments in severalty required by the act of Congress.

In 1950, in 102 texts there were 750 sentences containing "the United States", and in a random sample of 100, I found 12 subjects, e.g.

340  54: Standard answered that the United States, as insurer of the tanker, would, in view of the nature of the collision, have to reimburse Standard for any loss it sustained in the suit.

In 2000, in 85 texts there were 658 sentences containing "the United States", and in a random sample of 100, I found 19 subjects, e.g.

529  89: The United States did not participate in these cases until appeal, and resolution of the litigation would benefit from the development of a full record by all interested parties.

So the results from this rather sketchy sample are consistent with the hypothesis:

Year   Rate per 100 sentences
1800    1.8
1810    3.5
1850    7
1900    7
1950   12
2000   19

(The "rates" represent the number of instances of "the United States" as the subject of a tensed clause, divided by the number of sentences in which this phrase occurs, all multiplied by 100. In the years 1850 through 2000, I checked a random sample of 100 such sentences — obviously a different random sample would have a different result. This being a Breakfast Experiment™, accuracy took second place to velocity.)
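The rates are simple proportions, and the caveat about random samples can be made concrete with a binomial confidence interval. Here is a minimal sketch, assuming the counts reported above and using the Wilson score interval (my choice of method, not something this post computes); `draw_sample` and `wilson_interval` are hypothetical names, and the fixed seed is there so that someone else could reproduce exactly the same sample of sentences.

```python
import math
import random

def draw_sample(sentences, n=100, seed=None):
    """Draw up to n sentences without replacement; a fixed seed makes
    the sample reproducible by others."""
    rng = random.Random(seed)
    return rng.sample(sentences, min(n, len(sentences)))

def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the proportion k/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# (subjects found, sentences checked) as reported above; for 1850-2000
# the denominator is the random sample of 100 sentences.
counts = {1800: (1, 55), 1810: (5, 144), 1850: (7, 100),
          1900: (7, 100), 1950: (12, 100), 2000: (19, 100)}

for year, (k, n) in counts.items():
    lo, hi = wilson_interval(k, n)
    print(f"{year}: {100 * k / n:5.1f} per 100 (95% CI {100 * lo:4.1f} to {100 * hi:4.1f})")
```

With samples this small the intervals are wide; the 1950 and 2000 intervals, for instance, overlap, which is in keeping with treating this as a rather sketchy sample.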

Of course, even if more complete and careful evidence continues to validate the hypothesis, this leaves open many alternative explanations. Perhaps, over time, the federal government has been doing more and more things that would naturally be described by referring to it in subject position. Perhaps the court has gradually shifted from longer and more specific phrases (e.g. "the government of the United States" or "the Solicitor General of the United States") to plain "the United States". Or perhaps, as I suggested in my earlier post, there's been an increasing tendency, even among careful legal thinkers and writers, to exhibit the grammatical consequences of considering "the United States" to be a quasi-animate agent.

My guess is that all of these explanations are likely to be simultaneously true, to some extent. Luckily, it's easy to imagine ways to test them — if you've got access to the full text archive.  And if everyone has access to the same text archive, then others can check, challenge or extend my results.
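As one example of such a test, the second explanation (a shift from longer phrases to plain "the United States") could be checked by measuring, year by year, what fraction of occurrences sit inside longer forms. This is a sketch under stated assumptions: plain-text opinions, a hypothetical two-item list `LONGER_FORMS` that a real study would need to extend considerably, and a hypothetical function name. If that share fell over time while the subject rate rose, it would be consistent with, though not proof of, that explanation.

```python
import re

# Hypothetical, deliberately short list of longer descriptions that the court
# might have used instead of bare "the United States".
LONGER_FORMS = [
    "the government of the United States",
    "the Solicitor General of the United States",
]

def longer_form_share(text: str) -> float:
    """Fraction of occurrences of 'the United States' that fall inside one of
    LONGER_FORMS (each longer form contains the phrase exactly once)."""
    total = len(re.findall(r"\bthe United States\b", text, flags=re.IGNORECASE))
    if total == 0:
        return 0.0
    inside = sum(len(re.findall(re.escape(form), text, flags=re.IGNORECASE))
                 for form in LONGER_FORMS)
    return inside / total

text = "The government of the United States appeared. The United States did not participate."
print(longer_form_share(text))  # prints: 0.5
```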



11 Comments

  1. Steve said,

    October 20, 2009 @ 8:19 am

    I wonder if metonymics like "Washington" and "The White House" have gone through the same trend, or if the increasing use of "The United States" as a subject is partially as a replacement for one of them. Of course, the results might be different in a news archive than in the Supreme Court archive…if only there were one available that were as nice as this Supreme Court one!

  2. marie-lucie said,

    October 20, 2009 @ 8:58 am

    Why should "the United States" not be a subject, like "the Low Countries" (= "the Netherlands") or "England" or "Italy"? I thought that the point was whether the term was used as singular or plural.

    [(myl) No, that's not the point currently under discussion. Please read the earlier posts, (here, here, here), and (if you want) try again. The rest of your comment raises (in any case obvious) issues that were discussed at greater length there, and in works referenced there. But today's point actually is the question of rates of occurrence in subject position. And the same question could certainly be asked about the names of other countries, and about other sorts of nouns. As for why rates of use in different permitted grammatical positions might be interesting, see here, or search Google Scholar for "animacy hierarchy".]

    It seems that the singular usage is quite old, since your quote is from 1850 (and since formal written usage tends to lag far behind oral usage, the phrase was probably used orally – or in less formal written contexts – much earlier). Also, the singular use is only noticeable if the verb is either "to be", or in the present tense and without a modal auxiliary; otherwise the number of the subject is not indicated beyond the noun phrase, which is itself ambiguous, as "the" has only one form. These peculiarities of English mean that "the United States" can have ambiguous number in many or even most cases where it is the subject of the sentence, as in your examples from 1810, 1950 and 2000. In languages such as Spanish and French, where the form of the verb, as well as the article, is much more obviously singular or plural, "los Estados Unidos" or "les États-Unis" are always unambiguously plural.

  3. Thomas Westgard said,

    October 20, 2009 @ 9:42 am

    Federal jurisdiction has constantly broadened from the founding of the nation to today. So there is an increasing likelihood that "The United States" will be a party to a case, or will have some other interest in a dispute that must be considered. So I posit that the increase is at least partly due to that. You could bluntly test my hypothesis by comparing the rate of increase in overall usage of "The United States" with the rate of increase in cases in which "The United States" is either plaintiff or defendant.

    [(myl) I'll bet that this is true — in the SCOTUS context, it's the main category of what I referred to as "the federal government … doing more and more things that would naturally be described by referring to it in subject position". Given a properly indexed archive, it will be easy to control for this effect.]

  4. marie-lucie said,

    October 20, 2009 @ 11:05 am

    myl, thank you for your response, I am sorry if I seemed to be reinventing the wheel. I must have read the earlier post and comments too desultorily.

  5. Bob Ladd said,

    October 20, 2009 @ 2:40 pm

    Mark is correct that the issue re-raised by Marie-Lucie (whether the United States is grammatically singular or plural) is a separate question from whether the phrase is used in subject position or not. Yet it's plausible that there's an interaction between the two. If, in the early days of the republic, people were grammatically uncertain whether the republic's name was singular or plural, they might have tended to avoid using it in the principal grammatical context where one is obliged to declare a choice, namely subject position. (In the same way, I think many people will avoid using words like brainchild and mongoose in the plural, simply because all the possible plural forms sound weird.) Once the grammatical status of the United States was settled as singular, then the other factors that Mark and Thomas Westgard mention could exert their effect on usage unimpeded by grammatical awkwardness.

    [(myl) This is a cute idea. One apparent problem with it: it predicts a U-shaped time-function of subject-position use, which apparently didn't happen. In the period from 1783 to 1800 or so, when "the United States" was routinely and unproblematically plural, it apparently occurred even less often in subject position than it did some decades later, when its number was more variable. This could represent the superposition of a U-shaped "awkwardness" factor with sharply rising "agency" factor; and with enough data, maybe you could prove that; but it'll be tough.]

  6. Mark N. said,

    October 20, 2009 @ 6:06 pm

    Is this "nearly-complete archive of U.S. Supreme court opinions (and other related documents) in html form" available, or going to be available, anywhere publicly for other researchers to use? Since the opinions themselves are in the public domain, I assume at least a plaintext version could be posted?

    [(myl) I hope so; I'll ask the people that I got it from. The html files that I got are pretty much the ones that are publicly accessible at justia.com, and you can read them all online there. But it would be great to have a clean plain-text version with good metadata, in an easily machine-indexable form, that the community of interested researchers could share.]

  7. J.W. Brewer said,

    October 21, 2009 @ 8:12 pm

    Following up on Thomas Westgard's point, the mix of types of cases in the Supreme Court's docket has changed considerably over time, sometimes in ways that broadly track larger political/historical trends, sometimes in ways driven by much more technical concerns one would not expect anyone outside the law business to know anything about (e.g. the overruling of the Swift v Tyson doctrine in the late 1930's meant de facto that a quite numerous class of cases would continue to be litigated in the lower federal courts but virtually never be reviewed in the Supreme Court). So to the extent you might expect different types of language in different types of cases, some apparent usage trends might need to be disentangled from the effect of those changes in the docket. Unfortunately, I expect that would be a complicated process, since it's not like there was one simple linear trend. E.g. not only has the expansion of federal law increased the types of cases in which the federal government is a party, it has also increased the types of cases in which a seemingly purely private dispute between A and B will be adjudicated in federal court under federal law, and it's hard to know without actual empirical research which trend would outweigh the other.

    On the broader question addressed in previous posts, perhaps the 19th century transition needs to be undergone anew with each rising generation in an ontogeny-recapitulates-phylogeny way, at least based on the evidence of my (born-in-the-21st-century) third-grader's geography homework of this evening, in which she asserted that "The United States have more land than Mexico."

  8. John Bowlan said,

    October 23, 2009 @ 12:53 am

    I played around with this corpus a while ago – one curious thing I found is that the term frequency of 'the' went from about 10% in 1800 to 7% in 2000. I am pretty ignorant of linguistics, so what could explain this transition?

    http://topicmodels.wordpress.com/2009/03/02/term-frequencies-in-the-supreme-court-corpus/

    I find this type of data to be very interesting – there is some evidence that legal language is especially challenging for ML techniques.

    [(myl) Interesting. 10% is unexpectedly high, I think. One place to look for a clue would be to track the relative frequency of the n-grams (bigrams, trigrams, etc.) involving the. That would help clarify whether this might be due to certain fixed phrases (e.g. "the court") being used in changing ways.]

  9. Graeme said,

    October 23, 2009 @ 8:41 am

    Perhaps it's a matter of habit? The more we hear an abstraction like a nation-state referred to as an animate subject, the more natural it feels. As Benedict Anderson argued in Imagined Communities, nations are built on imagination as much as on bricks.

    Conversely, as nation-states grow bigger in their apparatus, it is not so much that they are doing more (everyone and everything is doing more – sport, commerce, entertainment), but that they become more abstracted from us, and hence easier to imagine as a corporate Golem.

  10. Demian said,

    October 23, 2009 @ 1:05 pm

    Speaking of "The United States", I'm really curious about the use of the words "America" and "American" to refer only to the U.S.A. and its natives.

    At least in Latin America and some other countries of the world, "America" refers to the whole continent, and "Americans" to all its inhabitants, not only to the U.S.A. and its people.

    After all, the continent was named after Americo Vespucio, or Amerigo Vespucci, a contemporary of Christopher Columbus. Moreover, Columbus's voyages reached the islands of the Caribbean Sea, not the region now known as the United States of America.

    There are several misuses of adjectives relating to the U.S.A. For instance, in Argentina the adjective "North American" is commonly used to refer to the U.S.A. only. Maybe this is due to the difficulty of deriving a single-word adjective from the noun "United States of America".

    I'd find it interesting if you could write a bit about the origins of, and reasons for, this particular use of the word "America".

    Thanks and congratulations on your blog.

  11. Christy said,

    October 23, 2009 @ 4:25 pm

    I remember a story on NPR many years ago in which the narrator was looking forward to the day when he could search archives online. However, he said that the historian working next to him was smelling envelopes. He had been researching the flu pandemic. His colleague told him that people used to believe that vinegar would kill flu germs, so they poured it on letters. They poured so much on some letters that very little of them was left, as the acid disintegrated them. Even after 80 or so years, the smell was still noticeable on letters that otherwise looked pristine. So, while something is definitely gained from online archives, not every piece of research involving old documents can be completed from a distance.
