Less body in your lexicon?


Answering a reader's question about somebody vs. someone, Arnold Zwicky speculated yesterday that "you'd find all sorts of interesting variation according to the location / age / sex / class etc. of the speaker, genre, formality of the context, date when the corpora were collected, and so on".  In the comments, Jerry Friedman suggested that "the -one words all sound more formal to me than the -body words", and he provided some evidence in the form of the ratio of Google Books counts for the words themselves and for the words combined with albeit.

This is a great topic for a Breakfast Experiment™, and despite several overdue work-related commitments, I couldn't resist.

Fairly direct evidence is available from Mark Davies' COCA corpus, which shows (for example) that the ratio of somebody and everybody to someone and everyone is much greater in spoken transcripts than in academic writing:

            somebody    someone               everybody   everyone
            (per MW)    (per MW)    ratio     (per MW)    (per MW)    ratio
Spoken      280.22      292.02      0.96      335.79      15.25       22.0
Academic    10.03       82.43       0.12      199.99      74.66       2.68

The same thing is true in the British National Corpus:

            somebody    someone               everybody   everyone
            (per MW)    (per MW)    ratio     (per MW)    (per MW)    ratio
Spoken      424.34      188.18      2.25      276.91      122.55      2.26
Academic    11.41       87.27       0.13      12.26       43.18       0.28

There also appears to be a secular trend in favor of the __one forms. We can see this in apparent time in the counts from the LDC's collection of transcribed telephone conversations:

Thus younger people have lower __body/__one ratios than older ones. The data in detail for somebody/someone:

             somebody    someone    somebody/someone ratio
Age 20-39    2,053       2,309      0.89
Age 40-59    10,732      7,108      1.51
Age 60-69    1,782       818        2.18

And for everybody/everyone:

             everybody   everyone   everybody/everyone ratio
Age 20-39    2,050       1,591      1.29
Age 40-59    9,027       3,099      2.91
Age 60-69    1,312       253        5.19
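
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the __body/__one ratios from the counts in the two tables above. The dictionary layout and the function name `body_one_ratios` are just illustrative choices of mine, not part of any corpus tool.

```python
# Counts copied from the two LDC conversational-transcript tables above.
# The data layout and helper name are illustrative assumptions.
counts = {
    "20-39": {"somebody": 2053, "someone": 2309, "everybody": 2050, "everyone": 1591},
    "40-59": {"somebody": 10732, "someone": 7108, "everybody": 9027, "everyone": 3099},
    "60-69": {"somebody": 1782, "someone": 818, "everybody": 1312, "everyone": 253},
}


def body_one_ratios(row):
    """Return (somebody/someone, everybody/everyone) for one age group."""
    return (row["somebody"] / row["someone"], row["everybody"] / row["everyone"])


ratios = {age: body_one_ratios(row) for age, row in counts.items()}

for age, (some_ratio, every_ratio) in ratios.items():
    print(f"Age {age}: somebody/someone = {some_ratio:.2f}, "
          f"everybody/everyone = {every_ratio:.2f}")
```

Both ratios rise from the youngest group to the oldest, which is the apparent-time pattern described above.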

As usual, it's not clear whether this represents a linguistic change in progress, or a stable fact of individual life-cycle development. But in this case, we have some real-time evidence: the Time Magazine corpus at BYU suggests that it's a culture-wide lexical drift, with the relative frequency of somebody and everybody decreasing, relative to someone and everyone, roughly since the end of WW II:

The data from the BYU COCA corpus is also generally consistent with a trend away from the __body forms, details aside:

Thus in this case, contrary to the usual "kids today" dynamic, it seems that the language as a whole is moving in the direction of more formal registers.

Returning to the LDC conversational transcripts, I note that the tendency to generalize (at least using the words in this set) decreases with age: in the younger group, the ratio of (somebody+someone)/(everybody+everyone) is about 1.20; in the middle-aged group, it's 1.47; in the older group, it's 1.66. But in this case, the real-time trend in the Time Magazine corpus goes in the opposite direction — the relative frequency of the some__ words has been increasing:
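
The three apparent-time ratios quoted above can be recomputed from the LDC counts given earlier for somebody/someone and everybody/everyone. This is only a sketch: the dictionary layout and the function name `some_every_ratio` are my own illustrative choices.

```python
# Counts copied from the LDC conversational-transcript tables earlier in the post.
# The data layout and helper name are illustrative assumptions.
counts = {
    "20-39": {"somebody": 2053, "someone": 2309, "everybody": 2050, "everyone": 1591},
    "40-59": {"somebody": 10732, "someone": 7108, "everybody": 9027, "everyone": 3099},
    "60-69": {"somebody": 1782, "someone": 818, "everybody": 1312, "everyone": 253},
}


def some_every_ratio(row):
    """Return (somebody + someone) / (everybody + everyone) for one age group."""
    some_total = row["somebody"] + row["someone"]
    every_total = row["everybody"] + row["everyone"]
    return some_total / every_total


some_every = {age: some_every_ratio(row) for age, row in counts.items()}

for age, ratio in some_every.items():
    print(f"Age {age}: some__/every__ = {ratio:.2f}")
```

The ratio rises with age, so the every__ words make up a larger share of this set for younger speakers.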

On the basis of a quick scan in the LDC conversational transcripts, I didn't find very large effects of sex, educational level, or region: for example, the overall somebody/someone ratio for males was 1.44, and for females 1.46. But the various demographic features are by no means orthogonally sampled, so it would be better to do a large multi-level regression in order to check on these effects.

Summing up:

In both the U.S. and the U.K., someone and everyone are more formal than somebody and everybody. Despite this, in the U.S. at least, someone and everyone are gaining overall market share relative to somebody and everybody. This is indicated by both apparent-time evidence in conversation (older people in the LDC conversational transcripts have relatively higher rates of __body usage) and real-time evidence in text (__body/__one ratios decline with publication date both in the Time Magazine corpus, with the trend visible from 1950 onwards, and, for a shallower time depth, in the COCA corpus).



6 Comments

  1. Jorge said,

    November 11, 2009 @ 11:03 am

    Thus in this case, contrary to the usual "kids today" dynamic, it seems that the language as a whole is moving in the direction of more formal registers.

    Is the language moving towards more formal registers, or are the more formal forms losing their formality? (Or is that the same thing?) This is the same direction that "you" went, isn't it? Also the same direction "ustedes" went/is going in Spanish.

    My assumption would be that it is easier for formal forms to lose formality, than for informal forms to gain formality. Could this be a language universal?

    [(myl) I'm not sure of the answers to these questions, overall. In syntactic and phonological change, I think that things generally happen first in less formal registers, and move from there into the formal written language. And some lexical change also goes the same way — think of the history of a word like "mob", for example. ]

  2. Jerry Friedman said,

    November 11, 2009 @ 11:30 am

    Aha!

    I would never have guessed that "someone" was increasing compared to "somebody". How long till this becomes somebody's prescription?

  3. Jerry Friedman said,

    November 11, 2009 @ 11:39 am

    I should add that in the Linguist List post that Stan Carey cited, Jane A. Edwards found the same association with formality. She also found some association of everyone with his and everybody with they, as Ellen Prince had suggested.

  4. Aviatrix said,

    November 11, 2009 @ 5:32 pm

    This isn't a language I'm fluent in, so I may not have it exactly correct, but "sm1 is ezr 2 txt."

  5. david said,

    November 14, 2009 @ 11:17 am

    Aviatrix,
    sm1 is rly ez but so is sb.

  6. Li(n)keable 23 November - Lexiophiles said,

    November 23, 2009 @ 1:04 pm

    […] Less body in your lexicon? […]
