The sneakiness of self-consciousness

As my friends and acquaintances know, I'm a rather unreliable correspondent. I write a lot of messages, and I make a lot of phone calls, but the list of messages and calls that I ought to make always grows larger.  In fact, there seems to be a sort of positive feedback principle at work, whereby every time I discharge a communicative obligation, that very action somehow pushes several new tasks onto the stack. A similar problem afflicts my To Blog list, which reliably expands in direct proportion to my attempts to reduce it. No doubt I'm Doing It Wrong.

A few days ago, Michael Ramscar sent me a fascinating series of email messages, in which he wove together several recent LL themes:  coffee cup sizes, difficulties with multiple negation, word order typology, and the Sapir-Whorf hypothesis. My contribution was limited to various forms of "you don't say!" and "tell me more", so I proposed, with his permission, to edit his emails together into a guest post.

But this morning, when I searched my email archive for the messages in question, I discovered a much earlier note from Michael that's almost equally interesting. So this one comes first. I'll get back before long to his theory about coffee drinks, modifier order, grammatical gender, and the cognitive processing of negation — really I will!

The context here is "Snuckward Ho!", 11/29/2009, "Snuck-gate", 6/18/2010, and "Graphically snuckward", 6/19/2010. Michael wrote:

Something you overlooked: people are much more likely to say "snuck" and write "sneaked". A reflection of the influence of style manuals, perhaps?

I was prompted to look at this by your comment

It's not clear whether this is a linguistic change (that is, a change in the words that people choose to express a certain concept) or a cultural change (that is, a change in the concepts that people choose to write about).

since I doubt that most of the "choosing" of words in speech is as self-conscious as it is in writing.

Interestingly, the proportion of sneak : sneaked / snuck is consistent across the corpus (except in fiction, which is its own fictive thing), supporting the idea that some of this may be down to people choosing forms of words to express ideas — see Davidson's "A nice derangement of epitaphs" for the problems this raises for theories of comprehension.

Michael attached a graph based on the following data, from the COCA corpus, expressed as frequencies per million words:

form     spoken  magazine  newspaper  academic
sneak      6.93      8.64       6.49      0.99
sneaks     0.48      1.29       0.77      0.28
sneaked    0.64      1.63       1.77      0.30
snuck      1.87      1.27       0.68      0.16
SUM        9.92     12.83       9.71      1.73

The overall frequency of these forms of the lexeme sneak varies by a factor of about 7.5, from 1.73 per million in academic prose to 12.83 per million in COCA's magazine collection.

And the percentage of choosing snuck, given the choice between snuck and sneaked, varies from 35% in academic prose to 75% in COCA's spoken transcripts — that's the ratio of row 4 to the sum of rows 3 and 4.

But if we divide the sum of row 3 and row 4 by the sum of all the rows, we get a proportion of "sneaked" or "snuck" forms that is remarkably consistent across genres; and, of course, similarly for the complementary sum of rows 1 and 2:

form           spoken  magazine  newspaper  academic
snuck+sneaked   25.3%     22.6%      25.2%     26.6%
sneak+sneaks    74.7%     77.4%      74.8%     73.4%
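
For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the two ratios just described. The numbers are copied from the frequency table; the genre labels are my reading of the surrounding text rather than anything in the table itself, and the function names are purely illustrative.

```python
# Frequencies per million words, copied from the COCA table above.
# Genre labels are inferred from the surrounding text (an assumption).
GENRES = ["spoken", "magazine", "newspaper", "academic"]
FREQ = {
    "sneak":   [6.93, 8.64, 6.49, 0.99],
    "sneaks":  [0.48, 1.29, 0.77, 0.28],
    "sneaked": [0.64, 1.63, 1.77, 0.30],
    "snuck":   [1.87, 1.27, 0.68, 0.16],
}


def past_share(freq, i):
    """(sneaked + snuck) as a fraction of all four forms in column i."""
    past = freq["sneaked"][i] + freq["snuck"][i]
    total = sum(row[i] for row in freq.values())
    return past / total


def snuck_share(freq, i):
    """snuck as a fraction of (sneaked + snuck) in column i."""
    return freq["snuck"][i] / (freq["sneaked"][i] + freq["snuck"][i])


# Print both ratios for every genre as percentages.
for i, genre in enumerate(GENRES):
    print(f"{genre:10s} past {past_share(FREQ, i):6.1%}  "
          f"snuck|past {snuck_share(FREQ, i):6.1%}")
```

The first function gives the row 3 + row 4 share of the column total; the second gives the row 4 share of rows 3 and 4.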

In other words, the choice among abstract inflectional categories is much more consistent than either the choice among lexemes or the choice among inflectional variants.

As this morning's Breakfast Experiment™, I thought I'd check this in the LDC's conversational speech collection. The part of this collection that is indexed on line comprises 26,151,602 words of transcript. Adding this source to the previous tables, we get:

form     spoken    LDC  magazine  newspaper  academic
sneak      6.93   7.15      8.64       6.49      0.99
sneaks     0.48   0.73      1.29       0.77      0.28
sneaked    0.64   0.19      1.63       1.77      0.30
snuck      1.87   1.98      1.27       0.68      0.16
SUM        9.92  10.06     12.83       9.71      1.73

The snuck/(snuck+sneaked) percentage in this additional source is the highest of all, at 91%, compared to 75% in the COCA spoken category — as we expect, since the COCA "spoken" material is pretty formal in comparison. However, the snuck+sneaked and sneak+sneaks percentages remain rather consistent:

form           spoken    LDC  magazine  newspaper  academic
snuck+sneaked   25.3%  21.7%     22.6%      25.2%     26.6%
sneak+sneaks    74.7%  78.3%     77.4%      74.8%     73.4%
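
One rough way to put a number on "rather consistent" is to compare, for each of the three measures, the largest value across the five sources with the smallest. The sketch below does that in Python. The max/min ratio is my own choice of yardstick, not something from Michael's note, and the source labels are again inferred from the text.

```python
# Per-million frequencies from the five-column table above.
# Source labels are inferred from the text; "ldc" is the added
# conversational-speech column (labels are assumptions).
SOURCES = ["spoken", "ldc", "magazine", "newspaper", "academic"]
SNEAK   = [6.93, 7.15, 8.64, 6.49, 0.99]
SNEAKS  = [0.48, 0.73, 1.29, 0.77, 0.28]
SNEAKED = [0.64, 0.19, 1.63, 1.77, 0.30]
SNUCK   = [1.87, 1.98, 1.27, 0.68, 0.16]


def spread(values):
    """Ratio of the largest to the smallest value; 1.0 means constant."""
    return max(values) / min(values)


# Totals are recomputed from the rows, so they can differ from the
# table's SUM row in the last digit.
totals = [a + b + c + d for a, b, c, d in zip(SNEAK, SNEAKS, SNEAKED, SNUCK)]

# Choice of lexeme: how often any form of "sneak" turns up at all.
lexeme_spread = spread(totals)

# Choice of variant: snuck as a share of snuck + sneaked.
variant_spread = spread([k / (d + k) for d, k in zip(SNEAKED, SNUCK)])

# Choice of inflectional category: snuck + sneaked as a share of all forms.
category_spread = spread(
    [(d + k) / t for d, k, t in zip(SNEAKED, SNUCK, totals)]
)

print(f"lexeme   max/min: {lexeme_spread:.2f}")
print(f"variant  max/min: {variant_spread:.2f}")
print(f"category max/min: {category_spread:.2f}")
```

A ratio near 1 means the measure barely moves from source to source, so the smaller the number, the more consistent the choice.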

Michael's reference for the problem of "people choosing forms of words to express ideas" is to Donald Davidson's "A nice derangement of epitaphs", in Ernest Lepore, Ed., Truth and Interpretation, 1986. This paper is reprinted in Sharyn Clough, Ed., Siblings under the skin: Feminism, social justice, and analytic philosophy. Wherever you can find it, it's worth reading.

And now that I re-read it, I can see three or four topics worth taking up in future posts…


  1. Paul Kay said,

    February 2, 2011 @ 2:33 pm

    "In other words, the choice among abstract inflectional categories is much more consistent than either the choice among lexemes or the choice among inflectional variants." What are the lexemes being chosen among?

    [(myl) What I had in mind was a choice among (forms of) sneak vs. (say) creep, lurk, evade, enter discreetly, etc., as ways of talking about stealthy movement — or maybe the alternative choice to completely avoid the whole concept.

    And the whole quoted sentence was just summarizing the simple observation that the overall frequency of forms of sneak is very different in the different genres; and the proportions of "snuck" vs. "sneaked" is also very different; but the proportion of snuck+sneaked vs. sneak+sneaks is surprisingly constant.]

  2. michael ramscar said,

    February 2, 2011 @ 6:04 pm

    my memory is even more unreliable than yours, because i'd almost completely forgotten this! and on reading your post, i couldn't for the life of me recall why i thought it related to Davidson at all.

    but on second thoughts, i think i had this in mind:

    the consistency with which people tend to say "snuck" yet write "sneaked" begs a question: is it plausible to attribute all those instances of "sneaked" to people self-consciously correcting the colloquial spoken form as they write? (which was my first thought when i looked at the data.)

    another alternative — which is very much in the spirit of Davidson's "A nice derangement of epitaphs" — is that this might be a case where the "theories" people use in reading/writing and speaking/listening diverge slightly, so that rather than people unconsciously using "snuck" when speaking and deliberately choosing "sneaked" when writing, it may be that they simply tend to use "snuck" in context when speaking and "sneaked" in context when writing for the same reason: because this is what competent speakers and writers of English do.

    the Davidsonian question about all this would then be whether it makes more sense to think of people having a "theory" that underpins their use of language (which leaves one with the task of explaining why this "theory" seems to work differently when people speak or write), or whether it would make for less theoretical confusion to take this as evidence that people use (subtly) different "theories" when speaking and writing. (Uriel Cohen Priva at Stanford has some really nice results showing that people do the kinds of things that are often attributed to listener modeling even when they are copy typing, which i think are particularly fun to think about in this regard.)

    i think Davidson's overall attitude to whether we might get sensible answers to these questions is gloomier than it might be — probably because the kinds of things one can do with corpuses to get at (and model) "usage" were simply unimaginable when he was writing. (a thought that makes me appreciate the work of the people who curate and make available these amazing resources all the more.)

  3. The world is your lobster « Michael Ramscar said,

    February 2, 2011 @ 7:09 pm

    […] morning, Mark Lieberman over at Language Log wrote a fascinating post about how people use the word "snuck" in conversation, but […]

  4. Adrian Morgan said,

    February 3, 2011 @ 6:51 am

    I have a feeling, based entirely on introspection, that there is a sound-symbolic aspect to the "sneaked" vs "snuck" question.

    The short, sharp sound of the word "snuck" seems to better capture that instant when one notices that someone has sneaked/snuck away. Thus, "He must have sneaked out while I wasn't looking" seems to me less well-expressed than "He must have snuck out while I wasn't looking".

    The phonetically longer word "sneaked" might be argued as more expressive when the emphasis is on the time and care needed to sneak away. By this argument, "I carefully sneaked past the guard" is better than "I carefully snuck past the guard".

    As mentioned, this is entirely introspective, but it seems reasonable to hypothesise that sound-symbolic considerations play a part in some people's choice between "sneaked" and "snuck".

    As an idea on how to test this, one might expect that – among people with apparently free variation between the two forms – there'd be a slight tendency to use "sneaked" with a first person subject and "snuck" with a second or third person subject, simply because there's on average more cause to emphasise care when describing events from the sneak-er's perspective and more cause to emphasise the instantaneity of realisation when describing them from someone else's.

    Or not.

  5. Ellen K. said,

    February 3, 2011 @ 10:35 am

    Dirk's question has me wondering too. I get that snuck is spelt like words it rhymes with, thus why a C in that particular word. But why a c at all? Why sneak, peak, beak, and bake, cake, rake, but snuck, pluck, duck, and back, pack, rack?

  6. Language And said,

    February 3, 2011 @ 12:02 pm

    @ Ellen K.

    Orthography is a cruel mistress. Alternate (more pronunciation-appropriate) spellings include sneke, sneek, and sneake. The OED seems to indicate these spellings were prominent throughout the great vowel shift, which explains the "graphy" that is so "non-ortho."

    As for "snuck": the whole word is a late-19th-century, American invention according to the OED, "1887. Lantern (New Orleans) 17 Dec. 3/3 'He grubbed ten dollars from de bums an den snuck home.'" The "c" specifically, comes from like influence of other words ending in /-ək/: duck, pluck, stuck, muck, suck, f… uh, etc. As for my guess on why the "c" sticks at all in those (other than historical precedent and Moderns thinking orthography was a desirable goal), English tends to use double consonants after a vowel to show that that vowel is short and it is hard to get any shorter than a "ə." Resonant consonants are an exception to the rule, which seems to go back through Latin, ancient Greek, and I would assume (but do not really know) all the way to PIE.

  7. Rod Johnson said,

    February 4, 2011 @ 7:33 pm

    Orthographically, I think there's a link somehow to the phonological idea of light and heavy syllables. Consider Latin, where "heavy" (bimoraic) syllables can be VCC or VVC (where VV represents a long vowel), and light syllables are VC. So vowels before consonant clusters have to be short. I think this was a fairly general phenomenon across many IE languages. This is probably not true any more in English, but a strong pattern was established, especially in monosyllabic words. So it's easy to find (the modern reflexes of) long vowels before single consonants (rice, rose, daze, please) and short vowels before clusters (whisk, cost, best), but not quite as easy to find long vowels before clusters. But not that difficult: heist, beast, waste, boost, post. In many of those cases, the long vowels are orthographically distinguished (the digraphs ei, ea, oo, ee or the "silent e" at the end). So it's not super clear-cut, but the presence of a single vowel letter before a consonant cluster suggests the vowel is short.

    As Language And suggests, a lot of this got twisted around in the great vowel shift, so it's hard to make any claims about orthography that don't come loaded with exceptions.

  8. David said,

    February 7, 2011 @ 11:14 am

    @Language And: I believe 'short' and 'long' vowels are relics of our Middle English past. /ʌ/ and /æ/ are lax vowels, and, yes, standard English tends to spell single-syllable words ending in /k/ as if the vowel is lax. Tense vowels are either digraphs ("sneak") or use the silent "e" ("fake").

    And people say there's nothing about pronunciation in English spelling!

    If people want to complain, check out Victor Mair's articles on 他妈的中文!

  9. Peter said,

    February 8, 2011 @ 12:40 am

    > And the percentage of choosing snuck, given the choice between snuck and sneaked, varies from 35% in academic prose to 75% in COCA's spoken transcripts — that's the ratio of row 4 to the sum of rows 3 and 4.

    Very minor correction: surely the low end was not 35% in academic prose, but 0.68/(0.68+1.77) ≈ 28% in newspapers?

  "Snuck" sneaked in « Sentence first said,

    October 16, 2011 @ 3:08 pm

    […] attractiveness of snuck, he concludes that "basically, sneaked is toast". See also his corpus analysis of snuck vs. sneaked in different […]

