One of the benefits of checking linguistic hypotheses in real-world data is that you sometimes stumble on unexpected and potentially interesting patterns. This morning's Breakfast Experiment™ provides an example.

Yesterday, as I prepared for a seminar on prosody and syntax, the following passage caught my eye (in Gerrit Kentner and Isabelle Franz, "No evidence for prosodic effects on the syntactic encoding of complement clauses in German", Glossa 2019):

A language production experiment by Lee & Gibbons (2007) suggests that speakers use the unstressed optional complementiser that to maximise rhythmic alternation of weak and strong syllables, as it is more often produced when the top of the complement clause starts in a stressed (Lucy) as opposed to unstressed (Louise) syllable (1).

(1) Ian guessed (that) {Louise, Lucy} signed the contract

Since Kentner and Franz found a contrary result in their experiment, I thought I'd see whether the effect that Lee & Gibbons found was replicated in a more natural dataset. So I turned to Shuang Li's INTERVIEW: NPR Media Dialog Transcripts dataset, which contains 3,199,859 transcribed turns.

I selected 100 turns at random containing the verb guess with a sentential complement, e.g.

80598 14 DAVID GREENE, HOST You're a musician. I guess I wonder if there's a song of yours that you feel speaks to the kinds of tension that we've been talking about.

This requires some non-trivial annotation: guess can be a noun as well as a verb, what follows the verb guess might not be a complement clause, it's probably appropriate to exclude cases where the clause starts with the demonstrative that, etc. But I figured that a sample of 100, which just takes a few minutes to code, should be a good place to start.
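A first pass over the transcripts can be automated, though only up to a point. Here's a minimal sketch of a hypothetical Python filter (the names `GUESS_RE` and `candidate_hits` are illustrative, not part of the INTERVIEW dataset's tooling). It assumes each turn is available as a plain string, and it only flags whether that immediately follows a form of guess. It can't separate verb from noun, complement clauses from other continuations, or complementizer that from demonstrative that, so the hand-coding is still needed.

```python
import re

# Hypothetical first-pass filter: matches guess/guesses/guessed/guessing and
# captures an immediately following "that" if there is one. It does NOT
# distinguish verb from noun, or complementizer "that" from demonstrative
# "that" (e.g. "guess that's ..."); those decisions remain manual.
GUESS_RE = re.compile(r"\bguess(?:es|ed|ing)?\b(\s+that\b)?", re.IGNORECASE)


def candidate_hits(turns):
    """Yield (turn_index, matched_text, followed_by_that) for every match."""
    for i, turn in enumerate(turns):
        for m in GUESS_RE.finditer(turn):
            yield i, m.group(0), m.group(1) is not None


turns = [
    "You're a musician. I guess I wonder if there's a song of yours that "
    "you feel speaks to the kinds of tension that we've been talking about.",
    "I would guess that a private criminal defense attorney doesn't handle "
    "a caseload anywhere near that.",
    "That was a pretty good guess.",  # invented noun example
]
for hit in candidate_hits(turns):
    print(hit)
```

The third turn shows why the output is only a candidate list: the noun use is matched just like the verb.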

I classified each example as to whether the complementizer that was present or absent, and whether the first syllable of the complement clause was stressed or not (where non-focused pronouns, determiners etc. count as unstressed, as in the example above).

What I learned from this exercise is that the relevant features are very unevenly distributed in this material:

         stressed   not_stressed
that            0              2
no_that         7             91
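To put a rough number on how uneven this is: if that turns up in about 2 of every 100 random guess-plus-clause turns, collecting even a modest number of that cases at random means coding thousands of turns. A minimal sketch, assuming the 2/100 rate from this one small sample (very noisy, so a Wilson score interval is shown alongside) and a hypothetical target of 100 that cases; `wilson_interval` and `turns_needed` are illustrative helpers:

```python
import math


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def turns_needed(target, rate):
    """Expected number of random turns to code to collect `target` cases."""
    return math.ceil(target / rate)


k, n = 2, 100   # "that" present in 2 of 100 random guess+clause turns
target = 100    # hypothetical goal: 100 that-cases
lo, hi = wilson_interval(k, n)
print(f"rate {k / n:.3f}, 95% CI {lo:.4f} to {hi:.4f}")
print("turns needed at the point estimate:", turns_needed(target, k / n))
print("plausible range:", turns_needed(target, hi), "to", turns_needed(target, lo))
```

With a rate this uncertain, the estimate runs from about 1,400 to over 18,000 turns, which is why targeted sampling of …guess that… is the practical route.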

It's clear that a valid statistical test of the hypothesis would require much more hand-coding than I felt like doing for a Breakfast Experiment™. So next I decided to select 100 examples of …guess that… (where guess is a verb and that is a complementizer introducing a complement clause) and code the stress status of the following syllable. Combined with the previous set, the results were:

         stressed   not_stressed
that           25             77
no_that         7             91

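Stressed onsets make up about a quarter of the that cases (25 of 102) but only about 7% of the that-less ones (7 of 98). This certainly suggests that avoidance of stress clashes might be playing a role in complementizer deletion.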
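For a quick significance check on the combined table, here's a sketch of a two-sided Fisher exact test in plain Python (`fisher_exact_2x2` is a hypothetical helper; `scipy.stats.fisher_exact` should give essentially the same p-value if SciPy is available). Two caveats: because the that row was deliberately oversampled, the meaningful comparisons are the share of stressed onsets within each row and the odds ratio, not the raw rate of that; and the test does nothing about other factors that might differ between the rows.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # P(top-left cell == x) with all row and column totals held fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # small relative tolerance so ties with the observed table are counted
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Rows: that / no_that; columns: stressed / not_stressed (combined sample)
a, b, c, d = 25, 77, 7, 91
odds_ratio = (a * d) / (b * c)
p = fisher_exact_2x2(a, b, c, d)
print(f"odds ratio {odds_ratio:.2f}, two-sided p = {p:.4f}")
```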

But in doing the coding, I noticed something else. A substantial fraction of the …guess that… cases involved some quasi-modal stuff on the verb, e.g.

I would guess that a private criminal defense attorney doesn't handle a caseload anywhere near that.
I would guess that a decrease in temperature that was extreme might have, you know, some physiological impact,
And I would guess that they want to wait a little bit to make sure the security gains hold in Iraq.
I'm going to guess that The Bus, Jerome Bettis, their great running back, is going to run wild at home in Detroit
I'm going to guess that it was Oliver North in Virginia,
And so I'm going to guess that you're going to be good at this puzzle.
You can guess that the plaintiff's attorneys from that case probably wanted to bring that in
I can guess that of course it had to wash out the fireworks.
Would you guess that reality's a little different than what you get in a book?
But if you walked past John Blaufus on the streets of Portland, Oregon, or ran into him in one of the city's coffeehouses, you might never guess that this tattooed, shaggy-haired 26-year-old witnessed some of the worst the war had to offer.
I'd talked to him before a number of times, and you would never guess that he could have something like that in his past.
Today, sitting in her South Korean apartment, surrounded by puppies and Mickey Mouse dolls, you would never guess that she was a teenage smuggling kingpin.

Of the 100 …guess that… examples in my second sample, 45 are like those.
Of the 98 examples without that in the first sample, there's just one:

At a bare minimum, I would guess they're probably off by 20 percent.
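The same kind of comparison can be made for the modal marking: 45 of 100 …guess that… examples versus 1 of 98 that-less ones. A minimal sketch of an odds ratio with an approximate 95% Woolf (log-scale) confidence interval follows (`odds_ratio_ci` is a hypothetical helper). With a single example in one cell the interval is very wide, and the usual caveats about hand-coded, selectively sampled data apply.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio for [[a, b], [c, d]] with an approximate 95% Woolf CI.

    Assumes every cell is non-zero; with a count of 1 the interval is rough.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(odds ratio)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Rows: guess that / guess without that; columns: modal-marked / plain guess
or_, lo, hi = odds_ratio_ci(45, 55, 1, 97)
print(f"odds ratio {or_:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
```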

What does this mean? Maybe plain that-less "I guess" has been syntactically bleached into a kind of sentence adverbial, for which complementizer deletion doesn't arise because the complementizer was never there? There are certainly examples where that seems to be the only analysis:

And the plan was I guess just around 2:00 in the afternoon to break into song?
But seriously, when I was I guess 12 or 13, I had the kinds of concerns I think every kid at that age has, why am I here, what's it all about and all that sort of stuff.
There is I guess an impression that the Geneva Conventions were written to apply to two countries in an armed conflict
This is I guess the oldest opposition party in Egypt.

This hypothesis could be tested by looking at guess with other subjects, and at other verbs (though a similar process might be underway with some of them). But those questions will have to wait for another morning.


  1. Ryan Lai said,

    February 20, 2021 @ 8:51 am

    That *I guess* has been grammaticalised as a kind of discourse formula and not really a subject and a complement-taking verb (and hence such sentences are well along the way to becoming monoclausal) is quite well-established in the literature on object complements in English syntax, especially Thompson and Mulac (1991), Thompson (2002) and Diessel and Tomasello (2001). Interestingly, I looked up some of the most important multifactorial work on the alternation (Tagliamonte and Smith 2005, Torres Cacoullos and Walker 2009, Jaeger 2010) and none of them seem to have mentioned stress clash; Shank and Plevots (2018), the latest I know of from this literature, doesn't seem to mention it either. Perhaps something to add to the multiple regressions …

    (On a more tangential point, I think even in cases of more full-fledged matrix clauses, it's best to call it a marked vs unmarked complementation alternation, rather than calling it deletion and thus treating the case with an explicit complementiser as more basic. Most of the literature frames this alternation as *that* vs zero complementiser, which also avoids treating the overtly marked alternative as basic, though I think this still runs into the conceptual problems with zero marking that Martin Haspelmath discussed on his blog some time ago: https://dlc.hypotheses.org/1826 – and this case is actually more problematic than ones he discusses, because the zero is not signalling a different meaning from *that*.)

  2. Chris Button said,

    February 20, 2021 @ 10:17 am

    Stress clashes playing some role in complementizer deletion makes good sense.

    However, in addition to looking at other verbs, I would recommend comparing an American English corpus with a British one. The bleaching into a sentence adverbial strikes me as particularly American, especially when without the modal additions.

  3. Bob Ladd said,

    February 20, 2021 @ 11:17 am

    Chris Button is certainly right that bleached I guess sounds American in the UK, but for at least some UK speakers I reckon covers similar ground and may be almost as bleached. (And I reckon certainly doesn't have the same folksy sociolinguistic overtones in BrEng that it has in AmEng.) Unfortunately, you couldn't do a fair comparison to test the stress-clash idea, because reckon has its following unstressed syllable built in ahead of time.

    Kuijpers & van Donselaar (1998, behind a paywall at Language and Speech) found that the optional epenthesis of schwa to break up liquid-obstruent coda clusters in Dutch (e.g. "melluk" from canonical "melk") is influenced by the rhythmic context in a way very similar to the effect discussed here.

  4. mg said,

    February 20, 2021 @ 12:35 pm

    To be able to draw any valid conclusions, you really have to look at more than one verb. Otherwise, you have no idea if your results are particular to "guess" or generalize to other words.

    [(myl) True enough. But the point of my little exploration is that it's also a good idea to look at how people actually talk, not just how students respond to structured arrays of stimuli in the laboratory.]

  5. J.W. Brewer said,

    February 20, 2021 @ 6:20 pm

    That "grammaticalized" version of "I guess" can be freely inserted at various points in the sentence, many of which seem to block use of "that." So to take one of the examples above, "This is I guess the oldest opposition party in Egypt," you can't stick in a "that" after the "guess" (according to my native-speaker intuitions and those of others, I guess). But "I guess" is as far as I can tell serving exactly the same function in "I guess this is the oldest opposition party in Egypt," where you can stick in a "that" without it being ungrammatical, yet doing so sounds a bit weird to my ear, as if overly formal.

    But maybe the presence or absence of "that" in that context actually reveals two different uses of "guess." "I guess [NAME OF PARTY] is the oldest opposition party" is a bit of a self-deprecating hedge on the proposition "[NAME OF PARTY] is the oldest," whereas if you wanted a more performative verb because you were actually guessing at the answer in response to the question "What is the oldest opposition party?" it might be more natural to use "I guess that [NAME OF PARTY] is." Are the "modal" examples in the original post consistent with that, where the modal construction maybe helps emphasize that the speaker is, in fact, guessing rather than just hedging a bit on something believed to be true just in case it turns out not to be true?

  6. Gregory Kusnick said,

    February 21, 2021 @ 12:47 pm

    For me, "I guess" means something like "apparently". For actual speculation, I generally use "I'm guessing".

    So if there's a foot of snow in the street, "I guess I'm working from home today," but "I'm guessing they'll have it plowed by lunchtime."

  7. Martha said,

    February 22, 2021 @ 12:43 pm

    "I guess that" sounds like a proclamation to me. I can't shake the feeling that it sounds like something you'd say in a Clue-like game where you're supposed to use the word "guess" as in, "I guess that it was Colonel Mustard in the Library…"

    "I would guess" and "I'm guessing" sound more like actual guessing to me, while "I guess" means "apparently," as stated above.

  8. Viseguy said,

    February 23, 2021 @ 8:47 pm

    The that-less "I guess" feels close in meaning to the "like" interjection that was prevalent in the '00s and 'teens (but maybe less so today?):

    And the plan was I guess just around 2:00 in the afternoon to break into song?

    And the plan was, like, just around 2:00 in the afternoon to break into song?

    But seriously, when I was I guess 12 or 13, I had the kinds of concerns I think every kid at that age has, why am I here, what's it all about and all that sort of stuff.

    But seriously, when I was, like, 12 or 13, I had the kinds of concerns I think every kid at that age has, why am I here, what's it all about and all that sort of stuff.

    There is I guess an impression that the Geneva Conventions were written to apply to two countries in an armed conflict

    (?) There is, like, an impression that the Geneva Conventions were written to apply to two countries in an armed conflict

    This is I guess the oldest opposition party in Egypt.

    This is, like, the oldest opposition party in Egypt.

    You can't just mechanically substitute "like" in the that-ful examples, I think.

