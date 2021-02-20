

One of the benefits of checking linguistic hypotheses in real-world data is that you sometimes stumble on unexpected and potentially interesting patterns. This morning's Breakfast Experiment™ provides an example.

Yesterday, as I prepared for a seminar on prosody and syntax, the following passage caught my eye (in Gerrit Kentner and Isabelle Franz, "No evidence for prosodic effects on the syntactic encoding of complement clauses in German", Glossa 2019):

A language production experiment by Lee & Gibbons (2007) suggests that speakers use the unstressed optional complementiser that to maximise rhythmic alternation of weak and strong syllables, as it is more often produced when the top of the complement clause starts in a stressed (Lucy) as opposed to unstressed (Louise) syllable (1).

(1) Ian guessed (that) {Louise, Lucy} signed the contract
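The rhythmic-alternation idea can be made concrete with a toy representation. This is our own illustration, not a model from Lee & Gibbons or Kentner & Franz. Each syllable is written as S (stressed) or w (weak). A clash is two adjacent S syllables, and a lapse is two adjacent w syllables. The stress patterns below are hand-assigned assumptions for the words in (1).

```python
# Toy model of rhythmic alternation: each syllable is "S" (stressed) or "w" (weak).
# Stress patterns are hand-assigned for illustration only (an assumption).

def count_clashes(pattern: str) -> int:
    """Count adjacent stressed-stressed syllable pairs."""
    return sum(1 for a, b in zip(pattern, pattern[1:]) if a == b == "S")


def count_lapses(pattern: str) -> int:
    """Count adjacent weak-weak syllable pairs."""
    return sum(1 for a, b in zip(pattern, pattern[1:]) if a == b == "w")


GUESSED = "S"   # "guessed"
THAT = "w"      # unstressed complementizer "that"
LUCY = "Sw"     # "LU-cy"
LOUISE = "wS"   # "lou-ISE"

if __name__ == "__main__":
    for name, stress in [("Lucy", LUCY), ("Louise", LOUISE)]:
        for label, window in [("no that", GUESSED + stress),
                              ("that", GUESSED + THAT + stress)]:
            print(f"{name:6} {label:7} clashes={count_clashes(window)} "
                  f"lapses={count_lapses(window)}")
```

With Lucy, dropping that puts two stressed syllables next to each other ("guessed LU-"), and inserting it restores alternation. With Louise, the that-less version already alternates, and adding that creates a lapse. That is the asymmetry the passage describes.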

Since Kentner and Franz found a contrary result in their experiment, I thought I'd see whether the effect that Lee & Gibbons found was replicated in a more natural dataset. So I turned to Shuang Li's INTERVIEW: NPR Media Dialog Transcripts dataset, which contains 3,199,859 transcribed turns.

I selected 100 turns at random containing the verb guess with a sentential complement, e.g.

80598 14 DAVID GREENE, HOST You're a musician. I guess I wonder if there's a song of yours that you feel speaks to the kinds of tension that we've been talking about.

This requires some non-trivial annotation. Guess can be a noun as well as a verb, what follows the verb guess might not be a complement clause, it's probably appropriate to exclude cases where the clause starts with the demonstrative that, and so on. But I figured that a sample of 100, which takes just a few minutes to code, would be a good place to start.
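A crude pre-filter can narrow the turns down before hand annotation. The sketch below is hypothetical and is not the procedure used here. It assumes the transcript turns are available as a list of strings. It removes a few obvious noun uses ("a guess", "my guess", ...), keeps turns that still contain the bare form guess, and draws a reproducible random sample. Deciding whether guess actually takes a complement clause is left to the human coder.

```python
import random
import re

# Hypothetical heuristic: the bare word "guess" (not "guessed", "guessing").
GUESS = re.compile(r"\bguess\b", re.IGNORECASE)
# Some obvious noun uses to strip out before searching (an assumed, incomplete list).
NOUN_GUESS = re.compile(
    r"\b(?:a|the|my|your|his|her|our|their|best|good|educated)\s+guess\b",
    re.IGNORECASE,
)


def sample_candidates(turns, k=100, seed=0):
    """Return up to k randomly chosen turns that still contain 'guess'
    after obvious noun uses are removed. Hand annotation is still required."""
    candidates = [t for t in turns if GUESS.search(NOUN_GUESS.sub("", t))]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(candidates, min(k, len(candidates)))
```

Removing the noun contexts with `sub` instead of rejecting the whole turn keeps turns such as "my guess is ... but I guess ..." that also contain a verbal use.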

I classified each example as to whether the complementizer that was present or absent, and whether the first syllable of the complement clause was stressed or not (where non-focused pronouns, determiners etc. count as unstressed, as in the example above).
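The coding scheme amounts to two yes/no features per example, cross-tabulated into a 2×2 table. Here is a minimal sketch of that bookkeeping. The record type `Coded` and its field names are our own hypothetical choices.

```python
from collections import Counter
from typing import NamedTuple


class Coded(NamedTuple):
    """One hand-coded example (hypothetical record type)."""
    has_that: bool   # complementizer "that" present?
    stressed: bool   # first syllable of the complement clause stressed?


def tally(records):
    """Cross-tabulate coded examples into a that/no_that x stressed table."""
    counts = Counter(records)  # missing combinations count as 0
    return {
        row: {
            "stressed": counts[Coded(flag, True)],
            "not_stressed": counts[Coded(flag, False)],
        }
        for row, flag in [("that", True), ("no_that", False)]
    }
```

For example, the following records rebuild the counts reported for the first sample: `tally([Coded(True, False)] * 2 + [Coded(False, True)] * 7 + [Coded(False, False)] * 91)`.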

What I learned from this exercise is that the relevant features are very unevenly distributed in this material:

                stressed   not_stressed
    that            0            2
    no_that         7           91
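One way to see why a table this lopsided can't support a test is to run Fisher's exact test on it. The post itself runs no test. The sketch below is our own, written with only the standard library. It uses the common two-sided convention of summing the probabilities of all tables (with the same margins) that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell is x, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at least as extreme (no more probable) as observed;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
```

For the table above (rows that / no_that, columns stressed / not_stressed), `fisher_exact_two_sided(0, 2, 7, 91)` returns 1.0 up to rounding. With only two that examples, every possible table is at least as likely as the observed one, so the data are uninformative.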

It's clear that a valid statistical test of the hypothesis would require much more hand-coding than I felt like doing for a Breakfast Experiment™. So next I decided to select 100 examples of …guess that… (where guess is a verb and that is a complementizer introducing a complement clause), and code the stress status of the following syllable. Combined with the previous set, the results were:

                stressed   not_stressed
    that           25           77
    no_that         7           91
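The size of the association in the combined table can be summarized as an odds ratio. The second sample was drawn conditional on that being present, so row proportions are the meaningful comparison. The odds ratio remains interpretable under that kind of row-conditioned sampling. The sketch below is our own addition and is not part of the post. It uses the Woolf (log-scale) 95% interval, which requires all four cells to be nonzero, so it cannot be applied to the first table.

```python
from math import exp, log, sqrt


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio for [[a, b], [c, d]] with a Woolf (log-scale) confidence
    interval. All four cells must be nonzero."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)


if __name__ == "__main__":
    # Rows: that / no_that; columns: stressed / not_stressed (combined table).
    or_, lo, hi = odds_ratio_woolf(25, 77, 7, 91)
    print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

This gives an odds ratio of about 4.2, with an interval of roughly 1.7 to 10.3 that excludes 1. That is in the direction Lee & Gibbons predict, subject to the annotation caveats above.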

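This certainly suggests that avoidance of stress clashes might be playing a role in complementizer deletion: 25 of the 102 examples with that (about 25%) start the complement clause with a stressed syllable, versus 7 of the 98 without it (about 7%).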

But in doing the coding, I noticed something else. A substantial fraction of the …guess that… cases involved some quasi-modal stuff on the verb, e.g.

I would guess that a private criminal defense attorney doesn't handle a caseload anywhere near that.

I would guess that a decrease in temperature that was extreme might have, you know, some physiological impact,

And I would guess that they want to wait a little bit to make sure the security gains hold in Iraq.

I'm going to guess that The Bus, Jerome Bettis, their great running back, is going to run wild at home in Detroit

I'm going to guess that it was Oliver North in Virginia,

And so I'm going to guess that you're going to be good at this puzzle.

You can guess that the plaintiff's attorneys from that case probably wanted to bring that in

I can guess that of course it had to wash out the fireworks.

Would you guess that reality's a little different than what you get in a book?

But if you walked past John Blaufus on the streets of Portland, Oregon, or ran into him in one of the city's coffeehouses, you might never guess that this tattooed, shaggy-haired 26-year-old witnessed some of the worst the war had to offer.

I'd talked to him before a number of times, and you would never guess that he could have something like that in his past.

Today, sitting in her South Korean apartment, surrounded by puppies and Mickey Mouse dolls, you would never guess that she was a teenage smuggling kingpin.

Of the 100 …guess that… examples in my second sample, 45 are like those.
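These cases were identified by hand. A rough automatic approximation is possible if we assume the markers are limited to those in the examples above: would/'d, can/could, might, will/'ll, or going to, possibly followed by up to two words such as you or never before guess. The regular expression below is our own heuristic. It will miss some cases (curly apostrophes, for instance) and over-match others.

```python
import re

# Hypothetical heuristic: a (quasi-)modal, up to two intervening words, then "guess".
MODAL_GUESS = re.compile(
    r"\b(?:would|could|can|might|will|'d|'ll|going\s+to)\b(?:\s+\w+){0,2}?\s+guess\b",
    re.IGNORECASE,
)


def has_modal_guess(text: str) -> bool:
    """True if 'guess' appears with a preceding modal-like marker."""
    return MODAL_GUESS.search(text) is not None
```

Run over a list of coded sentences, `sum(map(has_modal_guess, sentences))` would give a count that can be checked against the hand classification.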

Of the 98 examples without that in the first sample, there's just one:

At a bare minimum, I would guess they're probably off by 20 percent.
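The contrast between 45 of 100 and 1 of 98 is large enough that a simple interval check makes the point. The sketch below is our own choice of method, not part of the post. It computes Wilson score 95% intervals for the two proportions. Non-overlapping intervals are a conservative sign that the difference is not a sampling accident.

```python
from math import sqrt


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Share of quasi-modal "guess" among examples with and without "that".
    with_that = wilson_interval(45, 100)
    without_that = wilson_interval(1, 98)
    print(f"with that:    [{with_that[0]:.3f}, {with_that[1]:.3f}]")
    print(f"without that: [{without_that[0]:.3f}, {without_that[1]:.3f}]")
```

The intervals come out at roughly 0.36 to 0.55 and 0.002 to 0.056, far apart. The Wilson interval is used here because the usual normal approximation behaves badly for a count as small as 1 of 98.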

What does this mean? Maybe plain that-less "I guess" has been syntactically bleached into a kind of sentence adverbial, for which complementizer deletion doesn't arise because the complementizer was never there? There are certainly examples where that seems to be the only analysis:

And the plan was I guess just around 2:00 in the afternoon to break into song?

But seriously, when I was I guess 12 or 13, I had the kinds of concerns I think every kid at that age has, why am I here, what's it all about and all that sort of stuff.

There is I guess an impression that the Geneva Conventions were written to apply to two countries in an armed conflict

This is I guess the oldest opposition party in Egypt.

This hypothesis could be tested by looking at guess with other subjects, and at other verbs (though a similar process might be underway with some of them). But those questions will have to wait for another morning.
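A first automated pass at the subject comparison might count, for each pronoun subject, how often guess is immediately followed by that. The sketch below is a crude, hypothetical regex count, not a method from the post. It skips contractions like that's, but it still counts demonstrative that (as in "I guess that makes sense"). It also cannot distinguish adverbial "I guess" from a verb taking a that-less clause, so its output would need hand checking.

```python
import re
from collections import Counter

# Hypothetical pattern: pronoun subject, "guess"/"guesses", optional following "that"
# (the negative lookahead skips contractions such as "that's").
PATTERN = re.compile(
    r"\b(I|you|we|he|she|they)\s+guess(?:es)?\b(\s+that\b(?!'))?",
    re.IGNORECASE,
)


def that_rate_by_subject(turns):
    """Map each (lowercased) pronoun subject to the share of its 'guess'
    matches that are followed by 'that'."""
    with_that, total = Counter(), Counter()
    for turn in turns:
        for m in PATTERN.finditer(turn):
            subj = m.group(1).lower()
            total[subj] += 1
            if m.group(2):
                with_that[subj] += 1
    return {s: with_that[s] / total[s] for s in total}
```

If that-less "I guess" has become a sentence adverbial, the that-rate for I should come out noticeably lower than for other subjects, once the matches have been hand-checked.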

