2022-03-13

I traditionally start my phonetics courses with an "over-under bet", about how much randomly-selected audio we need to listen to (and look at), before we find a systematic, interesting, and essentially unstudied phenomenon. In the case of English, I generally offer 20 seconds as the threshold value — for less well-studied languages like French or Chinese, the threshold might be 10 seconds. For understudied languages, 3 seconds.

This came up a few weeks ago in my corpus phonetics course, and so we took a look at the most recent Fresh Air podcast at that point: "With a nod to 'Lolita,' 'Vladímír' makes a sly statement about sex and power", 2/22/2022.

Here's the first bit of the show (a little less than 12 seconds):

This is Fresh Air.

Our book critic Maureen Corrigan says

Julia May Jonas's new first novel,

called Vladímír,

should spark a lot of heated discussions

on today's campuses.

And the first interesting-and-unstudied phenomenon turns up after about 6.2 seconds:

What's interesting about it?

Well, basically there's no sign of the /t/ in "first". The tiny (less than 10 msec.) low-amplitude region between the [s] and the [n] is typical of fricative-nasal transitions, as in "this novel" or "Miss Nancy". Listen to the /s/ of "first" and the first syllable of "novel", which just sounds like "sna":
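The post doesn't say how that short low-amplitude stretch was measured, and it was presumably read off the waveform and spectrogram by eye. Purely as an illustration, here is a minimal Python sketch that flags such gaps automatically using short-time RMS energy. The 2-ms frame size, the "10% of the loudest frame" threshold, the function names, and the synthetic test signal are all assumptions made for this example.

```python
import math

def frame_rms(samples, frame_len):
    """RMS amplitude of consecutive, non-overlapping frames of frame_len samples."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def low_amplitude_spans(samples, sample_rate, frame_ms=2.0, rel_threshold=0.1):
    """Return (start_s, end_s) for runs of frames whose RMS falls below
    rel_threshold times the RMS of the loudest frame (hypothetical helper)."""
    frame_len = max(1, round(sample_rate * frame_ms / 1000))
    rms = frame_rms(samples, frame_len)
    if not rms:
        return []
    peak = max(rms)
    spans, run_start = [], None
    # Append the peak as a sentinel so that a run reaching the end gets closed.
    for k, r in enumerate(rms + [peak]):
        if r < rel_threshold * peak:
            if run_start is None:
                run_start = k
        elif run_start is not None:
            spans.append((run_start * frame_len / sample_rate,
                          k * frame_len / sample_rate))
            run_start = None
    return spans

# Synthetic signal: 50 ms of a high-frequency stand-in for frication,
# an 8 ms near-silent gap, then 50 ms of a 200 Hz "nasal" tone.
sr = 16000
fric = [0.5 * (-1) ** n for n in range(800)]
gap = [0.01 * (-1) ** n for n in range(128)]
nasal = [0.3 * math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
print(low_amplitude_spans(fric + gap + nasal, sr))  # → [(0.05, 0.058)]
```

On real recordings the threshold would need tuning, since background noise and voicing during a closure can keep the RMS well above zero.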

Informed readers may object that this is just (an example of) the well-known phenomenon of "t/d deletion". We saw another case of vanishing /t/ in "On beyond the (International Phonetic) Alphabet", 4/19/2018. But as far as I know, no one has looked at what happens to /t/ in this particular context: Vst#nV (VOWEL + /s/ + /t/ + WORD BOUNDARY + /n/ + VOWEL). As in the "ists" case discussed in the cited LLOG post, it's quite unclear whether this case of /t/ allophony (contextual variation in pronunciation) should be handled symbolically (i.e. viewed as deletion of a discrete segment or feature in some mental version of a phonological representation) or rather as the end state of a phonetic process of lenition (= weakening).
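To make the Vst#nV context concrete, here is a small sketch that checks whether a two-word sequence fits the pattern, given ARPAbet pronunciations in the style of the CMU Pronouncing Dictionary. The `PRON` table is a hand-written toy, and `is_vowel` and `matches_vst_nv` are hypothetical helpers, not part of any existing tool.

```python
# Toy pronunciations in ARPAbet, in the style of the CMU Pronouncing Dictionary.
# This table is a hand-written illustration, not an excerpt of any real resource.
PRON = {
    "first": "F ER1 S T",
    "novel": "N AA1 V AH0 L",
    "just": "JH AH1 S T",
    "not": "N AA1 T",
    "name": "N EY1 M",
    "lost": "L AO1 S T",
    "money": "M AH1 N IY0",
}

def is_vowel(phone):
    """ARPAbet vowels carry a stress digit (0, 1, 2); consonants don't."""
    return phone[-1].isdigit()

def matches_vst_nv(word1, word2, pron=PRON):
    """True if word1 ends in vowel + /s/ + /t/ and word2 begins with /n/ + vowel,
    i.e. the two-word sequence spans the context Vst#nV."""
    p1, p2 = pron[word1].split(), pron[word2].split()
    ends_vst = len(p1) >= 3 and is_vowel(p1[-3]) and p1[-2:] == ["S", "T"]
    starts_nv = len(p2) >= 2 and p2[0] == "N" and is_vowel(p2[1])
    return ends_vst and starts_nv

for pair in [("first", "novel"), ("just", "not"), ("first", "name"), ("lost", "money")]:
    print(pair, matches_vst_nv(*pair))
```

Under this narrow definition "first novel", "just not" and "first name" qualify, while "lost money" (an /st#m/ sequence) does not; allowing other nasals in the second word's onset would be a one-line change.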

There are some relevant discussions in the literature: Jeffrey Kallen, "Internal and external factors in phonological convergence: the case of English /t/ lenition", 2005; Lisa Davidson, "Characteristics of stop releases in American English spontaneous speech", 2011; Patrick Honeybone, "Lenition in English", 2012, etc. And I addressed the general problem in "Towards progress in theories of language sound structure", 2018, though without discussing the t/d lenition/deletion case.

All the same, this particular case counts as empirically unstudied, as far as I know. So for this morning's Breakfast Experiment™, I looked at 100 randomly-selected examples of the word sequence "first novel". As in some previous posts and in some lecture notes from last spring's Syntax and Prosody seminar, I took them from Shuang Li's INTERVIEW: NPR Media Dialog Transcripts dataset, which contains 3,199,859 transcribed turns from 105,817 NPR podcasts, comprising more than 10,648 hours.
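The post doesn't describe how the 100 examples were drawn. One simple approach, sketched here under the assumption that the transcribed turns are available as a plain Python list of strings (the dataset's actual file format and fields aren't shown), is to collect every match of the word pair and take a seeded random sample. The function name and parameters are hypothetical, and locating each hit in the audio is a separate step not covered here.

```python
import random
import re

def sample_bigram_hits(turns, w1, w2, k=100, seed=0):
    """Return up to k randomly chosen (turn_index, char_offset) hits of "w1 w2".

    Matching is case-insensitive and whole-word, so "first novelist" is skipped.
    """
    pattern = re.compile(
        r"\b" + re.escape(w1) + r"\s+" + re.escape(w2) + r"\b", re.IGNORECASE
    )
    hits = [(i, m.start())
            for i, turn in enumerate(turns)
            for m in pattern.finditer(turn)]
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(hits, min(k, len(hits)))

turns = [
    "Her first novel was a hit.",
    "It's the FIRST Novel prize.",
    "A first novelist.",
    "No match here.",
]
print(sample_bigram_hits(turns, "first", "novel"))
```

Sampling from the full list of hits, rather than from turns or episodes, gives every token an equal chance of selection, though it also lets prolific speakers contribute more tokens.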

I don't have time now for a full discussion of the results, but here's a quick summary of the highlights.

There were just two (of 100) cases where the /t/ was released — I excluded those from further analysis.

In all the others, I measured the durations of

(a) the /ɚ/ vowel in "first"

(b) the /s/ frication in "first"

(c) the /t/ closure if any (duration 0 if absent)

(d) the /n/ nasal murmur from "novel".

In 21 of the remaining 98 examples, the /t/ closure duration was 0. Some duration correlations:

/s/,/t/: -0.531

/n/,/t/: -0.284

/ɚ/,/t/: -0.022

/ɚ/,/s/: 0.323
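For concreteness, here is how correlations like these can be computed from per-token measurements. This is a minimal Python sketch assuming ordinary Pearson correlation (the post doesn't say which coefficient was used), with one record per token and a /t/ closure of 0 when none was found. The `Token` class and its field names are hypothetical, and the numbers are invented toy values, not the measurements behind the figures above.

```python
import math
from dataclasses import dataclass

@dataclass
class Token:
    """Durations (ms) for one "first novel" token; field names are hypothetical."""
    er_ms: float   # (a) /ɚ/ vowel in "first"
    s_ms: float    # (b) /s/ frication in "first"
    tcl_ms: float  # (c) /t/ closure, 0.0 if absent
    n_ms: float    # (d) /n/ nasal murmur in "novel"

def pearson(xs, ys):
    """Ordinary Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def duration_correlations(tokens):
    """Correlations for the same segment pairs as listed in the post."""
    cols = {
        "/ɚ/": [t.er_ms for t in tokens],
        "/s/": [t.s_ms for t in tokens],
        "/t/": [t.tcl_ms for t in tokens],
        "/n/": [t.n_ms for t in tokens],
    }
    pairs = [("/s/", "/t/"), ("/n/", "/t/"), ("/ɚ/", "/t/"), ("/ɚ/", "/s/")]
    return {f"{a},{b}": pearson(cols[a], cols[b]) for a, b in pairs}

# Invented toy values, chosen so that /s/ and /t/ closure trade off exactly.
toy = [
    Token(er_ms=80, s_ms=100, tcl_ms=0, n_ms=60),
    Token(er_ms=95, s_ms=90, tcl_ms=10, n_ms=50),
    Token(er_ms=85, s_ms=80, tcl_ms=20, n_ms=55),
    Token(er_ms=90, s_ms=70, tcl_ms=30, n_ms=45),
]
print(sum(t.tcl_ms == 0 for t in toy), "tokens without a /t/ closure")
for pair, r in duration_correlations(toy).items():
    print(f"{pair}: {r:.3f}")
```

Coding an absent closure as 0 ms, as the post does, keeps the zero-closure tokens in the same analysis instead of splitting them off as a separate category.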

Here are some 2-d kernel density plots:

There's little evidence here for the kind of crisp bimodal distribution that we would expect if the underlying phenomenon was qualitative segment deletion. Rather, we see multivariate correlations of the type that we expect given gestural phasing among the tongue, the larynx, and the velum, along with some tricky physics creating the "quantal effects" that give us relatively sharp acoustic boundaries between [ɚ] and [s], [s] and [tcl] (= "t closure"), [tcl] and [n].
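One crude way to probe for the "crisp bimodal distribution" that a categorical-deletion account predicts is to smooth a set of durations with a Gaussian kernel density estimate and count its local maxima. The sketch below does this in one dimension, whereas the post's plots are two-dimensional. The bandwidths, grid, function names, and toy data are all assumptions for illustration, not the procedure behind the plots.

```python
import math

def gaussian_kde_1d(data, bandwidth):
    """Return a function x -> Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
    return density

def count_modes(data, bandwidth, lo, hi, steps=500):
    """Count interior local maxima of the KDE on an even grid over [lo, hi]."""
    f = gaussian_kde_1d(data, bandwidth)
    ys = [f(lo + (hi - lo) * i / steps) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] >= ys[i + 1])

# Toy closure durations (ms), invented for illustration.
categorical = [0.0] * 10 + [40, 42, 44, 46, 48] * 2  # "deleted" vs. "intact"
gradient = list(range(0, 41, 2))                      # an even spread, 0 to 40 ms
print(count_modes(categorical, 4.0, -20, 80))
print(count_modes(gradient, 6.0, -20, 80))
```

The grid starts below zero so that a peak at exactly 0 ms counts as an interior maximum. Mode counts from a KDE depend heavily on the bandwidth, so this is only a rough illustration, not a substitute for the plots or for a formal test such as Hartigan's dip test.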

There are obviously many uncontrolled covariates here, among them the stress/focus pattern (is the speaker contrasting the "FIRST novel" as opposed to later novels?), the speaker ID (because a handful of different hosts are among the speakers), the novelty of the phrase in context, and so on. We should look at more word sequences (e.g. "lost money", "just not", "first name", "less money", …), and more diverse data sources.

But at least this is a start, and it's all I have time for this morning.

