Cynthia McLemore & Mark Liberman, Eds.

Proceedings of the IRCS Workshop on Prosody in Natural Speech
August 5-12, 1992
IRCS Technical Report No. 92-37

[Link to the whole proceedings; links to individual contributions are below]

Note: The summary following each paper is its abstract, where one was provided; in other cases, a short passage has been taken from the paper's introduction.


Janet Bing, “The Episode in Narration: The Interaction of Prosody and Discourse Markers”, pp. 1-8

The proposed algorithm segments a narrative into episodes (units larger than sentences) using both discourse markers and prosodic cues: fundamental frequency, pause, and declination. The episodes identified are not merely the result of physiological needs, but are thematically unified. In addition to identifying episodes, a combination of prosodic cues and discourse markers also identifies the major divisions of the narrative.

Susan Brennan, “Intonation as a Presentational Resource in Conversation”, pp. 9-19

In this paper I present some data about the intonational resources that people can use for grounding the meanings of utterances. My claim is that intonation not only conveys information about syntactic constituents (Cruttenden, 1986) and the speaker's intention (Sag & Liberman, 1974; Liberman & Sag, 1974; Pierrehumbert & Hirschberg, 1990), but also can be used to manage the exchange of evidence between two people in conversation, en route to achieving mutual understanding. In particular, I examine phrase-final rising intonation. It has been proposed by some that such intonation serves an interactional purpose (Brennan, 1990; McLemore, 1991), e.g. to elicit the attention of addressees, or to pursue a response. I will bring behavioral evidence to bear on the hypothesis that speakers use rising intonation to actively seek evidence of understanding from their addressees.

Janet Cahn, “An investigation into the correlation of cue phrases, unfilled pauses and the structuring of spoken discourse”, pp. 19-30

Expectations about the correlation of cue phrases, the duration of unfilled pauses and the structuring of spoken discourse are framed in light of Grosz and Sidner's theory of discourse and are tested for a directions-giving dialogue. The results suggest that cue phrase and discourse structuring tasks may align, and show a correlation for pause length and some of the modifications that speakers can make to discourse structure.

Troi Carleton, “French Liaison in Natural Discourse”, pp. 31-40

The immediate goal in this study is to arrive at an observationally adequate account of liaison. This we achieve by an inductive analysis of the distribution of liaison in a corpus of natural discourse. Crucial to this study and others like this is the corpus of natural discourse. It is only by looking at natural discourse that we can begin to see where current proposals fall short and new approaches are in order. An inductive approach here will allow us to capture generalizations in the data that simpler, more popular models may miss. It is only by inductively approaching a corpus of natural data that we may begin to understand what role phonological phenomena such as French liaison play in prosodic phrasing.

Wallace Chafe, “Intonation Units and Prominences in English Natural Discourse”, pp. 41-52

This paper emphasizes the need to understand prosody with relation to all of language functioning, and to take account of all observable properties of sound, including the basic dimensions of frequency, intensity, and duration, but also various derived properties such as tempo and voice quality, while at the same time attending to both the physical and perceptual manifestations of these properties. The discussion focuses (1) on the identification of intonation units, which are seen as reflecting a universal inability of consciousness to focus on more than a small amount of information at a time, and (2) on the complexly interrelated acoustic and perceptual manifestations of prominences within intonation units, as well as the functions of these prominences.

John Daly, “Phonetic Interpretation of Tone Features in Peñoles Mixtec”, pp. 53-62

Surface representations of tone in Peñoles Mixtec are derived from underlying representations which are determined largely by considerations of simplicity in the account of tone sandhi. There is tension between underlying tonal representations which simplify a description of tone sandhi and surface representations which lend themselves to straightforward phonetic interpretation. These demands are met within a theory of tone in which primary and register features are on separate planes and in which register has a cumulative effect.

Ronald Geluykens and Marc Swerts, “Prosodic Topic- and Turn-Finality Cues”, pp. 63-71

This paper describes an acoustic analysis of prosody in a variety of experimental types of dialogue. Subjects cooperatively had to perform a speaking task (i.e. describe simple rows of differently colored figures and signal their structure) and a listening task (i.e. respond to discourse boundaries in the speech produced by the interlocutor and take over as soon as the other had finished). It was found that the demarcation of discourse units by means of various intonation contours and accent shifts is largely dependent on the kind of discourse setting, in that speakers clearly take into account whether a conversational partner is likely to interrupt or not. Moreover, subjects appear not just to exploit local cues to signal the boundaries of larger-scale units. Our study reveals that they also have at their disposal: (1) a specific type of intonation contour (a level tone), occurring well before the actual end, that pre-signals that a unit will soon be rounded off; (2) topline-declination over the course of a topical unit that is different in final position than in non-final position; (3) a gradual shift in prominence in a NP from the adjective to the noun position over the course of a discourse unit.

Beth Ann Hockey, “Prosody and the interpretation of cue phrases”, pp. 72-78

Cue phrases such as okay and uh-huh are often multiply ambiguous. Native speakers' intuitions are that the various interpretations of these items are distinguished prosodically. Studies by Hirschberg and Litman [1, 2] confirm these intuitions for cue and non-cue uses of several items. This study shows that various cue uses of an item can also be distinguished prosodically. Based on data from task oriented dialogs, three recurring pitch contours were found to correlate with the presence or absence of two features of the discourse: pronominal anaphora and turn taking.

Jacqueline Kowtko, “Comparing Intonational Form with Discourse Function: A Study of Single Word Utterances”, pp. 79-82

Recent attempts to analyze the function of intonation in discourse (both monologue and dialogue) classify the data according to type of intonational tune [4, 7] and make a more or less general characterization of the discourse function associated with utterances containing the particular tunes [8, 5]. The literature shows convincingly that intonation signals boundaries in discourse structure, but lacks a clear specification of discourse function. A suitable discourse taxonomy is needed to fine-tune the relationship between intonation and discourse function. A recent analysis of dialogue [6] provides a framework of conversational games which allows more fine-grained examination of prosodic function. The current paper introduces an intonational analysis of single word utterances based upon such a framework and compares results in progress with previous work on intonation.

Mark Liberman, J. Michael Schultz, Soonhyun Hong, Vincent Okeke, “The Phonetic Interpretation of Tone in Igbo”, pp. 83-92

Igbo, a language of the Kwa branch of the Niger-Congo family, is spoken by about 15 million people in southeastern Nigeria. Its phonology, morphology and syntax have been widely studied (e.g. [1, 2]), especially with reference to the intricate patterning of lexical tone. This paper is a preliminary study of the phonetic interpretation of Igbo tone. We use an experimental method first applied to English ([4, 5]), in which a speaker varies pitch range orthogonally with variation in tonal material, and we compare the success of different models in characterizing the interaction of tone identity, phrasal position, tone sequence, and pitch range in determining patterns of measured F0 values. From the statistical structure of these data, we draw several conclusions about Igbo tone and its phonetic interpretation.

Margaret Luebs, “A Prosodic Analysis of Two Earthquake Narratives”, pp. 93-102

This paper analyzes the prosody of two narratives of the 1989 San Francisco earthquake, in order to show that a consideration of prosody can be an important part of narrative analysis. The focus is on two aspects of the narratives: their structure, and the humor used in them. It is shown that prosody plays an important role in delineating the structure of a narrative, and perhaps should be used as a criterion when choosing a theory of narrative structure. It is also shown that prosody has an equally important but less easily described role in signalling attempts at humor.

Victor Manfredi, “The Limits of Downstep in Agbo Sentence-Prosody”, pp. 103-116

A recorded corpus of some 80 nonspontaneous Agbo examples shows systematic resetting of downstepped pitch within the minimal sentence. As this phenomenon is not independent of a preceding downstep, and can never cumulate upward, it is precisely not 'upstep' (pace Meir et al. 1975; Snider 1990) but rather antidownstep or downstep-reset. Contra expectations of the reigning phonological model of downstep (e.g. Clements 1981), downstep-reset is limited neither to clausal boundaries (where trivially it does occur) nor to performance contexts of maintaining adequate pitch range. A first, impressionistic pass over the Agbo corpus readily identifies two linguistic contexts for downstep-reset:

  • After word final downstep before phrase boundary (tracks 2, 3, 13, 26, 28, 31, 33, 41, 48, 50, 52, 63, 70-72, 74, 79, 80). Most examples of this edge effect involve a PP or serial VP - neither type containing a pause.
  • After a verb in which lexical H and L are neutralized (tracks 21, 22, 28, 32-35, 37, 39, 41, 43, 45-47, 68-70, 72, 76, 77). This architone effect regularly occurs, inter alia, before the negative/relative suffix -ni.

In a framework of tone-metrical licensing (Bamba 1992, Manfredi 1992), the two downstep-reset contexts share one property: a H tone in a weak position. The configurations which predict weak H are found in surface syntax. Weak H also accounts for downstep-reset in the Abankelele dialect -- previously claimed to have a so-called 'upstep' juncture -- and in standard Igbo.

Cynthia McLemore, “Prosodic Variation Across Discourse Types”, pp. 117-128

In this study, I compare the frequency and distribution of a small set of prosodic features in two different types of discourses, or speech activities. The goals of this investigation are to refine methodologies for transcribing and characterizing intonational regularities in natural speech, and to uncover the ways in which intonational forms are used for particular, situated ends.

Corey Miller, “Prosodic Aspects of M.L. King’s ‘I Have A Dream’ Speech”, pp. 129-138

This research examines the prosodic characteristics of Martin Luther King's "I have a dream today" speech in an effort to better understand both the prosody of oratory and the prosodic qualities of King's speech that move people. The peroration of the speech was digitized and analyzed using the Waves program on a Sun SparcStation. Among the salient findings were King's sustained high pitch, several recurrent pitch patterns and various special effects. Many of these features are exemplified with reference to pitchtracks. Some discussion of the characterization of oratory with respect to speech and nonspeech modes of perception ensues.

Richard Oehrle & Malcah Yaeger-Dror, “Prosody and Information in Naturally-Occurring Discourse”, pp. 139-150

We wish to construct an account of prosody which is both coherent with principles of grammatical analysis and responsible to naturally-occurring, contextually-situated speech. This setting provides a domain in which it is possible to test, refine, and extend theoretical hypotheses in the light of empirical data.

Robin Queen, “Prosodic Organization in the Speeches of Martin Luther King”, pp. 151-160

The public speeches of Martin Luther King Jr. present an interesting juncture for the study of language as it pertains to culture (and vice versa) because of King's unique place within the cultural history of the United States and within the African-American community. There have been many studies of King's rhetorical style as well as the political and social implications of the content of his speeches; however, there has been very little work done in which the actual linguistic devices which he uses have been clearly identified and described with respect to both distribution and interpretation. This paper offers a first and preliminary account of certain aspects of King's language, with particular emphasis on his use of prosodic tools as a method of discourse organization and cultural reference.

Peter Roach, Nawal Ghali, & Simon Arnfield, “MARSEC: Design of a Machine-Readable Spoken English Corpus of British English”, pp. 161-170

In our attempts to make generalisations about intonation we depend heavily on the validity of the method used to represent prosodic information. In order to evaluate the relationship between intonation transcription and the physical properties in the speech signal we need a large sample of transcribed recordings; this paper describes work on such a corpus which also provides a considerable amount of grammatical information.

Stephan Schuetze-Coburn, “Prosodic Phrase as a Prototype”, pp. 171-180

The linguistic unit 'prosodic phrase' has an underlying if not overt syntactic basis in many phonological and descriptive accounts of prosodic structure. On the other hand, phonetically oriented definitions are usually too limited or vague, so that they fail in the analysis of natural, connected speech. The basis for avoiding phonetic substance or for not providing adequate phonetic detail is the apparent lack of a clear set of invariant phonetic cues with which the category 'prosodic phrase' may be defined. It is suggested that while this may indeed be the case, there are alternatives to searching for criterial attributes. Viewing the category 'prosodic phrase' as a prototype is one way of shifting the perspective away from the expectation of necessary and sufficient conditions and towards a characterization of 'prosodic phrase' which more accurately reflects even the variation found in spontaneous speech. Properties of prototypes in linguistic theory are examined, and the implications of considering a prosodic phrase category as a prototype are explored in the context of a German conversational narrative which has been analyzed auditorily into 'intonation units'.

S. Shattuck-Hufnagel, M. Ostendorf, & K. Ross, “Pitch Accent Placement within Words”, pp. 181-192

Two aspects of prosodic structure have been suggested as factors contributing to the early placement of prominence, sometimes called 'stress shift', in late-main-stress words: these two factors are rhythmic regularity and onset location of the pitch accent in its prosodic phrase. This paper reports data from a corpus of FM radio news stories showing support for both factors. When listeners label phrase-level prominences for each syllable in an utterance, they tend to report that the speaker has placed a prominence early in a late-main-stress target word when either (a) the first syllable of the following word also bears a prominence (rhythmic clash) or (b) the target word carries the first prominence of the prosodic phrase (onset marking). We also examined some of the acoustic correlates of early prominence labeling in a subset of the same target words. When a syllable early in the word (i.e. before the main-stress syllable) is labeled with a phrase-level prominence, it shows a substantial F0 movement compared with the same syllable in non-early-accent examples. These findings support the hypothesis that apparent stress shift is, at least in part, a matter of early pitch accent placement within the word.

Chilin Shih & Richard Sproat, “Variations of the Mandarin Rising Tone”, pp. 193-200

This paper uses the Mandarin rising tone (tone 2) as an example to illustrate the range of tonal variation in speech, from words read in isolation, words read in sentence frames, to words produced in conversation. It is shown that the 2nd Tone Sandhi Rule (2TS) described by Chao [1] is the result of a phonetic implementation rule, which applies to the low target of a rising tone in high tone context when the rising tone in question is in prosodically weak positions. The amount of pitch drop to the low target of a rising tone varies with the prosodic strength of the syllable. As a result, the pitch contour of an extremely weak rising tone in high tone context approaches the shape of a high level tone.

Antônio Simões, “The Phonetics of Discourse: Strong Syllable Positions in Mexican Spanish and Brazilian Portuguese”, pp. 211-220

The main objective of this investigation is the analysis of segment duration in discourse in order to provide a basis for the study of lexical and intonational stress in Spanish and Portuguese. The present study is limited to the study of lexical stress in discourse. Duration of syllable nuclei is studied acoustically and perceptually from spontaneous speech recordings.

Elizabeth Shriberg & Robin Lickley, “The Relationship of Filled-Pause F0 to Prosodic Context”, pp. 201-210

Filled pauses in spontaneous speech present problems for models of speech understanding and automatic speech recognition. A potentially important cue to their recognition by both humans and machines is their typically low F0 [9, 7]. The current paper discusses results of a study [10] which sought to determine whether the F0 of filled pauses is relative to, or independent of, the F0 of surrounding lexical material. Clause-internal filled pauses and preceding peak F0 values for speakers of American and British English were examined. Higher peaks were found to be systematically associated with higher filled-pause values within speakers, supporting the "relative" hypothesis. In modeling this relationship it was found that a linear model, in which filled-pause F0 was expressed as an invariant (over speakers) proportion of the distance between the preceding peak F0 and a speaker-dependent terminal low F0, produced results nearly identical to those of a two parameter model in which the coefficients of peak and terminal low F0 were allowed to vary freely. Analyses of additional variables showed the model to be less appropriate for filled pauses after sentence-initial peaks, but unaffected by temporal variables. These results suggest that clause-internal filled pauses, while lower in F0 than words in the message stream, nevertheless preserve information about the local prosodic context. Implications for psycholinguistics, speech recognition, and linguistic theory are discussed.

Marc Swerts & Ronald Geluykens, “The Prosodic Structuring of Information Flow in Spoken Discourse”, pp. 221-230

This paper describes a study on the prosodic demarcation of larger-scale topical units in spontaneous discourse, in terms of various melodic variables and pause structure. The research reported upon centers on a specific kind of spontaneous language use, viz. so-called instruction monologues of three different Dutch speakers. These monologues are such that macro-units can easily be specified on the basis of criteria which are independent of supra-segmental information. It was found that, in order to indicate which stretches of discourse constitute meaningful units, the three speakers indeed exploit both melodic variables (boundary tones, variable height of F0 maxima, overall downward tendency in pitch over the course of a topic) and pause structure (important points in the flow of information are marked with long pauses, the lengths of which depend on the deepness of the boundary). However, we also observed some speaker variation, in that not each of our informants appeared to use each of the prosodic demarcation devices to the same extent.

Marilyn Walker, “When Given Information is Accented: Repetition, Paraphrase and Inference in Dialogue”, pp. 231-240

A classic function of intonation is to indicate the distribution of given and new information in an utterance. This paper defines given in two ways: known and salient. It then examines 63 utterances from a radio talk show corpus to determine whether either definition of given is predictive of the intonational contours found in the corpus. Given as salient is found to reliably predict one class of contour: the sustained tones.

Anthony Woodbury, “Prosodic Elements and Prosodic Structures in Natural Discourse”, pp. 241-254

Although usually taken for granted, it is anything but clear that prosodic elements are organized into autonomous prosodic structures such as intonational phrases. A framework is outlined within which the structural and communicative organization of prosodic elements in samples of natural discourse might be discovered inductively. The framework assumes that the structural organization of a stretch of speech consists of the set of recurrent patterns it contains (including prosodic patterns), and that such patterns are recognizable to speakers. It is further hypothesized that in the normal or usual case, logically independent patterns (e.g., the placement of pauses vs. the placement of intonational cadences) will converge or unify; and that if they do not unify, speakers may draw special pragmatic inferences from this fact. Three samples of natural speech are analyzed in order to present the approach and demonstrate three key properties of the prosodic structure that it uncovers: (a) the potential independence of prosodic patterns and thematic structure; (b) the potential for bundles of prosodic elements to recur as prosodic 'macrostructures,' often associated by speakers with particular styles, contexts, and social personas; (c) the potential for prosodic patterns (and elements) to carry meaning that is iconic in character, but regulated by culturally specific conventions and practices.

Nicola Woods, “It’s not what she says, it’s the way she says it: the influence of speaker-sex on pitch and intonation patterns”, pp. 255-264

In this paper I discuss the relationship between speaker-sex and pitch and intonational features of language. I examine the spontaneous speech of male and female adults and children and pay specific attention to (i) pitch movements on nuclear syllables (what Halliday (1966) refers to as tone); (ii) pitch range; and (iii) maximum pitch. Results show that particular patterns of tone and pitch are characteristic of male and female speech.