Stress, emphasis, pause, and meaning in Mandarin


In "Mandarin Janus sentences" (11/4/17), there arose the question of whether duōshǎo 多少 ("how many") and duō shǎo 多少 ("how few") are spoken differently.  I'm very glad that, in the comments, Chris Button recognizes that Sinitic languages can have stress.  (The same is doubtless true of other tonal languages).

This is an aspect of Mandarin and the other Sinitic languages that most scholars completely ignore and even disavow.  I've written about it from time to time on Language Log, e.g.:

"When intonation overrides tone" (6/4/13)
"When intonation overrides tone, part 2" (5/11/17)
"Tones and the brain" (3/3/15)
"Dissimilation, stress, sandhi, and other tonal variations in Mandarin" (8/26/14)
"slip(per)" (7/22/14)
"Mandarin by the numbers" (6/8/13)
"Where did Chinese tones come from and where are they going?" (6/25/13)
"Pinyin memoirs" (8/13/16)

In the next to the last post, I note that University of Oslo student Øystein Krogh Visted has recently (2012) written a very interesting M.A. thesis entitled "Nuances of Pronunciation in Chinese:  Lexical Stress in Beijing Mandarin."  Here's a brief description of the thesis:

The pronunciation of Beijing Mandarin, which is the basis for Modern Standard Mandarin, is in reality not as straightforward as it is usually presented. General books on the language and common textbooks in English on the subject usually only give very basic, prescriptive (though supposedly descriptive) analyses of the basic features of pronunciation. Finer points are generally not discussed in any detail. The treatment of amongst other things the aspect of word stress (the parts of words that are emphasized in speech) in mastering and indeed properly understanding Chinese is thus neglected. It has not yet acquired the position in Chinese language-teaching it arguably needs, so that the language may begin to be taught and indeed learned in a more comprehensive manner. This book will take a basic analytical approach to the phenomenon of word stress in Beijing Mandarin. It compares and discusses available meta-information on the topic, as well as its theoretical underpinnings and practical applications, and from a pedagogical starting point aims to bring attention to these important nuances in the Chinese language.

See especially this comment to the last post:

Reading pinyin text for me is as easy as reading English, and I can skim-read it the way I do English. I prefer the texts not to have tone marks, because I have to make an effort to block them out, just as I would have to make an effort to block out accent and stress marks if they were included in normal English text. In this sense, what Wang Yujiang mentioned in several of his comments is true (see especially his excellent response to Cory Lubliner): when Chinese speak or read out a text, they do not enunciate the tones one by one as they are marked in a dictionary. Rather, they develop a rhythm in their reading / speech / singing (for that matter) in which emphasis, stress, and overall "feel" of a sentence / utterance become dominant, rather than the canonical dictionary entry tonal categories of individual characters. This is a phenomenon that a few Czech phoneticians have observed, and Christoph Harbsmeier (the German-Norwegian-Danish Sinologist) has paid particular attention to. The problem is that it's virtually impossible to predict how this will turn out ahead of time for discrete characters. The flow of a sentence or utterance only happens in real time and under the emotions of the moment. Of course, if one is anal about it, one could devise means for notating such spoken sentences once they were uttered, but I don't know how useful that information would be for pedagogical purposes, and to what purpose one would put it other than for phonological research.

Without mentioning names, I know non-native speakers who have astonishingly good mastery of tones for thousands of characters, some of whom even wag their fingers or bob their heads in the air when they pronounce the tones as they are speaking or reading Chinese (it's very painful to watch). The best speakers of Chinese that I know (and here again I'm not mentioning names, though it would be very easy to list a dozen or so of the best), almost uniformly, are not tied to the individual characters / syllables, but rather have developed the ability to grasp the overall sound pattern of whole sentences. It is very impressive (and satisfying) to listen to them do this, and some of them develop this ability very quickly, already within the first year of their study of Mandarin or Cantonese or Taiwanese, or whichever Sinitic language they are studying. In no case are such masters of spoken Chinese languages fixated on the characters.

Nor, I would add, are they fixated on the tones.  Real speakers of Mandarin (and other Sinitic languages) are not robots.  They do not utter sentences and paragraphs as though they were matching the canonical, citation tones listed for characters in dictionaries mechanically one after another to the syllables of their speech.  Rather, human speech has a rhythm and a flow through which it imparts meaning and emotion.

Phoneticians, psycholinguists, and other specialists have studied the phenomenon of pitch at the lexical level and at the sentence level, but the results of their research are not well known (or known at all) to Sinologists and Chinese language teachers.

A couple of citations:

Jie Liang and Vincent J. van Heuven, "Chinese tone and intonation perceived by L1 and L2 listeners", in Tomas Riad and Carlos Gussenhoven, ed., Tones and Tunes: Experimental studies in word and sentence prosody, pp. 27-62.

Shu-hui Peng, Marjorie K. M. Chan, Chiu-yu Tseng, Tsan Huang, Ok Joo Lee, and Mary E. Beckman, "Towards a Pan-Mandarin System for Prosodic Transcription", in Sun-Ah Jun, ed., Prosodic Typology:  The Phonology of Intonation and Phrasing, Vol. 1, pp. 230-270.

In the preceding two-paragraph quotation, I mentioned Czech scholars who have paid attention to these aspects of Mandarin speech.  Chief among them is the phonologist Oldřich Švarný, who recorded huge quantities of the beautiful Pekingese speech of Tang Yunling and analyzed it in terms of stress patterns.  Christoph Harbsmeier, whom I also mentioned above, has arranged for the digitization of this enormous corpus, which makes these invaluable recordings available for further and more sophisticated studies (now that more advanced hardware and software have been developed).

Even more wonderful, Harbsmeier has loaded all of the digitized spoken material from Švarný-Tang into a beta website called MAID (Mandarin Audio Idiolect Dictionary).  This makes the material easily accessible to all who are interested in pursuing research on conversationally spoken, not read, Mandarin.  For each line of the transcript, you can open a window that displays the following:  waveform & spectrogram, formants, pitch, and selection stats.  Harbsmeier has informed me that he and his team have also applied Praat-style analysis to the recordings so that we can see where the stress is.  Through all of these devices, the phonetic features of Tang laoshi's speech are made visible.

Now, I invite you to the treat of listening to the 2,200 occurrences of 多少 in the Švarný-Tang corpus as it is recorded in Harbsmeier's MAID.  I think you will be astonished at the wide variation for just this one lexeme as it is realized in the living speech of a reliable native informant.  Enjoy!


  1. Noel Hunt said,

    November 8, 2017 @ 6:45 pm

    Thanks for the references to works on stress. This is an important area if one wants to speak 'flawless' Chinese, but it's not all. I would also like to see work done on 'articulatory setting' in Chinese. I think a fair amount of work on English and French has been done on this topic, and to some extent Japanese (Timothy Vance) but I have never seen discussions of articulatory setting in Chinese.

  2. Michael Watts said,

    November 8, 2017 @ 7:17 pm

    I had a native Mandarin speaker tell me that Beijing 北京 is stressed on the first syllable. Unfortunately, I'm not able to hear that — for me, tones 1 and 4 are strong, and tones 2 and 3 are weak, and my mental phonology does not allow for lexical stress to fall on a weak syllable. The tone sequence of 北京 is 3-1, and I inevitably perceive stress on the 京. :-(

  3. Mark Liberman said,

    November 9, 2017 @ 7:57 am

    For those interested in Mandarin stress, I recommend the works of San Duanmu, perhaps starting with his chapter on "Syllable Structure and Stress" from the Handbook of Chinese Linguistics, and including his book The Phonology of Standard Chinese.

    Relevant work that I've been involved with includes "A cross-linguistic study of prosodic focus", IEEE ICASSP 2015, "Investigating Consonant Reduction in Mandarin Chinese with Improved Forced Alignment", InterSpeech 2015, and "Prosodic Strength Intrinsic to Lexical Items: A Corpus Study of Tone Reduction in Tone4+Tone4 Words in Mandarin Chinese", ICSLP 2016.

  4. Victor Mair said,

    November 9, 2017 @ 12:21 pm

    One of Švarný's students, Hana Třísková, has followed in his steps. She writes: "Yes, I continue working on stress (in fact, on NON-STRESS / reduction, which, in my view, is more interesting and phonologically important in Chinese than stress)".

    Here are a couple of recent papers by her:

    Třísková, Hana. De-stressed words in Mandarin: drawing parallel with English. In: Hongyin Tao ed. Integrating Chinese Linguistics Research and Language Teaching and Learning. Amsterdam / Philadelphia: John Benjamins, 2016. pp. 121–144.

    Třísková, Hana. De-stress in Mandarin: clitics, cliticoids and phonetic chunks. In: Istvan Kecskes and Chaofen Sun eds. Key Issues in Chinese as a Second Language Research. New York and London: Routledge, 2017. pp. 29–56.

    ISBN: 978-1-138-96053-4

    Třísková, Hana. Acquiring and teaching Chinese pronunciation. In: Istvan Kecskes ed., Explorations into Chinese as a Second Language. Cham: Springer, Educational Linguistics series, 2017. pp. 3–30.
    (broader topic – how to teach Mandarin pronunciation as such)

    If someone is really interested in any of these papers, I have pdfs. Additional papers by Hana are available here:

  5. Chris Button said,

    November 9, 2017 @ 1:46 pm

    @ Noel Hunt

    I think "Articulatory Setting" has not garnered much attention from linguists because it is too rigid and all-encompassing. While I do feel it can on the whole be better accounted for by more traditional explanations, it does nonetheless highlight a very important aspect of pronunciation – namely that phonemes are essentially abstract and accordingly counter-intuitive to how we generally process speech in syllabic chunks (the syllable being essentially schwa whether underlyingly inherent or overtly manifested).

    For example, a /t/ phoneme is only really identifiable through the formants in its surrounding vocalic environment which will vary depending on whether the /t/ is alveolar, lamino-dental, retroflex etc. This is why the /i:/ in "tea" said by a typical speaker of Indian English after an unaspirated retroflex will sound different from the RP or GenAm /i:/ after an aspirated alveolar. As such, it is not simply the consonant that is different, but also the effect it has on the vowel. This can also go the other way – for example the "dark-l" when used in initial position by North Americans may be lighter (i.e. more like a British pronunciation) before certain vowels. In short, although the distinction between consonants and vowels can be refuted on an underlying phonological level (as has been argued for some living languages, and always eventually ends up being the case for reconstructed ones such as Old Chinese and Proto-Indo-European when presented without agenda), a distinction still needs to be maintained on the surface phonetic level, but even there they are crucially still mutually dependent on one another in speech.

    Since we like to make speech as easy as possible on our articulators, we predict coming sounds and do not stray any further than is required in order to make the next one – the cumulative effect of this behavior leads to what I think is being termed as "articulatory setting". However, rather than being a "setting", it is simply a reflection of our articulators becoming used to certain articulatory positions and then maximising the efficiency in how we manipulate them in sequences. This is why the initial stages of learning a foreign language with very different sounds from one's own can be physically quite tiring but over time people's mouths adjust as appropriate. If someone wants to mimic a specific accent, I would say the most important aspect is to nail the articulation perfectly and the supposed "articulatory setting" will just be a natural consequence of it.

  6. Chris Button said,

    November 9, 2017 @ 2:23 pm

    I should probably add that it is this mutual dependence/influence which is behind diachronic sound change. To continue with the "ti" example above, that is why the "t" has palatalised to /tʃ/ or /ʃ/ in "question" or "nation" before what was originally a high front vowel that, along with "o", is now simply schwa in an unstressed syllable. In terms of synchronic analysis, we then need to arbitrarily decide whether /tʃ/ and /ʃ/ are simply allophones of /t/ or alternatively independent phonemes regardless of their historical origin.

  7. Jerry Friedman said,

    November 9, 2017 @ 2:33 pm

    Michael Watts: When I took a class in Chinese literature in translation from Prof. Y. K. Kao, I got the impression that all Chinese bisyllables were accented on the first syllable. But I couldn't have told you the tone of any syllable I heard him say.

    (At the beginning of the semester, he gave us a handout by Prof. Lynn White with some guidance on pronouncing Wade-Giles and Pinyin, but he said we weren't going to worry about pronunciation. I did notice a slight change in his facial expression when a student read "Liu I" as "Leeoo One."

  8. Jerry Friedman said,

    November 9, 2017 @ 2:33 pm

    Have a right paren: )

  9. ~flow said,

    November 9, 2017 @ 3:23 pm

    FWIW there's a 2016 paper by Zuzana Pospěchová on the Prosodic Transcription of Standard Chinese (PTR) that makes use of the system devised by Oldřich Švarný.

    A short sample:

    PY: Zuótiān Zhāng lǎoshī qǐng wǒmen qù tā jiā chīfàn.
    PTR: Zuótiān, zhāng-lao³shī, qǐng-women-qu⁴ ta¹-jiā chī-fàn.


    Tones are indicated by both diacritics and superscript numbers; the diacritics are for full tones, and the numbers are for weakened tones ("weakened tone ictus-bearing syllables", the paper says). Thus, 'lao³shī' would appear to indicate that both syllables do bear tones, but the second syllable is more prominent. I found this interesting because that closely matches the way I learned the word, whereas David Marjanović reported that "my textbook had "teacher" as lǎoshi, acknowledging the fact that the second syllable is toneless"; maybe there's more than one correct way to pronounce this word.

  10. Bathrobe said,

    November 9, 2017 @ 6:58 pm

    I think lǎoshi is more appropriate for 老是 than for 老师.

  11. Eidolon said,

    November 9, 2017 @ 7:21 pm

    How does one "teach" lexical stress? I must confess this concept is alien to me, as I have never encountered, either in foreign language learning or native language learning, pedagogical approaches to imparting an instinct of lexical stress.

  12. Jonathan Smith said,

    November 9, 2017 @ 7:50 pm

    Thanks to MYL for the links above. A point relevant to some of the above discussion is at p. 14 of Duanmu's chapter: "intuitive agreement on the stress difference between two heavy syllables, such as [jou dəŋ] ‘oil lamp’ in Beijing, is hard to obtain." I.e., speakers simply do not agree which if either syllable of lao3shi1, etc., is stressed, or (more often) simply find the question mystifying. It's not clear which phonetic device(s) might be leveraged to mark emphasis in such cases. I see little benefit in a Švarný-style "phonetic" transcription.

    Obviously, intonation is an important thing, and "neutral tone" is an important thing not least as it is a kind of stress loss — driver of "tone sandhi" processes across Sinitic and typologically similar languages. But my naive view is that it would be hard to disprove that Mandarin disyllabic words all have underlying initial stress and that a complex package of factors [compare those noted in the "Corpus Study of Tone Reduction…" abstract above] determines whether or not the second syllable surfaces with neutralized tone + attendant segmental reductions. This seems basically consistent with Duanmu's argument… I think.

  13. Chris Button said,

    November 9, 2017 @ 10:32 pm

    Thus, 'lao³shī' would appear to indicate that both syllables do bear tones, but the second syllable is more prominent. I found this interesting because that closely matches the way I learned the word, whereas David Marjanović reported that "my textbook had "teacher" as lǎoshi, acknowledging the fact that the second syllable is toneless"; maybe there's more than one correct way to pronounce this word.

    This is probably due to the standard lexical stress for the word in isolation varying in certain contexts.

    To give an example from English following the John Wells approach, a word like 'funda'mental has two stressed syllables "fund" and "ment" to attract accents. In isolation, the nuclear tone falls on "ment" as the last stressed syllable to give 'funda\mental (dictionaries usually refer to this as secondary and primary stress since they don't mark intonation tones). However, when combined with 'problem (or rather \problem in isolation), a native speaker tends to de-accent the middle stressed syllable "ment" (in what John Wells calls "The Rule of 3") to leave the phrase 'fundamental 'problem (or rather 'fundamental \problem with the nuclear tone marked). Only the most proficient non-native speakers are ever going to produce something like that.

  14. Chris Button said,

    November 10, 2017 @ 6:38 am

    @ Jonathan Smith

    My understanding, if I understand Kratochvil's work correctly, is that there is a tendency towards iambic (unstressed – stressed) alternations across an intonation phrase, but that this is frequently overridden by trochaic (stressed – unstressed) compounds. I should now compare that to what Profs Mair and Liberman have kindly referenced here.

  15. Chris Button said,

    November 10, 2017 @ 6:58 am

    To continue with the "ti" example above, that is why the "t" has palatalised to /tʃ/ or /ʃ/ in "question" or "nation" before what was originally a high front vowel t…

    Ok so my example here isn't a great one in English because the palatalisation had already happened before the borrowing and we just borrowed the spelling convention! In any case, the general point is the same…

  16. Rodger C said,

    November 10, 2017 @ 9:41 am

    when a student read "Liu I" as "Leeoo One"

    Any relation to Malcolm the Tenth?

  17. Eli said,

    November 10, 2017 @ 2:00 pm

    @Chris Button: The palatalization of /t/ to /s/ in "nation" had already happened when English got the word (hence Middle-English spellings like "nacyon"), but the further palatalization of /s/ to /ʃ/ had not. Words with "stion" like "question" did not have phonological palatalization when they entered English; the preceding /s/ had a protective effect on the following /t/. In modern French, these words have /sj/ and /tj/ respectively.

  18. Chris Button said,

    November 10, 2017 @ 2:26 pm

    @ Eli

    Good point – thanks for clarifying! That explains the different /tʃ/ or /ʃ/ reflexes in English (I actually speak pretty decent French so should really have caught that one myself)

    Following on from that, do you have any idea why "bastion" and by extension the name "Sebastian" resist palatalisation in British English (they tend to palatalise regularly in North American English)? Could it simply be down to low frequency use of the word "bastion" which in turn influences the pronunciation of the name?

  19. Chris Button said,

    November 10, 2017 @ 4:45 pm

    Now that I have this mulling around in my head, I bet it was the other way round with the name "Sebastian" being the exception (due to it being a name and perhaps having a convoluted route into English) and then influencing the noun "bastion" due to the homophony.

  20. Jefferson DeMarco said,

    November 12, 2017 @ 7:56 pm

    I have just begun an informal study of MSM on my own using an app called Memrise. One of the strengths of this app is that it has videos of a wide variety of native speakers saying the phrases we have been learning. I was amazed at how different the exact same words sounded from speaker to speaker. It doesn't take you very far, but seems pretty good for an introduction.

  21. Victor Mair said,

    November 12, 2017 @ 8:45 pm

    @Jefferson DeMarco

    Thanks! Sounds interesting. What you write complements another post I'll be making about these issues on Tuesday or Wednesday.

    Can you give us a link to a website describing this app?
