When I began studying Mandarin over half a century ago, I very quickly developed a pet phrase (kǒutóuchán 口頭禪 / 口头禅):  lǎoshí shuō 老實說 / 老实说 ("to tell the truth; honestly").  After I married one of the best Mandarin teachers on earth (Chang Li-ching) several years later, she corrected me when I said my favorite phrase.  She told me that I made it sound like lǎoshī shuō 老師說 / 老师说 ("teacher says").

Now, I know my tones very well, and can tell the difference between first and second tone.  I'm also able to produce them clearly and distinctly.  So it wasn't a problem with my being incapable of distinguishing tonally in my speech between lǎoshí shuō and lǎoshī shuō.  Something else was wrong with the way I said "lǎoshí shuō" ("honestly speaking") that made it sound like "lǎoshī shuō" ("teacher says").  I think it had to do with the sorts of things discussed in this post and in other posts to which it links:

"Stress, emphasis, pause, and meaning in Mandarin" (11/8/17)

Now that we have readily available the large Švarný-Tang-Harbsmeier MILK corpus described in the preceding post, with waveform & spectrogram, formants, pitch, and selection stats, it should be possible to study these subtle — but significant — aspects of Mandarin speech scientifically.  For the present moment, I will provide some other types of evidence in support of the importance of phonological phenomena such as stress, emphasis, elongation, shortening, and pausation* for Mandarin speech production.

[*I realize that this is not an established technical term in linguistics, but I strongly believe that things like pauses, rhythm, cadence, etc. are extremely important in the way languages are pronounced.  Even in my Literary Sinitic / Classical Chinese classes, I always have the students read the sentences aloud before they attempt to translate and explicate their meaning.  Often, just from the rhythm and cadence of their reading, including where they put pauses, I can tell whether they have a grasp of the basic structure of the sentence in question.]

Kevin Nelson has provided this very important information about the pronunciation of lǎoshī 老师 ("teacher"):

When I worked at a university in China, I noticed that in 老師, the 師 always seemed stressed and longer… I'm not sure how to explain it. I never saw that written up in any textbook about speaking Chinese. It is the "longer" part that is hard to describe… it is like making a quarter note into a dotted quarter note. (I was in Xi'an.)

I love this comment!  It is extremely important for understanding how Mandarin is spoken in reality, not according to a dictionary or a textbook.  Of course, we can't do without good textbooks and good dictionaries (lord knows I've done my share in this life to produce them!), but we also need to pay attention to the tiniest details of actual spoken language.  I dare say that it was some feature such as that pointed out by Kevin — and not the tones —  that foiled me in my attempts to pronounce lǎoshí shuō 老實說 / 老实说 ("to tell the truth; honestly") properly.  More about that below.

This morning in my "Language, Script, and Society in China" class, I tested the students on how they said lǎoshī 老师 ("teacher") vs. how they said lǎoshí 老实 ("honest").  Every single one of the native speakers (about ten) lengthened the second syllable of lǎoshī 老师 ("teacher"), exactly as Kevin described.

Something similar happened when I was in Nepal (1965-67).  I was extremely fluent in spoken Nepali, being able to speak rapidly, clearly, and at length on any subject I wished.  But there were certain niceties that escaped me, such as the precise difference between cār चार् ("four") and cha ("six").  I was familiar with all the phonemic features (including vowel length and aspiration) of these two words, but when I put them all together, what came out of my mouth just didn't sound right to native speakers.  It was very frustrating, because no matter how hard I tried, something was lacking in the way I pronounced cār चार् ("four") and cha ("six").

There was another word that I always innocently, but most embarrassingly, mispronounced in Nepal.  What I intended to say was jhiknu ("take out; extract"), but what I ended up with was more like ciknu ("chiknu"), i.e., "fuck". (The initial of this word is unvoiced unaspirated, like the Mandarin initial written 'z' in pinyin.)  It was bad enough when I tried to say jhiknu, but when I sang Old MacDonald and got to the part about "Here a chick there a chick, everywhere a chick-chick!", there would invariably be uproarious laughter.

As I've pointed out endlessly on Language Log, tones are not sacrosanct in Mandarin and other Sinitic languages.  The closer we get to observing spoken Mandarin in the raw (not transcribed according to any preconceived standards), the more it becomes clear that there are all sorts of changes in the way words are pronounced in real life that are not reflected in standard, prescriptive romanizations.

Yesterday one of my graduate students from China told me about a foreign student who has a superlative command of the tones for a couple thousand characters, but when she speaks Mandarin, it is excruciating for native speakers to listen to her.  My student says that this foreign student sounds as though she is parodying what Mandarin sounds like.

For one example of the sorts of transformations that regularly occur in actual speech, a linguist informant recently wrote to me:  "I should also mention that neutral tones after second tones nowadays sound like fourth tones."  This is interesting, because I wrote about this exact phenomenon already over two years ago in this post:

"Dissimilation, stress, sandhi, and other tonal variations in Mandarin" (8/26/14)

Further evidence for the shift of neutral to 4th tone is found in the pronunciation of wáwa 娃娃 ("doll; baby; child; moppet"), which the dictionaries tell us should have a neutral tone on the second syllable.  The graduate students from China who come to Penn are much amused by the convenience stores named "Wawa" (the Ojibwe word for "wild goose") in the Philadelphia area.  When discussing the similarity of "Wawa" to wáwa 娃娃 ("doll; baby; child; moppet"), they always pronounce the latter very clearly as "wáwà".  I was amazed to discover that even Google Translate's recording of the term pronounces it that way, even though their Pinyin has wáwá.

All of this shows that the phenomenon of neutral –> 4th tone is something that exists in the pronunciation of native speakers (every single one of my graduate students from the mainland pronounces it that way).  And there are many other similar adjustments in Mandarin and the other Sinitic topolects as they are spoken in diverse places in China and in the Sinophone diaspora.

So we have to be prepared to accept that the reality of spoken tonal languages does not always and necessarily mechanically follow the prescriptive tones given in dictionaries and textbooks.  With that in mind, let's return to the way I uttered my pet phrase, lǎoshí shuō 老實說 / 老实说 ("to tell the truth; honestly"), which made it sound to native speakers like lǎoshī shuō 老師說 / 老师说 ("teacher says").

In an attempt to narrow down where the problem lies, let's look at a whole series of common Mandarin words that consist of the syllables "lao" and "shi" in various combinations of tones:

lǎoshī 老师 ("teacher")
lǎoshì 老是 ("always")
lǎoshí 老实 ("honest")
lǎo shì 老式 ("old-fashioned")
lǎo shī 老湿 ("Always Wet" — nickname of a person)
lǎo shí 老石 ("Old Stone" — name of a person / film)
lǎo shí 老十 ("Old Ten" — generational name of a person)  I actually knew a guy who was called 十三老 ("Old Thirteen") — he was a Dungan and pronounced that "Sushanlo".  In a big, extended family, you could easily have an "Old Ten".

I maintain that if you pronounce these words with all the "correct" tones but do not pay attention to suprasegmental and prosodic features, you will sound unnatural to native speakers.

Although these categories may not even exist in basic phonological analysis, I'm sure that some of these disyllabic terms do have at least micropauses between syllables and differential stress, because I've been listening very carefully to how people say these things for fifty years and can detect how such features consistently occur in spoken language.

Aside from the fact that it is my natural bent, part of the reason why I am so attentive to such details is that, at Harvard, I was fortunate to study with Rulan (Iris) Chao Pian who had a handbook for the pronunciation of Mandarin that was quite long (I think that it was based on the work of her father, Y[uen] R[en] Chao, the renowned Chinese linguist), and many pages were devoted to the explication of rather arcane sequences of tones.  This handbook also paid attention to prosodic phrasing and other non-tonal phenomena.

Many of the points covered in the Chao-Pian pronunciation handbook are systematically presented in Y. R. Chao's masterful A Grammar of Spoken Chinese (Zhōngguóhuà de wénfǎ 中國話的文法) (Berkeley, Los Angeles, London:  University of California Press, 1968).  Under "stress", the index (p. 843a) lists the following topics:

and tone as word markers

as marker of logical predicate

contrasting str.

medium str.

neutral tone, or zero str.

occurrence of

normal str.

str. patterns in compds.

Chao discusses how stress, rapidity of speech, and other prosodic features affect vowel height, voicing, aspiration, and other phonemic qualities.  Charles N. Li and Sandra A. Thompson, in Mandarin Chinese:  A Functional Reference Grammar (Hànyǔ yǔfǎ 漢語語法) (Berkeley, Los Angeles, London:  University of California Press, 1981), p. 9, summarize some of Chao's observations on neutral tone as follows:

If a syllable has a weak stress or is unstressed, it loses its contrastive, relative pitch and therefore does not have one of the four tones….  In such a case, the syllable is said to have a neutral tone.

They go on to explain that the neutral tone is not always at the same pitch, but is dependent upon the tone which precedes it.

Even if a student grasps all of Chao's findings on the phonological implications of prosodic and suprasegmental features and attempts to apply them conscientiously, their speech will sound awkward and unnatural.  The only way to speak Mandarin (or any other Sinitic topolect) well is to emulate the connected flow of whole sentences as uttered by native speakers, rather than attempt to insert isolated lexical items in grammatical templates.

Coming back to the question of how "lao" and "shi" in various tones are linked, one of my mainland students made the following changes to the items in my list of "laoshi" terms:

lǎoshī 老师
lǎoshì 老是 –> ˈlǎoˈshì
lǎoshí 老实  –> lǎo shi
lǎo shì 老式
lǎo shī 老湿
lǎo shí 老石 –> lǎoshí
lǎo shí 老十 –> lǎoshí

There were many other variations in the way other members of the class pronounced these terms.

I also had all the students say this sentence:  lǎoshī lǎoshì lǎoshí 老师老是老实 ("the teacher is always honest").  As pronounced by the students, the three words of the sentence differed not just in their tones, but also in the lengths of the syllables, whether there were short pauses (micropauses) between syllables, the relative speeds with which the syllables were pronounced in the different vocables, whether some of the syllables had stress, and so forth.  One of the students commented:  "I think the last character should be neutral instead of the second tone just because it's at the end of the sentence".

Although there may be no well-established standards or procedures for specifying and measuring the types of suprasegmental and prosodic features described in this post, as well as the nonprescriptive tonal transformations that we have observed, there can be no doubt that they are real, that they actually exist in the way people speak, and that they make a difference in the way others who hear them perceive what they say.
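One rough way to put numbers on the "dotted quarter note" impression and on micropauses is to compare syllable durations and the gap between syllables.  What follows is only a minimal sketch, assuming that syllable boundaries have already been marked by hand (for instance in a tool such as Praat).  The names Syllable, lengthening_ratio, and micropause are hypothetical, and the timings are invented placeholders, not measurements from the class or from any corpus.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    """One hand-labelled syllable; times are whole milliseconds."""
    pinyin: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def lengthening_ratio(word: list[Syllable]) -> float:
    """Duration of the last syllable divided by duration of the first."""
    if len(word) < 2:
        raise ValueError("need at least two syllables")
    first, last = word[0], word[-1]
    if first.duration_ms <= 0 or last.duration_ms <= 0:
        raise ValueError("syllable durations must be positive")
    return last.duration_ms / first.duration_ms


def micropause(word: list[Syllable]) -> int:
    """Silent gap in ms between the first two syllables (0 if they abut)."""
    if len(word) < 2:
        raise ValueError("need at least two syllables")
    return max(0, word[1].start_ms - word[0].end_ms)


# Invented timings for illustration only; NOT real measurements.
teacher = [Syllable("lǎo", 0, 200), Syllable("shī", 200, 500)]
honest = [Syllable("lǎo", 0, 200), Syllable("shí", 250, 450)]

for label, word in (("lǎoshī", teacher), ("lǎoshí", honest)):
    print(label, lengthening_ratio(word), micropause(word))
```

A ratio of 1.5 would correspond to a quarter note stretched into a dotted quarter note.  Whether real recordings show anything like that figure is exactly what hand-labelled data would have to establish.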

To end where this post began, probably the reason my "lǎoshí shuō 老實說 / 老实说" ("to tell the truth; honestly") was misheard by others as "lǎoshī shuō 老師說 / 老师说" ("teacher says") is not because I got my tones wrong, but because I drew out or emphasized the second syllable too much.  After all these years of practice and hearing others say it correctly, I'm doing better with it now than when I began speaking Mandarin half a century ago.

Appendix (advanced "lao + shi" studies)

láoshī 劳师 ("tire the troops; take greetings and gifts to army units")
láoshī dòngzhòng 劳师动众 ("mobilize the masses")
láoshī xíyuǎn 劳师袭远 ("mobilize troops to attack at a distance")
láoshí / láoshi 牢实 ("solid; strong; firm")
lǎoshī 老师 ("teacher")
lǎoshīfu 老师父 ("elderly teacher; mullah")
lǎoshīfū / lǎoshīfu 老师傅 ("elderly teacher; mullah; experienced worker; master craftsman")
lǎoshīsùrú 老师宿儒 ("elderly, learned Confucian scholar")
lǎoshí / lǎoshi 老实 ("honest; frank; well-behaved; simpleminded; naive; easily taken in")
lǎoshíbājiāo 老实巴交 ("cautious and timid; ingenuous")
lǎoshíbājiǎo / lǎoshibājiāo 老实巴脚 ("honest [fool]; simple; open-faced; good-natured")
lǎoshígēdā / lǎoshigēda 老实疙瘩 ("honest and trustworthy person")
lǎoshíhuà / lǎoshihuà 老实话 ("frank, straightforward speech")
lǎoshítóu / lǎoshitóu 老实头 ("a naive person; simpleton")
lǎoshì 老式 ("old-fashioned")
lǎo shì 老是 ("always")
lǎoshì 老视 ("presbyopia")
lǎoshìyǎn 老视眼 ("presbyopia")
láoshígǔzi / láoshígúzi 劳什骨子 ("detestable thing")
láoshígǔzǐ 牢什古子 ("disgusting thing")
lāoshízǐ / lāoshízi 捞什子 ("encumbrance; burden")
láoshízǐ / láoshízi 劳什子 / 牢什子 / 僗什子 ("obnoxious / unpleasant / disagreeable / nasty / distasteful / offensive / objectionable / unsavory / unpalatable / off-putting / awful / terrible / dreadful / frightful / revolting / repulsive / repellent / repugnant / disgusting / odious / vile / foul / abhorrent / loathsome / nauseating / sickening / hateful / insufferable / intolerable / detestable / abominable / despicable / contemptible / horrible / horrid thing")

N.B.:  In speech, the "lao + shi" portion of these terms would not uniformly and mechanically be articulated according to the pronunciations specified in dictionaries.

[Thanks to Mark Liberman, Heidi Harley, Boyd Mikhailosvsky, Tod Ragsdale, William Page, Bill Hanson, Carl Hosticka, Bob Badgley, Wayne Stinson, Jinyi Cai, and Tom Bishop]


  1. Chris Button said,

    November 15, 2017 @ 1:57 pm

    I realize that this is not an established technical term in linguistics

    I think the technical term here is IP or "Intonation Phrase", although that refers to the chunk of speech itself rather than the actual pause between chunks.

  2. WSM said,

    November 15, 2017 @ 2:36 pm

    Good exercise for this kind of thing is watching Sichuan TV, where they still speak Mandarin but with sharp deviation from the standard, particularly with regard to tone.

  3. Laura Morland said,

    November 15, 2017 @ 3:27 pm

    If you'll excuse a non-Sinitic response to your beautiful and detailed post: I train the readers at my church in Berkeley, and for a few years we had a grad student from New Zealand, with a lovely accent, and everyone enjoyed hearing her read. She informed me that when she arrived in California, nobody could understand her! "It wasn't my New Zealand vowel," she explained, "but the stress."

  4. languagehat said,

    November 15, 2017 @ 4:19 pm

    A superb post, and (as Laura Morland said) it applies to much more than just Chinese. I've never heard a convincing example of spoken Ancient Greek on those videos that purport to provide one, because the people speaking are working so hard to make sure the consonants, vowels, and pitches are correct that they don't sound like they're speaking a real language. I've even heard this complaint about actors speaking Klingon; it may not be "real," but if it's to be believable as a spoken language it has to sound like one, not like a careful combination of painfully learned sounds.

  5. David Marjanović said,

    November 16, 2017 @ 6:01 am

    At the end of this post there's a link to a video of how to read Ancient Greek without "sounding like yodelling Martians". It may still be a bit too regular, and there are a few occurrences of /h/ coming out with a modern accent as [x], but other than that I think it's pretty good.

  6. B.Ma said,

    November 16, 2017 @ 6:16 am

    languagehat has put into much better words the criticism I had of the Old Chinese recital linked in http://languagelog.ldc.upenn.edu/nll/?p=34405

    Victor's problems with pronouncing some Nepali words despite knowing what they should sound like mirror my experience trying to learn Polish. I tried with native speakers in person but I just can't seem to reproduce the sound of the language convincingly even though listening was not a problem. (Maybe having proper instruction from someone trained in linguistics would help, but I was only dabbling as part of a general interest in languages.)

  7. Dave said,

    November 16, 2017 @ 8:00 am

    I have no idea how anciently correct it may be, but
    "Μα Τον Δια"
    manages to sound much closer to Eurovision than to yodelling martians.

    (YouTube, which has a surprisingly large amount of Soviet Country&Western, and even a smattering of Electro House Yodel, somehow completely fails, at least in my bubble, to turn up any Marsianer/Marsmensch/Marsbewohner Jodel)

  8. Victor Mair said,

    November 16, 2017 @ 8:35 am

    From Bill Page:

    This is way above my level of competence, but maybe the problem is simply that it's difficult for foreigners to pronounce a word that consists of a third tone followed by a second tone. Given foreigners' stress patterns, it may be more natural for them to follow a third tone by a first tone.

    When I was studying Mandarin at the Army Language School (now the Defense Language Institute), our whole class had a problem formulating questions that ended with a fourth tone, e.g., "Ni hen lei ma?" ("Are you very tired?") Almost always, we pronounced "lei" with a rising tone, because it came at the end of a question, which in English almost always ends in a rising tone. I believe this is called native-language interference. In fact, our teachers started a "fourth tone club" for students who had trouble pronouncing the fourth tone. In English, it seems natural only at the end of a sentence, or if the speaker is angry. So we got the tone right if we said, "Wo hen lei"–but not if we asked, "Ni hen lei ma?"

    I always had trouble with the name of a mountain I used to visit often south of Taipei: Shihtoushan, Lion Head Mountain. The "shih" has a first tone. Taiwanese people often understood me to be saying Shihtoushan, Stone (or Rock) Mountain, where the "shih" has a rising, second tone. I don't know whether this was because I had faulty pronunciation or because they thought Stone Mountain made more sense than Lion Head Mountain, which they may not have heard of.

    Ah, well. As one of my instructors used to say, "Jungwen hen rungyi!" [VHM: "Chinese is easy."]

  9. languagehat said,

    November 16, 2017 @ 9:01 am

    At the end of this post there's a link to a video of how to read Ancient Greek without "sounding like yodelling Martians". It may still be a bit too regular, and there are a few occurrences of /h/ coming out with a modern accent as [x], but other than that I think it's pretty good.

    It's better than many, but yeah, way too regular, especially in doggedly making long vowels twice as long as short ones. That's not the way language works, but of course a modern Greek (whose native language has collapsed long and short) will have a hard time finding a middle ground. An impressive attempt, though. (I modestly confess that the best I am aware of is my own, playing the god Dionysus in a Classics Department performance of The Bacchae many years ago.)

  10. Mat Bettinson said,

    November 16, 2017 @ 9:04 am

    My intuition is that realisation of tones is influenced by the imperative to discriminate from a perceptually similar tone that might occur in this context. Perhaps this is obvious but to use Victor's example, if there was not 老實 (shi2 rising tone) or 老實 was not likely to occur in the same position, there might be less imperative to emphasize the steady tone of 師. Hyperarticulation is a well-studied phenomenon that occurs when people perceive that they may be misunderstood (it's explored a lot in the clear speech literature, starting with Lindblom's H&H theory in 1990).

    So now Victor has drawn attention to 老師… I'm suddenly aware how the shi is unusually long. Well, it's more like 'si' in Taiwan but still. So if the discrimination motivation was all that was at work, we wouldn't stress shi at all times right? But we seem to. So now I'm left wondering… what if stress patterns like this just get preserved in words. Or it's just simpler and laoshiiii has adopted some of the plaintive sense of a student crying "but misssssssss".

    Hmm, if you had a good data set… this might be possible to test. Or someone has done it already and this is just a bunch of amateur rambling.

  11. Richard Sears (Uncle Hanzi) said,

    November 18, 2017 @ 5:19 pm

    I have no conscious clue about tones in Mandarin, but I speak fluent Mandarin and everyone seems to understand me. I started learning Mandarin on the streets of Taiwan after the age of 22.

    I have studied a lot about how people recognize and memorize and understand characters and I feel sure that there are different ways in which different people process language.

    I compensate for my lack of tones by forming complete sentences all the time, which creates redundancy and thus leads to better understanding.

    I think I am a very humorous guy in Mandarin and I am always making people laugh, but my humor, in most cases, is based on finding inconsistencies in everyday life as opposed to making up some pun.

    I find that one of my American friends who has perfect tones sometimes cannot understand me and is always asking me which character I mean, where a native Chinese will be able to understand me.

    An interesting observation about perception: In China when you pay for something, they take your money and then tell you how much you just gave them. For example, if I give them 100 yuan, they will say 收你一百 shōu nǐ yībǎi (received 100 yuan). At first, I thought I heard 送你一百 sòng nǐ yībǎi (give you 100 yuan). I have listened to many people and my ears always hear 送你一百 but I know it is 收你一百.

    The dictionary: Several years ago I made a speech database for all the characters in Mandarin, Cantonese and Taiwanese. It is impossible to find anyone who speaks like the dictionary. I had to find someone who could pronounce it like the dictionary said it was supposed to be pronounced and read the pinyin only from the dictionary. Otherwise the speaker and the dictionary would not match.

  12. Eric said,

    November 19, 2017 @ 5:01 am

    @Bill Page (via Victor)–It's also possible the problem is specifically the third tone. As far as I can tell, it's still very common for teachers to emphasize tone three as a dipping tone, and then in a later lesson–after students have diligently drilled that dip into their brains–to also mention that there's a half-third tone that doesn't dip and is better described simply as *low*. This dipping description makes learning any contextualized version of the third tone much more difficult, with a tone 3+ tone 2 sequence being more or less impossible to get right. Well-trained and well-informed Chinese teachers are starting to think about how to address this problem better, but it's still pretty much standard that the pitch pattern of tone three is described as 2-1-4 (even though repeated phonetic measurements don't support that as a good generalization of tone three, even in isolation).

    I also think proper attention to neutral tones would be exceedingly helpful. It was like turning the lights on for me the first time I saw neutral tones described accurately. My ears suddenly heard everything anew. Bad explicit descriptions can really slow learners down.

  13. David Marjanović said,

    November 19, 2017 @ 12:12 pm

    our whole class had a problem formulating questions that ended with a fourth tone, e.g., "Ni hen lei ma?" ("Are you very tired?") Almost always, we pronounced "lei" with a rising tone, because it came at the end of a question, which in English almost always ends in a rising tone.

    I project a °?!" intonation into such cases (and then add the ma afterwards).

    doesn't dip and is better described simply as *low*

    Fortunately, that was right at the beginning of the textbook I got to use.

  14. David Marjanović said,

    November 19, 2017 @ 12:13 pm

    Interesting typo for "?!".

