The conundrum of singing with tones


This is a problem we've raised and discussed many times on Language Log, and I've always been dissatisfied with the results.  With the following video, I've finally found a scholarly, convincing approach.

Julesy, "How do you sing in a tonal language like Chinese?" (a week ago)

As PhD Julesy convincingly explains — with evidence — it's as I've always thought.  In modern vernacular Sinitic languages like Mandarin, usually "the tendency is to allow the overall melody to dominate."

 

Selected readings

 



21 Comments »

  1. Stephen Goranson said,

    May 30, 2025 @ 7:10 am

    Great video. I had wondered about that.

  2. Randy Alexander said,

    May 30, 2025 @ 10:46 am

    As a composer living in China this is a big issue for me and I'm always looking for solutions and enhancements. I think it's a rather complex issue and how much you can deviate from what might be perceived as the inherent melodic contour of the tones and still be intelligible depends on how conversational the words are and whether you have expected collocations. If you have lots of common phrases then they will be easily understood no matter what the melody is, but if you have a lot of uncommon words then even if you follow a matching melodic contour the words will not be easily intelligible.

  3. Jonathan Smith said,

    May 30, 2025 @ 11:28 am

    The comment about "allowing the melody to dominate" applies specifically to Mandarin pop songs. A key point in the video is emphasis on tone-melody compatibility in Cantonese.

    The same observations may be found in the *comments* of the first linked post; see esp. remarks/cited works from Bob Ladd: "tone-melody match is much stricter in Cantonese [than in Mandarin]", "the basic [compatibility] principle […] is very widespread in songs in many unrelated tone languages", etc.

    Such matching is also emphasized in e.g. Taiwanese songs, and one may speculate along with the video that the nature of certain southern Sinitic tonal systems — specifically, multiple contrasting level tones or contrastive height generally — is a factor. However, it is also possible that Mandopop is just a young and heavily Western-influenced genre (as pointed out in the comments to the video)…

  4. cameron said,

    May 30, 2025 @ 5:32 pm

    I hope she gets around to doing a video about how tones are handled in Thai pop songs. Like Cantonese, Thai has three "level" tones, so composition practice might be quite similar in the two languages.

    I'd also be curious to see a similar analysis of Vietnamese – six tones, two of them "creaky" – how do the songwriters deal with them?

  5. David Moser said,

    May 31, 2025 @ 2:39 am

    Great website! I've paid attention to this problem for a number of years. I've discovered that, overall, Mandarin tones are not as essential as we might think for understanding speech. You can take a series of Mandarin sentences and replace all the tones with first tone (i.e. producing the stereotypical "robot speech"), and virtually all the sentences are easily comprehensible to native speakers. (As long as the subject matter is not too esoteric and the sentences do not involve classical Chinese.) A similar trick can be carried out in English. Take a series of sentences and replace all of the vowels with a short 'e' sound. The sentences will, for the most part, be comprehensible. So it's not surprising to me that Mandarin song lyrics do not need to correspond to tonal speech patterns. When I was a kid, I listened to Simon & Garfunkel's "The Sound of Silence." I noticed that several sections of the lyrics outrageously violate the conventional syllabic stress of the words. For example: "Because a vision soft-LY cree-PING/ Left its seeds while I WAS slee-PING." "People talking withOUT spea-KING/ People hearing withOUT listen-NING…" etc. Yet I had no trouble at all understanding the words. I began to experiment with my brother, reading sentences out loud while putting huge stress on all the wrong syllables. Of course, he had no problem understanding me. My conclusion: syllable stress is just not all that important in sentence comprehension. Interesting, though, that Cantonese pop songs tend to adhere more to the tonal properties of the lyrics. Maybe there's a kind of threshold wherein the more tones you have, the more it is necessary to adhere to the standard tones in singing. Thai would be another language to test this phenomenon.

  6. Victor Mair said,

    May 31, 2025 @ 2:49 am

    Brilliant observations, David!

    Thanks for your revealing contributions.

  7. JMGN said,

    May 31, 2025 @ 6:23 am

    According to Jeroen Wiedenhof's grammar of Mandarin (excerpt linked below): The intonation pattern of the sentence is independent of the shape of individual tones, but rather defines the range within which these tones are pronounced. In other words, the pitch pattern of each tone is realized within the range defined by intonation. Thus, the gray area represents the dynamics of pitch, duration and voice range for the sentence as a whole:
    https://imgur.com/a/grammar-of-mandarin-jeroen-wiedenhof-wnwOxvp

  8. Yves Rehbein said,

    May 31, 2025 @ 7:21 am

    @ David Moser, that would explain why they get by with ⅓ of syllables matching the musical tones (2:10 – 3:50).

    It is a fascinating topic, though as expected the video is a little short on music theory. In particular, the Western scale is theoretically irrelevant for classical Chinese music, though "music theory" usually means Western, so that's not unexpected. If they sing Ode to Joy (歡樂頌), naturally there are no corresponding tones and text comprehension would be rare.

  9. Victor Mair said,

    May 31, 2025 @ 9:36 am

    From Daan Pan:

    I don’t believe what she interprets is valid. When a native speaker of Chinese sings, this singer simply follows the music, and technically, there is no way to blend the tones with the musical notes.

    Otherwise, this singer will zou-diao (走調;lit. mis-sing the tune, i.e., the notes).

  10. David Marjanović said,

    May 31, 2025 @ 11:45 am

    The video makes an interesting assumption: that you have the tune already, and then you compose the lyrics to fit the tune. The traditional Cantonese approach is the opposite: you compose the lyrics, then you exaggerate the range of your speaking voice so it becomes possible to turn all six tones into distinct level tones, you read the lyrics aloud with this exaggeration, and… that's it. You're singing. That is the tune.

    Of course Cantopop doesn't limit itself to just 6 pitches, but I'm not surprised the tones and the pitches still match 92% of the time.

    And the 37% of Mandopop are still a better match than random.

  11. David Marjanović said,

    May 31, 2025 @ 11:51 am

    (Another case of tunes being determined 100% by the lyrics, BTW, is Vedic chanting. No phonemic tones there, except marginally in the youngest parts of the Vedas, but there are still three pitches: stress is exaggerated into a rise that stretches, in isolation, from low to high across three syllables.)

  12. Bob Ladd said,

    May 31, 2025 @ 12:33 pm

    Julesy’s video is indeed based on recent research, but it kind of skates around what to me is the key finding of pretty much everything that’s been done on this topic since Marjorie Chan’s work in the 1980s. Namely, the secret to matching tones in song is not primarily a matter of squashing the tone pattern of EACH SYLLABLE onto the corresponding sung note, but is centrally about getting the pitch direction FROM ONE SYLLABLE TO THE NEXT to match the melodic transition between the corresponding tones. This is the stuff Julesy is talking about in her section on “similar settings” and “contrary settings”. This idea goes right back to Chan’s original discussion of Cantopop and is central to much of the serious research on tone language singing since then, including work on African languages as well (e.g. McPherson & Ryan in Language, 2018; Li, Carter-Enyi, and Aina in Analytical Approaches to World Musics, 2024).

    The fact that Mandarin pop music manages OK without worrying much about text-setting doesn’t contradict any of this. It’s clear that tone-melody matching is more important in some languages and musical traditions than in others, for all kinds of reasons. But in languages and musics where it’s important, it seems to work in great measure by matching tone transitions and musical transitions.

    @Cameron: There are similar findings for Thai and Vietnamese, mostly in conference papers (Thai: Ketkaew and Pittayaporn 2015; Vietnamese: Kirby and Ladd 2016) – Google Scholar should find them if you’re interested.
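
The transition-based idea described above (compare the pitch direction from one syllable to the next with the direction of the melody between the corresponding notes) can be illustrated with a minimal Python sketch. This is not taken from the video or from any of the cited papers. The numeric tone heights, the use of MIDI note numbers for the melody, the function names, and the label "oblique" for the case where only one of the two stays level are all assumptions made here for illustration.

```python
from collections import Counter


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def classify_transitions(tone_heights, melody_pitches):
    """Label each syllable-to-syllable transition.

    tone_heights: one number per syllable giving an assumed relative
        pitch height of its tone (hypothetical 1-5 scale).
    melody_pitches: one number per syllable giving the sung note
        (assumed here to be a MIDI note number).
    """
    if len(tone_heights) != len(melody_pitches):
        raise ValueError("need one melody note per syllable")
    labels = []
    for i in range(len(tone_heights) - 1):
        tone_dir = sign(tone_heights[i + 1] - tone_heights[i])
        tune_dir = sign(melody_pitches[i + 1] - melody_pitches[i])
        if tone_dir == tune_dir:
            labels.append("similar")    # both rise, both fall, or both level
        elif tone_dir == 0 or tune_dir == 0:
            labels.append("oblique")    # one moves while the other stays level
        else:
            labels.append("contrary")   # they move in opposite directions
    return labels


def summarize(labels):
    """Return the share of similar and of non-opposing transitions."""
    if not labels:
        return {"similar": 0.0, "non_opposing": 0.0}
    counts = Counter(labels)
    total = len(labels)
    return {
        "similar": counts["similar"] / total,
        "non_opposing": (total - counts["contrary"]) / total,
    }


# Made-up four-syllable phrase: tone heights and sung notes are invented.
labels = classify_transitions([3, 5, 2, 2], [60, 64, 62, 65])
print(labels)
print(summarize(labels))
```

Under these assumptions, a "similar" share corresponds to the stricter notion of matching, while the "non_opposing" share counts every transition that merely avoids a clash between tone direction and melodic direction.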

  13. Bob Ladd said,

    May 31, 2025 @ 12:36 pm

    David Marjanović's comment appeared while I was posting mine, but he's right: the notion of "text-setting" implies that the tune is there first, which is not the case in lots of musical traditions. The paper I cited by Li, Carter-Enyi and Aina discusses this.

  14. Jonathan Smith said,

    May 31, 2025 @ 1:19 pm

    In more practical terms, in these traditions complete lyrics don't produce complete tunes algorithmically of course; rather, lyrical *phrases* tend to suggest compatible melodic *phrases*, then the two might be mutually adjusted, longer series might be strung together, etc…. whence song. The effect incorporates Bob Ladd's observation about note-pair transitions.

    Listen e.g. to the first line here, where each three-syllable phrase is broadly compatible with the spoken version but (e.g.) beh of the last phrase is *an octave lower* than its tonal counterpart pak of the first phrase (phonetic tone values shown-ish by trailing dashes):

    Sai — pak ‾ hoo — tit _ tit _ loh ‾ chit ‾ a ‾‾ hi ╱ beh ‾ chhoa __ boo ╲
    (the storm-rain is pouring down Mr. Carp is taking a wife)

    Or e.g. the 30 seconds beginning here, where the songwriter tells of being inspired by a couple of specific sentences from a radio announcer, then setting them to a melody (sung for our benefit)…

  15. Yves Rehbein said,

    May 31, 2025 @ 6:33 pm

    There's a difference between tonal and melodic music. Naturally, tonal phonology does produce a measurable tune as it were algorithmic.

  16. Victor Mair said,

    June 1, 2025 @ 2:27 am

    From Daan Pan:

    Singing a Chinese song is different from intoning a Chinese poem. We may have heard some scholars “intone” poems from Shi-jing or Tang poems, but whether they did so the same way ancient people did remains uncertain. Many contemporary intoners claim authenticity but how to verify their claim remains to be seen.

    (Also posted as a comment to the Julesy tones and intonation thread.)

  17. David Marjanović said,

    June 1, 2025 @ 7:03 am

    …well, when the Shijing was composed, there weren't tones yet…

  18. John S. Rohsenow said,

    June 1, 2025 @ 2:29 pm

    Decide for yourself? See/listen:

    https://www.facebook.com/reel/713813288181007

  19. Victor Mair said,

    June 1, 2025 @ 4:13 pm

    Bingo, John!

  20. James Kirby said,

    June 2, 2025 @ 8:23 am

    @cameron: in my conversations with Vietnamese composers, several mentioned that they typically try to avoid ngã and nặng where they can, because they're a hassle to set. Given the frequency distribution of the tones, that’s generally pretty easy.

    @David Marjanović: on text-before-tune: it depends. In Cantopop the tune seems to have typically come before the text. I found this interview enlightening.

    That said, there are clearly traditions in which the (what some might term the "surface realisation" of an "underlying") skeletal tune is shaped by the text. Some Thai court song works like this; so does Vietnamese chầu văn. For some examples, with references to the original literature sources, see here.

    What I have unfortunately never written up, or investigated properly, is the interaction between the skeletal melodies and the metrical demands of the rhyme schemes, like the lục bát poems which form the basis for many chầu văn lyrics. This is similar to other cases of intoning/chanting of classical poetry, but involves the interaction with some type of melodic structure.

    Finally, re the observation that "37% of [similar transitions in] Mandopop are still a better match than random" – I suspect what is more relevant is the fact that 77% of transitions are non-opposing. Just as in stress-beat text-setting, the operative constraint isn't so much "be similar" but rather "avoid clash".

  21. Chris Button said,

    June 2, 2025 @ 9:26 pm

    In Cantopop the tune seems to have typically come before the text.

    That makes sense given the number of songs with Mandarin and Cantonese versions. And let's not forget the many covers. I have fond memories of listening to Faye Wong's 夢中人 cover of "Dreams" by the Cranberries. And the internet tells me that she also recorded a Mandarin version.

    The opposite idea of text before tune is really interesting. I had never even thought about it before.

