{"id":58120,"date":"2023-02-28T11:20:00","date_gmt":"2023-02-28T16:20:00","guid":{"rendered":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=58120"},"modified":"2023-02-28T11:53:17","modified_gmt":"2023-02-28T16:53:17","slug":"syllable-rhythm-in-english-and-mandarin","status":"publish","type":"post","link":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=58120","title":{"rendered":"Syllable rhythm in English and Mandarin"},"content":{"rendered":"<p>I've always been skeptical of the distinction between \"stress-timed\" and \"syllable-timed\" languages, at least as a claim about the phonetic facts of speech timing as opposed to the psychological dimensions of speech production and perception. Syllable durations in all languages vary widely, due to differences in the intrinsic durations of different vowels and consonants, the effects of phrasal position and emphasis, and many other factors. As a result, inter-stress intervals in languages like English or German are not actually \"isochronous\", and neither are inter-syllable intervals in languages like French or Spanish. And it's not even true that speakers generally make such intervals closer to isochronous than the relevant timing factors would otherwise predict.<\/p>\n<p>But in \"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=8116\" target=\"_blank\" rel=\"noopener\">Speech rhythms and brain rhythms<\/a>\", 12\/2\/2013, I showed a <a href=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/TIMIT1.png\" target=\"_blank\" rel=\"noopener\">plot of the average syllable-scale power spectrum<\/a> in the 6300 American-English sentences in the <a href=\"https:\/\/catalog.ldc.upenn.edu\/LDC93S1W\" target=\"_blank\" rel=\"noopener\">TIMIT dataset<\/a>, which indicated a key periodicity at 2.4 Hz. I noted that \"2.4 Hz corresponds to a period of 417 msec, which is too long for syllables in this material. 
In fact, the TIMIT dataset as a whole has 80363 syllables in 16918.1 seconds, for an average of 210.5 msec per syllable, so that 417 msec is within 1% of the average duration of two syllables. [&#8230;] One hypothesis might be that this somehow reflects the organization of English speech rhythm into 'feet' or 'stress groups', typically consisting of a stressed syllable followed by one or more unstressed syllables.\"<\/p>\n<p>I added that \"Unfortunately there aren't any datasets comparable to TIMIT in other languages; but I'll see what I can come up with as a more-or-less parallel test in languages that are said to be 'syllable timed' rather than 'stress timed.'\" Almost ten years later, I've never delivered on that promise, though it would have been easy to do so. So for today's Breakfast Experiment&#x2122; I'll show the same analysis for the 6300 sentences in the recently-published <i><a href=\"https:\/\/catalog.ldc.upenn.edu\/LDC2021S03\" target=\"_blank\" rel=\"noopener\">Global TIMIT Mandarin Chinese<\/a><\/i> dataset.<br \/>\n<!--more--><\/p>\n<p>So here's the plot for American English TIMIT:<br \/>\n<a href=\"https:\/\/languagelog.ldc.upenn.edu\/myl\/TIMIT1.png\"><img decoding=\"async\" title=\"Click to embiggen\" src=\"https:\/\/languagelog.ldc.upenn.edu\/myl\/TIMIT1.png\" width=\"490\" \/><\/a><\/p>\n<p>And the same thing for Mandarin Chinese Global TIMIT:<\/p>\n<p><a href=\"https:\/\/languagelog.ldc.upenn.edu\/myl\/CMN1a.png\"><img decoding=\"async\" title=\"Click to embiggen\" src=\"https:\/\/languagelog.ldc.upenn.edu\/myl\/CMN1a.png\" width=\"490\" \/><\/a><\/p>\n<p>This time, the key periodicity seems to be at 4.59 Hz, which corresponds to a duration of 218 msec. This is not much greater than the average syllable duration of 197 msec in that dataset (88848 syllables in 17457 seconds = 0.1965 seconds per syllable). 
So maybe (this variety of) Chinese is (sort of) syllable-timed (on average) after all?<\/p>\n<p>The \"stress timing\" hypothesis, as far as I know, originated in Daniel Jones, <em><a href=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/3\/37\/An_outline_of_English_phonetics_..._with_131_illustrations_%28IA_cu31924027389505%29.pdf\" target=\"_blank\" rel=\"noopener\">An Outline of English Phonetics<\/a><\/em>, 1918 (p.106):<\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #800000;\">Vowel length also depends very largely on the rhythm of the sentence. There is a strong tendency in connected speech to make<\/span><br \/>\n<span style=\"color: #800000;\">stressed syllables follow each other as far as possible at equal distances.<\/span><\/p>\n<p>Jones (and subsequent British phoneticians) were so fixated on this idea that they totally ignored the most important contextual effect on duration, in English as in all other languages, which is pre-boundary lengthening. First documented (for French) by <a href=\"https:\/\/fr.wikipedia.org\/wiki\/Jean-Pierre_Rousselot\" target=\"_blank\" rel=\"noopener\">Jean-Pierre Rousselot<\/a> several decades earlier, pre-boundary lengthening seems to have been ignored by British phoneticians until the second half of the 20th century. 
(If you know of exceptions, please send me the details&#8230;)<\/p>\n<p>There have been many debunking attempts over the years: a few random examples are Pier Marco Bertinetto, \"<a href=\"https:\/\/www.researchgate.net\/profile\/Piermarco-Bertinetto\/publication\/284699652_Reflections_on_the_dichotomy_%27stress%27_vs_%27syllable-timing%27\/links\/56c6dd5808ae8cf82900da01\/Reflections-on-the-dichotomy-stress-vs-syllable-timing.pdf\" target=\"_blank\" rel=\"noopener\">Reflections on the dichotomy \u2018stress\u2019 vs.\u2018syllable-timing\u2019<\/a>\" (1989); Richard Cauldwell, \"<a href=\"https:\/\/www.speechinaction.org\/wp-content\/uploads\/2014\/10\/Stress-timing-Eger-paper.pdf\" target=\"_blank\" rel=\"noopener\">Stress-timing: Observations, beliefs, and evidence<\/a>\" (1996); Antonio Pamies Bertr\u00e1n, \"<a href=\"https:\/\/citeseerx.ist.psu.edu\/document?repid=rep1&amp;type=pdf&amp;doi=29de5fe9942f6bcee1475cef7674728f0960f912\" target=\"_blank\" rel=\"noopener\">Prosodic typology: on the dichotomy between stress-timed and syllable-timed languages<\/a>\" (1999); Amalia Arvaniti, \"<a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0095447012000137\" target=\"_blank\" rel=\"noopener\">The usefulness of metrics in the quantification of speech rhythm<\/a>\" (2012). As Bertinetto wrote in 1989,<\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #800000;\">Perhaps no other phenomenon of phonology is so widely accepted, with so little supporting evidence.<\/span><\/p>\n<p>But the stress-timing\/syllable-timing idea <a href=\"https:\/\/scholar.google.com\/scholar?as_ylo=2019&amp;q=%22stress+timed%22&amp;hl=en&amp;as_sdt=0,39\" target=\"_blank\" rel=\"noopener\">remains a seductive one<\/a>. As Arvaniti observes, people keep coming up with clever new metrics to show that if you just look at things the right way, some aspects of the relevant units are indeed deeply isochronous (or at least deeply isochronous-trending). 
And an even larger group just assumes the idea without discussion. So maybe I'm joining the isochronism chorus?<\/p>\n<p>Not yet.<\/p>\n<p>But it's easy to look at average syllable-scale spectra for (more diverse) collections of recorded speech in a wide variety of languages, including things like audiobooks, news broadcasts, narratives, and conversations. So we'll see&#8230;<\/p>\n<hr \/>\n<p>Some relevant past posts:<\/p>\n<p>\"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=124\" target=\"_blank\" rel=\"noopener\">Slicing the syllabic bologna<\/a>\", 5\/5\/2008<br \/>\n\"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=126\" target=\"_blank\" rel=\"noopener\">Another slice of prosodic sausage<\/a>\", 5\/6\/2008<br \/>\n\"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=131\" target=\"_blank\" rel=\"noopener\">Stress timing? Not so much<\/a>\", 5\/8\/2008<br \/>\n\"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=9159\" target=\"_blank\" rel=\"noopener\">Speech rhythm in <i>Visible Speech<\/i><\/a>\", 2\/18\/2013<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I've always been skeptical of the distinction between \"stress-timed\" and \"syllable-timed\" languages, at least as a claim about the phonetic facts of speech timing as opposed to the psychological dimensions of speech production and perception. 
Syllable durations in all languages vary widely, due to differences in the intrinsic durations of different vowels and consonants, the [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-58120","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/58120","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=58120"}],"version-history":[{"count":13,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/58120\/revisions"}],"predecessor-version":[{"id":58133,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/58120\/revisions\/58133"}],"wp:attachment":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=58120"}],"wp:term":[{"taxonomy":"category","embeddable":
true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=58120"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=58120"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}