Speech rhythm in Visible Speech


In "Speech rhythms and brain rhythms", 12/2/2013, I showed the results of a simple experiment looking for evidence of speech rhythms in the frequency domain, which found a peak at about 2.4 Hz in the average spectrum of the waveform envelope of 6300 read sentences. I don't have anything new to say about what this means, but I wanted to note a 65-year-old example of a somewhat similar experiment.

The Sound Spectrograph was developed at Bell Laboratories during WWII, and was revealed to the world in a series of papers in the July 1946 issue of the Journal of the Acoustical Society of America. Shortly afterwards, a book-length exposition of many results and applications was published: Ralph Potter, George Kopp, and Harriet Green, Visible Speech, 1947. This work, and the machine whose applications it documented, revolutionized all areas of science and technology dealing with sound, especially phonetics and other kinds of speech science.

Some aspects of this early work have become part of the cultural DNA of speech science and technology — but the 1947 book includes many suggested directions of investigation that have not been as productive of imitation. One example is the section in Chapter 14 ("Phonetic Research Possibilities") on "Rhythm in Speech" (pp. 312-313):

By recording speech in such a way that its energy envelope only is reproduced, it is possible to learn something about the effects of recurrences such as occur in the recital of rimes or poetry. In one form of portrayal, the rectified speech envelope wave is speeded up one hundred times and translated to sound pattern form as if it were an audible note. Syllabic components of a one-cycle-per-second rate thus would become oscillations at 100 cycles a second, while those at ten cycles would be boosted to 1,000 cycles.

Oral reading of ordinary page material, when reproduced in this way, results in a randomly mottled sound spectrogram such as appears in A of Fig. 12. The evidence of regularity is in the pauses for breath that appear as faint white lines forming a vertical grid. The length of this sample is about 40 seconds, and the same time scale is used in B and C. The vertical frequency scale is from zero to about 35 cycles.

When "Hickory, dickory, dock" was recited, the result is as shown in B. Notice that a semblance of order is beginning to appear in the low-frequency parts, although the upper components are still randomly related. In C, are shown four short samples that illustrate the effect of repeating words. To produce the first sample at the left, the word "telephone" was repeated over and over. For the second, two words "telephone" and "spectrograph" were repeated alternately. In the last sample, the words "acoustic" and "spectrograph" were used. The use of the same words extends the systematic relationship to a higher frequency.
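
The frequency translation in the quoted passage is plain arithmetic, and a short sketch makes the scales concrete. The function names below are hypothetical, and the only assumption is the uniform speed-up factor of 100 stated in the quote. On that assumption, the 35-cycle top of the vertical scale would sit at about 3,500 cycles on the speeded-up recording, and the 40-second sample would play back in about 0.4 seconds.

```python
SPEEDUP = 100.0  # speed-up factor stated in the quoted passage


def displayed_hz(envelope_hz, speedup=SPEEDUP):
    """Frequency at which an envelope component appears after the speed-up."""
    return envelope_hz * speedup


def original_hz(displayed, speedup=SPEEDUP):
    """Envelope frequency corresponding to a frequency on the speeded-up recording."""
    return displayed / speedup


def playback_seconds(original_seconds, speedup=SPEEDUP):
    """Duration of the speeded-up recording for a sample of the given length."""
    return original_seconds / speedup


if __name__ == "__main__":
    for hz in (1.0, 10.0, 35.0):
        print(hz, "cycles in the envelope ->", displayed_hz(hz), "cycles after speed-up")
    print("40 s sample ->", playback_seconds(40.0), "s after speed-up")
```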

This is the first attempt — and the first of many failures — to find evidence of literal isochronism in English speech. As Potter, Kopp & Green found, the syllable-scale spectrum of an individual phrase is in general "randomly mottled".  Subsequent research on speech rhythms has mainly relied on time-domain measurements of inter-event intervals, where evidence of isochronism has been similarly "mottled" at best. In contrast, my little experiment looked at overall average syllable-scale spectra of thousands of phrases, and presumably found only evidence of a typical time-scale of sonority variation.
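
For readers who want to try the averaging idea, here is a minimal sketch of one way to compute an average envelope spectrum over a set of recordings. It is not the procedure used in the earlier post, whose details are not given here. The rectify-and-block-average envelope, the 100 Hz envelope rate, the direct Fourier sum, and every function name (including the synthetic test signal) are assumptions chosen for illustration.

```python
import math


def envelope(samples, rate, env_rate=100):
    """Full-wave rectify, then average non-overlapping blocks down to env_rate samples per second."""
    block = rate // env_rate
    rectified = [abs(x) for x in samples]
    return [sum(rectified[i:i + block]) / block
            for i in range(0, len(rectified) - block + 1, block)]


def power_spectrum(env, env_rate, freqs):
    """Power of the mean-removed envelope at each requested frequency (direct Fourier sum)."""
    mean = sum(env) / len(env)
    centered = [e - mean for e in env]
    n = len(centered)
    powers = []
    for f in freqs:
        step = 2.0 * math.pi * f / env_rate
        re = sum(c * math.cos(step * k) for k, c in enumerate(centered))
        im = sum(c * math.sin(step * k) for k, c in enumerate(centered))
        powers.append((re * re + im * im) / n)
    return powers


def average_spectrum(signals, rate, freqs, env_rate=100):
    """Average the envelope power spectra of several signals, frequency by frequency."""
    total = [0.0] * len(freqs)
    for samples in signals:
        spec = power_spectrum(envelope(samples, rate, env_rate), env_rate, freqs)
        total = [t + p for t, p in zip(total, spec)]
    return [t / len(signals) for t in total]


def synthetic_phrase(mod_hz, phase, rate=1000, seconds=4.0, carrier_hz=200.0):
    """Hypothetical test signal: a sine carrier whose amplitude varies at mod_hz."""
    n = int(rate * seconds)
    return [(1.0 + 0.8 * math.cos(2.0 * math.pi * mod_hz * k / rate + phase))
            * math.sin(2.0 * math.pi * carrier_hz * k / rate)
            for k in range(n)]


if __name__ == "__main__":
    freqs = [0.5 * k for k in range(1, 21)]  # 0.5 Hz to 10 Hz
    signals = [synthetic_phrase(3.0, phase) for phase in (0.0, 1.0, 2.0)]
    avg = average_spectrum(signals, 1000, freqs)
    print("peak at", freqs[avg.index(max(avg))], "Hz")
```

The synthetic signals are modulated at exactly one rate, so the averaged spectrum peaks there by construction. Real speech has no such built-in regularity, which is why a peak in an average over thousands of phrases is better read as a typical time-scale than as evidence of isochrony in any single phrase.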

As I noted in the earlier post, there's a recent literature exploring the idea that endogenous syllable- or phoneme-scale brain rhythms are involved in the production and/or perception of speech. Aside from this literature, I haven't found any other work related to the idea explored in that little section of Visible Speech.

N.B. Some readers may know other versions of the book title Visible Speech. Potter, Kopp, and Green were consciously and explicitly imitating two older works by Melville Bell, the father of Alexander Graham Bell: "Visible Speech: A New Fact Demonstrated", 1865; and Visible Speech: The Science of Universal Alphabetics; Or Self-interpreting Physiological Letters, for the Writing of All Languages in One Alphabet, Illustrated by Tables, Diagrams, and Examples, 1867. These were the first systematic attempts to create a "phonetic alphabet" that could be used to describe all varieties of all human languages. Here's the title page of the 1867 work, showing what Bell's original glyphs looked like:

More recently, John DeFrancis added to the same titular tradition in his 1989 book, Visible Speech: The Diverse Oneness of Writing Systems.

Update — See also
"Towards automated babble metrics", 5/26/2019
"Cumulative syllable-scale power spectra", 6/11/2019
"New models of speech timing", 9/11/2023



2 Comments

  1. Fred Cummins said,

    December 20, 2013 @ 4:24 am

    > This is the first attempt — and the first of many failures — to find evidence of literal isochronism in English speech.

    Not quite. As far back as 1939, Classé used a kymograph to study the succession of syllable onsets evident in the speech waveform (Classé, 1939). The speech employed was read English, and Classé wanted to inquire whether the impression of rhythmic regularity in the sequence of syllables encountered in prose could find empirical validation in isochronous interval measurements. Subjects read formal texts ranging from highly poetic (The Song of Songs) to informal prose (taken from Daniel Jones' transcriptions). They also tapped at points they considered rhythmically salient. This latter intervention is interesting, as it has the side effect of making the intervals between taps more regular than the corresponding speech intervals spoken freely, and thus will tend to favour the production of evenly spaced rhythmic beats.

    Classé's findings were not terrifically surprising. Even spacing between successive stressed syllables emerged as a tendency in the recordings—a tendency greatly encouraged when the lexical material was written with an ear to rhythm, when successive intervals contained phonetically matched segments and syllables, and when they had relatively similar grammatical construction. Any such tendency was disrupted by inter-sentence breaks.

    Classé, A. (1939). The Rhythm of English Prose. Basil Blackwell, Oxford, England.

  2. NQA2 said,

    August 7, 2014 @ 12:14 am

    I would not have commented on it without reading it. I'm puzzled why you assume I hadn't read the very comment I was talking about. :(
