Language Log

Another political melody

June 8, 2008 @ 8:36 am · Filed by Mark Liberman under Language and music, Language and politics

A couple of days ago, I posted some audio clips in which bits of political speechifying were reduced to their pitch and amplitude contours alone ("Political melodies", 6/5/2008). Robert Delius Royar asked me to apply the same technique to a passage from a speech that John F. Kennedy gave at Rice University on 9/12/1962. I've done as he asked; and I've also posted the code that I used, so that others can try the same thing at home.

Here's the melodized version of the first two phrases of the passage that Robert recalled, with the Obama clip for comparison:

JFK melody (first two phrases)

Obama melody

Here's the whole segment from JFK's speech, in both the melodized and original versions:

JFK melody (longer passage)

Original

Here's a comparison of JFK's original first two phrases with the two Obama phrases that triggered Robert's memory:

JFK original (first two phrases)

Obama original

Here's (a shorthand form of) the recipe I used for "melodizing" the clips. If you're genuinely interested in trying this, but find the explanation too compressed or too geeky, I might be persuaded to write a more detailed tutorial.

1. Get the audio. I downloaded the .mp3 file from americanrhetoric.com, and converted it to a .wav file using switch. I then found and edited out the passage that Robert asked about (from 8:48.287 to 9:14.564 in the downloaded audio), using audacity. (Actually, I could have read the mp3 file directly into audacity, but I've occasionally had troubles with audacity's mp3 encoding, so I superstitiously avoid using it in either direction.)
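For readers who would rather script the cut than do it by hand in an audio editor, here is a minimal Python sketch using only the standard library. It is not what was used for this post; the function names `parse_timestamp` and `cut_wav` are invented for illustration, and it assumes an uncompressed PCM .wav file.

```python
import wave

def parse_timestamp(stamp):
    """Convert 'M:SS.mmm' (or plain seconds) to seconds as a float."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def cut_wav(src_path, dst_path, start, end):
    """Copy the samples between two timestamps into a new .wav file.

    Returns the number of sample frames requested.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        first = int(round(parse_timestamp(start) * rate))
        last = int(round(parse_timestamp(end) * rate))
        params = src.getparams()
        src.setpos(first)                      # jump to the start of the passage
        frames = src.readframes(last - first)  # raw bytes for the passage
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # same channels, sample width, and rate
        dst.writeframes(frames)  # header frame count is corrected on close
    return last - first
```

With hypothetical file names, the passage above would be extracted by something like `cut_wav("jfk_rice.wav", "jfk_passage.wav", "8:48.287", "9:14.564")`.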

2. Create the pitch and amplitude "score". I got the time functions of pitch and amplitude by running David Talkin's get_f0 program from the ESPS package, but you could also use the built-in pitch-tracking function of the WaveSurfer program, which wraps a convenient GUI around the ESPS code. You could also use the pitch-tracking function in Praat, which is a wonderful program overall, though its pitch tracker is not as robust as Talkin's code. (If the pitch tracker makes a significant number of errors, such as octave jumps or tracking background noise, you might want to correct the track by hand. Praat has good facilities for doing this.)

I configured the tracker to produce estimates every 5 milliseconds, i.e. 200 times per second. This is probably a denser sampling than is strictly necessary, but it does no harm. I wrote out the pitch and amplitude estimates as a text file, one pair of values per line. The start of this file (leaving out an initial run of unpitched stuff) looks like this:

0.0 1156.93823242
0.0 1330.94921875
0.0 1489.02575684
165.842285156 1515.02490234
165.203079224 1509.15917969
171.29725647 1678.94897461
170.384185791 1963.77746582
188.458984375 2140.13574219
191.401809692 2366.39038086
200.456390381 2681.98095703
207.251037598 2736.33666992

The first number on each line is the pitch estimate (in Hz) and the second number is the RMS amplitude. Nothing would change if those absurdly over-precise numbers were rounded off.
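As a rough illustration of this file format, the following Python sketch parses such lines and rounds the values. The function name `read_score` is made up for the example, and it assumes that a pitch of 0.0 marks a frame where the tracker found no pitch, as the first three lines above suggest.

```python
def read_score(lines, digits=1):
    """Parse 'pitch amplitude' lines into (f0_hz, rms) tuples, rounded."""
    score = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        f0, rms = (round(float(x), digits) for x in fields)
        score.append((f0, rms))
    return score

# The opening lines of the score shown above.
sample = """\
0.0 1156.93823242
0.0 1330.94921875
0.0 1489.02575684
165.842285156 1515.02490234
165.203079224 1509.15917969
171.29725647 1678.94897461
170.384185791 1963.77746582
188.458984375 2140.13574219
191.401809692 2366.39038086
200.456390381 2681.98095703
207.251037598 2736.33666992
"""

score = read_score(sample.splitlines())
# Assumption: f0 == 0.0 means "no pitch found" for that frame.
voiced = [(f0, rms) for f0, rms in score if f0 > 0]
# With one estimate every 5 ms, frame i starts at i * 0.005 seconds.
times = [i * 0.005 for i in range(len(score))]
```

Rounding to one decimal place keeps far more precision than the ear needs at these pitches.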

3. Turn the score into audio. To perform such scores as quasi-music, I wrote a little Matlab function that amplitude- and frequency-modulates a signal using an arbitrary wavetable oscillator. The function is here. (Needed improvements: at a minimum, add some sanity checks on the inputs; there are some issues with accumulation of time-base round-off errors; it could be made more elegant and more general in various ways. But I believe that it does roughly the right thing in the cases under discussion.)
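For anyone without Matlab, the general idea can be sketched in plain Python. This is not the function linked above, only a simplified stand-in under stated assumptions: the names `make_sine_table`, `synthesize`, and `write_wav` are invented here, pitch and amplitude are held constant within each 5 ms frame rather than interpolated, frames with a pitch of 0.0 are rendered as silence, and the output sampling rate is set to 16000 Hz so that each frame is exactly 80 samples.

```python
import math
import struct
import wave

def make_sine_table(size=1024):
    """One cycle of a sine wave, to be read repeatedly by the oscillator."""
    return [math.sin(2 * math.pi * i / size) for i in range(size)]

def synthesize(score, table, frame_s=0.005, rate=16000):
    """Render (f0_hz, rms) frames as a list of samples in roughly [-1, 1]."""
    per_frame = int(round(frame_s * rate))  # samples per score frame
    peak = max((rms for _, rms in score), default=0.0) or 1.0
    size = len(table)
    phase = 0.0  # position within the table cycle, 0 <= phase < 1
    out = []
    for f0, rms in score:
        gain = rms / peak if f0 > 0 else 0.0  # amplitude modulation
        step = f0 / rate                      # frequency modulation
        for _ in range(per_frame):
            out.append(gain * table[int(phase * size) % size])
            # Phase carries across frames, so pitch changes do not
            # restart the waveform.
            phase = (phase + step) % 1.0
    return out

def write_wav(path, samples, rate=16000):
    """Write samples as 16-bit mono PCM, clipping anything outside [-1, 1]."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    data = b"".join(struct.pack("<h", int(32767 * s)) for s in clipped)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(data)
```

Given a list of (pitch, amplitude) pairs called `score`, something like `write_wav("melody.wav", synthesize(score, make_sine_table()))` would produce the audio; the output file name is arbitrary.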

To create the audio for the JFK clip, I used this script. It would probably be better to use a wavetable with more high-frequency energy, to produce a less muffled sound that would be more appropriate for public speakers projecting their voices. An "instrument" with seven overtones, weighted [1.0 0.9 0.8 0.7 0.6 0.5 0.4], is probably too far in the other direction, though it does caricature President Kennedy's somewhat nasal-sounding edge:

JFK

Obama

(I believe that all this should also work in Octave — the free software version of Matlab — but I haven't checked.)
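The seven-overtone "instrument" mentioned above can be approximated by building one cycle of a waveform from weighted harmonics. The Python sketch below is an illustration, not the script used for the clips: the name `harmonic_table` is invented, it assumes the first weight applies to the fundamental, and it assumes all harmonics start in sine phase.

```python
import math

def harmonic_table(weights, size=1024):
    """One cycle of a sum of harmonics; weights[k] scales harmonic k + 1."""
    table = [
        sum(w * math.sin(2 * math.pi * (k + 1) * i / size)
            for k, w in enumerate(weights))
        for i in range(size)
    ]
    peak = max(abs(v) for v in table)
    return [v / peak for v in table]  # normalize so the largest value is 1

# Assumption: the first entry weights the fundamental itself.
table = harmonic_table([1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
```

A table like this can be handed to any wavetable oscillator in place of a plain sine cycle. Fewer or more steeply falling weights should give a more muffled sound, and flatter weights a buzzier one.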

I then used switch again to convert the synthesized .wav file to an .mp3 file for the purposes of this post.


4 Comments

rootlesscosmo said,

June 8, 2008 @ 1:41 pm

I heard substantial differences among Clinton, Obama, and McCain in your earlier trial of this method; I hear substantial similarities between Obama and JFK. But those are subjective impressions, and I don't trust 'em. Can you devise a reasonably simple way to measure similarity and difference in these samples, on some reasonably meaningful parameters? If we had enough samples treated this way, and a pretty good similarity metric, could we start to find some meaningful patterns?
Steve Harris said,

June 8, 2008 @ 3:59 pm

A suggestion for what to look for in the way of data:

For me, the interesting similarities and differences are in cadence, by which I mean the timing and regularity between peaks in amplitude. For the first line of data, I suggest abstracting peaks by some routine which selects out local maxima which are sharply higher than their surrounds, that is to say, peaks which are both narrow and substantial. The crucial information is then the sequence of time intervals between successive peaks.

The hard part lies in a metric which captures something a lot more informative than simply the average time between peaks. What I'd like to see is a metric which captures rhythm as described by, say, distinguishing long and short intervals. Perhaps, at a first pass, one can simply classify intervals into those two categories, long and short, using something which inventories intervals and applies a bimodal mask (no, I don't know how to do that). Then parse the sequence of intervals into phrases beginning and ending with long intervals. The rhythm is the pattern of lengths of phrases (i.e., how many short intervals between pairs of long intervals).

This still leaves open the question of gauging a metric on patterns of phrase lengths. A start: average length of phrase and standard deviation of phrase-lengths. But that doesn't capture a pattern of repetition; I don't know what would.
Andrew Rodland said,

June 8, 2008 @ 11:15 pm

How to tune the synth? Simple, just use a sample of a muted trombone for that perfect "Peanuts special" effect.
Tim McKenzie said,

June 8, 2008 @ 11:49 pm

rootlesscosmo and Steve Harris: Perhaps a Fourier-transform-type-thing applied to the amplitude estimates would show something interesting. Then again, maybe not. I wonder if this idea has ever been used by either side in the stress-timed/syllable-timed languages debate.
