Manfred Schroeder


Manfred Schroeder died on Dec. 28, 2009, as I just learned.  He was a physicist specializing in acoustics, who worked at Bell Labs from 1954 to 1969, and then split his time between Göttingen and Bell Labs.  He carried forward the tradition of Harvey Fletcher, an accomplished physicist whose most important work was in the psychology of hearing. As Manfred jokingly pointed out to me when we first met in 1975, this is also the tradition of Gleb Vikentyevich Nerzhin, the mathematician in Solzhenitsyn's The First Circle who seals his fate by choosing to work on psycho-acoustics rather than cryptography.

Although you probably don't know Manfred Schroeder's name, his work on psycho-acoustics led to several innovations that have almost certainly affected your life. First, in the 1970s, he developed with Bishnu Atal and Joseph Hall the idea of perceptual coding, described as follows in the abstract for their joint paper "Optimizing digital speech coders by exploiting masking properties of the human ear", Journal of the Acoustical Society of America, 66(6): 1647-1652, 1979:

In any speech coding system that adds noise to the speech signal, the primary goal should not be to reduce the noise power as much as possible, but to make the noise inaudible or to minimize its subjective loudness. “Hiding” the noise under the signal spectrum is feasible because of human auditory masking: sounds whose spectrum falls near the masking threshold of another sound are either completely masked by the other sound or reduced in loudness. In speech coding applications, the “other sound” is, of course, the speech signal itself. In this paper we report new results of masking and loudness reduction of noise and describe the design principles of speech coding systems exploiting auditory masking.

This simple and elegant idea is the basic design principle behind MP3 and AAC coding.  Manfred was involved in developing both of these, but in any case, the foundational idea of perceptual coding is largely due to him.

The second innovation ("Code-Excited Linear Prediction", or CELP) is a bit harder to understand. Many modern methods of digital acoustic analysis represent the sound spectrum in terms of "linear prediction", where each successive output sample is modeled as a linear combination of the N previous output samples plus an error term.  (This is equivalent to modeling the signal in the frequency domain in terms of N/2 resonances or "poles".) The basic idea of this sort of analysis was developed by Norbert Wiener in his work during WWII on radar-controlled anti-aircraft guns, and eventually published in a declassified form as Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications (1949).
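To make the linear-prediction idea concrete, here is a minimal sketch in plain Python. It estimates the predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, which is one standard way to do this and not necessarily the method used in any of the work cited here. The function names, the order-2 test signal, and the coefficient values 1.3 and -0.8 are all illustrative assumptions.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[lag] = sum over n of signal[n] * signal[n - lag]."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i - lag] for i in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations for coefficients a[1..order] such that
    x[n] is approximated by sum_k a[k] * x[n - k].
    Returns (coefficients, final prediction-error energy)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


def prediction_residual(signal, coeffs):
    """Error term: each sample minus its prediction from the previous samples."""
    residual = []
    for n, sample in enumerate(signal):
        predicted = sum(
            c * signal[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0
        )
        residual.append(sample - predicted)
    return residual


# Hypothetical test signal: impulse response of a two-pole (one-resonance) filter.
true_coeffs = [1.3, -0.8]
signal = []
for n in range(400):
    past = sum(
        c * signal[n - k] for k, c in enumerate(true_coeffs, start=1) if n - k >= 0
    )
    signal.append(past + (1.0 if n == 0 else 0.0))

estimated, _ = levinson_durbin(autocorrelation(signal, 2), 2)
print(estimated)  # should be very close to the assumed [1.3, -0.8]
```

With N = 2 the model has a single resonance (one complex-conjugate pole pair), which matches the N/2 count mentioned above.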

Manfred also played a role in the early application of these ideas to speech analysis and synthesis. According to Bishnu Atal, "The History of Linear Prediction", IEEE Signal Processing Magazine, 2006:

… in 1966, I was one day in Manfred R. Schroeder’s office at Bell Labs when John Pierce brought a tape showing a new speech time compression system. Schroeder was not impressed. After listening to the tape, he said that there had to be a better way of compressing speech. Manfred mentioned the work in image coding by Chape Cutler at Bell Labs based on differential pulse code modulation (DPCM) technique, which was a simplified version of predictive coding. Our discussions that afternoon kept me thinking. Since my recently started Ph.D. thesis work focused on automatic speaker recognition, I hesitated to start a side project on speech compression at that time. Also, I had doubts whether I could add anything useful to this crowded field of research. However, Manfred’s remarks at our meeting made a deep impression.

But this is not yet the second invention that I mentioned — LPC was developed more or less simultaneously in Japan by Itakura and Saito, and LPC itself would not have had such a significant impact on your life without another development, which didn't come along for another two decades.

The error term in linear prediction is also sometimes called an "innovation" term, since it represents the aspect of the signal not predicted by the model.  In signal coding applications, you can think of the innovation or error term as a "source" signal exciting an auto-regressive "filter". If the full innovation sequence is transmitted, the signal is reconstructed perfectly, but on the other hand, no compression is achieved; so the trick is to code the innovation sequence as parsimoniously as possible.  As LPC ("linear predictive coding") for speech was originally developed in the 1960s, the source signal in voiced speech is modeled as a quasi-periodic impulse train, representing the frequency and amplitude of the glottal source, and in unvoiced speech, the source is modeled as amplitude-modulated white noise. The good news is that this parametric source can be transmitted with very few bits; the bad news is that the result doesn't sound very good.
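The classic parametric source described above can be sketched in a few lines of plain Python. This is only an illustration of the source-filter picture, not a working vocoder: the filter coefficients, the 8000 Hz sampling rate, the 100 Hz pitch, and all function names are assumptions chosen for the example.

```python
import random


def impulse_train(n_samples, period, gain=1.0):
    """Voiced source: one pulse every `period` samples, zero elsewhere."""
    return [gain if n % period == 0 else 0.0 for n in range(n_samples)]


def white_noise(n_samples, gain=1.0, seed=0):
    """Unvoiced source: Gaussian white noise scaled by `gain`."""
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(excitation, coeffs):
    """Auto-regressive synthesis: out[n] = excitation[n] + sum_k coeffs[k-1] * out[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        predicted = sum(
            c * out[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0
        )
        out.append(predicted + e)
    return out


# Hypothetical frame: 20 ms at an assumed 8000 Hz sampling rate.
coeffs = [1.3, -0.8]   # assumed filter, one resonance
pitch_period = 80      # 8000 Hz / 100 Hz assumed pitch
voiced = all_pole_filter(impulse_train(160, pitch_period), coeffs)
unvoiced = all_pole_filter(white_noise(160, gain=0.1), coeffs)
```

In this scheme the whole excitation for a frame is described by a voiced/unvoiced flag, a pitch period, and a gain, which is why it needs so few bits.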

In the early 1980s, Manfred Schroeder and Bishnu Atal developed a different idea, described in their paper "Code-excited linear prediction (CELP): High-quality speech at very low bit rates", ICASSP 1985.

We describe in this paper a code-excited linear predictive coder in which the optimum innovation sequence is selected from a code book of stored sequences to optimize a given fidelity criterion.

The "fidelity criterion", needless to say, is based on a perceptual distortion measure, and it turns out that random code-books do a pretty good job. CELP is now the most widely used form of speech coding, and in particular is the basis of all (?) digital cellular telephony.
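The codebook search can be sketched as an analysis-by-synthesis loop: pass every stored sequence through the prediction filter, scale it by its best gain, and keep the entry that comes closest to the target. The sketch below makes several simplifying assumptions that are not from the paper: it uses plain squared error rather than a perceptual distortion measure, it starts the filter from zero state in every frame, and the codebook size, frame length, and filter coefficients are made up for illustration.

```python
import random


def synthesize(excitation, coeffs):
    """All-pole synthesis filter, started from zero state (a simplification)."""
    out = []
    for n, e in enumerate(excitation):
        predicted = sum(
            c * out[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0
        )
        out.append(predicted + e)
    return out


def make_codebook(size, length, seed=0):
    """Random Gaussian codebook; encoder and decoder must share the same one."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


def search_codebook(target, codebook, coeffs):
    """Return (index, gain, error) for the entry whose synthesized, gain-scaled
    output is closest to `target` in squared error. A real coder would use a
    perceptually weighted error here instead."""
    best = None
    for index, code in enumerate(codebook):
        candidate = synthesize(code, coeffs)
        energy = sum(v * v for v in candidate)
        if energy == 0.0:
            continue
        # Least-squares optimal gain for this candidate.
        gain = sum(t * v for t, v in zip(target, candidate)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, candidate))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Hypothetical usage: 64 entries of 40 samples each.
coeffs = [1.3, -0.8]
codebook = make_codebook(64, 40, seed=1)
target = [2.5 * v for v in synthesize(codebook[17], coeffs)]
index, gain, error = search_codebook(target, codebook, coeffs)
```

Only the index and the gain need to be transmitted for each frame. With 64 entries the index costs 6 bits for 40 samples, instead of sending 40 innovation values.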

If you read Manfred's home page, which is still available at Göttingen, you'll see that he made many other contributions, in areas from concert-hall acoustics to computer graphics.  Among his publications, my favorite is his book Number Theory in Science and Communication, and I think it might have been his favorite as well.  Certainly I never saw him as happy and excited as when he explained to me about his idea for quadratic-residue diffusors to solve the acoustic problem caused by modern concert-hall design, where relatively low height compared to width causes undesirable median-plane sound reflections.

(I should mention here another small-world connection — Joe Hall, co-author of the original paper on perceptual coding, is Barbara Partee's cousin.)



5 Comments

  1. Tim Silverman said,

    January 20, 2010 @ 8:41 am

    I know of Schroeder from reading his excellent book Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise, which is both fun and informative, and which I recommend at every opportunity. (However, it doesn't really talk about acoustics, apart from mentioning his work on the design of concert halls. But, at least in his authorial persona, he does come across as an enthusiastic and delightful person.)

  2. John Cowan said,

    January 20, 2010 @ 1:30 pm

    I wonder if people would have devised LPC (in its known form, at least) if more of us spoke languages where creaky voice was significant.

    [(myl) Well, any voice-quality distinctions would require additional dimensions in a parametric voice source. But CELP (or multi-pulse LPC) should work fine for voice-quality issues (though I don't know of any experimental confirmation of that hypothesis).]

  3. uberVU - social comments said,

    January 23, 2010 @ 3:36 pm

    Social comments and analytics for this post…

    This post was mentioned on Twitter by PhilosophyFeeds: Language Log: Manfred Schoeder http://goo.gl/fb/ToNr

  4. John Chowning said,

    January 25, 2010 @ 2:10 am

    Manfred Schroeder's seminal paper "Natural Sounding Artificial Reverberation" (1962) was a revelation when I discovered it in my first investigation into spatial processing in 1964. The Schroeder allpass delay advanced my work as well as the work of many others and remains an integral part of reverberation theory and practice to this day. Schroeder had only heard his artificial reverberation in simulations of Avery Fisher Hall with an impulse as a source signal until he visited Stanford in 1971 when I played for him a synthesized brass canon enhanced by a form of his allpass-based "natural sounding" reverberation. He was pleased and so was I.

  5. John Chowning said,

    January 25, 2010 @ 2:15 am

    The sound example that I played for Manfred, referenced above: ccrma.stanford.edu/~jc/FM-BrassCanon2.mp3
