Weird synthesis

I wouldn't have predicted that this would work as well as it does:

Sinewave synthesis (Robert Remez et al. "Speech perception without traditional speech cues", Science 1981) pointed the way, but this is pretty far down the road. Now I want to hear the string orchestra and brass band versions.

Of course, the apparent intelligibility is mostly a function of knowing what was "said". But still.

[Hat tip: Aengus.]



  1. Richard Howland-Bolton said,

    March 2, 2010 @ 8:15 am

    Ahh! That takes me back to the days of Sparky's Magic Piano, though that sounded much better :-)

  2. Richard Howland-Bolton said,

    March 2, 2010 @ 8:17 am

    Oh! Oh! Oh!!
    Here it is!!

  3. Jon Weinberg said,

    March 2, 2010 @ 9:05 am

    I had expected the link from "the apparent intelligibility is mostly a function of knowing what was 'said'" to point somewhere like this.

  4. John Cowan said,

    March 2, 2010 @ 10:59 am

    After the word "politicians", I shut my eyes to see if I could understand the rest. Not a word. Without the subtitles, the pseudo-speech was completely unintelligible to me. I stopped when the child began to read the text out loud, and restarted the video with my eyes shut. Again, I could understand only what I had seen the subtitles for, so the effect is robust.

  5. couk said,

    March 2, 2010 @ 11:46 am

    In the screenshot @1:52 it seems that the input signal is overdriven, which introduces extra noise. And I wonder what the composer means when he says that a "fairly high resolution" is only possible with a mechanical piano. Obviously the effect relies on an artificially low resolution, and an electronic instrument wouldn't be so limited by its nature. Unless that electronic instrument is a bunch of hard drives.

  6. Faith said,

    March 2, 2010 @ 11:50 am

    I covered the screen before I started watching (since I'd been given the hint that there would be subtitles). I could understand "we declare" and "responsibility" and "world." I haven't watched it with the subtitles to check if those words are what the piano was actually trying to say.

  7. majolo said,

    March 2, 2010 @ 12:01 pm

    I had a different experience from John Cowan's. I tried looking away after reading some subtitles, and the last phrase "protecting our mother earth" was pretty clear with no visual cues. In fact, I would rather say that the apparent intelligibility (for me) was mostly a function of knowing that something was being said, not what was being said.

  8. Victor Mair said,

    March 2, 2010 @ 2:27 pm

    I kept my eyes closed throughout and could easily hear phrases like "we are responsible" and "we proclaim that another world is possible."

  9. Will said,

    March 2, 2010 @ 3:05 pm

    I unfortunately watched the video before reading the comments, and I never covered the screen because I really expected them to do a segment without subtitles specifically for a demonstration of understandability, but they never did that. Oh well.

  10. Spell Me Jeff said,

    March 2, 2010 @ 3:08 pm

    I think perhaps resolution is a misleading word to describe what is going on, though perhaps correct from a strictly psychoacoustic point of view. I dislike it because in most contexts it suggests a kind of precision. What a mechanical piano would introduce is quite the opposite: all kinds of harmonics and resonance, which would be especially noticeable in a live context. The absence of such effects is what makes early tone generators so annoying to listen to. But in this situation, it might be reasonable to suggest that the string resonance and the entire structure of a boxy, wooden piano substitute for the complex structure of the human vocal apparatus, from diaphragm to vocal cords made of tissue to sinus resonance. Sophisticated samplers and tone generators can capture and/or reproduce most, but not all, of this. (You can always tell if it's live or Memorex.)

    [(myl) The organic complexity of acoustic instruments is probably not relevant here, as the success of sinewave synthesis shows. I'm skeptical, frankly, that the outcome would be in general less intelligible or less speech-like if synthetic tone-generators were used instead of this electromechanical rig. And if there's any difference, it would have to do with things that would be easy to change in the synthetic version, like (say) excessive phase coherence or something.]

  11. James C. said,

    March 2, 2010 @ 3:34 pm

    I believe that he meant resolution in the sense of having enough frequencies available to coarticulate. For a non-electronic instrument the only things which would provide a large frequency spectrum with the possibility of simultaneous frequencies being active are pianos or something like them, e.g. xylophone, marimba, harpsichord, harp. A piano has the added ability to automatically damp strings, and lots of resonance would have destroyed the effect.

    I think if they’d chosen a speaker with a slightly lower range they might have been more successful. I note that the lower keys weren’t being used at all. Frankly, they did a remarkable job at rendering the high-frequency consonants like fricatives.

  12. Mike Albaugh said,

    March 2, 2010 @ 3:42 pm

    Having done a fair bit of sound synthesis "back in the day" (70's and 80's), I heard the "only a piano…" comment as "There are very few mechanical instruments that can play up to 88 notes at once". Most of the synthesizers I could afford had far fewer (< 8) available "voices". If your goal is to use addition of (sorta) sine waves, you probably do need such independence.

    [(myl) The electromechanical device here can play up to 88 notes at once (I guess), with good temporal precision. A human pianist wouldn't have the needed control either in note selection or in timing. Some "player pianos" might not either, I guess.

    Synthesizers that need to keep up with real time may have a limit on how many notes they can simultaneously generate, though these days I don't think it would be too hard to find a processor capable of generating 88 simultaneous tones via (say) wavetable synthesis. But there's nothing about this discussion that limits us to real-time synthesis. And I bet that the same recipe, applied to create a digital audio file at whatever speed the software happens to work, would have about the same results as the mechanical system does.]
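
The offline "recipe" discussed above can be sketched in a few lines. This is only an illustration under stated assumptions, not the method used in the video: it assumes standard equal temperament with key 49 tuned to A4 = 440 Hz, pure sine tones rather than piano-like notes, and an arbitrary 16 kHz sample rate. The function names (`key_frequency`, `nearest_key`, `synthesize_frame`) are hypothetical, and the analysis step that would decide which keys are active in each frame is left out.

```python
import math

SAMPLE_RATE = 16000  # Hz; an assumed value, chosen only for illustration


def key_frequency(key):
    """Equal-tempered frequency of piano key 1..88 (key 49 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((key - 49) / 12.0)


def nearest_key(freq_hz):
    """Piano key closest in pitch to freq_hz, clamped to the range 1..88."""
    key = round(49 + 12 * math.log2(freq_hz / 440.0))
    return max(1, min(88, key))


def synthesize_frame(amplitudes, n_samples, start=0):
    """Sum one sine wave per active key.

    amplitudes maps key number -> linear amplitude; start is the index of
    the first sample, so that consecutive frames stay phase-continuous.
    """
    samples = []
    for i in range(start, start + n_samples):
        t = i / SAMPLE_RATE
        samples.append(sum(a * math.sin(2 * math.pi * key_frequency(k) * t)
                           for k, a in amplitudes.items()))
    return samples


# Example: a 10 ms frame with three keys sounding at once.
frame = synthesize_frame({40: 0.5, 49: 0.3, 61: 0.2}, n_samples=160)
```

Nothing here has to run in real time: all 88 keys could be active in every frame, and the resulting samples could be written to an audio file at whatever speed the program happens to run.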

  13. Kenny Easwaran said,

    March 2, 2010 @ 4:56 pm

    I had assumed the "only a mechanical piano" meant that he'd never be able to get this sort of effect with a piano played by a human. (Although the fact that some of Conlon Nancarrow's etudes for player piano can be played now by some live instrumentalists suggests that maybe some day some human pianist might be able to do this with a simple phrase.)

  14. Jarek Weckwerth said,

    March 2, 2010 @ 5:39 pm

    Yes, I agree with Kenny Easwaran. What he means is probably temporal resolution going well below (or beyond, if you like) traditional note lengths (half notes, quarter notes etc.), and the very precise synchronisation between the notes, unachievable for a human player. The visual illustration there is the scrolling MIDI-like control sheet (usually called the "piano roll" in sequencer applications).

  15. David L said,

    March 2, 2010 @ 7:10 pm

    myl said: "And I bet that the same recipe, applied to create a digital audio file at whatever speed the software happens to work, would have about the same results as the mechanical system does."

    But a major limitation of the particular mechanical transducer used here is that you have little if any control over the temporal profile of each note. You bang the key with the mechanical finger and the resulting sound has a certain loudness and duration. Whereas with a purely digital system (if that's what you mean; I'm not sure) you could add together "notes" with arbitrary intensity vs. time profiles. It would produce, I'd guess, a more accurate version of the voice. Wouldn't be nearly so cool, though.

  16. Dan Lufkin said,

    March 2, 2010 @ 10:19 pm

    Am I the only one here who remembers the Bell Labs Voder at the 1939 NY World's Fair? (110,000 hits) That had a keyboard with an operator who could play requests at the demo. It made a lasting impression on me.

    Of course that was parametric rather than synthetic. Didn't A.G. Bell have a crude version?

  17. Mark F. said,

    March 3, 2010 @ 12:18 am

    I wish there were a video that consisted only of the recitation by the piano. Just as I think I'm starting to pick out the words, the narration cuts in and I lose it.

  18. Graeme said,

    March 3, 2010 @ 8:54 am


    But all I could think was how eerily tinny the 'voice' was; and how incongruous both the timbre (or lack of it) and the whole project (cheap computer synthesising something as natural as the human voice) were, given the environmental message.

    At least Kraftwerk focused on pocket calculators, trains and showroom dummies.

  19. Aviatrix said,

    March 3, 2010 @ 5:16 pm

    I suspect that the effect is similar to listening to a heavily accented speaker. Some people find him unintelligible, and others can pick out the words. I imagine that after listening to this for a while it wouldn't be much more impenetrable than any other accent. After all, isn't that what's going on here? The "speaker" is unable to form the English phonemes normally and is approximating them with best match sounds from its own "language."

  20. Frans said,

    March 3, 2010 @ 5:53 pm

    "I unfortunately watched the video before reading the comments, and I never covered the screen because I really expected them to do a segment without subtitles specifically for a demonstration of understandability, but they never did that. Oh well."

    Same here.

  22. Dmajor said,

    March 5, 2010 @ 3:11 am

    I've noticed a similar effect while printing pages on my Epson cx5400 printer. Sometimes the sound seems like a short phrase just on the edge of intelligible speech. Although why the printer should repeat, say, "Bob Mulroney, Bob Mulroney" or "macaroni hat, macaroni hat" I don't know.

  23. цarьchitect said,

    March 6, 2010 @ 6:35 pm

    Musicians have been using the human voice as an instrument for a while now; it's about time someone tried to create a voice from an instrument.

    Also, this reminds me of the harmonic telegraph.

    @ Kenny Easwaran – I thought the human-powered Nancarrow pieces were arrangements and not the pieces themselves.

  25. Dennis Des Chene said,

    March 31, 2010 @ 8:35 pm

    I'm not surprised. Hold down the damper pedal and shout a vowel into the strings of a piano. You will hear the timbre of the vowel, faintly, resonating in the strings. Notice that the "synthesizer" is much better with vowels than with consonants. You can’t really do noise very well.

