Waseda talker


"This is cool", writes John Coleman — and it is. More later.


  1. Leonardo Boiko said,

    March 21, 2011 @ 10:15 am

    I find it interesting how much lip rounding the robot does for /u/; I’ve always been assured that in Japanese this vowel is realized with compressed, unrounded lips (something similar to a [ɯ]; wikipedia writes it as [ü͍]). Perhaps the robot anatomy makes it so that a cardinal, rounded [u] is needed for distinction?

  2. Dan Lufkin said,

    March 21, 2011 @ 11:29 am

    Warum einfach, wenn es umständlich auch geht?

    Why simple (digital) when you can also do it complicated (analog)?

    Are we seeing anything that would surprise Prof. Henry Doolittle?

    You could put a microphone in front of this kludge and, presto, with a little more programming, text to speech for your tech support center.

  3. richard howland-bolton said,

    March 21, 2011 @ 11:58 am

    @Dan isn't it supposed to be discovering something about the way we make speech noises, rather than just making them?
    Anyway it's probably just another step on the road to R. Daneel Olivaw or Mr Data.

  4. Spell Me Jeff said,

    March 21, 2011 @ 12:38 pm

    Yes. Text-to-speech can be handled very well with sampled sounds. Semanticizing and contextualizing speech sounds is another matter, at least for my desktop Macintosh. However the Waseda works, it will have the same issues.

    As a way of studying sound production, it seems pretty cool. There's only so much you can do with imagery and invasive techniques on humans before they themselves interfere with the things they are designed to measure.

  5. ~flow said,

    March 21, 2011 @ 1:05 pm

    @leonardo—while japanese does sound way cooler with unrounded [u]s, there is no phonemic contrast between [ɯ] and [u], and my impression is that the /u/s do vary quite a bit in real life. in korean, there *is* a distinction /ɯ/ vs /u/, and consequently those /ɯ/s tend to be quite [ɯ]-ish /ɯ/s. one student helped me grasp the concept with an almost comical, clearly articulated /ɯ/, after which she pointed out how really wide and flatly drawn out her lips were. people won't do that in japan, or at least i've never encountered it.

    also, some japanese use more open realizations for their /o:/s, making them sound like [ɔ:], which is also way cooler or way vulgar, depending on the observer.

  6. Dan Lufkin said,

    March 21, 2011 @ 2:46 pm

    I mention the analog approach because I clearly remember listening to the Bell Labs Vocoder at the NY World's Fair in 1939. I was just a lad then, but it made a great impression on me. You'd never mistake it for a human, but it was clearly intelligible. I believe that it used a pneumatic mechanical model of the human vocal tract, controlled by a keyboard like a piano. A young lady was seated at the console and invited spectators to give her things to say.

    A Bell Labs engineer there had her play "Joe took father's shoe box out. She was waiting at my lawn". He told me that this was their hardest test sentence. I've been using it myself in text/speech work ever since. (See my LL posting of 18 Feb.)

  7. Dan Lufkin said,

    March 21, 2011 @ 3:05 pm

    Sorry, I should have Googled around a bit. The VODER (not Vocoder) was an all-electronic system of band-pass filters. I was thinking of the von Kempelen – Wheatstone approach. See:


    This is the Stockholm Univ speech synthesis site where the basic work was done that my group at the National Weather Service used for the first fully digital synthesized speech dissemination of weather information, first via telephone, later via NOAA Weather Radio.

  8. Darlo Paul said,

    April 18, 2011 @ 5:29 am

    Very amusing, I'm sure, but aren't there enough humans talking Janglish at us, without building a machine?
