The "Text-to-Speech Synthesis Technology" ASA Standards working group (S3-WG91) is conducting a web-based test that applies the method it will propose as an ANSI standard for evaluating TTS intelligibility. It is an open-response test ("type what you hear"). The test uses syntactically correct but semantically meaningless sentences, known as Semantically Unpredictable Sentences (SUS).
To take the test, click here.
More information from Ann's email:
+ For each TTS system, intelligibility will be evaluated at six speaking rates that span a wide range.
+ During a test session, a listener hears 60 short sentences, presented in blocks of 10 sentences at each of the six speaking rates. The test generally takes 15-20 minutes.
+ Several synthesis techniques will be tested, including formant synthesis, small-inventory diphone concatenation, unit selection, and HMM-based synthesis. Each synthesis technique is represented by at least two TTS systems. Both female and male American English voices will be tested for each system.
+ A listener will hear only one synthesizer and voice during a test session. The rate of speech will change from its default rate to increasingly faster rates as the test session progresses.
+ Synthesizers will be varied over different test sessions.
+ The set of 60 sentences tested will be varied over different test sessions. Eventually, across sessions and listeners, all systems will have been tested with the same larger set of test sentences.
+ Two human speech reference conditions will also be tested, following the same procedures used for the TTS systems. In one, the talker spoke at different speaking rates; in the other, natural speech originally spoken at a moderate rate was sped up through signal processing.
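The session layout described in the list above can be sketched in a few lines of Python. This is only an illustration and not the working group's actual test software: the function name plan_session, the rate multipliers in RATES, and the placeholder sentence pool are all assumptions made for the example. Only the counts (60 sentences, blocks of 10, six rates, one synthesizer and voice per session, default rate first) come from the description.

```python
import random

# Hypothetical rate multipliers: the default rate first, then increasingly
# faster rates. The actual rates used in the test are not given here.
RATES = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
BLOCK_SIZE = 10  # sentences per speaking rate


def plan_session(sentence_pool, system, voice, seed=None):
    """Build the trial list for one listener session.

    One system and one voice per session; 60 distinct sentences drawn
    from a larger pool; one block of 10 sentences per speaking rate,
    ordered from the default rate to the fastest.
    """
    rng = random.Random(seed)
    needed = BLOCK_SIZE * len(RATES)
    sentences = rng.sample(sentence_pool, needed)  # sample without replacement
    trials = []
    for block_index, rate in enumerate(RATES):
        start = block_index * BLOCK_SIZE
        for sentence in sentences[start:start + BLOCK_SIZE]:
            trials.append({
                "system": system,
                "voice": voice,
                "block": block_index + 1,
                "rate": rate,
                "sentence": sentence,
            })
    return trials


# Placeholder pool standing in for the larger set of SUS test sentences.
pool = [f"SUS sentence {i}" for i in range(200)]
session = plan_session(pool, system="system-A", voice="female", seed=0)
print(len(session), "trials;", "rates:", sorted({t["rate"] for t in session}))
```

Drawing a different 60-sentence sample for each session, as above, is one simple way to vary the sentence set across sessions; ensuring that every system is eventually tested with the same larger set would need additional bookkeeping that this sketch leaves out.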
This study represents a large collaborative effort. We plan to share results with the synthesis community in the form of conference papers and a journal article. Some related studies are being conducted with blind listeners and with aided hard-of-hearing listeners.