In current rotation on Doonesbury, Bernie (Mike's boss) is pitching an idea to Sid (Boopsie's agent):
Bernie's idea? To create celebrity GPS voices:
Sid is sure he can recruit speakers:
Though what this means, in the case of actors who don't always play themselves, is less clear:
I was pretty sure that this was a re-run, and a quick web search confirmed that the same sequence ran back in November of 2009. But first I had a moment of doubt, wondering whether it might be a dream memory.
Why might I have false memories about celebrity speech synthesis? Well, in the late 1980s, when I still worked at Bell Labs, I spent a week in Denver recording a (minor) celebrity voice. The speaker was a woman working at a country music radio station, who had previously recorded the messages and prompts for AT&T's AUDIX voicemail system, for which the engineering development was then done at the Western Electric facility in Denver. The AUDIX people wanted to see if they could add general text-to-speech capability using the same voice.
This didn't work out in the end, for various reasons (corporate re-orgs; uncertainty about how customers would get text into the system in those pre-cell-phone, pre-internet days; the Uncanny Valley effect; etc.). But it was the occasion for a fair number of semi-jokes about celebrity voices: could we recruit Charlton Heston, we wondered?
And I learned some interesting things about voices from Miss Audix, as she was called in this professional role.
When she originally recorded the AUDIX prompts, using her normal voice-over persona, the results had been vehemently rejected by the customer. After several equally unsatisfactory iterations, she finally hit on what she called her "happy secretary voice", which turned out to be exactly what they wanted. So she tried to adopt that persona for the material that I recorded (a typical list of phonetically-balanced sentences, somewhat like the current ARCTIC list but an order of magnitude larger).
This was not easy — she broke down laughing more than once, trying to give the happy secretary version of something like "The hogs were fed chopped corn and garbage". (Although we probably didn't actually include the Harvard Sentences in our list, fragments taken out of context in order to optimize the coverage of segmental n-grams tend to be similarly incongruous.) In the end, I think she reverted to a more standard voice-over delivery.
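For readers curious what "optimizing the coverage of segmental n-grams" can look like, here is a minimal sketch of one common approach, greedy selection. It is not the procedure used for the recordings described here, which the post doesn't specify. The function names (`units`, `ngrams`, `greedy_select`) are made up for illustration, and letters stand in for phones; a real system would first run each sentence through a pronunciation lexicon.

```python
def units(sentence):
    # Assumption: letters stand in for phones. A real system would
    # look up each word in a pronunciation lexicon instead.
    return [c for c in sentence.lower() if c.isalpha()]


def ngrams(seq, n=2):
    # Set of n-grams (as tuples) over a sequence of units.
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def greedy_select(candidates, n=2, max_sentences=None):
    """Repeatedly pick the candidate that adds the most uncovered n-grams.

    Stops when no remaining candidate adds anything new, or when
    max_sentences have been chosen. Returns (chosen, covered).
    """
    pool = {s: ngrams(units(s), n) for s in candidates}
    covered = set()
    chosen = []
    while pool and (max_sentences is None or len(chosen) < max_sentences):
        best = max(pool, key=lambda s: len(pool[s] - covered))
        gain = pool[best] - covered
        if not gain:
            break  # nothing left that improves coverage
        chosen.append(best)
        covered |= gain
        del pool[best]
    return chosen, covered


if __name__ == "__main__":
    sentences = [
        "The hogs were fed chopped corn and garbage",
        "The birch canoe slid on the smooth planks",
        "Glue the sheet to the dark blue background",
    ]
    chosen, covered = greedy_select(sentences, n=2, max_sentences=2)
    print(len(covered), "distinct bigrams covered by:")
    for s in chosen:
        print(" ", s)
```

Because the selection criterion rewards rare unit sequences and ignores meaning, the sentences or fragments it favors are often semantically odd, which is one reason such lists can be hard to read with a straight face.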
And things broke down further when we got to the prose passages. I wanted to get recordings of some extended passages of coherent prose, for use in modeling her prosodic patterns in material other than sentence lists, and so I'd included a collection of newswire stories. But when we got to that point, Miss Audix objected.
In the first place, she explained, newswire stories are not written to be read out loud. As a professional newsreader, she would need to rewrite the stories and mark them up in various ways before reading them. I'd allowed one 40-minute session for her to read the stack of stories that I'd printed out for her, but it would take much longer than that for her to rewrite them so as to be suitable for reading on the air.
And in the second place, she insisted, the clash of personas was just too much. Happy secretaries are not newsreaders, nor vice versa. OK, I said, just read them as you would normally.
But that was not a clear enough instruction, because she had worked at several different kinds of radio stations. And she gave a fascinating demonstration of the acting method behind her different ways of reading the same story on a public radio station, on an all-news AM station, or on a top-40 music station.
On an NPR outlet, she explained, her presentation would embody the idea that "This is really complicated stuff, but I'm intelligent, and you're intelligent, so I'm going to lay the ideas out in a way that intelligently reflects their structure, and since you're paying careful and intelligent attention, you'll understand." And her sample exhibited a correspondingly elaborate modulation of amplitude, pitch, and time.
On an all-news AM station, she explained, the idea is "This is really important and you're really busy so just listen for a minute and you'll get all the essential stuff you need to know". And in her sample, she talked fast and loud and urgently, with great but generally uniform emphasis.
And on a music station her message was "You don't want to hear this, and I don't want to read this either, but the FCC makes us do it, so just ignore me for a minute and we'll get back to the tunes." The corresponding delivery was rapid, soothing, unemphatic and easily backgrounded.
I think that we resolved the dilemma with something like "just imagine that you're reading the newspaper out loud to your grandmother whose eyesight is failing", but I'm not sure. Anyhow, this is an aspect of speech synthesis (and speech science) that still needs some work.