In their own words


Speech researchers at Google have applied speech-to-text to YouTube's Politicians channels, indexed the results, and wrapped the whole thing in an Elections Video Search "gadget" that you can add to your iGoogle page or embed elsewhere. The announcement on the Official Google Blog is here.

The speech recognition part seems to work quite well, especially for the sorts of words that you're likely to want to search for, as BBN's Podzinger has done in similar applications for several years. I'll bet that the Google gadget's language model is well adapted to current U.S. political discourse, and perhaps the acoustic models are tuned up for the purpose as well, though I don't know. In any case, the half-dozen queries that I tried all worked well, in the sense that the hits were all genuine ones.

The biggest problem with this tool, at the moment, seems to be in the relatively limited coverage of the YouTube material that it indexes. You won't, for example, find John McCain's discussion of the Iraq-Afghanistan border — I presume that's because the clip is not in the relevant section of YouTube, or maybe it's there but hasn't been transcribed and indexed yet.

I do have two significant criticisms of the interface. First, as far as I can tell, there's no way to get a link to the YouTube videos that are indexed. You can watch and listen to each of the videos that it finds for you, and you can read the video's headline, but you can't find out in any direct way which video clip you're watching. Second, the order of the hits is the result of Google's usual mysterious internal calculations. In this particular case, it seems to me, the lack of any way to re-order the hits by date will be especially problematic for many users.

You shouldn't be surprised to learn that the automatic transcription is by no means perfect. The gadget shows us only brief snippets of the transcript (put the mouse pointer over one of the little yellow markers on the video progress bar to see one), but this is enough to find some of the usual amusing ASR errors. For example, the top hit this morning for a search on "Iraq" is John McCain's July 15 Town Hall Meeting (note that I had to search YouTube for the title "Obama Wrong on Iraq Wrong for Afghanistan" in order to find a link to it), and the snippet for the second instance of "Iraq" in this video is shown below:

What McCain actually said was:

And I note that he's speaking today
about his plans for Iraq and Afghanistan
before he's even left;
before he's talked to General Petraeus,
before he's seen the progress in Iraq,
and before he has set foot in Afghanistan for the first time.

Comparing the two:

Gadget:  betray us before you seen that progress in Iraq and before he has set foot in
Actual:  Petraeus before he's seen the progress in Iraq and before he has set foot in

This is an ironic — if obviously unintended — echo of the shameful "General Betray Us" ad.
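
For anyone who wants to put a number on a comparison like the one above, here is a minimal sketch of the standard word-level edit distance used in scoring ASR output. It is my own illustration, not anything from the gadget: the function name and the choice to lowercase and split on whitespace are assumptions, and the two strings are just the fragments quoted above.

```python
def word_edit_distance(reference, hypothesis):
    """Return (word errors, reference length) for two transcript strings.

    Errors are the minimum number of word substitutions, insertions and
    deletions needed to turn the reference into the hypothesis.
    Tokenization here is an assumption: lowercase, split on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # dist[i][j] = edits between the first i reference words
    # and the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)], len(ref)


# The two fragments quoted in the post.
REFERENCE = ("Petraeus before he's seen the progress in Iraq "
             "and before he has set foot in")
HYPOTHESIS = ("betray us before you seen that progress in Iraq "
              "and before he has set foot in")

errors, n_ref = word_edit_distance(REFERENCE, HYPOTHESIS)
print(f"{errors} word errors against {n_ref} reference words")
```

On this fragment it counts four word errors against fifteen reference words ("betray us" for "Petraeus", "you" for "he's", "that" for "the"). A single snippet chosen because it contains an error says nothing about the system's overall error rate, of course.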

Still, the speech recognition is apparently good enough for high-quality document retrieval, and I'm sure that this is just the leading edge of interesting things to come.

1 Comment

  1. Feisal Schlee said,

    July 23, 2008 @ 11:25 am

    I couldn't find a way to get a link directly, but the URL of the thumbnail picture for the video includes the alphanumeric video id. Copy/pasting that may be slightly faster than searching for the title.
