"One big Donald Trump AIDS"

As I've observed several times over the years, automatic speech recognition is getting better and better, to the point where some experts can plausibly advance claims of "achieving human parity". It's not hard to create material where humans still win, but in a lot of ordinary-life recordings, the machines do an excellent job.

Just like human listeners, computer ASR algorithms combine "bottom-up" information about the audio with "top-down" information about the context — both the local word-sequence context and various layers of broader context. In general, the machines are more dependent than humans are on the top-down information, in the sense that their performance on (even carefully-pronounced) jabberwocky or word salad is generally rather poor.
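To make the bottom-up/top-down combination concrete, here is a minimal sketch of how a recognizer might choose between two candidate transcriptions by adding an acoustic log-score to a weighted language-model log-score. The scores, the `lm_weight` parameter, and the function names are all invented for illustration; this is not a description of any particular system.

```python
# Hypothetical log-scores for two candidate transcriptions of the same audio.
# "acoustic" stands for log P(audio | words), "lm" for log P(words).
# The two candidates are near-homophones, so the acoustic scores are close.
# Every number here is invented for illustration.
hypotheses = {
    "planes could not take off": {"acoustic": -12.5, "lm": -9.0},
    "plains could not take off": {"acoustic": -12.0, "lm": -14.0},
}


def combined_score(scores, lm_weight=1.0):
    """Bottom-up evidence plus weighted top-down evidence, in log space."""
    return scores["acoustic"] + lm_weight * scores["lm"]


def best_hypothesis(hyps, lm_weight=1.0):
    """Return the candidate string with the highest combined score."""
    return max(hyps, key=lambda h: combined_score(hyps[h], lm_weight))


print(best_hypothesis(hypotheses, lm_weight=1.0))  # language model included
print(best_hypothesis(hypotheses, lm_weight=0.0))  # acoustics only
```

With these made-up numbers, the acoustics-only choice is "plains", and adding the language-model term flips it to "planes".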

But recently I've been noting some cases where an ASR system unexpectedly fails to take account of what seem like some obvious local word-sequence likelihoods. To check my impression that such events are fairly common, I picked a random video from YouTube's welcome page (Bill Maher's 6/23/2017 monologue) and fetched the "auto-generated" closed captions.


Here's an example that combines impressive overall performance with one weird mistake:

5:07 Mitch McConnell says he wants a vote
5:10 before the 4th of July when Trump voters
5:13 traditionally blow their hands off
5:19 oh the fourth of July hey summers here
5:24 boy it was real Beach weather in Phoenix
5:26 the other day did you see that it was
5:28 122 122 plains could not take off hey
5:34 climate deniers
5:36 if melting IceCaps and rising oceans and
5:40 pandemics aren't enough to scare you not
5:42 being able to leave Phoenix that should
5:50 work

I'll give the machine a pass on "summers" instead of "summer's", and we can ignore the issue of "oh" vs. "ah", and forgive the hallucinated "work" at the end — but "plains could not take off"? In Psalm 114:4 the mountains skipped like rams, but not even then did the plains take off.

A bit later:

6:32 but speaking of solar Donald Trump broke
6:36 some news at the rally that the wall you
6:39 know the wall between us and Mexico it's
6:41 going to have solar panels on he said it
6:43 was his idea solar battles okay so the
6:47 wall which is never going to be built
6:49 which Mexico is never going to be paying
6:52 for which now has imaginary so propels
6:56 on because if it's one big Donald Trump
6:59 AIDS it's fake news

So the system got "solar panels" right the first time, but then heard "solar battles" and "so propels". In fairness, Maher kind of garbles the last one into something like "solar pels".

But still, I don't think anyone in the audience heard "so propels".

And then at the end, "if it's one thing Donald Trump hates it's fake news" gets turned into "if it's one big Donald Trump AIDS it's fake news".

In that case, I don't hear any acoustic phonetic excuses. And surely "one thing Donald Trump hates" is a priori a more probable word string than "one big Donald Trump AIDS"…
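As a rough illustration of what "a priori more probable" means for a word string, here is a toy bigram language model with add-one smoothing. The four-sentence training text is invented, the words are lowercased, and nothing here reflects the model YouTube actually uses; it only shows how local word-sequence counts can separate the two candidates.

```python
import math
from collections import Counter

# Toy training text, invented for illustration only.
corpus = [
    "if there is one thing donald trump hates it is fake news",
    "one thing donald trump hates is losing",
    "that is one thing he hates",
    "it was one big mistake",
]


def train_bigrams(sentences):
    """Count context words and bigrams, with sentence boundary markers."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])  # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def log_prob(sentence, contexts, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a whole word string."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


contexts, bigrams, vocab = train_bigrams(corpus)
vocab_size = len(vocab) + 1  # one extra slot for unseen words such as "aids"

heard = log_prob("one thing donald trump hates", contexts, bigrams, vocab_size)
captioned = log_prob("one big donald trump aids", contexts, bigrams, vocab_size)
print(heard, captioned)
```

On this toy data the string Maher actually said gets the higher log-probability, because "one thing", "thing donald", and "trump hates" have all been seen, while "big donald" and "trump aids" have not.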

I don't know which generation of ASR Google is using to generate YouTube captions. But it's possible that this sort of thing is an example of the sometimes-peculiar behavior of RNN language models.



6 Comments

  1. Michael Watts said,

    June 25, 2017 @ 2:36 pm

    As I hear it, the "solar battles" is correct if you ignore the semantic context. This is corrupt, though, because I was reading along with the recording.

    [(myl) The fact that the system ignored the semantic context (and also the local bigram probability) is exactly my point.]

    I'm also pretty sure that Maher produced "if it's one thing Donald Trump hates" rather than "if there's one thing Donald Trump hates".

    [(myl) Sorry, my use of "there's" was a case of top-down substitution of a priori sequences — I really heard "it's" but somehow my fingers typed what my parietal cortex expected…]

  2. Rubrick said,

    June 25, 2017 @ 4:36 pm

    Has there been any progress (or even attempted progress, I have no idea) on systems which don't just attempt to accurately capture the stream of words, but to insert plausible punctuation? It seems a rather different, and quite interesting, problem.

    [(myl) There's a long tradition of work on this topic — see e.g. Ji-Hwan Kim and Philip C. Woodland. "The use of prosody in a combined system for punctuation generation and speech recognition", 2001, its references, and the reports that cite it; or Yang Liu et al., "Structural metadata research in the EARS program", 2005, and its citers. It's not a solved problem.]

  3. ardj said,

    June 25, 2017 @ 6:57 pm

    For those less than entirely familiar with the US of A, do Trump voters traditionally remove the dust, hay seeds, corn husks, &c., from their hands by ventilation or by the controlled application of a high-speed projectile on the 4th of July ?

    [(myl) Fireworks are a 4th of July tradition, and one reason that they've become illegal in some states is that people often harm themselves while setting them off. Maher is implying, probably falsely, that Trump voters are especially interested in explosives play and especially careless. Also, people from New York traditionally go to places like South Carolina for their celebratory explosives.]

  4. Ross Presser said,

    June 25, 2017 @ 9:59 pm

    And surely "one thing Donald Trump hates" is a priori a more probable word string than "one big Donald Trump AIDS"…

    Well, from now on I am going to be using that phrase as a label for anything bad, so the a priori will no longer be true, at least around me.

    "Ugh, this cake is terrible. It's just one big Donald Trump AIDS."

  5. Sniffnoy said,

    June 26, 2017 @ 3:37 pm

    I definitely know people who will use "cancer" or "AIDS" to describe anything bad. Usually it's an adjective but it can also be a noun sometimes. Thus a bad situation involving Donald Trump could be described as "one big Donald Trump AIDS". As such, the phrase didn't seem immediately off to me! It's only in context that it doesn't make sense.

  6. John Swindle said,

    June 27, 2017 @ 11:55 pm

    4th of July, Independence Day, is the U.S. national day, a day for great patriotism, and Trump supporters are believed to value patriotism. Plus what Professor Liberman said.