"The people that stayed back the facts"


This is a reality check on the current state of automatic speech recognition (ASR) algorithms. I took the 186-word passage by Scottie Nell Hughes discussed in yesterday's post, and submitted it to two different Big-Company ASR interfaces, with amusing results. I'll be interested to see whether other systems can do better.

This is the audio, with my transcription:

SNH: well I think it's also an idea of- of- of an opinion and- and that's- on one hand I hear half the media saying
that these are lies
but on the other half there're many people who go no it's true
and so one thing that has been interesting this entire campaign season to watch
is that people that say facts are facts
they're not really facts
everybody has a way- it's kind of like looking at ratings or looking at a glass of half full water
everybody has a way of interpreting them
to be the truth
or not truth there's- there's no such thing unfortunately more of facts
and so mister Trump's tweet amongst a certain crowd a large
uh a large m- a large part of the population
are truth
when he says that millions of people illegally voted
he has some ((fa- and see-)) in his- amongst him and his supporters
and people believe
they have facts to back that up
those that do not like mister Trump
they say that those are lies and there's no facts to back it up

And here are the two ASR outputs, matched phrase-by-phrase with my transcription. In each group, my transcription comes first, followed by the output of each of the two systems:

well I think it's also an idea of- of- of an opinion and- and that's- on one hand I hear half the media saying
Well I think it's also an idea of the opinion in that I want to enhance the media saying
well I think it's also an idea that I would have pinion and then. On one hand I hear has to be the same


that these are lies
these are lies
it's either live


but on the other half there're many people who go no it's true
but on the other hand there me because now it's true
or on the other half that makes you know it's true


and so one thing that has been interesting this entire campaign season to watch
enter one thing there's been interesting this entire campaign didn't why
and so one thing that has been interesting that you can watch


is that people that say facts are facts
is the people that stayed back the facts.
that people that say facts are facts


they're not really facts
They're not really fat.
they're not really fact


everybody has a way- it's kind of like looking at ratings or looking at a glass of half full water
Everybody has a way it's kind of like looking at ratings are looking at a glass of hot water
everybody has a way it's kind of like we can trade it was he had a glass of hot water


everybody has a way of interpreting them
everybody has a way of interpreting them
everybody has a way of interpreting them


to be the truth
to be the truth
to be the truth


or not truth there's- there's no such thing unfortunately more of facts
or not true there there's no such thing important more abstract.
or not true there's no such thing unfortunately more upset


and so mister Trump's tweet amongst a certain crowd a large
And so Mister transit we'd amongst a certain crowd a more
and so mr. Trump's tweet and most a certain crowd a large


uh a large m- a large part of the population
up our borders and a large part of the population
large large part of the population


are truth
or true
are truth


when he says that millions of people illegally voted
when he said that millions of people legally voted
when he says that millions of people illegally voted 


he has some ((fa- and see-)) in his- amongst him and his supporters
he has embassy in his in my two minutes the poor
he has some badass again and his intimate images supporters


and people believe
and the people believe
and people believe 


they have facts to back that up
they have back to back them up.
they have facts to back it up


those that do not like mister Trump
Those who do not like Mister dropped
go to do not like this truck


they say that those are lies and there's no facts to back it up
they say that otherwise there's no back to back it up.
when they say that those are no facts to back it up
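For anyone who wants to put a number on comparisons like these, the usual measure is word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the ASR output into the reference transcription, divided by the number of words in the reference. What follows is a minimal sketch, not the procedure used for this post. The function names (`normalize`, `edit_distance`, `wer`) and the normalization choices (lower-casing, dropping punctuation and the hyphens that mark cut-off words) are my own assumptions, and different normalization choices will give somewhat different scores.

```python
import re


def normalize(text):
    """Lower-case and keep only letters, apostrophes and spaces.

    Assumption: punctuation and the hyphens marking cut-off words
    ("of-", "fa-") are dropped, so "fa-" is scored as the word "fa".
    """
    return re.sub(r"[^a-z' ]+", " ", text.lower()).split()


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of an extra word
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcription is empty")
    return edit_distance(ref, hyp) / len(ref)


# One phrase from the comparison above, scored against both outputs.
reference = "they're not really facts"
print(wer(reference, "They're not really fat."))  # 0.25
print(wer(reference, "they're not really fact"))  # 0.25
```

Both systems get one word in four wrong on that phrase, hence 0.25. Note that WER can exceed 1.0 when the output contains many inserted words, and that it treats "fact" for "facts" as exactly as wrong as "fat" for "facts".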



6 Comments

  1. Cervantes said,

    December 4, 2016 @ 11:16 am

    Good grief.

    While I do appreciate the sacrifice you're making here for Science …

  2. Dan Lufkin said,

    December 4, 2016 @ 12:33 pm

    "uh a large m- a large part of the population"

    Whoops! She almost said "majority" there, I think.

    A point to make here is that ASR can be very useful under well-controlled conditions. I used Dragon in a quiet office environment with excellent audio hardware that consistently gave me 99+% recognition accuracy, measured against both actual work product and Harvard sentences (q.G.). Under these conditions, translating with Dragon paid for itself in about 30 minutes.

    I tried several times to interest fellow translators in working with Dragon but demonstrating the process with a laptop at conferences with people talking in the background never convinced anyone that it was a good idea. In that environment I was lucky to get 90% recognition.

    It shouldn't surprise anyone to hear that hip replacement surgery is unsuccessful when done with a chainsaw in the back yard.

  3. jaap said,

    December 5, 2016 @ 9:32 am

    I occasionally make a video and upload it to YouTube. It then automatically tries to generate a set of subtitles (or closed captions if you prefer). I think that over the last year or so, the quality has improved remarkably. Apart from the occasional word that is not in the dictionary, the most problematic parts are always the small hesitations, stumbles, repeats and corrections that one makes. I suspect this is one of the next hurdles to overcome, and in the transcript above you can see that they are working on it – some of the uh's and repeats are not transcribed.

  4. Bean said,

    December 5, 2016 @ 12:06 pm

    I was dozing in the dentist's chair watching Canada AM (or some such) a few weeks ago, and laughed out loud when the host apparently said "Oh my Gord!" (according to the ASR). It may become my exclamation of choice going forward. Far better than the original.

  5. bobbie said,

    December 5, 2016 @ 1:41 pm

    My favorite ASR to date: The Prophet Ezekiel = The profit is equal

  6. Andrew Usher said,

    December 6, 2016 @ 7:34 am

    How can I do that? I've spent some time trying to find a way to speech-recognise arbitrary audio files (as I would make in Praat), but I come up with nothing. It has to work offline in Windows XP, and not require writing a new program. I would pay a small amount for this if need be.

    k_over_hbarc at yahoo dot com

    [(myl) You could buy one of the Dragon products. But Windows XP? Really?]
