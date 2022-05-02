

Trevor Noah's speech at the White House Correspondents' Dinner has gotten a lot of well-deserved praise. But what impressed me most about it was the quality of the "auto-generated" transcript associated with the YouTube version.

Assuming that "auto-generated" means "the output of automatic speech-to-text", the results are overall excellent — with a few odd glitches. For example, the transcript consistently renders "Covid" as "covert". The first one, at around 1:40 —

and uh covert risk aside can i just say

how happy i am that this event is

happening again for the first time in

three years

Then again at about 9:23 —

but

despite some hiccups president biden has

led the country through some really dark

times since he took office the covert

pandemic

the war in ukraine the launch of cnn

plus

(The odd line divisions are the YouTube transcript's choices, not mine…)

And for the third time, around 19:01 —

how about that fox primetime lineup huh

tucker carlson sean hannity laura ingram

their coverage of covert was really

impactful

The system's failure to recognize Covid is in striking contrast to its near-perfect rendition of the Fox lineup — it should be Laura Ingraham, not "Laura Ingram", but that's the kind of mistake I make all the time.

Still, some other celebrities' names get rougher treatment. For example, at about 2:32 —

yeah i might roast you gently you know

like a pair of testicles on a tucker

carson special but

i'm not i'm not doing this just for the

attention all right i'm a comedian not

kirsten cinema all right

and by the way give it up for kirsten

cinema whoever thought we'd see the day

in american politics when a senator

could be openly bisexual

but closeted republican huh

that's progress that's progress

For readers from Mars or the U.K., "Kirsten Cinema" should be Kyrsten Sinema.

Such readers may also have trouble with the subtitles' mis-recognitions of Madison Cawthorn and Ron DeSantis. At around 4:49 —

and i'll tell you somebody coming from

africa i mean

i've just got to say this is so exciting

you know to be at this swanky party full

of washington's most powerful people

you know it's not as exciting as madison

carthand made it sound but still very

sexy

very very sexy there's many big names

here tonight yeah one of my favorites

rhonda sanchez is here yeah yeah

oh man i'm actually surprised that he

found the time

DeSantis is transformed again at around 5:53, first as "rhonda sanchez" once more, then as "de sanctus", and finally as "ronda sanchez":

you see what i like about rhonda sanchez

is like if trump was the original

terminator de sanctus is like the t1

thousand you know you're smarter than

him you're slicker than him you can walk

down ramps

yeah because you see no trump said he

won the election but everyone was just

able to look at the numbers and see that

he was wrong that's why ronda sanchez is

one step ahead first you ban the math

textbooks then nobody knows how to count

the votes boom my man

There are other omissions, substitutions, and insertions here and there. And the odd line divisions, combined with the lack of punctuation and capitalization, make the transcripts unnecessarily hard to read. But still, the quality is impressive.

(And some of the errors, like "Cinema" for "Sinema", are actually appropriate to the context…)
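For readers who want to put numbers on the "omissions, substitutions, and insertions" mentioned above, here is a minimal sketch of the usual word-level alignment count. It is an illustration only, not how YouTube evaluates its captions. The function names `edit_counts` and `word_error_rate` are made up for this example, and the reference strings are my own corrected readings of the lines quoted above, not an official transcript.

```python
def edit_counts(reference, hypothesis):
    """Count (substitutions, omissions, insertions) in a minimal word-level
    alignment of a reference transcript against a recognizer's output."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # table[i][j] = (total edits, subs, omissions, insertions) for ref[:i] vs hyp[:j]
    table = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, i, 0)      # every reference word omitted
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, 0, 0, j)      # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            total, subs, dels, ins = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (total, subs, dels, ins)          # words match
            else:
                diagonal = (total + 1, subs + 1, dels, ins)  # substitution
            total, subs, dels, ins = table[i - 1][j]
            omission = (total + 1, subs, dels + 1, ins)
            total, subs, dels, ins = table[i][j - 1]
            insertion = (total + 1, subs, dels, ins + 1)
            # Keep whichever path has the fewest total edits.
            table[i][j] = min(diagonal, omission, insertion, key=lambda c: c[0])
    _, subs, dels, ins = table[-1][-1]
    return subs, dels, ins


def word_error_rate(reference, hypothesis):
    """Total edits divided by the number of reference words."""
    subs, dels, ins = edit_counts(reference, hypothesis)
    return (subs + dels + ins) / len(reference.split())


# Assumed reference (my correction) vs. the caption text quoted above.
print(edit_counts("their coverage of covid was really impactful",
                  "their coverage of covert was really impactful"))
print(edit_counts("desantis is like the t-1000",
                  "de sanctus is like the t1 thousand"))
```

On the first pair this counts a single substitution ("covert" for "covid") in seven words. The second pair shows how one misrecognized name can cost more than one edit: "de sanctus" is two words standing in for one, so the alignment needs a substitution plus an insertion.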
