The covert pandemic


Trevor Noah's speech at the White House Correspondents' Dinner has gotten a lot of well-deserved praise. But what impressed me most about it was the quality of the "auto-generated" transcript associated with the YouTube version.

Assuming that "auto-generated" means "the output of automatic speech-to-text", the results are overall excellent — with a few odd glitches. For example, the transcript consistently renders "Covid" as "covert". The first one, at around 1:40 —

and uh covert risk aside can i just say
how happy i am that this event is
happening again for the first time in
three years

Then again at about 9:23 —

despite some hiccups president biden has
led the country through some really dark
times since he took office the covert
the war in ukraine the launch of cnn

(The odd line divisions are the YouTube transcript's choices, not mine…)

And for the third time, around 19:01 —

how about that fox primetime lineup huh
tucker carlson sean hannity laura ingram
their coverage of covert was really

The system's failure to recognize Covid is in striking contrast to its near-perfect rendition of the Fox lineup — it should be Laura Ingraham, not "Laura Ingram", but that's the kind of mistake I make all the time.

Still, some other celebrities' names get rougher treatment. For example, at about 2:32 —

yeah i might roast you gently you know
like a pair of testicles on a tucker
carson special but
i'm not i'm not doing this just for the
attention all right i'm a comedian not
kirsten cinema all right
and by the way give it up for kirsten
cinema whoever thought we'd see the day
in american politics when a senator
could be openly bisexual
but closeted republican huh
that's progress that's progress

For readers from Mars or the U.K., "Kirsten Cinema" should be Kyrsten Sinema.

Such readers may also have trouble with the subtitles' mis-recognitions of Madison Cawthorn and Ron DeSantis. At around 4:49 —

and i'll tell you somebody coming from
africa i mean
i've just got to say this is so exciting
you know to be at this swanky party full
of washington's most powerful people
you know it's not as exciting as madison
carthand made it sound but still very
very very sexy there's many big names
here tonight yeah one of my favorites
rhonda sanchez is here yeah yeah
oh man i'm actually surprised that he
found the time

And DeSantis gets transformed again, at around 5:53 — first as "rhonda sanchez" again, then as "de sanctus" —

you see what i like about rhonda sanchez
is like if trump was the original
terminator de sanctus is like the t1
thousand you know you're smarter than
him you're slicker than him you can walk
down ramps
yeah because you see no trump said he
won the election but everyone was just
able to look at the numbers and see that
he was wrong that's why ronda sanchez is
one step ahead first you ban the math
textbooks then nobody knows how to count
the votes boom my man

There are other omissions, substitutions, and insertions here and there. And the odd line divisions, combined with the lack of punctuation and capitalization, make the transcripts unnecessarily hard to read. But still, the quality is impressive.

(And some of the errors, like "Cinema" for "Sinema", are actually appropriate to the context…)
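The "omissions, substitutions, and insertions" mentioned above are the three error types usually counted when scoring a speech-to-text transcript against a reference, typically summarized as a word error rate (WER). As a rough sketch of how such counts can be obtained, the Python below does a standard minimum-edit-distance word alignment. The function name `align_counts` is made up for this illustration, and the "reference" strings are simply my own hand-corrected versions of two short snippets, not an official transcript. Nothing here is meant to describe how YouTube itself evaluates its captions.

```python
def align_counts(reference, hypothesis):
    """Count word-level substitutions, deletions and insertions
    in a minimum-edit alignment of hypothesis against reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # cost[i][j] = (total_edits, substitutions, deletions, insertions)
    # for the best alignment of ref[:i] with hyp[:j].
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)   # every reference word omitted
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)   # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            total, subs, dels, ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (total, subs, dels, ins)          # exact match
            else:
                diagonal = (total + 1, subs + 1, dels, ins)  # substitution
            total, subs, dels, ins = cost[i - 1][j]
            deletion = (total + 1, subs, dels + 1, ins)      # word omitted
            total, subs, dels, ins = cost[i][j - 1]
            insertion = (total + 1, subs, dels, ins + 1)     # extra word
            # Tuples compare on total edits first, so min() keeps the
            # cheapest alignment.
            cost[i][j] = min(diagonal, deletion, insertion)

    total, subs, dels, ins = cost[-1][-1]
    return {
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
        "wer": total / len(ref),
    }


# Hand-corrected reference vs. the auto-generated line (illustrative only).
covid = align_counts("their coverage of covid was really",
                     "their coverage of covert was really")
print(covid)

desantis = align_counts("ron desantis is here",
                        "rhonda sanchez is here")
print(desantis)
```

On these two snippets the alignment finds one substitution in six words for the Covid line and two substitutions in four words for the DeSantis line. A score like this treats "covert" for "Covid" and "ingram" for "Ingraham" as equally wrong, which is one reason a raw error rate can understate how good a transcript like this one feels to read.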



  1. Philip Taylor said,

    May 2, 2022 @ 7:32 am

    Focussing solely on the Covid/covert error, in British English "covert" as adjective takes the /ʌ/ vowel — /ˈkʌv ət/. Is the same true for American English, or does American "covert" (adj.) take the /oʊ/ vowel — /ˈkoʊ vɝːt/ — in which case the confusion would be understandable.

  2. Geoff M. said,

    May 2, 2022 @ 7:35 am

    I actually heard it as "De Sanctus", and wondered if I was being influenced by having been to mass earlier in the day!

  3. Jaap Scherphuis said,

    May 2, 2022 @ 7:35 am

    I make YouTube videos, and I've been quite impressed by the auto-generated subtitles for some time, especially how they filter out some of the disfluencies (the short ums and ahs, etc.) and how they can handle fast unbroken speech, even with background noise.

    What annoys me most however is that for some reason it does not do any punctuation or capitalization at all, so I have to go in and edit it manually every time. It would already be a great help if they just capitalized each "i", which isn't much to ask for.

  4. Bloix said,

    May 2, 2022 @ 8:42 am

    I've noted, here and on Language Hat's blog, that there's a tendency among educated African Americans to devoice final d: e.g., "United States" is voiced "Unided States" among White Americans, while among Black Americans it's closer to "Uninet States."

    Trevor Noah, of course, is not an African American, but it does seem that he may have this tendency, so that Covid is sounded covit.

    I wonder if there's a consistent White bias in machine transcription such that common Black pronunciations are mis-transcribed.

  5. Robert Coren said,

    May 2, 2022 @ 9:58 am

    @Philip Taylor: Yes, American speech has /oʊ/ in "covert".

  6. Alexander Browne said,

    May 2, 2022 @ 9:59 am

    Philip Taylor: OED and Wiktionary list both vowels, but I only use /oʊ/, and I think that's all I hear here in the upper midwest.

  7. Nicholas Allott said,

    May 2, 2022 @ 10:31 am

    @Philip Taylor: /'kʌvət/ is a possible pronunciation of the adjective in British English, but /'kəʊvɜ:t/ is more frequent – as the Cambridge pronunciation dictionary claims (by listing /'kəʊvɜ:t/ first).

    One can check by running a search for 'covert' in the UK section on Youglish. In the first ten examples, I found no instances of /'kʌvət/. The only variability was in which syllable was stressed: mostly /'kəʊvɜ:t/, but a few instances of /kəʊ'vɜ:t/ (8 to 2 by my count).

  8. maidhc said,

    May 2, 2022 @ 6:21 pm

    I remember reading a story that the word "covert" was barely used in American English, but came into use describing the activities of the CIA. There was a discussion within the CIA of how it should be pronounced, and the consensus was that using the British pronunciation might give the impression that they were elitist and pro-British snobs. So the decision was made to go with a more American sort of red-blooded anti-Communist pronunciation, which had not been widely used before.

    I'm afraid that I cannot find a way to attribute this story to Winston Churchill at a party.

  9. AntC said,

    May 3, 2022 @ 1:27 am

    [the speech] has gotten a lot of well-deserved praise.

    Hmm. This Brit sense of humour really can't see the U.S. comedian's roast as other than doubling-down on the victim's embarrassment/schadenfreude. I agree Trevor's was deft and light-handed — unlike Chris Rock's crass attacks at the Oscars. I kinda snigger sometimes, but I don't go so far as to laugh. (It seems to be infecting Brit humour: Ricky Gervais/The Office is just irredeemably embarrassing and socially awkward, with painfully obvious situational setups. Like poking fun at the already-afflicted.)

    So I see no loss at the passing of Gilbert Gottfried. Was Norm MacDonald ever funny? Everything I can find on YouTube is just unpleasant.

    Trevor and (especially) Stephen Colbert's ribbing of The Former President is hilarious — but I guess like shooting fish in a barrel.

  10. Philip Taylor said,

    May 3, 2022 @ 1:40 am

    I suspect that, within the context of British English, /'kʌ vət/ vs. /ˈkoʊ vɝːt/ is very much a generational thing. Those of us who still pronounce "conduit" as /'kʌn dɪt/ also pronounce "covert" as /'kʌ vət/, while more recent generations have adopted the spelling-influenced /ˈkɒn dju‿ɪt/ and /ˈkoʊ vɝːt/.

    Incidentally, there is at least one error in the electronic LPD wrt "covert" — for "covert, n." the sound shown is /ˈkʌv ət/ while the sound spoken is /ˈkoʊ vɝːt/, while for "covert, adj." the sound both shown and spoken is /ˈkʌv ət/. I have never heard "covert, n." spoken as /ˈkoʊ vɝːt/ in real life (it is very much a country term in my experience — a place which gives shelter to wild animals or game; esp. a thicket), whilst I have definitely heard "covert, adj." spoken as /ˈkoʊ vɝːt/ ("covert surveillance"). The latter pronunciation may well be American influenced.

  11. Philip Anderson said,

    May 3, 2022 @ 7:19 am

    @Philip Taylor
    You may be right that the British pronunciation is generational, but if so it’s far from a recent change. For me, as for you, the noun is like cover+t, and the relationship is obvious, whereas the adjective has the other vowel (perhaps American influence since the phrase ‘covert operations’ is the usage that springs to mind).
    But I’ve never heard the cover vowel in ‘conduit’, nor seen it in a dictionary.

  12. Alexander Browne said,

    May 3, 2022 @ 9:23 am

    "Covert" as an adjective may have been popularized from American usage, but the OED in their entry for covert, adj. sense 2.a. "figurative. Concealed, hidden, secret; disguised." has several citations that precede John Cabot. (And FWIW, as many or more early citations than their sense 1.a. "literal. Covered, hidden; roofed over; overgrown; sheltered. Now rare.".)

  13. Chas Belov said,

    May 3, 2022 @ 10:15 am

    @Jaap Scherphuis: My understanding of best practice for captioning is to not filter out disfluencies, so that's not actually a good thing if Google is doing that. And yes, the lack of capitalization and punctuation, particularly periods, bugs me as well.

  14. Philip Taylor said,

    May 4, 2022 @ 2:52 am

    Philip A — "I’ve never heard the cover vowel in ‘conduit’, nor seen it in a dictionary" — it's listed as a variant in the LPD:

    ˈkɒn dju‿ɪt ˈkʌn-, -du‿ɪt, →§-dʒu‿ɪt, §‿ət; ˈkɒnd ɪt, ˈkʌnd-, §-ət

    and is the pronunciation I was taught at about the age of nine (i.e., during the same period that I was taught that the final /t/ of "trait" is silent).

  15. Philip Anderson said,

    May 4, 2022 @ 7:00 am

    Googling for “cunduit” shows that as a common spelling in the Elizabethan period, so showing the pronunciation then.
