Putin: "pollutant"? "pooch and"?

The transcriptions on YouTube are generally pretty good these days, but sometimes the results are weird.

A notable recent example is the transcription of Donald Trump's 8/31/2024 Fox interview with Mark Levin, where the system renders "Putin" first as "pollutant" and then as "pooch and".

The relevant audio clip, with my transcription:

You didn't have
Iran saying they're going to blow up Israel.
And you didn't have
Putin saying he's going into Ukraine. That would've never happened.
Putin going into Ukraine would've never happened.

What the (automated?) YouTube transcript has:

In addition to the weird renderings of "Putin", the substitution of "never would've" for "would've never" is odd.

It's not clear to me whether it's Google or Fox that was responsible for the automated transcription/subtitling of this interview…

Or maybe human subtitling was outsourced to some Ukrainians?

Update: It's easy to find subtitled FoxNews clips on YouTube where "Putin" is accurately transcribed (e.g. here and here), which refutes the bizarre idea that this is caused by a Fox transcription system that doesn't have "Putin" in its language model.



6 Comments

  1. Cervantes said,

    September 1, 2024 @ 8:57 am

    Well, of course Putin is not an English word. I find automated transcription generally can't deal well with proper nouns.

  2. Mark Liberman said,

    September 1, 2024 @ 9:11 am

    @Cervantes: "Well, of course Putin is not an English word. I find automated transcription generally can't deal well with proper nouns."

    Not relevant. The transcript of the same interview correctly represents many proper names — "KAMALA", "KAVANAUGH", "ELIZABETH POCAHANTAS WARREN", "BERNIE SANDERS", "VENEZUELA", "PENNSYLVANIA", "HILARY", etc.

    And if "PUTIN" is not an English word, how about "CARACAS"?

    It's possible that the transcription was done by an automated system whose language model didn't have "Putin" in it — but that in itself would be weird, for transcription of a contemporary news-oriented interview program.

  3. Philip Taylor said,

    September 1, 2024 @ 9:34 am

    Interesting (to me, at least) is that to my mind, Trump gets closer to the Russian sound of "Putin" than most British speakers, almost all of whom say /ˈpjuː tɪn/.

  4. Robert Coren said,

    September 1, 2024 @ 9:42 am

    We have recently started watching non-sports TV programs with the captions on, because (1) sometimes the characters speak too fast and indistinctly for our ancient ears, and (2) sometimes the accents (especially Scottish or North-of-England ones) are impenetrable.

    Lately we've been watching the "Agatha Christie's Marple" series on PBS (for which we probably don't really need the captions), and I've been struck by the frequency with which the captioning is slightly off. My favorite example so far is a scene with a bridge game going on in the background, and one of the players' "no bid" was transcribed as "low B".

  5. Rodger C said,

    September 1, 2024 @ 9:50 am

    I once watched a captioned news show in which "Texas" became THE ACCIDENT.

  6. Cervantes said,

    September 1, 2024 @ 10:11 am

    Well, a fortiori the translator didn't have Putin in its dictionary. That might seem odd but it is what it is.
