Putin: "pollutant"? "pooch and"?
The transcriptions on YouTube are generally pretty good these days, but sometimes the results are weird.
A notable recent example is the transcription of Donald Trump's 8/31/2024 Fox interview with Mark Levin, where the system renders "Putin" first as "pollutant" and then as "pooch and".
The relevant audio clip, with my transcription:
You didn't have
Iran saying they're going to blow up Israel.
And you didn't have
Putin saying he's going into Ukraine. That would've never happened.
Putin going into Ukraine would've never happened.
What the (automated?) YouTube transcript has:
In addition to the weird renderings of "Putin", the substitution of "never would've" for "would've never" is odd.
It's not clear to me whether it's Google or Fox that was responsible for the automated transcription/subtitling of this interview…
Or maybe human subtitling was outsourced to some Ukrainians? [Update — maybe this is what happened, because the "auto-generated" version gets Putin's name right — see Update #2 below…]
Update — It's easy to find subtitled FoxNews clips on YouTube where "Putin" is accurately transcribed, e.g. here and here…
…which refutes the bizarre idea that this is caused by a Fox transcription system that doesn't have "Putin" in its language model.
Update #2 — I should note that some YouTube transcriptions are labelled as "English (auto-generated)" and some aren't — so maybe the others are human-generated? And this one is labelled "English – CC1" — but the menu (which I didn't look at earlier) offers the "auto-generated" option:
And the auto-generated version gets "putin" right!
Cervantes said,
September 1, 2024 @ 8:57 am
Well, of course Putin is not an English word. I find automated transcription generally can't deal well with proper nouns.
Mark Liberman said,
September 1, 2024 @ 9:11 am
@Cervantes: "Well, of course Putin is not an English word. I find automated transcription generally can't deal well with proper nouns."
Not relevant. The transcript of the same interview correctly represents many proper names — "KAMALA", "KAVANAUGH", "ELIZABETH POCAHANTAS WARREN", "BERNIE SANDERS", "VENEZUELA", "PENNSYLVANIA", "HILARY", etc.
And if "PUTIN" is not an English word, how about "CARACAS", "HAMAS", "HEZBOLLAH", which are rendered correctly?
It's possible that the transcription was done by an automated system whose language model didn't have "Putin" in it — but that in itself would be weird, for transcription of a contemporary news-oriented interview program.
Philip Taylor said,
September 1, 2024 @ 9:34 am
Interesting (to me, at least) is that to my mind, Trump gets closer to the Russian sound of "Putin" than most British speakers, almost all of whom say /ˈpjuː tɪn /.
Robert Coren said,
September 1, 2024 @ 9:42 am
We have recently started watching non-sports TV programs with the captions on, because (1) sometimes the characters speak too fast and indistinctly for our ancient ears, and (2) sometimes the accents (especially Scottish or North-of-England ones) are impenetrable.
Lately we've been watching the "Agatha Christie's Marple" series on PBS (for which we probably don't really need the captions), and I've been struck by the frequency with which the captioning is slightly off. My favorite example so far is a scene with a bridge game going on in the background, and one of the players' "no bid" was transcribed as "low B".
Rodger C said,
September 1, 2024 @ 9:50 am
I once watched a captioned news show in which "Texas" became THE ACCIDENT.
Cervantes said,
September 1, 2024 @ 10:11 am
Well, a fortiori the translator didn't have Putin in its dictionary. That might seem odd but it is what it is.
David Morris said,
September 2, 2024 @ 2:24 am
Would an auto-subtitler be likely to use 'putain'?
/df said,
September 2, 2024 @ 4:58 am
May I mention here, possibly not for the first time, the extremely precious Sky News UK automatic transcription, ca. 2010, of Putin's von Ribbentrop Sergei Lavrov as "so gay lover"?
J.W. Brewer said,
September 2, 2024 @ 7:36 am
@Philip Taylor: Trump not saying /ˈpjuː tɪn / is consistent with what you'd hear from most if not all American speakers. Trans-Atlantic differences in patterns of yod-dropping account for it more parsimoniously than any individual familiarity with Russian. By contrast, an American who pronounces "Vladimir" with stress on the second syllable rather than the first would likely be demonstrating (perhaps pretentiously) familiarity with the Russian pronunciation.
awelotta said,
September 2, 2024 @ 8:11 am
It's probably captioner error.
"pollution" might have been mapped to PAO*UPBT. The captioner misremembered their own outline for Putin, and got the wrong output. Then they tried something else to try to fix it. "pooch and" was probably mapped to PAO*FPB?
Though, the usual outline for a word like pollution is PHRAOUGS.
Maybe Putin was mapped to PAOUPBT.
idk im not a stenographer. but maybe the other words did not suffer from the same mistake because they're longer and thus have more straightforward outlines without any consonant inversion.
If none of that made any sense https://lapwing.aerick.ca/Chapter-01.html this should be a good start (idk, i havent really read it). Lapwing is the name of a steno theory (ie scheme for deriving outlines from words). Other theories exist, but most will use the same base layout and most of the same chords to represent different sounds.
Andrew Usher said,
September 5, 2024 @ 8:27 pm
That is all irrelevant as this could not possibly be human work. No human transcriber could have failed to recognise that 'Putin' was a name (even if unfamiliar) and could not, therefore, have transcribed it in two _different_ ways, both ungrammatical.
The previous conclusion that it was a program – a different one than YouTube uses – that didn't know 'Putin' is clearly correct.
k_over_hbarc at yahoo.com
awelotta said,
September 27, 2024 @ 11:50 pm
> No human transcriber could have failed to recognise that 'Putin' was a name (even if unfamiliar) and could not, therefore, have transcribed it in two _different_ ways, both ungrammatical.
Computer steno involves memorizing potentially completely arbitrary codes for inputting a fixed vocabulary of words. (The amount of memorization is comparable to graphical input methods for Chinese characters.) Inputting whole words at a time is what enables real-time speed. If a word isn't in the stenographer's dictionary (ie the program converting from the code to English), or they forgot the code, then they probably would transcribe it two different ways, because they would try a different code after seeing that the output was wrong for the first. The human recognizing the word as a name is irrelevant; the code doesn't care about grammaticality, just as a QWERTY keyboard lets me type gibberish.
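To make the lookup idea concrete, here is a rough sketch of how a steno dictionary turns outlines into text. Everything in it is an assumption for illustration: the outlines and the words they map to are loosely based on the guesses in my earlier comment, not entries from any real steno dictionary, and `translate` is a made-up helper, not how any actual steno software is written.

```python
# Hypothetical steno dictionary: outline (chord) -> output text.
# These entries are illustrative guesses, not taken from a real dictionary.
STENO_DICT = {
    "PAOUPBT": "Putin",       # assumed "correct" outline
    "PAO*UPBT": "pollutant",  # assumed near-miss outline (extra asterisk)
    "PAO*FPB": "pooch and",   # assumed second near-miss outline
}


def translate(strokes):
    """Look up each stroke; strokes with no entry are passed through raw."""
    return " ".join(STENO_DICT.get(stroke, stroke) for stroke in strokes)


# The same intended word, entered three different ways.
intended = translate(["PAOUPBT"])
first_try = translate(["PAO*UPBT"])
second_try = translate(["PAO*FPB"])

print(intended)
print(first_try)
print(second_try)
```

The lookup never checks whether the output is a name or whether it fits the sentence. A slightly wrong outline just produces whatever word happens to be stored under it, and a second attempt with another wrong outline produces a different wrong word.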