Auto-translated subtitles from auto-generated subtitles


Mark Metcalf learned something new this Monday morning: YouTube not only provides subtitles, but if the subtitles haven't been created in English, it can generate/translate them on the fly – at least for German. Doesn't seem to be available for Chinese yet.

Select 'CC' in the menu bar of the video window (it sits at the left of the group of controls on the right-hand side) to turn on the auto-generated German subtitles. Then click on the gear icon, select auto-translate, and pick English. You should see English subtitles in near real-time.
The results are mind-boggling:  fast as greased lightning and impressively accurate.

Mark tried it out on this interview with the Latvian mezzo-soprano Elīna Garanča at the Wiener Staatsoper about her role as Kundry in Wagner's »Parsifal«.  Mark noted that the auto-subtitle generator / translator seems to render the character name "Kundry" as "customer".  He's right.

The mixup between Kundry and "customer" is interesting (and understandable).  It's because in most of those cases, when she's not actually saying the name "Kundry" (the high messenger of the Grail in Wagner's "Parsifal"), she's saying the German word "Kundin" (the feminine form of "customer"), which also has the secondary meaning of "character".  (Of course, the auto-translator correctly renders German "Charakter" as English "character" when she uses the German cognate!)

It's fascinating to me that the same sort of informal usage of "customer" as "character" occurs in English:

(informal) A person, especially one engaging in some sort of interaction with others.

a cool customer, a tough customer, an ugly customer
1971, Herman Wouk, chapter 52, in The Winds of War:
Pug could just see Slote's pale face under his fur hat. "I don't agree with you on that. He's a pretty tough customer, Hopkins."
2020 January 2, Philip Haigh, “Ten eventful years and plenty of talking points”, in Rail, page 54:
This switch led to Philip Hammond becoming the Transport Secretary and he quickly proved to be a tricky customer, asking questions about rail spending and reining it back whenever possible.


It was simply overwhelming for me to listen to Madame Garanča speaking German rapidly with nuanced emotion / emphasis / intonation and watch the corresponding English instantaneously fly across the bottom of the screen.


  1. Jarek Weckwerth said,

    March 7, 2024 @ 7:25 am

    This has been there for several years now (probably more than five for English?), but is only available for the small subset of languages for which automatic speech recognition can be used for the captions. BTW I don't think you can generate the ASR captions on a video that doesn't belong to you, but maybe I'm wrong. The translation works irrespective of whether the captions are automatic or manual.

    There are many external websites that make good use of the captions. For example, YouGlish and Filmot (among others) allow you to search through the captions the way you would in a spoken corpus, something that is impossible on YT itself. Check them out. Beyond invaluable.

  2. Laurence Whiteside said,

    March 7, 2024 @ 7:39 am

    While watching a production of Molière's "Fourberies de Scapin" with auto generated French subtitles, "consulter un avocat" was rendered as "insulter un avocat".

  3. Michèle Sharik Pituley said,

    March 7, 2024 @ 10:41 am

    I watched a video last night with auto-generated captions which rendered hypotenuse as hot pot in use. LOL

    I don’t know that that sort of thing will ever be completely accurate, given the number of different accents, but it is impressive nonetheless.

  4. Christian Horn said,

    March 7, 2024 @ 6:05 pm

    I think the translation part is not surprising here, but the on-the-fly voice recognition is.

    Some time ago, automated subtitle creation for Google Meet sessions in Japanese became available, and that's really impressive. In such calls, it's also interesting how a subtitle starts to get written on the screen, and as the speaker says more, so that more content and context become available, the already written subtitle might get removed and something else written in its place. Quite usable in its quality, and it makes the contents easier to follow for beginners in Japanese.

  5. Jarek Weckwerth said,

    March 8, 2024 @ 9:42 am

    @ Christian Horn: The speech recognition on YouTube is not on-the-fly. You ask for it when submitting the video, and it takes some sweet time to generate. That allows it to be really good for the major languages with a lot of training data.

    The machine translation on YT may be a bit more "real time", but it doesn't need to be lightning fast. The source text is already there, after all.

    The real-time subtitles in meeting apps, PowerPoint etc. are more impressive.

    And the most impressive is what is being rolled out at the European Parliament and other European institutions at the moment. That IS in fact real-time speech-to-speech translation, from ASR of the original speaker, via machine translation into the entire set of EU languages, to Text-to-Speech (TTS) at the other end.

  6. cM said,

    March 15, 2024 @ 6:30 am

    Just watched the whole video: She never says "Kundin": It's the auto-transcription that wrongly transcribes her slightly accented "Kundry" as "Kundin" – which then of course leads to "customer" in the auto-transcription's auto-translation.

  7. Chas Belov said,

    April 3, 2024 @ 12:56 am

    While I greatly appreciate the timing that YouTube applies to captions, it's still important for accessibility to download, proofread, and re-upload those automatic captions. I regularly see errors. As for automatic translation, if the original English transcription has errors, I expect them to be magnified in translation.
