Articulate Tory gestures


At our most recent Penn Phonetic Lab meeting, we heard a (virtual) talk by Marc Garellek on the topic "Reconsidering voicing during glottal sounds". The talk was quite interesting, but more relevant for a general audience was what happened when someone turned on Zoom's "Live Transcription" feature:

Overall the accuracy was not terrible, as such things go these days — I'm not able to give an accurate Word Error Rate number, since I don't think the presentation was recorded, but I'd estimate it at about 25%. The fun part was what the errors were.

For example, the phrase "articulate Tory" occurred four times during the 60 seconds or so of transcript I saved, and these were not references to Yorick Wilks.

Rather, Marc's subject naturally involved the word "articulatory", and each time he used that word, Zoom transcribed it as "articulate Tory". And whenever he said "glottalization" it was transcribed as "globalization". So when he said (what I believe was)

For many languages, we really need to only assume two articulatory gestures here, aspiration and glottalization.

… Zoom transcribed:

For many languages, we really need to only assume to articulate Tory gestures here, aspiration, and globalization

I suspect that Yorick is more committed to aspirational gestures than to globalization-related ones, but I'll leave it to him to let us know.
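For anyone curious what a Word Error Rate calculation looks like, here is a minimal sketch in Python applied to just the one sentence pair quoted above. The function names (`tokenize`, `word_edit_distance`, `wer`) are my own, chosen for illustration, and the normalization (lowercasing and dropping punctuation) is an assumption; real evaluations normalize text more carefully. One sentence is also far too small a sample to say anything about the transcript as a whole.

```python
import re

# The sentence as I believe it was spoken, and as Zoom transcribed it.
REFERENCE = ("For many languages, we really need to only assume two "
             "articulatory gestures here, aspiration and glottalization.")
HYPOTHESIS = ("For many languages, we really need to only assume to "
              "articulate Tory gestures here, aspiration, and globalization")


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Substitutions + insertions + deletions, divided by reference length."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


print(wer(REFERENCE, HYPOTHESIS))  # 0.25
```

The four errors counted here are three substitutions ("two" as "to", "articulatory" as "articulate", "glottalization" as "globalization") and one insertion ("Tory"), against 16 reference words. That figure describes this sentence only.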

Among other fun errors, "creaky vowel" was regularly rendered as "creek evil", and "glottal" was sometimes rendered as "rebuttal" (so "Do different glottal sounds differ in their voicing" became "Do different rebuttal sounds differ in their voicing"). The letter "h" came out as "page" or "stage" (so "hook top h" became "hook top page" and "voiced h" became "voice stage"), and so on.

So the subtitles/transcripts are probably helpful for some non-native speakers and perhaps those with weak network connections, but …

 



13 Comments

  1. Bob Ladd said,

    February 26, 2021 @ 5:36 pm

    A few months ago my wife recorded a lecture for her (virtual) language development course with a system that added captions that she was supposed to check before uploading the recording. The phrase the child's grammar came out consistently as the child's grandma.

  2. Garrett Wollman said,

    February 26, 2021 @ 5:58 pm

    To amplify Mark's final paragraph: this is why, for recorded video, MIT's ADA consent decree requires that we use a professional (human) captioner on any videos we release to the public.

  3. mg said,

    February 26, 2021 @ 6:51 pm

    My cell phone's voice recognition was stubborn in its insistence that COVID is covet.

  4. Danni said,

    February 26, 2021 @ 7:06 pm

    !! I turned on the captions, and thought that they were only for me (like Google Hangouts). Did not realize that in Zoom they're available to the whole audience…

  5. Duncan said,

    February 26, 2021 @ 11:38 pm

    I frequently play YouTube "N hours of rain for sleeping/studying/coding" type videos. Sometimes they include a CC button, and early on I wondered what on earth there could be that it considers captionable.

    So I turned it on, and got… "[applause]" … "[applause]".

    Never thought of it that way before, but I guess chaotic/random rain drops and chaotic/random hand-claps /do/ sound much the same.

    … And of course there's the now typical response below such recordings "Sounds like bacon frying. Now I'm hungry!" (Echoes of chaotic-popping-pattern vocal fry?)

  6. Frédéric Grosshans said,

    February 27, 2021 @ 2:57 am

    @Bob Ladd: I find it funny that the computer seems to have the same confusion as Martine in Molière’s “Les femmes savantes” (1672), who heard “Grammaire” as “Grand’mère”.

  7. Terpomo said,

    February 27, 2021 @ 6:01 am

    I will say that on occasion I've watched YouTube videos with the sound off (say, no headphones on hand and didn't want to disturb people nearby) just with the auto-captions and, while not perfect, they were generally good enough to follow what was being said; it wasn't usually hard to figure out by context what the mistaken words must have been. That said, I think the videos in question were mostly on everyday topics without much specialized terminology.

  8. Tom Dawkes said,

    February 27, 2021 @ 7:47 am

    @Bob Ladd
    The misunderstanding occurred in 17th century France, too.
    Molière, 'Les femmes savantes', Act 2, Scene 4. Bélise is one of the ladies of the title, and Martine is her maid.

    BÉLISE
    Ton esprit, je l’avoue, est bien matériel.
    Je, n’est qu’un singulier ; avons, est pluriel.
    Veux-tu toute ta vie offenser la grammaire ?
    MARTINE
    Qui parle d’offenser grand’mère ni grand-père ?

    Grammaire and grand'-mère both had a nasal [ã] in the first syllable, so were effectively homophones.

    [(myl) I don't think it's correct that grammaire has a nasal vowel in the first syllable. See (and listen to) the Wiktionnaire pronunciations of grammaire, and compare the pronunciations of grand-mère. So the two words are confusable, but not homophones.]

  9. Jody Kreiman said,

    February 27, 2021 @ 11:13 am

    Zoom transcribes 'Jody' as 'God'.

  10. Bob Ladd said,

    February 27, 2021 @ 12:17 pm

    @ Tom Dawkes, @ MYL:
    Allowing for all the difficulties in figuring out fine phonetic details of ages past, it does seem that grammaire did have a nasal vowel in Molière's time, but that denasalisation of vowels before intervocalic nasal consonants was going on at about that time to create a difference between grammaire and grand'mère.

  11. amy said,

    February 28, 2021 @ 8:04 pm

    YouTube's auto-captioning also tends to give a similar political/news slant in its inaccuracies. I suspect it's due to the material that such systems are normally trained on, where "articulate Tory" and "globalization" would definitely occur far more than "articulatory" and "glottalization".

  12. sgac said,

    March 1, 2021 @ 12:42 am

    I recently kept a log of the number of ways NVIVO transcription rendered "smoke alarm". In the course of one interview, I amassed "Michael Holmes", "spike collapses", "Michael I'm", "might collapses", "my car alarm", "smile alarm" and "microloans". Not to mention several others before I started keeping a record.

  13. Laichar said,

    March 1, 2021 @ 7:26 pm

    The phrase "articulate Tory" is an oxymoron.
