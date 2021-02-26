

At our most recent Penn Phonetic Lab meeting, we heard a (virtual) talk by Marc Garellek on the topic "Reconsidering voicing during glottal sounds". The talk was quite interesting, but more relevant for a general audience was what happened when someone turned on Zoom's "Live Transcription" feature:

Overall the accuracy was not terrible, as such things go these days — I'm not able to give an accurate Word Error Rate number, since I don't think the presentation was recorded, but I'd estimate it at about 25%. The fun part was what the errors were.

For example, the phrase "articulate Tory" occurred four times during the 60 seconds or so of transcript I saved, and these were not references to Yorick Wilks.

Rather, Marc's subject naturally involved the word articulatory, and each time he used that word, Zoom transcribed it as "articulate Tory". And whenever he said glottalization it was transcribed as "globalization". So when he said (what I believe was)

For many languages, we really need to only assume two articulatory gestures here, aspiration and glottalization.

… Zoom transcribed:

For many languages, we really need to only assume to articulate Tory gestures here, aspiration, and globalization

I suspect that Yorick is more committed to aspirational gestures than to globalization-related ones, but I'll leave it to him to let us know.
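
For readers who want to put a number on a single example like the one above: Word Error Rate is conventionally the word-level edit distance (substitutions, insertions and deletions) divided by the number of words in the reference. What follows is a minimal Python sketch, not a standard scoring tool. The function names `tokenize` and `word_error_rate` are made up for illustration, and the normalization (lowercasing, dropping punctuation) is an assumption that real scoring pipelines may handle differently. The reference string is the reconstructed wording quoted above, so the result is only as reliable as that reconstruction.

```python
import re


def tokenize(text):
    """Lowercase and keep only runs of letters/apostrophes.

    Assumption: punctuation and capitalization should not count as errors.
    """
    return re.findall(r"[a-z']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = ("For many languages, we really need to only assume two "
             "articulatory gestures here, aspiration and glottalization.")
hypothesis = ("For many languages, we really need to only assume to "
              "articulate Tory gestures here, aspiration, and globalization")

print(word_error_rate(reference, hypothesis))  # 0.25
```

On this one sentence the sketch counts four word errors (two -> to, articulatory -> articulate plus an inserted Tory, glottalization -> globalization) against 16 reference words. That is a single sentence, not a measurement of the talk as a whole.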

Among other fun errors, creaky vowel was regularly rendered as "creek evil", and glottal was sometimes rendered as "rebuttal" (so Do different glottal sounds differ in their voicing -> "Do different rebuttal sounds differ in their voicing"). Similarly, h came out as "page" or "stage" (so hook top h -> "hook top page" and voiced h -> "voice stage"), and so on.

So the subtitles/transcripts are probably helpful for some non-native speakers and perhaps those with weak network connections, but …

