Robotic anaerobic Rodak erotic rotisserie


In yesterday's "Lively Blind Men" post, Ben Zimmer was appropriately amused by Zoom's speech-to-text mis-recognition of Lila Gleitman's name. But as everyone now has opportunities to learn, speech-to-text systems continue to make strange (and often amusing) mistakes in transcribing words and phrases that they haven't been trained to recognize. There are plenty of examples in pretty much any automatic transcription, and the 10/26 edition of the "Spectacular Vernacular" podcast, which Ben co-hosts with Nicole Holliday, doesn't disappoint.

(According to the podcast's website, "Slate podcast transcripts are created by Snackable using machine-learning software and have not been reviewed prior to publication.")

At the start of the 10/26 edition, the hosts focus on syllable-final /r/ in Anthony Fauci's old-school NYC accent. And at first, the transcript is nearly perfect, except for the problem of how to represent the different pronunciations that they're talking about, and using "Jay and Jay" instead of "J & J":

Zimmer: Some people say data. Some people say data. Dr Fauci tends to go with data. Something else struck me this time. Listen to this clip where Dr Fauci is being interviewed by Martha Raddatz on the ABC show this week.
Fauci: But the data of boosting the Jay and Jay first dose with the Jay and Jay second dose is based on clinical data. So what’s going to happen is that the FDA is going to look at all those data, look at the comparison and make a determination of what they will authorize.

In the exchange that follows, Nicole's reduced pronunciation of "wasn't" is transcribed as "was":

Zimmer: Wow, did you heard that right? It wasn’t just me.
Holliday: Yeah, it was just you.

Nicole then explains things, using the technical term "rhotic" — and Snackable renders her first three uses successively as "robotic", "anaerobic", and "rodak":

Holliday: Well, we know that New Yorkers historically have dropped their r’s at the ends of syllables. And if you are a person that pronounces all of your hours in this context, then we would say that you are a robotic speaker. But if you drop the R when it comes after a vowel and it’s not followed by another vowel in this context, you’re not anaerobic.

And it’s worth noting that there’s variation in what people do with our type sounds. That’s common both cross linguistically and within varieties of English. So in addition to seeing this non rodak pattern in New York, it’s also common among some Southerners and African-Americans across regions, as well as most UK varieties.

("Rodak" is variously the name of a Slovak footballer and a Japanese supervillain, among other things that Snackable might have encountered in its training. Note also that /r/ (with singular and plural inflections) is variously rendered as "r's", "hours", "R", and "our", with more variants to come later on.)

Snackable continues to go with "rodak" for "rhotic" through Ben's next turn, but then things get weird.  "In a rhotic style" is rendered as "an erotic sale"; "in a non-rhotic style" is rendered as "in a Rodak style"; "I can't, I'm very rhotic" comes out as "I can't remember who wrote it"; and "non-rhoticity" is rendered as "non rotisserie":

Nicole: And now I get to tell you about one of my favorite sociolinguistics studies. It’s a really famous sociolinguistics study by Bill above at Penn, where I am back in the 60s, where he went to three department stores, a high end one in New York, Saks Fifth Avenue and a middle class one Macy’s and a working class one that was called S. Klein. And he’d go up to the store workers and ask for something that he knew was on the fourth floor. And then he would listen for whether they said fourth floor, an erotic sale or fourth floor in a Rodak style. I can’t even do because I super.
S4: Yeah. What is it? Ben fourth floor? Yeah, I can’t remember who wrote it,
Nicole: but he found that workers in the discount store had much higher rates of this non rotisserie this hour, dropping Ben in the middle class and upper class stores.

In addition, Snackable renders "Bill Labov" as "Bill above"; and gets confused about who's talking when ("diarization"), making up a non-existent speaker to cover the mixed region.

As is all too often the case, the automatic transcript gets everything right except for what the transcribed passage is actually about :-)  This is partly due to the systems' linguistic experience — and humans can have similar (= "egg corn") problems — but it's also a matter of poor evaluation of transcriptional uncertainty, and lack of (an equivalent to) common-sense evaluation of the context.
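To make that point concrete, here is a minimal sketch in Python, using one sentence from the transcript above. It compares an ordinary word error rate with the recall of the one term the sentence is about. The function names and the one-word key-term set are my own illustrative assumptions, and nothing here reflects how Snackable or any other system is actually evaluated.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    return edit_distance(ref, hyp) / len(ref)


def key_term_recall(reference, hypothesis, key_terms):
    """Fraction of key-term tokens in the reference that also occur in the hypothesis."""
    hyp_words = set(hypothesis.lower().split())
    relevant = [w for w in reference.lower().split() if w in key_terms]
    if not relevant:
        return 1.0
    return sum(w in hyp_words for w in relevant) / len(relevant)


# What Nicole said vs. what the transcript shows (from the passage quoted above).
reference = "then we would say that you are a rhotic speaker"
hypothesis = "then we would say that you are a robotic speaker"

# Hypothetical key-term list: the technical vocabulary the segment is about.
key_terms = {"rhotic"}

wer = word_error_rate(reference, hypothesis)
recall = key_term_recall(reference, hypothesis, key_terms)
print(f"WER: {wer:.2f}, key-term recall: {recall:.2f}")
```

On this sentence, one substitution in ten words gives a word error rate of 0.10, while the key-term recall is 0.00. By the usual aggregate measure the transcript looks fine, even though the one word that matters is gone.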

So I wonder what Snackable would have done if the hosts had used the terms "r-less" and "r-ful" instead.

[h/t Cynthia McLemore]



  1. Rodger C said,

    October 27, 2021 @ 9:31 am

    R-less pronunciation: Pronunciation that sounds like George Arliss.

  2. Ben Zimmer said,

    October 27, 2021 @ 9:44 am

    Thanks for posting, Mark! I should also mention that the "hyper-rhoticity" that Nicole and I talk about with regards to Dr. Fauci's speech is something I posted about here on Language Log back in 2008 — see "Botswaner and Louisianer." The hyper-rhotic examples I gave in that post from Norman Siegel are even more remarkable than Dr. Fauci's, but unfortunately I didn't have time on Spectacular Vernacular to bring all of that up.

  3. david said,

    October 27, 2021 @ 11:14 am

    Fauci: …the data of boosting the Jay and Jay first dose with the Jay and Jay second dose is based on clinical data. So what’s going to happen is that the FDA is going to look at all those data …

    the first two 'data' were pronounced without an 'r' sound and the third 'data' had an 'r' sound. Is this variation of New York speech consistent?

    joke — the words 'later data' rhyme in Cambridge (MA) and in Brooklyn (NY).

  4. KevinM said,

    October 27, 2021 @ 12:56 pm

    @Ben Z. Going to law school in NYC years ago, I was instructed in the Lore of Evidence, which actually was not inaccurate, now that I think of it.

  5. Michael said,

    October 27, 2021 @ 3:22 pm

    Personally, I was quite impressed that Snackable broke down "gonna" (twice!) quite accurately into "going to."

  6. 번하드 said,

    October 27, 2021 @ 3:44 pm

    r-ful? awful.
