Name-transcription slop
Friday's On The Media, "Deep Fakes, Data Centers, And AI Slop — Are We Cooked?" has some linguistically-interesting discussion, especially the part about the rise of AI-generated trolling — more on that later. But this post is just a quick note on a widespread symptom of current end-to-end speech-to-text technology, where the text end of the process is letter-sequence tokens of obscure origin, yielding some peculiar spelling errors.
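As a rough illustration of what "letter-sequence tokens" can mean, here is a minimal Python sketch. It assumes a small hypothetical vocabulary, which is not YouTube's: that system's actual token inventory isn't public. It segments each name greedily into the longest matching pieces, similar in spirit to WordPiece-style inference. Real systems often use BPE or unigram models instead. Familiar names come out as one or two tokens, while an unfamiliar name like "Blondiau" breaks into short fragments. Each extra fragment is plausibly one more chance for a token to be dropped or substituted.

```python
def greedy_segment(word, vocab):
    """Split `word` into pieces, always taking the longest piece found in `vocab`.

    Scans left to right; if no multi-character piece matches at the current
    position, falls back to a single character.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until something matches.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                pieces.append(piece)
                i = j
                break
    return pieces


# Hypothetical toy vocabulary, chosen only for illustration.
VOCAB = {"Brooke", "Glad", "stone", "Mun", "son", "Bl", "ond", "Clark", "Rebecca"}

for name in ["Brooke", "Gladstone", "Munson", "Blondiau", "Callendar"]:
    print(name, greedy_segment(name, VOCAB))
```

With this toy vocabulary, "Brooke" is a single token, "Gladstone" splits into "Glad" + "stone", and "Callendar", which shares no pieces with the vocabulary, falls apart into nine single letters.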
The show signs off like this:
…which YouTube's "auto-generated" transcript renders as:
Checking the show's website, we see that a couple of these names are correctly spelled: Molly Rosen and Katya Rogers.
A few others are spelled wrong, but in a more-or-less plausible way: Candice Wang becomes "Candace Wong", Eloise Blondiau becomes "Eloise Blondio", and Micah Loewinger becomes "Michael Owinger".
Rebecca Clark-Callendar entirely loses her post-hyphen syllables, to become "Rebecca Clark".
Then Jennifer Munson becomes unpronounceable as "Jennifer Mnson", and to top it all off, Brooke Gladstone becomes "Broo Gladstone"…
In the YouTube post-closure closure, Ira Flatow loses his 'l':
And I continue to be puzzled about YouTube's failure to even try to do phrase division and speaker diarization — but again, that's a topic for another day…


Jarek Weckwerth said,
December 21, 2025 @ 5:06 pm
So true. However, all of those names above are, let us say, a little less frequent. I'm always much more puzzled by how the system also botches names that should absolutely be present in the dictionary… if there were a dictionary. I'm watching a video right now about Cracow, Poland. It's Kov in the captions. Capitalized, so it knows it's a name. Puzzling.
Rick Rubenstein said,
December 21, 2025 @ 5:31 pm
I learned in a recent video about AI (likely from Welch Labs? Not sure) that modern LLMs use somewhere in the ballpark of 200,000 tokens. This seemed surprisingly small to me, given the enormous number of proper nouns that must be present in the training set. I wonder if this apparently baked-in "name blindness" is responsible for (or at least relevant to) phenomena like (what I've read is) LLMs' propensity, when generating sci-fi, to name male characters "Kael".
Haamu said,
December 21, 2025 @ 6:36 pm
Slightly off-topic, but in keeping with the recent preposition theme: “Engineering from”?
ktschwarz said,
December 21, 2025 @ 7:59 pm
Ira Flatow lost his w as well as his l.