AI and slang
As someone who is particularly fond of and sensitive to vernacular (I didn't say "vulgar"), I knew it was only a matter of time before this came up. Below is a stimulating article about the seeming inability of ChatGPT and other LLMs to grasp slang as well as they do common language. Every paragraph, indeed every sentence, is thought-provoking. I encourage readers to turn to the original publication if they want more of what I have excerpted below.
Why AI Doesn’t Get Slang
And why that’s a good thing
By Caleb Madison
The Atlantic (October 28, 2023)
——–
Slang is born in the margins. In its early form, the word itself, slang, referred to a narrow strip of land between larger properties. During England’s transition from the rigid castes of feudalism to the competitive free market of capitalism, across the 14th to 17th centuries, the privatization of open farmland displaced countless people without inherited connection to the landed elite. This shift pushed people into small corridors between the recently bounded properties.
Confined to the literal fringes of society, they needed to get creative to survive. Some became performers and hucksters, craftspeople and con artists, drifters and thieves. They lived in makeshift homes, often roaming in groups along their slim municipal strip. This was the slang: the land on the outskirts of early English ownership and, by association, its counterculture. The slang had its own rules, its own politics, its own dialect. Roving bands needed a way to speak surreptitiously in the presence of law enforcement, a rival group, or a mark. So over time they developed a secret, colorful, and ephemeral cant.
Across languages and throughout time, the term slang has evolved to mean a subversive lexicon, purposefully unintelligible to whoever’s in charge, perpetually shape-shifting against the mainstream. Organically encrypted through shared experience, slang is difficult for anyone outside the given speaking community to reproduce.
…
Language models, in the most basic sense, represent our 26-letter alphabet in strings of numbers. Those digits might efficiently condense large amounts of information. But that efficiency comes at the price of subtlety, richness, and detail—the ability to reflect the complexities of human experience, and to resist the prescriptions of formal society. Artificial intelligence, in contrast, is disconnected from the kind of social context that makes slang legible. And the sterile nature of code is exactly what slang—a language that lives in the thin threshold between integers—was designed to elude.
Even ChatGPT agrees. “Can we talk in slang?” I prompted it recently.
“Sure thing! We can chat in slang if that’s what you’re into. Just let me know what kind of slang you want to use.”
I responded that I wanted to use “modern slang” and confessed my suspicion that LLMs might have difficulty dealing with vernacular.
Thus spake the algorithm: “Slang can be hella tricky for LLMs like me, but I'm here to vibe and learn with you … We can stay low-key or go all out—it’s your call! ” The words and their meanings were all technically correct—but something was definitely off. The usage didn’t ring true to any consistent place or time. The result was an awkward Frankenstein of tone and rhythm that could make the corniest dad cringe.
…
This gets us into the whole matter of LLMs and registers and levels of language. I think it would be too much to ask them to develop personalities, characters, and socio-political attributes.
Selected readings
- "The many meanings and faces of 'vernacular'" (7/26/23)
- "Vulgar village vernacular" (8/21/21)
- "Mixed literary and vernacular grammar" (9/3/16)
- "Annals of literary vs. vernacular, part 2" (9/4/16)
- "Shandong vernacular, then and now" (8/1/21)
- "Arabic and the vernaculars, part 5" (8/20/22)
By now we've had scores of posts on ChatGPT, LLMs, and so forth. Here I will give only a few of the more recent and relevant ones, some of which have links to many earlier posts.
- "ChatGPT writes VHM" (2/28/23)
- "ChatGPT: Theme and Variations" (2/21/23)
- "GLM-130B: An Open Bilingual Pre-Trained Model" (1/25/2023)
- "ChatGPT writes Haiku" (12/21/22)
- "Translation and analysis" (9/13/04)
- "Welcome to China" (3/10/14)
- "Alexa down, ChatGPT up?" (12/8/22)
- "Detecting LLM-created essays" (12/20/22)
- "Artificial Intelligence in Language Education: with a note on GPT-3" (1/4/23)
- "DeepL Translator" (2/16/23)
- "Uh-oh! DeepL in the classroom; it's already here" (2/22/23)
[Thanks to Don Keyser]
Dan Milton said,
October 29, 2023 @ 10:07 am
I’m traveling, so don’t have access to OED, but the origin of “slang” in the common sense from what Wiktionary gives as “(UK, dialect) Any narrow piece of land, a promontory” strikes me as nonsense.
David L said,
October 29, 2023 @ 11:01 am
Anatoly Liberman has an OUP blog post, dated 2016, arguing for the derivation of the modern word slang from the earlier meaning as a narrow piece of land. Liberman's piece strikes me as more assertion than proof, though.
Vance Koven said,
October 29, 2023 @ 11:05 am
Doesn't the AI's response sound altogether too much like Eddie, the onboard computer from "Hitchhiker's Guide to the Galaxy"?
Dan Milton said,
October 29, 2023 @ 12:23 pm
OK. After reading Liberman’s post, I don’t consider the word origin nonsense. Unconvincing, rather.
Philip Taylor said,
October 29, 2023 @ 12:36 pm
Although I no longer regard the OED as infallible as I once did, I still regard it as more authoritative than Wiktionary. So :
Thus the OED does not suggest that Slang, n, 1a may be derived from Slang, n, 2. Caleb may well be correct with his hypothesis, but the OED neither supports nor rejects it as of today.
Graeme Hirst said,
October 29, 2023 @ 12:58 pm
One of the grad students in my department, Zhewei Sun, has been studying AI/NLP models of slang identification and generation for several years. His publications on the topic are listed here: Google Scholar
Stephen Goranson said,
October 29, 2023 @ 1:17 pm
Another possibility is that slang/cant came from slang/cannon.
David Morris said,
October 29, 2023 @ 2:31 pm
One student told me that someone had told him that 'slang' means 's(treet) lang(uage)'. That didn't sound right to me, but without immediate access to a suitable dictionary, I had to say "I don't think so, but I will check".
John C Swindle said,
October 29, 2023 @ 6:20 pm
Why isn't "slang" the past tense of "sling," like ring/rang/rung, sing/sang/sung, or ding/dang/dung? No, wait, maybe not ding/dang/dung.
Seth said,
October 29, 2023 @ 6:33 pm
Sorry, while this statement is technically true – "Language models, in the most basic sense, represent our 26-letter alphabet in strings of numbers" – it is so far from conveying useful and relevant understanding as to put me off from the rest of the article. It conveys to me that the author is way out of their depth. The rest of that paragraph is similar.
I think this is all re-iterating the idea that LLM's don't have a model of the world, which I certainly would agree is a very important insight. But I'd say the author is struggling to have a model of LLM's, and it's not clear to me which of them is worse at the task they're attempting.
It might be interesting to test specifications more specific than "modern slang" (which group?). African-American Vernacular English? Jive? Valley Girl? Hipster?
Jarek Weckwerth said,
October 30, 2023 @ 10:27 am
Again, spot on, Seth!
GH said,
October 30, 2023 @ 3:36 pm
@Dan Milton,
I think you're misreading the Wiktionary article. It gives the origin of "slang" in the language sense as:
The bit you quoted is not the proposed origin of "slang," but the definition of a different sense of the word.
Admittedly the structure of the article is pretty misleading, since it uses "Etymology [N]" as the section header for each sense [N] that has a separate derivation, even when there is no explicit etymology offered for that sense. I believe this is the standard format (or at least one common format) on Wiktionary, even though it's very counterintuitive.
GH said,
October 31, 2023 @ 6:01 am
Ah, I see now that I misread the comment, and that it was actually criticizing a claim in the quoted article, not Wiktionary. Apologies.
rpsms said,
November 1, 2023 @ 12:59 pm
Definitely reminds me of the "fellow kids meme"