AI and slang


As someone who is particularly fond of and sensitive to vernacular (I didn't say "vulgar"), I knew it was only a matter of time before this came up.  Below is a stimulating article about the seeming inability of ChatGPT and other LLMs to grasp slang as well as they do common language.  Every paragraph, indeed every sentence, is thought-provoking.  I encourage readers to turn to the original publication if they want more of what I have excerpted below.

Why AI Doesn’t Get Slang
And why that’s a good thing

By Caleb Madison
The Atlantic (October 28, 2023)

——–

Slang is born in the margins. In its early form, the word itself, slang, referred to a narrow strip of land between larger properties. During England’s transition from the rigid castes of feudalism to the competitive free market of capitalism, across the 14th to 17th centuries, the privatization of open farmland displaced countless people without inherited connection to the landed elite. This shift pushed people into small corridors between the recently bounded properties.

Confined to the literal fringes of society, they needed to get creative to survive. Some became performers and hucksters, craftspeople and con artists, drifters and thieves. They lived in makeshift homes, often roaming in groups along their slim municipal strip. This was the slang: the land on the outskirts of early English ownership and, by association, its counterculture. The slang had its own rules, its own politics, its own dialect. Roving bands needed a way to speak surreptitiously in the presence of law enforcement, a rival group, or a mark. So over time they developed a secret, colorful, and ephemeral cant.

Across languages and throughout time, the term slang has evolved to mean a subversive lexicon, purposefully unintelligible to whoever’s in charge, perpetually shape-shifting against the mainstream. Organically encrypted through shared experience, slang is difficult for anyone outside the given speaking community to reproduce.

Language models, in the most basic sense, represent our 26-letter alphabet in strings of numbers. Those digits might efficiently condense large amounts of information. But that efficiency comes at the price of subtlety, richness, and detail—the ability to reflect the complexities of human experience, and to resist the prescriptions of formal society. Artificial intelligence, in contrast, is disconnected from the kind of social context that makes slang legible. And the sterile nature of code is exactly what slang—a language that lives in the thin threshold between integers—was designed to elude.

Even ChatGPT agrees. “Can we talk in slang?” I prompted it recently.

“Sure thing! We can chat in slang if that’s what you’re into. Just let me know what kind of slang you want to use.”

I responded that I wanted to use “modern slang” and confessed my suspicion that LLMs might have difficulty dealing with vernacular.  

Thus spake the algorithm: “Slang can be hella tricky for LLMs like me, but I’m here to vibe and learn with you … We can stay low-key or go all out—it’s your call!” The words and their meanings were all technically correct—but something was definitely off. The usage didn’t ring true to any consistent place or time. The result was an awkward Frankenstein of tone and rhythm that could make the corniest dad cringe.

This gets us into the whole matter of LLMs and registers and levels of language.  I think it would be too much to ask them to develop personalities, characters, and socio-political attributes.

 

Selected readings

By now we've had scores of posts on ChatGPT, LLMs, and so forth.  Here I will give only a few of the more recent and relevant ones, some of which have links to many earlier posts.

[Thanks to Don Keyser]



14 Comments

  1. Dan Milton said,

    October 29, 2023 @ 10:07 am

    I’m traveling, so don’t have access to OED, but the origin of “slang” in the common sense from what Wiktionary gives as “(UK, dialect) Any narrow piece of land, a promontory” strikes me as nonsense.

  2. David L said,

    October 29, 2023 @ 11:01 am

    Anatoly Liberman has an OUP blog post, dated 2016, arguing for the derivation of the modern word slang from the earlier meaning as a narrow piece of land. Liberman's piece strikes me as more assertion than proof, though.

  3. Vance Koven said,

    October 29, 2023 @ 11:05 am

    Doesn't the AI's response sound altogether too much like Eddie, the onboard computer from "Hitchhiker's Guide to the Galaxy"?

  4. Dan Milton said,

    October 29, 2023 @ 12:23 pm

    OK. After reading Liberman’s post, I don’t consider the word origin nonsense. Unconvincing, rather.

  5. Philip Taylor said,

    October 29, 2023 @ 12:36 pm

    Although I no longer regard the OED as infallible as I once did, I still regard it as more authoritative than Wiktionary. So :

    Slang, n, 1.a.
    1756–
    The special vocabulary used by any set of persons of a low or disreputable character; language of a low and vulgar type. (Now merged in sense 1c.)
    In the first quot. 1756 the reference may be to customs or habits rather than language: cf. the use of slang adj. 2b.
    Etymology: A word of cant origin, the ultimate source of which is not apparent. It is possible that some of the senses may represent independent words. In all senses except 1 only in slang or canting use.

    Slang, n, 2
    1610–
    A long narrow strip of land.
    The precise sense varies a little in different localities.

    Etymology: Of obscure origin. Some dialects have the form sling; further variations are slanget (slanket) and slinget (slinket).

    Thus the OED does not suggest that Slang, n, 1a may be derived from Slang, n, 2. Caleb may well be correct with his hypothesis, but the OED neither supports nor rejects it as of today.

  6. Graeme Hirst said,

    October 29, 2023 @ 12:58 pm

    One of the grad students in my department, Zhewei Sun, has been studying AI/NLP models of slang identification and generation for several years. His publications on the topic are listed here: Google Scholar

  7. Stephen Goranson said,

    October 29, 2023 @ 1:17 pm

    Another possibility is that slang/cant came from slang/cannon.

  8. David Morris said,

    October 29, 2023 @ 2:31 pm

    One student told me that someone had told him that 'slang' means 's(treet) lang(uage)'. That didn't sound right to me, but without immediate access to a suitable dictionary, I had to say "I don't think so, but I will check".

  9. John C Swindle said,

    October 29, 2023 @ 6:20 pm

    Why isn't "slang" the past tense of "sling," like ring/rang/rung, sing/sang/sung, or ding/dang/dung? No, wait, maybe not ding/dang/dung.

  10. Seth said,

    October 29, 2023 @ 6:33 pm

    Sorry, while this statement is technically true – "Language models, in the most basic sense, represent our 26-letter alphabet in strings of numbers" – it is so far from conveying useful and relevant understanding as to put me off from the rest of the article. It conveys to me that the author is way out of their depth. The rest of that paragraph is similar.

    I think this is all re-iterating the idea that LLM's don't have a model of the world, which I certainly would agree is a very important insight. But I'd say the author is struggling to have a model of LLM's, and it's not clear to me which of them is worse at the task they're attempting.

    It might be interesting to test specifications more specific than "modern slang" (which group?). African-American Vernacular English? Jive? Valley Girl? Hipster?

  11. Jarek Weckwerth said,

    October 30, 2023 @ 10:27 am

    Again, spot on, Seth!

  12. GH said,

    October 30, 2023 @ 3:36 pm

    @Dan Milton,

    I think you're misreading the Wiktionary article. It gives the origin of "slang" in the language sense as:

    1756, meaning "special vocabulary of tramps or thieves", origin unknown. Possibly derived from a North Germanic source, related to Norwegian Nynorsk slengenamn (“nickname”), slengja kjeften (“to abuse verbally”, literally “to sling one's jaw”), related to Icelandic slengja (“to sling, throw, hurl”), Old Norse slyngva (“to sling”). Not believed to be connected with language or lingo.

    The bit you quoted is not the proposed origin of "slang," but the definition of a different sense of the word.

    Admittedly the structure of the article is pretty misleading, since it uses "Etymology [N]" as the section header for each sense [N] that has a separate derivation, even when there is no explicit etymology offered for that sense. I believe this is the standard format (or at least one common format) on Wiktionary, even though it's very counterintuitive.

  13. GH said,

    October 31, 2023 @ 6:01 am

    Ah, I see now that I misread the comment, and that it was actually criticizing a claim in the quoted article, not Wiktionary. Apologies.

  14. rpsms said,

    November 1, 2023 @ 12:59 pm

    Definitely reminds me of the "fellow kids meme"
