Grammatical intuition of ChatGPT
Grammaticality Representation in ChatGPT as Compared to Linguists and Laypeople, Zhuang Qiu, Xufeng Duan & Zhenguang G. Cai, Humanities and Social Sciences Communications 12, no. 617 (May 6, 2025).
Abstract
Large language models (LLMs) have demonstrated exceptional performance across various linguistic tasks. However, it remains uncertain whether LLMs have developed human-like fine-grained grammatical intuition. This preregistered study (link concealed to ensure anonymity) presents the first large-scale investigation of ChatGPT’s grammatical intuition, building upon a previous study that collected laypeople’s grammatical judgments on 148 linguistic phenomena that linguists judged to be grammatical, ungrammatical, or marginally grammatical (Sprouse et al., 2013). Our primary focus was to compare ChatGPT with both laypeople and linguists in the judgment of these linguistic constructions. In Experiment 1, ChatGPT assigned ratings to sentences based on a given reference sentence. Experiment 2 involved rating sentences on a 7-point scale, and Experiment 3 asked ChatGPT to choose the more grammatical sentence from a pair. Overall, our findings demonstrate convergence rates ranging from 73% to 95% between ChatGPT and linguists, with an overall point-estimate of 89%. Significant correlations were also found between ChatGPT and laypeople across all tasks, though the correlation strength varied by task. We attribute these results to the psychometric nature of the judgment tasks and the differences in language processing styles between humans and LLMs.
Introduction
The technological progression within artificial intelligence, especially when it comes to the realm of natural language processing, has ignited significant discussions about how closely large language models (LLMs), including chatbots like ChatGPT, emulate human linguistic cognition and utilization (Chomsky et al., 2023; Piantadosi, 2024; Binz and Schulz, 2023; Kosinski, 2024; Qiu et al., 2023; Cai et al., 2024). With each technological leap, distinguishing between human linguistic cognition and the capabilities of AI-driven language models becomes even more intricate (Wilcox et al., 2022; Van Schijndel and Linzen, 2018; Futrell et al., 2019). This leads scholars to query if these LLMs genuinely reflect human linguistic nuances or merely reproduce them on a cosmetic level (e.g., Duan, et al., 2024, 2025; Wang et al., 2024). This research delves deeper into the congruencies and disparities between LLMs and humans, focusing primarily on their instinctive understanding of grammar. In three preregistered experiments, ChatGPT was asked to provide grammaticality judgments in different formats for over two thousand sentences with diverse structural configurations. We compared ChatGPT’s judgments with judgments from laypeople and linguists to map out any parallels or deviations.
The ascent of LLMs has been nothing short of remarkable, displaying adeptness in a plethora of linguistic challenges, including discerning ambiguities (Ortega-Martín, 2023), responding to queries (Brown et al., 2020), and transcribing across languages (Jiao et al., 2023). Interestingly, while these models weren’t inherently designed with a hierarchical syntactical structure specifically for human languages, they have shown the capability to discern complex filler-gap dependencies and develop incremental syntactic interpretations (Wilcox et al., 2022; Van Schijndel and Linzen, 2018; Futrell et al., 2019). But the overarching question lingers: Do LLMs genuinely mirror humans in terms of linguistic cognition? Chomsky, Roberts, and Watumull (2023) have been vocal about the inherent discrepancies between how LLMs and humans perceive and communicate. Yet, other scholars like Piantadosi (2024) hold a contrasting view, positioning LLMs as genuine reflections of human linguistic cognition.
Conclusion
In conclusion, this research has undertaken a comprehensive investigation into the alignment of grammatical knowledge between ChatGPT, laypeople, and linguists, shedding light on the capabilities and limitations of AI-driven language models in approximating human linguistic intuitions. The findings indicate significant correlations between ChatGPT and both laypeople and linguists in various grammaticality judgment tasks. This study also reveals nuanced differences in response patterns, influenced significantly by the specific task paradigms employed. This study contributes to the ongoing discourse surrounding the linguistic capabilities of artificial intelligence and the nature of linguistic cognition in humans, calling for further exploration of the evolving landscape of linguistic cognition in humans and artificial intelligence.
The authors maintain a strict division between grammatical judgements of lay people and linguists. Most of the time there is no substantial distinction between the two. In general, I would say that the authors do not make clear the purpose of differentiating between lay and professional decisions on grammaticality and how it helps to evaluate the ability of LLMs like ChatGPT to attain human-like grammatical intuition.
I fear that we may be on a slippery slope in our ability to recognize a dichotomy between man and machine. Will we be able to pull the plug on HAL before it's too late? Or, not to worry, are we willy-nilly evolving into a new, hybrid "life" form?
Selected readings
- "Analyzing ChatGPT’s ability as a grammar fixer", Georgetown Legal English Blog (2/23/23)
- "ChatGPT writes VHM" (2/28/23) — with useful bibliography
[h.t. Ted McClure]
SCF said,
May 19, 2025 @ 10:54 am
Have only skimmed but initial reactions:
1) Odd that they don't cite either of these
Dentella et al. (2023?) Arxiv, Testing AI on language comprehension tasks reveals insensitivity to underlying meaning
https://arxiv.org/pdf/2302.12313
Hu et al. (2024) PNAS, Language models align with human judgments on key grammatical constructions
https://www.pnas.org/doi/10.1073/pnas.2400917121
2) Odd that they don't specify (unless I missed it) which GPT version they use, writing only "we procured responses from the ChatGPT version dated Feb 13" and "Feb 13 version" without a year given.
Anthony said,
May 19, 2025 @ 3:34 pm
It flags "What did you see the boy who stole?" (desired answer: "the book," or similar) as ungrammatical, but provides three ungrammatical paraphrases as acceptable alternatives.
John Swindle said,
May 20, 2025 @ 6:51 am
Is there an academic consensus yet on whether the correct French pronunciation of ChatGPT is "Chat, j'ai pété"?
John Swindle said,
May 20, 2025 @ 6:58 am
Or is it like "Ich bin ein Berliner," only funny to foreigners with little grasp of the language?
Jerry Packard said,
May 20, 2025 @ 11:42 am
After a fairly careful reading of the cited study I came away feeling that it was not terribly well performed, and presented conclusions that seemed to me to be misleading. The results did not cause me to believe that ChatGPT performance on grammaticality judgments was on a par with human performance. I felt the best evidence of this was the set of sentences where humans and ChatGPT disagreed, in which case ChatGPT radically underperformed the human subjects. This is obscured by the comparison of mean performance of human vs machine, which was highly positively correlated. There were several other places where I considered the analysis and its presentation flawed, but my text reader didn’t allow me to copy and paste, so I’ll just leave it at that.
Jarek Weckwerth said,
May 21, 2025 @ 2:27 am
"In general, I would say that the authors do not make clear the purpose of differentiating between lay and professional decisions on grammaticality and how it helps to evaluate the ability of LLMs like ChatGPT to attain human-like grammatical intuition."
As a simple phonetician, I would say that the intention could be to see whether ChatGPT parrots e.g. Strunk-&-White-type prescriptive sentiments (as I would expect it to), or whether it develops actual objective corpus-based judgments (which it could do, too, given that it uses the largest corpus in the world ;)
Either way, I'm quite puzzled by this strand of research. It's like trying to study painting styles on the basis of printed reproductions.
Benjamin E. Orsatti said,
May 21, 2025 @ 8:00 am
Benjamin E. Orsatti said,
May 21, 2025 @ 8:03 am
Eugh. Sorry. I should never dabble in HTML. This is my second strike; if I do it again, I should be banned from LL:
French is great for that, ain't it? Cf. Putin / putain, and the manufacturers who stamped "GERBER" on my toilet.
What should the French call ChatGPT instead? I vote for "going Greek" with "aíluropèrdomologésis".
Chris Button said,
May 21, 2025 @ 6:34 pm
He's actually Vladimir Poutine in French (like the Quebecois dish), which also avoids the otherwise awkward homophony.
Anthony Bruck said,
May 21, 2025 @ 9:09 pm
Isn't Poutine just typical of Frenchified Russian, e.g. Fokine for Фокин?
Chris Button said,
May 22, 2025 @ 5:21 am
I assume so. It certainly makes more sense in terms of trying to approximate the sound.
Philip Taylor said,
May 22, 2025 @ 7:34 am
… and has the great benefit of not implying a y-glide before the /u/ to uninformed Britons, unlike the British "Putin". We would, unfortunately, pronounce the final syllable /tin/ if "Poutine" were to be the English spelling of Путин, which would be equally unfaithful to the Russian pronunciation.
Philip Taylor said,
May 22, 2025 @ 7:55 am
Incidentally, may I ask if LL readers can recommend any equivalents to the Longman Pronunciation Dictionary and the Cambridge English Pronouncing Dictionary for languages other than English ? I would be particularly interested in equivalents (or analogues) for German, French, Vietnamese and Mandarin Chinese.
Anthony Bruck said,
May 22, 2025 @ 8:33 am
Siebs, Deutsche Hochsprache ("Hochsprache, das ist die Sprache des gebildeten Menschen, die im ganzen deutschen Sprachgebiet Geltung hat…"). My copy is from 1961.
Philip Taylor said,
May 22, 2025 @ 8:52 am
Vielen Dank, Anthony,
Michael Watts said,
May 23, 2025 @ 7:12 pm
Funnily enough, if you want to be accurate to the Russian, I think you do want that glide before the /ɪ/.
Philip Taylor said,
May 24, 2025 @ 3:39 am
Agreed, Michael — that is exactly why I wrote "before the /u/" !
Michael Watts said,
May 24, 2025 @ 10:50 pm
I'm not sure I understand what a pronunciation dictionary for Mandarin would do. What problems would it address or solve?
Philip Taylor said,
May 25, 2025 @ 3:06 am
It would enable those who, like myself, were brought up / taught to believe that when one speaks a language other than one's own, one should seek to get as close to the pronunciation of native speakers as possible. My French master, for example, insisted that we should never speak French "with the accent of the Old Kent Road", and I visibly wince when I hear the efforts of some of my peers who were not brought up / taught to believe in such an ethos. But I am intrigued as to why you single out Mandarin — what, in your opinion, would make a pronunciation / pronouncing dictionary less relevant to Mandarin than to French, German or Vietnamese ?
Jarek Weckwerth said,
May 25, 2025 @ 11:06 am
@Philip Taylor: For German, I recommend the Duden Aussprachewörterbuch and the (somewhat quirky) Deutsches Aussprachewörterbuch by Krebs et al. (from De Gruyter). The latter is available as an e-book, making life much easier.
If your question is a hobbyist collector question, then there is also a GDR dictionary from Leipzig that lives somewhere in the boxes in my basement but please don't ask me to dig it up.
I can point to some dictionaries for other languages but not the ones you're interested in.
Jarek Weckwerth said,
May 25, 2025 @ 11:15 am
BTW these days most languages will have some kind of electronic resources available, but those are meant more for Text-to-Speech applications while I would imagine you're looking for prescriptive material?
Michael Watts said,
May 25, 2025 @ 8:11 pm
Well, I can't really speak to those other three languages, but I can tell you why I think the concept doesn't apply in the way that it does to English.
There is a small town in Colorado named Louisville, and the accepted pronunciation of this name is /'luɪsvɪl/.
There is a much more significant city in Kentucky also named Louisville, and the accepted pronunciation of that name is /'luəvəl/. Neither of these matches the pronunciation I would intuit for the name, /'luivɪl/, although that seems to have been formerly an accepted alternative for the city in Kentucky.
What I'm emphasizing here is that there is broad agreement that, despite the fact that these two places share a single name, that name should be pronounced differently depending on which one you're referring to. This is the value that I believe a pronunciation dictionary adds.
But this phenomenon has no real presence in China or Chinese. I know of a potential example – the poet 李白, one of the most famous people in all of Chinese history, is traditionally called lǐ bó in modern Mandarin, despite the fact that the character 白 is almost always pronounced bái. But this battle has already been lost; his Wikipedia page is titled "Li Bai" with only a note that the traditional English spelling is Li Po, pinyin input methods will not suggest 李白 if you type "li bo", and here's a popular song about him: https://www.youtube.com/watch?v=bKTJUUr3sh4#t=86
I get a strong sense there that the rhyme scheme relies on not pronouncing the name as "bo".
Instead, a Chinese name is pronounced to reflect its spelling. You wouldn't reach for a "pronunciation dictionary" if you needed help; you'd just use an ordinary dictionary. If you're speaking Mandarin to Mandarin speakers, presumably you know how to use one.
(There is more of a problem with personal names – it might not be clear, looking at the name 爱乐, how you're supposed to pronounce that. But this gives us an odd reversal of the normal dynamic between English and Chinese – English has pronunciation dictionaries because what appears to be a perfectly normal word might have any arbitrary pronunciation if it happens to be a name. Common offenders there are "Worcestershire" and "Featherstonehaugh". Whereas in Chinese you're making a choice between two well-defined alternatives for a personal name using 乐. You don't ask "how on earth is that name supposed to be pronounced?", you just ask "is that music or happiness?".)
If you're speaking English to English speakers, then believing that you should be using Mandarin pronunciations would seem to be a mistake. We only have one syllable in Rome. We call Deutschland "Germany", a name that is completely unrelated. And we stress Shanghai on the second syllable. Why is only the third of those supposed to be wrong?
Jarek Weckwerth said,
May 26, 2025 @ 3:57 am
@ Michael Watts: There are several features of a pronunciation dictionary that go beyond these, and that's why every major language I know does have one.
For example, there may be multiple accepted pronunciations, and some of those may not be created by rule: e.g. garage has five forms listed by Wells, and the various weak forms of weak form words aren't 100% predictable.
Some regular processes may apply depending on various non-obvious conditions (well, non-obvious at least to a less professional reader); the most difficult one in Wells's dictionary will be vowel clipping.
Then, something that I believe Philip Taylor is most interested in, the author may be an authority who gives their opinion on which forms are standard and which are not standard (or even stigmatized), or (as Wells does) may even have actual data on which forms are preferred by speakers.
Finally, a pronunciation dictionary in an electronic format will allow you to search by transcription, something no normal dictionary offers.
Philip Taylor said,
May 26, 2025 @ 2:12 pm
Almost all of what you write makes perfect sense to me, Michael, with the exception of the final paragraph. If I were "speaking English to English speakers", then of course I would not suggest that one should use Mandarin pronunciation. That would be absurd. But if I were speaking Mandarin (or even simply introducing a few Mandarin words into an otherwise English conversation, perhaps for didactic purposes), then I would want my pronunciation thereof to be as accurate as possible (where "accurate" would mean, in this case, "as would be pronounced by the typical educated male Beijing resident"). But since (as I note above), all of the rest makes perfect sense, then I fear that I am misunderstanding what you are saying in your final paragraph. In which case, sincere apologies (but do, please, express your ideas some other way so that I can better grasp your point).
Philip Taylor said,
May 26, 2025 @ 2:19 pm
(and to Jarek, yes please — prescriptive material, not descriptive — /ˈɡær ɑːʒ/ or /ɡæ ˈrɑːʒ/, I don't mind which, but absolutely not /ˈɡæ rɪdʒ/ !).
Michael Watts said,
May 29, 2025 @ 8:45 am
I have the Xiandai Hanyu Guifan Cidian, because that is the Chinese-Chinese dictionary that Pleco offered. I can't speak to relative merits. Victor Mair seemed to think it was a bad dictionary for reasons that were not elaborated.
I can say that one feature of the dictionary that I find charming is that it clearly sees part of its mission as being to warn the user against common solecisms. Entries will often be accompanied by a note along the lines of "make sure you're writing it correctly: these are the strokes" / "don't pronounce it this way" / "don't confuse this character with this other character".
(For general use I have the ABC Chinese-English dictionary, which is good if somewhat quirky. That's what I would actually use to check on a pronunciation unless I had reason to believe that close investigation was necessary. Unlike the Chinese-Chinese dictionary, it provides absolutely no prescriptive material.)
I would expect pretty much any dictionary to provide pronunciations that are meant to reflect the speech of an educated resident of Beijing.
It is common for different Chinese people to be unable to distinguish s/sh, n/l, h/f, or -n/-ng. (Other nondistinctions are probably common too; those are just the ones I know about.) I doubt that a Chinese interlocutor would have trouble with the concept that, as an English speaker, there are certain sounds that cause problems for you.
I also have trouble telling -n from -ng in Mandarin. I find this infuriating, because that distinction is significant in my language. I guess the line must be drawn differently somehow between English and Chinese.
My "garage" is /gə'ɹɑʒ/. I would have trouble producing either of your favored alternatives, because I don't think either /æ/ or /ɑ/ can occur in unstressed syllables.