Grammatical intuition of ChatGPT
"Grammaticality Representation in ChatGPT as Compared to Linguists and Laypeople", by Zhuang Qiu, Xufeng Duan & Zhenguang G. Cai. Humanities and Social Sciences Communications 12, no. 617 (May 6, 2025).
Abstract
Large language models (LLMs) have demonstrated exceptional performance across various linguistic tasks. However, it remains uncertain whether LLMs have developed human-like fine-grained grammatical intuition. This preregistered study (link concealed to ensure anonymity) presents the first large-scale investigation of ChatGPT’s grammatical intuition, building upon a previous study that collected laypeople’s grammatical judgments on 148 linguistic phenomena that linguists judged to be grammatical, ungrammatical, or marginally grammatical (Sprouse et al., 2013). Our primary focus was to compare ChatGPT with both laypeople and linguists in the judgment of these linguistic constructions. In Experiment 1, ChatGPT assigned ratings to sentences based on a given reference sentence. Experiment 2 involved rating sentences on a 7-point scale, and Experiment 3 asked ChatGPT to choose the more grammatical sentence from a pair. Overall, our findings demonstrate convergence rates ranging from 73% to 95% between ChatGPT and linguists, with an overall point-estimate of 89%. Significant correlations were also found between ChatGPT and laypeople across all tasks, though the correlation strength varied by task. We attribute these results to the psychometric nature of the judgment tasks and the differences in language processing styles between humans and LLMs.
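To make the task formats concrete, here is a minimal sketch of how a rating task like Experiment 2 might be posed to a chat model and scored against human judgments. The prompt wording, the sample ratings, and the use of a Pearson correlation as the convergence measure are illustrative assumptions on my part, not the paper's actual materials or analysis pipeline.

```python
# Illustrative sketch only: the prompt wording and ratings below are
# invented, not taken from the paper. It shows the shape of a 7-point
# judgment task (Experiment 2) and a simple human-model correlation.
from scipy.stats import pearsonr

def rating_prompt(sentence: str) -> str:
    """Build a 7-point acceptability prompt (hypothetical wording)."""
    return (
        "On a scale from 1 (completely unacceptable) to 7 (perfectly "
        f'acceptable), rate this English sentence: "{sentence}". '
        "Answer with a single number."
    )

# Imagined ratings for five test items: model responses vs. lay judgments.
model_ratings = [7, 2, 6, 1, 4]
human_ratings = [6, 1, 7, 2, 3]

r, p = pearsonr(model_ratings, human_ratings)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```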
Introduction
Technological progress in artificial intelligence, especially in natural language processing, has ignited significant discussion about how closely large language models (LLMs), including chatbots like ChatGPT, emulate human linguistic cognition and use (Chomsky et al., 2023; Piantadosi, 2024; Binz and Schulz, 2023; Kosinski, 2024; Qiu et al., 2023; Cai et al., 2024). With each technological leap, distinguishing between human linguistic cognition and the capabilities of AI-driven language models becomes more difficult (Wilcox et al., 2022; Van Schijndel and Linzen, 2018; Futrell et al., 2019), leading scholars to ask whether these LLMs genuinely capture human linguistic nuances or merely reproduce them at a surface level (e.g., Duan et al., 2024, 2025; Wang et al., 2024). This research examines the congruencies and disparities between LLMs and humans, focusing primarily on their intuitive understanding of grammar. In three preregistered experiments, ChatGPT was asked to provide grammaticality judgments in different formats for over two thousand sentences with diverse structural configurations. We compared ChatGPT's judgments with judgments from laypeople and linguists to map out parallels and deviations.
The ascent of LLMs has been remarkable: they have proven adept at a wide range of linguistic tasks, including discerning ambiguities (Ortega-Martín, 2023), answering questions (Brown et al., 2020), and translating between languages (Jiao et al., 2023). Interestingly, although these models were not explicitly designed around the hierarchical syntactic structure of human languages, they have shown the capability to track complex filler-gap dependencies and build syntactic interpretations incrementally (Wilcox et al., 2022; Van Schijndel and Linzen, 2018; Futrell et al., 2019). But the overarching question lingers: do LLMs genuinely mirror humans in terms of linguistic cognition? Chomsky, Roberts, and Watumull (2023) have been vocal about the inherent discrepancies between how LLMs and humans perceive and communicate, while other scholars, such as Piantadosi (2024), hold a contrasting view, positioning LLMs as genuine reflections of human linguistic cognition.
Conclusion
In conclusion, this research has undertaken a comprehensive investigation into the alignment of grammatical knowledge among ChatGPT, laypeople, and linguists, shedding light on the capabilities and limitations of AI-driven language models in approximating human linguistic intuitions. The findings indicate significant correlations between ChatGPT and both laypeople and linguists across various grammaticality judgment tasks, while also revealing nuanced differences in response patterns that depend heavily on the specific task paradigm employed. The study thus contributes to the ongoing discourse on the linguistic capabilities of artificial intelligence and the nature of human linguistic cognition, and calls for further exploration of this evolving landscape.
The authors maintain a strict division between the grammatical judgments of laypeople and those of linguists, yet most of the time there is no substantial distinction between the two. In general, I would say that the authors do not make clear what purpose the distinction between lay and professional judgments of grammaticality serves, or how it helps in evaluating whether LLMs like ChatGPT have attained human-like grammatical intuition.
I fear that we may be on a slippery slope in our ability to recognize a dichotomy between man and machine. Will we be able to pull the plug on HAL before it's too late? Or, not to worry, are we willy-nilly evolving into a new, hybrid "life" form?
Selected readings
- "Analyzing ChatGPT’s ability as a grammar fixer", Georgetown Legal English Blog (2/23/23)
- "ChatGPT writes VHM" (2/28/23) — with useful bibliography
[h.t. Ted McClure]
SCF said,
May 19, 2025 @ 10:59 am
Have only skimmed but initial reactions:
1) Odd that they don't cite either:
Dentella et al. (2023?), arXiv, "Testing AI on language comprehension tasks reveals insensitivity to underlying meaning"
Hu et al. (2024), PNAS, "Language models align with human judgments on key grammatical constructions"
2) Odd that they don't specify which GPT version they used, writing only "we procured responses from the ChatGPT version dated Feb 13" and "Feb 13 version," without giving a year.