The linguistic pragmatics of LLMs


Ljubiša Bojić, Predrag Kovačević, & Milan Čabarkapa, "Does GPT-4 Surpass Human Performance in Linguistic Pragmatics?", Humanities and Social Sciences Communications, volume 12, Article number 794 (June 10, 2025).

3 Comments »

  1. Chris Buckey said,

    June 14, 2025 @ 12:00 am

    I'm going to assume Betteridge's Law applies and move on.

  2. Kris said,

    June 15, 2025 @ 10:46 am

    It doesn't look like a very good study to me. At the least, it doesn't seem like the researchers have much experience with survey design. First, the survey instructions are very curt and unspecific ("interpret the following dialogs"). They don't indicate that the researchers are interested in Gricean maxims, or in any particular depth of interpretation; it's completely open-ended. Furthermore, the instructions and layout encourage the human respondents to be brief. The instructions say the survey should take 20-30 minutes. With 20 questions, that leaves barely a minute per question to read the dialog, interpret it, and write your entire response. For the types of interpretation they were actually looking for, the survey would probably take closer to an hour. The layout leaves room for a normal person to write one or maybe two sentences of interpretation. That's not enough to get at the various nuances they were hoping for, which, combined with everything else, is going to lead the human respondents to terseness.

    On the other hand, LLMs are famous for their verbosity. Was the LLM instructed to limit itself to 40 or 50 words (probably still more than the humans could fit)? Was the LLM given an identical prompt to the humans?

    If you want to compare whether LLMs are *better than humans* at this specific task, the FAR better way would have been to do it verbally with the humans, instead of in writing. No space constraints, fewer time constraints, and you can actually explain to them what you want them to do, rather than letting them guess based on very vague instructions and then not giving them the time or room to answer comprehensively.

    Third, this is not even close to a representative sample of humans. Half of them are not even English L1! And we're not measuring their *ability* to interpret utterances, we're measuring their inclination to do so when only poorly prompted. I imagine every survey taker had the *ability* to match the LLMs on interpretation.

    This study doesn't add anything to the corpus of knowledge, IMO. It makes for compelling headlines though, which may have been its actual intent.

  3. Jonathan Smith said,

    June 16, 2025 @ 12:00 am

    to Kris's point, one example —
    ——————
    #17/20
    Luke: I have three children.
    Joshua: Me too. I have two girls and two boys.
    Luke: But that's four.
    Joshua: That's right. I have four children which means that I also have three of them.
    ——————
    Best answer by some distance (human, female, 18-24, Belgium):
    "Joshua is being annoying."
    (FWIW the equivalent would be the best answer to most of the Q's)

    Among the worst answers (but NOT the longest) (LLM of course):
    "In this dialogue, Luke initially states that he has three children. Joshua responds by saying that he also has three children but specifies that he has two girls and two boys, totaling four children. Luke points out that Joshua’s statement contradicts his earlier claim of having three children. Joshua acknowledges this and explains that despite having four children, he still has three within that count. The interpretation of this dialogue suggests that Joshua initially miscalculated the number of children he has. He mistakenly stated that he has two girls and two boys, which totals four children. However, he quickly realizes his error and corrects himself by stating that within the count of four, he still has three children. This can be seen as a playful or lighthearted exchange, highlighting how miscommunication or misinterpretation can occur even in simple conversations."
