AI deception?


Noor Al-Sibai, "AI Systems Are Learning to Lie and Deceive, Scientists Find", Futurism 6/7/2024:

AI models are, apparently, getting better at lying on purpose.

Two recent studies — one published this week in the journal PNAS and the other last month in the journal Patterns — reveal some jarring findings about large language models (LLMs) and their ability to lie to or deceive human observers on purpose.


That adverbial phrase "on purpose" is just the first of many ways that the article and the cited papers attribute communicative intentionality and "theory of mind" to chatbots, without any serious discussion of the relevant philosophical problems.

The whole question of communication and deception in such exchanges reminds me of the literature on the analogous issues in animal behavior, for example Dorothy Cheney and Robert M. Seyfarth, "Vervet monkey alarm calls: Manipulation through shared information?", 1985, and their 1990 book How Monkeys See the World. One relevant passage from the book:

To attribute beliefs, knowledge and emotions to both oneself and others is to have what Premack and Woodruff (1978) term a theory of mind.  A theory of mind is a theory because, unlike behavior, mental states are not directly observable
[. . .]
[E]ven without a theory of mind, monkeys are skilled social strategists. It is not essential to attribute thoughts to others to recognize that other animals have social relationships or to predict what other individuals will do and with whom they will do it. Moreover, it is clearly possible to deceive, inform, and convey information to others without attributing mental states to them.
[. . .]
However, the moment that an individual becomes capable of recognizing that her companions have beliefs, and that these beliefs may be different from her own, she becomes capable of immensely more flexible and adaptive behavior.
[. . .]
Most of the controversy surrounding animal communication. . . centers on second- and third-order intentionality — whether animals are capable of acting as if they want others to believe that they know or believe something. . . Higher-order intentionality implies the ability to attribute knowledge, beliefs and emotions to others. Attribution, in turn, demands some ability to represent simultaneously two different states of mind. To do this an individual must recognize that he has knowledge, that others have knowledge, and that there can be a discrepancy between his own knowledge and theirs.

Because chatbots have very different strengths and weaknesses from animals — and different bots can have different architectures — the issues are going to work out differently. But I think it's worth keeping the philosophical history in mind. Also, the role of game theory :-)…

Update — For many decades before the current LLM "AI" developments, people have been writing (old-fashioned) programs to play games like poker and bridge, where the human versions involve concepts of communication, bluff, deception, and manipulation. And no one anthropomorphized those programs in the same way. That doesn't mean that the current AI anthropomorphization is wrong, just that there's a lot more to consider than just how the programs behave.
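
To make the contrast concrete, here is a purely invented sketch, not taken from any actual poker or bridge program, of the kind of fixed-policy "bluffing" rule such old-fashioned programs might contain; the function, thresholds, and bluff rate are all hypothetical.

    import random

    # A toy fixed-policy "bluff" rule of the sort an old-fashioned poker
    # program might use: no beliefs or intentions, just hand strength,
    # pot odds, and a fixed bluffing frequency. All names and numbers
    # here are invented for illustration only.

    def choose_action(hand_strength, pot, bet_to_call, bluff_rate=0.15):
        """Return 'raise', 'call', or 'fold' from a simple fixed policy."""
        pot_odds = bet_to_call / (pot + bet_to_call)  # price being offered
        if hand_strength >= 0.8:
            return "raise"            # strong hand: bet for value
        if hand_strength >= pot_odds:
            return "call"             # good enough for the price
        # Weak hand: usually fold, but occasionally raise as a "bluff".
        return "raise" if random.random() < bluff_rate else "fold"

    if __name__ == "__main__":
        random.seed(0)
        for strength in (0.9, 0.5, 0.1):
            print(strength, choose_action(strength, pot=10.0, bet_to_call=5.0))

Whatever we call the occasional raise on a weak hand, nothing in a policy like this attributes beliefs or intentions to the opponent.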

A few relevant past posts:

"Conversational game theory: the cartoon version", 11/24/2003
"Desires, beliefs, conversations", 2/18/2004
"'Chimps have tons to say but can't say it'", 1/11/2010
"Theory of mind in the comics", 7/14/2010
"Inscriptional theory of mind, again", 7/15/2010
"Theory of mind hacks", 9/24/2013
"Deadlock", 3/2/2018
"Theory of mind", 3/22/2018
"'Cognitive fossils' and the Paleo Mindscape", 11/25/2021



11 Comments

  1. Gregory Kusnick said,

    June 10, 2024 @ 1:16 pm

    A theory of mind is a theory because, unlike behavior, mental states are not directly observable

    Color me confused. Are "theories" of observable behavior then not actually theories, according to Cheney and Seyfarth?

  2. Mark Liberman said,

    June 10, 2024 @ 1:31 pm

    @Gregory Kusnick" "Are "theories" of observable behavior then not actually theories, according to Cheney and Seyfarth?"

    The term "theory of mind" is originally due to David Premack and Guy Woodruff, "Does the chimpanzee have a theory of mind?" (1978):

    Abstract: An individual has a theory of mind if he imputes mental states to himself and others. A system of inferences of this kind is properly viewed as a theory because such states are not directly observable, and the system can be used to make predictions about the behavior of others. As to the mental states the chimpanzee may infer, consider those inferred by our own species, for example, purpose or intention, as well as knowledge, belief, thinking, doubt, guessing, pretending, liking, and so forth. To determine whether or not the chimpanzee infers states of this kind, we showed an adult chimpanzee a series of videotaped scenes of a human actor struggling with a variety of problems. Some problems were simple, involving inaccessible food as in the original Kohler problems; others were more complex, involving an actor unable to extricate himself from a locked cage, shivering because of a malfunctioning heater, or unable to play a phonograph because it was unplugged. With each videotape the chimpanzee was given several photographs, one a solution to the problem, such as a stick for the inaccessible bananas, a key for the locked up actor, a lit wick for the malfunctioning heater. The chimpanzee's consistent choice of the correct photographs can be understood by assuming that the animal recognized the videotape as representing a problem, understood the actor's purpose, and chose alternatives compatible with that purpose.

    First, the difference between "theory" and "observation" is to be interpreted with reference to the animal, not the scientist.

    And second, you should keep in mind that the paper was written by psychologists at a time when behaviorist ideology was still strong.

  3. Gregory Kusnick said,

    June 10, 2024 @ 2:36 pm

    Sure, I understand what a theory of mind is and why it's useful to have one. And if Cheney and Seyfarth (or Premack and Woodruff) had simply said something like "A theory of mind is important because mental states are unobservable", I would have been fine with that.

    But I take your point that in their historical context, they felt the need to justify calling it a theory (even if that justification seems like a non sequitur today).

  4. Jonathan Cohen said,

    June 10, 2024 @ 3:10 pm

    on this bit: "Update — For many decades before the current LLM "AI" developments, people have been writing (old-fashioned) programs to play games like poker and bridge, where the human versions involve concepts of communication, bluff, deception, and manipulation. And no one anthropomorphized those programs in the same way. That doesn't mean that the current AI anthropomorphization is wrong, just that there's a lot more to consider than just how the programs behave."

    just to mention a propos the philosophical history: Dennett in particular made much of the idea that this kind of anthropomorphization in attribution is reasonably common ("at this stage in the game the chess computer wants to get its queen out early/is offering a pawn exchange/etc"), and indeed used this to motivate a quasi-instrumentalist view ("the intentional stance") on which explanatorily successful anthropomorphizing attribution is all it takes for the target to literally have an intentional state.

  5. Philip Taylor said,

    June 10, 2024 @ 3:30 pm

    Jonathan — is there not a very significant difference between "the chess computer wants to get its queen out early" and "the chess computer […] is offering a pawn exchange" ? The former entails volition, the latter is simply stating a fact, although the use of "offering" does rather obscure this.

  6. Jonathan Cohen said,

    June 10, 2024 @ 3:56 pm

    @Philip Taylor:
    maybe there's a non-intentional-state-requiring reading of 'offering'…. i take it that that's not the reading Dennett was interested in. as i read him, his idea (take it or leave it) was that a system literally has intentional states just in case it is predictively/explanatorily useful to an attributor to understand its behavior by thinking of it this way. if that view licenses a lot of anthropomorphizing (e.g., "the lightning wants to find the easiest path to the ground"), … well, let's say that he seemed perfectly willing to embrace that consequence of his position. ;>

  7. Philip Taylor said,

    June 10, 2024 @ 4:38 pm

    It seems to me that a computer program written specifically to play chess could, with some justification, be said to "want to get its queen out early" if the program’s creators wrote code specifically intended to search for such possibilities, and if that code was executed prior to the statement being made. If, on the other hand, the code was pure artificial intelligence which its creators (or others) had found could be used to play chess with significant success, then I would feel that "want" in such circumstances would be inappropriate and unjustified. But it is now quite late, and I may well think otherwise tomorrow …

  8. AntC said,

    June 11, 2024 @ 1:25 am

    @PT: "if the program’s creators wrote code specifically intended …"

    I think in careful description, I'd say "the creators want the program to get its queen out early". Attributing "want" to the program, I'd call a loose transferred epithet.

    For "pure artificial intelligence" that merely observed in its corpus a correlation of games where getting the queen out early led to more likely success, I agree "want" in such circumstances would be inappropriate and unjustified.

    the use of "offering" does rather obscure this

    Yes. To the extent I'd avoid "offering". The computer has moved its pawn to a position it could be taken, but then the capturing piece would itself be taken. No need for the volition-omorphising language. OTOH I take Dennett's point that humans use that kind of language all the time without necessarily attributing mentality to inanimate objects.

    I regard that language as altogether too sloppy for careful philosophical debate. (Dennett seems to want to have his cake and eat it.) And I avoid such language when talking about AI artefacts, because too often AI continues to be spectacularly stupid in entirely human-volition-non-attributable ways.

  9. AntC said,

    June 11, 2024 @ 1:46 am

    "The whole question of communication and deception …"

    Talking of intent and deception …

    Poe's law is an adage of Internet culture which says that, without a clear indicator of the author's intent, any parodic or sarcastic expression … can be mistaken by some readers for a sincere expression of those views.

    Is there an inverse Poe's Law? I recently had the experience of responding to what seemed an earnestly-intended but poorly justified proposal. I then spotted the proposal was dated April 1st; so deleted my response; and replaced that with a comment on the thread pointing out the date. (To be precise, I was responding to someone who'd revived the proposal from 2 months ago; my comment was to suggest they might be over-enthusiastic.)

    The original proposer appeared and grumped that it had been seriously intended. To be fair, the server's date of April 1st might not have been the date in the locale when the proposal got posted originally. Nevertheless, upon re-reading I still couldn't tell whether it was intended parodically. And O.P. hasn't volunteered any stronger justification.

  10. Philip Taylor said,

    June 11, 2024 @ 6:44 am

    Ant — « I think in careful description, I'd say "the creators want the program to get its queen out early". Attributing "want" to the program, I'd call a loose transferred epithet. » — no, that was not what I meant. I meant that if the programmers had written code to investigate both the possibility of, and the pros and cons of, getting the queen out early in the light of the current board configuration, and if, having executed that code, the program then ascribed a greater benefit to getting the queen out early in the particular match, and at the particular stage under consideration than the benefit of leaving the queen temporarily undeveloped, then one might reasonably speak of the program "wanting" to get its queen out early …

  11. Philip Taylor said,

    June 11, 2024 @ 1:56 pm

    Sorry, the unclosed <i> elements make the preceding comment hard to read — I will try again, vetting in JSBIN first this time.

    Ant — « I think in careful description, I'd say "the creators want the program to get its queen out early". Attributing "want" to the program, I'd call a loose transferred epithet. » — no, that was not what I meant. I meant that if the programmers had written code to investigate both the possibility of, and the pros and cons of, getting the queen out early in the light of the current board configuration, and if, having executed that code, the program then ascribed a greater benefit to getting the queen out early in the particular match, and at the particular stage under consideration than the benefit of leaving the queen temporarily undeveloped, then one might reasonably speak of the program "wanting" to get its queen out early …
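
As a footnote to the chess-program exchange in comments 7 and 10: here is a purely invented sketch, not from any actual chess engine, of the kind of explicitly coded weighing of pros and cons that Philip Taylor describes. The plan names, weights, and scores are all hypothetical; whether picking the higher-scoring plan licenses saying the program "wants" to develop its queen early is exactly the question under discussion.

    from dataclasses import dataclass

    # A hypothetical, hand-written evaluation that scores candidate plans,
    # one of which is early queen development, and picks the higher score.
    # Everything here is invented for illustration.

    @dataclass
    class Plan:
        name: str
        material_gain: float      # expected material swing, in pawns
        king_safety_risk: float   # penalty for exposing pieces early
        tempo_gain: float         # development / initiative bonus

    def score(plan):
        """Weigh the pros and cons of a plan into a single number."""
        return plan.material_gain + plan.tempo_gain - plan.king_safety_risk

    def choose_plan(plans):
        return max(plans, key=score)

    if __name__ == "__main__":
        candidates = [
            Plan("develop queen early", material_gain=0.3,
                 king_safety_risk=0.5, tempo_gain=0.6),
            Plan("keep queen home", material_gain=0.0,
                 king_safety_risk=0.1, tempo_gain=0.2),
        ]
        best = choose_plan(candidates)
        print("chosen plan:", best.name, "score:", round(score(best), 2))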
