{"id":64496,"date":"2024-06-10T05:42:54","date_gmt":"2024-06-10T10:42:54","guid":{"rendered":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=64496"},"modified":"2024-06-10T07:01:30","modified_gmt":"2024-06-10T12:01:30","slug":"ai-deception","status":"publish","type":"post","link":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=64496","title":{"rendered":"AI deception?"},"content":{"rendered":"<p>Noor Al-Sibai, \"<a href=\"https:\/\/futurism.com\/ai-systems-lie-deceive\" target=\"_blank\" rel=\"noopener\">AI Systems Are Learning to Lie and Deceive, Scientists Find<\/a>\", <em>Futurism<\/em> 6\/7\/2024:<\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #000080;\">AI models are, apparently, getting better at lying on purpose.<\/span><\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #000080;\">Two recent studies \u2014 one <a style=\"color: #000080;\" href=\"https:\/\/www.pnas.org\/doi\/full\/10.1073\/pnas.2317967121\" target=\"_blank\" rel=\"noopener\"><u>published this week in the journal PNAS<\/u><\/a> and the other <a style=\"color: #000080;\" href=\"https:\/\/www.cell.com\/action\/showPdf?pii=S2666-3899%2824%2900103-X\" target=\"_blank\" rel=\"noopener\"><u>last month in the journal Patterns<\/u><\/a> \u2014 reveal some jarring findings about large language models (LLMs) and their ability to lie to or deceive human observers on purpose.<\/span><\/p>\n<p><!--more--><br \/>\nThat adverbial phrase \"on purpose\" is just the first of many ways that the the article and the cited papers attribute communicative intentionality and \"theory of mind\" to chatbots, without any serious discussion of the relevant philosophical problems.<\/p>\n<p>The whole question of communication and deception in such exchanges reminds me of the literature on the analogous issues in animal behavior, for example Dorothy Cheney and Robert M. Seyfarth. 
\"<a href=\"https:\/\/www.jstor.org\/stable\/4534456\" target=\"_blank\" rel=\"noopener\">Vervet monkey alarm calls: Manipulation through shared information?<\/a>\", 1985, and their 1990 book <a href=\"https:\/\/www.amazon.com\/How-Monkeys-See-World-Another\/dp\/0226102467\/\" target=\"_blank\" rel=\"noopener\"><em>How Monkeys See the World<\/em><\/a>. One relevant passage from the book:<\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #ff0000;\">To attribute beliefs, knowledge and emotions to both oneself and others is to have what Premack and Woodruff (1978) term a\u00a0<i>theory of mind<\/i>.\u00a0 A theory of mind is a theory because, unlike behavior, mental states are not directly observable<\/span><br \/>\n<span style=\"color: #ff0000;\">[. . .]<\/span><br \/>\n<span style=\"color: #ff0000;\">[E]ven without a theory of mind, monkeys are skilled social strategists. It is not essential to attribute thoughts to others to recognize that other animals have social relationships or to predict what other individuals will do and with whom they will do it. Moreover, it is clearly possible to deceive, inform, and convey information to others without attributing mental states to them.<\/span><br \/>\n<span style=\"color: #ff0000;\">[. . .]<\/span><br \/>\n<span style=\"color: #ff0000;\">However, the moment that an individual becomes capable of recognizing that her companions have beliefs, and that these beliefs may be different from her own, she becomes capable of immensely more flexible and adaptive behavior.<\/span><br \/>\n<span style=\"color: #ff0000;\">[. . .]<\/span><br \/>\n<span style=\"color: #ff0000;\">Most of the controversy surrounding animal communication. . . centers on second- and third-order intentionality &#8212; whether animals are capable of acting as if they want others to believe that they know or believe something. . . Higher-order intentionality implies the ability to attribute knowledge, beliefs and emotions to others. 
Attribution, in turn, demands some ability to represent simultaneously two different states of mind. To do this an individual must recognize that he has knowledge, that others have knowledge, and that there can be a discrepancy between his own knowledge and theirs.<\/span><\/p>\n<p>Because chatbots have very different <a href=\"https:\/\/observer.com\/2024\/02\/metas-a-i-chief-yann-lecun-explains-why-a-house-cat-is-smarter-than-the-best-a-i\/\" target=\"_blank\" rel=\"noopener\">strengths and weaknesses<\/a> from animals &#8212; and different bots can have different architectures &#8212; the issues are going to work out differently. But I think it's worth keeping the philosophical history in mind. Also, the role of <a href=\"https:\/\/www.danzettwoch.com\/comics\/deadlock.html\" target=\"_blank\" rel=\"noopener\">game theory<\/a> :-)&#8230;<\/p>\n<p><strong>Update &#8212;<\/strong> For many decades before the current LLM \"AI\" developments, people have been writing (old-fashioned) programs to play games like poker and bridge, where the human versions involve concepts of communication, bluff, deception, and manipulation. And no one anthropomorphized those programs in the same way. 
That doesn't mean that the current AI anthropomorphization is wrong, just that there's a lot more to consider than how the programs behave.<\/p>\n<p>A few relevant past posts:<\/p>\n<p>\"<a href=\"http:\/\/itre.cis.upenn.edu\/~myl\/languagelog\/archives\/000150.html\" target=\"_blank\" rel=\"noopener\">Conversational game theory: the cartoon version<\/a>\", 11\/24\/2003<br \/>\n\"<a href=\"http:\/\/itre.cis.upenn.edu\/~myl\/languagelog\/archives\/000471.html\" target=\"_blank\" rel=\"noopener\">Desires, beliefs, conversations<\/a>\", 2\/18\/2004<br \/>\n\"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=2041\" target=\"_blank\" rel=\"noopener\">'Chimps have tons to say but can't say it'<\/a>\", 1\/11\/2010<br \/>\n\"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=2444\" target=\"_blank\" rel=\"noopener\">Theory of mind in the comics<\/a>\", 7\/14\/2010<br \/>\n\"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=2448\" target=\"_blank\" rel=\"noopener\">Inscriptional theory of mind, again<\/a>\", 7\/15\/2010<br \/>\n\"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=7201\" target=\"_blank\" rel=\"noopener\">Theory of mind hacks<\/a>\", 9\/24\/2013<br \/>\n\"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=37039\" target=\"_blank\" rel=\"noopener\">Deadlock<\/a>\", 3\/2\/2018<br \/>\n\"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=24808\" target=\"_blank\" rel=\"noopener\">Theory of mind<\/a>\", 3\/22\/2018<br \/>\n\"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=52804\" target=\"_blank\" rel=\"noopener\">'Cognitive fossils' and the Paleo Mindscape<\/a>\", 11\/25\/2021<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Noor Al-Sibai, \"AI Systems Are Learning to Lie and Deceive, Scientists Find\", Futurism 6\/7\/2024: AI models are, apparently, getting better at lying on purpose. 
Two recent studies \u2014 one published this week in the journal PNAS and the other last month in the journal Patterns \u2014 reveal some jarring findings about large language models (LLMs) [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[25,322,46],"tags":[],"class_list":["post-64496","post","type-post","status-publish","format-standard","hentry","category-animal-communication","category-artificial-intelligence","category-pragmatics"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/64496","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=64496"}],"version-history":[{"count":9,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/64496\/revisions"}],"predecessor-version":[{"id":64505,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/64496\/revisions\/64505"}],"wp:attachment":[{"href":"https:\/\/languagel
og.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=64496"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=64496"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=64496"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}