{"id":67198,"date":"2024-12-05T06:06:56","date_gmt":"2024-12-05T11:06:56","guid":{"rendered":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=67198"},"modified":"2024-12-05T06:55:20","modified_gmt":"2024-12-05T11:55:20","slug":"the-letter-equity-task-force","status":"publish","type":"post","link":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=67198","title":{"rendered":"The \"Letter Equity Task Force\""},"content":{"rendered":"<p><strong>Previous LLOG coverage:<\/strong> \"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=65667\" target=\"_blank\" rel=\"noopener\">AI on Rs in 'strawberry'<\/a>\", 8\/28\/2024; \"<a href=\"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=66206\" target=\"_blank\" rel=\"noopener\">'The cosmic jam from whence it came'<\/a>\", 9\/26\/2024.<\/p>\n<p><strong>Current satire:<\/strong> Alberto Romero, \"<a href=\"https:\/\/albertoromgar.medium.com\/report-openai-spends-millions-a-year-miscounting-the-rs-in-strawberry-1b45c9fd64cf\" target=\"_blank\" rel=\"noopener\">Report: OpenAI Spends Millions a Year Miscounting the R\u2019s in \u2018Strawberry\u2019<\/a>\", <em>Medium<\/em> 11\/22\/2024.<\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #000080;\">OpenAI, the most talked-about tech start-up of the decade, convened an emergency company-wide meeting Tuesday to address what executives are calling \u201cthe single greatest existential challenge facing artificial intelligence today\u201d: Why can\u2019t their models count the R\u2019s in <em>strawberry<\/em>?<\/span><\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #000080;\">The controversy began shortly after the release of GPT-4, on March 2023, when users on Reddit and Twitter discovered the model\u2019s inability to count the R\u2019s in <em>strawberry<\/em>. The responses varied from inaccurate guesses to cryptic replies like, \u201cMore R\u2019s than you can handle.\u201d In one particularly unhinged moment, the chatbot signed off with, \u201cCall me Sydney. 
That\u2019s all you need to know.\u201d<\/span><\/p>\n<p><!--more--><\/p>\n<p id=\"956e\" class=\"pw-post-body-paragraph ln lo gd lp b lq lr ls lt lu lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fw bk\" style=\"padding-left: 40px;\" data-selectable-paragraph=\"\"><span style=\"color: #000080;\">\u201cI kept trying to count the R\u2019s and it just wouldn\u2019t do it,\u201d said one user in a 17-post thread that went viral on Bluesky. \u201cSo I made it count other letters \u2014 T\u2019s, B\u2019s, you name it. No chance. Then it hit me: this thing is eating my letters. Letters today, kids tomorrow. Do we want that risk? It\u2019s dangerous. It\u2019s discriminatory. It\u2019s terrifying. We want our children to live, don\u2019t we?!\u201d<\/span><\/p>\n<p id=\"2908\" class=\"pw-post-body-paragraph ln lo gd lp b lq lr ls lt lu lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fw bk\" style=\"padding-left: 40px;\" data-selectable-paragraph=\"\"><span style=\"color: #000080;\">At OpenAI headquarters, CEO Sam Altman struck a serious tone at the meeting, describing the R-counting debacle as a \u201ccrisis of faith\u201d for the AI community. \u201cI also think it\u2019s a stupid question,\u201d Altman admitted. \u201cThere are three R\u2019s. I counted them this morning. But our users keep asking, and we are here to serve their revealed preferences. Can we please stop trying to make these things reason and teach them some basic arithmetic?\u201d<\/span><\/p>\n<p style=\"padding-left: 40px;\" data-selectable-paragraph=\"\"><span style=\"color: #000080;\">Sources inside OpenAI say the company has already allocated significant resources to the issue, including a newly formed independent\u00a0<em class=\"ml\">Letter Equity Task Force<\/em>\u00a0(LETF), led by top researchers who previously trained autonomous vehicles to not discriminate between red and green traffic lights. \u201cThis is bigger than ChatGPT. Bigger than AlphaFold. 
This is about trust,\u201d said one LETF member. \u201cBecause if we can\u2019t count R\u2019s in\u00a0<em class=\"ml\">strawberry<\/em>, what\u2019s next? Misidentifying bananas? Calling tomatoes a vegetable?\u201d<\/span><\/p>\n<p data-selectable-paragraph=\"\">The (fictional) Letter Equity Task Force has done its job on <em>strawberry<\/em>, as of this morning:<\/p>\n<p><a href=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/NewChatGPTstrawberry.png\"><img decoding=\"async\" title=\"Click to embiggen\" src=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/NewChatGPTstrawberry.png\" width=\"490\" \/><\/a><\/p>\n<p>However, ChatGPT4o still has some letter-counting issues. I <a href=\"https:\/\/chatgpt.com\/share\/675182ff-8190-8011-9bac-731399473d49\" target=\"_blank\" rel=\"noopener\">asked it<\/a> for the number of instances of the letter 'e' in the first sentence of the Declaration of Independence, and it started by giving me word-by-word counts (though oddly leaving out some words). 3 of the first ten word-by-word counts\u00a0 are wrong (with 6 words omitted up to that point):<\/p>\n<ul>\n<li><strong>When<\/strong>: 1<\/li>\n<li><strong>the<\/strong>: 1<\/li>\n<li><strong>Course<\/strong>: 1<\/li>\n<li><strong>events<\/strong>: 2<\/li>\n<li><span style=\"color: #800000;\"><strong>becomes<\/strong>: 3<\/span><\/li>\n<li><span style=\"color: #800000;\"><strong>necessary<\/strong>: 1<\/span><\/li>\n<li><strong>one<\/strong>: 1<\/li>\n<li><strong>people<\/strong>: 2<\/li>\n<li><span style=\"color: #800000;\"><strong>dissolve<\/strong>: 2<\/span><\/li>\n<li><strong>the<\/strong>: 1<\/li>\n<\/ul>\n<p>And there are plenty of other counting errors later in the list, e.g.<\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #800000;\"><strong>connected<\/strong>: 3<\/span><br \/>\n<span style=\"color: #800000;\"><strong>powers<\/strong>: 0<\/span><br \/>\n<span style=\"color: #800000;\"><strong>separate<\/strong>: 3<\/span><br \/>\n<span style=\"color: 
#800000;\"><strong>Nature<\/strong>: 2<\/span><br \/>\n<span style=\"color: #800000;\"><strong>Nature's<\/strong>: 2<\/span><br \/>\n<span style=\"color: #800000;\"><strong>causes<\/strong>: 2<\/span><br \/>\n<span style=\"color: #800000;\"><strong>separation<\/strong>: 2<\/span><\/p>\n<p>ChatGPT then offers a sum:<\/p>\n<p style=\"padding-left: 40px;\">Total: <strong>56 instances of 'e'<\/strong>.<\/p>\n<p>Which I think is wrong &#8212; unless I and my program miscounted, the sum of the (right and wrong) word-level counts that ChatGPT offers is actually 55.<\/p>\n<p>The actual number of 'e' letters in that sentence is 50, and oddly enough, when I ask again for a total count, ChatGPT gets it right:<\/p>\n<p><a href=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/ChatGPT_Declaration_ecount.png\"><img decoding=\"async\" title=\"Click to embiggen\" src=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/ChatGPT_Declaration_ecount.png\" width=\"490\" \/><\/a><\/p>\n<p>And actually offers counting code, which actually runs and actually works:<\/p>\n<p style=\"padding-left: 40px;\"># The sentence to analyze<br \/>\nsentence = (\"When in the Course of human events, it becomes necessary for one people to dissolve the political bands \"<br \/>\n\"which have connected them with another, and to assume among the powers of the earth, the separate and \"<br \/>\n\"equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions \"<br \/>\n\"of mankind requires that they should declare the causes which impel them to the separation.\")<\/p>\n<p style=\"padding-left: 40px;\"># Counting the occurrences of the letter 'e'<br \/>\ncount_e = sentence.lower().count('e')<br \/>\ncount_e<\/p>\n<p>So why it screwed up the first answer so badly remains a puzzle &#8212; it seems like the Letter Equity Task Force still has some things to do, and people should continue to be careful about relying on ChatGPT for even the simplest forms of data analysis.<\/p>\n<p>The 
whole dialogue is <a href=\"https:\/\/chatgpt.com\/share\/675182ff-8190-8011-9bac-731399473d49\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Previous LLOG coverage: \"AI on Rs in 'strawberry'\", 8\/28\/2024; \"'The cosmic jam from whence it came'\", 9\/26\/2024. Current satire: Alberto Romero, \"Report: OpenAI Spends Millions a Year Miscounting the R\u2019s in \u2018Strawberry\u2019\", Medium 11\/22\/2024. OpenAI, the most talked-about tech start-up of the decade, convened an emergency company-wide meeting Tuesday to address what executives are calling [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[322,23],"tags":[],"class_list":["post-67198","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence","category-humor"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/67198","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.
php?rest_route=%2Fwp%2Fv2%2Fcomments&post=67198"}],"version-history":[{"count":11,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/67198\/revisions"}],"predecessor-version":[{"id":67209,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/67198\/revisions\/67209"}],"wp:attachment":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=67198"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=67198"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=67198"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}