2024-01-12

The issues discussed in "AI plagiarism" (1/4/2024) are rapidly coming to a boil. But somehow I missed Margaret Atwood's take on the topic, published last summer — "Murdered by my replica", The Atlantic 8/26/2023:

Remember The Stepford Wives? Maybe not. In that 1975 horror film, the human wives of Stepford, Connecticut, are having their identities copied and transferred to robotic replicas of themselves, minus any contrariness that their husbands find irritating. The robot wives then murder the real wives and replace them. Better sex and better housekeeping for the husbands, death for the uniqueness, creativity, and indeed the humanity of the wives.

The companies developing generative AI seem to have something like that in mind for me, at least in my capacity as an author. (The sex and the housekeeping can be done by other functionaries, I assume.) Apparently, 33 of my books have been used as training material for their wordsmithing computer programs. Once fully trained, the bot may be given a command—“Write a Margaret Atwood novel”—and the thing will glurp forth 50,000 words, like soft ice cream spiraling out of its dispenser, that will be indistinguishable from something I might grind out. (But minus the typos.) I myself can then be dispensed with—murdered by my replica, as it were—because, to quote a vulgar saying of my youth, who needs the cow when the milk’s free?

To add insult to injury, the bot is being trained on pirated copies of my books. Now, really! How cheap is that? Would it kill these companies to shell out the measly price of 33 books? They intend to make a lot of money off the entities they have reared and fattened on my words, so they could at least buy me a coffee.

For a few more recent bubbles from the AI Plagiarism pot, see Alex Reisner, "The Flaw That Could Ruin Generative AI", The Atlantic 1/11/2024:

Earlier this week, the Telegraph reported a curious admission from OpenAI, the creator of ChatGPT. In a filing submitted to the U.K. Parliament, the company said that “leading AI models” could not exist without unfettered access to copyrighted books and articles, confirming that the generative-AI industry, worth tens of billions of dollars, depends on creative work owned by other people.

We already know, for example, that pirated-book libraries have been used to train the generative-AI products of companies such as Meta and Bloomberg. But AI companies have long claimed that generative AI “reads” or “learns from” these books and articles, as a human would, rather than copying them. Therefore, this approach supposedly constitutes “fair use,” with no compensation owed to authors or publishers. Since courts have not ruled on this question, the tech industry has made a colossal gamble developing products in this way. And the odds may be turning against them.

And Matteo Wong, "What If We Held ChatGPT to the Same Standard as Claudine Gay?", The Atlantic 1/10/2024:

If you squint and tilt your head, you can see some similarities in the blurry shapes that are Harvard and OpenAI. Each is a leading institution for building minds, whether real or artificial—Harvard educates smart humans, while OpenAI engineers smart machines—and each has been forced in recent days to stare down a common allegation. Namely, that they are represented by intellectual thieves.

Last month, the conservative activist Christopher Rufo and the journalist Christopher Brunet accused then–Harvard President Claudine Gay of having copied short passages without attribution in her dissertation. Gay later admitted to “instances in my academic writings where some material duplicated other scholars’ language, without proper attribution,” for which she requested corrections. Some two weeks later, The New York Times sued Microsoft and OpenAI, alleging that the companies’ chatbots violated copyright law by using human writing to train generative-AI models without the newsroom’s permission.

The two cases share common ground, yet many of the responses to them could not be more different. Typical academic standards for plagiarism, including Harvard’s, deem unattributed paraphrasing or lackluster citations a grave offense, and Gay — still dealing with the fallout from her widely criticized congressional testimony and a wave of racist comments — eventually resigned from her position. (I should note that I graduated from Harvard, before Gay became president of the university.) Meanwhile the Times’ and similar lawsuits, many legal experts say, are likely to fail, because the legal standard for copyright infringement generally permits using protected texts for “transformative” purposes that are substantially new. Perhaps that includes training AI models, which work by ingesting huge amounts of written texts and reproducing their patterns, content, and information. AI companies have acknowledged, and defended, using human work to train their programs. (OpenAI has said the Times’ case is “without merit.” Microsoft did not immediately respond to a request for comment.)

It seems likely to me that the big tech companies, old and new, will end up somehow paying authors, artists, and musicians. It's not at all clear how to do that in a legally coherent way (much less in a morally fair way), but the existing mechanisms for collecting and distributing such fees in the non-AI world are legally and socially established, despite being neither logical nor even vaguely consistent. And the definitions and enforcement practices for (the wide variety of different things called) "plagiarism" are much worse.

