Stepford authors
« previous post | next post »
The issues discussed in "AI plagiarism" (1/4/2024) are rapidly coming to a boil. But somehow I missed Margaret Atwood's take on the topic, published last summer — "Murdered by my replica", The Atlantic 8/26/2023:
Remember The Stepford Wives? Maybe not. In that 1975 horror film, the human wives of Stepford, Connecticut, are having their identities copied and transferred to robotic replicas of themselves, minus any contrariness that their husbands find irritating. The robot wives then murder the real wives and replace them. Better sex and better housekeeping for the husbands, death for the uniqueness, creativity, and indeed the humanity of the wives.
The companies developing generative AI seem to have something like that in mind for me, at least in my capacity as an author. (The sex and the housekeeping can be done by other functionaries, I assume.) Apparently, 33 of my books have been used as training material for their wordsmithing computer programs. Once fully trained, the bot may be given a command—“Write a Margaret Atwood novel”—and the thing will glurp forth 50,000 words, like soft ice cream spiraling out of its dispenser, that will be indistinguishable from something I might grind out. (But minus the typos.) I myself can then be dispensed with—murdered by my replica, as it were—because, to quote a vulgar saying of my youth, who needs the cow when the milk’s free?
To add insult to injury, the bot is being trained on pirated copies of my books. Now, really! How cheap is that? Would it kill these companies to shell out the measly price of 33 books? They intend to make a lot of money off the entities they have reared and fattened on my words, so they could at least buy me a coffee.
For a few more recent bubbles from the AI Plagiarism pot, see Alex Reisner, "The Flaw That Could Ruin Generative AI", The Atlantic 1/11/2024:
Earlier this week, the Telegraph reported a curious admission from OpenAI, the creator of ChatGPT. In a filing submitted to the U.K. Parliament, the company said that “leading AI models” could not exist without unfettered access to copyrighted books and articles, confirming that the generative-AI industry, worth tens of billions of dollars, depends on creative work owned by other people.
We already know, for example, that pirated-book libraries have been used to train the generative-AI products of companies such as Meta and Bloomberg. But AI companies have long claimed that generative AI “reads” or “learns from” these books and articles, as a human would, rather than copying them. Therefore, this approach supposedly constitutes “fair use,” with no compensation owed to authors or publishers. Since courts have not ruled on this question, the tech industry has made a colossal gamble developing products in this way. And the odds may be turning against them.
And Matteo Wong, "What If We Held ChatGPT to the Same Standard as Claudine Gay?", The Atlantic 1/10/2024:
If you squint and tilt your head, you can see some similarities in the blurry shapes that are Harvard and OpenAI. Each is a leading institution for building minds, whether real or artificial—Harvard educates smart humans, while OpenAI engineers smart machines—and each has been forced in recent days to stare down a common allegation. Namely, that they are represented by intellectual thieves.
Last month, the conservative activist Christopher Rufo and the journalist Christopher Brunet accused then–Harvard President Claudine Gay of having copied short passages without attribution in her dissertation. Gay later admitted to “instances in my academic writings where some material duplicated other scholars’ language, without proper attribution,” for which she requested corrections. Some two weeks later, The New York Times sued Microsoft and OpenAI, alleging that the companies’ chatbots violated copyright law by using human writing to train generative-AI models without the newsroom’s permission.
The two cases share common ground, yet many of the responses to them could not be more different. Typical academic standards for plagiarism, including Harvard’s, deem unattributed paraphrasing or lackluster citations a grave offense, and Gay — still dealing with the fallout from her widely criticized congressional testimony and a wave of racist comments — eventually resigned from her position. (I should note that I graduated from Harvard, before Gay became president of the university.) Meanwhile the Times’ and similar lawsuits, many legal experts say, are likely to fail, because the legal standard for copyright infringement generally permits using protected texts for “transformative” purposes that are substantially new. Perhaps that includes training AI models, which work by ingesting huge amounts of written texts and reproducing their patterns, content, and information. AI companies have acknowledged, and defended, using human work to train their programs. (OpenAI has said the Times’ case is “without merit.” Microsoft did not immediately respond to a request for comment.)
It seems likely to me that the big tech companies, old and new, will end up somehow paying authors, artists, and musicians. It's not at all clear how to do that in a legally coherent way (much less in a morally fair way), but the existing mechanisms for collection and distribution of such fees in the non-AI world are legally and socially established, despite being not exactly logical, or even vaguely consistent. And the definitions and enforcement practices for (the wide variety of different things called) "plagiarism" are much worse.
Here are a few of our previous posts on plagiarism-adjacent topics, in reverse chronological order. (I'll spare you the posts on intellectual property, copyright, etc. …)
"The plagiarism circus", 1/6/2024
"AI plagiarism", 1/4/2024
"Plagiarism: Double (and triple and quadruple) standards", 12/27/2023
"Tortured phrases, LLMs, and Goodhart's Law", 6/20/2023
"Retraction Watch: Swamp Man Thing", 5/10/2023
"Tortured phrases: Degrading the flag to clamor proportion", 3/22/2022
"Tortured phrases", 8/14/2021
"The spam technology ecosystem expands", 3/16/2019
"The British Bad Dream", 10/4/2017
"Citation crimes and misdemeanors", 9/9/2017
"Asleep at the wheel at Zombie Lingua?", 9/30/2016
"Intersecting hypocrisies", 7/20/2016
"The extent of Melania's plagiarism", 7/20/2016
"Patchwriting by Rick Perlstein (and Craig Shirley)", 8/8/2014
"Patchwriting", 6/13/2014
"'Plagiarism' vs. 'ghostwriting' again", 2/14/2014
"SOTU plagiarism?", 1/30/2014
"Rand Paul's (staffers') plagiarism", 11/7/2013
"John McIntyre on varieties of plagiarism", 3/30/2013
"Write new speeches, don't borrow from Hollywood", 1/26/2012
"Visualization of plagiarism", 5/7/2011
"Is 'plagiarism' in a judicial decision wrong?", 4/14/2011
"Academic ghostwriting", 12/5/2010
"'The writer I hired was a plagiarist!'", 7/13/2010
"An experiment", 5/18/2009
"Plagiarism and restrictions on delegated agency", 10/1/2008
"Moist aversion: the cartoon version", 8/27/2008
"The fine line between phrasal allusion and plagiarism", 6/4/2008
"Citation plagiarism once again", 4/23/2008
"Citation plagiarism?", 1/15/2007
"Is Mark Steyn guilty of plagiarism?", 5/15/2006
"Some striking similarities", 5/15/2006
"Unwritten rules and uncreated consciences", 5/4/2006
"Literary shoplifting", 4/30/2006
"Probability theory and Viswanathan's plagiarism", 4/25/2006
"Congratulations to Dan Brown", 4/14/2006
ajay said,
January 12, 2024 @ 7:29 am
Atwood seems to have embarked on some sort of bet or challenge to see how many metaphors she can mix in a single paragraph. She's obviously worried that her swan song will be muffled by the tramping jackboots of the fascist octopus.
Seth said,
January 12, 2024 @ 8:44 am
Disclaimer: I'm not a lawyer, but I've read extensively about copyright law, and studied several key court cases. My view is the opposite of "that the big tech companies, old and new, will end up somehow paying authors, artists, and musicians". This is based on the "Google Books" case. In short, Google won: its product was deemed "fair use" and so permitted under (US) copyright law. While not completely identical, the issues here strike me as very similar, with the AI "fair use" arguments in some cases being even stronger for being "transformative". That almost none of the writers talking about this are familiar with such a foundational legal case, which was much discussed in policy circles for many years, is very disappointing. People tend to make up moral stories about how the world should be, and then just claim the law is the same as their moral stories. There is great anxiety about AI now, so we're seeing it expressed in various indirect ways.
Mark Liberman said,
January 12, 2024 @ 9:58 am
@ajay: "Atwood seems to have embarked on some sort of bet or challenge to see how many metaphors she can mix in a single paragraph."
FWIW, I thought the three metaphors in the quoted three paragraphs were effective: the Stepford Wives as representation of Generative AI's socio-economic role; the soft ice cream dispenser as representation of Generative AI's output process; and "buy me a coffee" as representation of the evasion of even trivial payments.
Timothy Rowe said,
January 12, 2024 @ 11:58 am
I am not a lawyer, but can't see how authors can win this one because of the international aspect. For example, my understanding is that the UK does not have a "fair use" exemption – at least not in the way the US has. But I can't see how we could stop US companies training their AI in the USA in a way that doesn't comply with UK law, or stop them providing the resulting service to the UK, any more than we can force overseas call centres to comply with UK employment law. If the AI companies lose in the USA, they'll just shift operations to somewhere with looser copyright law.
Aardvark Cheeselog said,
January 12, 2024 @ 5:46 pm
Personally I would love to see the ship of generative AI wrecked on the rocks of IP protection. I mean, if we have to have those rocks anyway they certainly ought to wreck that ship.
I sorely miss the upvote button for ajay's response.
Philip Taylor said,
January 13, 2024 @ 11:08 am
Re the mixed metaphors, can we be certain that Murdered by my replica was actually written by Margaret Atwood and not by a bot which had been given the command—"Write an article by Margaret Atwood lamenting the ease with which her work can now be generated by AI"? After all, if one were to do so, the bot would surely glurp forth 500+ words, like soft ice cream spiraling out of its dispenser, that would be indistinguishable from something she might grind out (but without the typos).
Proof of this hypothesis [Source: ChatGPT]
Brett said,
January 13, 2024 @ 11:14 pm
While I think this is unfair to a lot of authors, I confess that I feel a certain sense of Schadenfreude about Margaret Atwood—who infamously mocked science fiction as "talking squids in outer space"—in particular complaining about this particular bit of technological dystopia.
@Seth: While Google nominally won that suit, the issues were never really resolved, and Google mooted the issue by basically abandoning the Google Books project as a way of making works freely available online. Much more recently, the Internet Archive, which had persisted in making books available through an online lending system, lost a major lawsuit brought by publishers—although they are appealing.
Seth said,
January 17, 2024 @ 7:11 pm
@Brett – The key aspect here of the Google case was the finding of "transformative use", and hence fair use of the copyright material. The Internet Archive case is different, since they are distributing the whole work itself. I think AI applications have an even greater claim to being "transformative use". The bad argument often seen in discussions appears to be that just using copyrighted works without permission is forbidden, full stop. And then many writers seem to denounce the supposed ignorant AI'ers for not realizing this, and fantasize about them being punished.
ajay said,
January 18, 2024 @ 8:44 am
I am particularly fond of Atwood's idea of a cow that produces soft-serve ice cream. I certainly wouldn't kill that cow!