Stepford authors
« previous post | next post »
The issues discussed in "AI plagiarism" (1/4/2024) are rapidly coming to a boil. But somehow I missed Margaret Atwood's take on the topic, published last summer — "Murdered by my replica", The Atlantic 8/26/2023:
Remember The Stepford Wives? Maybe not. In that 1975 horror film, the human wives of Stepford, Connecticut, are having their identities copied and transferred to robotic replicas of themselves, minus any contrariness that their husbands find irritating. The robot wives then murder the real wives and replace them. Better sex and better housekeeping for the husbands, death for the uniqueness, creativity, and indeed the humanity of the wives.
The companies developing generative AI seem to have something like that in mind for me, at least in my capacity as an author. (The sex and the housekeeping can be done by other functionaries, I assume.) Apparently, 33 of my books have been used as training material for their wordsmithing computer programs. Once fully trained, the bot may be given a command—“Write a Margaret Atwood novel”—and the thing will glurp forth 50,000 words, like soft ice cream spiraling out of its dispenser, that will be indistinguishable from something I might grind out. (But minus the typos.) I myself can then be dispensed with—murdered by my replica, as it were—because, to quote a vulgar saying of my youth, who needs the cow when the milk’s free?
To add insult to injury, the bot is being trained on pirated copies of my books. Now, really! How cheap is that? Would it kill these companies to shell out the measly price of 33 books? They intend to make a lot of money off the entities they have reared and fattened on my words, so they could at least buy me a coffee.
For a few more recent bubbles from the AI Plagiarism pot, see Alex Reisner, "The Flaw That Could Ruin Generative AI", The Atlantic 1/11/2024:
Earlier this week, the Telegraph reported a curious admission from OpenAI, the creator of ChatGPT. In a filing submitted to the U.K. Parliament, the company said that “leading AI models” could not exist without unfettered access to copyrighted books and articles, confirming that the generative-AI industry, worth tens of billions of dollars, depends on creative work owned by other people.
We already know, for example, that pirated-book libraries have been used to train the generative-AI products of companies such as Meta and Bloomberg. But AI companies have long claimed that generative AI “reads” or “learns from” these books and articles, as a human would, rather than copying them. Therefore, this approach supposedly constitutes “fair use,” with no compensation owed to authors or publishers. Since courts have not ruled on this question, the tech industry has made a colossal gamble developing products in this way. And the odds may be turning against them.
And Matteo Wong, "What If We Held ChatGPT to the Same Standard as Claudine Gay?", The Atlantic 1/10/2024:
If you squint and tilt your head, you can see some similarities in the blurry shapes that are Harvard and OpenAI. Each is a leading institution for building minds, whether real or artificial—Harvard educates smart humans, while OpenAI engineers smart machines—and each has been forced in recent days to stare down a common allegation. Namely, that they are represented by intellectual thieves.
Last month, the conservative activist Christopher Rufo and the journalist Christopher Brunet accused then–Harvard President Claudine Gay of having copied short passages without attribution in her dissertation. Gay later admitted to “instances in my academic writings where some material duplicated other scholars’ language, without proper attribution,” for which she requested corrections. Some two weeks later, The New York Times sued Microsoft and OpenAI, alleging that the companies’ chatbots violated copyright law by using human writing to train generative-AI models without the newsroom’s permission.
The two cases share common ground, yet many of the responses to them could not be more different. Typical academic standards for plagiarism, including Harvard’s, deem unattributed paraphrasing or lackluster citations a grave offense, and Gay — still dealing with the fallout from her widely criticized congressional testimony and a wave of racist comments — eventually resigned from her position. (I should note that I graduated from Harvard, before Gay became president of the university.) Meanwhile the Times’ and similar lawsuits, many legal experts say, are likely to fail, because the legal standard for copyright infringement generally permits using protected texts for “transformative” purposes that are substantially new. Perhaps that includes training AI models, which work by ingesting huge amounts of written texts and reproducing their patterns, content, and information. AI companies have acknowledged, and defended, using human work to train their programs. (OpenAI has said the Times’ case is “without merit.” Microsoft did not immediately respond to a request for comment.)
It seems likely to me that the big tech companies, old and new, will end up somehow paying authors, artists, and musicians. It's not at all clear how to do that in a legally coherent way (much less in a morally fair way), but the existing mechanisms for collection and distribution of such fees in the non-AI world are legally and socially established, despite being not exactly logical, or even vaguely consistent. And the definitions and enforcement practices for (the wide variety of different things called) "plagiarism" are much worse.
Here are a few of our previous posts on plagiarism-adjacent topics, in reverse chronological order. (I'll spare you the posts on intellectual property, copyright, etc. …)
"The plagiarism circus", 1/6/2024
"AI plagiarism", 1/4/2024
"Plagiarism: Double (and triple and quadruple) standards", 12/27/2023
"Tortured phrases, LLMs, and Goodhart's Law", 6/20/2023
"Retraction Watch: Swamp Man Thing", 5/10/2023
"Tortured phrases: Degrading the flag to clamor proportion", 3/22/2022
"Tortured phrases", 8/14/2021
"The spam technology ecosystem expands", 3/16/2019
"The British Bad Dream", 10/4/2017
"Citation crimes and misdemeanors", 9/9/2017
"Asleep at the wheel at Zombie Lingua?", 9/30/2016
"Intersecting hypocrisies", 7/20/2016
"The extent of Melania's plagiarism", 7/20/2016
"Patchwriting by Rick Perlstein (and Craig Shirley)", 8/8/2014
"Patchwriting", 6/13/2014
"'Plagiarism' vs. 'ghostwriting' again", 2/14/2014
"SOTU plagiarism?", 1/30/2014
"Rand Paul's (staffers') plagiarism", 11/7/2013
"John McIntyre on varieties of plagiarism", 3/30/2013
"Write new speeches, don't borrow from Hollywood", 1/26/2012
"Visualization of plagiarism", 5/7/2011
"Is 'plagiarism' in a judicial decision wrong?", 4/14/2011
"Academic ghostwriting", 12/5/2010
"'The writer I hired was a plagiarist!'", 7/13/2010
"An experiment", 5/18/2009
"Plagiarism and restrictions on delegated agency", 10/1/2008
"Moist aversion: the cartoon version", 8/27/2008
"The fine line between phrasal allusion and plagiarism", 6/4/2008
"Citation plagiarism once again", 4/23/2008
"Citation plagiarism?", 1/15/2007
"Is Mark Steyn guilty of plagiarism?", 5/15/2006
"Some striking similarities", 5/15/2006
"Unwritten rules and uncreated consciences", 5/4/2006
"Literary shoplifting", 4/30/2006
"Probability theory and Viswanathan's plagiarism", 4/25/2006
"Congratulations to Dan Brown", 4/14/2006
ajay said,
January 12, 2024 @ 7:29 am
Atwood seems to have embarked on some sort of bet or challenge to see how many metaphors she can mix in a single paragraph. She's obviously worried that her swan song will be muffled by the tramping jackboots of the fascist octopus.
Seth said,
January 12, 2024 @ 8:44 am
Disclaimer: I'm not a lawyer, but I've read extensively about copyright law, and studied several key court cases. My view is the opposite of "that the big tech companies, old and new, will end up somehow paying authors, artists, and musicians". This is based on the "Google Books" case. In short, Google won: its product was deemed "fair use" and so permitted under (US) copyright law. While not completely identical, the issues here strike me as very similar, with the AI "fair use" arguments in some cases being even stronger for being "transformative". That almost none of the writers talking about this are familiar with such a foundational legal case, which was much discussed in policy circles for many years, is very disappointing. People tend to make up moral stories about how the world should be, and then just claim the law is the same as their moral stories. There is great anxiety about AI now, so we're seeing it expressed in various indirect ways.
Mark Liberman said,
January 12, 2024 @ 9:58 am
@ajay: "Atwood seems to have embarked on some sort of bet or challenge to see how many metaphors she can mix in a single paragraph."
FWIW, I thought the three metaphors in the quoted three paragraphs were effective: the Stepford Wives as representation of Generative AI's socio-economic role; the soft ice cream dispenser as representation of Generative AI's output process; and "buy me a coffee" as representation of the evasion of even trivial payments.
Timothy Rowe said,
January 12, 2024 @ 11:58 am
I am not a lawyer, but can't see how authors can win this one because of the international aspect. For example, my understanding is that the UK does not have a "fair use" exemption – at least not in the way the US has. But I can't see how we could stop US companies training their AI in the USA in a way that doesn't comply with UK law, or stop them providing the resulting service to the UK, any more than we can force overseas call centres to comply with UK employment law. If the AI companies lose in the USA, they'll just shift operations to somewhere with looser copyright law.
Aardvark Cheeselog said,
January 12, 2024 @ 5:46 pm
Personally I would love to see the ship of generative AI wrecked on the rocks of IP protection. I mean, if we have to have those rocks anyway they certainly ought to wreck that ship.
I sorely miss the upvote button for ajay's response.
Philip Taylor said,
January 13, 2024 @ 11:08 am
Re the mixed metaphors, can we be certain that Murdered by my replica was actually written by Margaret Atwood and not by a bot which had been given the command—"Write an article by Margaret Atwood lamenting the ease with which her work can now be generated by AI"? After all, if one were to do so, the bot would surely glurp forth 500+ words, like soft ice cream spiraling out of its dispenser, that would be indistinguishable from something she might grind out (but without the typos).
Proof of this hypothesis [Source: ChatGPT]
Brett said,
January 13, 2024 @ 11:14 pm
While I think this is unfair to a lot of authors, I confess that I feel a certain sense of Schadenfreude about Margaret Atwood—who infamously mocked science fiction as "talking squids in outer space"—in particular complaining about this particular bit of technological dystopia.
@Seth: While Google nominally won that suit, the issues were never really resolved, and Google mooted the issue by basically abandoning the Google Books project as a way of making works freely available online. Much more recently, the Internet Archive, which had persisted in making books available through an online lending system, lost a major lawsuit brought by publishers—although they are appealing.
Seth said,
January 17, 2024 @ 7:11 pm
@Brett – The key aspect here of the Google case was the finding of "transformative use", and hence fair use of the copyright material. The Internet Archive case is different, since they are distributing the whole work itself. I think AI applications have an even greater claim to being "transformative use". The bad argument often seen in discussions appears to be that just using copyrighted works without permission is forbidden, full stop. And then many writers seem to denounce the supposed ignorant AI'ers for not realizing this, and fantasize about them being punished.
ajay said,
January 18, 2024 @ 8:44 am
I am particularly fond of Atwood's idea of a cow that produces soft-serve ice cream. I certainly wouldn't kill that cow!