AI copy editing crap?


"Evolution journal editors resign en masse to protest Elsevier changes", Retraction Watch 12/27/2024:

All but one member of the editorial board of the Journal of Human Evolution (JHE), an Elsevier title, have resigned, saying the “sustained actions of Elsevier are fundamentally incompatible with the ethos of the journal and preclude maintaining the quality and integrity fundamental to JHE’s success.” […]

Among other moves, according to the statement, Elsevier “eliminated support for a copy editor and special issues editor,” which they interpreted as saying “editors should not be paying attention to language, grammar, readability, consistency, or accuracy of proper nomenclature or formatting.” The editors say the publisher “frequently introduces errors during production that were not present in the accepted manuscript”:

"In fall of 2023, for example, without consulting or informing the editors, Elsevier initiated the use of AI during production, creating article proofs devoid of capitalization of all proper nouns (e.g., formally recognized epochs, site names, countries, cities, genera, etc.) as well italics for genera and species. These AI changes reversed the accepted versions of papers that had already been properly formatted by the handling editors. This was highly embarrassing for the journal and resolution took six months and was achieved only through the persistent efforts of the editors. AI processing continues to be used and regularly reformats submitted manuscripts to change meaning and formatting and require extensive author and editor oversight during proof stage."

The whole resignation statement is here.

In the scholarly publishing industry, the fact that human copy-editors are not domain experts means that they sometimes make contextually inappropriate changes that authors and content editors need to fix.

But removing all capital letters and italics? How in the world did Elsevier's AI copy-editing system learn to do that? Maybe it was trained on a large volume of material that had been monocased and de-formatted — which is the right input for learning word-sequence patterns or word frequency statistics, but absolutely the wrong input for learning to copy-edit a scientific journal.
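
To make the speculation concrete, here is a toy sketch of that kind of corpus normalization (purely illustrative; whatever pipeline Elsevier actually runs is not public):

    import re

    def normalize_for_lm(text):
        """Hypothetical training-corpus cleanup: strip italics markup and
        lowercase everything. Adequate for learning word-sequence statistics,
        fatal for learning capitalization and formatting conventions."""
        text = re.sub(r"</?(i|em)>", "", text)  # drop italics tags
        return text.lower()                     # monocase

    before = "The <i>Homo erectus</i> fossils from Dmanisi date to the Early Pleistocene."
    print(normalize_for_lm(before))
    # the homo erectus fossils from dmanisi date to the early pleistocene.

A model trained only on output like that has never seen the distinctions it would later be asked to enforce: the genus, the site name, and the epoch become indistinguishable from ordinary words.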

This is not totally implausible. Among its other businesses, Elsevier sells "Language Editing Services" (traditionally based on human "editors"), where pure word-sequence information might be part of an appropriate path to LLM-based automation. So maybe some ignorant Elsevier middle manager seized an opportunity to re-apply technology across branches of their empire?

The Retraction Watch article notes that "The mass resignation is the 20th such episode since early 2023, according to our records", though this is the first complaint I've seen about the imposition of AI copy-editing.

The business model of publishers like Elsevier has been under attack for years from many directions, as documented in "Unbundling Profile: MIT Libraries" and the associated commentary by Cory Doctorow, "MIT libraries are thriving without Elsevier". (See also this 2015 LLOG post by Eric Baković and Kai von Fintel, about editorial resignations at another Elsevier journal.) As a result, publishers are exploring many legal, technical, and social alternatives, and perhaps imposing this AI copy-crapper was one of them.

5 Comments »

  1. Coby said,

    December 28, 2024 @ 11:15 am

    "[T]he fact that human copy-editors are not domain experts" is not limited to scholarly publishing industry. You find plenty of howlers in novels that knowledgeable editors would have corrected, especially when the novelist writes about a culture that they are not thoroughly familiar with. When I was running a blog (cobylubliner.wordpress.com) I published quite a few posts under the rubric "ling crit" that dealt with this.

  2. MattF said,

    December 28, 2024 @ 1:08 pm

    AI bots have no sensory input— they live in a universe of tokenized text that is separated into sentences. They know about ‘sentences’, and that ‘sentences’ are comprised of ‘words’ but are fuzzy about the properties of ‘words’. Capitalization is something that happens to a word at the start of a sentence but the circumstances that would cause an editor to capitalize a word are outside the bot’s field of view.

  3. Mark Liberman said,

    December 28, 2024 @ 2:02 pm

    @MattF: "AI bots have no sensory input— they live in a universe of tokenized text that is separated into sentences. They know about ‘sentences’, and that ‘sentences’ are comprised of ‘words’ but are fuzzy about the properties of ‘words’. Capitalization is something that happens to a word at the start of a sentence but the circumstances that would cause an editor to capitalize a word are outside the bot’s field of view."

    This is somewhere between untrue and irrelevant. What is in a bot's "field of view" depends on what its training material is like, what the training loss function is, and how its application is managed. A system could certainly be designed to serve as a copy editor for a scientific journal — its performance would be better or worse depending on the nature and extent of its training, and on other aspects of its design. The behavior described in the resignation letters appears to be the result of incompetent management decisions, not an inevitable consequence of the basic nature of LLM-like systems.
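
    A toy illustration of the point (a made-up word-level vocabulary, not any production tokenizer): whether capitalization is in a model's "field of view" is a preprocessing decision, not a law of nature.

        sentences = [
            "Fossils from the Pleistocene",   # case preserved
            "fossils from the pleistocene",   # monocased upstream
        ]

        def build_vocab(texts):
            """Assign every distinct (case-sensitive) token an integer ID."""
            vocab = {}
            for t in texts:
                for tok in t.split():
                    vocab.setdefault(tok, len(vocab))
            return vocab

        print(build_vocab(sentences))
        # {'Fossils': 0, 'from': 1, 'the': 2, 'Pleistocene': 3,
        #  'fossils': 4, 'pleistocene': 5}
        # With case preserved, 'Pleistocene' and 'pleistocene' are distinct
        # tokens whose distributions a model can learn; lowercase the corpus
        # first and that signal is gone before training even starts.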

  4. Jim Breen said,

    December 28, 2024 @ 3:53 pm

    > Elsevier initiated the use of AI during production, creating article proofs devoid of capitalization of all proper nouns […] as well as italics for genera and species.

    That sounds more like crappy software than "AI". (I realize one has to blame AI for everything these days.)

  5. Mark Liberman said,

    December 28, 2024 @ 5:00 pm

    @Jim Breen: “That sounds more like crappy software than "AI".”

    It's certainly crappy software, at least with respect to its purpose in this case — but "AI" can be crappy software, too. Let's not be prejudiced.

    And as I suggested, an AI system of the LLM type might be trained on monocased and de-formatted inputs if its goal were to detect and correct usage errors of the type that non-native writers of English might commit, and for that application it might be non-crappy software.
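
    For instance, here is a crude sketch of a usage check that works identically on monocased, de-formatted text (a toy heuristic, not anything from Elsevier's actual service):

        import re

        def flag_a_before_vowel(text):
            """Flag 'a' before a vowel-initial word, a common usage slip.
            A toy rule; real systems are statistical, but either way the
            check needs word sequence, not capitalization or italics."""
            return re.findall(r"\ba [aeiou]\w*", text)

        print(flag_a_before_vowel("we report a important find from a early pleistocene site"))
        # ['a important', 'a early']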
