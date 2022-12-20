

As I observed in "Alexa down, ChatGPT up?" (12/8/2022), there's reason to fear that LLMs ("Large Language Models") like ChatGPT will force major changes in writing education, by offering a cheap and easy way to generate assigned essays. A small sample of the extensive published discussion:

Stephen Marche, "The College Essay is Dead", The Atlantic 12/6/2022

Daniel Lametti, "A.I. Could Be Great for College Essays", Slate 12/7/2022

Daniel Herman, "ChatGPT will end High School English", The Atlantic 12/9/2022

Beth McMurtrie, "AI and the Future of Undergraduate Writing: Teaching experts are concerned, but not for the reasons you think", The Chronicle of Higher Education 12/13/2022

Of course, various other forms of cheating have been common for hundreds of years, starting with simple plagiarism and ghost-written submissions. The internet has made it easier to find texts to copy or ghostwriters to hire — but modern technology has also brought us plagiarism-detection systems, which catch at least the simplest cases. Will we see effective LLM-detection software?

There are already (purported) examples — thus Sun Kim, "How to Detect OpenAI's ChatGPT Output", Medium 12/11/2022:

The output (e.g., essays) provided by ChatGPT is so good, if I was a student, I would be using ChatGPT to complete most of my school assignment with minor revisions.

[…]

The good news is that there is a tool from Hugging Face called GPT-2 Output Detector Demo (https://huggingface.co/openai-detector) that was developed in conjunction with OpenAI (see details here => https://huggingface.co/roberta-base-openai-detector).

Kim tests that detector on a (short) ChatGPT-generated "essay", with impressive results:

But one example doesn't tell us much — and the "GPT-2" part of the title should give us pause, since ChatGPT is based on a later and bigger system, GPT-3.5.

Here's one more test case, where the system fails badly. A few days ago, I came across a paper with the title "Notch-dependent cooperativity between myeloid lineages promotes Langerhans cell histiocytosis pathology", and so I asked ChatGPT

What is "Notch-dependent cooperativity between myeloid lineages"?

…and got the answer:

Notch signaling is a pathway that plays a role in the development and differentiation of many cell types, including myeloid cells. Myeloid cells are a type of immune cell that includes macrophages, neutrophils, and monocytes, as well as some types of dendritic cells and megakaryocytes.

Notch-dependent cooperativity between myeloid lineages refers to the process by which different myeloid cell types interact and cooperate with each other through the Notch signaling pathway. This can involve the activation of Notch receptors on one cell type by ligands on another cell type, or the production of signaling molecules by one cell type that affect the differentiation or function of another cell type.

The Notch signaling pathway is important for the development and function of the immune system, and defects in the pathway can lead to immune disorders or cancer. Understanding how myeloid cells interact and cooperate through the Notch pathway can help researchers understand how the immune system functions and how to treat or prevent immune-related diseases.

Result from the detector? Not good:

Perhaps we'll see another version, trained with the latest LLM input, that works better — though at best that would lead to a complex creation/detection arms race.

Some other ideas can be found in Melissa Heikkilä, "How to spot AI-generated text", MIT Technology Review 12/19/2022, which focuses not on AI ghost-writing for students, but rather on the potential for broader misinformation pollution:

In an already polarized, politically fraught online world, these AI tools could further distort the information we consume. If they are rolled out into the real world in real products, the consequences could be devastating.

We’re in desperate need of ways to differentiate between human- and AI-written text in order to counter potential misuses of the technology, says Irene Solaiman, policy director at AI startup Hugging Face, who used to be an AI researcher at OpenAI and studied AI output detection for the release of GPT-3’s predecessor GPT-2.

[…]

There are various ways researchers have tried to detect AI-generated text. One common method is to use software to analyze different features of the text—for example, how fluently it reads, how frequently certain words appear, or whether there are patterns in punctuation or sentence length.

“If you have enough text, a really easy cue is the word ‘the’ occurs too many times,” says Daphne Ippolito, a senior research scientist at Google Brain, the company’s research unit for deep learning.

Because large language models work by predicting the next word in a sentence, they are more likely to use common words like “the,” “it,” or “is” instead of wonky, rare words. This is exactly the kind of text that automated detector systems are good at picking up, Ippolito and a team of researchers at Google found in research they published in 2019.

But Ippolito’s study also showed something interesting: the human participants tended to think this kind of “clean” text looked better and contained fewer mistakes, and thus that it must have been written by a person.

“A typo in the text is actually a really good indicator that it was human written,” she adds.

Those seem like good ideas. But if the simple "the"-frequency cue ever worked reliably, it doesn't now.

For example, the ChatGPT output from Ben Jacobs, "That AI Chatbot Wrote a Pretty Decent New York Article" (New York Magazine 12/5/2022), does have 20 instances of "the" in 261 words, or 7.7%, which is towards the high side of the frequencies in English text. But Ben Jacobs's most recent real New York Magazine article ("GOP Leadership Fight in Congress Shows a Party of Warlords", 11/17/2022) has 22 instances of "the" in 279 words, or 7.9%.

And ChatGPT's answer to my question about "Notch-dependent cooperativity between myeloid lineages" has 12 instances of "the" in 162 words, or 7.3%. But Wikipedia's section on Notch signaling pathway mechanisms has 24 instances of "the" in 280 words, or 8.6%.
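The percentages above are simple ratios: occurrences of "the" divided by the total number of words. A minimal Python sketch of that calculation is given here. The function name `the_frequency`, the whitespace tokenization, and the punctuation-stripping rule are all assumptions made for illustration, and not necessarily the method used to produce the figures quoted in this post.

```python
import string

# Characters trimmed from the ends of each whitespace-delimited chunk.
# Assumption: ASCII punctuation plus curly quotes and en/em dashes.
_TRIM = string.punctuation + "\u201c\u201d\u2018\u2019\u2013\u2014"


def the_frequency(text):
    """Return (count of 'the', total words, percentage of 'the')."""
    # Split on whitespace, so a hyphenated form such as
    # "Notch-dependent" counts as a single word.
    chunks = text.split()
    # Trim punctuation from both ends and lowercase each chunk.
    tokens = [c.strip(_TRIM).lower() for c in chunks]
    # Drop chunks that were nothing but punctuation (e.g. a lone dash).
    tokens = [t for t in tokens if t]
    n_the = sum(1 for t in tokens if t == "the")
    pct = 100.0 * n_the / len(tokens) if tokens else 0.0
    return n_the, len(tokens), pct


if __name__ == "__main__":
    sample = "The cat saw the dog."
    n_the, n_words, pct = the_frequency(sample)
    print(f"{n_the} instances of 'the' in {n_words} words, or {pct:.1f}%")
```

Word totals depend on the tokenization choice (how hyphens, numerals, and stray punctuation are treated), so a count produced this way may differ by a word or two from one produced by another tool.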

Of course the 2019 Google paper tests more serious detection methods than simple "the" frequency, and proposes a variety of more sophisticated directions of research, so we should stay tuned and hope for some open-access methods to test.

It'll be interesting to learn how students actually use LLMs in their work — and how writing courses adapt to the situation.

With respect to the issue of disinformation, it seems to me that LLMs are far from the biggest problem. And the publications of most public figures are already ghostwritten anyhow, so there's no (additional) ethical issue there.

For some earlier discussions of authorship and plagiarism, see e.g.

And there's a much longer list of posts on apparently authoritative human-created disinformation — more on that later.

