Are LLMs writing PubMed articles?
Kyle Orland, "The telltale words that could identify generative AI text", ars technica 7/1/2024
In a pre-print paper posted earlier this month, four researchers from Germany's University of Tübingen and Northwestern University said they were inspired by studies that measured the impact of the COVID-19 pandemic by looking at excess deaths compared to the recent past. By taking a similar look at "excess word usage" after LLM writing tools became widely available in late 2022, the researchers found that "the appearance of LLMs led to an abrupt increase in the frequency of certain style words" that was "unprecedented in both quality and quantity."
To measure these vocabulary changes, the researchers analyzed 14 million paper abstracts published on PubMed between 2010 and 2024, tracking each word's relative frequency year by year. They then compared the expected frequency of those words (based on the pre-2023 trendline) to the actual frequency of those words in abstracts from 2023 and 2024, when LLMs were in widespread use.
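The expected-versus-actual comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: it assumes a word's yearly frequency is the share of abstracts containing it, projects the pre-2023 trend forward with an ordinary least-squares straight line (the paper's exact projection rule may differ), and uses made-up numbers. The function names `expected_frequency` and `excess_usage` are hypothetical.

```python
from typing import Dict


def expected_frequency(history: Dict[int, float], target_year: int) -> float:
    """Project a word's pre-LLM frequency trend forward to target_year.

    Assumption: the trend is an ordinary least-squares straight line fitted
    to the supplied years; the paper's exact projection rule may differ.
    """
    years = sorted(history)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(history[y] for y in years) / n
    sxx = sum((y - mean_x) ** 2 for y in years)
    sxy = sum((y - mean_x) * (history[y] - mean_y) for y in years)
    slope = sxy / sxx if sxx else 0.0  # a single year gives a flat trend
    intercept = mean_y - slope * mean_x
    # A share of abstracts cannot be negative, so clamp the projection at 0.
    return max(intercept + slope * target_year, 0.0)


def excess_usage(history: Dict[int, float], target_year: int, observed: float) -> float:
    """Observed frequency minus the trend-based expectation."""
    return observed - expected_frequency(history, target_year)


if __name__ == "__main__":
    # Hypothetical share of abstracts containing some word, 2019-2022.
    pre_llm = {2019: 0.010, 2020: 0.011, 2021: 0.012, 2022: 0.013}
    exp_2024 = expected_frequency(pre_llm, 2024)
    print(f"expected in 2024: {exp_2024:.4f}")
    print(f"excess if observed share is 0.030: {excess_usage(pre_llm, 2024, 0.030):.4f}")
```

A straight-line fit is the simplest way to turn "the pre-2023 trendline" into a concrete number. Any excess that shows up in 2023 and 2024 is then measured against that projection, not against the raw 2022 value.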
The researchers found a number of words that were extremely uncommon in scientific abstracts before 2023 but surged in popularity after LLMs were introduced. The word "delves," for instance, shows up in 25 times as many 2024 papers as the pre-LLM trend would predict, while words like "showcasing" and "underscores" appear about nine times as often as expected. Some words that were already common also became notably more common in post-LLM abstracts: the frequency of "potential" increased by 4.1 percentage points, "findings" by 2.7 percentage points, and "crucial" by 2.6 percentage points.
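The figures in the paragraph above come in two kinds: "25 times as many" is a ratio of observed to expected frequency, while "4.1 percentage points" is a difference between them. A ratio makes a jump in a rare word stand out, whereas a percentage-point gap shows how far a common word moved. The sketch below computes both for two hypothetical words with made-up frequencies. The `WordStats` class and the example words are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class WordStats:
    word: str
    expected: float  # projected share of abstracts containing the word (0..1)
    observed: float  # actual share of abstracts containing the word (0..1)

    def excess_ratio(self) -> float:
        """How many times more often the word appears than projected."""
        return self.observed / self.expected if self.expected > 0 else float("inf")

    def excess_gap_pp(self) -> float:
        """Absolute increase over the projection, in percentage points."""
        return (self.observed - self.expected) * 100


if __name__ == "__main__":
    # Hypothetical words and frequencies, chosen only to contrast the two measures.
    words = [
        WordStats("hypothetical_rare", expected=0.0005, observed=0.005),
        WordStats("hypothetical_common", expected=0.30, observed=0.33),
    ]
    for w in words:
        print(f"{w.word}: ratio {w.excess_ratio():.1f}x, gap {w.excess_gap_pp():.2f} pp")
```

Here the rare word has a tenfold ratio but a gap of less than half a percentage point, while the common word has a ratio of only about 1.1 but a gap of three points. That is why a single measure would miss one of the two patterns.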