Language Log

Replication of failure to replicate

July 19, 2025 @ 6:59 am · Filed by Mark Liberman under Linguistics in the comics

Mouseover title: "Maybe encouraging the publication of null results isn't enough–maybe we need a journal devoted to publishing results the study authors find personally annoying."

Actually, there's a long history of scientific and scholarly publications based on personal annoyance — my favorite is the 1955-1961 back-and-forth between Herb Simon and Benoit Mandelbrot, discussed in "The long tail of religious studies?", 8/5/2010. And I have to confess that an occasional bit of annoyance has motivated some LLOG posts.

Anyhow, there's been some progress in relevant attitudes at journals, scientific and technical societies, and funders, towards promoting (and even requiring) the replication-friendly open publication of data, code, etc. — though there's still a long way to go…

A few relevant past posts:

"Open Data and Reproducible Research: Blurring the Boundaries between Research and Publication", Berlin 6 Open Access Conference (11/12/2008)
"Human Language Technologies in the United States: Reflections 1966-2008", MYL Berlin 6 slides, 11/12/2008
"Reproducible research", 11/13/2008
"Reproducible Science at AAAS 2011", 2/18/2011
"Replication Rumble", 3/17/2012
"Textual narcissism", 7/13/2012
"Textual narcissism, replication 2", 7/14/2012
"Literate programming and reproducible research", 2/22/2014
"Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results", NRC Workshop, 2/26/2015
"Reliability", 2/28/2015
"Replicability vs. reproducibility — or is it the other way around?", 10/31/2015
"Replicate vs. reproduce (or vice versa?)", 2/15/2018

Update — We should note that publishing open data and code is only one step towards a solution. In honest and intelligent research, there are still the problems of parameter choices, analysis method choices, and uncontrolled co-variates. And across the spectrum of motivated, biased, and less honest research, those problems get worse.

Still, access to data and code makes it easier to detect and fix such problems. And shared data, collected and distributed by reliable sources, eliminates or at least reduces some of the concerns.


10 Comments »

JPL said,

July 21, 2025 @ 1:50 am

Maybe the fact that up till now there are no comments on this one is because the first panel was supposed to say, "… found that no major scientific results could be reproduced".
Christopher Chiesa said,

July 21, 2025 @ 6:48 am

I think the biggest problem is that a lot of people simply fake their data. I was working on a Master's degree one time and getting all worked up about not being able to finish because I had nothing to write about in a thesis, and someone at work who shall remain nameless told me — and this is more or less a direct quote — "Just fake the data, that's what I did for my PhD." I was appalled and horrified, but apparently this is quite common. Oh, and I never did get my Master's because I never did finish a thesis. I wasn't willing to fake it.
MiniFi said,

July 21, 2025 @ 7:50 am

When I was researching for my thesis, one of the most memorable papers I read put sensors on a bridge, measured for a year or something, and then concluded that they couldn't find any correlation over that time. I admired that boldness.

My paper did find correlating data, but my professor recommended an analysis method that I had to pull from a very complicated paper from China. I think I did it right but it was way over my head so I have no idea.
Jonathan said,

July 21, 2025 @ 8:12 am

I remember when I was first in college and heard about peer reviewed journals. I initially thought they actually repeated the studies others did and ensured they were replicable before they were published! How naive I was!
S. said,

July 21, 2025 @ 8:40 am

A big issue with publications is that jobs now require large numbers of publications, which leads to a decline in quality. The resulting flood of publications also means people read fewer of them beyond the abstract, leaving them to trust the results blindly. And there is no incentive to validate results after publication.

I work for a government agency that stands by its fundamental science practice, but I frequently see people pushing through publications just so they can put the citation on their research science records and then request a promotion. Meanwhile the publication is not reproducible, but either no one attempts to reproduce it, or if someone points out that it's not reproducible, they are told they're just not doing it correctly, with no help on how to reproduce it.

My spouse, who works as a university project scientist, is frequently pressured to fake results, especially when doing an experiment based on a publication by a previous member of the lab. In short, they run the experiment and can't reproduce the result, and instead of documenting that the previous publication is flawed, my spouse is pressured by the senior full professor to just claim it works and move on. Of course the full professor does not say that directly; instead they phrase it in a way that legally is not pushing you to fake results, but insinuates it, so that you either fake the results or look for a new job.

For myself personally, I have focused on less frequent, high quality research publications, and I often joke that I would make twice as much money if I just diluted my quality and, for every publication, squeezed out three simpler ones on the same topic.

Unfortunately this is the mindset of much of the research community: ultimately being forced to publish or perish, and to publish to improve your perceived reliability as a scientist, even though those with lots of low quality publications are typically less reliable.

After all, most people skim through a CV, and if they see a thousand publications they think "wow, this person is a genius," while if they only see a couple dozen they think this person is new in their career.
Connie said,

July 21, 2025 @ 10:45 am

Wow… Thoroughly disturbed. Truly didn't expect this problem to be so widespread. And we are training AI on this data!!!
BioProf said,

July 21, 2025 @ 3:54 pm

It’s just amazing how – since you all think that all the data are faked and the analysis arbitrary – all the drugs currently saving your lives were discovered, improved, screened for safety, and brought to market.
Philip Taylor said,

July 22, 2025 @ 4:26 am

Drugs such as thalidomide, Prof? Or any of those listed at https://en.wikipedia.org/wiki/List_of_withdrawn_drugs? Not to mention (from personal experience) the most unpleasant side effects of amiodarone.
Mark Liberman said,

July 22, 2025 @ 7:50 am

@BioProf:

The motivations behind arguments both pro and con the validity of scientific results can obviously be good, neutral, or bad, just as the arguments themselves can be right or neutral or wrong. For a current claim about bad motives, see David Michaels and Wendy Wagner, "Trump's 'Gold Standard' for Science Manufactures Doubt", The Atlantic 7/20/2025 ("By emphasizing scientific uncertainty above other values, political appointees can block any regulatory action they want to.")

That doesn't mean the "Replication Crisis" wasn't (or isn't) real, or that all peer-reviewed publications should be taken as truth.
Timely Jaguar said,

July 22, 2025 @ 10:57 am

So … unfortunately lots of drug trial data is faked by excluding data from those who suffered adverse effects. It's not actually that rare for drug studies to intentionally understate adverse effect data, given the economic incentives typically involved. Billions are spent and billions are intended to be made back, so given those incentives it would be surprising if data were never faked, and credulous to believe it doesn't happen. Just as it's credulous to believe the US government didn't/doesn't fund black projects with drug money (COINTELPRO), didn't intentionally torture thousands in an attempt to develop mind control protocols (MK Ultra), and doesn't record all communications (Snowden revelations). We definitely haven't covertly toppled governments just for the economic benefit of single companies either…
