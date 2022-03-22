

Guillaume Cabanac, Cyril Labbé & Alexander Magazinov, "'Bosom peril' is not 'breast cancer': How weird computer-generated phrases help researchers find scientific publishing fraud", Bulletin of the Atomic Scientists, 1/13/2022:

In 2020, despite the COVID pandemic, scientists authored 6 million peer-reviewed publications, a 10 percent increase compared to 2019. At first glance this big number seems like a good thing, a positive indicator of science advancing and knowledge spreading. Among these millions of papers, however, are thousands of fabricated articles, many from academics who feel compelled by a publish-or-perish mentality to produce, even if it means cheating. […]

We have been able to spot fraudulent research thanks in large part to one key tell that an article has been artificially manipulated: The nonsensical “tortured phrases” that fraudsters use in place of standard terms to avoid anti-plagiarism software. Our computer system, which we named the Problematic Paper Screener, searches through published science and seeks out tortured phrases in order to find suspect work. While this method works, as AI technology improves, spotting these fakes will likely become harder, raising the risk that more fake science makes it into journals.

As of January 2022, we’ve found tortured phrases in 3,191 peer-reviewed articles published (and counting), including in reputable flagship publications.

See also (by the same authors) "Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals", arXiv.org 7/12/2021.

You can explore their results at the Problematic Paper Screener site — put a word or phrase into the search box and check out the hits.

For example, the most recent hit for "speech" is a book chapter published by Springer Nature in 2021, "Comparative Analysis of GUI-Based Prediction of Parkinson Disease by Speech Using Machine Learning Approach". Here are the 22 "tortured phrases" that their software discovered in that document:

* **Alzheimer's infection** instead of the established _‘Alzheimer's disease’_

* **Parkinson's infection** instead of the established _‘Parkinson's disease’_

* **Parkinson's sickness** instead of the established _‘Parkinson's disease’_

* **R2 esteem** instead of the established _‘R2 value’_

* **arbitrary backwoods** instead of the established _‘random forest’_

* **backing vector machine** instead of the established _‘support vector machine (SVM)’_

* **dimensionality decrease** instead of the established _‘dimensionality reduction’_

* **fake neural** instead of the established _‘artificial neural (network)’_

* **fluffy C implies** instead of the established _‘fuzzy C-means’_

* **gullible Bayes** instead of the established _‘naive Bayes’_

* **head part investigation** instead of the established _‘principal component analysis (PCA)’_

* **help vector machine** instead of the established _‘support vector machine (SVM)’_

* **inertial estimation unit** instead of the established _‘inertial measurement unit’_

* **invulnerable framework** instead of the established _‘immune system’_

* **man-made consciousness** instead of the established _‘artificial intelligence’_

* **mean outright mistake** instead of the established _‘absolute error’_

* **molecule swarm** instead of the established _‘particule swarm’_

* **profound neural organization** instead of the established _‘deep neural network’_

* **square mistake** instead of the established _‘(mean) squared error’_

* **squared blunder** instead of the established _‘(mean) squared error’_

* **fluffy AND fuzzy** instead of the established _‘fuzzy (logics)’_

* **irregular subspace** instead of the established _‘random subspace (caveat: ‘irregular subspace’ is a regular term in anomaly detection and its applications doi:10.1109/TPWRS.2012.2224144)’_
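To make the idea concrete, here is a minimal Python sketch of matching a text against a few of the pairs listed above. It is not the Problematic Paper Screener's actual code: the names `TORTURED_PHRASES` and `find_tortured_phrases` are hypothetical, and case-insensitive whole-phrase matching is only an assumption about how such a lookup might work.

```python
import re

# A handful of the (tortured phrase -> established term) pairs listed above.
# The table name and the matching strategy are illustrative assumptions,
# not the Problematic Paper Screener's actual implementation.
TORTURED_PHRASES = {
    "arbitrary backwoods": "random forest",
    "backing vector machine": "support vector machine",
    "gullible Bayes": "naive Bayes",
    "head part investigation": "principal component analysis",
    "profound neural organization": "deep neural network",
    "squared blunder": "squared error",
}


def find_tortured_phrases(text):
    """Return sorted (offset, tortured phrase, established term) hits."""
    hits = []
    for tortured, established in TORTURED_PHRASES.items():
        # Word boundaries keep a phrase from matching inside a longer word.
        pattern = re.compile(r"\b" + re.escape(tortured) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((match.start(), tortured, established))
    return sorted(hits)


sample = "We compare arbitrary backwoods with gullible Bayes classifiers."
for offset, tortured, established in find_tortured_phrases(sample):
    print(f"{offset}: '{tortured}' (expected '{established}')")
```

A fixed table like this only catches substitutions someone has already catalogued, and, as the "irregular subspace" caveat shows, a hit still needs a human to check whether the phrase is legitimate in context.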

But there are plenty of other substitutions that are apparently not on their list yet. Here's the chapter's abstract:

Parkinson’s ailment is the most predominant neurodegenerative issue influencing in excess of 10 million individuals around the world. There is no single test that can be directed for diagnosing Parkinson’s illness. Due to these troubles, to research an AI way to deal with precisely analyze Parkinson’s, utilizing a given dataset. To forestall this issue in medicinal divisions need to foresee the sickness influenced or not by discovering exactness figuring utilizing AI systems. The point is to research AI-based systems for Parkinson ailment by expectation brings about the best precision with discovering arrangement reports. The examination of a dataset by regulated AI technique (SMLT) to catch a few data resembles variable recognizable proof, uni-variate investigation, bi-variate, and multi-variate examination, missing worth medications and break down the information approval, information cleaning/getting ready and information representation will be done on the whole given dataset. To propose, an AI-based strategy to precisely anticipate the illness by discourse side effect by forecast brings about the type of best exactness and also analyze the presentation of different AI calculations from the given medical clinic dataset with assessment arrangement report, distinguish the outcome shows that GUI with the best exactness with accuracy, Recall, F1 Score explicitness and affectability.

For example, the final phrase "explicitness and affectability" is a tortured substitution for the standard "specificity and sensitivity". In that context, I'm guessing that their use of "accuracy" was a substitution for "precision".

The whole paper is pretty much incomprehensible without diagnosing and inverting those substitutions. The very first sentence is:

Human stride is the procedure of motion accomplished through facilitated appendage development and the controlled removal of the person’s focal point of mass.

I'm not sure what "facilitated appendage development" and "controlled removal" substitute for, but "focal point of mass" is clearly "center of mass". Later in the paper, in addition to the 22 tortured phrases identified by the Problematic Paper Screener, we see "various sclerosis" for "multiple sclerosis", "blunder pace" for "error rate", "Bolster Vector Machines" for "Support Vector Machines", and on and on.
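Inverting the substitutions, once diagnosed, is mechanical. The following Python sketch does that for the pairs mentioned in this post; the names `RESTORATIONS` and `restore_terms` are hypothetical, and the approach (a single case-insensitive regular expression over a hand-built table) is an illustration rather than anything the paper's authors or the Screener are known to use.

```python
import re

# Pairs taken from the examples in this post; the dictionary and function
# names are hypothetical, chosen only for illustration.
RESTORATIONS = {
    "various sclerosis": "multiple sclerosis",
    "blunder pace": "error rate",
    "bolster vector machines": "support vector machines",
    "explicitness": "specificity",
    "affectability": "sensitivity",
}

# One alternation, longest phrases first, so overlapping entries would
# resolve in favor of the longer match.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(RESTORATIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def restore_terms(text):
    """Replace each known tortured phrase with its established term."""
    def swap(match):
        # Keys are lowercase, so normalize the matched text before lookup.
        return RESTORATIONS[match.group(1).lower()]
    return _PATTERN.sub(swap, text)


print(restore_terms("The blunder pace for various sclerosis was low."))
```

Note that this sketch does not preserve capitalization ("Bolster Vector Machines" comes back in lowercase), and it can only undo substitutions that are already in the table.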

It's a shock to me that Springer Nature puts this stuff out — but a few seconds of search show that this esteemed (?) publisher is not an isolated case, with Wiley, IEEE, and Elsevier also among the guilty parties.

The tortured phrase "flag to clamor" (substituting for "signal to noise") gets 94 Google Scholar hits, including this 2017 IEEE publication, whose abstract ends:

Preliminary results shows a quite improvement in compression ratio, mean square blunder and the pinnacle flag to clamor proportion (PSNR).

For readers not from the relevant fields, "PSNR" would normally be "peak signal to noise ratio" rather than "pinnacle flag to clamor proportion"…

