Language Log

"Reliability is confused with truth"

June 26, 2021 @ 6:17 am · Filed by Mark Liberman under Clinical applications

Laurent Mottron, "A radical change in our autism research strategy is needed: Back to prototypes", Autism Research 6/2/2021:

ABSTRACT: The evolution of autism diagnosis, from its discovery to its current delineation using standardized instruments, has been paralleled by a steady increase in its prevalence and heterogeneity. In clinical settings, the diagnosis of autism is now too vague to specify the type of support required by the concerned individuals. In research, the inclusion of individuals categorically defined by over-inclusive, polythetic criteria in autism cohorts results in a population whose heterogeneity runs contrary to the advancement of scientific progress. Investigating individuals sharing only a trivial resemblance produces a large-scale type-2 error (not finding differences between autistic and dominant population) rather than detecting mechanistic differences to explain their phenotypic divergences. The dimensional approach of autism proposed to cure the disease of its categorical diagnosis is plagued by the arbitrariness of the dimensions under study. Here, we argue that an emphasis on the reliability rather than specificity of diagnostic criteria and the misuse of diagnostic instruments, which ignore the recognition of a prototype, leads to confound autism with the entire range of neurodevelopmental conditions and personality variants. We propose centering research on cohorts in which individuals are selected based on their expert judged prototypicality to advance the theoretical and practical pervasive issues pertaining to autism diagnostic thresholds. Reversing the current research strategy by giving more weight to specificity than reliability should increase our ability to discover the mechanisms of autism.

I'm glad to see these issues getting more prominent attention — for some background, see "Translating 'phenotypically diverse'", 5/12/2020.

Mottron's article is the start of a conversation in the same journal issue:

David Amaral, "Introduction to commentary by Laurent Mottron and responses".
Laurent Mottron, "A radical change in our autism research strategy is needed: Back to prototypes".
John Constantino, "Response to “A Radical Change in Our Autism Research Strategy is Needed: Back to Prototypes” by Mottron et al.".
Michael Lombardo, "Prototyping as subtyping strategy for studying heterogeneity in autism".
Christopher Gillberg, "Response to Mottron".
Laurent Mottron, "Progress in autism research requires several recognition-definition-investigation cycles".

The articles are all worth reading, but it seems to me that there's something missing from all of them, namely the effects of small non-shared datasets. As I wrote last year about the study of behaviorally-defined conditions in general:

The relatively small amount of available data, and its relatively poor quality in most respects, is a big problem. It's like trying to analyze a language based on a sample of a few hundred sentences. Clinical research is several decades behind the Big Data curve, partly due to valid concerns for privacy and confidentiality, but also simply for cultural reasons, especially researchers' possessive attitude about "their" data. Finding ways to solve this problem is a key issue — maybe THE key issue.

Mottron addresses this issue, but argues the opposite conclusion:

The dogma for the diagnosis of autism is the use of validated and standardized instruments that unify the operationalization of DSM criteria and reduce the discrepancy between individual judgments. We suspect such standardization of diagnostic procedures to be largely responsible for the plateauing of autism research, by adding artifactual or criteria- or instrument-based heterogeneity to the natural variability of autistic presentation due to sex, age and outcome. The diagnosis of autism is obtained using these instruments when reaching a threshold summary score by adding individual item scores (Randall et al., 2018). Their cut-off threshold scores are determined by a specificity-sensitivity trade-off, expert agreement long ago being their reference. Multiple warnings, especially by C. Lord, that they should not be used alone and without a clinical judgment have been essentially abolished by their commercial presentation as diagnostic instruments. However, we now know that such instruments are over-inclusive (Molloy et al., 2011), influenced by nonspecific dimensions (Fombonne et al., 2020; Havdahl et al., 2016), and vulnerable to large-scale temporal evolution (Arvidsson et al., 2018). Despite such warnings, most research articles use them as an entry point without further refinement. Autism in the clinical and research world of today is what is measured by the ADI-R and ADOS-G and reliability is confused with truth.

A conviction shared by the scientific community in autism is that the first research on small samples biased the results in favor of their initial hypotheses, whereas studies on a large N, with high standards, brought the previously found results into a more just light. This belief is consistent with the belief that meta-analyses provide us with a safer message than individual studies. However, there are undeniable examples (e.g., in intervention: Pickles et al., 2016) in which a single study is better than a thousand studies with lower standards (Dawson & Fletcher-Watson, 2020). Moreover, in the current state of the definition of the autism spectrum, the primacy attributed to the size of the sample over the resemblance of the individuals who compose it creates a level of noise that increases dramatically with the size of the sample. The avoidance of the type-1 risk associated with small samples must be balanced against the type-2 risk associated with large, heterogeneous samples. The phenomenon of Simpson's paradox (Pearl & Mackenzie, 2018) similarly describes the excessive weight given to outliers in a small sample and that of a diverging subgroup within a large sample. If the reliance on a large N is at the cost of an uncontrolled increase in heterogeneity, the gain obtained by increasing the N will be more than offset by the loss of information resulting from the noise of heterogeneity.

I agree with Mottron's criticism of "summary scores of polythetic criteria". But large and appropriate samples of complex phenomena, properly analyzed, can allow us to decrease rather than increase the "noise of heterogeneity", especially if the data is shared so that others can discover and correct analytic mistakes.
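To make the trade-off concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration, not taken from Mottron's article or from any real autism data: the 20% "prototypical" fraction, the one-standard-deviation effect, the cohort size, and the function names `simulate_cohort` and `cohens_d`. It also assumes that the subgroup label is available to the analyst, which is exactly the hard part in practice.

```python
import random
import statistics


def simulate_cohort(n, prototypical_fraction, effect, rng):
    """Simulate n diagnosed individuals (all values are hypothetical).

    Only the 'prototypical' individuals are shifted by `effect`
    (in control-group SD units); everyone else looks like a control.
    """
    scores, flags = [], []
    for _ in range(n):
        is_proto = rng.random() < prototypical_fraction
        scores.append(rng.gauss(effect if is_proto else 0.0, 1.0))
        flags.append(is_proto)
    return scores, flags


def cohens_d(cases, controls):
    """Standardized mean difference, pooling the two sample variances equally."""
    pooled_sd = ((statistics.variance(cases) + statistics.variance(controls)) / 2) ** 0.5
    return (statistics.mean(cases) - statistics.mean(controls)) / pooled_sd


rng = random.Random(0)  # fixed seed so the run is repeatable
controls = [rng.gauss(0.0, 1.0) for _ in range(5000)]
scores, flags = simulate_cohort(5000, prototypical_fraction=0.2, effect=1.0, rng=rng)

# Pooled analysis: every diagnosed individual is treated as the same kind of case.
pooled_d = cohens_d(scores, controls)

# Stratified analysis: the same large sample, restricted to the labelled subgroup.
proto_d = cohens_d([s for s, f in zip(scores, flags) if f], controls)

print(f"pooled effect size:     {pooled_d:.2f}")
print(f"prototypical subgroup:  {proto_d:.2f}")
```

Under these assumptions the pooled effect is diluted to roughly a fifth of the subgroup effect, which is Mottron's worry about heterogeneous cohorts. But the stratified estimate comes out of the same large sample, so the size of the sample is not what causes the dilution. The cause is an analysis that ignores the structure within it.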

10 Comments

Cervantes said,

June 26, 2021 @ 7:53 am

In fact similar issues apply to many neuro-psychiatric diagnoses. There is no reason to believe that the diagnostic categories incorporate degrees or manifestations of a single underlying phenomenon. The human brain is sufficiently complicated that a variation that someone considers worthy of a diagnosis (I don't want to say pathological because even that is frequently controversial, certainly in the case of autism) could in fact be sui generis. They speak of an autism "spectrum" as though variation is just a matter of degree — it's all electromagnetic radiation, just different wavelengths — but there is in fact no evidence for that. DSM categories are labels of convenience, for purposes of billing and legitimizing prescriptions. They don't refer to actually existing entities.

[(myl) Indeed — as I wrote in the linked post:

This discussion deals with "disorders" like autism spectrum disorder, obsessive-compulsive disorder, attention deficit/hyperactivity disorder, bipolar disorder, schizophrenia, and many others, which are what clinical researchers call "phenotypically diverse": in each case, exhibiting some subset of a wide range of symptoms. Medical culture has always assumed that this diverse array of behavioral manifestations must somehow be grouped into a hierarchy of natural kinds, like species of mushrooms, even if it takes experience and expertise to recognize which box a particular specimen belongs in. And modern medical insurance amplifies this prejudice. But as the cited Nature article notes, this boxology doesn't really work very well in the area of behaviorally-defined mental disorders.

A friend of mine jokes that in this domain, "phenotypically diverse" is the Greek translation of "we have no fucking clue".

]

bks said,

June 26, 2021 @ 7:58 am

And on the other side of the looking glass, who, exactly, is neurotypical?

AntC said,

June 26, 2021 @ 8:57 am

I'm in the IT industry. There seem to be squads of programmers/technicians who self-diagnose themselves as 'Autism Spectrum', whilst immediately rushing to add they're HFA — High-Functioning Autism.

who, exactly, is neurotypical?

Indeed. Whilst IT is relatively recent, since the Industrial Revolution (and probably well before) we've had 'technicians': printers, weavers, quill-pen sharpeners, billiards-markers, mariners, wheelwrights, … a Dickens of a lot of specialist trades. People with deep knowledge in a narrow area, with poor skills at interacting outside their field. I daresay there are even language nerds. (Peter Ustinov characterised as: fluent in 7 languages, but unable to say anything interesting in any of them.)

In IT, this self-diagnosis seems to justify: folk with deep knowledge, being paid to spend their time deepening it, but entirely unable to apply the knowledge to deliver anything useful or value-generating for their employer or for people condemned to battle with software to get a job done. Hence we get the 'nerdview' examples Mark brings up here.

I fear 'autism' and its 'phenotypical diversity' has so much bled into popular culture that it should be a skunked term.

Philip Taylor said,

June 26, 2021 @ 9:43 am

Ant — Is there any a priori reason (or even any a posteriori reason) to believe that printers, weavers, etc., have (or had) poor skills at interacting outside [of] their field ?

J.W. Brewer said,

June 26, 2021 @ 9:51 am

I think something like this is a recurrent issue in all sorts of areas of research, where the thing that's easiest to measure in an objective/quantified way with your existing instruments and datasets is not actually the thing you claim to be trying to study but something else that has some rough statistical correlation with your supposed object of study, which you haven't yet figured out how to measure more directly. Trying to figure out how to measure something that's at least a less-rough and better-fitting proxy for your ultimate object of interest may be a lot harder than publishing a bunch of interesting-sounding findings about your current crude proxy.

mg said,

June 26, 2021 @ 12:27 pm

Having worked in psychiatry in the past, I'm very aware of these issues as well as the difficulties in surmounting them. Psychiatric phenomena don't have any easy ways of differentiating them – no bacteria or misshaped organs or out-of-range blood tests. So researchers and clinicians are left with a sort of "successive approximations" approach to classification, in hopes that coming up with good classification paradigms can help guide better understanding and treatment. The best researchers I worked with were very aware of these issues.

No one has come up with a solution to this problem. The best people have been able to do is come up with subtypes that are less heterogeneous than the large umbrella diagnoses (dysthymia vs. major depression, ADHD (itself misnamed) with hyperactivity, inattention, or combined type, etc.). Stumbling in the dark is never easy, even with flashlights.

Rose Eneri said,

June 27, 2021 @ 9:59 am

Isn't every person somewhere on every spectrum? Where along any spectrum do disorders begin? At what point do we declare a "disorder" to exist? At what point is treatment warranted? I believe that many people with a diagnosis of autism do not consider themselves to have a disorder and do not want any kind of treatment.

The impetus for expanding the number of persons diagnosed with autism, by lowering the threshold for diagnosis, comes from two sources: parents and mental-health practitioners. Parents want their children "on the spectrum" so the children will qualify for special treatment and benefits, and the mental-health care industry wants to expand its customer base.

stephen said,

June 29, 2021 @ 8:20 pm

People with many different problems were regarded as "retarded". Sometimes the problem was other people being…unkind, or whatever.
Then "retarded" became a schoolyard insult, so it couldn't be used any more. So we say "autistic" now. How do kids in schoolyards use "autistic" now? Is "autistic" going to become a schoolyard insult, so we have to use a different term instead?
Sometimes a person might *seem* autistic because *we* don't know how to handle them.
Like the stereotype of a Chinese person being "inscrutable" because of the non-Chinese person's ignorance of Chinese people.

I was wondering if that stereotype of Chinese people being inscrutable is considered offensive, or how the Chinese feel about that stereotype?

Rodger C said,

June 30, 2021 @ 11:54 am

"Autistic" is already a schoolyard insult.

번하드 said,

July 4, 2021 @ 5:49 pm

@Rose Eneri:
I see one more group with a stake in inflating those numbers: anti-vaxxers.
