Language Log

Ouch

April 20, 2019 @ 8:00 am · Filed by Mark Liberman under Computational linguistics

Eliza Strickland, "How IBM Watson Overpromised and Underdelivered on AI Health Care", IEEE Spectrum 4/2/2019 (subhead: "After its triumph on Jeopardy!, IBM’s AI seemed poised to revolutionize medicine. Doctors are still waiting"):

In 2014, IBM opened swanky new headquarters for its artificial intelligence division, known as IBM Watson. Inside the glassy tower in lower Manhattan, IBMers can bring prospective clients and visiting journalists into the “immersion room,” which resembles a miniature planetarium. There, in the darkened space, visitors sit on swiveling stools while fancy graphics flash around the curved screens covering the walls. It’s the closest you can get, IBMers sometimes say, to being inside Watson’s electronic brain.

One dazzling 2014 demonstration of Watson’s brainpower showed off its potential to transform medicine using AI—a goal that IBM CEO Virginia Rometty often calls the company’s moon shot. In the demo, Watson took a bizarre collection of patient symptoms and came up with a list of possible diagnoses, each annotated with Watson’s confidence level and links to supporting medical literature.

Within the comfortable confines of the dome, Watson never failed to impress: Its memory banks held knowledge of every rare disease, and its processors weren’t susceptible to the kind of cognitive bias that can throw off doctors. It could crack a tough case in mere seconds. If Watson could bring that instant expertise to hospitals and clinics all around the world, it seemed possible that the AI could reduce diagnosis errors, optimize treatments, and even alleviate doctor shortages—not by replacing doctors but by helping them do their jobs faster and better.

Outside of corporate headquarters, however, IBM has discovered that its powerful technology is no match for the messy reality of today’s health care system.

And it gets worse, from a PR point of view. The teaser on the magazine cover reads "IBM's MEDICAL AI DEBACLE: Where are the Watsons?" Inside, the article opens with this two-page spread:

The article features a quote from Yoshua Bengio about text analysis:

In many attempted applications, Watson’s NLP struggled to make sense of medical text—as have many other AI systems. “We’re doing incredibly better with NLP than we were five years ago, yet we’re still incredibly worse than humans,” says Yoshua Bengio, a professor of computer science at the University of Montreal and a leading AI researcher. In medical text documents, Bengio says, AI systems can’t understand ambiguity and don’t pick up on subtle clues that a human doctor would notice. Bengio says current NLP technology can help the health care system: “It doesn’t have to have full understanding to do something incredibly useful,” he says. But no AI built so far can match a human doctor’s comprehension and insight. “No, we’re not there,” he says.

I agree that "we're not there", though I have my doubts about the "comprehension and insight" of many human doctors. But I think that the article partly misses the point.

From "subtle clues" in several different sorts of interactions with various sorts of IBMers over the past few years, my conclusion is that most IBM researchers were aware of the technology's capabilities and limitations, but IBM management was hot to monetize their Jeopardy! PR triumph, without really understanding it.

One example: As a result of a 2016 NAACL paper "Exploring Autism Spectrum Disorders Using HLT", I was contacted by a Watson sales rep specializing in autism. (Apparently they had assigned every possible disorder to one or more sales reps — at least they had an international working group on Autism. Which may have done some great things, but this interaction is all I know about it.)

The sales rep tried to persuade me that everything would be much better if we only purchased Watson technology for analyzing the diagnostic interviews involved. As far as either of us knew, no one had ever used a combination of ASR and transcript understanding successfully on any clinical interview recordings, much less to deal with this particular domain. I happened to know that IBM's speech technology at that time was not capable of diarizing ADOS recordings, because I'd tried it. (Not that anyone else's technology was any better.) And the sales rep couldn't give me any plausible story about how we would use Watson's manifold other capabilities to do things we weren't doing already, or couldn't write our own software to do, although he had a well-rehearsed spiel about what those capabilities were.

So from these and other "subtle clues", I inferred that Medical Watson was yet another example of premature commercialization, where the "suits" push misunderstood technology beyond its limits. As in the earlier examples that I'm familiar with, I expect that the people responsible will fail upwards.

This pattern is especially pernicious in the biomedical area — there are real opportunities for important advances, but research is hampered by the lack of access to crucial data. And premature commercialization makes that problem much, much worse.

6 Comments

BillR said,

April 20, 2019 @ 9:51 am

A typical Dilbert 4-panel scenario. Or maybe an 8-panel Sunday version.

MikeA said,

April 20, 2019 @ 11:09 am

It has been pretty common "chatter" in the nerdosphere for a few years that IBM seems to be slapping the "Watson" label on everything in sight, somewhat like "gluten free" grapefruit. To a kid with a hammer…

Now, when can I get a Watson(tm) film ribbon for my Selectric?

Kristian said,

April 21, 2019 @ 4:43 am

I remember wondering at the time of the Jeopardy win how exactly winning at Jeopardy was supposed to translate into "revolutionizing health care". Taking a list of symptoms and coming up with a list of differential diagnoses is something almost anyone can do with Google or even a medical textbook. It's not that hard to come up with potential applications for AI in medicine but I could never really see how Watson (apparently an advance mostly in NLP) was applicable to medicine in particular.

I guess that it is not only popular and lucrative to promise advances in medicine but also that medicine is mysterious enough to people that it is hard to evaluate these claims. (Also engineering is mysterious to doctors, so they can't evaluate them either.)

eub said,

April 22, 2019 @ 12:27 am

If anyone wasn't yet around for the 80s, look up "expert systems" for medicine. Similar kinds of hype around different technology (rule-based decision trees for diagnosis basically).

KeithB said,

April 22, 2019 @ 8:40 am

Kristian:
If diagnosis is so easy, why did House last for 8 seasons? 8^)

Kristian said,

April 22, 2019 @ 10:04 am

Diagnosis isn't easy, but looking up diseases that correspond to particular symptoms is. And eight seasons of House probably added significantly to the mystique of medicine in the public's eyes.
