Language Log

Sleepless in Samsung?

March 7, 2019 @ 6:53 am · Filed by Mark Liberman under Computational linguistics

I'm spending a couple of days at the DARPA AI Colloquium (about which more later), and during yesterday's afternoon session I experienced an amusing conjunction of events. Pedro Szekely gave a nice presentation on "Advances in Natural Language Understanding", after which one of the questions from the audience was "Hasn't Google solved all these problems?" Meanwhile, during the session, I got a quasi-spam cell-phone call trying to recruit me for a medical study. Since my (Google Fi) phone was turned off, the call went to voicemail, and Google helpfully offered me a text version as well as an audio version of the message.

The result illustrates one of the key ways that modern technology, including Google's, fails to solve all the problems of natural language understanding.

Here's Google's text version of the voicemail:

Hello, we're contacting you from the University of Pennsylvania's behavioral sleep medicine program about a study for people who have trouble falling asleep or staying asleep. If this is true for you, please visit www.samsung.com. If not, perhaps you could share this information with appropriate friends or family. The website again is w w w. Sleepless in philly.com. If you have any questions, please feel free to contact us by phone at XXX-XXX-XXXX. Thank you for your time.

Since I don't have any problems falling or staying asleep, I wasn't tempted to click on the links. But still, I wondered what Samsung could possibly have to do with it, and so I listened to the audio version (presented below with the phone number noisified):

In this example, Google's speech-to-text accuracy is extremely good in terms of Word Error Rate — depending on how you do the division into "words", it's somewhere in the low single digits.
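As a rough illustration of why the figure depends on how "words" are divided, here is a minimal Python sketch of Word Error Rate: word-level edit distance divided by the number of reference words. The reference strings are assumptions reconstructed from the sentence containing the error, not an official transcript, and this is not a claim about how Google scores its own output.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion (reference word missing)
                            curr[j - 1] + 1,      # insertion (extra hypothesis word)
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = edit distance / number of reference words (whitespace tokens)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


# Tokenization A (assumed): the whole URL counts as one word.
REF_A = "if this is true for you please visit www.sleeplessinphilly.com"
HYP_A = "if this is true for you please visit www.samsung.com"

# Tokenization B (assumed): the URL is spelled out the way it is spoken.
REF_B = "if this is true for you please visit w w w dot sleepless in philly dot com"
HYP_B = "if this is true for you please visit w w w dot samsung dot com"

print(word_error_rate(REF_A, HYP_A))  # 1 substitution over 9 reference words
print(word_error_rate(REF_B, HYP_B))  # 3 errors over 17 reference words
```

The same mistake counts as one error or three depending on the tokenization. Over the whole message the denominator is much larger, so either way the overall rate stays small.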

But as I suspected, the caller was not trying to send me to www.samsung.com to enroll in the study — Google somehow mistranscribed "sleepless in philly" as "samsung".

Pedro joined most of the other DARPA AI speakers in noting that a key failure of contemporary "artificial intelligence" is that it's actually pretty stupid, in the sense of having no common sense — and this failure illustrates the point. Any sentient human inhabitant of the modern world knows that you probably can't enroll in a sleep study at www.samsung.com. Another simple clue that Google missed is provided by the wording "The website again is …", which tells us that the two URLs should not be as wildly different as they are in the transcript.
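The "again" clue can be turned into a crude sanity check. The sketch below assumes both mentions have already been normalized to plain URL strings, and it uses Python's standard difflib to measure string similarity. The function name and the 0.8 threshold are hypothetical choices for illustration, not part of any real transcription system.

```python
from difflib import SequenceMatcher


def repeated_url_mismatch(first, second, threshold=0.8):
    """Return True when two mentions of 'the same' URL look too different.

    The threshold is an arbitrary illustrative value.
    """
    similarity = SequenceMatcher(None, first.lower(), second.lower()).ratio()
    return similarity < threshold


# The two URLs from the transcript, with the second one joined up.
print(repeated_url_mismatch("www.samsung.com", "www.sleeplessinphilly.com"))
```

A flag like this says nothing about which rendition is wrong. It only signals that a passage introduced by "The website again is" deserves a second look.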

And finally, while it's good that Google correctly recognized "w w w dot sleepless in philly dot com" in the second rendition of the website name, it's a failure of modern-era common sense not to join that sequence into a coherent URL. The transcription of the first version of the website sends me to an electronics company, and the second one sends me to a local news aggregator — neither of which lets me sign up for the sleep study.
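Joining the pieces is not hard once the pattern is recognized. Here is a minimal sketch, assuming the punctuation pattern seen in this one transcript ("w w w. Sleepless in philly.com"). The regular expression and the short list of top-level domains are illustrative assumptions and would need much broader coverage in practice.

```python
import re

# Spelled-out "w w w", optional dots and spaces, one or more words, then ".com" etc.
SPOKEN_URL = re.compile(
    r"\bw\s+w\s+w\b[.\s]*((?:[a-z0-9-]+\s+)*[a-z0-9-]+)\s*\.\s*(com|org|net|edu)\b",
    re.IGNORECASE,
)


def join_spoken_urls(text):
    """Collapse a spelled-out web address into a single lowercase URL."""
    def _join(match):
        host = re.sub(r"\s+", "", match.group(1)).lower()
        return f"www.{host}.{match.group(2).lower()}"
    return SPOKEN_URL.sub(_join, text)


print(join_spoken_urls("The website again is w w w. Sleepless in philly.com."))
```

An address that is already well formed, such as www.samsung.com, is left untouched, because the pattern only fires on the spelled-out "w w w".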

In fact this voicemail transcription was good enough for me to figure out what's going on, even without listening. But if we were using this stuff to populate a knowledge base …

Update — I should add that transcribing "sleepless in philly" as "samsung" is obviously a failure of phonetic analysis as well as a failure of common-sense reasoning. In fact it's apparently a case where an a priori top-down language model (which is our technology's best approximation to the notion of what it would make sense to say) has inappropriately overwhelmed what the system's bottom-up speech analysis components would otherwise prefer.
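To make the top-down versus bottom-up tension concrete, here is a toy sketch of the common log-linear way of combining an acoustic score with a language-model score. Every number below is invented for illustration, and nothing here describes Google's actual system. The point is only that a large enough language-model weight can flip the decision toward a familiar name.

```python
# Invented log scores: the acoustics favor the real URL, while the language
# model favors the well-known brand name. These values are assumptions.
CANDIDATES = {
    "www.sleeplessinphilly.com": {"acoustic": -5.0, "lm": -30.0},
    "www.samsung.com": {"acoustic": -20.0, "lm": -8.0},
}


def best_hypothesis(candidates, lm_weight):
    """Pick the candidate maximizing acoustic + lm_weight * lm (log domain)."""
    return max(
        candidates,
        key=lambda name: candidates[name]["acoustic"]
        + lm_weight * candidates[name]["lm"],
    )


print(best_hypothesis(CANDIDATES, lm_weight=1.0))  # the language model dominates
print(best_hypothesis(CANDIDATES, lm_weight=0.5))  # the acoustics dominate
```

With these made-up scores, a weight of 1.0 selects www.samsung.com (-28 against -35), while a weight of 0.5 selects www.sleeplessinphilly.com (-20 against -24).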

8 Comments

mistah charley, ph.d. said,

March 7, 2019 @ 8:24 am

Speaking of Samsung, they're discontinuing manufacturing BluRay players – an indication of how content is increasingly being distributed by streaming instead of on physical media. Admittedly, this is only peripherally related to speech recognition and transcription – there is a connection inasmuch as I am under the impression that the speech processing is done centrally, not on your local device.

Michael Watts said,

March 7, 2019 @ 1:15 pm

This reminds me of something I don't like about autocorrect. An ordinarg mistake in text will be something like substituting a letter, or leaving one out, or adding one in. Text entry miistakes are almost always trivial for the reader to correct, and the reader will be cofident that they've understood what the mistake was and what the original intention was.

Autocorrect has none of those features. It involves replacing an entire mistyped word with some other word that is spelled correctly but often only tenuously related to the original. It can be very difficult to determine what the writer meant by a word that autocorrect druids into the sentence.

BZ said,

March 7, 2019 @ 3:56 pm

I assume the problem is that samsung.com and philly.com are both known to Google, whereas sleeplessinphilly.com is probably newly registered. And you cannot really rely on things that sound like a website to be one. There used to be a restaurant in Philly called burger.org. I think its website was burgerorg.com (or it may have been .net).

Michael Watts said,

March 7, 2019 @ 4:40 pm

And you cannot really rely on things that sound like a website to be one.

If this were true, the message would be so much nonsense to its intended human recipients too… but it isn't.

Andrew Usher said,

March 7, 2019 @ 10:59 pm

Correct. And the enunciated letters WWW are pretty much a perfect clue that the following will be a web URL, especially if it ends with 'dot com'. Any modern program should be able to get that.

And the speaking is clear enough to indicate that it couldn't possibly be 'Samsung'! I think the thing is that they are now relying exclusively on 'smart' AI, that is 'self-taught' algorithms, which work most of the time but can mess up (when they do) more badly than any sensible algorithm should be able to. This is similar to the catalog of Google Translate fails that have appeared here.

Also, one would think speech recognition would have an [unintelligible] output in preference to wild guessing (overridable, I suppose). Again 'smart' algorithms can't know when they're not working.

k_over_hbarc at yahoo.com

TIC said,

March 12, 2019 @ 7:46 am

@Michael Watts: "… a word that autocorrect druids into the sentence …"

'Druid' as a verb?… Perhaps meaning something along the lines of 'inaccurately or inappropriately substitutes or places'?…

Or maybe a witty play on the phenomenon (which went right over my head)..

Enlighten me, please…

Andrew Usher said,

March 14, 2019 @ 8:59 pm

That passage contained intentional, illustrative, errors. 'Druid' was one of them, intended to illustrate a silly word placed by autocorrect. I assume the 'intended' word was 'drop', mistyped as 'droip', then 'corrected' by the computer to 'druid'. Not any more crazy than some of the autocorrect errors ('Cupertinos') that actually have been recorded, really.

I guess you've confirmed his statement that 'it can be very difficult to determine …'!

TIC said,

March 15, 2019 @ 7:45 am

Yup… Thanks, Andrew, for confirming my latter theory… I'd come to the tentative conclusion, sadly, that it was so… And that I was perhaps the only reader over whose head it went… Or at least that I was the first and only one to admit such confusion…

Interestingly, Michael's extremely well-crafted comment also elegantly illustrates his other point… Your reference to his (multiple) "intentional, illustrative, errors" led me — and not for the first time — to reread his comment… And I'm pretty sure that I obliviously read right through each of his intentional mistypings… And more than once…

Thanks again, Andrew… And kudos to Michael… And shame on me…
