## "Unparalleled accuracy" == "Freud as a scrub woman"

A couple of years ago, in connection with the JSALT2017 summer workshop, I tried several commercial speech-to-text APIs on some clinical recordings, with very poor results. Recently I thought I'd try again, to see how things have progressed. After all, there have been recent claims of "human parity" in various speech-to-text applications, and (for example) Google's Cloud Speech-to-Text tells us that it will "Apply the most advanced deep-learning neural network algorithms to audio for speech recognition with unparalleled accuracy", and that "Cloud Speech-to-Text accuracy improves over time as Google improves the internal speech recognition technology used by Google products."

So I picked one of the better-quality recordings of neuropsychological test sessions that we analyzed during that 2017 workshop, and tried a few segments. Executive summary: general human parity in automatic speech-to-text is still a ways off, at least for inputs like these.


## Deep learning stumbles again

At least I think that's what happened here. Gita Jackson, "Tumblr's New Algorithm Thinks Garfield Is Explicit Content", Kotaku 12/4/2018:

Yesterday, Tumblr announced that it will ban all adult content starting December 17th. As users logged into their accounts, they saw that some of their posts now have a red banner across them, marking them as flagged for explicit content. The problem is, a lot of these posts are hilariously far from being pornographic.

It's pretty clear that these flags are being done based on an algorithm, and the algorithm is finding false positives. Here's a list of things that got flagged: a fully clothed woman, a drawing of a dragon, fan-art of characters from the anime Haikyu!!, art from the children's book The Princess Who Saved Herself that the author of said book posted, a drawing of a bowl of fruit with mouths, a video of abstract blurs, Garfield.


## Today's Google Translate poetry

Just checking to see that Google Translate is still into hallucinatory automatic writing.

Today's input is five random hiragana characters — あっぉぉを — repeated various numbers of times:

- 1X (あっぉぉを): Oh yeah
- 2X: I am afraid that
- 3X: We have an Omote
- 4X: We will hold an Om to Oh no
- 5X: We will send out a certain number of employees
- 6X: We will send out a certain number of employees to a certain number of employees
- 7X: We will hold a certain number of employees and one million yen
- 8X: We do not want to be an omen
- 9X: We will transfer a certain amount of money to a certain number of employees
- …
- 13X: We did not wish to be a member of the company. Ah
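These inputs are easy to regenerate. Here's a minimal Python sketch (the post doesn't say how the strings were produced, so this just reconstructs them by repetition):

```python
# Reproduce the Google Translate inputs above: the same five
# hiragana characters repeated 1 through 13 times.
base = "あっぉぉを"

inputs = [base * n for n in range(1, 14)]

# e.g. inputs[0] is the base string, inputs[12] is it repeated 13 times
for s in inputs[:3]:
    print(s)
```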

## More Google Translate hallucinations on YouTube

1,237,159 views so far:

[Warning: Loud background music.]


## Call it what?

Gráinne Ní Aodha, "German students say English exam that asked them to explain Brexit was unfair", The Journal (Dublin) 5/4/2018:

German students have complained that an English exam that asked them to discuss Brexit, among other things, was too difficult and "unfair".

Over 35,000 people have signed an online petition to voice their opposition to the challenging English paper, saying that the reading comprehensions and current affairs topics were unfair.

Christopher Schuetze, "Thousands of German Students Protest 'Unfair' English Exam", NYT 5/5/2018:

Complaining that your final school exams are too tough is a rite of passage — almost a tradition.

But German students in the southwestern state of Baden-Württemberg who hunkered down in April to take pivotal final secondary-school exams have gone a step further in their protests about the English-language portion of the test, which they said was absurd, with obscure and outdated references.

More coverage e.g. here.


## Colossal translation fail at the Boao Forum for Asia

China is currently hosting the Boao Forum for Asia in Hainan, the smallest and southernmost province of the PRC.  The BFA bills itself as the "Asian Davos", after the World Economic Forum held annually in Davos, Switzerland.  The BFA draws representatives from many countries, so naturally they have to provide translation services.  Unfortunately, the machine translation system they used this year failed miserably.  Here are screenshots of a couple of examples:


## AI triumph of the week

Posted to Twitter by Ariel Waldman, with the comment "tell me again how AI will take over the world":


## The architecture of speech

Or maybe it should be the sound pattern of architecture? Anyhow, Ariel Goldberg sends this interesting demonstration of the fact that Google Books still sometimes gets jiggy with its category choices:


## AI hallucinations

Tom Simonite, "AI has a hallucination problem that's proving tough to fix", Wired 3/9/2018:

Tech companies are rushing to infuse everything with artificial intelligence, driven by big leaps in the power of machine learning software. But the deep-neural-network software fueling the excitement has a troubling weakness: Making subtle changes to images, text, or audio can fool these systems into perceiving things that aren't there.

Simonite's article is all about "adversarial attacks", where inputs are adjusted iteratively to hill-climb towards an impressively (or subversively) wrong result. But anyone who's been following the "Elephant semifics" topic on this blog knows that for Google's machine translation, at least, spectacular hallucinations can be triggered by shockingly simple inputs: random strings of vowels, the Vietnamese alphabet, repetitions of single hiragana characters, random Thai keyboard banging, etc.
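For readers curious what "adjusted iteratively to hill-climb towards a wrong result" means concretely, here is a toy sketch. Everything in it is invented for illustration (a three-weight linear "classifier" standing in for a real network, random-search nudges standing in for gradient-based attacks):

```python
import random

random.seed(0)  # deterministic for this illustration

# Invented toy "classifier": the label is the sign of a fixed linear score.
W = [0.9, -0.4, 0.3]

def score(x):
    return sum(w * xi for w, xi in zip(W, x))

def hill_climb(x, steps=2000, eps=0.01):
    """Repeatedly propose a tiny random nudge to one coordinate,
    keeping any nudge that pushes the score toward the opposite
    label: the iterative adjustment described above, in miniature."""
    target = -1.0 if score(x) > 0 else 1.0
    x = list(x)
    for _ in range(steps):
        i = random.randrange(len(x))
        trial = list(x)
        trial[i] += random.choice((-eps, eps))
        if target * score(trial) > target * score(x):
            x = trial
    return x

start = [1.0, 1.0, 1.0]   # classified "positive" (score 0.8)
adv = hill_climb(start)   # many tiny nudges flip the label
```

The point of the sketch is just the shape of the attack: each individual change is imperceptibly small, but the accumulated drift pushes the model's decision wherever the attacker wants.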


## Alexa laughs

Now that speech technology is good enough that voice interaction with devices is becoming widespread and routine, success has created a new problem: How should a device tell when to attend to ambient sounds and try to interpret them as questions or commands?

One solution is to require a mouse click or a finger press to start things off — but this can degrade the whole "ever-attentive servant" experience. So increasingly such systems rely on a key phrase like "Hey Siri" or "OK Google" or "Alexa". But this solution brings up other problems, since users don't like the idea of their entire life's soundtrack streaming to Apple or Google or Amazon. And anyhow, streaming everything to the Mother Ship might strain battery life and network bandwidth for some devices. The answer: Create simple, low-power device-local programs that do nothing but monitor ambient audio for the relevant magic phrase.
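The division of labor described above can be sketched schematically. This is an illustration only, not any vendor's actual pipeline: a real on-device detector runs on audio features, so the substring match on a hypothetical local transcript below is a stand-in for the low-power model.

```python
WAKE_PHRASES = ("alexa", "hey siri", "ok google")

def local_wake_word_check(chunk: str) -> bool:
    """Stand-in for the cheap, always-on, device-local detector:
    here just a case-insensitive substring match."""
    lowered = chunk.lower()
    return any(phrase in lowered for phrase in WAKE_PHRASES)

def handle_audio(chunk: str) -> str:
    # Only when the local detector fires does anything stream upstream;
    # everything else stays on the device and is thrown away.
    if local_wake_word_check(chunk):
        return "stream to cloud"
    return "discard locally"
```

The design trade-off is visible even in the toy version: the local check must be cheap enough to run constantly, so it will inevitably be less accurate than the cloud models behind it, and its mistakes are exactly the false positives discussed next.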

Problem: these programs aren't yet very good. Result: lots of false positives. Mostly the false positives are relatively benign — see e.g. "Annals of helpful surveillance", 5/9/2017. But recently, many people have been creeped out by Alexa laughing at them, apparently for no reason:


## o ai aaa oa ueui

As ktschwarz pointed out in the comments on yesterday's post "Easy going crazy", Google Translate is disposed to recognize text consisting only of vowels and spaces as Hawaiian, and to hallucinate a coherent if sometimes chilling translation into English.

In order to exercise this option more fully, I wrote and tested a simple R script to generate random messages of this type:

# Generate one random message: N characters sampled uniformly
# (with replacement) from the five vowels plus space
N = 150
Letters = c("a", "e", "i", "o", "u", " ")
cat(sprintf("%s\n", paste0(sample(Letters, N, replace=TRUE), collapse="")))

So for example:


## Easy going crazy

Today Josh Tenenbaum gave a talk here in the Interdisciplinary Mind and Brain Seminar Series, under the title "On what you can't learn from (merely) all the data in the world, and what else is needed". One of his themes was that current RNN systems lack common sense, and so in honor of that point, here's another episode in our ongoing Elephant Semifics series. This one is based on repetitions of  0x306C "HIRAGANA LETTER NU", which Google Translate correctly diagnoses as Japanese.


## Adversarial attacks on modern speech-to-text

In a recent post on this blog, Mark Liberman raised the lively area of so-called "adversarial" attacks on modern machine learning systems. These attacks can do amusing and somewhat frightening things, such as forcing an object recognition algorithm to identify all images as toasters with remarkably high confidence. Seeing these applied to image recognition, he hypothesized that they could also be applied to modern speech recognition (STT, or speech-to-text) systems based on, for example, deep learning. His hypothesis has recently been confirmed.
