It's often observed that current AI systems generalize confidently to inputs far from anything in their training data, where the right answer should be "huh?" This is true even when other available algorithms, often simple ones, could easily diagnose the lack of fit to expectations.

We've seen many amusing examples, which we've filed in the category Elephant Semifics. The name comes from a phrase that emerged from one of Google Translate's hallucinatory translations of meaningless repetitions of Japanese or Thai characters, or of random strings of ASCII vowels. A human translator would immediately notice the unexpected properties of such inputs, and in fact it's trivial to create algorithms that screen for them. Google and its competitors don't bother, or at least didn't in the past, because why should they? Except that in real-world applications, noticing that the inputs are nonsense is a clue that something has gone wrong, and that business as usual may not be the right response.
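To give a sense of how little it takes, here's a minimal sketch of such a screen. It is not what Google does or did; the function name `looks_like_nonsense` and its thresholds (`max_share`, `min_len`, the "three or fewer distinct symbols" rule) are hypothetical guesses chosen for illustration.

```python
from collections import Counter

def looks_like_nonsense(text: str, max_share: float = 0.5, min_len: int = 8) -> bool:
    """Crude screen for degenerate input (thresholds are illustrative guesses)."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < min_len:
        return False  # too short to judge
    counts = Counter(chars)
    # One symbol makes up most of the input, e.g. a long run of a single kana.
    if counts.most_common(1)[0][1] / len(chars) > max_share:
        return True
    # Only a handful of distinct symbols, repeated over and over.
    if len(counts) <= 3:
        return True
    # Plain Latin letters with no consonants at all, e.g. random vowel strings.
    letters = [c.lower() for c in chars if c.isascii() and c.isalpha()]
    return len(letters) == len(chars) and all(c in "aeiou" for c in letters)

print(looks_like_nonsense("ががががががががが"))                          # True
print(looks_like_nonsense("aeoiuaeuoiaeuoieau"))                         # True
print(looks_like_nonsense("The quick brown fox jumps over the lazy dog"))  # False
```

A real front end would want better statistics (character n-gram likelihoods under the expected language, say), but even heuristics this crude would have caught the inputs that produced the Elephant Semifics.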



Most of the repeated-character and random-character jokes in Google Translate have since been fixed. It's not clear whether this is due to better overall algorithms or to a special front-end check. But something similar remains true of today's best AI speech-to-text systems. If you give a human being a sound clip that's not in the language they expect, they'll notice and tell you about it: "That's not English, it's French." Or "I have no idea what language that is, but it's not English." But today's speech-to-text systems just forge ahead, doing the best they can without complaint.

Here's a simple demonstration of what happens when you give a French sentence to some current speech-to-text systems, telling them that it's English:

Le travail n'est pas venu tout seul briser nos vies, comme une catastrophe céleste.
("Work did not come along all by itself to shatter our lives, like a celestial catastrophe.")

Google: Recovering a pavoni to sell Breezy Novi Community, Catalyst of Celeste.

AWS: Rojava in a parvenu to solve briseno V communicate to staff celeste.

IBM: Recover a new permanent to celebrities interviewed current Kustoff Celeste.

These same systems would do quite well if you told them the input was French. And current language-identification technology is very good. So this is just another example of failing to recognize input outside the boundaries of a system's training.
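Since language identification is good and the recognizers do well when given the right language, the obvious fix is to check before transcribing. Here's a minimal sketch of that gate, mimicking the two human responses quoted earlier. The callables `identify_language` and `transcribe` are hypothetical stand-ins for a language-ID model and a speech-to-text engine (not any vendor's actual API), and the `accept` and `confident` thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Result:
    transcript: Optional[str]
    warning: Optional[str]

def transcribe_checked(
    audio: bytes,
    expected_lang: str,
    identify_language: Callable[[bytes], Dict[str, float]],  # hypothetical LID model
    transcribe: Callable[[bytes, str], str],                  # hypothetical ASR engine
    accept: float = 0.5,      # illustrative threshold
    confident: float = 0.8,   # illustrative threshold
) -> Result:
    """Check the language of the audio before running speech-to-text."""
    scores = identify_language(audio)  # language name -> probability
    if scores.get(expected_lang, 0.0) >= accept:
        return Result(transcribe(audio, expected_lang), None)
    best = max(scores, key=scores.get, default=None)
    if best is not None and scores[best] >= confident:
        # "That's not English, it's French."
        return Result(None, f"That's not {expected_lang}, it's {best}.")
    # "I have no idea what language that is, but it's not English."
    return Result(None, f"I have no idea what language that is, but it's not {expected_lang}.")

fake_lid = lambda audio: {"English": 0.03, "French": 0.95}
fake_asr = lambda audio, lang: "(transcript)"
print(transcribe_checked(b"", "English", fake_lid, fake_asr).warning)
# That's not English, it's French.
```

Returning a warning instead of a transcript is one design choice; a system could just as well re-run recognition in the detected language and flag the mismatch.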

A recent paper diagnosing one aspect of this problem, and suggesting an algorithm-internal fix, is Augustinus Kristiadi, Matthias Hein, and Philipp Hennig, "Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks", PMLR 2020:

The point estimates of ReLU classification networks—arguably the most widely used neural network architecture—have been shown to yield arbitrarily high confidence far away from the training data. This architecture, in conjunction with a maximum a posteriori estimation scheme, is thus not calibrated nor robust. Approximate Bayesian inference has been empirically demonstrated to improve predictive uncertainty in neural networks, although the theoretical analysis of such Bayesian approximations is limited. We theoretically analyze approximate Gaussian distributions on the weights of ReLU networks and show that they fix the overconfidence problem. Furthermore, we show that even a simplistic, thus cheap, Bayesian approximation, also fixes these issues. This indicates that a sufficient condition for a calibrated uncertainty on a ReLU network is “to be a bit Bayesian”. These theoretical results validate the usage of last-layer Bayesian approximation and motivate a range of a fidelity-cost trade-off. We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
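As a rough illustration of the paper's claim, here's a minimal numpy sketch, not the authors' code. The simplifying assumptions are ours: a toy 2-D binary problem; one hidden ReLU layer with random, frozen weights standing in for a trained feature extractor; a MAP-fitted logistic last layer with a Gaussian prior of precision `lam`; and the standard probit approximation for the Laplace predictive. Scaling a test point by a factor alpha pushes it ever farther from the data. The MAP confidence climbs toward 1, while the last-layer Laplace confidence levels off at a constant that, per the paper's analysis, stays below 1 (though for a given seed it may still be fairly high).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D training data: two Gaussian blobs near the origin (illustrative).
n = 200
X = np.vstack([rng.normal([-1.0, 0.0], 0.5, size=(n // 2, 2)),
               rng.normal([1.0, 0.0], 0.5, size=(n // 2, 2))])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])

# One hidden ReLU layer. Simplification: random and frozen, not trained.
H = 50
W1 = rng.normal(size=(2, H))
b1 = rng.normal(size=H)

def features(X):
    """ReLU features plus a constant column for the output bias."""
    phi = np.maximum(X @ W1 + b1, 0.0)
    return np.hstack([phi, np.ones((phi.shape[0], 1))])

def sigmoid(z):
    # tanh form avoids overflow warnings for very large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def neg_log_post_hessian(Phi, w, lam):
    """Hessian of the negative log posterior of the logistic last layer."""
    p = sigmoid(Phi @ w)
    return Phi.T @ (Phi * (p * (1.0 - p))[:, None]) + lam * np.eye(len(w))

# MAP last layer: L2-regularized logistic regression fitted by Newton's method.
Phi = features(X)
lam = 1.0  # Gaussian prior precision (hypothetical choice)
w = np.zeros(Phi.shape[1])
for _ in range(25):
    grad = Phi.T @ (sigmoid(Phi @ w) - y) + lam * w
    w -= np.linalg.solve(neg_log_post_hessian(Phi, w, lam), grad)

# Last-layer Laplace: Gaussian posterior with covariance = inverse Hessian at the MAP.
Sigma = np.linalg.inv(neg_log_post_hessian(Phi, w, lam))

def predict_map(X):
    return sigmoid(features(X) @ w)

def predict_laplace(X):
    phi = features(X)
    mean = phi @ w
    var = np.einsum("ij,jk,ik->i", phi, Sigma, phi)
    # Probit approximation to E[sigmoid(f)] for f ~ N(mean, var)
    return sigmoid(mean / np.sqrt(1.0 + np.pi * var / 8.0))

def confidence(p):
    return np.maximum(p, 1.0 - p)

direction = np.array([[1.0, 1.0]])
for alpha in [1.0, 10.0, 100.0, 1000.0]:
    x = alpha * direction
    print(f"alpha={alpha:>6}: MAP {confidence(predict_map(x))[0]:.6f}  "
          f"Laplace {confidence(predict_laplace(x))[0]:.6f}")
```

The reason is visible in `predict_laplace`: far from the data the ReLU features grow linearly in alpha, so the mean logit grows like alpha while its standard deviation also grows like alpha, and their ratio converges instead of diverging.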

