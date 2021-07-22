

Language identification from digital text has been a solved problem for many years, so I was surprised yesterday to see Gmail offering to translate from Afrikaans an email written in perfectly idiomatic English.

The body of the email had a few acronyms and Israeli place-names, and it did mention "Hebrew", but those features don't help solve the mystery of why Gmail assigned it to the category of Afrikaans. My guess is that it's one of the (essentially inexplicable) vagaries of modern deep-learning technology, but who knows?

