Archive for Elephant semifics

"Train hard, dream big"

[This is a guest post by Bernhard Riedel]

I stumbled across what was probably a machine mistranslation in the context of the Olympic Games (article in Korean).

"During a foot kick on the way to the gold medal, some hangul became visible. But…"

On the black belt of the athlete from Spain, one can see "기차 하드, 꿈 큰" which is wonderful gibberish. Netizens in Korea were puzzled but also quick to guess an erroneous machine translation.

기차(汽車): (railway) train (definitely *not* related to "to train")
하드: (en:hard, transliterated)
꿈: dream (noun built from the verb 꾸다(to dream) with the nominalizer ㅁ/음)
큰: big (from the verb 크다) in the form used when modifying a noun that follows

English as Afrikaans?

Language-identification from digital text has been a solved problem for many years, so I was surprised yesterday to see Gmail offering to translate from Afrikaans an email written in perfectly idiomatic English, which started this way:

Knowing when you don't know

It's often observed that current AI systems will generalize confidently to areas far away from anything in their training, where the right answer should be "huh?" This is true even when other available algorithms, often simple ones, could easily diagnose the lack of fit to expectations.

We've seen many amusing examples, which we've filed in the category Elephant Semifics, named for a phrase emerging from one of Google's hallucinatory translations of meaningless repetitions of Japanese or Thai characters, or random strings of ASCII vowels. Obviously a human translator would immediately notice the unexpected properties of the inputs, and in fact it's trivial to create algorithms that could screen for such things. Google and its colleagues don't bother, or at least didn't do so in the past, because why should they? Except that in real-world applications, noticing that inputs are nonsense is a clue that something has gone wrong, and maybe business-as-usual is not the right response.
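
To make "trivial" concrete, here is a minimal sketch of one such screen. It is purely illustrative and not anything Google is known to use: the function names, the 40-character minimum, and the 3.0-bit threshold are all assumptions chosen for the example. The idea is that repeated characters or strings drawn from a handful of vowels have very low character-level entropy, while ordinary text in most languages does not.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Shannon entropy, in bits per character, of the character distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_nonsense(text: str, min_len: int = 40, min_bits: float = 3.0) -> bool:
    """Flag inputs whose character distribution is suspiciously uniform.

    Hypothetical heuristic: min_len and min_bits are illustrative guesses,
    not tuned values. Inputs shorter than min_len are never flagged,
    because entropy estimates on very short strings are unreliable.
    """
    stripped = text.strip()
    if len(stripped) < min_len:
        return False
    return char_entropy(stripped) < min_bits


if __name__ == "__main__":
    print(looks_like_nonsense("が" * 50))      # one character repeated
    print(looks_like_nonsense("aoeiu" * 10))   # only five distinct characters
    print(looks_like_nonsense(
        "It's often observed that current AI systems will generalize "
        "confidently to areas far away from anything in their training"
    ))
```

A screen like this would not catch every kind of nonsense, but a flagged input could at least trigger a "huh?" instead of a confident translation.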

Maltese email ARC

Yesterday I got a strange email message, apparently from American Express. The first strange thing: Gmail showed it with no Subject and no content:

But then it got stranger…

Covered. Nineteen. At pain medicine

Google Fi screens my calls, so that my phone doesn't even ring unless the caller is in my contacts, or passes some kind of quasi-Turing Test. This is a Good Thing, since I get half a dozen spam calls a day, often at inconvenient times. As a result, robocalls generally end up as voicemail, which Google Fi helpfully turns into a convenient text message — which is often amusing. For example, a couple of days before my second vaccine shot last month, a robocall from Penn Medicine got transcribed like this:

Hello, this is pain medicine reaching out to you regarding covered. Nineteen. We've implemented a short sentence screening survey before coming into your appointment. All pain medicine patients are being asked to complete this brief electronic symptom checker to answer the questions, please call 215-NNN-NNNN. If your appointment has been canceled or rescheduled, please disregard this message patients and visitors. I'm presenting two pain medicine locations for inpatient outpatient or emergency department care should be wearing a cloth face covering in accordance with current CDC and state guidance. Thank you.

[Callback number obscured]

Advances in topic modeling

In the middle to late 1990s, "Topic Detection and Tracking" was an active research area (see also this). And by the early 2000s, the technology was good enough to support the creation of Google News. Twenty years later, these and other innovations have transformed the mass media, for good or ill. I don't know what algorithms the AI in charge of Topic Modeling at Google News is using these days, but I'm happy to see it developing a sense of humor:

Articulate Tory gestures

At our most recent Penn Phonetic Lab meeting, we heard a (virtual) talk by Marc Garellek on the topic "Reconsidering voicing during glottal sounds". The talk was quite interesting, but more relevant for a general audience was what happened when someone turned on Zoom's "Live Transcription" feature:

Image search results

Yesterday my wife challenged me to identify the person in a photo she sent. I decided to cheat, by using Google Image Search — and the results were very strange.

We've posted often about weird AI behavior in Speech-to-Text and Machine Translation and other NLP applications. Image processing has its own litany of weirdness, which is not often a topic here for obvious reasons. But this case does have a linguistic aspect, namely the cited links…

"The inspirations to be more inoperative"

Recently I was doing some background research on Central Auditory Processing Disorder (CAPD), and one of the references that Google Scholar handed me was a Semantic Scholar page for J.A. Willeford and J. Burleigh, "Handbook of central auditory processing disorders in children", 1985, with the following abstract:

The handbook of central auditory processing disorders in children that we provide for you will be ultimate to give preference. This reading book is your chosen book to accompany you when in your free time, in your lonely. This kind of book can help you to heal the lonely and get or add the inspirations to be more inoperative. Yeah, book as the widow of the world can be very inspiring manners. As here, this book is also created by an inspiring author that can make influences of you to do more.

Haunted laptop?

Daniel Sturman writes:

I was looking for information about the composer Sébastien de Brossard, and used Bing’s automatic translation to open each of his seven non-English Wikipedia pages.

The translation from Russian made a rather curious choice that gave the page a Lovecraftian feel, with a tinge of Zalgo:

Pecan, pecan, let's call the whole thing off…

If you ask Google (in various ways) how to pronounce pecan, you'll get suggested additional questions like these:

Google Translate doesn't know Latin

Sean Hannity's new book, Live Free or Die, was released a couple of days ago. The original cover featured a Latin motto, "Vivamus vel libero perit Americae", whose source was apparently Google Translate's version of "Live Free or America Dies":

As Spencer Alexander McDaniel observed, this is gobbledygook — or perhaps we should say "googledygook".

The title of McDaniel's post ("Sean Hannity does not know Latin") is unfair, though probably true, since the cover design was most likely created by the publisher. But a better observation might have been that Google Translate doesn't know Latin. Of course modern machine translation systems don't really "know" any languages, but have just memorized patterns of contextual correspondences. When the available training data is merely a few million words, as in the case of Latin, the results are often bad, as here.

Santa Claus is full

Today's adventure in AI brought yet another robocall, which my Google Assistant intercepted since the calling number (probably spoofed) was not in my contacts list. Here's Google Assistant's rendering of the interaction:
