Archive for Elephant semifics

New frontiers in dataset corruption

In a comment on yesterday's "Software testing day" post, ernie in berkeley offered a nice "QA Engineer walks into a bar" joke, and pointed us to its origin in an old xkcd comic "Exploits of a Mom":

…which in turn reminded me of an old problem, discussed in "Excel invents genes", 8/26/2016:

Read the rest of this entry »

Comments (31)

ChatGPT having a stroke?

Or a psychotic episode? ICYMI — Maxwell Zeff, "ChatGPT Went Berserk, Giving Nonsensical Responses All Night", Gizmodo 2/21/2024:

ChatGPT started throwing out “unexpected responses” on Tuesday night according to OpenAI’s status page. Users posted screenshots of their ChatGPT conversations full of wild, nonsensical answers from the AI chatbot.

Read the rest of this entry »

Comments (12)

Eliza reborn?

Meta is inviting everyone to try out its BlenderBot3:

By releasing the chatbot to the general public, Meta wants to collect feedback on the various problems facing large language models. Users who chat with BlenderBot will be able to flag any suspect responses from the system, and Meta says it’s worked hard to “minimize the bots’ use of vulgar language, slurs, and culturally insensitive comments.” Users will have to opt in to have their data collected, and if so, their conversations and feedback will be stored and later published by Meta to be used by the general AI research community.

So following up on my earlier-reported "Conversations with GPT-3" (6/25/2022), here's BlenderBot3 chatting with a young person interested in philosophy:

Read the rest of this entry »

Comments (14)

COURTHOUHAING TOGET T ROCESS.WHE

HE HAS ALL THE SOU OF COURSE
0:05 AND LOADED, READTOO.K
0:11 TING
0:16 A TVERY CONFIDENT.CONWAY
0:21 COURTHOUHAING TOGET T ROCESS.WHE
0:28 COIDATE'
0:30 TTACUTION'S CATHATE'
0:36 SE.
0:36 CHCEN'T KNHA
0:37 TAER OFURDI

That's the start of the automatically-generated transcript on YouTube for "See George Conway's reaction to Trump's reported plan if he wins again", CNN 7/24/2022.

Read the rest of this entry »

Comments (3)

"Train hard, dream big"

[This is a guest post by Bernhard Riedel]

I stumbled across what was probably a mis-MT in the context of the Olympic Games. (article in Korean)

"During a foot kick on the way to the gold medal, some hangul became visible. But…"

On the black belt of the athlete from Spain, one can see "기차 하드, 꿈 큰" which is wonderful gibberish. Netizens in Korea were puzzled but also quick to guess an erroneous machine translation.

기차(汽車): (railway) train (definitely *not* related to "to train")
하드: (en:hard, transliterated)
꿈: dream (noun built from the verb 꾸다(to dream) with the nominalizer ㅁ/음)
큰: big (from the verb 크다) in the form used when modifying a noun that follows

Read the rest of this entry »

Comments (1)

English as Afrikaans?

Language-identification from digital text has been a solved problem for many years, so I was surprised yesterday to see Gmail offering to translate from Afrikaans an email written in perfectly idiomatic English, which started this way:

Read the rest of this entry »

Comments (10)

Knowing when you don't know

It's often observed that current AI systems will generalize confidently to areas far away from anything in their training, where the right answer should be "huh?" This is true even when other available algorithms, often simple ones, could easily diagnose the lack of fit to expectations.

We've seen many amusing examples, which we've filed in the category Elephant Semifics, named for a phrase emerging from one of Google's hallucinatory translations of meaningless repetitions of Japanese or Thai characters, or random strings of ASCII vowels. Obviously a human translator would immediately notice the unexpected properties of the inputs — and in fact it's trivial to create algorithms that could screen for such things. Google and its colleagues don't bother, or at least didn't do so in the past, because why should they? Except that in real world applications, noticing that inputs are nonsense is a clue that something has gone wrong, and maybe business-as-usual is not the right response.
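To make "trivial" concrete, here is a minimal sketch of the kind of screen I have in mind. It is an illustration only, not anything Google is known to use: the function names and the thresholds (`min_len`, `min_entropy`, `max_period_ratio`) are assumptions picked for the example. It flags an input either when the whole string is a short unit repeated many times, or when its character distribution is too impoverished to be ordinary text.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Shannon entropy, in bits per character, of the non-space characters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    counts = Counter(chars)
    total = len(chars)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def shortest_period(text: str) -> int:
    """Length of the shortest unit whose exact repetition reproduces the text."""
    n = len(text)
    for p in range(1, n + 1):
        if n % p == 0 and text[:p] * (n // p) == text:
            return p
    return n


def looks_like_nonsense(
    text: str,
    min_len: int = 12,              # assumed: shorter inputs are not judged
    min_entropy: float = 2.5,       # assumed threshold, bits per character
    max_period_ratio: float = 0.25  # assumed: unit repeated at least 4 times
) -> bool:
    """Crude screen for meaningless repetitions or tiny-alphabet strings."""
    stripped = "".join(text.split())
    if len(stripped) < min_len:
        return False
    if shortest_period(stripped) <= len(stripped) * max_period_ratio:
        return True
    return char_entropy(stripped) < min_entropy


if __name__ == "__main__":
    for sample in ["が" * 20, "aoieuaiaoueioaeuoia",
                   "The quick brown fox jumps over the lazy dog"]:
        print(repr(sample), looks_like_nonsense(sample))
```

A string drawn from only five vowels can never exceed log2(5), about 2.32 bits per character, so it falls under the assumed entropy threshold no matter how it is shuffled. A real system would presumably want something less crude, such as a character n-gram model per language, but the point is that even this much would catch the inputs in question.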

Read the rest of this entry »

Comments (16)

Maltese email ARC

Yesterday I got a strange email message, apparently from American Express. The first strange thing: Gmail showed it with no Subject and no content:

But then it got stranger…

Read the rest of this entry »

Comments (7)

Covered. Nineteen. At pain medicine

Google Fi screens my calls, so that my phone doesn't even ring unless the caller is in my contacts, or passes some kind of quasi-Turing Test. This is a Good Thing, since I get half a dozen spam calls a day, often at inconvenient times. As a result, robocalls generally end up as voicemail, which Google Fi helpfully turns into a convenient text message — which is often amusing. For example, a couple of days before my second vaccine shot last month, a robocall from Penn Medicine got transcribed like this:

Hello, this is pain medicine reaching out to you regarding covered. Nineteen. We've implemented a short sentence screening survey before coming into your appointment. All pain medicine patients are being asked to complete this brief electronic symptom checker to answer the questions, please call 215-NNN-NNNN. If your appointment has been canceled or rescheduled, please disregard this message patients and visitors. I'm presenting two pain medicine locations for inpatient outpatient or emergency department care should be wearing a cloth face covering in accordance with current CDC and state guidance. Thank you.

[Callback number obscured]

Read the rest of this entry »

Comments (7)

Advances in topic modeling

In the middle to late 1990s, "Topic Detection and Tracking" was an active research area (see also this). And by the early 2000s, the technology was good enough to support the creation of Google News. Twenty years later, these and other innovations have transformed the mass media, for good or ill. I don't know what algorithms the AI in charge of Topic Modeling at Google News is using these days, but I'm happy to see it developing a sense of humor:

Read the rest of this entry »

Comments (21)

Articulate Tory gestures

At our most recent Penn Phonetic Lab meeting, we heard a (virtual) talk by Marc Garellek on the topic "Reconsidering voicing during glottal sounds". The talk was quite interesting, but more relevant for a general audience was what happened when someone turned on Zoom's "Live Transcription" feature:

Read the rest of this entry »

Comments (13)

Image search results

Yesterday my wife challenged me to identify the person in a photo she sent. I decided to cheat, by using Google Image Search — and the results were very strange.

We've posted often about weird AI behavior in Speech-to-Text and Machine Translation and other NLP applications. Image processing has its own litany of weirdness, which is not often a topic here for obvious reasons. But this case does have a linguistic aspect, namely the cited links…

Read the rest of this entry »

Comments (4)

"The inspirations to be more inoperative"

Recently I was doing some background research on Central Auditory Processing Disorder (CAPD), and one of the references that Google Scholar handed me was a Semantic Scholar page for J.A. Willeford and J. Burleigh, "Handbook of central auditory processing disorders in children", 1985, with the following abstract:

The handbook of central auditory processing disorders in children that we provide for you will be ultimate to give preference. This reading book is your chosen book to accompany you when in your free time, in your lonely. This kind of book can help you to heal the lonely and get or add the inspirations to be more inoperative. Yeah, book as the widow of the world can be very inspiring manners. As here, this book is also created by an inspiring author that can make influences of you to do more.

Read the rest of this entry »

Comments (11)