Archive for Elephant semifics

Image search results

Yesterday my wife challenged me to identify the person in a photo she sent. I decided to cheat, by using Google Image Search — and the results were very strange.

We've posted often about weird AI behavior in Speech-to-Text and Machine Translation and other NLP applications. Image processing has its own litany of weirdness, which is not often a topic here for obvious reasons. But this case does have a linguistic aspect, namely the cited links…

Read the rest of this entry »

Comments (4)

"The inspirations to be more inoperative"

Recently I was doing some background research on Central Auditory Processing Disorder (CAPD), and one of the references that Google Scholar handed me was a Semantic Scholar page for J.A. Willeford and J. Burleigh, "Handbook of central auditory processing disorders in children", 1985, with the following abstract:

The handbook of central auditory processing disorders in children that we provide for you will be ultimate to give preference. This reading book is your chosen book to accompany you when in your free time, in your lonely. This kind of book can help you to heal the lonely and get or add the inspirations to be more inoperative. Yeah, book as the widow of the world can be very inspiring manners. As here, this book is also created by an inspiring author that can make influences of you to do more.

Read the rest of this entry »

Comments (11)

Haunted laptop?

Daniel Sturman writes:

I was looking for information about the composer Sébastien de Brossard, and used Bing’s automatic translation to open each of his seven non-English Wikipedia pages.

The translation from Russian made a rather curious choice that gave the page a Lovecraftian feel, with a tinge of Zalgo:

Read the rest of this entry »

Comments (13)

Pecan, pecan, let's call the whole thing off…

If you ask Google (in various ways) how to pronounce pecan, you'll get suggested additional questions like these:

Read the rest of this entry »

Comments (64)

Google Translate doesn't know Latin

Sean Hannity's new book, Live Free or Die, was released a couple of days ago. The original cover featured a Latin motto, "Vivamus vel libero perit Americae", whose source was apparently Google Translate's version of "Live Free or America Dies":

As Spencer Alexander McDaniel observed, this is gobbledygook — or perhaps we should say "googledygook".

The title of McDaniel's post ("Sean Hannity does not know Latin") is unfair, though probably true, since the cover design was most likely created by the publisher. But a better observation might have been that Google Translate doesn't know Latin. Of course modern machine translation systems don't really "know" any languages, but have just memorized patterns of contextual correspondences. When the available training data is merely a few million words, as in the case of Latin, the results are often bad, as here.

Read the rest of this entry »

Comments (41)

Santa Claus is full

Today's adventure in AI brought yet another robocall, which my Google Assistant intercepted since the calling number (probably spoofed) was not in my contacts list. Here's Google Assistant's rendering of the interaction:

Read the rest of this entry »

Comments (2)

Resurrection in Herzliya

News sources around the world reported recently on a tragedy — "Officials: Chinese Ambassador to Israel Found Dead in Home", Associated Press 5/17/2020:

JERUSALEM — The Chinese ambassador to Israel was found dead in his home north of Tel Aviv on Sunday, Israel’s Foreign Ministry said.

Israeli Police Spokesman Micky Rosenfeld said the ambassador's death was believed to be from natural causes.

Du Wei, 58, was appointed envoy in February in the midst of the coronavirus pandemic. He previously served as China’s envoy to Ukraine. He was found dead at the ambassador's official residence in Herzliya.

But if you read it on Israel Hayom (or other Hebrew sources) via Google Translate, you'll get a different picture:

Read the rest of this entry »

Comments (4)

The standard deduckling

Comments (5)

Edwin's re-sonnets

Email today from Edwin Williams:

I constructed "new" sonnets from Shakespeare's sonnets by this formula: from a set of 7 randomly selected Shakespeare sonnets (a…g) I made a new sonnet "a b a b c d c d e f e f g g", which means the first line is taken from the first line of sonnet a, the second line from the second line of sonnet b, etc. So no two adjacent lines were from the same sonnet, except the last two. I made 154 of these (same number as S made).

I did it for fun but was startled by the result: the new sonnets were sonnetlike, felt syntactically coherent, and begged for interpretation. People I sent them to were fascinated by them, even when they saw what I had done, or after I told them. One of them, Craig Dworkin, a poet I got to know when he was at Princeton in the 90s, asked to include them in his e-poetry site, and there they sit: http://eclipsearchive.org/projects/EDWIN/.
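The recombination scheme Edwin describes is simple enough to sketch in a few lines of Python (function and variable names here are my own, not his):

```python
import random

def recombine(sonnets):
    """Build one 'new' sonnet from 7 distinct source sonnets using the
    line pattern a b a b c d c d e f e f g g: line i of the new poem
    is line i of the pattern's i-th source sonnet."""
    a, b, c, d, e, f, g = random.sample(sonnets, 7)
    pattern = [a, b, a, b, c, d, c, d, e, f, e, f, g, g]
    return [source[i] for i, source in enumerate(pattern)]
```

Because the seven sources are sampled without replacement, no two adjacent lines of the output share a source, except the closing couplet, which is drawn entirely from sonnet g — exactly the property Edwin notes.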

Read the rest of this entry »

Comments (15)

Mrs. Transformer-XL Tittlemouse

This is another note on the amazing ability of modern AI learning techniques to imitate some aspects of natural-language patterning almost perfectly, while managing to miss common sense almost entirely. This probably tells us something about modern AI and also about language, though we probably won't understand what it's telling us until many years in the future.

Today's example comes from Zihang Dai et al., "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context", arXiv 6/2/2019.

Read the rest of this entry »

Comments (5)

AI is brittle

Following up "Shelties On Alki Story Forest" (11/26/2019) and "The right boot of the warner of the baron" (12/6/2019), here's some recent testimony from engineers at Google about the brittleness of contemporary speech-to-text systems: Arun Narayanan et al., "Recognizing Long-Form Speech Using Streaming End-To-End Models", arXiv 10/24/2019.

The goal of that paper is to document some methods for making things better. But I want to underline the fact that considerable headroom remains, even with the massive amounts of training material and computational resources available to a company like Google.

Modern AI (almost) works because of machine learning techniques that find patterns in training data, rather than relying on human programming of explicit rules. A weakness of this approach has always been that generalization to material different in any way from the training set can be unpredictably poor. (Though of course rule- or constraint-based approaches to AI generally never even got off the ground at all.) "End-to-end" techniques, which eliminate human-defined layers like words, so that speech-to-text systems learn to map directly between sound waveforms and letter strings, are especially brittle.

Read the rest of this entry »

Comments (6)

Kabbalist NLP

Oscar Schwartz, "Natural Language Processing Dates Back to Kabbalist Mystics", IEEE Spectrum 10/28/2019 ("Long before NLP became a hot field in AI, people devised rules and machines to manipulate language"):

The story begins in medieval Spain. In the late 1200s, a Jewish mystic by the name of Abraham Abulafia sat down at a table in his small house in Barcelona, picked up a quill, dipped it in ink, and began combining the letters of the Hebrew alphabet in strange and seemingly random ways. Aleph with Bet, Bet with Gimmel, Gimmel with Aleph and Bet, and so on.

Abulafia called this practice “the science of the combination of letters.” He wasn’t actually combining letters at random; instead he was carefully following a secret set of rules that he had devised while studying an ancient Kabbalistic text called the Sefer Yetsirah. This book describes how God created “all that is formed and all that is spoken” by combining Hebrew letters according to sacred formulas. In one section, God exhausts all possible two-letter combinations of the 22 Hebrew letters.

By studying the Sefer Yetsirah, Abulafia gained the insight that linguistic symbols can be manipulated with formal rules in order to create new, interesting, insightful sentences. To this end, he spent months generating thousands of combinations of the 22 letters of the Hebrew alphabet and eventually emerged with a series of books that he claimed were endowed with prophetic wisdom.
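The combinatorics in the passage above are easy to check. On one common reading, "all possible two-letter combinations" of the 22 Hebrew letters means ordered pairs of distinct letters (22 × 21 = 462); the unordered pairs are the 231 "gates" of Kabbalistic tradition. A minimal sketch:

```python
from itertools import combinations, product

ALEPH_BET = "אבגדהוזחטיכלמנסעפצקרשת"  # the 22 letters of the Hebrew alphabet

# Ordered pairs of distinct letters (Aleph-Bet and Bet-Aleph counted
# separately): 22 * 21 = 462
ordered = [a + b for a, b in product(ALEPH_BET, repeat=2) if a != b]

# Unordered pairs -- the 231 "gates": C(22, 2) = 231
gates = [a + b for a, b in combinations(ALEPH_BET, 2)]

print(len(ordered), len(gates))  # 462 231
```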

Comments (6)

TO THE CONTRARYGE OF THE AND THENESS

Yiming Wang et al., "Espresso: A fast end-to-end neural speech recognition toolkit", ASRU 2019:

We present ESPRESSO, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit FAIRSEQ. ESPRESSO supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented. ESPRESSO achieves state-of-the-art ASR performance on the WSJ, LibriSpeech, and Switchboard data sets among other end-to-end systems without data augmentation, and is 4–11× faster for decoding than similar systems (e.g. ESPNET).

Read the rest of this entry »

Comments (13)