Archive for Computational linguistics

Alan Turing's revenge?

Ilia Shumailov et al., "The Curse of Recursion: Training on Generated Data Makes Models Forget", 5/31/2023:

What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs.
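The single-Gaussian version of this feedback loop is easy to simulate. Here is a toy sketch (my own illustration, not the authors' code): each generation's "model" is just a Gaussian fit to samples drawn from the previous generation's model.

```python
import random
import statistics

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(50)]  # generation 0: "real" data

for gen in range(1, 51):
    # Fit a one-dimensional Gaussian "model" to the current training set.
    mu, sigma = statistics.fmean(data), statistics.stdev(data)
    # The next generation trains only on this model's generated output.
    data = [random.gauss(mu, sigma) for _ in range(50)]
    if gen % 10 == 0:
        print(f"generation {gen}: fitted sigma = {sigma:.3f}")

# The fitted sigma follows a multiplicative random walk with a downward
# drift, so over many generations the tails of the original distribution
# tend to vanish -- a one-dimensional caricature of "model collapse".
```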

Read the rest of this entry »

Comments (14)

It's impossible to detect LLM-created text

Last year, I expressed considerable skepticism about the prospects for accurate detection of text generated by Large Language Models ("Detecting LLM-created essays?", 12/20/2022). Since then, many new systems claiming to detect LLM outputs have emerged, notably Turnitin's "AI writing detector".

In a recent post on AI Weirdness ("Don't use AI detectors for anything important", 6/30/2023), Janelle Shane presents numerous examples of several distinct kinds of failure, and explains why things are unlikely to change.

Read the rest of this entry »

Comments (3)

Quirky speech-to-text, weird diarization

From Daniel Deutsch:

We had a long drive yesterday, so we listened to a “robot” reading the entire indictment. It certainly isn’t flawless, but I was surprised by how good it is, especially when it gets “excited” while enacting dialogue.

Indeed, the text-to-speech quality is quite good — though unfortunately they don't tell us which TTS software they used.

Here's the opening, which is indeed entirely clear and even nearly natural-sounding:

Read the rest of this entry »

Comments (2)

LLMs as coders?

I've recently seen many articles like this one, "You probably don't need to learn to code anymore" (Medium 6/5/2023), arguing that Large Language Models will make human programming (and human programmers) unnecessary. These arguments puzzle me, because my experience with LLMs suggests that they can't be relied on even for very simple programming tasks. After the fold, I'll give a recent example from (the experimental version of) Bard.
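To give a flavor of the failure mode (a contrived illustration of the genre, not the Bard exchange discussed below the fold): asked for a function returning the median of a list, a model can produce plausible-looking code that silently mishandles the even-length case.

```python
def median_llm_style(xs):
    """Plausible-looking but subtly wrong: ignores even-length lists."""
    xs = sorted(xs)
    return xs[len(xs) // 2]

def median_correct(xs):
    """Average the two middle elements when the list has even length."""
    xs = sorted(xs)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

print(median_llm_style([1, 2, 3, 4]))  # 3   (wrong)
print(median_correct([1, 2, 3, 4]))    # 2.5 (right)
```

The point is not that such bugs are exotic, but that they pass casual inspection, which is exactly why unsupervised LLM-written code is hard to trust.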

Read the rest of this entry »

Comments (23)

"Wordectomy"

The medical news site MedPage Today has recently added a daily game page, "Wordectomy", in which a medically-relevant Wikipedia article is presented with all letters blanked out except for punctuation and (some) function words.
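The redaction rule, as described, is easy to sketch. Here's a rough guess at the mechanics (the function-word list below is an assumption; the site doesn't publish its actual list):

```python
import re

# Assumed stand-in for whatever function-word list the site actually uses.
FUNCTION_WORDS = {"a", "an", "the", "of", "and", "or", "in", "on",
                  "to", "is", "are", "by", "with"}

def wordectomize(text: str) -> str:
    """Blank every letter, keeping punctuation and listed function words."""
    def blank(m: re.Match) -> str:
        word = m.group(0)
        return word if word.lower() in FUNCTION_WORDS else "_" * len(word)
    return re.sub(r"[A-Za-z]+", blank, text)

print(wordectomize("Aspirin irreversibly inhibits the enzyme cyclooxygenase."))
# _______ ____________ ________ the ______ ______________.
```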

Read the rest of this entry »

Comments (10)

Hack of the year: 1980

I recently stumbled on this 5/10/2023 Medium article by David Brock, "A Backup of Historical Proportions" — which reminded me of the Xerox Palo Alto Research Center ("PARC") and the Xerox Alto. Those were the people and the machine that invented interactive GUIs on bit-mapped displays, popularized the computer mouse, and so on — though it took Steve Jobs to "borrow" the ideas and turn them into a social (and business) success.

But as a speech person, I always thought it was odd and unfortunate that the Alto had no provision for audio input or output — and I was impressed by the hack that Henry Thompson used to get around the audio output problem for his 1980 Berkeley thesis, "Stress and Salience in English: Theory and Practice".

Read the rest of this entry »

Comments (11)

AI Anchorman "@EdisonGPT"

The future of news?

Read the rest of this entry »

Comments (12)

"The age of Socratic AI"?

Or should we call it "Delphic AI"?

Alexy Khrabrov suggested both possibilities a few days ago, in "Reasonable AI — the Golden Age of AI Programming":

The emerging techniques are all around the way you construct the prompts and also chain them. Effectively, we’re plotting dialogues.

I call it the Age of Socratic AI, or Reasonable AI. We are engaging in conversations with AI that elicit meaning. We make the most basic assumption that it has the information we need and can provide it in the form we need, e.g. as an explanation or a how-to plan of action. We consider it an imperfect oracle that has to be assuaged, and asked questions in very specific ways to get the reply we need.
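In code, the "chaining" Khrabrov describes amounts to feeding one reply into the next prompt. A minimal sketch, with a stubbed-out ask() standing in for whatever model API one uses (no particular vendor's interface is assumed):

```python
def ask(prompt: str) -> str:
    """Stub for a call to some LLM API; replace with a real client."""
    raise NotImplementedError

def socratic_chain(question: str) -> str:
    # Step 1: elicit an initial explanation.
    draft = ask(f"Explain, step by step: {question}")
    # Step 2: feed the model its own answer and ask it to find flaws.
    critique = ask(f"Here is an explanation:\n{draft}\nWhat is wrong or missing?")
    # Step 3: ask for a revision in light of the critique.
    return ask(f"Question: {question}\nDraft: {draft}\n"
               f"Critique: {critique}\nGive a corrected final answer.")
```

Each call is a turn in a plotted dialogue, which is presumably what "we're plotting dialogues" means in practice.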

Read the rest of this entry »

Comments (3)

The perils of AI (Artificial Intelligence) in the PRC

Here at Language Log, for the last couple of months, we've been having long, intense discussions about ChatGPT and other AI chatbots and LLM (Large Language Model) applications.  Now, it seems that the battle over such AI programs has reached the level of ideological warfare.

"America, China and a Crisis of Trust"

Opinion | The New York Times (4/14/23)

Indeed, a story making the rounds in Beijing is that many Chinese have begun using ChatGPT to do their ideology homework for the local Communist Party cell, so they don’t have to waste time on it.

I have some evidence that this might well be true.  Already about half a dozen years ago, my M.A. students from the PRC whose parents were CCP members told me that the government required daily interaction with the propaganda installed on their phones — upon pain of being demoted or dismissed.  They had to read a specified amount of Xi-speak and answer questions about the content.  This demanded a serious investment of time (hours).  It was considered to be especially onerous for those CCP members whose day jobs (doctors, bureaucrats, stock brokers, etc., etc.) already demanded a very full work schedule in the office.  So many, if not most, of them hired various human and electronic services to meet the obligations.

Read the rest of this entry »

Comments (12)

The hand of GOD GPT

A VentureBeat story by Michael Kerner, "Cohere expands enterprise LLM efforts with LivePerson partnership" (4/11/2023), leads with an image memetically referencing a widely-reproduced detail from Michelangelo's Sistine Chapel fresco, the Creazione di Adamo.

Read the rest of this entry »

Comments (5)

Hallucinations: In Xanadu did LLMs vainly fancify

Bill Benzon has been our most prolific humanistic commentator about GPTs, almost as prolific as GPTs themselves.  Here he introduces his latest creation in / on the genre:

"From 'Kubla Khan' through GPT and beyond", 3 Quarks Daily (3/27/23)

In a covering note to me, Bill writes:

A story about how I came to be interested in GPTs. It’s also implicitly a critique of the large language model business. You have a bunch of very smart and clever people creating engines that pump out language by the bucketful, but who seem to have little interest in or knowledge about language itself, much less linguistics, psycholinguistics, or the various cognitive sciences. It’s crazy. But the machines they’re producing are marvelous and fascinating.

Read the rest of this entry »

Comments (22)

Pablumese

Knowing how much I like to invent terms for things that have no name ("topolect", "character amnesia", etc.), and needing a word for the parlance produced by ChatGPT-4 and kindred AI chatbots, Conal Boyce asked me to coin a term for it.  I instantly obliged him by coming up with "pablumese" to designate the sort of language that is unremittingly neutral and takes no stance on any subject or topic it addresses.

Conal liked my invention and responded:

Here's one of the problems with ChatGPT and its brethren: Not only does it spew what Victor calls 'pablumese' but for technical questions it then mixes its pablumese with quantitative nonsense, creating a truly creepy kind of output.

I was curious to see how it would handle the question of how many copper atoms fit into the cross-section of a typical copper wire. It responded in a way that made it sound very knowledgeable, breaking everything down into tiny (sometimes condescending) steps, and yet, at the very end of its perfect logic, it botched its answer, because it was unable to do a conversion between millimeters and picometers correctly.

But here's the kicker: What makes this stuff maximally odious is that the creeps who design it will succeed in taking over the world anyway, because this week "version 4 is astonishingly better than the beta ChatGPT!!!" and version 5 next week will be astonishingly better than…. etc. etc. until they've improved it enough that it really will threaten the jobs of 3/4 of the human race. It must be an absolutely sickening time to be a young person, trying to plan one's career.
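For the record, the unit conversion at issue is simple: 1 mm = 10^9 pm. Here's a back-of-the-envelope check (the 1 mm wire diameter and the ~256 pm copper atom are my assumed figures; the quote doesn't say what numbers the chatbot was given):

```python
import math

wire_diameter_pm = 1.0 * 10**9  # 1 mm = 10**9 pm -- the conversion the chatbot botched
atom_diameter_pm = 256.0        # roughly twice copper's ~128 pm metallic radius

atoms_across = wire_diameter_pm / atom_diameter_pm
print(f"{atoms_across:.2e} atoms across the diameter")  # ~3.91e+06

atoms_in_area = math.pi * (atoms_across / 2) ** 2
print(f"{atoms_in_area:.2e} atoms in the cross-section")  # ~1.20e+13
```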

Read the rest of this entry »

Comments (25)

The mind of artificial intelligence

Sean Carroll's Preposterous Universe podcast #230, "Raphaël Millière on How Artificial Intelligence Thinks" (March 20, 2023), includes a transcript of the two-hour conversation.

Welcome to another episode of Sean Carroll's Mindscape. Today, we're joined by Raphaël Millière, a philosopher and cognitive scientist at Columbia University. We'll be exploring the fascinating topic of how artificial intelligence thinks and processes information. As AI becomes increasingly prevalent in our daily lives, it's important to understand the mechanisms behind its decision-making processes. What are the algorithms and models that underpin AI, and how do they differ from human thought processes? How do machines learn from data, and what are the limitations of this learning? These are just some of the questions we'll be exploring in this episode. Raphaël will be sharing insights from his work in cognitive science, and discussing the latest developments in this rapidly evolving field. So join us as we dive into the mind of artificial intelligence and explore how it thinks.

[The above introduction was artificially generated by ChatGPT.]

Read the rest of this entry »

Comments (6)