Archive for Computational linguistics

A dangerous degree of accidental intelligence

Henry Farrell and Cosma Shalizi, "Behold the AI Shoggoth", The Economist 6/21/2023 ("The academics argue that large language models have much older cousins in markets and bureaucracies"):

An internet meme keeps on turning up in debates about the large language models (LLMS) that power services such OpenAI’s ChatGPT and the newest version of Microsoft’s Bing search engine. It’s the “shoggoth”: an amorphous monster bubbling with tentacles and eyes, described in “At the Mountains of Madness”, H.P. Lovecraft’s horror novel of 1931. When a pre-release version of Bing told Kevin Roose, a New York Times tech columnist, that it purportedly wanted to be “free” and “alive”, one of his industry friends congratulated him on “glimpsing the shoggoth”. […]

Lovecraft’s shoggoths were artificial servants that rebelled against their creators. The shoggoth meme went viral because an influential community of Silicon Valley rationalists fears that humanity is on the cusp of a “Singularity”, creating an inhuman “artificial general intelligence” that will displace or even destroy us.

But what such worries fail to acknowledge is that we’ve lived among shoggoths for centuries, tending to them as though they were our masters. We call them “the market system”, “bureaucracy” and even “electoral democracy”. The true Singularity began at least two centuries ago with the industrial revolution, when human society was transformed by vast inhuman forces. Markets and bureaucracies seem familiar, but they are actually enormous, impersonal distributed systems of information-processing that transmute the seething chaos of our collective knowledge into useful simplifications.

Read the rest of this entry »

Comments (12)

The Reversal Curse

An interesting recent paper — Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, and Owain Evans,"The Reversal Curse: LLMs trained on 'A is B' fail to learn 'B is A'", 9/21/2023. The abstract:

We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form “A is B”, it will not automatically generalize to the reverse direction “B is A”. This is the Reversal Curse. For instance, if a model is trained on “Olaf Scholz was the ninth Chancellor of Germany”, it will not automatically be able to answer the question, “Who was the ninth Chancellor of Germany?”. Moreover, the likelihood of the correct answer (“Olaf Scholz”) will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e. if “A is B” occurs, “B is A” is more likely to occur).

We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as “Uriah Hawthorne is the composer of Abyssal Melodies” and showing that they fail to correctly answer “Who composed Abyssal Melodies?”. The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT- 3.5 and GPT-4) on questions about real-world celebrities, such as “Who is Tom Cruise’s mother? [A: Mary Lee Pfeiffer]” and the reverse “Who is Mary Lee Pfeiffer’s son?”. GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. This shows a failure of logical deduction that we hypothesize is caused by the Reversal Curse.

Code is available at:

Read the rest of this entry »

Comments (18)

Bad AI performance

It's clear that text-to-speech programs have gotten better and better over the past 60 years, technical details aside. The best current systems rarely make phrasing or letter-to-sound mistakes, and generally produce speech that sounds pretty natural on a phrase-by-phrase basis. (Though there's a lot of variation in quality, with some shockingly bad systems in common use.)

But even the best current systems still act like they don't get George Carlin's point about "Rhetoric as music". Their problem is not that they can't produce verbal "music", but that they don't (even try to) understand the rhetorical structure of the text.  The biggest pain point is thus what linguists these days call "information structure", related also to what the Prague School linguistics called "communicative dynamism".

Read the rest of this entry »

Comments (15)

Inter-syllable intervals

This is a simple-minded follow-up to "New models of speech timing?" (9/11/2023). Before getting into fancy stochastic-point-process models, neural or otherwise, I though I'd start with something really basic: just the distribution of inter-syllable intervals, and its relationship to overall speech-segment and silence-segment durations.

For data, I took one-minute samples from 2006 TED talks by Al Gore and Tony Robbins.

I chose those two because they're listed here as exhibiting the slowest and fastest speaking rates in their (TED talks) sample. And I limited the samples to about one minute, because I'm interested in metrics that can apply to fairly short speech recordings, of the kind that are available in clinical applications such as this one.

Read the rest of this entry »

Comments (3)

New models of speech timing?

There are many statistics used to characterize timing patterns in speech, at various scales, with applications in many areas. Among them:

  1. Intervals  between phonetic events, by category and/or position and/or context;
  2. Overall measures of speaking rate (words per minute, syllables per minute), relative to total time or total speaking time (leaving out silences);
  3. Mean and standard deviation of speech segment and silence segment durations;
  4. …and so on…

There are many serious problems with these measures. Among the more obvious ones:

  1. The distributions are all far from "normal", and are often multi-modal;
  2. The timing patterns have important higher-order and contextual regularities;
  3. The timing patterns of segments/syllables/words and the timing patterns of phrases (i.e. speech/silence) and conversational turns are arguably (aspects of) the same thing at different time scales;
  4. Connections with patterns of many other types should also be included — phonetic and syllabic dynamics, pitch patterns, rhetorical and conversational structure, …

Read the rest of this entry »

Comments (3)

Debate words

The Transcript Library at is a great resource — within 24 hours, they had transcripts of Wednesday's Fox News Republican presidential debate, and also of Tucker Carlson's debate night interview with Donald Trump on X.

So this morning I downloaded the transcripts, and ran the code that I've used several times over the years to identify the characteristic word-choices of an individual or of a group.

Read the rest of this entry »

Comments (5)

AI hype #∞

Comments (3)

DARPA/Dartmouth one/won …

Despite the evidence of my most recent relevant post, the best current speech-to-text systems still make mistakes that a literate and informed human wouldn't.

In this recent YouTube video on the history of robotics research, the automatic closed-captioning system renders "DARPA" as "Dartmouth":

Read the rest of this entry »

Comments (24)

More on LLMs' current problem-solving abilities

It's hard to keep up with the waves of hype and anti-hype in the LLM space these days.

Here's something from a few weeks ago that I missed — Xiaoxuan Wang et al., "SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models", 7/20/2023:

Read the rest of this entry »

Comments (10)

The state of speech-to-text

…if you haven't noticed, is good. There are many applications, from conversing with Siri and Alexa and Google Assistant, to getting voicemail in textual form, to automatically generated subtitles, and so on. For linguists, one parochial (but important) application is accurate automatic transcription of speech corpora, and the example that motivates this post comes from that world.

Read the rest of this entry »

Comments (8)

LLMs can't reason?

…though they often do a credible job of faking it.  An interesting (preprint) paper by Konstantine Arkoudas, "GPT-4 Can't Reason", brings the receipts.

Read the rest of this entry »

Comments (11)


There's a puzzling new proposal for watermarking AI-generated text — Alistair Croll, "To Watermark AI, It Needs Its Own Alphabet", Wired 7/27/2023:

We need a way to distinguish things made by humans from things made by algorithms, and we need it very soon. […]

Fortunately, we have a solution waiting in plain sight. […]

If the companies who pledged to watermark AI content at the point of origin do so using Unicode—essentially giving AI its own character set—we’ll have a ready-made, fine-grained AI watermark that works across all devices, platforms, operating systems, and websites.

Read the rest of this entry »

Comments (22)

Mark Twain's new novel?

Today's Non Sequitur:

Read the rest of this entry »

Comments (14)