Language Log

Alan Turing's revenge?

July 5, 2023 @ 3:15 pm · Filed by Mark Liberman under Computational linguistics

Ilia Shumailov et al., "The Curse of Recursion: Training on Generated Data Makes Models Forget", 5/31/2023:

What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs.
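The tail-loss effect described in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's experimental setup: the "model" here is a plain empirical frequency table, the Zipf-like starting weights and the function name `simulate_collapse` are hypothetical, and each generation is trained only on samples drawn from the previous generation's model. Because a token that draws zero samples can never reappear, the number of distinct tokens can only stay the same or shrink.

```python
import random
from collections import Counter


def simulate_collapse(weights, sample_size, generations, seed=0):
    """Toy illustration: refit a frequency model on its own samples.

    Returns the number of distinct tokens present in each generation's data.
    (Hypothetical helper, not taken from the paper.)
    """
    rng = random.Random(seed)
    tokens = list(range(len(weights)))

    # Generation 0: data drawn from the "true" distribution.
    data = rng.choices(tokens, weights=weights, k=sample_size)
    support_sizes = [len(set(data))]

    for _ in range(generations):
        # "Train" the model: estimate token frequencies from the current data.
        counts = Counter(data)
        seen = list(counts)
        # Generate the next data set purely from the fitted model.
        data = rng.choices(seen, weights=[counts[t] for t in seen], k=sample_size)
        support_sizes.append(len(set(data)))

    return support_sizes


# Assumed long-tailed starting distribution over 50 token types.
zipf_weights = [1.0 / rank for rank in range(1, 51)]
sizes = simulate_collapse(zipf_weights, sample_size=100, generations=20)
print(sizes)
```

The printed list never increases from one generation to the next, because rare tokens that miss a generation are lost for good. That is a small-scale analogue of the disappearing tails the authors describe.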

It's impossible to detect LLM-created text

July 5, 2023 @ 8:01 am · Filed by Mark Liberman under Computational linguistics

Last year, I expressed considerable skepticism about the prospects for accurate detection of text generated by Large Language Models ("Detecting LLM-created essays?", 12/20/2022). Since then, many new systems claiming to detect LLM outputs have emerged, notably Turnitin's "AI writing detector".

In a recent post on AI Weirdness ("Don't use AI detectors for anything important", 6/30/2023), Janelle Shane presents multiple examples of multiple kinds of failure, and explains why things are not likely to change.

My garden path of the day

July 5, 2023 @ 6:30 am · Filed by Mark Liberman under ambiguity, coordination

"Alligator Kills 69-Year-Old Woman in South Carolina", NYT 7/4/2023:

A 69-year-old woman was attacked and killed by an alligator on Tuesday as she was walking her dog in her neighborhood in Hilton Head Island, S.C., the authorities said.

The Beaufort County Sheriff’s Office said it was the second fatal alligator attack in the county in less than a year. […]

Jay Butfiloski, the furbearer and alligator program coordinator with the state’s Natural Resources Department, could not be reached on Tuesday.

"Communism" in Korean

July 5, 2023 @ 4:49 am · Filed by Victor Mair under Language and politics, Language change

As I have demonstrated here, communism is still very much a thing in North Korea, and apparently increasingly so under the leadership of Kim Jong Un.

Now, the word for "communism" in the Korean of South Korea is gongsanjuui 공산주의 (共産主義), which simply adopts the Chinese gòngchǎn zhǔyì 共産主義. Since that usage goes against the regime's general principle of replacing words from Chinese characters with native morphemes, it caused me to wonder what the word for "communism" must be in the Korean of North Korea, inasmuch as gongsanjuui 공산주의 (共産主義) is a wholly Sino-Korean term.

Transitive "blink"

July 4, 2023 @ 6:14 am · Filed by Victor Mair under Grammar, Language and the law, Standard language

Reader Scott Mauldin asks:

I am curious about a unique usage I read in SCOTUS Justice Ketanji Jackson's dissent to the recent cases on affirmative action. She says “This contention blinks both history and reality in ways too numerous to count.” To me, the usage of "blink" as a transitive verb to mean [I assume] something like "ignore" was completely novel. To see what to me is a nonstandard usage show up in a Supreme Court dissent was strange. Is this common usage in some communities, and if so would you or your readers happen to have information on that usage?

In North Korea, it's a dire crime to speak like a South Korean, part 2

July 3, 2023 @ 10:10 am · Filed by Victor Mair under Accents, Censorship, Language and politics, Language and the law

This is a language war that has been going on for years, and there will never be an end to it, so long as there is a communist North Korea and a democratic South Korea. It is as deadly as a shooting war, because people die for using the language of the enemy. I'm not talking about the content of their speech, but rather its very nature.

North Koreans face execution for using South Korean idioms

The Times (6/30/23)

How does this work out in practice?

North Koreans who use the “obsequious” accent and expressions of South Korea face execution under a harsh new law aimed at eliminating South Korea's growing influence on the language used by its communist neighbour.

Xi Jinping's faux classicism

July 2, 2023 @ 12:32 pm · Filed by Victor Mair under Grammar, Rhetoric, Style and register

This new article in The Economist (6/29/23) has a familiar ring to it:

To understand Xi Jinping, it helps to be steeped in the classics

China’s leader has invented a phrase—and an image

Take four Chinese characters, all of them in everyday use. Put them in a certain order and, lo, they become a phrase that looks like classical Chinese—the kind of language used by the literati of yore. The idea they convey could be expressed just as succinctly in colloquial Chinese, but the classical style has gravitas. And it is a phrase loved by Xi Jinping, China’s leader, so all must follow suit.

More than any of his predecessors, Mr Xi likes to spice up his speeches with quotations from classical literature, especially poetry and philosophy. It fits one of his stated missions: instilling “cultural self-confidence” (alongside confidence in the political system). And it helps to buff up his image. In Chinese history, rulers were expected to be erudite. Two volumes have been published providing explanations of Mr Xi’s classical aphorisms.

Antakshari recitation in India

July 1, 2023 @ 8:10 am · Filed by Victor Mair under Language and entertainment, Language and literature, Language play, Memorization

This is part of a long series of Language Log posts in which we pondered the phenomenal memorization skills of persons of Indian heritage (see "Selected readings" below).

So you know what's happening in the following astonishing video, let me begin by giving a basic definition, etymology, and explication of what happens in this intricate word game:

Antakshari, also known as Antyakshari (अंताक्षरी transl. The game of the ending letter) is a spoken parlor game played in India. Each contestant sings the first verse of a song (often Classical Hindustani or Bollywood songs) that begins with the consonant of the Hindi alphabet on which the previous contestant's song ended.

The word is derived from two Sanskrit words: antya (अन्त्य) meaning end + akshara (अक्षर) meaning letter of the alphabet. When these words are combined and an '-i' suffixed, the term means "The game of the ending letter". Due to schwa syncope in Hindi and other Indo-Aryan languages, Antyakshari is pronounced antakshri. A dialectal variation of the word is इन्ताक्षरी or intakshri.

The spiny terminological conundrum of ekhidna and ekhinos

June 30, 2023 @ 2:15 pm · Filed by Victor Mair under Classification, Language and biology

[This is a guest post by Stewart Nicol]

Greek particles

I am a zoologist and comparative physiologist who has worked extensively on the monotremes, the platypus and the echidna. I have been putting together some notes on the naming of these animals. After originally being placed in the genus Myrmecophaga with the other, totally unrelated, anteaters, the echidna was given the specific name Myrmecophaga aculeata (prickly anteater) by George Shaw in 1792. It was named Echidna histrix by Georges Cuvier, misspelling Hystrix (Greek for porcupine). In 1811 Johann Illiger published an overhaul of the Linnaean system and replaced Cuvier’s genus name Echidna with Tachyglossus (fast tongue), making the full binomial Tachyglossus aculeatus. The genus name Echidna would have had priority, but it had previously been applied to a genus of moray eels, so the echidna became Tachyglossus aculeatus, though it is still popularly known as the echidna. Cuvier doesn’t say why he used the name echidna, but the general assumption is that it alludes to a monster in Greek mythology, ἔχιδνα or ekhidna, half woman (mammal) and half snake (reptile), because the echidna was believed to combine characteristics of reptiles and mammals. Unfortunately, the word ekhidna is very similar to ekhinos (ἐχῖνος), the Ancient Greek word for hedgehog, which appears in the names echinoderm and echinacea because they have spines. This similarity has given rise to the misapprehension that the name echidna means spiny.

The AI threat: keep calm and carry on

June 29, 2023 @ 1:36 pm · Filed by Victor Mair under Artificial intelligence, Language and education

Three weekends ago, I delivered a keynote here:

New Directions in Chinese Language Education in the 21st Century

The Eighth International Conference on Teaching Chinese as a Second Language

Swarthmore College, June 9-10, 2023

———–

Abbreviations:

AI — Artificial Intelligence

DT — Digital Technology

IT — Information Technology

DH — Digital Humanities

AGI — Artificial General Intelligence, where machines supposedly can accomplish any intellectual task that a human can (to me that's a pipe dream)

(given for present and future reference and use)

Title "Aspects of AI and digital technologies in Chinese language teaching"

Abstract

In recent decades, language processing hardware and software have progressed at an astonishing rate, one that is geometric rather than arithmetic. The opportunities these advances offer and the challenges they pose require our thoughtful attention and careful response, lest the machines get out of control and affect our students in detrimental ways. DeepL, ChatGPT, and other constantly evolving technologies possess enormous power to manipulate language, power that we can utilize for the enhancement of Chinese language pedagogy. On the other hand, we must monitor and adapt this potential in such a manner that it fits our purposes and meets the needs of our students.

Cooperative creation with Generative AI

June 29, 2023 @ 8:59 am · Filed by Mark Liberman under Artificial intelligence, Language and art

A couple of weeks ago, John Hansen tried "an experiment to see if I could successfully combine random and seemingly unconnected topics into one poem", and reported the results on Medium. This experiment was quickly reproduced by Adrian CDTPPW, Block Wife, and Robert G. Longpré.

Flash sale

June 28, 2023 @ 5:06 pm · Filed by Victor Mair under Idioms, Language and advertising, Language and business, Lost in translation, Writing

Ben Zimmer spotted this interesting street sign in the New York Times photo essay, "DMs from New York City" (June 26, 2023).

Today I learned a new word

June 28, 2023 @ 12:46 pm · Filed by Mark Liberman under Words words words

The new-to-me word: assembloid.

It occurred in the second (of 20!) bullet points that the blurb for a new publication, Brain Organoid & Systems Neuroscience Journal, lists under the heading

Specific areas of interest include, but are not limited to:

Brain organogenesis and Neuronal cultures
Methods for generating brain assembloids
