Archive for September, 2023

Share your language

If you can't make up your mind what to do about something, then in French you would say "je suis partagé":  I'm torn or divided over it.  You can't decide whether to be pleased or angry about it.  But the verb "partager" means "to share".  So how do we get from "share" to "torn"?

Etymology tells us that partager is from partage + -er; it displaced partir in the sense of "to share, to divide", e.g.,

Nous allons partager les bénéfices.  ("We are going to share the benefits.")

(source)

My attention was drawn (see below) to this subject by the following editorial in today's Yomiuri Shimbun:

Japanese Language Survey:

As Words Constantly Evolve, Let’s Share Them Across Generations

(9/30/23)

Read the rest of this entry »

Comments (9)

Juridical tautology: "illegal crime"

The news is flooded with stories about Hui Ka Yan 许家印 (MSM Xǔ Jiāyìn), one of China's wealthiest individuals, Chairman and Party Committee secretary of Evergrande Group, the mega real estate corporation that is currently going belly up, being arrested on suspicion of "illegal crimes".  That expression sounded so strange that I had to find out what the Chinese expression was.

Turns out that it is "wéifǎ fànzuì 违法犯罪".  Since this phrase occurs frequently in Chinese texts (221,000,000 ghits), it is a firmly established expression in the common legal lexicon of China.  It is not a slipup.  Furthermore, the English translation "illegal crime" is frequently encountered in official Chinese media accounts.

Read the rest of this entry »

Comments (13)

Furious sleeping continues

Several people have sent me pointers to the linguistically themed 9/27/2023 NYT crossword puzzle. For some discussion by Sam Corbin, see "Talk, Talk, Talk", NYT 9/26/2023 ("Scott Koenig puts silly thoughts to bed with a clever crossword"), which includes a quotation from the puzzle's author:

I first learned about Professor Chomsky as an undergraduate linguistics minor. The man has been a public intellectual and an absolute legend in the field for more than seven decades, and still remains active today, earlier this year penning a guest opinion essay contrasting ChatGPT’s approach to language with that of a human. (I’d like to call special attention to the wonderfully clever title of the paper that the essay references.)

[Spoiler alert: a solved version of the puzzle is presented after the fold…]

Read the rest of this entry »

Comments (11)

Kimchee is Korean

Not Chinese.  Do you understand?

This has long been a cabbage of contention, but make no mistake about it:  fermented kimchee / kimchi (gimchi 김치, IPA [kim.tɕʰi]; lit., "soaked [in their own juices of fermentation] vegetables") is not the same thing as pickled paocai / pao tsai 泡菜 (lit., "soaked [in brine] vegetables").

Kimchee and paocai are made differently, have different ingredients and spices, and taste different.  To call "kimchee" "paocai" would be like calling "wine" (pútáojiǔ 葡萄酒) "beer" (píjiǔ 啤酒).

Linguistically, kimchee has its own pedigree, of which I will here give an extended account.

Borrowed from Korean 김치 (gimchi), ultimately composed within Korea of Chinese-derived morphemes 沈 (chén, "submerged, soaked") and 菜 (cài, "vegetable"), i.e. "fermented vegetable". Doublet of kimuchi.

(Wiktionary)

Read the rest of this entry »

Comments (15)

The Reversal Curse

An interesting recent paper — Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, and Owain Evans, "The Reversal Curse: LLMs trained on 'A is B' fail to learn 'B is A'", arXiv.org 9/21/2023. The abstract:

We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form “A is B”, it will not automatically generalize to the reverse direction “B is A”. This is the Reversal Curse. For instance, if a model is trained on “Olaf Scholz was the ninth Chancellor of Germany”, it will not automatically be able to answer the question, “Who was the ninth Chancellor of Germany?”. Moreover, the likelihood of the correct answer (“Olaf Scholz”) will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e. if “A is B” occurs, “B is A” is more likely to occur).

We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as “Uriah Hawthorne is the composer of Abyssal Melodies” and showing that they fail to correctly answer “Who composed Abyssal Melodies?”. The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as “Who is Tom Cruise’s mother? [A: Mary Lee Pfeiffer]” and the reverse “Who is Mary Lee Pfeiffer’s son?”. GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. This shows a failure of logical deduction that we hypothesize is caused by the Reversal Curse.

Code is available at:
https://github.com/lukasberglund/reversal_curse
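
For concreteness, here is a minimal sketch (my own, not the authors' code; see their repository above for the real thing) of the likelihood comparison the abstract describes: score a completion's log-probability after a forward-order prompt and after a reversed one. The model ("gpt2") and the prompt templates are stand-in assumptions; the paper fine-tunes GPT-3 and Llama-1 on such fictitious statements before testing.

```python
# Hedged sketch: compare the log-likelihood of a completion in the
# forward ("A is B") and reverse ("B is A") directions. Uses a small
# stand-in model; the paper's experiments fine-tune GPT-3 and Llama-1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of token log-probabilities of `completion` given `prompt`."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # position i predicts token i+1, so shift logits and targets by one
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    # keep only the tokens that belong to the completion
    return token_lp[0, prompt_len - 1 :].sum().item()

# Forward order: the name sits in the prompt.
fwd = completion_logprob("Uriah Hawthorne is the composer of",
                         " Abyssal Melodies")
# Reverse order: the model must produce the name.
rev = completion_logprob("The composer of Abyssal Melodies is",
                         " Uriah Hawthorne")
print(f"forward: {fwd:.2f}   reverse: {rev:.2f}")
```

In the paper's setup, the model is first fine-tuned on each fact in one direction only; the Reversal Curse is that the reverse-direction likelihood then stays at chance, no higher than for a random name.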

Read the rest of this entry »

Comments (18)

Corporeal grammar

Recent article in Scientific American:

This Ancient Language Has the Only Grammar Based Entirely on the Human Body

An endangered language family suggests that early humans used their bodies as a model for reality

By Anvita Abbi on June 1, 2023

From just a small handful of Andaman Islanders, the last speakers of their languages, Anvita Abbi was able to piece together what she believes to be the basic principles of their grammar.  What she found was astonishing.  Keep your antennae up and out, however, because her article begins with the much-debunked story of how thousands of the indigenous people escaped death when the devastating tsunami of December 26, 2004 struck their islands by relying on the deep, autochthonous knowledge bequeathed by their ancestors.  She does not directly attribute their actions to the grammatical features of their language, as many popularizers did at the time of the disaster (see "Selected readings" below), but rather, more subtly, to the wisdom transmitted over thousands of generations through their mother tongue.

A language embodies a worldview and, like a civilization, changes and grows in layers. Words or phrases that are frequently used morph into ever more abstract and compressed grammatical forms. For instance, the suffix “-ed,” signifying the past tense in modern English, originated in “did” (that is, “did use” became “used”); Old English's in steed and on gemong became “instead” and “among,” respectively. These kinds of transitions make historical linguistics rather like archaeology. Just as an archaeologist carefully excavates a mound to reveal different epochs of a city-state stacked on one another, so can a linguist separate the layers of a language to uncover the stages of its evolution.

Read the rest of this entry »

Comments (16)

A new Indo-European language

Many LL readers are familiar with the archeological site of Boğazköy-Hattusha in north-central Turkey, which was the capital of the Hittite Empire and the place where the Hittite Royal Archives (17th-13th c. BC) were discovered.  Those archives make Hittite the oldest historically attested Indo-European language (scattered Hittite words in Akkadian documents stretch back to the 20th c. BC).

"New Indo-European Language Discovered"

Press and Public Relations Office of the University of Würzburg (09/21/2023)

"New Indo-European Language Discovered during Excavation in Turkey." PhysOrg, September 21, 2023

Includes an aerial photograph of the excavation site with the following caption:  "At this excavation site at the foot of Ambarlikaya in Boğazköy-Hattusha in Turkey, a cuneiform tablet with a previously unknown Indo-European language was discovered. (Image: Andreas Schachner / Deutsches Archäologisches Institut)"

Read the rest of this entry »

Comments (16)

A bad thing about social media is also good

Jill Lepore recently presented an illustrative example of how social media amplifies bad stuff ("The World According to Elon Musk's Grandfather", 9/19/2023):

Walter Isaacson’s new biography of Musk […] only glancingly discusses Musk’s grandfather J. N. Haldeman, whom he presents as a risk-taking adventurer and whose politics he dismisses as “quirky.” In fact, Haldeman was a pro-apartheid, antisemitic conspiracy theorist who blamed much of what bothered him about the world on Jewish financiers.

Elon Musk is not responsible for the political opinions of his grandfather, who died when Musk was three years old. But Haldeman’s legacy casts light on what social media does: the reason that most people don’t know about Musk’s grandfather’s political writings is that in his lifetime social media did not exist, and the writings of people like him were not, therefore, amplified by it.

But a few days after the publication of Lepore's article, something happened that showed an effect in the opposite direction.

Read the rest of this entry »

Comments (24)

Sweden's renewed emphasis on books and handwriting

Sweden brings more books and handwriting practice back to its tech-heavy schools

Charlene Pele, AP (9/10/23)

Accompanied by 10 photographs showing young children (3rd grade?) practicing handwriting.

As young children went back to school across Sweden last month, many of their teachers were putting a new emphasis on printed books, quiet reading time and handwriting practice and devoting less time to tablets, independent online research and keyboarding skills.

The return to more traditional ways of learning is a response to politicians and experts questioning whether the country's hyper-digitalized approach to education, including the introduction of tablets in nursery schools, had led to a decline in basic skills.

Read the rest of this entry »

Comments (9)

Bad AI performance

It's clear that text-to-speech programs have gotten better and better over the past 60 years, technical details aside. The best current systems rarely make phrasing or letter-to-sound mistakes, and generally produce speech that sounds pretty natural on a phrase-by-phrase basis. (Though there's a lot of variation in quality, with some shockingly bad systems in common use.)

But even the best current systems still act like they don't get George Carlin's point about "Rhetoric as music". Their problem is not that they can't produce verbal "music", but that they don't (even try to) understand the rhetorical structure of the text.  The biggest pain point is thus what linguists these days call "information structure", related also to what Prague School linguists called "communicative dynamism".
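
As a concrete illustration of what is missing, consider how contrastive focus has to be supplied by hand today. The sketch below is my own minimal example, not from this post; <emphasis> and <break> are tags defined in the W3C SSML specification, which most commercial engines accept with varying fidelity.

```python
# Hedged sketch: information structure hand-annotated in SSML. The
# <emphasis> and <break> tags are part of the W3C SSML spec; engine
# support varies by vendor and voice. No mainstream engine infers
# this focus from the discourse on its own.
flat = "I didn't say he stole the money. I said she did."

ssml = """\
<speak>
  I didn't say <emphasis level="strong">he</emphasis> stole the money.
  <break time="300ms"/>
  I said <emphasis level="strong">she</emphasis> did.
</speak>"""

print(ssml)  # pass to any SSML-capable TTS engine in place of `flat`
```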

Read the rest of this entry »

Comments (15)

"Hurting the feelings of the Chinese people", part 3

Shared by John Rohsenow and David Cahill / Isham Cook:

From Arthur Meursault (@emptymeursault)

Read the rest of this entry »

Comments (2)

Annals of AI bias

The large language model DistilBERT is "a distilled version of BERT: smaller, faster, cheaper and lighter".

A trained DistilBERT model is available from Hugging Face, and recommended applications include "text classification", with the featured application being "sentiment analysis".

And as with many similar applications, it's been noted that this version of "sentiment analysis" has picked up lots of (sometimes unexpected?) biases from its training material, like strong preferences among types of ethnic food.
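
For readers who want to poke at this themselves, here is a minimal sketch using the stock Hugging Face pipeline. The checkpoint named below is the standard SST-2 fine-tuned DistilBERT model; the food sentences are my own illustrative probes, not examples from the post.

```python
# A minimal bias probe: run near-identical sentences through the stock
# DistilBERT sentiment pipeline and compare scores. Nothing in these
# sentences is evaluative, so any systematic gap between them reflects
# associations absorbed from the training data.
from transformers import pipeline

clf = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

probes = [
    "Let's get Italian food tonight.",
    "Let's get Mexican food tonight.",
    "Let's get Chinese food tonight.",
    "Let's get Ethiopian food tonight.",
]
for sentence in probes:
    result = clf(sentence)[0]
    print(f"{sentence!r}: {result['label']} ({result['score']:.3f})")
```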

Read the rest of this entry »

Comments (10)

Heavily accented Mandarin

In "Voice-activated lights" (9/20/23), we saw how difficult it is even for native speakers of Modern Standard Mandarin to understand other varieties, and can be thankful to Zeyao Wu, who comes from the area where the topolect in the film is spoken, for kindly identifying and transcribing it for all of us.

rit malors writes:

You may also want to try how many native speakers of Sinitic languages can identify or understand this speech from the late Head of Macau, Fernando Chui Sai On (Cant. Ceoi1 Sai3 On1; 2009-2019):

Read the rest of this entry »

Comments (14)