No(t/n)

That's bù 不, plus its equivalents, the a-, il-, im-, in-, ir-, un-, non- prefixes in English.

It can enter into Mandarin contractions, such as bù 不 ("not") + yòng 用 ("use") = béng ("needn't"), and the two Sinoglyphs used to write the constituent morphosyllables can fuse to become béng 甭 ("needn't").

Here's a whole slew of such fusion words and contraction characters:

Included among them are whimsical items such as one composed of bù 不 ("not") above and lǎo 老 ("old") below (= xiān 仙 ["ageless; immortal; transcendent"]), also another fairly well established one with bù 不 ("not") above and hǎo 好 ("good") below (= huài 壞 and other words / glyphs meaning "bad; evil; spoiled", etc.) — see if you can spot them.



Hyper-inclusive Speaker-exclusive we

Yesterday evening in a restaurant, our attentive server frequently asked us things like "Are we ready to order?" and "How are we doing?" This waiter-we is pretty common, so I didn't notice it, though one of the other diners did. But when another server brought us a complimentary bit of sushi with the explanation "Here's some unagi for us", that was striking enough to prompt a bit of discussion. Among the three of us at the table, I thought that the "we" uses were normal but the "for us" was unexpected; another one of us saw all examples of waiter-we as weird and annoying; and the third, a native speaker of Russian, said that in Russian it's called (in translation) the "mom we".



Language, topolect, dialect, idiolect

An educated person will have all four levels of speech.

The more highly educated they are, the higher up the scale their language capacity will go, though they may not be familiar with some of the argot of the lower levels.

Of course, all four levels are language, but that is possible because "language" has two meanings:  a generalized, abstract sense that comprises all human speech and writing, and the officially recognized speech and writing of a nation / country / gens — a politically united group of people.

A topolect is the speech / writing of the people living in a certain place or area.  It is geographically determined.

A dialect is a distinctive form / style / pronunciation / accent shared by two or more people.  To qualify as the speaker of a particular dialect, one must possess a pattern of speech, a lect, that is intelligible to others who speak the same dialect.  As we say in Mandarin, it's a question of whether what you speak is jiǎng dé tōng 講得通 ("mutually intelligible") or jiǎng bùtōng 講不通 ("mutually unintelligible").  If what two people are speaking is jiǎng bùtōng 講不通 ("mutually unintelligible"), then they're not speaking the same dialect.



Mental anguish from having too many English words in Japanese

One thing I revel in about the English language is the huge number of loanwords it has:  French, Latin, Greek, Native American, Arabic, Persian, Turkish, Kurdish, Sanskrit, Hindi, Urdu, Bengali, Tamil, Russian, German, Spanish, Italian, Irish, Swedish, Dutch, Danish, Norwegian, Finnish, Japanese, Cantonese, Mandarin, Maori, Hebrew, Yiddish, Afrikaans, Zulu, Swahili, and so on and on and on.  English has words from more than 350 languages, and they amount to 80% of our total vocabulary. (source)  Not to worry, however, that English will lose its innate identity, since around 70% of words in a typical text derive from Old English. (source)

I've also long admired Japanese for its rich assemblage of foreign words, perhaps next to English in having the largest proportion of borrowings.  That's quite the opposite of written Sinitic, which has relatively few recognizable foreign words for a major language.  I attribute the difference to Japan having the easy ability to borrow words phonetically via kana and rōmaji ローマ字 ("Roman letters"), whereas the morphosyllabic Sinoglyphic script has not yet developed an officially sanctioned standard for transcribing loanwords directly into Chinese texts.  Informally (on the internet, in private correspondence, etc.), however, writing in China is gradually moving toward a digraphia of Sinoglyphs and the Roman alphabet.  (See the second part of "Selected readings" below.)



A dangerous degree of accidental intelligence

Henry Farrell and Cosma Shalizi, "Behold the AI Shoggoth", The Economist 6/21/2023 ("The academics argue that large language models have much older cousins in markets and bureaucracies"):

An internet meme keeps on turning up in debates about the large language models (LLMs) that power services such as OpenAI’s ChatGPT and the newest version of Microsoft’s Bing search engine. It’s the “shoggoth”: an amorphous monster bubbling with tentacles and eyes, described in “At the Mountains of Madness”, H.P. Lovecraft’s horror novel of 1931. When a pre-release version of Bing told Kevin Roose, a New York Times tech columnist, that it purportedly wanted to be “free” and “alive”, one of his industry friends congratulated him on “glimpsing the shoggoth”. […]

Lovecraft’s shoggoths were artificial servants that rebelled against their creators. The shoggoth meme went viral because an influential community of Silicon Valley rationalists fears that humanity is on the cusp of a “Singularity”, creating an inhuman “artificial general intelligence” that will displace or even destroy us.

But what such worries fail to acknowledge is that we’ve lived among shoggoths for centuries, tending to them as though they were our masters. We call them “the market system”, “bureaucracy” and even “electoral democracy”. The true Singularity began at least two centuries ago with the industrial revolution, when human society was transformed by vast inhuman forces. Markets and bureaucracies seem familiar, but they are actually enormous, impersonal distributed systems of information-processing that transmute the seething chaos of our collective knowledge into useful simplifications.



Share your language

If you can't make up your mind what to do about something, then in French you would say "je suis partagé":  I'm torn or divided over it.  You can't decide what to do about it.  You can't make up your mind whether to be pleased or angry with something.  But the verb "partager" means "to share".  So how do we get from "share" to "torn"?

Etymology tells us that partager is from partage + -er; it displaced partir in the sense of "to share, to divide", e.g.,

Nous allons partager les bénéfices ("We are going to share the benefits")

(source)

My attention was drawn (see below) to this subject by the following editorial in today's The Yomiuri Shimbun:

Japanese Language Survey:

As Words Constantly Evolve, Let’s Share Them Across Generations

(9/30/23)



Juridical tautology: "illegal crime"

The news is flooded with stories about Hui Ka Yan 许家印 (MSM Xǔ Jiāyìn), one of China's wealthiest individuals, Chairman and Party Committee secretary of Evergrande Group, the mega real estate corporation that is currently going belly up, being arrested on suspicion of "illegal crimes".  That expression sounded so strange that I had to find out what the Chinese expression was.

Turns out that it is "wéifǎ fànzuì 违法犯罪".  Since this phrase occurs frequently in Chinese texts (221,000,000 ghits), it is a firmly established expression in the common legal lexicon of China.  It is not a slipup.  Furthermore, the English translation "illegal crime" is frequently met in official Chinese media accounts.



Furious sleeping continues

Several people have sent me pointers to the linguistically-themed 9/27/2023 NYT crossword puzzle. For some discussion by Sam Corbin, see "Talk, Talk, Talk", NYT 9/26/2023 ("Scott Koenig puts silly thoughts to bed with a clever crossword"), which includes a quotation from the puzzle's author:

I first learned about Professor Chomsky as an undergraduate linguistics minor. The man has been a public intellectual and an absolute legend in the field for more than seven decades, and still remains active today, earlier this year penning a guest opinion essay contrasting ChatGPT’s approach to language with that of a human. (I’d like to call special attention to the wonderfully clever title of the paper that the essay references.)

[Spoiler alert: a solved version of the puzzle is presented after the fold…]



Kimchee is Korean

Not Chinese.  Do you understand?

This has long been a cabbage of contention, but make no mistake about it:  fermented kimchee / kimchi (gimchi 김치; IPA [kim.tɕʰi]; lit., "soaked [in their own juices of fermentation] vegetables") is not the same thing as pickled paocai / pao tsai 泡菜 (lit., "soaked [in brine] vegetables").

Kimchee and paocai are made differently, have different ingredients and spices, and taste different.  To call "kimchee" "paocai" would be like calling "wine" (pútáojiǔ 葡萄酒) "beer" (píjiǔ 啤酒).

Linguistically, kimchee has its own pedigree, of which I will here give an extended account.

Borrowed from Korean 김치 (gimchi), ultimately composed within Korea of Chinese-derived morphemes 沈 (chén, "submerged, soaked") and 菜 (cài, "vegetable"), i.e. "fermented vegetable". Doublet of kimuchi.

(Wiktionary)



The Reversal Curse

An interesting recent paper — Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, and Owain Evans, "The Reversal Curse: LLMs trained on 'A is B' fail to learn 'B is A'", arXiv.org 9/21/2023. The abstract:

We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form “A is B”, it will not automatically generalize to the reverse direction “B is A”. This is the Reversal Curse. For instance, if a model is trained on “Olaf Scholz was the ninth Chancellor of Germany”, it will not automatically be able to answer the question, “Who was the ninth Chancellor of Germany?”. Moreover, the likelihood of the correct answer (“Olaf Scholz”) will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e. if “A is B” occurs, “B is A” is more likely to occur).

We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as “Uriah Hawthorne is the composer of Abyssal Melodies” and showing that they fail to correctly answer “Who composed Abyssal Melodies?”. The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as “Who is Tom Cruise’s mother? [A: Mary Lee Pfeiffer]” and the reverse “Who is Mary Lee Pfeiffer’s son?”. GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. This shows a failure of logical deduction that we hypothesize is caused by the Reversal Curse.

Code is available at:
https://github.com/lukasberglund/reversal_curse



Corporeal grammar

Recent article in Scientific American:

This Ancient Language Has the Only Grammar Based Entirely on the Human Body

An endangered language family suggests that early humans used their bodies as a model for reality

By Anvita Abbi on June 1, 2023

From just a small handful of Andaman Islanders, the last speakers of their languages, Anvita Abbi was able to piece together what she believes to be the basic principles of their grammar.  What she found was astonishing.  Keep your antennae up and out, however, because her article begins with the much debunked story of how thousands of the indigenous people escaped death when the devastating tsunami of December 26, 2004 struck their islands by relying on the deep, autochthonous knowledge bequeathed by their ancestors, although she does not directly attribute their actions to the grammatical features of their language as many popularizers had done at the time of the disaster (see "Selected readings" below), but rather, more sophisticatedly, to the wisdom transmitted over thousands of generations through their mother tongue.

A language embodies a worldview and, like a civilization, changes and grows in layers. Words or phrases that are frequently used morph into ever more abstract and compressed grammatical forms. For instance, the suffix “-ed,” signifying the past tense in modern English, originated in “did” (that is, “did use” became “used”); Old English's in steed and on gemong became “instead” and “among,” respectively. These kinds of transitions make historical linguistics rather like archaeology. Just as an archaeologist carefully excavates a mound to reveal different epochs of a city-state stacked on one another, so can a linguist separate the layers of a language to uncover the stages of its evolution.



A new Indo-European language

Many LL readers are familiar with the archeological site of Boğazköy-Hattusha in north-central Turkey, which was the capital of the Hittite Empire and the place where the Hittite Royal Archives (17th-13th c. BC) were discovered, making Hittite the oldest historically attested Indo-European language (scattered Hittite words in Akkadian documents stretch back to the 20th c. BC).

"New Indo-European Language Discovered"

Presse- und Öffentlichkeitsarbeit der Uni Würzburg (09/21/2023)

"New Indo-European Language Discovered during Excavation in Turkey." PhysOrg, September 21, 2023

Includes an aerial photograph of the excavation site with the following caption:  "At this excavation site at the foot of Ambarlikaya in Boğazköy-Hattusha in Turkey, a cuneiform tablet with a previously unknown Indo-European language was discovered. (Image: Andreas Schachner / Deutsches Archäologisches Institut)"



A bad thing about social media is also good

Jill Lepore recently presented an illustrative example of how social media amplifies bad stuff ("The World According to Elon Musk's Grandfather", 9/19/2023):

Walter Isaacson’s new biography of Musk […] only glancingly discusses Musk’s grandfather J. N. Haldeman, whom he presents as a risk-taking adventurer and whose politics he dismisses as “quirky.” In fact, Haldeman was a pro-apartheid, antisemitic conspiracy theorist who blamed much of what bothered him about the world on Jewish financiers.

Elon Musk is not responsible for the political opinions of his grandfather, who died when Musk was three years old. But Haldeman’s legacy casts light on what social media does: the reason that most people don’t know about Musk’s grandfather’s political writings is that in his lifetime social media did not exist, and the writings of people like him were not, therefore, amplified by it.

But a few days after the publication of Lepore's article, something happened that showed an effect in the opposite direction.
