Occitan and Oenology
[This is a guest post by François Lang]
Read the rest of this entry »
"The Nobel literature prize goes to Norway’s Jon Fosse, who once wrote a novel in a single sentence"
…
While Fosse is the fourth Norwegian writer to get the Nobel literature prize, he is the first in nearly a century and the first who writes in Nynorsk, one of the two official written versions of the Norwegian language. It is used by just 10% of the country’s 5.4 million people, according to the Language Council of Norway, but completely understandable to users of the other written form, Bokmaal.
Guy Puzey, senior lecturer in Scandinavian Studies at the University of Edinburgh, said that Bokmaal is “the language of power, it’s the language of urban centers, of the press.” Nynorsk, by contrast, is used mainly by people in rural western Norway.
“So it’s a really big day for a minority language,” Puzey said.
Read the rest of this entry »
As I explained here in February of this year:
One time on an expedition around the western part of the Taklamakan Desert in the center of Asia more than a decade ago, the Chinese driver played Lady Gaga's "Poker Face" scores of times. He had other discs, but he only played that song, and he played it over and over and over again. I liked it the first 10-15 times I heard it, but after that it started to drive me insane, and finally I had to tell him to stop. He was not happy. Then, a few hours later or the next day, he would launch the Lady Gaga "Poker Face" litany all over again.
(slightly modified)
There was one phrase that Lady Gaga repeated more than a dozen times (actually twenty), and I had no idea what she was saying. I listened as hard as I could, but the best I could make out was "Keereezmy, keereezmy", though sometimes I thought it was "Kill his mind, kill his mind".
Since that time, I've probably heard the same song another thirty or forty times, and the line in question still sounds like "Keereezmy, keereezmy" or "Kill his mind, kill his mind".
Read the rest of this entry »
That's bù 不 ("not"), which corresponds roughly to the a-, il-, im-, in-, ir-, un-, and non- prefixes in English.
It can enter into Mandarin contractions, such as bù 不 ("not") + yòng 用 ("use") = béng ("needn't"), and the two Sinoglyphs used to write the constituent morphosyllables can fuse to become béng 甭 ("needn't").
Here's a whole slew of such fusion words and contraction characters:
Ha, I've long been wanting to make a tweet about all those fantastic character combinations with 不: 甭、孬、歪、覔、 丕、奀… And now @edwardW2 dropped me these *amazing* dict. pages (from 海篇心鏡) with tons of those including funky ones like ⿳不成當 and ⿱不⿰安人! 😁 https://t.co/Va7JC3P1Js pic.twitter.com/y6ZeO0PR6W
— Egas Moniz-Bandeira ᠡᡤᠠᠰ ᠮᠣᠨᠢᠰ ᠪᠠᠨᡩ᠋ᠠᠶᠢᠷᠠ (@egasmb) September 30, 2023
Included among them are whimsical items such as one composed of bù 不 ("not") above and lǎo 老 ("old") below (= xiān 仙 ["ageless; immortal; transcendent"]), and another, fairly well established, with bù 不 ("not") above and hǎo 好 ("good") below (= huài 壞 and other words / glyphs meaning "bad; evil; spoiled", etc.). See if you can spot them.
Read the rest of this entry »
Yesterday evening in a restaurant, our attentive server frequently asked us things like "Are we ready to order?" and "How are we doing?". This waiter-we is pretty common, so I didn't notice it, though one of the other diners did. But when another server brought us a complimentary bit of sushi with the explanation "Here's some unagi for us", that was striking enough to prompt a bit of discussion. Among the three of us at the table, I thought that the "we" uses were normal but the "for us" was unexpected; another one of us saw all examples of waiter-we as weird and annoying; and the third, a native speaker of Russian, said that in Russian it's called (in translation) the "mom we".
Read the rest of this entry »
An educated person will have all four levels of speech.
The more highly educated they are, the higher up the scale their language capacity will go, though they may not be familiar with some of the argot of the lower levels.
Of course, all four levels are language, but that is possible because "language" has two meanings: a generalized, abstract sense that comprises all human speech and writing, and the officially recognized speech and writing of a nation / country / gens — a politically united group of people.
A topolect is the speech / writing of the people living in a certain place or area. It is geographically determined.
A dialect is a distinctive form / style / pronunciation / accent shared by two or more people. To qualify as the speaker of a particular dialect, one must possess a pattern of speech, a lect, that is intelligible to others who speak the same dialect. As we say in Mandarin, it's a question of whether what you speak is jiǎng dé tōng 講得通 ("mutually intelligible") or jiǎng bùtōng 講不通 ("mutually unintelligible"). If what two people are speaking is jiǎng bùtōng 講不通 ("mutually unintelligible"), then they're not speaking the same dialect.
Read the rest of this entry »
One thing I revel in about the English language is the huge number of loanwords it has: French, Latin, Greek, Native American, Arabic, Persian, Turkish, Kurdish, Sanskrit, Hindi, Urdu, Bengali, Tamil, Russian, German, Spanish, Italian, Irish, Swedish, Dutch, Danish, Norwegian, Finnish, Japanese, Cantonese, Mandarin, Maori, Hebrew, Yiddish, Afrikaans, Zulu, Swahili, and so on and on and on. English has words from more than 350 languages, and they amount to 80% of our total vocabulary. (source) Not to worry, however, that English will lose its innate identity, since around 70% of words in a typical text derive from Old English. (source)
I've also long admired Japanese for its rich assemblage of foreign words, perhaps next to English in having the largest proportion of borrowings. That's quite the opposite of written Sinitic, which has relatively few recognizable foreign words for a major language. I attribute the difference to Japan having the easy ability to borrow words phonetically via kana and rōmaji ローマ字 ("Roman letters"), whereas the morphosyllabic Sinoglyphic script has not yet developed an officially sanctioned standard for transcribing loanwords directly into Chinese texts. Informally (on the internet, in private correspondence, etc.), however, writing in China is gradually moving toward a digraphia of Sinoglyphs and the Roman alphabet. (See the second part of "Selected readings" below.)
Read the rest of this entry »
Henry Farrell and Cosma Shalizi, "Behold the AI Shoggoth", The Economist 6/21/2023 ("The academics argue that large language models have much older cousins in markets and bureaucracies"):
An internet meme keeps on turning up in debates about the large language models (LLMs) that power services such as OpenAI’s ChatGPT and the newest version of Microsoft’s Bing search engine. It’s the “shoggoth”: an amorphous monster bubbling with tentacles and eyes, described in “At the Mountains of Madness”, H.P. Lovecraft’s horror novel of 1931. When a pre-release version of Bing told Kevin Roose, a New York Times tech columnist, that it purportedly wanted to be “free” and “alive”, one of his industry friends congratulated him on “glimpsing the shoggoth”. […]
Lovecraft’s shoggoths were artificial servants that rebelled against their creators. The shoggoth meme went viral because an influential community of Silicon Valley rationalists fears that humanity is on the cusp of a “Singularity”, creating an inhuman “artificial general intelligence” that will displace or even destroy us.
But what such worries fail to acknowledge is that we’ve lived among shoggoths for centuries, tending to them as though they were our masters. We call them “the market system”, “bureaucracy” and even “electoral democracy”. The true Singularity began at least two centuries ago with the industrial revolution, when human society was transformed by vast inhuman forces. Markets and bureaucracies seem familiar, but they are actually enormous, impersonal distributed systems of information-processing that transmute the seething chaos of our collective knowledge into useful simplifications.
Read the rest of this entry »
If you can't make up your mind what to do about something, then in French you would say "je suis partagé": I'm torn or divided over it. You can't decide what to do about it. You can't make up your mind whether to be pleased or angry with something. But the verb "partager" means "to share". So how do we get from "share" to "torn"?
Etymology tells us that partager is from partage + -er, and that it displaced partir in the sense of "to share, to divide", e.g.,
Nous allons partager les bénéfices ― We are going to share the benefits
(source)
My attention was drawn (see below) to this subject by the following editorial in today's The Yomiuri Shimbun:
As Words Constantly Evolve, Let’s Share Them Across Generations
(9/30/23)
Read the rest of this entry »
The news is flooded with stories about Hui Ka Yan 许家印 (MSM Xǔ Jiāyìn), one of China's wealthiest individuals, Chairman and Party Committee secretary of Evergrande Group, the mega real estate corporation that is currently going belly up, being arrested on suspicion of "illegal crimes". That expression sounded so strange that I had to find out what the Chinese expression was.
Turns out that it is "wéifǎ fànzuì 违法犯罪". Since this phrase occurs frequently in Chinese texts (221,000,000 ghits), it is a firmly established expression in the common legal lexicon of China. It is not a slipup. Furthermore, the English translation "illegal crime" is frequently met in official Chinese media accounts.
Read the rest of this entry »
Several people have sent me pointers to the linguistically-themed 9/27/2023 NYT crossword puzzle. For some discussion by Sam Corbin, see "Talk, Talk, Talk", NYT 9/26/2023 ("Scott Koenig puts silly thoughts to bed with a clever crossword"), which includes a quotation from the puzzle's author:
I first learned about Professor Chomsky as an undergraduate linguistics minor. The man has been a public intellectual and an absolute legend in the field for more than seven decades, and still remains active today, earlier this year penning a guest opinion essay contrasting ChatGPT’s approach to language with that of a human. (I’d like to call special attention to the wonderfully clever title of the paper that the essay references.)
[Spoiler alert: a solved version of the puzzle is presented after the fold…]
Read the rest of this entry »
Not Chinese. Do you understand?
This has long been a cabbage of contention, but make no mistake about it: fermented kimchee / kimchi (gimchi 김치, IPA [kim.tɕʰi]; lit., "soaked [in their own juices of fermentation] vegetables") is not the same thing as pickled paocai / pao tsai 泡菜 (lit., "soaked [in brine] vegetables").
Kimchee and paocai are made differently, have different ingredients and spices, and taste different. To call "kimchee" "paocai" would be like calling "wine" (pútáojiǔ 葡萄酒) "beer" (píjiǔ 啤酒).
Linguistically, kimchee has its own pedigree, of which I will here give an extended account.
Borrowed from Korean 김치 (gimchi), ultimately composed within Korea of Chinese-derived morphemes 沉 (chén, “submerged, soaked”) and 菜 (cài, “vegetable”), i.e. "fermented vegetable". Doublet of kimuchi.
Read the rest of this entry »
An interesting recent paper: Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, and Owain Evans, "The Reversal Curse: LLMs trained on 'A is B' fail to learn 'B is A'", arXiv.org 9/21/2023. The abstract:
We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form “A is B”, it will not automatically generalize to the reverse direction “B is A”. This is the Reversal Curse. For instance, if a model is trained on “Olaf Scholz was the ninth Chancellor of Germany”, it will not automatically be able to answer the question, “Who was the ninth Chancellor of Germany?”. Moreover, the likelihood of the correct answer (“Olaf Scholz”) will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e. if “A is B” occurs, “B is A” is more likely to occur).
We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as “Uriah Hawthorne is the composer of Abyssal Melodies” and showing that they fail to correctly answer “Who composed Abyssal Melodies?”. The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as “Who is Tom Cruise’s mother? [A: Mary Lee Pfeiffer]” and the reverse “Who is Mary Lee Pfeiffer’s son?”. GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. This shows a failure of logical deduction that we hypothesize is caused by the Reversal Curse.
Code is available at:
https://github.com/lukasberglund/reversal_curse
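To make the forward / reverse comparison in the abstract concrete, here is a minimal Python sketch of how such a paired evaluation might be scored. It is not the authors' code (theirs is at the link above). The function names, the substring-matching rule, and the `toy_model` lookup table are all assumptions made for illustration; a real test would replace `toy_model` with calls to an actual language model.

```python
from typing import Callable, Dict, List, Tuple

# (child, parent) pairs; this one is the example quoted in the abstract.
PAIRS: List[Tuple[str, str]] = [("Tom Cruise", "Mary Lee Pfeiffer")]


def build_questions(child: str, parent: str) -> Tuple[Tuple[str, str], Tuple[str, str]]:
    """Return (question, expected answer) for the forward and reverse directions."""
    forward = (f"Who is {child}'s mother?", parent)
    reverse = (f"Who is {parent}'s son?", child)
    return forward, reverse


def accuracy_by_direction(
    pairs: List[Tuple[str, str]], ask: Callable[[str], str]
) -> Dict[str, float]:
    """Score a question-answering function separately in each direction.

    An answer counts as correct if it contains the expected name
    (a crude matching rule, assumed here for simplicity).
    """
    if not pairs:
        raise ValueError("need at least one pair")
    hits = {"forward": 0, "reverse": 0}
    for child, parent in pairs:
        forward, reverse = build_questions(child, parent)
        for direction, (question, expected) in (("forward", forward), ("reverse", reverse)):
            if expected.lower() in ask(question).lower():
                hits[direction] += 1
    return {direction: count / len(pairs) for direction, count in hits.items()}


def toy_model(question: str) -> str:
    """Hypothetical stand-in for an LLM: it only 'knows' the forward fact."""
    known = {"Who is Tom Cruise's mother?": "Mary Lee Pfeiffer"}
    return known.get(question, "I don't know")


print(accuracy_by_direction(PAIRS, toy_model))
# prints {'forward': 1.0, 'reverse': 0.0}
```

The toy lookup table is an extreme caricature of the asymmetry: it stores the fact under one key only, so the reverse question finds nothing. The 79% versus 33% figures reported for GPT-4 come from the paper's own evaluation, not from anything like this sketch.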
Read the rest of this entry »