The whole world is now thoroughly familiar with the name "Wuhan", whereas four months ago, only a small number of people outside of China would have heard of it. Since, two days ago, I posted about Dutch curses, many of which just so happen to be linked to diseases, I am prompted to dust off an old post that is about a colorful curse from Wuhan, which, by the way, is famous among all Chinese cities for the proclivity of its inhabitants to indulge in sharp-tongued imprecations at the slightest provocation. I myself have been witness to their talent in this art, at which the women are especially adept.
Thank you for your great explanation of the reasons behind the famous Kennedy "crisis" misquote. When I was in high school, I had a friend who was Chinese and spoke Mandarin fluently, who explained it to my US History class after the teacher quoted Kennedy. That was over 20 years ago and I remembered that his quote was wrong, but could not remember the explanation I was given well enough to explain it to someone else.
A fuller and more specific version of the title of this post would be "Chinese transcriptions of Indic terms in the translations of An Shigao (Chinese: 安世高; pinyin: Ān Shìgāo; Wade–Giles: An Shih-kao; Korean: An Sego; Japanese: An Seikō; Vietnamese: An Thế Cao) (fl. 148-180 CE) and Lokakṣema (लोकक्षेम, Chinese: 支婁迦讖; pinyin: Zhī Lóujiāchèn) (fl. 147-189 CE)".
With the collaboration of Jan Nattier, Nathan Hill was able to digitize some data from Han Buddhist transcriptions back in 2017 and has now published them as a dataset on Zenodo:
Hill, Nathan, Nattier, Jan, Granger, Kelsey, & Kollmeier, Florian. (2020). Chinese transcriptions of Indic terms in the translations of Ān Shìgāo 安世高 and Lokakṣema 支婁迦讖 [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3757095
In some ongoing research on linguistic features relating to clinical diagnosis and tracking, we've been looking at "lexical diversity". It's easy to measure the rate of vocabulary display: you can just use a type-token graph, which shows the count of distinct words ("types") against the count of total words ("tokens"). It's less obvious how to turn such a curve into a single number that can be compared across sources. For a survey of some alternative measures, see e.g. Scott Jarvis, "Short texts, best-fitting curves and new measures of lexical diversity", Language Testing 2002; and for the measure that we've settled on, see Michael Covington and Joe McFall, "Cutting the Gordian knot: The moving-average type–token ratio (MATTR)", Journal of Quantitative Linguistics 2010. More on that later.
For now, I want to make a point that depends only on type-token graphs. Over time, I've accumulated a small private digital corpus of more than 100 English-language fiction titles, from Tristram Shandy forward to 2019. It's clear that different authors have different characteristic rates of vocabulary display, and for today's post, I want to present the authors in my collection with the highest and lowest characteristic rates.
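For readers who want to try this on their own texts, here is a minimal Python sketch of the two quantities mentioned above: the type-token curve and a moving-average type-token ratio. It is not the code used for the analysis in this post. The function names, the crude regex tokenizer, and the default window size are all illustrative assumptions, and the MATTR function follows only the general idea of averaging the type-token ratio over every fixed-length window of the text.

```python
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def type_token_curve(tokens):
    # Return (tokens seen so far, distinct types seen so far) after each token.
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((i, len(seen)))
    return curve


def mattr(tokens, window=50):
    # Mean type-token ratio over all sliding windows of `window` tokens.
    # Texts no longer than the window fall back to the plain type-token ratio.
    n = len(tokens)
    if n == 0:
        return 0.0
    if n <= window:
        return len(set(tokens)) / n
    counts = Counter(tokens[:window])
    total = len(counts) / window
    for i in range(window, n):
        outgoing = tokens[i - window]
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]  # keep len(counts) equal to the type count
        counts[tokens[i]] += 1
        total += len(counts) / window
    return total / (n - window + 1)


if __name__ == "__main__":
    sample = tokenize("the cat saw the dog and the dog saw the cat")
    print(type_token_curve(sample)[-1])
    print(mattr(sample, window=4))
```

Plotting the second element of each pair in the curve against the first gives the type-token graph. The window size matters for the single-number summary: small windows mostly measure local repetition, so values are only comparable across texts when the same window is used.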
Yevgeny Basovskaya, a specialist on public speech at Moscow’s State University of the Humanities, says that the disease has had a "radical" influence on the way Russians speak their language. This begins with the word coronavirus, which has an "a" in the middle. This is "in complete violation of Russian orthographic rules".
Every day, I get several talk announcements from the various mailing lists that I subscribe to, which represent a rich array of disciplinary sources: linguistics, computer science, anthropology, sociology, communications, math, literary studies, marketing, and so on. Usually I can figure out from the title what the presentation is going to be about — but sometimes my first guess is wrong in an interesting way.
[VHM: This is a guest post by Chris Button. It will be primarily of interest to specialists in the phonological history of Sinitic. Since there are quite a few such scholars on Language Log, I expect that it will occasion the usual lively debate that follows posts on such subjects. It will also undoubtedly be of interest to historical phonologists in general, as well as to a broad spectrum of Sinologists and their colleagues focusing on other Asian cultures and languages.]
I've been thinking about the etymological associations of Hàn 漢. It's often reconstructed with an aspirated coronal nasal as *hn-, in spite of the Middle Chinese x- then being somewhat unexpected (Baxter and Sagart put it down to dialects), largely on the basis of the *n- in 難. But its etymological association with 艱 and its velar *k- make this problematic. A regular source of MC x- would be *hŋ-, which then at least would be a velar onset to parallel *k-. The *n- in 難 could perhaps be put down to some sort of assimilation of *ŋ- with the *-n coda (one might compare 般 *pán < *pám where there is dissimilation of the coda unlike in its phonetic 凡 *bàm). At the very least, 漢 most likely went back to something like *hŋáns and then *xáns with a velar onset and the -s eventually becoming qu-sheng. An alternative option is rhinoglottophilia whereby a *ʔ became *n- as attested in cases like 憂 *ʔə̀w and 獶(夒) *nə́w, as I mentioned here.