A fuller and more specific version of the title of this post would be "Chinese transcriptions of Indic terms in the translations of An Shigao (Chinese: 安世高; pinyin: Ān Shìgāo; Wade–Giles: An Shih-kao, Korean: An Sego, Japanese: An Seikō, Vietnamese: An Thế Cao) (fl. 148-180 CE) and Lokakṣema (लोकक्षेम, Chinese: 支婁迦讖; pinyin: Zhī Lóujiāchèn) (fl. 147-189)".
With the collaboration of Jan Nattier, Nathan Hill was able to digitize some data from Han Buddhist transcriptions back in 2017 and has now published them as a dataset on Zenodo:
Hill, Nathan, Nattier, Jan, Granger, Kelsey, & Kollmeier, Florian. (2020). Chinese transcriptions of Indic terms in the translations of Ān Shìgāo 安世高 and Lokakṣema 支婁迦讖 [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3757095
In some on-going research on linguistic features relating to clinical diagnosis and tracking, we've been looking at "lexical diversity". It's easy to measure the rate of vocabulary display — you can just use a type-token graph, which shows the count of distinct words ("types") against the count of total words ("tokens"). It's less obvious how to turn such a curve into a single number that can be compared across sources — for a survey of some alternative measures, see e.g. Scott Jarvis, "Short texts, best-fitting curves and new measures of lexical diversity", Language Testing 2002; and for the measure that we've settled on, see Michael Covington and Joe McFall, "Cutting the Gordian knot: The moving-average type–token ratio (MATTR)", Journal of quantitative linguistics 2010. More on that later.
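As a rough illustration of the two ideas just mentioned, here is a minimal Python sketch of a type-token curve and of a moving-average type-token ratio. It is not the code used in the research described above: the function names, the whitespace tokenization, and the window size of 5 are all assumptions made for the example. The MATTR function follows the general idea in Covington and McFall's paper (average the type-token ratio over every window of a fixed length, sliding one token at a time).

```python
from collections import Counter


def type_token_curve(tokens):
    """Return (token_count, type_count) pairs after each successive token."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((i, len(seen)))
    return curve


def mattr(tokens, window=5):
    """Mean type-token ratio over all windows of `window` consecutive tokens.

    The window size is an arbitrary choice for this sketch. If the text is
    no longer than the window, fall back to the plain type-token ratio.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    if n <= window:
        return len(set(tokens)) / n

    counts = Counter(tokens[:window])   # word counts inside the current window
    total_types = len(counts)           # running sum of per-window type counts
    for i in range(window, n):
        leaving = tokens[i - window]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]         # word no longer present in the window
        counts[tokens[i]] += 1
        total_types += len(counts)

    n_windows = n - window + 1
    return total_types / (window * n_windows)


# Hypothetical toy text, split on whitespace (an assumption, not a real tokenizer).
tokens = "the cat sat on the mat and the dog sat".split()
curve = type_token_curve(tokens)
score = mattr(tokens, window=5)
```

Plotting the second element of each pair in `curve` against the first gives the type-token graph. The MATTR value depends on the window size, so scores are only comparable across sources when the same window is used for all of them.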
For now, I want to make a point that depends only on type-token graphs. Over time, I've accumulated a small private digital corpus of more than 100 English-language fiction titles, from Tristram Shandy forward to 2019. It's clear that different authors have different characteristic rates of vocabulary display, and for today's post, I want to present the authors in my collection with the highest and lowest characteristic rates.
Yevgeny Basovskaya, a specialist on public speech at Moscow’s State University of the Humanities, says that the disease has had a "radical" influence on the way Russians speak their language. This begins with the word coronavirus, which has an "a" in the middle. This is "in complete violation of Russian orthographic rules".
Every day, I get several talk announcements from the various mailing lists that I subscribe to, which represent a rich array of disciplinary sources: linguistics, computer science, anthropology, sociology, communications, math, literary studies, marketing, and so on. Usually I can figure out from the title what the presentation is going to be about — but sometimes my first guess is wrong in an interesting way.
[VHM: This is a guest post by Chris Button. It will be primarily of interest to specialists in the phonological history of Sinitic. Since there are quite a few such scholars on Language Log, I expect that it will occasion the usual lively debate that follows posts on such subjects. It will also undoubtedly be of interest to historical phonologists in general, as well as to a broad spectrum of Sinologists and their colleagues focusing on other Asian cultures and languages.]
I've been thinking about the etymological associations of Hàn 漢. It's often reconstructed with an aspirated coronal nasal as *hn-, in spite of the Middle Chinese x- then being somewhat unexpected (Baxter and Sagart put it down to dialects), largely on the basis of the *n- in 難. But its etymological association with 艱 and its velar *k- make this problematic. A regular source of MC x- would be *hŋ- which then at least would be a velar onset to parallel *k-. The *n- in 難 could perhaps be put down to some sort of assimilation of *ŋ- with the *-n coda (one might compare 般 *pán < *pám where there is dissimilation of the coda unlike in its phonetic 凡 *bàm). At the very least, 漢 most likely went back to something like *hŋáns and then *xáns with a velar onset and the -s eventually becoming qu-sheng. An alternative option is rhinoglottophilia whereby a *ʔ became *n- as attested in cases like 憂 *ʔə̀w and 獶(夒) *nə́w as I mentioned here.
I live in a city with a large immigrant population in general and a large Bosnian population in particular (Utica, NY [VHM: population around 60,000; between Syracuse and Schenectady]). As such, I see "BiH" bumper stickers once in a while on the road. Most of the Bosnian population either came during the breakup of Yugoslavia or are children of those immigrants, so they are probably following the American trend of putting round stickers on your car for things you like or identify with, rather than the European practice of using them to identify country of origin.