A fuller and more specific version of the title of this post would be "Chinese transcriptions of Indic terms in the translations of An Shigao (Chinese: 安世高; pinyin: Ān Shìgāo; Wade–Giles: An Shih-kao; Korean: An Sego; Japanese: An Seikō; Vietnamese: An Thế Cao) (fl. 148-180 CE) and Lokakṣema (लोकक्षेम, Chinese: 支婁迦讖; pinyin: Zhī Lóujiāchèn) (fl. 147-189 CE)".
With the collaboration of Jan Nattier, Nathan Hill was able to digitize some data from Han Buddhist transcriptions back in 2017 and has now published them as a dataset on Zenodo:
Hill, Nathan, Nattier, Jan, Granger, Kelsey, & Kollmeier, Florian. (2020). Chinese transcriptions of Indic terms in the translations of Ān Shìgāo 安世高 and Lokakṣema 支婁迦讖 [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3757095
[VHM: This is a guest post by Chris Button. It will be primarily of interest to specialists in the phonological history of Sinitic. Since there are quite a few such scholars on Language Log, I expect that it will occasion the usual lively debate that follows posts on such subjects. It will also undoubtedly be of interest to historical phonologists in general, as well as to a broad spectrum of Sinologists and their colleagues focusing on other Asian cultures and languages.]
I've been thinking about the etymological associations of Hàn 漢. It is often reconstructed with an aspirated coronal nasal *hn-, largely on the basis of the *n- in 難, even though the Middle Chinese x- is then somewhat unexpected (Baxter and Sagart put it down to dialects). But its etymological association with 艱, with its velar *k-, makes this problematic. A regular source of MC x- would be *hŋ-, which would at least give a velar onset to parallel *k-. The *n- in 難 could perhaps be put down to some sort of assimilation of *ŋ- to the *-n coda (one might compare 般 *pán < *pám, where the coda dissimilates, unlike in its phonetic 凡 *bàm). At the very least, 漢 most likely went back to something like *hŋáns and then *xáns, with a velar onset and the *-s eventually becoming qu-sheng. An alternative option is rhinoglottophilia, whereby a *ʔ became *n-, as attested in cases like 憂 *ʔə̀w and 獶(夒) *nə́w, which I mentioned here.
A little over a week ago, I described how I mistyped "stalk" for "stock". That led to a vigorous discussion of precisely how people pronounce "stalk". (As a matter of fact, in my own idiolect I do pronounce "stock" and "stalk" identically.) See:
As someone who’s studied a bit of Persian and a few other Indo-European languages, I’ve always found it odd that almost all of the kinship terms in Persian—mādar, pedar, barādar, dokhtar, pesar (cf. ‘puer’ in Latin and ‘pais’ in Greek, I assume)—have easy equivalents to my ear, /except/ ‘khāhar,’ sister. Wiktionary suggests it’s still related.
One quite recent finding of mine in PIE: as you probably know, 'Baghdad' is not an Arabic name, but a Persian one. It's composed of 'Bagh,' God (not the word used today), and 'Dād,' Given/Gift. Now I'm familiar with Bagh, ultimately, from listening to way too much Russian choral music and hearing Church Slavonic 'Bozhe.' Similarly, in the deep corners of my Greek student mind I remember names like 'Mithradates'—gift of Mithra or something along those lines—popping up as rulers/governors of city-states in Classical Anatolia. What I /didn't/ pick out was the exact same construct as 'Baghdad' hiding in front of my eyes all along. There are two active NBA players named 'Bogdan(ović).' It's the same name as the city, only it's popped up in Serbo-Croatian. Funny stuff.
Our recent discussions about syllabicity ("Readings" below) made me wonder whether it's possible to have syllables, words, and whole sentences without vowels. That led me to this example from Nuxalk on Omniglot:
Sample: clhp'xwlhtlhplhhskwts' / xłp̓χʷłtłpłłskʷc̓
IPA transcription: [xɬpʼχʷɬtʰɬpʰɬːskʷʰt͡sʼ]
Translation: "Then he had had in his possession a bunchberry plant."
This is an example of a word with no vowels, something that is quite common in Nuxalk.
Source: Nater, Hank F. (1984). The Bella Coola Language. Mercury Series; Canadian Ethnology Service (No. 92). Ottawa: National Museums of Canada.
In "Syllables" (2/24/2020), I showed that a very simple algorithm finds syllables surprisingly accurately, at least in good quality recordings like a soon-to-published corpus of Mandarin Chinese. Commenters asked about languages like Berber and Salish, which are very far from the simple onset+nucleus pattern typical of languages like Chinese, and even about English, which has more complex syllable onsets and codas as well as many patterns where listeners and speakers disagree (or are uncertain) about the syllable count.
I got a few examples of Berber and Salish, courtesy of Rachid Ridouane and Sally Thomason, and may report on them shortly. But it's easy to run the same program on a well-studied and easily-available English corpus, namely TIMIT, which contains 6300 sentences, 10 from each of 630 speakers. This is small by modern standards, but plenty large enough for test purposes. So for this morning's Breakfast Experiment™, I tested it.
From a physical point of view, syllables reflect the fact that speaking involves oscillatory opening and closing of the vocal tract at a frequency of about 5 Hz, with associated modulation of acoustic amplitude. From an abstract cognitive point of view, each language organizes phonological features into a sort of grammar of syllabic structures, with categories like onsets, nuclei and codas. And it's striking how directly and simply the physical oscillation is related to the units of the abstract syllabic grammar — there's no similarly direct and simple physical interpretation of phonological features and segments.
This direct and simple relationship has a psychological counterpart. Syllables seem to play a central role in child language acquisition, with words following a gradual development from very simple syllable patterns, through closer and closer approximations to adult phonological and phonetic norms. And as Lila Gleitman and Paul Rozin observed in 1973 ("Teaching reading by use of a syllabary", Reading Research Quarterly), "It is suggested on the basis of research in speech perception that syllables are more natural units than phonemes, because they are easily pronounceable in isolation and easy to recognize and to blend."
In 1975, Paul Mermelstein published an algorithm for "Automatic segmentation of speech into syllabic units", based on "assessment of the significance of a loudness minimum to be a potential syllabic boundary from the difference between the convex hull of the loudness function and the loudness function itself." Over the years, I've found that even simpler methods, based on selecting peaks in a smoothed amplitude contour, also work quite well (see e.g. Margaret Fleck and Mark Liberman, "Test of an automatic syllable peak detector", JASA 1982; and slides on Dinka tone alignment from EFL 2015).
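To make the convex-hull idea concrete, here is a hedged sketch of a Mermelstein-style recursion (a simplified reading, not a reproduction of the 1975 algorithm): the "convex hull" is approximated by a unimodal upper envelope of the loudness contour, and the loudness measure and dip-depth threshold are placeholder assumptions.

```python
# Hypothetical sketch of a Mermelstein-style boundary search (not the 1975
# implementation). `loudness` is assumed to be a frame-by-frame loudness
# contour, e.g. in dB; `min_dip` is an illustrative threshold.
import numpy as np

def upper_envelope(x):
    # Unimodal envelope: rising running max up to the global peak, falling
    # running max after it; a crude stand-in for Mermelstein's convex hull.
    return np.minimum(np.maximum.accumulate(x),
                      np.maximum.accumulate(x[::-1])[::-1])

def split(loudness, lo, hi, min_dip=2.0, cuts=None):
    # Put a boundary at the deepest dip below the envelope, then recurse on
    # the two halves, as long as the dip depth exceeds `min_dip`.
    if cuts is None:
        cuts = []
    seg = loudness[lo:hi]
    if len(seg) < 3:
        return cuts
    dip = upper_envelope(seg) - seg
    i = int(np.argmax(dip))
    if dip[i] >= min_dip:
        cuts.append(lo + i)
        split(loudness, lo, lo + i, min_dip, cuts)
        split(loudness, lo + i + 1, hi, min_dip, cuts)
    return cuts

# usage: boundaries = sorted(split(loud, 0, len(loud)))
```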
In this post, I'll present a simple language-independent syllable detector, and show that it works pretty well. It's not a perfect algorithm or even an especially good one. The point is rather that "syllables" are close enough to being amplitude peaks that the results of a simple-minded, language-independent algorithm are surprisingly good, so that maybe self-supervised adaptation of a more sophisticated algorithm could lead in interesting directions.
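The post's own code isn't reproduced here, but a minimal sketch of an envelope-peak detector of the kind described above might look like the following (Python with numpy/scipy; the 10 Hz smoothing cutoff, 120 ms minimum peak spacing, and relative-height threshold are illustrative guesses, not the parameters actually used):

```python
# Illustrative envelope-peak syllable detector; parameter values are
# placeholders, not those used in the post.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt, find_peaks

def syllable_peaks(wav_path, smooth_hz=10.0, min_gap_s=0.12, rel_height=0.1):
    fs, x = wavfile.read(wav_path)              # sample rate, waveform
    x = x.astype(float)
    if x.ndim > 1:                              # fold stereo to mono
        x = x.mean(axis=1)
    env = np.abs(x)                             # full-wave rectification
    b, a = butter(2, smooth_hz / (fs / 2))      # smooth toward the ~5 Hz syllable rate
    env = filtfilt(b, a, env)                   # smoothed amplitude contour
    peaks, _ = find_peaks(env,
                          height=rel_height * env.max(),
                          distance=int(min_gap_s * fs))
    return peaks / fs                           # estimated syllable-peak times (s)

# usage: times = syllable_peaks("some_utterance.wav"); len(times) ~ syllable count
```

Counting the returned peaks per utterance and comparing against dictionary syllable counts for the transcripts is one straightforward way to score such a detector on a corpus like TIMIT.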
Back in 2018, your post "Pinyin for phonetic annotation" planted an idea in my head that I’ve been gradually expanding ever since. I am now at a stage where I routinely create annotated Chinese text for myself; this (pdf) is what one such document looks like.
In "On beyond the (International Phonetic) Alphabet", 4/19/2018, I discussed the gradual lenition of /t/ in /sts/ clusters, as in the ending of words like "motorists" and "artists". At one end of the spectrum we have a clear, fully-articulated [t] sound separating two clear [s] sounds, and at the other end we have something that's indistinguishable from a single [s] in the same context. I ended that post with these thoughts:
My own guess is that the /sts/ variation discussed above, like most forms of allophonic variation, is not symbolically mediated, and therefore should not be treated by inventing new phonetic symbols (or adapting old ones). Rather, it's part of the process of phonetic interpretation, whereby symbolic (i.e. digital) phonological representations are related to (continuous, analog) patterns of articulation and sound.
It would be a mistake to think of such variation as the result of universal physiological and physical processes: though the effects are generally in some sense natural, there remain considerable differences across languages, language varieties, and speaking styles. And of course the results tend to become "lexicalized" and/or "phonologized" over time — this is one of the key drivers of linguistic change.
Similar phenomena are seriously understudied, even in well-documented languages like English. Examine a few tens of seconds of (even relatively careful and formal) speech, and you'll come across some examples. To introduce another case, listen to these eight audio clips, and ask yourself what sequences of phonetic segments they represent:
I've eaten in this hot pot (huǒguō / WG huo3-kuo1 / IPA [xwò.kwó] 火锅 / 火鍋) restaurant at 3717 Chestnut St. on a number of occasions, and each time I go, I am struck by the creative sign out front:
When confirming reservations on the phone with clerical folks in certain Southeast Asian countries, Paul Midler noticed they often used variations of the NATO phonetic alphabet. “D for Dog” and “L for Love” seemed to be a couple of consistent additions. Passing through a travel agency in Thailand, he saw this: