Archive for Phonetics and phonology

Historical dialectology and the Poetry Classic

[This is a guest post by John Carlyle written in response to the following comment by E. Bruce Brooks to "Similes for female pulchritude in an ancient Chinese poem" (7/1/20):

The formation of the Shr* corpus is currently under serious study, and it can be said with some certainty at this preliminary stage that this particular poem was added to the growing Shr collection at the end of the 05c. How much older it may be, in its own country (Wei) will depend on scrutiny of its dialect position: some of the poems from that area show traces of (original) local pronunciation; others do not. Stay tuned.

*Shījīng 詩經, aka Poetry Classic, Classic of Poetry, Shijing, Shih-ching, Book of Songs, Book of Odes, Odes, or Poetry.]

   There is some justification for thinking that Wey's 衛 dialect position might suggest something about the age of some of the poems in the Wey airs. The dialect position of Wey is better understood for the later period. What that might suggest about earlier poetry is still not clear. I'll try to give a quick summary of what we know so far.

   At least by the time of Fangyan 《方言》, Wey belonged to an eastern group of Chinese dialects. The exact limits of this eastern group are not entirely settled, nor are the phonological features shared by the group, since studies of Fangyan are primarily lexical. Since the time of Lin Yutang's (1927) first approximation of Fangyan dialect boundaries, the dialects of Wey and Song have been grouped together. Later scholars also included neighboring states like Qi (but not "Eastern Qi") and Lu. More recently, Matsue Takashi (1999, 2006, 2013) argues that the eastern group's boundaries extend as far as Chen and that the dialect of Chen was a transitional dialect between the eastern and southern groups due to Chu incursion (2006, 2013).

Read the rest of this entry »

Comments (12)


If you're looking for words with lots of consonants and few or seemingly no vowels, try Eastern Europe, especially Czechia.

I have a friend named Stu Cvrk, and I asked him the story of his surname and how to pronounce it.  Here's what he told me:

It is Czech. The Czech pronunciation is "tsverk". My grandparents Americanized it a bit to make it easier to say, as we now pronounce it "swerk."

The story of its derivation according to family lore is this:

Read the rest of this entry »

Comments (86)

Welp, sup, yep, yup, nope

This morning, someone sent me a message that began, "Welp, at least word boundaries are respected…".  I had no idea what he meant by the first syllable.  It didn't even seem like a word to me.  Or, I thought, perhaps it's a typo for "well".

Still, I was curious, so I looked around a bit, and found this entry in Merriam-Webster:

"Yes, 'Welp' Is a Real Word:  'Welp' is over 70 years old"

Update: This word was added in March 2018.

Social media is a place where informal language flourishes, which means that lexicographers get to chronicle the exploits of words that don't have much written use in edited prose—words like welp.

Chicago Tribune Tweet:  Welp, here comes the 1st accumulating snowfall of this winter….

Yep, the Chicago Tribune used welp. And yet:

Doug Lambert Tweet:  "Welp" is not a word….

Read the rest of this entry »

Comments (104)

Khmer historical phonology

[This is a guest post by John Whitman]

I have a Thai student writing a dissertation on Khmer historical phonology who wrote a qualifying paper using the Zhenla Fengtuji 真臘風土記, a late 13th century gazetteer on Cambodia written by one Zhou Daguan, who was sent to the Angkor court as an emissary. The most cited source on this text is a 1951 translation by Pelliot. There is a more recent English translation by Harris (2007), but it relies on Pelliot for linguistic matters. Pelliot identifies and transcribes 37 of the 44 Khmer words in the text.

Like Chinese (but probably slightly later), Khmer was undergoing loss of its voicing distinction in obstruents, but in a different way: Old Khmer voiceless obstruents became implosives, and voiced obstruents became voiceless. For reasons that he doesn’t explain very well, Pelliot assumed that Zhou was using early Mandarin values for his Chinese transcription characters, with aspirated Chinese initials representing Old Khmer voiceless initials and unaspirated initials representing Old Khmer voiced initials. This leads to chaotic correspondences with the Khmer material.

Read the rest of this entry »

Comments (11)

Rire la Rémumligne!

Phonetics to the rescue (for francophones only, alas):

Read the rest of this entry »

Comments (7)

Bats in Chinese language and culture: Early Sinitic reconstructions

The May 2020 issue of a scientific journal, Emerging Infectious Diseases, shows a rank badge of Qing Dynasty officialdom.  There are five bats in this piece of ornate embroidery (can you spot them?):

Artist Unknown. Rank Badge with Leopard, Wave and Sun Motifs, late 18th century. Silk, metallic thread. 10 3/4 in x 11 1/4 in / 27.31 cm x 28.57 cm. Public domain digital image courtesy of the Metropolitan Museum of Art, New York, NY, USA; Bequest of William Christian Paul, 1929. Accession no. 30.75.1025.


Read the rest of this entry »

Comments (7)

Chinese transcriptions of Indic terms in Buddhist translations of the 2nd c. AD

A fuller and more specific version of the title of this post would be "Chinese transcriptions of Indic terms in the translations of An Shigao (Chinese: 安世高; pinyin: Ān Shìgāo; Wade–Giles: An Shih-kao, Korean: An Sego, Japanese: An Seikō, Vietnamese: An Thế Cao) (fl. 148-180 CE) and Lokakṣema (लोकक्षेम, Chinese: 支婁迦讖; pinyin: Zhī Lóujiāchèn) (fl. 147-189)".

With the collaboration of Jan Nattier, Nathan Hill was able to digitize some data from Han Buddhist transcriptions back in 2017 and has now published them as a dataset on Zenodo:

Hill, Nathan, Nattier, Jan, Granger, Kelsey, & Kollmeier, Florian. (2020). Chinese transcriptions of Indic terms in the translations of Ān Shìgāo 安世高 and Lokakṣema 支婁迦讖 [Data set]. Zenodo.

Read the rest of this entry »

Comments (5)

The historical phonology of "Han", the main Chinese ethnonym

[VHM:  This is a guest post by Chris Button.  It will be primarily of interest to specialists in the phonological history of Sinitic.  Since there are quite a few such scholars on Language Log, I expect that it will occasion the usual lively debate that follows posts on such subjects.  It will also undoubtedly be of interest to historical phonologists in general, as well as to a broad spectrum of Sinologists and their colleagues focusing on other Asian cultures and languages.]

I've been thinking about the etymological associations of Hàn 漢. It's often reconstructed with an aspirated coronal nasal as *hn-, in spite of the Middle Chinese x- then being somewhat unexpected (Baxter and Sagart put it down to dialects), largely on the basis of the *n- in 難. But its etymological association with 艱 and its velar *k- make this problematic. A regular source of MC x- would be *hŋ- which then at least would be a velar onset to parallel *k-. The *n- in 難 could perhaps be put down to some sort of assimilation of *ŋ- with the *-n coda (one might compare 般 *pán < *pám where there is dissimilation of the coda unlike in its phonetic 凡 *bàm). At the very least, 漢 most likely went back to something like *hŋáns and then *xáns with a velar onset and the -s eventually becoming qu-sheng. An alternative option is rhinoglottophilia whereby a *ʔ became *n- as attested in cases like 憂 *ʔə̀w and 獶(夒) *nə́w as I mentioned here.

Read the rest of this entry »

Comments (26)

"One I first saw": more on homophonically induced typing errors

A little over a week ago, I described how I mistyped "stalk" for "stock".  That led to a vigorous discussion of precisely how people pronounce "stalk".  (As a matter of fact, in my own idiolect I do pronounce "stock" and "stalk" identically.)  See:

"Take stalk of: thoughts on philology and Sinology" (3/29/20)

I just now typed "One I first saw…" when I meant "When I first saw…".

Read the rest of this entry »

Comments (46)

Take stalk of: thoughts on philology and Sinology

In a note I was composing to some friends, I just wrote "let's take stalk of…", was surprised and smiled, corrected myself, and continued writing.

But then I paused to reflect….

Read the rest of this entry »

Comments (79)

More on Persian kinship terms; "daughter" and the laryngeals

Following up on "Turandot and the deep Indo-European roots of 'daughter'" (3/16/20), John Mullan (student of Arabic, master calligrapher, and expert chorister) writes:

As someone who’s studied a bit of Persian and a few other Indo European languages, I’ve always found it odd that most all of the kinship terms in Persian—mādar, pedar, barādar, dokhtar, pesar (cf. ‘puer’ in Latin and ‘pais’ in Greek, I assume)—have easy equivalents to my ear, /except/ ‘khāhar,’ sister. Wiktionary suggests it’s still related.

One quite recent finding of mine in PIE. As you probably know, 'Baghdad' is not an Arabic name, but a Persian one. It's composed of 'Bagh,' God (not the word used today), and 'Dād,' Given/Gift. Now I'm familiar with Bagh, ultimately, from listening to way too much Russian choral music and hearing Church Slavonic 'Bozhe.' Similarly, in the deep corners of my Greek student mind I remember names like 'Mithradates'—gift of Mithra or something along those lines—popping up as rulers/governors of city states in Classical Anatolia. What I /didn't/ pick out was the exact same construct as 'Baghdad' hiding in front of my eyes all along. There are two active NBA players named 'Bogdan(ović).' It's the same name as the city, only it's popped up in Serbo-Croatian. Funny stuff.

Read the rest of this entry »

Comments (20)

Words without vowels

Our recent discussions about syllabicity ("Readings" below) made me wonder whether it's possible to have syllables, words, and whole sentences without vowels.  That led me to this example from Nuxalk on Omniglot:


clhp'xwlhtlhplhhskwts' / xłp̓χʷłtłpłłskʷc̓

IPA transcription



Then he had had in his possession a bunchberry plant.

This is an example of a word with no vowels, something that is quite common in Nuxalk.

Source: Nater, Hank F. (1984). The Bella Coola Language. Mercury Series; Canadian Ethnology Service (No. 92). Ottawa: National Museums of Canada.

Read the rest of this entry »

Comments (35)

English syllable detection

In "Syllables" (2/24/2020), I showed that a very simple algorithm finds syllables surprisingly accurately, at least in good quality recordings like a soon-to-be-published corpus of Mandarin Chinese. Commenters asked about languages like Berber and Salish, which are very far from the simple onset+nucleus pattern typical of languages like Chinese, and even about English, which has more complex syllable onsets and codas as well as many patterns where listeners and speakers disagree (or are uncertain) about the syllable count.

I got a few examples of Berber and Salish, courtesy of Rachid Ridouane and Sally Thomason, and will report on them shortly. But it's easy to run the same program on a well-studied and easily-available English corpus, namely TIMIT, which contains 6300 sentences, 10 from each of 630 speakers. This is small by modern standards, but plenty large enough for test purposes. So for this morning's Breakfast Experiment™, I tested it.

Read the rest of this entry »

Comments (7)